Scripting subsets of page contents in Dhall.
On Tue, 03 May 2022, by @lucasdicioccio, 1382 words, 0 code snippets, 11 links, 0 images.
Ever since I started building my own blog engine, I have wanted some limited scripting capabilities in the text-processing engine. A specific case I had in mind was to generate some tables or listings for what you currently find on the /readings.html or /tips.html pages. I would like something like Microformats, but tactically applied to individual articles, with little ceremony.
A more telling example could be a photo gallery where we list a dozen images. For each image you’d want a title, a URL, a preferred background color for the frame, maybe a caption, etc. Hand-editing such galleries is a lot of work, and the extra ceremony of storing fewer than 100 records in a DB is not worth it. There’s a gap to fill.
This article demonstrates and discusses a PoC using the Dhall programming language as a specific step.
What the pages where I would like some minimal templating have in common is:
- I want to reference and organize a moderately large number of items.
- items within a collection have a common structure (e.g., a link to a page would have a title, a URL, some language, and some description from myself).
- I want to present them, ideally with some mechanical template to keep the styling/HTML-structure consistent.
Thus I have three distinct characteristics to carve out:
1. define the structure of items
2. list a collection of items
3. implement some template I use to present these
Separating 1. and 2. is a matter of getting some type and some syntax to write values.
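To make this concrete, here is a minimal sketch of a type and a collection of values in Dhall. The Link record, its field names, and the two sample entries are illustrative assumptions, not the definitions used on this site.

```dhall
-- Hypothetical item type: one record per link to list.
let Link =
      { title : Text, url : Text, language : Text, description : Text }

-- The collection is a plain list of such records (sample values only).
let links
    : List Link
    = [ { title = "Dhall language"
        , url = "https://dhall-lang.org"
        , language = "en"
        , description = "home page of the language"
        }
      , { title = "CommonMark"
        , url = "https://commonmark.org"
        , language = "en"
        , description = "home page of the Markdown specification"
        }
      ]

in  links
```

The type annotation on links means that a missing or misspelled field is rejected when the expression is type-checked, before any rendering happens.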
Separating 3. requires a novel feature in my blog-engine: to interpret some structure into some HTML chunk to embed in a longer article. Current generators only produce targets as fully-contained output objects (that then must be embedded via other HTML primitives like
Let’s go through my design process. This is a retrospective account of the process I followed rather than an upfront design-doc.
Absent a proper solution for this need, I have so far hand-written all these pages. A downside is that I need to carefully write CommonMark if I want to apply some CSS uniformly. More annoying is when I want to add non-trivial and verbose annotations (e.g., adding a mailto: link). This process is tedious.
I do not want to trade tedium for significant extra complexity. Indeed, in these tasks there is a risk of ending up with a similarly tedious process involving more moving bits… What I want to avoid is:
- a. having a separate database, connectors, and section capable of reading the database at production-time
- b. writing data structures in the Haskell-side of the blog-engine, including templates just for each type of listing
- c. writing some JS that fetches a JSON-list and renders only in the client
Adding a database (a.) is the last thing I would like to do because of all the deployment and extra changes required. I know this time will come if I want to do things like comments, but let me postpone it. At this point, even SQLite would be too much overhead: adding schemas, populating data, etc.
Writing Haskell code (b.) would be fine regarding the structure definition, however I do not like to “split” an article between multiple sources. A reason why I wrote a blog-engine was to avoid distractions while focusing on writing an article.
Summarizing, I was searching for some way to embed logic that would return some HTML, given some type locally-defined and locally-filled in an article document.
Filling in the gap
To fill the gap, there are two key decisions to make:
- a. what scripting language(s) to support?
- b. at which point of the computation pipeline should I incorporate this engine?
For the first question (a), I want some minimal and non-trivial language. I do not want to invest much time on the particular choice because I want to try different things rather than do some proper analysis to answer the second (b) question. Indeed, if the cost for demonstrating (a) happens to be small, then I get a pretty good starting point to answer or try tradeoffs in (b).
Rather than providing a lengthy discussion, I’ll leave only bullet points around pros/cons (note that I realize that these bullet points for pros/cons could themselves become microformats).
- incorporating Dhall is a full demonstration of scripted evaluation (I want some opaque IO returning some CommonMark)
- still a first step toward some more advanced form of pipeline/build-system
- Dhall has the ability to import libraries with little package management pain; I could use this to re-use parts across articles, which is a nice-to-have
- interoperability between Dhall and Haskell is bliss (the Dhall author is a prolific Haskell engineer)
- so far, no extra environment is passed to the Dhall interpreter (the Dhall code does not know about its surroundings, article title or any other data ➡️ for later)
- no dependencies between sections are planned (I would need cycle detection or some other approach ➡️ for later)
- evaluating Dhall code costs extra time, CPUs, and file-descriptors (especially important while I care about automatic reloads of previews when editing articles)
- intermediary results are opaque and hidden, if some final HTML is wrong, I need to know what has been generated as intermediary
- risk of adding non-deterministic content generation (e.g., breaking because I’ve no network, some hackers inserting duck pics or spam because the evaluator fetches from the Internet)
Dhall still seems a darn-good choice. None of the drawbacks is a fatal flaw, and all can be mitigated. The most dangerous, in my opinion, is the evaluation cost. I will likely mitigate it using some caching in the future. I need to keep in mind that there are two evaluation phases in my blog engine: one computing targets and another one rendering targets. Both have their trade-offs.
Non-determinism is heavily mitigated as long as I stick to Dhall, so I expect no big surprises. When adapting the pattern to other languages (e.g., if I ever want to run some Python) I will have to be more careful.
I spent less than two hours on the implementation, including family interruptions 👶 and babbling-around time. The implementation so far is extremely primitive, though.
I decided to interpret the Dhall code while loading the Site targets rather than while generating targets. As much as I wanted to avoid this option, it is the pragmatic choice: on the one hand, I interpret sections’ CommonMark in different places (e.g., for rendering HTML but also when analyzing content or generating a JSON AST). On the other hand, cache control and idempotency are easier to reason about at this early phase: one execution of the script gives one website-worth of recipes.
Overall I had to make the following changes:
- add a new format (defining a new pattern, and parser pattern) in my section-files
- import the Dhall package and runtime (it’s a Haskell library, nothing different from importing an HTTP-client here)
- insert some case-switch on the section format in the code that loads an Article
And that’s all.
You can see for yourself in the commit diff. And you can see this whole article
Later, I added a local “cache” of the Dhall prelude so that live-reloading my Dhall-code does not reload the Dhall prelude from the internet each time.
Since Dhall supports cached imports, the local cache requires little extra work: I just have this file with a checksum-verified network import, which I then import as a filesystem-local import with let prelude = ./dhall/Prelude.dhall.
In the future, the Dhall object to return will be a beefier record rather than just a blob
For instance, we could return extra information as metadata or as extra instructions that do not find their way into the HTML. Also, the Dhall code could return something other than CommonMark: we could directly generate HTML or JSON values.
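As a sketch only, such a richer return value might look like the record below. The Format union, the Result type, and every field name are hypothetical. None of this exists in the blog engine today.

```dhall
-- Hypothetical output formats the engine could learn to accept.
let Format = < Commonmark | Html | Json >

-- Hypothetical result type: the content, plus data that never reaches the HTML.
let Result =
      { format : Format
      , contents : Text
      , metadata : List { key : Text, value : Text }
      }

let result
    : Result
    = { format = Format.Commonmark
      , contents = "this section is generated"
      , metadata = [ { key = "generator", value = "dhall" } ]
      }

in  result
```

Tagging the content with an explicit format would let the Haskell side pick a renderer with a simple case on the union instead of assuming CommonMark.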
–– start of generated section ––
this section is generated
<section> is interpreted from Dhall to Cmark to HTML.
This content is likely at the boundary where I start to enjoy having some templating mechanism over repeating the same thing many times.
Indeed, if my data-type changes (adding columns) I need some help.
Same if the template changes (adding fields, changing the markup).
In this example, I use two Dhall functions as two templates for the same dataset. Styling is then done in CSS.
| Alice | 7/10 | secret santa barbara | good to learn about the city |
| Bob | 8/10 | sponge's den | other cartoons are funnier but okay |
| Cindy | 3/10 | hello world I'm Cindy | abandoned site |
| Dave | 9/10 | blog of a developer | he sings so well |
| Emil | 2/10 | eating some chewing gum | French movies... |
| Felicia | 6/10 | yet another a website | average |
| Gerard | 3/10 | tech lead lead leader | lots of words to say nothing |
| Hortense | 8/10 | lotta tasty recipes | miam miam |
–– end of generated section ––
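For readers curious about the shape of such a section, here is a rough sketch of two templates applied to one dataset. It is not the code behind the section above: the Entry type, its field names, the column headers, and the asRow, asBullet, and render functions are all illustrative assumptions. It uses only Dhall built-ins, so it needs no Prelude import.

```dhall
-- Hypothetical record for one row; field names are illustrative.
let Entry = { name : Text, score : Natural, site : Text, comment : Text }

let entries
    : List Entry
    = [ { name = "Alice"
        , score = 7
        , site = "secret santa barbara"
        , comment = "good to learn about the city"
        }
      , { name = "Bob"
        , score = 8
        , site = "sponge's den"
        , comment = "other cartoons are funnier but okay"
        }
      ]

-- Template 1: one table row per entry.
let asRow =
      \(e : Entry) ->
        "|${e.name}|${Natural/show e.score}/10|${e.site}|${e.comment}|\n"

-- Template 2: one bullet per entry.
let asBullet =
      \(e : Entry) -> "- ${e.name} (${Natural/show e.score}/10): ${e.site}\n"

-- Apply a template to every entry and concatenate the results in order.
let render =
      \(template : Entry -> Text) ->
        List/fold
          Entry
          entries
          Text
          (\(e : Entry) -> \(acc : Text) -> template e ++ acc)
          ""

in  ''
    |name|score|site|comment|
    |---|---|---|---|
    ${render asRow}
    ${render asBullet}
    ''
```

Adding a field to Entry forces every record and every template that uses it to be updated before the expression type-checks, which is the kind of help I want when the data-type changes.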