Scripting subsets of page contents in Dhall.

On Tue, 03 May 2022, by @lucasdicioccio, 1382 words, 11 links, 0images.

Since I’ve started building my own blog engine, I wanted some limited scripting capabilities in the text-processing engine. A specific case I had in mind was to generate some tables or listings for what you currently find on the /readings.html or on the /tips.html page. I would like something like Microformats, but tactically applied to individual articles, with little ceremony.

A more telling example could be a photo-gallery where we list a dozen of images. For each image you’d want a title, an URL, a preferred background color for the frame, maybe a caption etc. Hand-editing such galleries is a lot of work, and the extra ceremony to store less than 100 records in a DB is not worth it. There’s a gap to fill

This article demonstrates and discusses a PoC using the Dhall programming language as a specific step.

high-level need

What is common in the type of pages where I would like some minimal templating is:

  • I want to reference and organize a moderately large amount of items.
  • items within a collection have a common structure (e.g., a link to a page would have a title, an URL, some language, and some description from myself).
  • I want to present, ideally with some mechanical template to keep the styling/HTML-structure consistent.

Thus I have three distinct characteristics to carve out:

    1. define the structure of items
    1. list a collection of items
    1. implement some template I use to present these

Separating 1. and 2. is a matter of getting some type and some syntax to write values. Separating 3. requires a novel feature in my blog-engine: to interpret some structure into some HTML chunk to embed in a longer article. Current generators only produce targets as fully-contained output objects (that then must be embedded via other HTML primitives like <img> or <script> tags).

mini design

Let’s go through my design process. This is a retroactive recount of the process I took rather than an upfront design-doc.

current situation

Absent a proper solution for this need, I so far hand-write all these pages. A downside is that I need to carefully write CommonMark if I want to apply some CSS uniformly. More annoying is if I want to add a non-trivial and verbose annotations (e.g., adding a mailto: link). This process is fastidious.

I do not want to trade fastidiousness for significant extra complexity. Indeed, in these tasks there is a risk to end up for a similarly-fastidious process involving more moving bits… What I want to avoid is:

  • a. having a separate database, connectors, and section capable of reading the database at production-time
  • b. writing data structures in the Haskell-side of the blog-engine, including templates just for each type of listing
  • c. writing some JS that fetches a JSON-list and render only in the client

Adding a database (a.) is the latest thing I would like to do because of all the deployment and extra changes required. I know this time will come if I want to do things like comments, but let me push it further. At this point, even SQLite would be too much overhead to add schemas, populate data etc.

Writing Haskell-code (b.) and would be fine regarding the structure definition, however I do not like to “split” an article between multiple sources. A reason why I wrote a blog-engine was to avoid distractions while focusing on writing an article.

Of the three, I think that writing all the logic client-side (c.) probably would be the least intrusive to my writing flow. However, clients that do not support JavaScript would miss the content. I’m fine with JavaScript when it is required or when it is a nice to have. Usage of JS for just laying-out the meaty-content is something I frown upon.

Summarizing, I was searching for some way to embed logic that would return some HTML provided some type locally-defined and locally-filled in an article document.

filling in the gap

To fill the gap, there are a two key decisions to make:

  • a. what scripting language(s) to support?
  • b. at which point of the computation pipeline should I incorporate this engine?

For the first question (a), I want some minimal and non-trivial language. I do not want to invest much time on the particular choice because I want to try different things rather than do some proper analysis to answer the second (b) question. Indeed, if the cost for demonstrating (a) happens to be small, then I get a pretty good starting point to answer or try tradeoffs in (b).

Working on the Halogen demo article I was reminded about Dhall as an improved YAML/JavaScript for configuration. Configurations are very similar to Microformats, so why no try Dhall first?

Rather than providing a lengthy discussion, I’ll leave only bullet points around pro/cons (note that I realize that these bullet points for pro/cons could themselves become microformats).

Advantages

  • incorporating Dhall is a full demonstration of how scripted evaluations (I want some opaque IO returning some CommonMark)
  • still a first stone in some more advanced form of pipeline/build-system
  • Dhall has the ability to import libraries with little package management pain, I could use this to re-use parts across articles, it’s a good nice to have
  • interoperability between Dhall and Haskell is a bliss (the Dhall author is a prolific Haskell engineer)

Limitations

  • so far, no extra environment is passed to the Dhall interpreter (the Dhall code does not know about its surroundings, article title or any other data ➡️ for later)
  • no dependency between sections are planned (need to do detection cycles or find other approaches ➡️ for later)

Drawbacks

  • evaluating Dhall code costs extra time, CPUs, and file-descriptors (especially important while I care about automatic reloads of previews when editing articles)
  • intermediary results are opaque and hidden, if some final HTML is wrong, I need to know what has been generated as intermediary
  • risk of adding non-deterministic content generation (e.g., breaking because I’ve no network, some hackers inserting duck picks or spam because the evaluator fetches from the Internet)

Summary

Dhall still seems a darn-good choice. None of the drawbacks are fatal flaws and can be mitigated. The most dangerous in my opinion is the evaluation costs. I will likely mitigate it using some cacheing in the future. I need to keep in mind that there are two evaluation phases in my blog engine: one computing targets and another one rendering targets. Both have their trade-offs.

Non-determinism is heavily mitigated while sticking to Dhall, so I expect no big surprises. When adapting the pattern to other languages (e.g., if I ever want to run some python) then I will have to be more careful.

result

I spent less than two hours, including family interruptions 👶 and babbling-around time. The implementation so far is extremely primitive, though.

I decided to interpret the Dhall code while loading the Site targets rather than while generating targets. As much as I wanted to avoid this option, it is the pragmatic choice: on the one hand, I interpret sections’ Commonmark in different places (e.g., for rendering HTML but also when analyzing content or generating a JSON AST). On the other hand, cache control and idempotency is more obvious to control at this early phase: one execution of the script gives one website-worth of recipes.

Overall I had to do the following changes:

  • add a new format (defining a new pattern, and parser pattern) in my section-files
  • import the Dhall package and runtime (it’s a Haskell library, nothing different from importing an HTTP-client here)
  • insert some case-switch on the section format in the code that loads an Article

And that’s all.

You can see for yourself in the commit diff . And you can see this whole article source including the dhall section.

Later, I added a local “cache” of the Dhall prelude so that live-reloading my Dhall-code does not reload the Dhall prelude from the internet each time. Since Dhall supports cached import, the local cache requires little extra work: I just have this file with a checksum-verified network import, which I then import as a filesystem-local import with let prelude = ./dhall/Prelude.dhall.

future

In the future, the Dhall object to return will be a beefier record rather than just a blob List Text. For instance, we could return extra information as metadata or as extra instructions that do not find their way in the HTML. Also, the Dhall code could return something else than Commonmark, we could directly generate HTML or JSON values.

–– start of generated section ––

this section is generated

This whole <section> is interpreted from Dhall to Cmark to HTML. The content likely is the boundary at which point I enjoy having some templating mechanism over repeating the same thing many times. Indeed, if my data-type changes (adding columns) I need some help. Same if the template changes (adding fields, changing the markup).

In this example, I use two Dhall functions as two templates for a same dataset. Styling is then done in CSS.

table layout

author note website personal comment
Alice 7/10 secret santa barbara good to learn about the city
Bob 8/10 sponge's den other cartoons are funnier but okay
Cindy 3/10 hello world I'm Cindy abandonned site
Dave 9/10 blog of a developer he sings so well
Emil 2/10 eating some chewing gum French movies...
Felicia 6/10 yet another a website average
Gerard 3/10 tech lead lead leader lots of words to say nothing
Hortense 8/10 lotta tasty recipes miam miam

tiles layout

Alice 7/10
good to learn about the city
Bob 8/10
other cartoons are funnier but okay
Cindy 3/10
abandonned site
Dave 9/10
he sings so well
Emil 2/10
French movies...
Felicia 6/10
average
Gerard 3/10
lots of words to say nothing
Hortense 8/10
miam miam

–– end of generated section ––