Typst is an awkward choice for static site generators (yet) [000B]
Typst, as I have argued, hits a sweet spot for taking scientific notes. However, exactly what makes it a good fit for scientific notes also makes it troublesome as the markup language for a static site generator. Simply put, Typst the language is expressive enough to be hard to analyze, but Typst the compiler is not powerful enough to process a whole site.
Using a markup language is necessary for sites with mathematical content
As long as you don’t write HTML directly, you need a markup language to write the contents of your site. Some may say that they could, or wish to, write raw HTML for their site, and that is indeed only a matter of personal preference except in one case: when you need to write mathematics or, more generally, formulas. You could write HTML by just typing the tags, but if you are able to write the MathML or SVG code for a formula by hand and are willing to do so, please stop reading, as this article is for human readers. You could also write the formulas on paper and put a photo of them in an <img> tag, but I suspect it would then be easier to just upload scans of the handwritten or typewritten pages.
What is a static site generator?
So, as long as you are not writing HTML directly, you need some program to translate the content you write in a markup language into HTML. This is what a static site generator does. “Static” here describes the site: the HTML files of the site are generated before the user visits it, rather than generated dynamically by the server or the browser at the time of the visit.
But there is more to it. What is the difference between the pages of a site and a bunch of individual documents? The answer is that the pages of a site can interact with each other. You want to show a list of posts on the front page of your blog; you want to show links to and summaries of the previous and next posts on the page of a post; you want to show the links and backlinks of the current post; and so on. With such interactions present, a post cannot be rendered on its own: it must be given information about other posts in the site. This is another important aspect of a static site generator: it provides this information to the document during rendering. Different static site generators do this in different ways, usually depending on the markup language they use.
At this point, the situation resembles a classical problem in programming languages: modules. Just like a module, a document may expose some values, and other documents can take these values as input. One problem is the order in which the documents are processed. Clearly, if one document relies on another, the latter needs to be processed first. In the end, the documents would form a DAG (directed acyclic graph). But this solution immediately fails on one of the examples mentioned above: showing links to and summaries of the previous and next posts on the page of a post. Generally speaking, if two documents each require the content of the other, which is common for a site, then there is a circular reference.
Luckily, for a site, there is a way around it. Notice that in the example of links to and summaries of adjacent posts, there is no real dependence between the contents of these documents, where “content” means the body of the article as opposed to the wrapping parts around it. We can extract the contents of all documents first; then, if the wrapping parts require the contents of other documents, we can fill the values in without issue, because the contents are already settled. Technically, with more fine-grained reference resolution, even some degree of circular reference between contents can be resolved (perhaps after several passes to reach a fixpoint), but requiring the contents to form a DAG is usually enough. Such a static site generator therefore has a two-pass structure: one pass to get the contents of all documents, and another pass to put these contents into the wrapping parts.
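The two-pass structure can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the Post type, render_body, wrap, and build_site are hypothetical names, and the "markup compiler" in pass 1 is a stand-in that wraps lines in <p> tags (with no HTML escaping). What it shows is that pass 2 can freely read data from neighbouring posts, because nothing in pass 1 depends on other posts.

```python
from dataclasses import dataclass


@dataclass
class Post:
    slug: str
    title: str
    source: str  # markup source of the article body


def render_body(post: Post) -> str:
    """Pass 1: turn one post's source into body HTML, using only that post."""
    # Stand-in for a real markup compiler: wrap each non-empty line in <p>.
    return "".join(f"<p>{line}</p>" for line in post.source.splitlines() if line.strip())


def wrap(index: int, posts: list[Post], bodies: dict[str, str]) -> str:
    """Pass 2: build the full page; data about other posts is already settled."""
    post = posts[index]
    nav = []
    if index > 0:
        prev = posts[index - 1]
        nav.append(f'<a href="{prev.slug}.html">Previous: {prev.title}</a>')
    if index + 1 < len(posts):
        nxt = posts[index + 1]
        nav.append(f'<a href="{nxt.slug}.html">Next: {nxt.title}</a>')
    nav_html = "".join(nav)
    return f"<article><h1>{post.title}</h1>{bodies[post.slug]}<nav>{nav_html}</nav></article>"


def build_site(posts: list[Post]) -> dict[str, str]:
    """Run both passes and return a mapping from slug to page HTML."""
    bodies = {p.slug: render_body(p) for p in posts}  # pass 1: bodies only
    return {p.slug: wrap(i, posts, bodies) for i, p in enumerate(posts)}  # pass 2
```

Calling build_site on two posts yields pages that link to each other, even though neither body was rendered with knowledge of the other.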
Two kinds of markup languages
Markup languages are languages, so they have syntax and semantics. The syntax of a markup language tells you how to write things in a text editor, and its semantics tells how what you write is turned into a lower-level, more machine-friendly representation, which is HTML in our context of static site generators.
By looking at the distance between their syntax and semantics, markup languages can be placed on a spectrum. This is not a precise definition; basically I mean how hard it is to turn the syntax into the semantics. Usually, when this distance is small, the syntax and semantics are both rather simple; when it is large, the syntax is a bit more complex and the semantics much more complex.
At the small-distance end lies Markdown. The syntax of Markdown is so simple that many people can parse it with their eyes and brain. And the semantics of Markdown corresponds almost directly to its syntax. Yes, there are plenty of Markdown implementations that all behave differently, but the core (and also the most used part) of the syntax-semantics correspondence is still simple: #s for headings, *italic* for italic, and so on, and that’s all. I’m not saying that implementing a Markdown parser is easy; what I mean is that Markdown has very limited expressive power, and this limited expressive power rests on a limited syntax. This means you cannot have much interactivity within Markdown. Usually, static site generators generate the contents from Markdown documents almost literally, then use a template engine for more advanced processing.
At the large-distance end lie Org-mode, MDX, and so on. On the surface they look like Markdown with a different syntax. But there is a huge difference between them and Markdown: they need to be interpreted rather than simply translated. This is because almost all of these markup languages are programmable. Basically all of them have some kind of template-like feature and allow you to customize the behavior of the document in their host language. One benefit is that you can usually use the same host language to handle both passes (source text to contents, and contents to HTML), so the markup plus the host language (and relevant libraries) is the static site generator.
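To make “almost directly” concrete, here is a toy translator for just the two constructs named above. It is a sketch of the idea, not a Markdown implementation: toy_markdown_to_html is a hypothetical name, and the function ignores nearly everything a real parser has to handle (nesting, escaping, lists, links).

```python
import re


def toy_markdown_to_html(text: str) -> str:
    """Translate a tiny Markdown subset line by line (not CommonMark-compliant)."""
    out = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines only separate blocks
        # *italic* becomes <em>italic</em>
        line = re.sub(r"\*([^*]+)\*", r"<em>\1</em>", line)
        # One to six leading #s followed by a space mark a heading
        heading = re.match(r"(#{1,6}) +(.*)", line)
        if heading:
            level = len(heading.group(1))
            out.append(f"<h{level}>{heading.group(2)}</h{level}>")
        else:
            out.append(f"<p>{line}</p>")
    return "\n".join(out)
```

Each rule is a local pattern replacement with no state and no evaluation, which is what makes this end of the spectrum “translation” rather than “interpretation”.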
Typst still needs more time
At this point, readers familiar with Typst will put it in the latter group, since Typst the language is basically a full-powered scripting language plus a thin layer of markup, and the semantics of the markup part can be largely manipulated by the scripting part. So one cannot know the result of interpreting a Typst document without compiling it with the Typst compiler. And Typst has built-in HTML export support, so we don’t even need to bother putting the result into HTML templates! So it should be easy to use it to generate static sites?
The answer is no; or rather, you can, but only in an awkward way.
To understand this, we need to think about how to implement the interactions between documents with Typst. In other words, how to give the rendering process of one document information about the other documents in the site.
First, one key fact to keep in mind: unlike Org-mode, MDX, and similar systems, which have both a markup language and a host language (Emacs Lisp and JavaScript, respectively), Typst itself is the language. Without a higher-level “host” language, it lacks the ability to process the data of other documents from within Typst itself. For instance, you cannot do this with #include or #import directly. Consider again the example of links between adjacent posts. Post 1 needs the title of post 2, and post 2 needs the title of post 1. This is a circular reference, because both documents have top-level content referring to each other. Also, #include and #import usually give you content values, which are very hard to work with within Typst; what’s more, this content is not the final result of the other document: it may contain #context expressions, which are opaque boxes whose values are not yet resolved when the document is interpreted. So, to use Typst as the markup language for a static site generator, you cannot just write a bunch of Typst code and let the compiler do the rest. You need another system to orchestrate the generation process and, most importantly, to handle the interaction between documents by providing each with the data of the others.
Nevertheless, it turns out that you also cannot do this by passing the information about other documents as input to the Typst document. The first problem is still ordering: as long as you want to retrieve the content of other documents rather than just some metadata, an interpretation order must be fixed, so circular references are not possible, even when we are sure that the documents only use a small part of each other and would converge to a fixpoint. It is also very, very hard to retrieve contents this way. Typst has no way to serialize and deserialize Typst content. So what we get as the contents of documents is HTML, which means that to use these contents from within Typst files, we need to turn the HTML back into Typst content. If we put no constraints on the flexibility of Typst, then it is impossible to convert arbitrary HTML back to Typst content. If we require that our documents only output HTML of a certain structure, then Typst does not bring much benefit over a mature static site generator with Markdown, not to mention how hard it is to do the parsing in Typst.
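The circular-reference problem itself is independent of Typst and can be seen in any dependency-ordered build. The sketch below models it in Python with the standard library's graphlib; the document names and the processing_order helper are hypothetical, and the graph is a simplified model of whole-document dependencies, not of what the Typst compiler does internally.

```python
from graphlib import CycleError, TopologicalSorter


def processing_order(needs: dict[str, set[str]]) -> list[str]:
    """Return an order in which every document comes after the ones it needs."""
    return list(TopologicalSorter(needs).static_order())


# A chain is fine: post2 needs post1, so post1 is processed first.
chain = processing_order({"post1": set(), "post2": {"post1"}})

# Links between adjacent posts make the dependency mutual, so no order exists.
try:
    processing_order({"post1": {"post2"}, "post2": {"post1"}})
    mutual_ok = True
except CycleError:
    mutual_ok = False
```

Any system that treats a whole document as the unit of dependency runs into the same failure on the mutual case; it only goes away if titles and bodies are separated from the wrapping parts.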
Finally, you can do this in an awkward way: use Typst as a more expressive version of Markdown that produces only the content part of documents, and use an HTML template engine to handle the interaction part. This is awkward because there are now two things that can both generate arbitrary HTML, which is conceptually redundant. But this seems to be the best, and the only, solution to the problem right now.
(However, there is an open issue requesting an HTML reader for Typst. Given that Typst already has a typed HTML API, I imagine the final shape of such a reader API would be to parse an HTML string and return html.* elements. If that is implemented, one could then get rid of the separate HTML template engine.)
One may have realized that the discussion above only considers using Typst as a separate process rather than integrating it as a library. Technically, one could bypass the awkwardness above by using Typst as a library. For example, one could write one's own HTML reader that parses an HTML string into html.* elements, as mentioned above. The problem with this approach is that you must give up compatibility with the Typst compiler: your documents will be written in a dialect of Typst that only your static site generator can process. Is it worth it? Answers will vary from person to person. For me, compatibility (and thus the Typst ecosystem) is very important.
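A rough sketch of this division of labor, in Python, might look as follows. Everything here is an assumption for illustration: the function names and the page template are hypothetical, string.Template stands in for a real template engine, the command-line flags are the ones I believe the experimental HTML export uses (check them against your Typst version), and the exported file is assumed to be a complete HTML document with a <body> element.

```python
import re
import subprocess
from pathlib import Path
from string import Template

# Hypothetical page template; a real site would use a full template engine.
PAGE = Template(
    "<!doctype html><html><head><title>$title</title></head>"
    "<body><main>$body</main><nav>$nav</nav></body></html>"
)


def typst_html_command(source: Path, output: Path) -> list[str]:
    """Command line for HTML export (assumed flags; verify for your version)."""
    return ["typst", "compile", "--features", "html", "--format", "html",
            str(source), str(output)]


def extract_body(html: str) -> str:
    """Keep only what is inside <body>, assuming a complete HTML document."""
    match = re.search(r"<body[^>]*>(.*)</body>", html, flags=re.DOTALL | re.IGNORECASE)
    return match.group(1).strip() if match else html


def compile_body(source: Path, output: Path) -> str:
    """Typst's job: produce the content part of one document."""
    subprocess.run(typst_html_command(source, output), check=True)
    return extract_body(output.read_text(encoding="utf-8"))


def render_page(title: str, body: str, nav: str) -> str:
    """The template's job: wrap the body with data from other documents."""
    return PAGE.substitute(title=title, body=body, nav=nav)
```

The redundancy is visible in the code: both compile_body and render_page end up producing HTML, and nothing but convention decides which markup belongs on which side.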