James Routley

2025-04-24

I share some details of the system I use to generate the HTML of this blog.

This blog was previously generated with a Ruby script, but I wanted a faster build, and by using Make I could get both incremental builds and parallelization. This led to a design where a full build runs many short-lived processes, and Ruby is not a good fit for that because of its slow startup time. I chose to use shell scripts instead.

The most complex parts are generating the Makefile and creating the tag pages (such as music). Tags are stored as metadata of posts, so to render the tag pages we must create a reverse mapping from tags to posts. Both the Makefile and the tag index are created by a single Awk script.
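The build script below does this inversion inside its Awk program. As a standalone illustration (a hypothetical helper, not the author's code), the shell function tag_index below reads the `tag NAME` lines from every posts/<slug>/meta.txt (the layout described in the next section) and prints one line per tag followed by its posts, newest first:

```shell
# Hypothetical helper: invert "tag NAME" lines from posts/*/meta.txt
# into one line per tag: "TAG SLUG SLUG ...", slugs newest first.
# Assumes it runs from the blog root.
tag_index() {
  for meta in posts/*/meta.txt; do
    slug=${meta#posts/}
    slug=${slug%/meta.txt}
    # Emit one "TAG SLUG" pair per tag line; other keys are ignored.
    while IFS=' ' read -r key value; do
      case "$key" in
      tag) echo "$value $slug" ;;
      esac
    done < "$meta"
  done | LC_ALL=C sort -k1,1 -k2,2r | awk '
    # Pairs arrive grouped by tag; join each group onto one line.
    $1 != prev { if (NR > 1) print line; line = $1; prev = $1 }
    { line = line " " $2 }
    END { if (NR > 0) print line }'
}
```

Run from the blog root, it would print a line starting with `music` that includes 0043-nava-dmux.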

Posts

Posts are subdirectories of the posts/ directory. Let's use this post as an example.

% tree posts/0043-nava-dmux/
posts/0043-nava-dmux/
├── README.md
├── assets
│   ├── 808-accent.png
│   ├── 909-accent.png
│   ├── dac-replaced.jpeg
│   ├── nava-bench.jpeg
│   └── nava-dac.png
└── meta.txt

2 directories, 7 files

The bulk of the post is in README.md. The meta.txt file contains metadata. It uses a simple format that can be parsed from a shell script with case ... esac.

% cat posts/0043-nava-dmux/meta.txt 
date 2025-03-08
tag music
tag dmux
uuid e1e1de76-314a-405e-93e8-9cc49f37bf4e
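The index script further down sources a read_date helper from a file named date, which isn't shown. Assuming it takes a post directory and prints the value of its `date` line, a minimal sketch using the same `read` plus `case ... esac` pattern might look like this (the body is hypothetical):

```shell
# Hypothetical sketch of read_date: print the "date" value from
# DIR/meta.txt. Assumes the "key value" format shown above.
read_date() {
  # IFS=' ' makes this work even if the caller changed IFS.
  while IFS=' ' read -r key value; do
    case "$key" in
    date)
      echo "$value"
      return 0
      ;;
    esac
  done < "$1/meta.txt"
  return 1 # no date line found
}
```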

Individual posts are rendered by a script called render in the root of my blog directory. It takes a single argument which is the URL slug of the post. For the example post the invocation would be ./render 0043-nava-dmux. (Strictly speaking the script can render multiple posts in a row, but I am not using that feature in practice.)

% cat render
#!/bin/sh
set -e

# Include global helper functions
. ./title
. ./frontmatter
. ./site

# Set IFS to a single newline so that the "read" builtin keeps
# spaces and tabs inside a line instead of splitting on them.
IFS='
'

render1() {
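  dir=$1
  # Reset per-post state so that rendering several posts in one
  # invocation does not carry tags over from the previous post.
  date=
  tags=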
  source_dir="posts/$dir"
  target_dir="public/$dir"
  assets_dir="$source_dir/assets"

  mkdir -p "$target_dir"
  if [ -d "$assets_dir" ]; then
    cp -r "$assets_dir" "$target_dir/"
  fi

  # Read date and tags from meta.txt.
  # We use "exec 0< file" to redirect (open) "file" to stdin.
  exec 0< "$source_dir/meta.txt"
  while read -r line; do
    case "$line" in
    "date "*)
      date=${line#date }
      ;;
    "tag "*)
      t=${line#tag }
      tags="$tags "'<a href="../tags/'"$t"'.html">'"$t"'</a>'
      ;;
    esac
  done

  # Redirect stdout to the target HTML file
  exec 1> "$target_dir/index.html"

  frontmatter "$(read_title "$source_dir")" ""
  echo '<body><main><p><i><a href="../">'"$SITE"'</a></i></p>'

  # The original Markdown.pl Perl script creates the HTML body.
  # We inject the post date just below the title h1.
  perl Markdown.pl < "$source_dir/README.md" | sed 1a'\
  <p>'"$date"'</p>'

  if [ -n "$tags" ]; then
    echo "<p>Tags: $tags</p>"
  fi

  echo '<p><a href="../">Back</a></p></main></body></html>'
}

while [ $# -gt 0 ]; do
  render1 "$1"
  shift
done

The render script sources a few shell functions and definitions that are reused elsewhere in my site, such as read_title, which reads the title from the first line of a Markdown file, and frontmatter, which outputs the HTML <!doctype> and <head>.
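The title file with read_title isn't shown. Assuming it takes a post directory and that line 1 of README.md is a Markdown heading such as `# Title`, a minimal hypothetical version could be:

```shell
# Hypothetical sketch of read_title: print line 1 of DIR/README.md
# with any leading "#" heading marker and spaces removed.
read_title() {
  head -n 1 "$1/README.md" | sed 's/^#*[[:space:]]*//'
}
```

Because it only strips leading `#` characters, a first line without a heading marker is passed through unchanged.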

The frontmatter shell function takes two arguments: the HTML document title and optionally, some extra text to be inserted into the HTML <head> section. The output goes to stdout; the caller must redirect it to the right place.

% cat frontmatter
frontmatter() {
echo '<!doctype html><html lang="en"><head>
<meta charset="UTF-8"/>
<meta name="viewport" content="width=device-width"/>
<title>'"$1"'</title>
<style>
      main {
            max-width: 75ch;
            padding: 2ch;
            margin: auto;
          }
      img { width: 100%; }
</style>
'"$2"'
</head>
'
}

Incremental builds

To speed up generating the site I use Make. Because I don't like maintaining Makefiles, I have a script that generates the Makefile for me based on the posts/ directory contents. The final site is built up in the public/ subdirectory.

% cat build
#!/bin/sh
set -e

# To make the build faster we run this Rsync job in parallel
# with Awk below.
rsync -a --exclude=.DS_Store static/ public/ &
rsyncpid=$!

# Sort pages by reverse lexicographic order. The output of Awk
# is the Makefile for the site.
(cd posts && ls | grep -v '^\.DS_Store$' | sort -r | awk '
BEGIN {
  stderr="cat 1>&2"
}

# Build an array of posts in new-to-old order.
# Also create an index "tags" which maps tags to posts.
{
  posts[++np]=$1
  meta="./" $1 "/meta.txt"
  while((getline < meta) > 0)
    if($1 == "tag")
      tags[$2]=tags[$2] " " posts[np]
  close(meta)
}

END {
  if(NR==0) {
    print "error: no posts found" | stderr
    exit(1)
  }

  # Declare the top-level "all" Make target
  printf "all: public/atom.xml public/index.html"
  for(i=1; i<=np; i++) printf " public/%s/index.html", posts[i]
  for(t in tags) printf " public/tags/%s.html", t
  print "\n"

  # Atom feed
  printf "public/atom.xml: atom site title"
  for(i=1; i<=np; i++)
    printf " posts/%s/README.md posts/%s/meta.txt",
      posts[i], posts[i]
  print "\n\t./atom > $@\n"

  # index.html
  printf "public/index.html: index site title frontmatter date"
  for(i=1; i<=np; i++)
    printf " posts/%s/README.md posts/%s/meta.txt", posts[i], posts[i]
  printf "\n\t./index"
  for(t in tags) printf " %s", t
  printf " > $@\n\n"

  # Declare a target for each post
  for(i=1; i<=np; i++) {
    printf "public/%s/index.html: render title frontmatter site " \
      "posts/%s/README.md posts/%s/meta.txt\n",
      posts[i], posts[i], posts[i]
    printf "\t./render %s\n", posts[i]
  }

  printf "public/tags:\n\tmkdir -p $@\n\n"

  # Declare a target for each tag
  for(t in tags) {
    printf "public/tags/%s.html: tag frontmatter site public/tags " \
      "posts/*/meta.txt", t
    nps=split(tags[t], ps, " ")
    for(i=1; i<=nps; i++)
      printf " posts/%s/README.md", ps[i]
    printf "\n\t./tag %s > $@\n\n", t
  }
}
') > Makefile.site

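# Test wait directly: under "set -e" a failing bare "wait" would
# exit the script before any error message could be printed.
if ! wait "$rsyncpid"; then
  echo 'error: rsync failed' >&2
  exit 1
fi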

make -f Makefile.site -j 12

This is the Makefile target for the example post:

public/0043-nava-dmux/index.html: render title frontmatter site \
  posts/0043-nava-dmux/README.md posts/0043-nava-dmux/meta.txt
    ./render 0043-nava-dmux

Note that if I change the page formatting in the render script itself, then Make will automatically regenerate the index.html file of this post (and all others).

The incremental builds work well for the current size of the blog: running ./build when there are no changes takes less than 100ms, and a full rebuild of public/ takes about 400ms. Changing just the README.md file of this post makes ./build take about 150ms.
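The tag pages are produced by a ./tag script that isn't shown. The core of such a script is finding the visible posts that carry a given tag. A hypothetical sketch, assuming it runs from the blog root and honours the same `hidden true` convention as the index below:

```shell
# Hypothetical helper: print the slugs of posts tagged TAG, newest
# first, skipping posts whose meta.txt contains "hidden true".
posts_with_tag() {
  # -l: list matching files; -x: whole-line match; -F: fixed string.
  grep -lxF "tag $1" posts/*/meta.txt |
    while read -r meta; do
      # Hidden posts stay off tag pages, as they do on the index.
      grep -qx 'hidden true' "$meta" && continue
      slug=${meta#posts/}
      echo "${slug%/meta.txt}"
    done | sort -r
}
```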

UUID generator

This is very important: I absolutely had to make my own UUID generator. I'm kidding; I don't remember why I did this, but it's here now and it works. I'm sure it had something to do with wanting fewer dependencies.

% cat uuid.c
#include <err.h>
#include <stdio.h>

int main(void) {
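  unsigned char buf[16];
  size_t i; /* unsigned, to match sizeof(buf) in the loop below */
  FILE *f = fopen("/dev/random", "r");
  if (!f)
    err(-1, "fopen");
  if (fread(buf, 1, sizeof(buf), f) != sizeof(buf))
    errx(-1, "fread short read");
  /* Version 4 (random): high nibble of byte 6 becomes 0100. */
  buf[6] = 0x40 | (buf[6] & 0x0f);
  /* RFC 4122 variant: top two bits of byte 8 become 10. */
  buf[8] = 0x80 | (buf[8] & 0x3f);
  for (i = 0; i < sizeof(buf); i++) {
    /* Dashes before bytes 4, 6, 8 and 10 give the 8-4-4-4-12 layout. */
    if (!(i % 2) && i > 2 && i < 12)
      putchar('-');
    printf("%02x", buf[i]);
  }
  putchar('\n');
  return 0;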
}
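The two masking lines force the version nibble to 4 and the top two variant bits to binary 10, so the third group always starts with `4` and the fourth with one of `8`, `9`, `a` or `b`. A small shell check (a hypothetical helper, not part of the blog) makes that visible; the UUID from the example meta.txt passes it:

```shell
# Hypothetical check: succeed if $1 is a lowercase version 4 UUID,
# i.e. 8-4-4-4-12 hex, version nibble 4, variant nibble 8/9/a/b.
is_v4_uuid() {
  printf '%s\n' "$1" |
    grep -Eqx '[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}'
}
```

Assuming uuid.c has been compiled to ./uuid, `is_v4_uuid "$(./uuid)"` should succeed on every run.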

The index

The index script renders the root index.html to standard output. It takes the list of tags as command-line arguments so that it knows which tag-page links to render.

In this script you can see the "hidden post" feature I use to preview pages on my phone, or share them with friends for proofreading, prior to publishing them to the whole wide web.

#!/bin/sh
set -e

exec 0< /dev/null

. ./site
. ./frontmatter
. ./title
. ./date

# Use the second argument of the frontmatter function to inject
# some extra stuff into the HTML <head> section.
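# Build it as one string so it reaches frontmatter as a single
# argument ($2), rather than several words joined by line breaks.
extra_head='<link rel="alternate" type="application/atom+xml" href="atom.xml" title="'"$SITE"'" />
<meta name="description" content="A blog about technology and music">'
frontmatter "$SITE" "$extra_head"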

echo '
<body>
<main>
<h1>'"$SITE"'</h1>
<p>Homepage: <a href="https://jacobvosmaer.nl">jacobvosmaer.nl</a></p>
<p>Tags:'

# Loop over our command line arguments to create the list
# of tags.
while [ $# -gt 0 ]; do
  tag=$1
  shift
  echo '<a href="tags/'"$tag"'.html">'"$tag"'</a>'
done

echo '</p><p>'

cd posts
# Use "grep -L" to print the names of the files that do not match the regex,
# i.e., the complement of "grep -l". Posts with "hidden true" in their meta.txt
# are hidden from the index, tag pages, and atom.xml.
grep -L '^hidden true$' */meta.txt | sort -r | sed 's|/meta\.txt$||' | while read -r dir; do
  echo '<small>'"$(read_date "$dir")"'</small>' \
    '<a href="'"$dir"'/">'"$(read_title "$dir")"'</a><br>'
done

echo '
</p>
<p>RSS: <a href="atom.xml">atom.xml</a></p>

<!-- Recurse logo and webring elided for brevity -->

</main>
</body>
</html>
'
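Because `grep -l` is the complement of `grep -L`, the same convention makes it easy to list what is currently hidden, for example to check that nothing was left unpublished by accident. A hypothetical helper, run from the blog root:

```shell
# Hypothetical helper: print slugs of posts whose meta.txt has the
# line "hidden true", i.e. the posts the index script leaves out.
# Prints nothing when no post is hidden.
hidden_posts() {
  grep -lx 'hidden true' posts/*/meta.txt |
    sed 's|^posts/||; s|/meta\.txt$||' | sort -r
}
```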

Deploying

I deploy the site to an S3 bucket using s3cmd. CloudFront takes care of HTTPS and caching.

% cat deploy
#!/bin/sh
set -xe
./build
s3cmd sync --cf-invalidate --delete-removed -P --no-preserve \
  --exclude=.DS_Store public/ s3://$BUCKET/

Conclusion

This wasn't everything, but you have now seen most of my artisanal blogging framework. I hope you found it interesting. Thanks for reading!