Baking webmentions into the build

Back in 2022 I added webmention support to this blog. The implementation worked, but it was entirely client-side: the page would load, JS would fire, fetch from webmention.io, and render the results into the DOM. With JS disabled, on a slow connection, or in an RSS reader, the discussion section was just empty.

It worked well enough that I didn’t touch it for three years, but for the last couple of weeks I’ve been doing some yak shaving.

The original setup fetched webmentions from webmention.io on every page load, caching the results in localStorage for 30 minutes. GitHub issue comments (my comment system) were fetched the same way. The result was a discussion section that would flicker in after a moment, or not appear at all if the API was rate-limited or slow.
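The old client-side flow looked roughly like this (a minimal sketch, not the actual code; the function and storage-key names are illustrative, but the 30-minute TTL is the one described above):

```javascript
// Sketch of the old client-side caching: results from webmention.io were kept
// in localStorage for 30 minutes, so repeat views within that window skipped
// the network round trip entirely.
const TTL_MS = 30 * 60 * 1000;

async function cachedFetch(key, fetcher, storage, now = Date.now()) {
  const raw = storage.getItem(key);
  if (raw) {
    const { savedAt, data } = JSON.parse(raw);
    if (now - savedAt < TTL_MS) return data; // still fresh: no request made
  }
  const data = await fetcher();
  storage.setItem(key, JSON.stringify({ savedAt: now, data }));
  return data;
}
```

In the browser, `storage` would be `window.localStorage` and `fetcher` a `fetch()` against the webmention.io (or GitHub) API; outside that window, every page view paid the full round trip.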

The GitHub API rate limit for unauthenticated requests is 60 per hour per IP. That limit covers everything coming from a visitor's IP, so anyone browsing through a few posts in quick succession could burn through it and find comments quietly failing to load. There was also no record of what mentions existed at build time: search engines, feed readers, and anyone without JavaScript saw nothing.

Fetching at build time

I run a small self-hosted service on my VPS called Morris that mirrors webmentions from webmention.io. (webmention.io receives and stores mentions sent to your site; brid.gy bridges social platforms like Mastodon and Twitter so their replies show up as webmentions too. Both are free and genuinely brilliant bits of IndieWeb infrastructure.) It indexes mentions by target URL and stores them as individual JSON files, so I’ve got a copy I control rather than depending on webmention.io directly.
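The post doesn't show Morris's actual schema, but as an illustration, an index keyed by target URL pointing at per-mention JSON files might look something like this (every field name here is an assumption):

```javascript
// Hypothetical shape for Morris's index -- the real schema isn't shown in the
// post. Mentions are grouped by target URL; each entry names an individual
// JSON file holding one mention.
const index = {
  hash: "3f9a0c12", // illustrative: changes whenever any mention file changes
  targets: {
    "https://example.com/2022/01/some-post/": [
      "mentions/2024-06-01-abc123.json",
      "mentions/2024-07-15-def456.json"
    ]
  }
};
```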

A fetch script reads Morris’s index, fetches each mention, and writes the lot out to _data/webmentions.json; a second script reads the comments_issue front matter from every post and writes GitHub comments to _data/github_comments.json. A GitHub Actions workflow runs both at 4am daily and commits any updated data files. The webmention fetch checks the Morris index hash first and skips re-fetching if nothing has changed.
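The core of the webmention fetch script, sketched with the network and filesystem injected so it stays testable. The `_data/webmentions.json` destination and the hash short-circuit come from the description above; everything else (helper names, index fields) is illustrative:

```javascript
// Build-time fetch, sketched. fetchIndex/fetchMention stand in for HTTP calls
// to Morris; writeData stands in for writing _data/webmentions.json. The hash
// check skips all per-mention fetches when nothing has changed since last run.
async function buildWebmentionData({ fetchIndex, fetchMention, lastHash, writeData }) {
  const index = await fetchIndex();
  if (index.hash === lastHash) return { skipped: true }; // nothing new since last run
  const mentions = [];
  for (const [target, files] of Object.entries(index.targets)) {
    for (const file of files) {
      mentions.push({ target, ...(await fetchMention(file)) });
    }
  }
  writeData(mentions); // becomes _data/webmentions.json in the repo
  return { skipped: false, count: mentions.length };
}
```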

Fetching GitHub comments via Actions rather than client-side also means the requests go out authenticated with GITHUB_TOKEN, which has a rate limit of 1,000 requests per hour per repository. That’s more than enough for a daily cron job, and since the results are committed to the repo, visitors never touch the API at all unless new comments have arrived since the last build.
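The endpoint and headers below are the real GitHub REST API shape for listing issue comments; the owner/repo placeholders and the function itself are illustrative, not the actual script:

```javascript
// Authenticated comments request, sketched. The Authorization header carries
// GITHUB_TOKEN from the Actions run, which is what lifts the rate limit from
// 60/hour (unauthenticated) to 1,000/hour per repository.
function commentsRequest(issueNumber, token, owner = "OWNER", repo = "REPO") {
  return {
    url: `https://api.github.com/repos/${owner}/${repo}/issues/${issueNumber}/comments`,
    headers: {
      Accept: "application/vnd.github+json",
      Authorization: `Bearer ${token}`,
    },
  };
}
```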

With the data files in the repo, Jekyll renders the mentions at build time using standard Liquid templates. The discussion section is in the HTML before anything loads. The JS still runs to pick up anything that’s arrived since the last build, tracks which IDs were already rendered to avoid duplicates, and re-sorts the feed chronologically. The page passes the pre-rendered IDs to the JS as two Sets:

```javascript
const preRenderedWmIds = new Set([/* ids baked in at build time */]);
const preRenderedCommentIds = new Set([/* ids baked in at build time */]);
```
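The top-up logic then reduces to a filter against those sets plus a chronological sort, something like this sketch (function and field names are illustrative; the real code also appends the results to the DOM):

```javascript
// Client-side top-up, sketched: items whose ids were baked into the HTML at
// build time are skipped, and only genuinely new arrivals survive. The result
// is sorted by date before being merged into the feed.
function freshItems(fetched, preRenderedIds) {
  return fetched
    .filter((item) => !preRenderedIds.has(item.id))
    .sort((a, b) => new Date(a.published) - new Date(b.published));
}
```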

One stream instead of three

Previously the discussion section had separate blocks for comments, replies, and mentions. I merged them into a single feed sorted by date, mixing GitHub comments and webmentions together. Archived comments (old ones I’d manually added to post front matter) slot in at the top since they’re always the oldest.
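Merging the sources is just tagging each item and sorting on a shared date field. A sketch, assuming GitHub comments carry `created_at` and webmentions `published` (the tagging scheme here is illustrative):

```javascript
// One stream, sketched: comments and webmentions each get a source tag and a
// normalised date, then sort together chronologically. Archived comments fall
// to the top naturally because their dates are the oldest.
function buildFeed(comments, mentions) {
  const items = [
    ...comments.map((c) => ({ ...c, source: "github", date: c.created_at })),
    ...mentions.map((m) => ({ ...m, source: "webmention", date: m.published })),
  ];
  return items.sort((a, b) => new Date(a.date) - new Date(b.date));
}
```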

Items are colour-coded by source: GitHub comments get one accent and webmentions another, with further tints for Reddit, Mastodon, and Twitter. My own entries flip to a reversed bubble, like a sent message.

Likes, reposts, and bookmarks are gone from the stream entirely. They used to show as a grid of avatars, which was mostly noise, and a row of ten identical silhouettes doesn’t tell you much. The counts still show up in the interactions badge at the top of the post.

Interaction counts also appear on the post listing pages now. Since the data is baked into the build, the counts render statically alongside each post title with no JS involved.

A few other things

Webmention.io matches mentions against exact URLs, and I was losing some because people link with or without a trailing slash even though the canonical URL always has one. Mentions can also target the .txt version of each post (I added .txt support back in January). I added both variants to the set of URLs the JS uses when looking for matches, which recovered a few mentions that had been silently missing.
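The variant set amounts to something like this sketch (the exact shape of the .txt URL is my assumption here, taken as the slash-less path plus `.txt`):

```javascript
// URL variants checked when matching mentions to a post, sketched. The
// canonical form on this site always has a trailing slash; people link both
// ways, and the .txt version is a third possible target.
function targetVariants(canonicalUrl) {
  const noSlash = canonicalUrl.replace(/\/$/, "");
  return new Set([canonicalUrl, noSlash, noSlash + ".txt"]);
}
```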

I swapped the various hand-rolled inline SVGs scattered through the interactions markup for a Lucide icon sprite. One <svg> block in the page head, then <use href="#icon-name"> anywhere an icon is needed:

```html
<svg viewBox="0 0 24 24" width="16" height="16" fill="none"
     stroke="currentColor" stroke-width="2"
     stroke-linecap="round" stroke-linejoin="round" aria-hidden="true">
  <use href="#icon-message-circle"/>
</svg>
```

Easier to maintain and a bit less markup noise. The icons used across the interactions section are: comments, likes, reposts, bookmarks, and mentions.

Webmention.io’s own blocklist doesn’t catch everything. I found a cluster of scraper sites that had all syndicated the same “budget smart home” article and sent webmentions to an old post of mine. They were easy to spot since they all had the same URL path across different junk domains. I added a BLOCKED_DOMAINS list to the fetch script and purged them from the existing data.
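The `BLOCKED_DOMAINS` list exists in the real fetch script; the matching logic below is a sketch with made-up domains, filtering on the mention's source hostname:

```javascript
// Domain blocklist, sketched. Mentions whose source hostname (ignoring a www.
// prefix) appears in the list are dropped before anything is written to
// _data/webmentions.json. Domains here are placeholders.
const BLOCKED_DOMAINS = new Set(["junk-example-1.com", "junk-example-2.net"]);

function allowed(mention) {
  const host = new URL(mention.url).hostname.replace(/^www\./, "");
  return !BLOCKED_DOMAINS.has(host);
}
```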
