Added Webmention support to the blog

A discussion on Mastodon convinced me to take a look at the Webmention standard, and I even implemented a receiver for this blog. Essentially, this is a newer variant of the Pingback mechanism: when one blog links to another, the software behind the linking blog notifies the blog being linked to. For my blog, I implemented this as part of the commenting mechanism, and approved Webmentions will appear as comments with a slightly different presentation.

Given that this website is built via the Hugo static site generator, the commenting system is rather unusual. Comments received are added to the pre-moderation queue. Once approved, they are added to the blog’s GitHub repository and will be built along with the other content. Webmention requests are handled in the same way.

Security considerations

You might have heard about Pingback being misused for DDoS attacks. A naive Webmention implementation could be misused in the same way. The issue is documented and one mitigation is even listed in the standard itself: don’t initiate verification requests immediately, spread them out randomly. Pre-moderation does exactly that as a side-effect: when receiving a Webmention request, my server only performs some basic verification and saves the URL to the queue. Downloading this URL to verify that it actually contains a link to my site and to extract metadata only happens during review.
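To illustrate the idea, here is a minimal sketch of a receiver that only runs cheap checks and queues the request without contacting the source server. It is not the actual code of my comment server: the names `basic_check` and `receive`, the host `blog.example` and the in-memory list standing in for the moderation queue are all assumptions made for this example.

```python
from urllib.parse import urlsplit

OWN_HOST = "blog.example"  # hypothetical host name of the receiving blog


def basic_check(source, target, own_host=OWN_HOST):
    """Cheap validation that needs no network access."""
    src, tgt = urlsplit(source), urlsplit(target)
    # Only accept http(s) source URLs that name a host.
    if src.scheme not in ("http", "https") or not src.hostname:
        return False
    # The target has to be a page on this blog.
    if tgt.scheme not in ("http", "https") or tgt.hostname != own_host:
        return False
    # The Webmention spec requires rejecting identical source and target.
    return source != target


def receive(source, target, queue, own_host=OWN_HOST):
    """Return an HTTP status code; valid requests are queued, not fetched."""
    if not basic_check(source, target, own_host):
        return 400
    # No request to the source server happens here. Fetching the page is
    # deferred until a human reviews the queue.
    queue.append({"source": source, "target": target})
    return 202  # Accepted: the mention will be processed later
```

Because nothing is downloaded at this point, a flood of forged requests only fills the queue and never turns the server into a source of traffic against someone else.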

There are also some concerns when processing data from an untrusted server. I made sure to set a timeout so that this request doesn’t take too long, and I also won’t download more than 1 MB of data, which limits how much memory the download can use.
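A bounded download along these lines could look roughly like the sketch below, using only Python’s standard library. The 10 second timeout, the chunk size and the function names are assumptions for the sake of the example, and here an oversized response is simply truncated rather than rejected.

```python
import io
import time
import urllib.request

MAX_BYTES = 1024 * 1024  # the 1 MB cap
TIMEOUT = 10             # seconds; an assumed value


def read_limited(stream, max_bytes=MAX_BYTES, deadline=None, chunk_size=8192):
    """Read at most max_bytes from stream, giving up once deadline has passed."""
    chunks = []
    remaining = max_bytes
    while remaining > 0:
        if deadline is not None and time.monotonic() > deadline:
            raise TimeoutError("download took too long")
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            break  # end of the response
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def fetch(url, timeout=TIMEOUT):
    """Download url with a bound on both total time and size."""
    deadline = time.monotonic() + timeout
    # urlopen's timeout covers individual socket operations only, so a server
    # trickling data slowly could keep a connection open for a long time.
    # The deadline check in read_limited bounds the overall duration as well.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return read_limited(response, deadline=deadline)
```

Reading in chunks means that memory usage stays bounded even if the server lies about or omits the `Content-Length` header.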

Finally, spam is also a concern. Manual moderation should help here as well: I won’t approve Webmentions from link farms or misleading articles, of course. Still, automated spamming via a standardized interface like Webmention is easier than abusing blog-specific comment functionality. If this becomes an issue, I might disable this functionality as a last resort.

Metadata processing

A Webmention request only contains two pieces of data: the URL containing the link and the URL being linked to. When displaying this in the comments section, one would ideally have more data: an author, a title and maybe even a text excerpt. The standard doesn’t say how one is supposed to get those, but it does refer to the h-entry microformat.

I looked at three existing implementations to get some inspiration. Two tried to extract h-entry data from the page, but they weren’t terribly consistent. For example, they assumed that the first h-entry on the page is the relevant one.

For my implementation, I decided to look for the h-entry containing the link to my article. If I can find one, I will get author, title and URL from its metadata. I will also take its content and shorten it – this becomes the “comment” text.
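The following is a simplified sketch of that lookup, built on Python’s `html.parser` rather than a real microformats parser. It is not my actual implementation: the class and function names are made up for this example, the link is matched by exact `href` comparison (no resolving of relative URLs), and author extraction is left out because it requires parsing a nested h-card.

```python
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
PROPERTIES = {"p-name": "title", "e-content": "content"}


class EntryFinder(HTMLParser):
    """Locate the innermost h-entry containing a link to target."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.stack = []         # open elements: (tag, entry, owner, fields)
        self.open_entries = []  # h-entries currently open, innermost last
        self.match = None       # the h-entry that contains the link

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements never get a closing tag
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        # Properties on this element belong to the enclosing h-entry.
        owner = self.open_entries[-1] if self.open_entries else None
        fields = [PROPERTIES[c] for c in classes if c in PROPERTIES] if owner else []
        entry = None
        if "h-entry" in classes:
            entry = {"title": "", "content": ""}
            self.open_entries.append(entry)
        if (tag == "a" and attrs.get("href") == self.target
                and self.open_entries and self.match is None):
            self.match = self.open_entries[-1]
        self.stack.append((tag, entry, owner, fields))

    def handle_endtag(self, tag):
        if not any(item[0] == tag for item in self.stack):
            return  # stray closing tag
        # Pop until the matching element, closing unclosed children like <p>.
        while self.stack:
            open_tag, entry, _, _ = self.stack.pop()
            if entry is not None:
                self.open_entries.pop()
            if open_tag == tag:
                break

    def handle_data(self, data):
        for _, _, owner, fields in self.stack:
            for field in fields:
                owner[field] += data


def find_entry(html, target, max_length=200):
    """Return title and shortened text of the matching h-entry, or None."""
    finder = EntryFinder(target)
    finder.feed(html)
    finder.close()
    if finder.match is None:
        return None
    text = " ".join(finder.match["content"].split())
    if len(text) > max_length:
        text = text[:max_length].rstrip() + "…"
    return {"title": " ".join(finder.match["title"].split()), "excerpt": text}
```

On a page listing several posts, this picks the entry that actually links to the target rather than whichever h-entry happens to come first.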

As a fallback, if no relevant h-entry is found or if it doesn’t contain the necessary metadata, I’ll also process the document’s <meta> tags and similar information. The description field will be used as comment text if present.
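A fallback of this kind might be sketched as follows. Which tags to consult is a guess on the part of this example: it reads `<title>` along with the `description`, `author`, `og:title` and `og:description` meta fields, and the names `MetaCollector` and `fallback_metadata` are hypothetical.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta> name/property values and the <title> text."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Open Graph tags use "property", classic ones use "name".
            key = attrs.get("name") or attrs.get("property")
            content = attrs.get("content")
            if key and content:
                # Keep the first occurrence of each field.
                self.meta.setdefault(key.lower(), content.strip())

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def fallback_metadata(html):
    """Return title, author and comment text, each None if unavailable."""
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    meta = collector.meta
    return {
        "title": meta.get("og:title") or collector.title.strip() or None,
        "author": meta.get("author"),
        "text": meta.get("description") or meta.get("og:description"),
    }
```

Any field that comes back as `None` would then have to be filled in or left blank during moderation.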

The actual code

The change to the comment server increased the code size by around 160 lines, as much of the existing logic could be reused here. As to the actual website, I had to add the required link tag and adjust comment display slightly. This seems to work, but I suspect that further adjustments will become necessary once real Webmention requests start coming in. Not that I’m confident I will ever receive one, given that the standard isn’t widely adopted yet.

I haven’t implemented a sender yet, though maybe I will at some point. It would have to run when changes are deployed to the website, detecting new articles and notifying link targets.
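For reference, the notification itself is just a form-encoded POST with `source` and `target` fields. The sketch below only builds such a request and leaves out the harder parts, namely detecting new articles and discovering each target’s Webmention endpoint. The function name and all URLs are made up for illustration.

```python
import urllib.request
from urllib.parse import parse_qs, urlencode


def build_notification(endpoint, source, target):
    """Build (but don't send) the POST request notifying a Webmention endpoint."""
    body = urlencode({"source": source, "target": target}).encode("ascii")
    return urllib.request.Request(
        endpoint,
        data=body,  # providing data makes this a POST request
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


request = build_notification(
    "https://other.example/webmention",  # hypothetical endpoint
    "https://blog.example/new-article/",
    "https://other.example/some-post",
)
# Sending would be a matter of urllib.request.urlopen(request, timeout=10).
```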

Comments

  • Awesome, @WPalant@infosec.exchange is giving Webmentions a try and that’s totally not my fault 😁

    Hope this works out for you too 👍 It’s intriguing how your focus is on the security side of things. Most people I know use it simply for easier interactions and backfeeding. Let me know if you’re interested in a tour/demo 😉

    Wladimir Palant

    Oook, there comes the first issue already – I clearly have to restrict title length. Who would have guessed that some implementations will set a title but make it the same as the content.

  • superkuh mentioned this article in It's good to see webmention spreading.

    I subscribe to your RSS feed and saw you implemented webmention into your neat static comment system. I figured I'd give you a webmention reply so you can test how things work in the wild.

    I'm submitting it via a curl command. I initially tried to use HTTP but apparently you've disabled that with a 301 redirect to HTTPS. HTTPS only does block some older machines and means you need to get a third party's permission (the benign dictator cert authority) to host your website. Upsides and downsides... anyway,

    curl -A "superkuh flashing a laser pointer by hand down a fiber optic cable" https://palant.info/mention/submit -d "source=http://superkuh.com/welcome-to-webmentions-wladimir.html" -d "target=https://palant.info/2020/09/03/added-webmention-support-to-the-blog/"

    I review my webmention batches manually too. But since I knew it was going to be manual I figured why implement anything and instead I just log all POSTs to the webmention endpoint (with disk space and rate filtering via nginx config). It's easy to scan them manually by eye and sort out the Pingback and other POST spam. Some of it is interesting in a car-wreck sort of way too.