fragments in gemini

december 11, 2021

One of the oldest issues on the Gemini specification's gitlab is about the semantics of fragments in Gemtext (and whether fragments should have semantics at all). Fragments are probably the feature I find myself wishing for the most.

Gemini Specification gitlab: Semantics of fragments in gemtext

I believe allowing links to headers within a document using fragments would be massively useful. It is in line with the ethos of Gemini, since linking to part of a text document isn't any different from any other link to a text document, and it can be implemented extremely simply.

The biggest stopping point seems to be that no one has implemented it. When Gemini was initially being created and people were debating whether such and such should be in the spec, a few people like Sean Conner just blazed ahead and actually implemented the darn thing, and that did a lot more to refine the spec and discard the bad ideas than the endless debating ever did. So I'm mostly intending this post as a call to Gemini client implementers to just implement fragments! If it doesn't fit in properly and/or it's not included in the final spec, you can always remove it again.

Edit: also, according to § 1.2 of the Gemini spec, “[Fragments] are allowed and have no special meaning”, so implementing this is entirely in line with and compliant with the specification (as of v0.16.0).

In fact, I've already contributed an algorithm (see below) to Kineto and it's been working great so far on the web version of my site:

kineto: Generate anchor links for headings

sample link to the web version of my capsule with a fragment

One thing I would do is make fragment support entirely optional, and just say that if a client doesn't support fragments it must strip the fragment and load the page as normal.
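For a client that ignores fragments, the fallback could be as small as the sketch below. It uses Python's standard urllib.parse; the function name and the example URL are made up for illustration and are not part of any spec or existing client.

```python
from urllib.parse import urldefrag


def url_to_request(url: str) -> str:
    """Drop any fragment so the page is requested and loaded as normal.

    Hypothetical helper name, for illustration only.
    """
    return urldefrag(url).url


# Hypothetical capsule URL.
print(url_to_request("gemini://example.org/post.gmi#my-proposed-algorithm"))
```

A client that does support fragments would make the same request and keep the fragment around to scroll to the matching header after the page loads.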

my proposed algorithm

There should be a universally recognized way to generate fragments to ensure that they work across all clients. Here's a very simple algorithm I'm proposing (I believe Makeworld also mentioned something similar on the gitlab issue).

It's a modified version of the method used by gitlab and github for their markdown:

gitlab's page about header ids in markdown

You take a header, strip the leading #s, and apply a set of rules to transform it into a fragment:

All text is converted to lowercase (e.g. "HelloWorld" → "helloworld")
Any non-alphanumeric characters excluding hyphens and whitespace are removed (e.g. "hello,-world!" → "hello-world"). Whitespace is kept at this step so that the next rule can turn it into hyphens. "Alphanumeric characters" should include Unicode characters in general categories L and N (your programming language should take care of this for you).
All whitespace is replaced with hyphens (e.g. "hello world " → "hello-world-")
Consecutive hyphens are replaced with a single hyphen (e.g. "hello---world" → "hello-world")
The fragment should be percent-encoded in accordance with RFC 3986
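The rules above can be sketched in Python roughly as follows. This is a minimal illustration and not one of the implementations linked from this post. The function names are made up, and it rests on two assumptions: whitespace between the #s and the heading text is not part of the heading, and percent-encoding is applied to the UTF-8 bytes.

```python
import re
import unicodedata
from urllib.parse import quote


def text_to_fragment(text: str) -> str:
    """Apply the five rules to heading text whose leading #s are already gone."""
    kept = []
    for ch in text.lower():  # rule 1: lowercase
        if ch.isspace():
            kept.append("-")  # rule 3: whitespace becomes a hyphen
        elif ch == "-" or unicodedata.category(ch)[0] in "LN":
            kept.append(ch)  # rule 2: keep letters, numbers and hyphens
        # every other character is dropped
    collapsed = re.sub(r"-{2,}", "-", "".join(kept))  # rule 4: collapse hyphen runs
    return quote(collapsed, safe="")  # rule 5: percent-encode as UTF-8


def heading_to_fragment(line: str) -> str:
    """Assumption: the whitespace right after the #s is not part of the heading."""
    return text_to_fragment(line.lstrip("#").lstrip())


print(heading_to_fragment("## my proposed algorithm"))  # my-proposed-algorithm
print(text_to_fragment("hello,-world!"))                # hello-world
print(text_to_fragment("hello world "))                 # hello-world-
```

The function needs nothing but the heading itself, so it can be called on headings in any order.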

I deliberately left out appending a unique ID to the fragment so that the header->fragment converter stays stateless: headings don't have to be parsed linearly, and the converter doesn't have to keep a history of previously encountered fragments. This does mean that if you have duplicate headers, or headers that map to the same fragment (e.g. "hello world" and "hello-world"), you can't link to each of them individually. But requiring the fragment parser to be stateful is a *substantial* increase in complexity and doesn't seem in line with the rest of Gemtext, where "the only parser state is the preformatting toggle" is emphasized heavily.

I'd recommend saying that when searching for a fragment, the client should stop on the first matching header it sees and ignore all later ones, even if they match. Sourcehut actually has that limitation and behavior for its markdown fragments, and it's never a real hardship since you usually have unique headings anyway. When you don't have unique headings, it's usually because they're not significant enough that you'd want to link to them (see the date subheading below the title on this post).
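A client-side lookup with that first-match behavior might look like the sketch below. It is self-contained, so it carries a condensed version of the header->fragment rules from this post. All names and the example URL are hypothetical, and comparing fragments after percent-decoding is my own choice to avoid mismatches in hex-digit case.

```python
import re
import unicodedata
from typing import Optional
from urllib.parse import quote, unquote, urlsplit


def to_fragment(text: str) -> str:
    """Condensed version of the rules: lowercase, filter, hyphenate, collapse, encode."""
    kept = []
    for ch in text.lower():
        if ch.isspace():
            kept.append("-")
        elif ch == "-" or unicodedata.category(ch)[0] in "LN":
            kept.append(ch)
    return quote(re.sub(r"-{2,}", "-", "".join(kept)), safe="")


def find_heading_line(gemtext: str, url: str) -> Optional[int]:
    """Return the 0-based index of the first heading matching the URL's fragment."""
    wanted = unquote(urlsplit(url).fragment)
    if not wanted:
        return None
    preformatted = False
    for index, line in enumerate(gemtext.splitlines()):
        if line.startswith("```"):
            preformatted = not preformatted  # the only parser state
            continue
        if preformatted or not line.startswith("#"):
            continue
        heading = line.lstrip("#").lstrip()
        if unquote(to_fragment(heading)) == wanted:
            return index  # stop at the first match, ignore later ones
    return None


page = "# title\n\n## hello world\ntext\n```\n# not a heading\n```\n## Hello-World\n"
print(find_heading_line(page, "gemini://example.org/post.gmi#hello-world"))  # 2
```

Both "hello world" and "Hello-World" map to the same fragment here, and only the first is reachable, which is exactly the limitation described above.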

Pros:

Very simple implementation
Not that fragile: headers can be rearranged without breaking existing fragments

Cons:

Can't link to a duplicate or collided header further down in the document
Lots of potential collisions: any number of sequential hyphens and whitespace all map to the same thing, and case is insignificant

The algorithm is so simple I actually just threw together a few implementations:

The algorithm in ISO C99 (not Unicode-aware)

The algorithm in R7RS Scheme

The algorithm in Go

Here's the algorithm using sed and tr; it works with any POSIX 2001 compatible sed and tr:

sed 's/[[:space:]]/-/g; s/[^[:alnum:]-]//g; s/--*/-/g' | tr '[:upper:]' '[:lower:]'

(note that it will sometimes produce incorrect fragments if your sed implementation is not Unicode-aware)