If you're in the business of building things that run on computers long enough, I think you will eventually acquire a favorite bug story. This is a short story about mine. I've also built an interactive tool where you can explore the concepts underpinning the heart of this bug.
The bug: two emoji enter, none leave
I was working on migrating a legacy editor to a more collaborative experience with my team. TipTap on top (itself a wrapper around ProseMirror), Yjs underneath handling the CRDT magic for real-time syncing. It worked well! Mostly.
In our alpha/early release days, when it was still mostly internal and/or early rollout users, sometimes the editor would just stop saving your content. Silently. You'd keep typing and everything looked fine, but your edits stopped syncing to the Yjs document. The next time you opened the page, everything you'd written after the failure point was gone.
It was utterly terrifying, very rare and almost impossible to diagnose because we could never recreate it. We really tried! My early suspicions generally revolved around shaky wifi connections and wonky websocket behaviors, but no amount of throttling or turning my wifi on and off seemed to recreate the issue. The experience was surprisingly resilient in those scenarios, in my memory. It felt like it happened randomly, never when anyone was looking. No obvious errors picked up in the console, no stack trace, no crash. Just... "Hey, I think my changes didn't save."
Then one day our product manager cracked it. This was not a trivial thing to find. He'd been experiencing it more than anyone else (probably because he was the best at dogfooding our product) and had been methodically narrowing it down.
"I feel like I'm going crazy, but I think it's when I type specific characters together, go back and insert a character between them..."
He'd been using 🟢 and 🔴 in his weekly project status emails to communicate general health. Green for on-track, red for at-risk. Every week the template he was using had both characters already present and he would simply remove the one he didn't need (generally the red one, I am happy to say!).
On this occasion he'd copied the green circle and pasted it in front of the red one at some point, or maybe vice versa. That specific operation, inserting one multi-byte emoji adjacent to another, was triggering a splice in the underlying CRDT library, which split a surrogate pair down the middle.
I remember being on the call when he showed this to me and one of my direct reports who'd been toiling away at the collaborative editing transition. I must've gotten a little too excited (I live for esoteric bugs). "I feel like you got energized by this," he said. He wasn't wrong.
Adding to the fun, not every emoji triggered it. Only the ones above U+FFFF that required surrogate pairs. And not all edits resulted in the problem either, only the ones that caused a splice at exactly the wrong byte offset. It was a wild one to debug before we knew what was going on.
Code units, code points, and grapheme clusters
So what was going on? What does "ones above U+FFFF" in that last paragraph even mean? What byte offsets?
To understand this bug we need to introduce three pieces of vocabulary:
Code Units → Code Points → Grapheme Clusters
Code units are the raw 16-bit values that JavaScript uses to store strings internally (UTF-16). This is what .length counts. This is what .slice() and .charCodeAt() operate on as well. JavaScript operates at the code unit level by default.
Code points are what Unicode actually defines as a single character. A code point like U+1F920 (🤠) is one character in Unicode's view, but it's too big to fit in a single 16-bit code unit. So UTF-16 splits it into two code units called a surrogate pair: a high surrogate and a low surrogate. Simple ASCII characters and a lot of common symbols fit in one code unit, so the distinction doesn't matter for them. Emoji, though? Almost always two.
Grapheme clusters are what a human perceives as "one character." The female astronaut 👩‍🚀 looks like one character but is actually three code points glued together: 👩 (woman) + a zero-width joiner + 🚀 (rocket). Five code units, three code points, one grapheme. The deceptively simple 👨‍👨‍👧‍👧 (Family: Man, Man, Girl, Girl) emoji is an impressive eleven! The enigmatic β is 1.
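To make the surrogate pair idea concrete, here is a small sketch of the arithmetic UTF-16 uses to split a code point above U+FFFF into two code units. The function name toSurrogatePair is my own, chosen for illustration.

```javascript
// Split a code point above U+FFFF into its UTF-16 surrogate pair.
// toSurrogatePair is a hypothetical helper name, not a built-in.
function toSurrogatePair(codePoint) {
  if (codePoint <= 0xffff || codePoint > 0x10ffff) {
    throw new RangeError("expected a code point in U+10000..U+10FFFF");
  }
  const offset = codePoint - 0x10000; // leaves a 20-bit value
  const high = 0xd800 + (offset >> 10); // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1f920); // the cowboy, U+1F920
console.log(high.toString(16), low.toString(16)); // d83e dd20
console.log(String.fromCharCode(high, low) === "\u{1F920}"); // true
```

The two halves only mean something together, which is why cutting between them leaves two invalid fragments.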
Here's how those numbers diverge:
| | Code units | Code points | Graphemes |
|---|---|---|---|
| A | 1 | 1 | 1 |
| 🤠 | 2 | 1 | 1 |
| 👩‍🚀 | 5 | 3 | 1 |
| 👨‍👨‍👧‍👧 | 11 | 7 | 1 |
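If you'd like to reproduce the table yourself, a minimal sketch along these lines should work in any runtime that ships Intl.Segmenter. The helper name countAll is an assumption of mine, and the emoji are written as escapes so the snippet survives copy and paste.

```javascript
// Count the three "lengths" of a string. countAll is a hypothetical helper.
function countAll(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return {
    codeUnits: text.length, // UTF-16 code units
    codePoints: [...text].length, // string iteration walks code points
    graphemes: [...segmenter.segment(text)].length, // user-perceived characters
  };
}

const cowboy = "\u{1F920}";
const astronaut = "\u{1F469}\u200D\u{1F680}"; // woman + ZWJ + rocket
const family = "\u{1F468}\u200D\u{1F468}\u200D\u{1F467}\u200D\u{1F467}";

console.log(countAll("A")); // { codeUnits: 1, codePoints: 1, graphemes: 1 }
console.log(countAll(cowboy)); // { codeUnits: 2, codePoints: 1, graphemes: 1 }
console.log(countAll(astronaut)); // { codeUnits: 5, codePoints: 3, graphemes: 1 }
console.log(countAll(family)); // { codeUnits: 11, codePoints: 7, graphemes: 1 }
```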
I will pause to once again plug the interactive surrogate explorer I alluded to at the top. You can type any emoji and see this breakdown yourself!
How .slice() breaks things
The cowboy π€ is one code point stored as two code units (a surrogate pair). If you slice between them:
"π€ ".slice(0, 1); // β '\uD83E' (lone high surrogate)
"π€ ".slice(1, 2); // β '\uDD20' (lone low surrogate)
Those fragments aren't valid characters. They're half a pair with no partner. On their own they render as replacement characters (οΏ½) or get silently swallowed. But the real problem comes when you try to encode one:
encodeURIComponent("π€ ".slice(0, 1));
// URIError: URI malformed
That's what was crashing our tool.
What was actually happening
Yjs depends on a utility library called lib0. The lib0 splice method used JavaScript's .slice() internally. When a CRDT operation happened to land between the two halves of an emoji's surrogate pair, lib0 would produce a string with an orphaned surrogate. That string would eventually get passed to encodeURIComponent during sync, which threw an uncaught URIError.
The error was uncaught. Nothing in the Yjs or TipTap error handling caught it. So sync just... stopped. The editor kept working locally, giving you every indication that things were fine, while your changes silently went nowhere.
It only showed up on pathological edits: replacing one emoji with another, or inserting a character right between two emoji.
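The failure mode can be reproduced in a few lines. This is a contrived sketch, not lib0's or Yjs's actual code: splitAt, encodesCleanly, and findSafeSplitOffsets are hypothetical names, and the model simply assumes a string gets cut in two with .slice() and each half is later URI-encoded.

```javascript
// Cut a string at a code unit offset, the way a .slice()-based splice would.
function splitAt(text, offset) {
  return [text.slice(0, offset), text.slice(offset)];
}

// True when encodeURIComponent accepts the string, false on a URIError.
function encodesCleanly(text) {
  try {
    encodeURIComponent(text);
    return true;
  } catch (err) {
    if (err instanceof URIError) return false;
    throw err;
  }
}

// List every offset where both halves can still be encoded.
function findSafeSplitOffsets(text) {
  const safe = [];
  for (let offset = 0; offset <= text.length; offset++) {
    const [left, right] = splitAt(text, offset);
    if (encodesCleanly(left) && encodesCleanly(right)) safe.push(offset);
  }
  return safe;
}

const statusLine = "\u{1F7E2}\u{1F534}"; // green circle + red circle, 4 code units
console.log(findSafeSplitOffsets(statusLine)); // [ 0, 2, 4 ]
```

Offsets 1 and 3 fall inside a surrogate pair, so half of the possible cut points in that two-emoji string are landmines.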
The hack we shipped
We couldn't fix lib0 (though I'm happy to report it did eventually get fixed!). We couldn't patch Yjs. We needed to ship something.
So we did two things:
- Although we didn't initially care about offline support for our product, adding it was pretty trivial. Our thinking was it could save us in a future situation should the user get disconnected and keep typing. We would continue to update the CRDT locally, and the next time they came back to the document their changes would be updated and merged with the current state of things. This was a hedge and leaned into what CRDTs are actually good at and designed for.
- An embarrassingly nuclear option (my call, with my fingerprints all over it): we attached a global window.addEventListener("error", ...) listener that regex-matched for URIError: URI malformed. When it caught one, it logged the event for tracking and set a piece of state that our editor would check. If we saw the error, we'd throw up a modal telling the user something went wrong and asking them to reload the page. I watched this metric like a hawk and was relieved by how rare it ended up being.
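For the curious, the global listener looked roughly like the sketch below. This is a reconstruction under assumptions, not the code we shipped: syncState, isMalformedUriError, and handleGlobalError are hypothetical names, and the logging and modal are reduced to a single flag. The message text is also engine-specific, so a regex like this one only matches engines that word the error this way.

```javascript
// Matches the error text quoted in this post; other engines word it differently.
const MALFORMED_URI = /URIError: URI malformed/;

// Hypothetical piece of state the editor would poll before showing a modal.
const syncState = { broken: false };

function isMalformedUriError(message) {
  return MALFORMED_URI.test(String(message));
}

function handleGlobalError(event) {
  if (isMalformedUriError(event.message)) {
    syncState.broken = true; // the real handler also logged the event
  }
}

// Only attach in a browser-like environment.
if (typeof window !== "undefined") {
  window.addEventListener("error", handleGlobalError);
}
```

It is a blunt instrument: it reacts to any matching error on the page, not only ones that came from the sync layer.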
We weren't the only ones. The upstream issues (yjs#303, tiptap#3020) had other editors reporting the same problem with similar workarounds.
The real fix
Two things eventually fixed it for real:
lib0 got patched. The upstream fix was to detect if the first character of a sliced string was a high surrogate without a matching low surrogate, and replace it with U+FFFD (the Unicode replacement character, οΏ½). Not perfect, but it stopped the URIError from happening and prevented sync from dying.
We made emoji an atomic node type. In ProseMirror (and by extension TipTap), you can define custom node types. We setup an extension that made emoji their own node, which meant the editor treated each one as an indivisible unit. Cursor movements and editing operations couldn't split an emoji in half. This didn't fix the lib0 bug, and there were some other side-effects here that were challenging, but it eliminated most of the editing patterns that triggered it.
I'm happy to report that the bug popped up very rarely during this hacky interim phase... but I was pretty happy when the patched version of lib0 finally landed.
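The replacement-character idea is easy to sketch. What follows is a more general variant than the upstream patch as I described it, and it is not lib0's code: it swaps every unpaired surrogate, anywhere in the string, for U+FFFD. The name replaceLoneSurrogates is hypothetical. Newer runtimes also offer String.prototype.toWellFormed(), which does the same job.

```javascript
// Replace every unpaired surrogate with U+FFFD so the result is always encodable.
function replaceLoneSurrogates(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    const isHigh = unit >= 0xd800 && unit <= 0xdbff;
    const isLow = unit >= 0xdc00 && unit <= 0xdfff;
    if (isHigh) {
      const next = text.charCodeAt(i + 1); // NaN past the end of the string
      if (next >= 0xdc00 && next <= 0xdfff) {
        out += text[i] + text[i + 1]; // valid pair, keep both halves
        i++;
      } else {
        out += "\uFFFD"; // high surrogate with no partner
      }
    } else if (isLow) {
      out += "\uFFFD"; // low surrogate that no high surrogate claimed
    } else {
      out += text[i];
    }
  }
  return out;
}

const half = "\u{1F920}".slice(0, 1); // lone high surrogate
console.log(encodeURIComponent(replaceLoneSurrogates(half))); // %EF%BF%BD
```

You lose the damaged character, but the string can be encoded again and sync keeps going.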
The modern answer
If you're doing string manipulation in JavaScript and you care about not corrupting characters, use Intl.Segmenter:
const seg = new Intl.Segmenter(undefined, { granularity: "grapheme" });
const segments = [...seg.segment("👩‍🚀A🚀")].map((s) => s.segment);
// → ['👩‍🚀', 'A', '🚀']
This splits by grapheme clusters rather than code units. No orphaned surrogates, no split emoji. It's what .slice() should have been doing all along, but of course UTF-16 predates emoji by decades.
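Building on that, here is a minimal sketch of a grapheme-aware stand-in for .slice(). The name sliceGraphemes is mine, and the indices count graphemes rather than code units.

```javascript
// Slice by user-perceived characters instead of UTF-16 code units.
// sliceGraphemes is a hypothetical helper, not a built-in.
function sliceGraphemes(text, start, end) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  const graphemes = Array.from(segmenter.segment(text), (s) => s.segment);
  return graphemes.slice(start, end).join("");
}

const sample = "\u{1F469}\u200D\u{1F680}A\u{1F920}"; // astronaut, A, cowboy

console.log(sample.slice(0, 1) === "\uD83D"); // true: half of the woman emoji
console.log(sliceGraphemes(sample, 0, 1) === "\u{1F469}\u200D\u{1F680}"); // true: whole astronaut
console.log(sliceGraphemes(sample, 1) === "A\u{1F920}"); // true
```

Segmenting the whole string on every call is wasteful for long text, so a real implementation would probably stop iterating once it reaches the end index.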
Infamy
After we shipped the fix I wrote about it in my internal newsletter.

The bug became a bit of an inside joke. Coworkers would ping me with 🟢🔴, the emoji combo that broke everything.
Years later, I still get memes and messages from former coworkers out of the blue about this. Some bugs you fix and some bugs fix... you?

Hard to unsee Unicode problems
Once you know about it, you start seeing it in the wild. Any code that does str.slice(0, 1) or str[0] to get "the first character" is potentially broken. The most common offender: tools that generate initials from a user's name. Try putting an emoji as the first character of your first or last name in any app that displays your avatar as initials. Most of them will do something like firstName[0] + lastName[0] and end up with half a surrogate pair. Some render garbage. Some crash.
It's the same class of bug every time. JavaScript gives you code units when you wanted characters, and nobody notices until someone types something outside the Basic Multilingual Plane.
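For the initials case specifically, a safer version might look like the sketch below. firstGrapheme and initials are hypothetical names, and this assumes Intl.Segmenter is available in your runtime.

```javascript
// Return the first user-perceived character of a string, or "" if there is none.
function firstGrapheme(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  for (const { segment } of segmenter.segment(text.trim())) {
    return segment; // stop at the first grapheme
  }
  return "";
}

function initials(firstName, lastName) {
  return firstGrapheme(firstName) + firstGrapheme(lastName);
}

const first = "\u{1F920}Sam"; // a name that starts with the cowboy emoji
console.log(first[0] === "\uD83E"); // true: the naive approach grabs half a pair
console.log(initials(first, "Lee") === "\u{1F920}L"); // true: whole emoji plus "L"
```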
I repeat the truth I hold dearest: it is remarkable anything works at all.
Parting links
Monica Dinculescu has a great post on how emoji work under the hood if you want to go deeper. I highly recommend it!
And I'll end with one more plug for my interactive surrogate pair explorer where you can type any emoji and see this breakdown yourself, in case you missed the link up above! I think it's a nice way to visually see and interact with the concepts discussed here.
Published on Thursday, May 14th 2026.



