Back Original

Writing my own text editor, and daily-driving it

A programmer's text editor is their castle


Toy software performing a task when the stars align and the bytes hold their breath is one thing. Ingesting whatever freakish data the real world has to offer and handling it gracefully is another.

For a while I’ve been dissatisfied with my text editor. I settled on Howl about a decade ago; it’s lightweight and efficient to use, but it falls down in a number of areas:

For the past few years I’ve been shopping around for alternatives. Here follows an inexhaustive list of editors I’ve tried:

They’ve all got their own strengths, but none of them possess the fingerspitzengefühl I’m looking for. I stuck with Helix the longest, but after a month of use I fell out of love with it. I don’t have any specific criticism of it, it’s a very good editor: it just didn’t possess that ineffable quality that I care about.

And so, for the past 2 years, I’ve been working on my own. Here are a few reflections on different aspects of its development: feel free to skip to whichever section personally interests you the most.

Starting out

To begin with, I kept my scope small. Here are some things I kept off my feature list:

Progress was slow at first. It’s really quite demotivating to stare at a blank screen while you’re hacking together basic terminal rendering and I’d often go for weeks without working on it at all.

As it happens, this was my second shake at writing a text editor and I elected to build a simplistic if reasonably composable TUI framework that simplified event and state logic. In hindsight, much of this early effort was overkill and I ended up incrementally tearing it out as time went on in favour of a more literal, fine-grained approach.

Eating the dogfood

At some point in time I finally reached a critical threshold: my editor was just featureful enough to open a single text file, perform some simple work on it, then save the changes. I decided to start practicing three things that, in hindsight, have been critical to the project not becoming another dead entry in my ~/projects directory:

  1. My editor replaced nano. Any time I needed to edit a system file or quickly make notes, I’d force myself to use it - no matter how painful.

  2. Every time I hit a missing feature, bug, weird behaviour, or limitation I would document it in the project README.md, no matter how small. Knowing what to hack on in my free time to make progress was essential.

  3. If any issue bothered to me to the point of annoyance, I had to fix it: right there and then.

In combination, these practices pushed my work on the project from an hour a month to several hours a week. Almost all of the ~10k lines of code have appeared in the past 6 months alone.

A filled line graph from March 2024 to March 2026. Almost nothing occurs until June 2025, at which point the graph climbs rapidly

Cursor manipulation

Cursor manipulation is difficult! When you’re using a text input widget, much of the behaviour you expect as table-stakes isn’t something you’re even conscious of. Exactly what happens when you hold a keybinding like ctrl + shift + left is probably muscle memory but the logic required to getting it all playing together nicely is not fun to write.

The best advice I can give is to try implementing high-level inputs in terms of more primitive ones. Want to implement word-wise backspace? Implement it in terms of word-wise cursor movement, selection of the range between the start and end position, and then deletion. When you get round to implementing undo/redo, you’ll have to ensure that these 3 actions get grouped together as one to avoid an undo producing really unintuitive results. It’s not surprising that modal editors simply skip the middleman and just expose these primitive operations directly to the user, allowing them to chain them together.

File browser

There was one feature that kept me coming back to Howl, and it’s one that it does shockingly well: its file browser.

Howl’s file browser isn’t much to look at, but using it is joyous. It has an instantly-updating fuzzy filter for files, and the filter is good; it’s very common for it to get a positive hit on the file I’m looking for on the first keystroke or two, and exceedingly rare that it doesn’t get it by the fourth or fifth. If a file doesn’t exist and you want to create it, you can do so inline without needing to switch to another menu. Typing ~/ will automatically switch you to your home directory, no matter how deep you are in another path. The main editing window displays a preview of the file you’re about to open.

Howl’s file browser does so many things right and it confuses me deeply that so many other editors elect to have such lackluster solutions to the problem of opening files. A great number of editors in the list either force me to reach for the mouse, pull me out of the editing experience by showing me GTK’s default opener dialogue, or ask me to guess the name of the file I want to open instead of showing me which options are available.

Reimplementing it - with my own personalised quirks - wasn’t a particularly complex task. For the file filter, I briefly considered doing something complex like using the Levenshtein distance between the filter and candidate entries, but a valuable lesson was realising that all of this effort is pointless overkill because in practice only three things are required to provide excellent predictive search for a closed set of items:

  1. Whether the entry starts with the filter phrase

  2. Whether the entry contains the filter phrase

  3. The time of the entry’s most recent modification/access

Rank your items by these criteria. Allow case-insensitive matches, but rank the entry a little higher if the case matches.

That’s it! Even when performing whole-project file searches in projects with tens of thousands of files, these criteria place the file I’m looking for within 2 places of the top of the list after only 2 keystrokes in approximately 95% of cases.

Regex

Regex support is used in a number of ways:

All 3 require a reasonably performant implementation, but for the first two it is critical. I briefly considered integrating a pre-built solution like the regex-automata crate but one of my requirements is that context-sensitive edge cases like Rust-style raw string syntax is handled correctly by the highlighter, something that vanilla regex syntax can’t handle.

On top of that, the whole project is an exercise in building and understanding my own stack, so I elected to implement my own regex engine, complete with extensions to support context sensitivity and arbitrary nesting of patterns for convenience.

My first attempts were, uh, not quick. I wrote a parser for the regex syntax using my parsing crate, chumsky, and walked the resulting AST for every character in the input to discover matches.

Over time I built several optimisations on top:

  1. A single-pass optimiser that walks the AST, commuting common patterns that I’d keep see appearing again and again in flamegraphs. For example, a group of character matches will be optimised into a single String node which searches for an exact string, without any indirection.

  2. Walk the AST in an attempt to discover a common prefix shared by all matches. For example, hel[(lo)p] matches both hello and help, but both cases are always prefixed by hel - so only perform matching on locations that start accordingly. This ended up being a huge win for project-wide search.

  3. Reimplement the AST walker into a simple threaded code VM implemented via Rust dynamic calls. You can find out more about this technique here.

  4. Convert the threaded code VM to CPS form, where each VM instruction tail-calls its successor, allowing the compiler to optimise each one into a tail call.

  5. Wrap Rust’s (slow) dynamic function calls in a manner that avoids needing to perform a vtable lookup on call. You can read more about this technique here. After doing this, codegen for many regex instructions was much improved, often no more than a few machine instructions each.

  6. Implement as many of the regex instructions in terms of bytes rather than unicode codepoints as possible. One of the most brilliant things about the design of UTF-8 is that many of the techniques that make manipulating ASCII strings fast still work just fine when the string contains multi-byte codepoints!

I did make an attempt at compiling regex to a chain of jump LUTs, but I struggled to justify the added complexity after benchmarking it and finding it to only be about 20-30% faster than the threaded code approach even in the best cases while severely hurting the flexibility of the implementation.

As a result of this work I found I was able to fully highlight the largest Rust (my editor’s most complex highlighting lang) sample file I had (50,000 lines of autogenerated bindings) from a clean state in less than 10 milliseconds: faster than my screen can refresh itself. I’m sure I could push beyond this with some more work, but I know only too well just how deep the optimisation rabbit hole goes and have no desire to turn this into an exercise in regex engine design.

Highlighting

My initial approach was to rehighlight the file on every change. This does the job, but performance degredation becomes visible on larger files.

To improve things, I wrote a cache that highlights tokens on-demand in roughly equally sized chunks (very large tokens can sometimes cause chunks to be larger). When a change (‘damage’) occurs to the buffer, all chunks that overlap or come after the damage location are invalidated.

It turns out that this approach is very fast in practice. The most pessimistic case is that you’re editing text right in the middle of a very large file. In this case, you get to keep all of the highlighting state that comes before the damage area, and the editor simply doesn’t bother to highlight anything that comes much after the bottom of the screen: because highlighting information is never requested for it! Because the approach is demand-driven, it works even when multiple panes are focussed on different parts of the same buffer.

There’s nothing much to say here: when searching a project, I:

  1. Walk back from the current directory until I find a .git/ directory, indicating the project root

  2. Recursively walk all directories in the project root, matching the search needle regex pattern against file contents

  3. Extract a file snippet from each positive match and syntax-highlight it for the results preview

  4. Rank the results according to their traversal distance from the current path (nearby files are more likely to be relevant)

There are some basic filtering rules to avoid traversing things like build directories.

One slightly interesting detail is that this process is multi-threaded, and work is allocated between threads using a basic work-stealing approach. An interesting problem I faced was determining when all threads had stopped generating new work (every thread is both a consumer and a producer of work, which is unusual for work-stealing). I settled on a design where threads waiting for work would enter a critical section in which they’d increment an atomic counter and sleep for a short time in a loop. If the atomic counter ever reaches a value equal to the number of worker threads and the work queue is empty, that means that all workers have finished producing work and all of them can stop.

In practice I’ve found that the regex optimisations I mentioned above, combined with the speed of a modern SSD, mean that full project searches for trivial patterns resolve almost instantly, even on fairly large codebases like Veloren. Flamegraphs tend to show that I’m at least mostly IO-bound, although I’m sure there are cleverer approaches to batching file access that would improve things further.

I can’t express how much of a win this is for my productivity: being able to search a large codebase for answers to a question, within the editor, at the speed of thought and as those questions pop into my head, is a lovely feeling and it’s something I’d not previously experienced with any editor I actually enjoyed using.

Terminal emulator buffers

My editor is pane-based and supports many buffers open next to one-another. It quickly became clear that the convenience of having a pane be a terminal window was enormous, as opposed to relying on my terminal emulator to provide this functionality (and hence, using a different set of keybindings to manage both).

I briefly looked into implementing an ANSI parser by hand, but supporting newer terminal rendering features like OSC52, Kitty keyboard extensions, etc. quickly become an overwhelming (and, more to the point, not especially interesting) mountain that I had no desire to climb, so I opted to built on top of the alacritty_terminal crate, which implements the core escape sequence parser and terminal state management logic for the Alacritty terminal emulator project. As a result, implementing this feature proved to be fairly trivial, although I don’t have much to say about it owing to my use of third-party libraries.

My text editor is now perfectly usable as a replacement for the core functionality of screen/tmux, with richer escape sequence support to boot, so that’s nice.

Rendering

My editor is TUI-based, but just because it’s using a terminal doesn’t make it automatically fast! I still care about bandwidth when using the editor remotely over a mobile connection, and scrolling through a large file can still present a significant problem.

To minimise this, my editor has a double-buffered internal copy of the terminal screen. When a redraw occurs it compares the new frame to the previous one and only emits ANSI escape sequences for cells that have been damaged, and only emits things like cursor movement, style mode changes, etc. sequences when it actually needs to do so.

The result is that on the vast majority of terminal emulators (excluding perhaps Ghostty) it’s actually faster to open my editor into a terminal pane, cat a large file, then close my editor than it is to cat the file in the host terminal since my editor shields the host terminal emulator from the cost of processing all of those extra stdout bytes (thanks, alacritty_terminal!).

Conclusion

Common knowledge (and perhaps common sense) would have you believe that writing your own editor / tools is an exercise in pointless pain. After giving it a go, I firmly disagree: for the motivated engineer, there are a lot of advantages:

So: go make your own tools! It needn’t be a text editor. And, for god’s sake, enjoy the challenge and resist the urge to push the difficult bits off to a box of statistics. There is joy in struggle.

A screenshot of the editor in action, working on this very blog post