
Fast concordances in Go


fast-concordance is a Go program that creates instant concordances from a corpus of over 1,200 classic books, courtesy of Standard Ebooks. It's live at https://iafisher.com/concordance/, and you can read the source code at https://github.com/iafisher/fast-concordance.

A concordance shows every instance of a word in a corpus along with its immediate context. It's a useful tool for linguistic and literary research. fast-concordance uses a few tricks to return results as quickly as possible – in most cases, instantaneously.

The core algorithm

It's just string search. fast-concordance finds each occurrence of a keyword and filters out those that are part of a larger word.
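A minimal sketch of that idea in Go (the helper names here are my own, not the actual code from the repository):

    package concordance

    import (
        "strings"
        "unicode"
        "unicode/utf8"
    )

    // findWholeWord returns the byte offset of every occurrence of keyword in
    // text that is not embedded inside a larger word.
    func findWholeWord(text, keyword string) []int {
        var offsets []int
        start := 0
        for {
            i := strings.Index(text[start:], keyword)
            if i < 0 {
                return offsets
            }
            pos := start + i
            if isWholeWord(text, pos, pos+len(keyword)) {
                offsets = append(offsets, pos)
            }
            start = pos + len(keyword)
        }
    }

    // isWholeWord reports whether the match at [pos, end) stands on its own:
    // the characters immediately before and after it must not be letters.
    func isWholeWord(text string, pos, end int) bool {
        if pos > 0 {
            if r, _ := utf8.DecodeLastRuneInString(text[:pos]); unicode.IsLetter(r) {
                return false
            }
        }
        if end < len(text) {
            if r, _ := utf8.DecodeRuneInString(text[end:]); unicode.IsLetter(r) {
                return false
            }
        }
        return true
    }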

Getting the context before and after the word is slightly tricky because Go represents Unicode strings as a flat array of bytes, so slicing text[index - 40:index] might land you in the middle of a multi-byte character at index - 40. A small subroutine detects this case and backtracks to the beginning of the character.
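The check can be as simple as stepping backwards until the byte could begin a UTF-8 sequence; a sketch, using the standard library's utf8.RuneStart for that test:

    package concordance

    import "unicode/utf8"

    // snapToRuneStart moves index backwards until it points at the first byte
    // of a UTF-8 character, so that slicing from it never splits a character.
    func snapToRuneStart(text string, index int) int {
        for index > 0 && index < len(text) && !utf8.RuneStart(text[index]) {
            index--
        }
        return index
    }

    // contextBefore returns up to width bytes of context preceding the match,
    // adjusted so that the slice starts on a character boundary.
    func contextBefore(text string, matchStart, width int) string {
        start := matchStart - width
        if start < 0 {
            start = 0
        }
        return text[snapToRuneStart(text, start):matchStart]
    }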

Trick #1: Pre-load into memory

The server reads all the documents into memory at start-up. The corpus occupies about 600 MB, so this is reasonable, though it pushes the limits of what a cloud server with 1 GB of RAM can handle. With 2 GB, it's no problem.
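Loading the corpus is just a directory walk at start-up. A sketch, assuming the corpus is a flat directory of plain-text files (the Document type is hypothetical, not the project's actual representation):

    package concordance

    import (
        "os"
        "path/filepath"
    )

    // Document is one book from the corpus, held entirely in memory.
    type Document struct {
        Name string
        Text string
    }

    // loadCorpus reads every file in dir into memory once, at start-up, so
    // that queries never touch the disk afterwards.
    func loadCorpus(dir string) ([]Document, error) {
        entries, err := os.ReadDir(dir)
        if err != nil {
            return nil, err
        }
        docs := make([]Document, 0, len(entries))
        for _, entry := range entries {
            if entry.IsDir() {
                continue
            }
            data, err := os.ReadFile(filepath.Join(dir, entry.Name()))
            if err != nil {
                return nil, err
            }
            docs = append(docs, Document{Name: entry.Name(), Text: string(data)})
        }
        return docs, nil
    }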

This is the simplest but also the biggest single optimization: queries that take 2,000 milliseconds from disk can be done in 800 milliseconds from memory. That's still too slow, though, which is why fast-concordance uses…

Trick #2: Parallel search with goroutines

Since fast-concordance does not read from disk after start-up, it is purely CPU-bound and a great candidate for parallel execution.

Upon each request, the server spawns a goroutine (a lightweight thread in Go's runtime) for each text in the corpus. The goroutines write to a single, buffered output channel that the HTTP handler reads from.
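A sketch of that fan-out, reusing the hypothetical Document and findWholeWord from the sketches above (the buffer size and the Match type are guesses, not the project's actual values):

    package concordance

    import "sync"

    // Match records a single occurrence of the keyword (a hypothetical shape;
    // the real result also carries the surrounding context).
    type Match struct {
        Document string
        Offset   int
    }

    // search fans a query out across the corpus: one goroutine per document,
    // all writing into a single buffered channel that the HTTP handler drains.
    func search(docs []Document, keyword string) <-chan Match {
        out := make(chan Match, 256) // buffered so goroutines rarely block
        var wg sync.WaitGroup
        for _, doc := range docs {
            wg.Add(1)
            go func(doc Document) {
                defer wg.Done()
                for _, pos := range findWholeWord(doc.Text, keyword) {
                    out <- Match{Document: doc.Name, Offset: pos}
                }
            }(doc)
        }
        // Close the channel once every document has been searched, so that
        // the handler's range loop terminates.
        go func() {
            wg.Wait()
            close(out)
        }()
        return out
    }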

Parallel execution is twice as fast as sequential on my cloud server (with 2 cores) and 8–9x as fast on my laptop (with 10 cores); most queries start returning results within a handful of milliseconds.

I expected more goroutines to boost performance only up to the number of CPU cores, but in fact it was faster with over 1,000 goroutines (one per document). I thought lock contention was the culprit, but I got the same result in a version without locks or channels. If you think you know why, please write.

SIMD optimization

Modern CPUs have SIMD (single instruction, multiple data) instructions that can apply the same operation to multiple values at once. SIMD vectors on Intel processors are as wide as 64 bytes, so the potential speed-up compared to searching byte by byte is significant.

I spent a while trying to get my SIMD implementation to outperform my naive one, without success. Eventually, while writing a C benchmark, it occurred to me that strstr, the C standard library's string-search function, was probably using SIMD already, and indeed it was. When I compared my SIMD search to an un-optimized search, it was much faster, which at least validated my hypothesis.

I can't prove that Go's regex library uses SIMD, though its performance suggests it does.

Streaming HTTP response

Results are streamed back as the server finds them. It's not necessary to use something heavy-weight like WebSockets: it can be done with plain HTTP. On the server side: write to the HTTP writer and flush. On the client side: call result.body.getReader() and read incrementally.
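On the Go side this comes down to the standard http.Flusher interface. A sketch, assuming the corpus and search function from the earlier sketches and newline-delimited JSON as the wire format (which may not match the real protocol):

    package concordance

    import (
        "encoding/json"
        "net/http"
    )

    // corpus is loaded once at start-up (see loadCorpus above).
    var corpus []Document

    // handleConcordance streams matches back as they are found instead of
    // buffering the whole response.
    func handleConcordance(w http.ResponseWriter, r *http.Request) {
        flusher, ok := w.(http.Flusher)
        if !ok {
            http.Error(w, "streaming unsupported", http.StatusInternalServerError)
            return
        }
        enc := json.NewEncoder(w)
        for match := range search(corpus, r.URL.Query().Get("q")) {
            if err := enc.Encode(match); err != nil {
                return // the client has gone away
            }
            flusher.Flush() // push this result to the client immediately
        }
    }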

Semaphore to limit concurrency

My server is not a very powerful machine: it has 2 virtual CPU cores and 2 GB of memory. If many people try to run a concordance at once, it will be slow and they will be disappointed.

So, the server has a global semaphore limiting concurrency. An incoming request attempts to acquire the semaphore without blocking. If it fails, the server pushes a status message to the HTTP stream and the frontend displays "Your request has been queued" until the server is ready.

Of course, this does not make it any faster in reality, but it still feels snappy because the server responds instantly.
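In Go, a buffered channel serves as the semaphore, and a select with a default case is the non-blocking acquire. A sketch (the capacity and the shape of the status message are my inventions):

    package concordance

    import "net/http"

    // sem limits how many searches run at once; its capacity is the number of
    // searches the machine can comfortably execute in parallel.
    var sem = make(chan struct{}, 2)

    // acquireOrQueue takes a slot immediately if one is free; otherwise it
    // tells the client it has been queued and then blocks until a slot opens.
    func acquireOrQueue(w http.ResponseWriter, flusher http.Flusher) {
        select {
        case sem <- struct{}{}:
            // A slot was free: run the search right away.
        default:
            w.Write([]byte(`{"status": "queued"}` + "\n"))
            flusher.Flush()
            sem <- struct{}{} // block until another request finishes
        }
    }

    // release frees the slot when the search is done.
    func release() { <-sem }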

Time-out and user cancellation

Queries may be terminated early for two reasons: the time-out expires, or the user cancels the request.

Both these cases are handled by a 'quit' channel which is closed when the time-out expires or the request is cancelled (which the net/http library signals via req.Context().Done()). Each goroutine receives the read end of the channel, and after each match, the goroutine checks and exits if the channel has been closed.
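Concretely, something like this: one goroutine closes the quit channel on whichever event comes first, and each worker checks it after every match (Document, Match, and findWholeWord are the hypothetical pieces from the earlier sketches):

    package concordance

    import (
        "context"
        "time"
    )

    // newQuitChannel returns a channel that is closed when the time-out
    // expires or the request is cancelled (signalled via ctx.Done()).
    func newQuitChannel(ctx context.Context, timeout time.Duration) <-chan struct{} {
        quit := make(chan struct{})
        go func() {
            select {
            case <-time.After(timeout):
            case <-ctx.Done():
            }
            close(quit)
        }()
        return quit
    }

    // searchOne is the per-goroutine side: after each match, check whether
    // the quit channel has been closed and stop early if so.
    func searchOne(doc Document, keyword string, quit <-chan struct{}, out chan<- Match) {
        for _, pos := range findWholeWord(doc.Text, keyword) {
            select {
            case <-quit:
                return // time-out or cancellation
            default:
            }
            out <- Match{Document: doc.Name, Offset: pos}
        }
    }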

IP rate limiting

The site limits an individual IP to a certain number of requests in a certain time interval. Further queries are rejected with an HTTP 429 error. I'm aware that this does not work well with NAT gateways – if you know a better way, let me know. ∎