Monday, February 2, 2026
I'm working on a silly word game right now, which means I need a list of words for it. I can't rely on the system word list, since this is for a web project. Ideally, I'd also like multiple word lists with different criteria: the most common words, all words, and some subsets.
It's not really clear how many words there are, because what counts as a word? How do you find which words are actually in use? And how do you decide that something is used enough to merit a spot in a list, rather than being a misspelling or something used only once or twice? You probably don't want every keyboard mash someone has ever typed in there, or adjklfalkjsdfaklsd and its variations are going to take up considerable space. Since every source answers these questions a little differently, a lot of different word lists are available!
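To make the "multiple lists" idea concrete, here's a minimal Python sketch that derives several lists from a single table of word counts. It assumes the counts already exist as `(word, count)` pairs from some frequency source; the `build_lists` helper, the `top_n` cutoff, and the five-letter subset are hypothetical choices, not anything from a particular library.

```python
from collections.abc import Iterable


def build_lists(counts: Iterable[tuple[str, int]], top_n: int = 3) -> dict[str, list[str]]:
    """Derive several word lists from (word, count) pairs.

    Hypothetical helper: the list names and criteria are illustrative only.
    """
    # Sort by descending count, breaking ties alphabetically so the
    # output is deterministic.
    ranked = sorted(counts, key=lambda pair: (-pair[1], pair[0]))
    words = [word for word, _ in ranked]
    return {
        "common": words[:top_n],  # the most frequent words
        "all": sorted(words),  # every word, alphabetized
        "five_letter": sorted(w for w in words if len(w) == 5),  # an example subset
    }


counts = [("the", 500), ("apple", 40), ("of", 300), ("zebra", 2), ("word", 90)]
lists = build_lists(counts)
print(lists["common"])  # ['the', 'of', 'word']
print(lists["five_letter"])  # ['apple', 'zebra']
```

Keeping one ranked table and slicing it keeps every list consistent with the others; adding another subset is just another filter.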
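One simple way to handle the "is this used enough?" question is a minimum-count cutoff: anything seen fewer than some number of times in a corpus is treated as noise. Here's a minimal sketch, assuming you have raw text to count; the `frequent_words` helper, `min_count=2`, and the lowercase ASCII-letters tokenizer are hypothetical choices that would need tuning on real data.

```python
import re
from collections import Counter

# Hypothetical tokenizer: runs of ASCII letters, after lowercasing.
TOKEN_RE = re.compile(r"[a-z]+")


def frequent_words(text: str, min_count: int = 2) -> list[str]:
    """Return words that appear at least `min_count` times in `text`."""
    counts = Counter(TOKEN_RE.findall(text.lower()))
    # Words below the cutoff are dropped as likely typos or key mashes.
    return sorted(word for word, n in counts.items() if n >= min_count)


text = "The cat sat. The cat ran. adjklfalkjsdfaklsd! The dog sat."
print(frequent_words(text))  # ['cat', 'sat', 'the']
```

Even this toy example shows the trade-off: "ran" and "dog" are real words that fall below the cutoff too, which is why a threshold like this only makes sense on a large corpus, and why the right value depends on the data.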
With that in mind, here are the two best word list sources I found and some of their properties.
There are also two honorable mentions that I want to call some attention to. They're useful, just not for this project.
/usr/share/dict/words: Many Unix systems come with a word list installed, which is handy for things like spellcheckers. On my system, this file contains about 480k words, and it's accessible through various libraries, too. It's nice to have available! But mine contains entries like "1080", so I want something a little cleaner. The licensing isn't entirely clear to me, but it can be figured out, and with appropriate licensing the list could then be distributed with the application.

The WordNet data and Leipzig frequency lists both need to be loaded and processed, but their formats are documented and straightforward to parse, especially if you only need a specific subset of the data. I'll most likely use the Leipzig data for my silly little word game. I might combine it with WordNet so I can pull up definitions, but we'll see!
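For a sense of how little processing is involved, here's a minimal Python sketch that loads a tab-separated word-frequency file and cleans it. It assumes a layout with the word in the second column and its frequency in the last column; check that against the documentation for the specific Leipzig download you use. The `load_frequencies` and `is_clean` names and the ASCII-letters-only rule (which would also drop entries like "1080" from `/usr/share/dict/words`) are hypothetical.

```python
import csv
import io
from collections import Counter
from collections.abc import Iterable


def is_clean(word: str) -> bool:
    """Keep plain ASCII-letter words; drops entries like "1080" or "e-mail"."""
    return word.isascii() and word.isalpha()


def load_frequencies(lines: Iterable[str]) -> Counter[str]:
    """Parse tab-separated rows, assuming the word is in column 2 and the
    frequency is in the last column (an assumption about the file layout)."""
    counts: Counter[str] = Counter()
    # QUOTE_NONE so stray quote characters in words aren't treated as CSV quoting.
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        word, freq = row[1], row[-1]
        if not is_clean(word) or not freq.isdigit():
            continue
        # Merge case variants such as "The" and "the" into one entry.
        counts[word.lower()] += int(freq)
    return counts


sample = io.StringIO(
    "1\tthe\t5000\n"
    "2\tThe\t1200\n"
    "3\t1080\t40\n"
    "4\tword\t300\n"
    "5\tcafé\t7\n"
)
counts = load_frequencies(sample)
print(counts.most_common(2))  # [('the', 6200), ('word', 300)]
```

Because the loader only looks at the second and last columns, it tolerates an extra column in between; if your file's layout differs, adjust the indices.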
Someday I'd like to pull some of the Wiktionary data, because it's really cool and has a lot of different frequency lists, like the one with the 2000 most common words in contemporary English poetry. That one might not make the cut for this project, but it's just begging to be used for something else.