Wordle, the daily word guessing game, is taking Twitter, the world, and my relationship by storm.
If you haven’t heard of it (how?), the rules are simple. Every day there’s a new five-letter word (a Wordle). You get six attempts to guess it, and after each one the color of the tiles changes to tell you whether a letter was in the word and in the right place (green), in the word but not in the right place (yellow), or not in the word (gray).
These limitations are what make Wordle so fun. Everyone in the world (and, in particular, in my house) is attempting to guess the same word in the fewest guesses. What’s clever is that you can share your progress after you succeed, but all the letters are disguised as colored blocks. So you can gloat without giving anything away.
But never let it be said there’s a game that can’t be beaten (or, according to my girlfriend, ruined) with a bit of research, analysis, and time. So, if you’ve ever wondered what the best strategy is for winning at Wordle, let’s break it down.
Letter frequency analysis is the study of how often and where letters occur in words. It’s pretty foundational to cryptography, because if you have to decode a secret message, like we’re kind of doing with Wordle, it’s useful to know that you are more likely to see an E than a Q.
While exact letter frequency distribution changes based on the source text, the most common letters don’t really change.
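As a rough sketch of what that counting looks like in practice, the Python snippet below tallies each letter's share of all the letters in a piece of text. The function name `letter_frequencies` and the sample sentence are made up for illustration; a real analysis would run over a far larger source text.

```python
from collections import Counter


def letter_frequencies(text):
    """Return each letter's share of all letters in `text`, as a percentage.

    Hypothetical helper for illustration; only the letters A-Z are counted.
    """
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    total = len(letters)
    if total == 0:
        return {}
    counts = Counter(letters)
    # most_common() orders letters from most to least frequent
    return {letter: 100 * n / total for letter, n in counts.most_common()}


# Made-up sample text; swap in any corpus you like.
sample = "the quick brown fox jumps over the lazy dog"
freqs = letter_frequencies(sample)
for letter, pct in list(freqs.items())[:5]:
    print(f"{letter}: {pct:.2f} percent")
```

On a sample this small the ranking says more about the sentence than about English, which is why the source text matters so much.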
Peter Norvig, director of research at Google, used the data from Google Books to come up with this list of the top 12 most common letters in the English language:
- E (12.49 percent of all letters)
- T (9.28 percent)
- A (8.04 percent)
- O (7.64 percent)
- I (7.57 percent)
- N (7.23 percent)
- S (6.51 percent)
- R (6.28 percent)
- H (5.05 percent)
- L (4.07 percent)
- D (3.82 percent)
- C (3.34 percent)
There’s one issue with this list for us Wordlers, though. It’s based on a natural-language source text, which means the word “the” kind of messes things up for us. “The” is by far the most common word in the English language, representing 7.14 percent of all words in the Google Books source text, followed by “of” (4.16 percent), “and” (3.04 percent), and “to” (2.6 percent). This means T and H sit higher in the list than they should.
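To see the effect on a toy scale, the sketch below measures the share of T and H in the same short passage two ways: once over every word as it occurs, and once over each distinct word counted a single time. The passage and the helper name `letter_share` are invented for illustration and have nothing to do with the Google Books data.

```python
def letter_share(words, letter):
    """Percentage of all letters in `words` that are `letter` (hypothetical helper)."""
    letters = "".join(words)
    return 100 * letters.count(letter) / len(letters)


# Made-up passage in which "the" shows up four times.
passage = "the cat and the dog sat on the mat by the door".split()

running = passage                # every occurrence counts, as in running text
distinct = sorted(set(passage))  # each word counts once, as in a word list

for letter in "th":
    print(
        f"{letter.upper()}: {letter_share(running, letter):.1f} percent in running text, "
        f"{letter_share(distinct, letter):.1f} percent across distinct words"
    )
```

Both letters take a noticeably bigger share when every repeat of “the” is counted, which is the same skew at work in the list above.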
Another option is to just look at the distribution of letters in dictionary words. An analysis of the Concise Oxford Dictionary (9th Edition, 1995) found the 12 most common letters were: