Test your vocab: frequently asked questions

This page has not yet been translated into Portuguese. If you can help, please send us a translation of any part and we will include it.

Frequently Asked Questions

For details on how the test works and was put together, see the nitty-gritty details page.

In the final count, what is actually being counted as a word?

We count headword entries in a standard English dictionary. This means the standard word derivations are not counted (for example, "quickly," derived from "quick," does not count as a separate word). And while compound words are counted (like "air conditioning"), phrases and expressions are not (like "food for thought").

I want to look up the meanings of the really hard words on the test. Do I have to take the test again to see them?

We've listed the hardest words here with links to their definitions.

Why do I have to click so many checkboxes? Can't you check them all by default, and then I uncheck the words I don't know?

The test works in two steps. The first step contains 40 words which determine your rough level—do you have the vocabulary of a 3-year-old or a 20-year-old? Since we don't know who you are, we can't assume you'll check most of the boxes.

Then, with the approximate level determined, the second step shows around 120 words in four columns, which are selected in the general area around where we think your vocabulary level is. If we guessed perfectly, the first column should be almost entirely words you know, the last column should be almost entirely words you don't know, and the middle two columns should be a gradient between the two. If we didn't guess perfectly, we still have a good buffer to give you an accurate result.

Some users will therefore find they check more boxes overall, and others will leave more unchecked, so we leave the default unchecked. Of course, if you're a linguistic genius and know all the words, then there's nothing we can do, and you'll just have to exercise your finger more!
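For the curious, here is a rough Python sketch of how a two-step selection like the one described above could work. It is only an illustration of the idea, not our actual code: the placeholder word list, the window size, and the way the step-1 answers are turned into an estimate are all assumptions.

```python
import random

# Placeholder ranked word list, most frequent first. The real test uses a
# list built from corpus frequency data; "word00000" etc. are stand-ins.
WORDS_BY_RANK = [f"word{i:05d}" for i in range(45000)]

def estimate_level(step1_answers):
    """Step 1: turn the ~40 yes/no answers into a rough rank estimate.

    `step1_answers` maps a word's frequency rank to True/False (known or
    not). This toy estimate just takes the highest rank the user claims
    to know; the real test presumably fits the answers more carefully.
    """
    known_ranks = [rank for rank, known in step1_answers.items() if known]
    return max(known_ranks, default=0)

def pick_step2_words(estimated_rank, n=120, window=8000):
    """Step 2: sample ~120 words from a window around the estimate.

    With a good estimate, the easiest sampled words are almost all known
    and the hardest almost all unknown, giving the gradient across the
    four columns described above.
    """
    lo = max(0, estimated_rank - window // 2)
    hi = min(len(WORDS_BY_RANK), estimated_rank + window // 2)
    ranks = random.sample(range(lo, hi), min(n, hi - lo))
    return [WORDS_BY_RANK[r] for r in sorted(ranks)]

# Example: a user whose step-1 answers suggest they know words up to
# roughly rank 12,000.
step2 = pick_step2_words(estimate_level({500: True, 12000: True, 30000: False}))
```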

The words seem to jump quickly from really easy to really hard. Shouldn't you have a smoother transition for more accurate measurement?

They jump from easy to hard because that's where the limit of your vocabulary is. It's not a feature of our test, it's a feature of vocabulary learning. And while it might seem to you that all the easy words are equally easy, we can assure you that other people would disagree.

Your results seem low—doesn't the average adult know more than 27,000 words?

There are many different factors that go into measuring someone's vocabulary size—for example, do you count "quick" and "quickly" as two words or just one? After all, "quickly" is just a simple and predictable derivation of "quick."

Depending on how different questions like these are decided, vocabulary tests can give widely different results. We take a conservative approach, and count the number of headwords (not derived words) which you are estimated to know in a standard dictionary. In the end, what really matters is not your absolute number, but rather your score relative to others who take the same test, no matter how the test is put together.

For a much more detailed discussion of how the test works, see our nitty-gritty details page.

Your results seem high—I'm well-educated and well-read, and I'm pretty sure my vocabulary is in a higher percentile than the results you've listed!

You're probably right. The percentiles listed so far are of the people who have taken the quiz, not of the population as a whole. And their average self-reported verbal SAT score, so far, is around 700 (out of a perfect 800 score). Compare that to the average US population score of around 500, and it's clear that our test-takers are far more literate than average.

As the number of participants increases, there should be more data to separate out percentiles based on different self-reported SAT scores, for example, and we'll be able to use this to generate comparison scores that are more representative of the population as a whole.

TL;DR: Basically, we need more YouTube commenters participating. :)

What about non-native speakers? English isn't my first language, and I want to know how I compare.

We're planning on launching a sister project to measure English acquisition among Brazilian learners, which will have different survey questions (number of years learning English, academic performance, time spent abroad, etc.). Current statistics for non-native speakers are relatively meaningless in aggregate, because they vary enormously. (NEW: regardless, you can now view them in this blog post).

However, based on our own limited initial testing, we can give you a vague idea of foreign language acquisition for Brazilians enrolled in private English courses (generally meeting for around 3 hours a week):

  • 1,500–3,000 words: a couple of years of English courses
  • 4,000–6,000 words: intermediate English (4–6 years)
  • 8,000–10,000 words: advanced English (8 years) for a particularly good student

Anything much beyond 10,000 words generally only comes from living abroad in an English-speaking country for a significant period of time, or else spending tremendous amounts of one's own time exposed to English media (books, sitcoms, movies, etc.).

Will you release resulting charts/graphs relating vocabulary size to age/education/etc. when the survey is done?

Absolutely, that's the whole idea! Our results are completely dependent on sample size, so as more people take the quiz over the coming months, we'll be adding results as they become statistically reliable. Check our blog for updates.

I want to use your original list of words to study from. Can I get it?

All our frequency information came from Adam Kilgarriff's analysis of the British National Corpus, which you can access here.

You'll probably want the lemma.num file, for the first 6,000 or so words. If you're looking for more advanced English, you'll probably want the all.num.o5 file, which contains more than 200,000 entries (although many of them are redundant).
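If you want to load the list programmatically, something like the following Python sketch should work. We're assuming whitespace-separated columns of rank, frequency, lemma, and part of speech, which is how we recall lemma.num being laid out; check the readme that accompanies the files before relying on it.

```python
def load_lemmas(path="lemma.num"):
    """Read Kilgarriff's lemma list into (rank, frequency, lemma, pos) tuples.

    Assumes whitespace-separated columns of rank, frequency, lemma and
    part of speech; verify against the file you actually download.
    """
    lemmas = []
    with open(path, encoding="latin-1") as f:
        for line in f:
            parts = line.split()
            if len(parts) < 4:
                continue  # skip blank or malformed lines
            rank, freq, lemma, pos = parts[:4]
            lemmas.append((int(rank), int(freq), lemma, pos))
    return lemmas

# Example: the roughly 6,000 most frequent lemmas, in frequency order.
first_6000 = sorted(load_lemmas(), key=lambda t: t[0])[:6000]
```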

How do you make sure people aren't cheating? Won't this affect your results?

We try. On the last survey page (which asks about age, education, etc.), we specifically ask people not to fill it out if their answers have been less than truthful — and our results are based only on native English speakers who complete the survey.

But in any case, since our statistics are based on aggregate results, we expect that any level of "exaggeration" will affect all results, in a roughly equal or proportional way. And since it's not so much the absolute vocabulary numbers we're interested in, as much as the differences in results among groups, cheating shouldn't have too much of an overall effect. It might change the slopes or offsets of resulting graphs slightly, but shouldn't be expected to produce any qualitative differences.

Of course, your personal level of under- or over-confidence in word knowledge, compared to the public's as a whole, will affect your score comparison. But we're trying to make the survey easy, fun, and five minutes long, and therefore worth spreading around — instead of being a rigorous half-hour exam that guarantees no cheating. In the latter case, we might not have gotten any results at all.

PS: A lot of people have suggested that we include "fake" words to try to catch cheaters. We considered this, but we believe this creates more problems than it solves. And after all, nothing in real life depends on your score!

Why do you only test up to 45,000 words?

Because honestly, there really aren't any more generally used words than that. The Oxford English Dictionary may list 300,000 words, but after 45,000, they're pretty much all either archaic, scientific/technical, or otherwise inapplicable to any kind of "general" vocabulary test. In fact, finding such general words beyond 35,000 was a real challenge.

How accurate is my score?

Each person's vocabulary count has a margin of error of approximately ±10%. We also round results above 10,000 to the nearest 100, and results above 300 to the nearest 10.
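As a concrete illustration, here is the rounding rule just described expressed as a small Python function; the thresholds are the ones stated above, but the function itself is only our sketch.

```python
def round_reported_score(estimate):
    """Round a raw vocabulary estimate the way the FAQ describes:
    results above 10,000 to the nearest 100, results above 300 to the
    nearest 10, and smaller results left as-is."""
    if estimate > 10_000:
        return round(estimate / 100) * 100
    if estimate > 300:
        return round(estimate / 10) * 10
    return estimate

print(round_reported_score(27_123))  # -> 27100
print(round_reported_score(4_567))   # -> 4570
```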

How scientific will your results be?

We will produce results for vocabulary correlations according to age, education level, and SAT score for native speakers of English, possibly divided by country. However, these results will be based solely on people who have chosen to take the test, which is not a controlled, representative sample of the general population.

We have chosen not to ask about race, income level, religion, or a variety of other factors that could produce a more scientific survey, in the interests of keeping the survey popular and quick to take. However, controlling for age, education and gender already goes a long way towards producing reliable results, and we plan on investigating to what extent we can control for geographic location and income level in the US by using reported ZIP codes together with US census data. In summary: the survey isn't perfect, but should nevertheless still provide quite meaningful data. And it should certainly be able to indicate what areas are worth pursuing (in a subsequent rigorous, controlled scientific survey, for example).

Why do you ask for the month I was born? Is this some kind of astrology nonsense?

Astrological signs don't coincide exactly with the months, so it's nothing as interesting as that! It's just so we can calculate the ages of younger children to a level more precise than just the year they were born.
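For what it's worth, the calculation this enables is nothing more than the following (a sketch, not our exact code):

```python
from datetime import date

def age_in_months(birth_year, birth_month, on=None):
    """Approximate a participant's age in months from birth year and month.
    (A sketch of the kind of calculation the birth month makes possible.)"""
    on = on or date.today()
    return (on.year - birth_year) * 12 + (on.month - birth_month)

# A child born in September 2005 is about 5.5 years old in March 2011,
# which matters a lot more at age 5 than it does at age 35.
print(age_in_months(2005, 9, date(2011, 3, 1)) / 12)  # -> 5.5
```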

Can I take the test more than once?

Yes. However, the word list we use stays largely the same each time you take the test. So if you look up any words you're being tested on, during or after the test, then future test results will be artificially high, and will no longer be accurate.

So if you'd like to use this test to track your vocabulary growth, we recommend only taking it once per year at the most, and only if you're very careful never to look up or ask about words you saw on the test.

Also, if you take the test multiple times, please only fill out the research survey portion the first time.

Can you expand to test other languages besides English?

We'd like to eventually, but there's an awful lot of work involved in creating a word list that is scientifically accurate, because it has to be built on top of solid data. Specifically, for any language, we need to find:

  • An available corpus, with both a sufficiently large spoken component (for accurately measuring lower vocabulary levels) and written component (for higher levels), with sufficient variety in source material
  • A reliable automatic method for "uninflecting" corpus words into dictionary words, e.g. turning "cupcakes" into "cupcake" and "wrote" into "write", so they don't count as separate words (see the sketch after this list)
  • An electronically available comprehensive dictionary, which can be used to filter out proper nouns and non-words from the corpus
  • A native speaker with a programming or data background, who can implement all this accurately, and then go through to manually select appropriate testing words, according to various criteria
  • Someone to translate the whole site into the new language
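To make the "uninflecting" (lemmatization) step in the second item more concrete, here is one illustration in Python using NLTK's WordNet lemmatizer. This is just one off-the-shelf way of doing it for English, not necessarily what a new language would use; most languages need their own corpus-specific tooling.

```python
# Illustration of the "uninflecting" (lemmatization) step using NLTK's
# WordNet lemmatizer for English. Requires `pip install nltk` and a one-time
# `nltk.download("wordnet")`.
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

print(lemmatizer.lemmatize("cupcakes"))        # -> "cupcake"
print(lemmatizer.lemmatize("mice"))            # -> "mouse"
print(lemmatizer.lemmatize("wrote", pos="v"))  # -> "write"
```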

If you have access to the kinds of resources above for a particular language, then please contact us. We're not currently planning on launching any new languages within the next year, but it certainly is a long-term goal.