Native speakers in greater detail

08 May 2013

We've previously posted on the relationship between age and vocabulary size for native speakers. But now that it's almost two years later, and we've collected more than five times as much data, it's time to revisit the graph. The overall shape hasn't changed, of course (it's just smoother), but with so much more participation, we can now calculate not just the median vocabulary level for each age, but various percentiles too. This gives a better idea of the distribution of vocabulary sizes among survey participants. So here is the crown jewel of our results:

To give you an idea of just how much data is compressed into this single graphic, over 20,000 respondents make up the single age of 21 alone, and each point on the percentile lines is backed by over 2,000 respondents.

Now, remember that these percentiles are not for the population as a whole, but rather just for those who have taken the test online. Comparing with self-reported SAT scores from our previous analysis, participants overall sit at roughly the 98th percentile of the American population. It is apparently a very "elite" group of people who spend their time taking vocabulary tests on the Internet!

But regardless, it's fascinating to see how test-takers age 50, for example, range from slightly over 20,000 words (10th percentile), to slightly over 30,000 words (median), to nearly 40,000 words (90th percentile).

(You'll also notice how we've cut off some of the percentile lines, restricting them to only a subset of ages. This is because the data was too noisy at the other ages.)
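For readers curious how per-age percentile lines like these can be computed, here is a minimal sketch in Python. It is not the code behind the graph: the function name `percentiles_by_age`, the `(age, vocabulary_size)` record format, and the `min_count` cutoff are all assumptions made for illustration. The default cutoff of 2,000 simply borrows the per-point figure mentioned above, and the sketch drops a whole age when it has too few respondents, which is cruder than trimming individual percentile lines.

```python
from collections import defaultdict
from statistics import quantiles


def percentiles_by_age(records, wanted=(10, 50, 90), min_count=2000):
    """Group (age, vocabulary_size) pairs by age and compute percentiles.

    Ages with fewer than `min_count` respondents are skipped, as a rough
    stand-in for cutting off lines where the data is too noisy.
    (Hypothetical helper; the threshold is an assumption.)
    """
    by_age = defaultdict(list)
    for age, vocab_size in records:
        by_age[age].append(vocab_size)

    result = {}
    for age in sorted(by_age):
        sizes = by_age[age]
        if len(sizes) < min_count:
            continue  # too few respondents for a stable estimate
        # n=100 yields 99 cut points; cuts[p - 1] is the p-th percentile.
        cuts = quantiles(sizes, n=100, method="inclusive")
        result[age] = {p: cuts[p - 1] for p in wanted}
    return result


if __name__ == "__main__":
    # Made-up numbers purely to show the call shape, not survey data.
    demo = [(50, 20000 + 200 * i) for i in range(101)]
    for age, stats in percentiles_by_age(demo, min_count=50).items():
        print(age, stats)
```

The median is just the 50th percentile here, so the same call produces all three lines discussed for age 50.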