10 May 2013
Now that we've reached over two million completions of our vocabulary survey, we thought it would be a good time to consolidate our biggest findings in a single post. All numbers here are rounded to make for nice "factoids", but each links to the original blog entry with the raw data.
(Note that "native" here is short for "native speakers of English", or people who speak English as a first language, and "foreign" is short for "non-native learners of English." And remember that these statistics are based on self-selected survey participants who are the kind of people who take vocabulary tests on the Internet, and not necessarily representative of the population as a whole.)
Thanks to everyone who's taken the survey, shared it with others, and helped make this all possible! But we're not done yet — so please continue to share the site, and we'll continue to perform new analysis as more data comes in.
09 May 2013
We now have data to answer another one of the new questions we introduced almost two years ago. This time it's correlation between reading habits and vocabulary size for native English speakers, by age:
This is an extremely fascinating chart, because it reveals what we've thought all along — reading builds your vocabulary. However, we never dreamed that the final chart would look quite so "clean"! (At least, ignoring the bottom-most line for now.) Of course, with more than a quarter of a million respondents, we can produce pretty accurate results.
We have three main findings from this chart. The first is that while increasing your reading matters, increasing your reading of fiction, specifically, matters just as much. That fiction reading would increase vocabulary size more than non-fiction alone was one of our hypotheses — it makes sense, after all, considering that fiction tends to use a greater variety of words than non-fiction does. However, we hadn't expected its effect to be this prominent.
The second finding is that, for people who already read at least "somewhat", each level of "bumping up" their reading in general, or their fiction reading specifically, corresponds to a vocabulary roughly 2,000 words larger. Indeed, the difference between someone who reads "somewhat" and fiction "not much", and someone who reads "lots" and fiction "lots", is approximately 8,000 words, regardless of age. That's a huge difference.
[The numbers for people who read "not much" are far lower, and seem to follow a different curve entirely. Indeed, the lowest percentiles of vocabulary size in our survey overall show drastically different results. We believe this is due to a combination of 1) insufficient data (we need more participants), 2) greater variation within the group (the single category "not much" is too broad), and 3) participant selection (what kinds of people have taken the test, vs. what kinds haven't). In other words, more study is needed.]
And the third finding, completely unexpected, is that the difference between those who read "somewhat" and those who read "lots" doesn't appear to change with age — the difference at 15 years old is essentially the same as the difference at 60, which means that this life-long gap is already present by age 15. The previous chart doesn't show data before age 15, because with far less participation, it's much noisier. However, if we ignore the fiction question and focus entirely on the overall reading question, we can get a rough idea of what's going on:
The data is still pretty noisy because of fairly low participation levels (between the ages of 4 and 8, there are still only between 140 and 260 responses per age, while by age 12, there are already more than 1,500).
However, the overall picture is still pretty clear: at around age 4, when children are (at best) only first starting to read, average vocabulary levels are roughly equivalent across reading habits (as one would expect) — at around 6,000 words. Then it's during the crucial ages of 4–15 that reading makes all the difference in the rate at which children increase their vocabulary. We can calculate the differences, although these should be taken as "ballpark approximations" at most, given the noisiness of the data:
This is a fascinating finding, as it tells us that vocabulary growth is drastically affected by the amount children read. By age 15, this has resulted in a difference of 5,000–6,000 words between each level, and children who read "lots" have almost double the vocabulary of children who read "not much". Obviously, this will affect school performance, SAT scores, and so on — and it's a difference accumulated throughout all of childhood.
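As a back-of-the-envelope illustration of these growth rates, here is a small sketch. The age-15 vocabulary figures per reading group below are assumed round numbers chosen to be consistent with the ranges above (about 5,500 words between levels, with "lots" readers at nearly double "not much"), not exact survey values:

```python
# Illustrative check of the growth rates described above.
# The endpoint figures are assumptions, not exact survey data.

def words_per_day(start_vocab, end_vocab, start_age, end_age):
    """Average daily vocabulary growth between two ages."""
    return (end_vocab - start_vocab) / ((end_age - start_age) * 365)

# Roughly 6,000 words at age 4 for everyone; hypothetical levels at 15,
# separated by ~5,500 words per reading-habit group:
for habit, vocab_at_15 in [("not much", 13000), ("somewhat", 18500), ("lots", 24000)]:
    rate = words_per_day(6000, vocab_at_15, 4, 15)
    print(f"reads {habit!r}: ~{rate:.1f} new words per day, ages 4-15")
```

Even with these rough inputs, the spread in daily learning rates between the groups is striking.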
(However, keep in mind that these learning "rates" are based only on averages across all our survey respondents — they are not representative of the population as a whole. Nor do we have enough data yet to separate them out into percentiles. 4-year-olds who start out with larger vocabularies of 7,000 words, or smaller vocabularies of only 3,500 words, could well show different growth rates, and other factors besides reading habits surely affect vocabulary growth as well.)
So, what's the biggest takeaway from this? If you're a parent, make sure your kids read, and read all they can — it will last them the rest of their lives.
08 May 2013
We've previously posted on the relationship between age and vocabulary size for native speakers. But now that it's almost two years later, and we've collected more than five times as much data, it's time to revisit the graph. The overall shape hasn't changed, of course (it's just smoother), but with so much more participation, we can now calculate not just median vocabulary levels per age, but also various percentiles as well. This gives a better idea of the distribution of vocabulary sizes among survey participants. So here is the crown jewel of our results:
To give you an idea of just how much data is compressed into this single graphic, there are over 20,000 respondents for the single age of 21 alone, and over 2,000 behind each point in the percentile lines.
Now, remember that these percentiles are not for the population as a whole, but rather just for those who have taken the test online. Comparing with self-reported SAT scores from our previous analysis, our participants fall in roughly the 98th percentile of the American population as a whole — it is apparently a very "elite" group of people who spend their time taking vocabulary tests on the Internet!
But regardless, it's fascinating to see how test-takers age 50, for example, range from slightly over 20,000 words (10th percentile), to slightly over 30,000 words (median), to nearly 40,000 words (90th percentile).
(You'll also notice how we've cut off some of the percentile lines, restricting them to only a subset of ages. This is because data was too noisy in these areas.)
07 May 2013
IELTS is used particularly for admissions to British and Commonwealth universities, while TOEFL is used more for admissions in the United States. Both tests are geared towards advanced learners of English, since they're designed especially to gauge the ability to perform at university level. And here is how their scores correlate with vocabulary sizes:
Note that TOEFL scores are rounded to the nearest 10, so a score of 570 on the paper exam actually includes scores in the range 565–574 (and the "impossible" score of 680 is actually 675–677). In all three tests, we've cut off lower levels of scores where we had less data (they also tended to be relatively flat). Finally, note that the very highest score of each test "jumps" quite far up relative to the rest — this is to be expected, since the highest score includes people who would score even higher if the test examined even more difficult English.
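The rounding described here can be sketched in a few lines. The helper function is our own for illustration, but the ranges match those quoted above, with the paper exam capped at its true maximum of 677:

```python
# A sketch of the score rounding described above: reported paper-TOEFL
# scores are multiples of 10, each covering the underlying range
# (score - 5) .. (score + 4), capped at the exam's true maximum of 677.

def paper_toefl_range(reported, maximum=677):
    low, high = reported - 5, reported + 4
    return (low, min(high, maximum))

print(paper_toefl_range(570))  # (565, 574)
print(paper_toefl_range(680))  # (675, 677) -- the "impossible" top score
```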
While there's nothing particularly unexpected from this data, it is interesting to see that the three exams basically cover vocabulary ranges from lower-advanced (around 6,000) to lower-native (around 20,000) — which makes sense, since the tests are geared towards university admissions. Of course, the tests evaluate far more than just vocabulary, and include things like grammar, listening, and writing — which are not reflected in the TestYourVocab quiz.
06 May 2013
It's been over a year and a half since we last posted, and while we've been working on other projects, we're pleased to announce that, since we started, we've had over two million completed vocabulary surveys from around the world! (As of this exact moment, it's 2,196,128.)
And so we have some announcements to make:
10 September 2011
As previously promised... we are releasing the average English vocabulary levels per-country, for non-native speakers. Please keep in mind, this is not scientific in the slightest, but rather just for fun.
First, the map and ranking:
It is clear that the real "winner" here is Northern Europe: the first four places are Denmark, Norway, Finland and Sweden. After that, Europe as a whole has a relatively strong showing, along with Mexico, Argentina, Israel, Chile, and Indonesia.
But what does this chart really mean? Beyond being based on the vocabulary test results themselves, there are three restrictions placed on the data:
Non-native. It is based on speakers self-identified as non-native speakers only, so an American living in China should not affect the data for China.
No English as an official language. We have not included results for countries where English is one of the official languages. This means no US, Canada, Australia, New Zealand, India, Pakistan, Philippines, or Singapore. While many of these countries had high numbers of respondents who self-identified as non-native English speakers, comparison with other countries would not be very meaningful.
At least 300 respondents. With fewer than this, the data for a country is really not very meaningful at all. (Interestingly, China, Iran and Russia are by far our largest sources of participants, with over 60,000 test-takers each. Next comes Ukraine at 23,000, and then Germany at 12,000.)
And then, there are two big caveats to keep in mind:
Internet participation. It is based only on people who took the survey, without any kind of scientific control, or guarantee that the participants are representative of the overall population as a whole. In fact, they almost surely aren't, since Internet users tend to be better educated and fall into particular age groups. Strictly speaking, this means the comparative data is totally useless, because it is theoretically possible that, for example, the test was popular among top students in Denmark, and among low-performing students in Iran. Such an extreme example is probably not the case, but whether participation in a particular country came via an article on a high-brow news site, or was spread by a particular group of people on a certain social network, could certainly have an influence.
IP addresses. Countries were calculated automatically from the IP addresses of test-takers. These are mostly accurate, but not perfect. Some respondents also provided their nationality in the survey. Many countries show self-reported nationalities and IP addresses matching up over 95% of the time, while other countries have a somewhat lower rate. So country identification, while good, is not absolutely perfect.
So have fun with the rankings, but don't take them too seriously. And if you have further interest, check out the EF English Proficiency Index, which shows very similar results which come from a completely different survey, and also has a PDF report with interesting profiles of English usage in a number of countries.
28 July 2011
Participation has continued to build far beyond our expectations, with over a third of a million unique visitors to the site (and, interestingly, almost everybody who visits takes the test). We realize the potential for collecting more detailed data, and have received lots of suggestions from participants as well. So today we've added several additional survey questions which will hopefully allow us to produce more interesting results:
We're looking forward to being able to publish new findings here. So thanks to everyone who's participated so far, and please keep spreading the word!
25 July 2011
Over the past week, we've had a flood of participation, with over 200,000 visitors. (If you haven't taken the quiz, you can do it here.) So now it's about time we gave something back! Previously, we'd only been able to calculate a very narrow range of statistics.
But now, we've been able to produce two great new charts which give a much fuller picture of the vocabulary sizes of native English speakers. First, since we've had much more participation from both younger and older speakers, we can calculate meaningful average vocabulary sizes for native English speakers who took the test, ages 3–71:
This is a fantastic chart, because it shows the speed at which our vocabulary really grows. Between the ages of 3 and roughly 16, our vocabulary explodes at an average rate of almost 4 new words a day (3.8, to be more exact). Then, between the ages of 16 and 50, our vocabulary growth is slower, but still fairly consistent: around 1 new word a day (0.85, to be precise). Finally, beyond 50, vocabulary size appears to remain fairly constant. (Note that the data is still a bit jagged both for younger and older participants — we still need more data at both ends to smooth things out.)
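As a quick arithmetic check, the per-day rates quoted above imply the following cumulative gains over each age span (the rates are the post's own; the totals simply follow from them):

```python
# Cumulative vocabulary gains implied by the quoted per-day rates.

def implied_gain(words_per_day, start_age, end_age):
    """Total words gained at a constant daily rate over an age span."""
    return words_per_day * 365 * (end_age - start_age)

print(f"ages 3-16 at 3.8/day:  ~{implied_gain(3.8, 3, 16):,.0f} words gained")
print(f"ages 16-50 at 0.85/day: ~{implied_gain(0.85, 16, 50):,.0f} words gained")
```

So roughly 18,000 words are picked up in childhood and another 10,000 or so in adulthood, which is consistent with the adult vocabulary sizes discussed below.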
However, note that these average vocabulary sizes of our respondents are significantly higher than those of the overall population at large. How do we know this? Because our American participants' self-reported verbal SAT scores hover at around a constant 700 (out of 800 maximum) at all applicable age levels, while the median score of SAT test-takers is only 500. And these test-takers themselves are a more educated subset of the American population as a whole. To put things in perspective, it has been estimated that if the whole US population took the verbal SAT, it would have a median verbal score of around 350. Our average respondent's verbal SAT score of 700 places him or her in the 95th percentile among SAT takers, and above the 99th percentile in the American population as a whole. This means that the average American's vocabulary size would be significantly lower than the chart above shows.
But with the new data we've acquired, we've been able to produce a second chart, more limited in age range, but which shows the levels of vocabulary growth among respondents of different verbal SAT scores:
Note: V-SAT scores are as reported, and have not been adjusted to account for the 1995 test re-centering.
First, this appears to tell us that the verbal SAT really is measuring something real, which is always good to know! But more interestingly, the data so far suggests that vocabulary growth over the years is independent of SAT score—all the slopes are essentially the same, with everyone learning the same "one new word a day" of vocabulary growth. It also shows that, at least in this SAT range of 500–800, each 50-point increase in score is equivalent to knowing roughly 1,500 more words, regardless of your age. The difference between a 500-scoring adult and an 800-scoring adult is roughly an extra 10,000 words in their vocabulary.
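The linear relationship described above can be sketched as a toy model. The baseline vocabulary, reference SAT score, and reference age below are illustrative assumptions, not fitted survey values; only the two slopes (1,500 words per 50 SAT points, roughly one word per day with age) come from the discussion above:

```python
# A toy linear model of the relationships described above.
# Baseline values are assumptions for illustration only.

def estimated_vocab(verbal_sat, age, base_vocab=20000, base_sat=500, base_age=20):
    sat_bonus = (verbal_sat - base_sat) / 50 * 1500  # 1,500 words per 50 points
    age_bonus = (age - base_age) * 365 * 0.97        # ~one new word a day
    return base_vocab + sat_bonus + age_bonus

# The spread between a 500-scorer and an 800-scorer at the same age:
print(estimated_vocab(800, 30) - estimated_vocab(500, 30))  # 9000.0
```

Note that six 50-point steps at 1,500 words each give 9,000 words, in line with the rough 10,000-word spread quoted above.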
You may notice that, while the top-scoring lines are smooth and don't touch (because we've got lots of data), the lower-scoring lines become increasingly jumpy. This is because we still need more participation at these SAT levels — in fact, we don't have enough data to draw any reliable lines for SAT scores under 500. So, we're still waiting for more participation in order to calculate the lifetime vocabulary curve of an "average" American, with a verbal SAT score of 350.
25 July 2011
We knew that our new exposure would give us lots of data in the U.S. and other English-speaking countries. What we didn't expect was that the quiz would have so much international popularity!
In fact, Russia has generated almost half as much participation as the U.S., Germany has produced nearly 6,000 visits, China over 5,000, and Iran exactly 4,000 as we write this. We've even had solitary visits from Namibia and Zambia.
And probably half the e-mails we've received have been from non-native speakers asking for numbers to compare their personal results with. This presented us with a problem: how to produce numbers that are meaningful enough to compare one's results with?
The obvious answer would be vocabulary correlated to the number of years people have studied English. We tried that, but these numbers have turned out to be all over the place. There are so many different types of courses, and so many ways of informally learning English, that the data has been fairly meaningless so far. Plus, most people taking the test and filling out the survey already have a decent level of English, since the survey itself is in English — so lower levels are probably severely underrepresented.
We'll be surmounting many of these problems with our future launch of this test in Brazil, where styles of English learning are much more uniform, and easier to compare and correlate. (Plus, the interface will be available in Portuguese, so we can include more introductory students.) But we don't want to keep everyone waiting until then.
So we've decided to make available a chart simply showing the overall distribution of vocabulary scores, with the percentage of respondents which fall into the range centered on each score:
The largest proportion of respondents (4.7%) know 4,500 words (or are in the range 4,250–4,749, technically). Looking at it another way (not displayed on the chart), the median vocabulary size for all respondents is 7,826 — half know more, half know less. But remember that these statistics, while they might be fun to compare yourself against, merely reflect the people who have taken this online quiz from all over the world, and are in no way representative of the world population as a whole. Percentages for people who know fewer than 1,000 words are not even shown on the chart (the data is too spotty/erratic so far).
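The bucketing behind the chart can be sketched as follows. The 500-word bin width is taken from the range quoted above; the helper function itself is ours, for illustration:

```python
# A sketch of the bucketing described above: each vocabulary score is
# assigned to the 500-word bin centered on its label, so the bin
# labelled 4,500 covers scores 4,250 through 4,749.

def bin_label(vocab_size, width=500):
    # bin labelled L covers scores L - width//2 .. L + width//2 - 1
    return ((vocab_size + width // 2) // width) * width

print(bin_label(4250))  # 4500  (low edge of the 4,500 bin)
print(bin_label(4749))  # 4500  (high edge of the same bin)
print(bin_label(4750))  # 5000  (first score in the next bin)
```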
However, this doesn't mean we haven't found anything else interesting. On the contrary! Because while our data isn't fine-grained enough to find much correlation between vocabulary size and the number of years of English classes taken, we did find great differences in average vocabulary results for the following questions:
Academic performance: On average, in the English course(s) you took, how would you judge your performance in the classroom, relative to the other students you studied with?
Participation: In class, how much did you participate, talking and asking questions, compared to other students?
Natural ability: In class, compared to other students, how much do you feel, or did you feel, that learning English, and speaking it, was easy or difficult for you?
Outside of class: How much did/do you use English in "real life", learning things outside of the classroom? (Watching TV, listening to songs, writing, travelling, etc.)
Time spent abroad: Have you ever travelled to a country, or to countries, where English is spoken? If the answer is 'yes', how much time did you spend?
So, summing it up for non-native learners of English, what does this mean?
While the charts above should not be interpreted as a scientifically controlled survey, and represent only the voluntary responses of a self-selected Internet survey group, they do suggest a few things:
Academic performance helps, up to doubling your vocabulary size. But that doesn't tell us what helps academic performance.
Classroom participation matters too, but it's not the top factor. It appears to give you up to a 50% boost in vocabulary.
Outside of class is the biggest difference. Students who do "lots" of things in English outside of class have more than twice the vocabulary of those who "don't do much."
Living abroad gets you to and beyond 10,000. Up to one year abroad brings the average student from around 7,000 to around 10,000 words. After that, every year abroad gives you around 850 more words, or around 2.35 per day. (Compare that to the average American adult who learns 0.85 per day.)
But be aware that the results above are suggestive only—we have not separated out the different factors from each other statistically. So, for example, higher vocabulary sizes among people with lots of English activity outside of class might not actually be due to their learning at the time, but the fact that it made them more likely to live abroad afterwards. Or higher vocabulary sizes for top academic performers might simply be due to the fact that they took more years of classes, while others dropped out sooner. Or indeed, causation might run in the opposite direction — students whose English is already better might be more inclined to participate in class, and engage in extracurricular English activities. More research will tell.
18 July 2011
Out of nowhere, we woke up Sunday morning to discover that, instead of the slow trickle of participants we'd become accustomed to... the site was booming!
All thanks to "mike_esspe" who posted the survey to Hacker News. This small action has generated over 50,000 new completed surveys so far — over 3,000 an hour yesterday, which is almost one new visitor per second. And it's still going strong.
So a big thank you to everyone who participated, and a big welcome to everyone arriving. As soon as this new wave of participation slows down, we'll take another look at the data and see what new trends we can tease out of it, so bookmark this page!
02 December 2010
We've made two discoveries so far. The first is that, for native speakers age 18+, most people (74%) have a vocabulary size between 20,000 and 35,000 (13% below, and 13% above). Of course, this is for the specific subset of people who are Internet users and have taken our test so far.
Our second discovery is much more interesting, a statistic we haven't come across anywhere before. We calculated average vocabulary sizes for native English speakers for ages 15–32, which is the range of ages for which we have at least 100 respondents per year of birth, and discovered there is a remarkably linear progression from 23,303 words (age 15) to 29,330 words (age 32), which works out to an average increase of 355 words per year, or almost exactly one new word a day (0.97 words to be precise).
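The rates quoted above follow directly from the two endpoints; here is a quick check of the arithmetic:

```python
# Reproducing the arithmetic above: average yearly and daily vocabulary
# growth implied by the survey's two endpoint figures.

start_age, end_age = 15, 32
start_vocab, end_vocab = 23303, 29330

per_year = (end_vocab - start_vocab) / (end_age - start_age)
per_day = per_year / 365

print(round(per_year))    # ~355 words per year
print(round(per_day, 2))  # ~0.97 words per day
```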
Now, this increase could be due to some kind of age-education bias among the test-takers, but we ran an analysis of average self-reported verbal SAT score per age as well in the same range, and it hovers around a constant 700 ±15 points, so the increase in vocabulary with age appears to be quite real—at least for people who originally scored quite well on their high school verbal SAT, who appear to be our main respondents so far.
This is actually quite fascinating—the fact that people don't simply stop learning once they're out of school, but that their vocabulary appears to be growing just as much at age 30 as it was at age 16.
And it leads to further questions: how much faster is vocabulary growth below age 15 (as it necessarily must be)? And to what extent, and when, does vocabulary growth start tapering off? More participation in the survey should tell us.
24 November 2010
With just a day before Thanksgiving, we're about to collect our first data. The final vocabulary list has been selected, the website code has been written, and a nice, bright yellow graphical theme was put together last night.
By asking a few friends to spread the link around to friends and family, we plan to build up enough preliminary statistics on how vocabulary levels relate to education and age so that, when we finally go public with the site, we'll already have some meaningful data for users to compare their own vocabulary level with. So, here we go!
“One forgets words as one forgets names. One's vocabulary needs constant fertilizing or it will die.”
— Evelyn Waugh