The English language by the numbers
The first surprise, says Pinker, is that books contain “a huge amount of lexical dark matter.” Even after excluding proper nouns, more than 50% of the words in the n-gram database do not appear in any published dictionary. Widely used words such as “deletable” and obscure ones like “slenthem” (a type of musical instrument) have slipped under the radar of standard references. By the research team’s estimate, the English vocabulary has nearly doubled in size over the past century, to more than 1 million words. And it seems to be growing faster now than ever before.
First published May 10, 2011