Early last year I wrote about Google’s Ngram Viewer, a tool based on its books corpus that allows you to graph the use of words and phrases over time. For example, you can see at a glance how references to Plato and Aristotle compare over the last few centuries. (I get the impression they’re often mentioned together.) After its launch the Ngram Viewer quickly became popular with word history researchers and also with casual users.
Now Google has relaunched the Ngram Viewer in a more powerful and versatile form. It has improved the datasets and publisher metadata and added many more books to the corpus, so the results are more accurate and comprehensive than before. The interface remains much the same – you can modify searches by timeframe, degree of detail, and corpus type, including several different languages – but it comes with a whole new bag of tricks.
A significant innovation is the ability to search by part of speech. Say you want to look for a word as a verb, but it also functions as a noun. Just append “_VERB” to your search term – the capital letters are essential – and the Ngram Viewer filters accordingly. We can see, for instance, that experience as a verb has more than quadrupled in popularity since 1900 whereas the noun has risen more gradually. (Moving the cursor along the graph opens a temporary window with numerical data.)
Notice that I modified the verb experience with “* 10” to make the two curves more directly comparable. Ngram counts can now be added, subtracted, multiplied and divided to various ends. In other words, you can treat phrases “like components of a mathematical expression” to generate what Google calls Ngram compositions.
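Since these queries are plain text, the tagging and arithmetic described above are easy to assemble programmatically. Here is a minimal sketch in Python. The helper names `tag` and `scale` are my own, hypothetical and not part of any Google tool, and I'm assuming that a composition is wrapped in parentheses and that separate curves are separated by commas.

```python
def tag(word: str, pos: str) -> str:
    """Append a part-of-speech tag, e.g. experience -> experience_VERB."""
    # The tag must be in capital letters, so normalise the caller's input.
    return f"{word}_{pos.upper()}"


def scale(term: str, factor: int) -> str:
    """Multiply a term's counts by a constant (a simple Ngram composition)."""
    # Assumption: parentheses group the composition into a single curve.
    return f"({term} * {factor})"


# Build the two-curve comparison discussed above: verb (scaled) vs. noun.
query = ", ".join([scale(tag("experience", "verb"), 10),
                   tag("experience", "noun")])
print(query)  # (experience_VERB * 10), experience_NOUN
```

The resulting string is what you would type into the search box; the sketch only builds text and does not contact Google.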
As well as verbs and nouns, you can search for adjectives, adverbs, pronouns, determiners, prepositions and more, using the tags listed on this helpful page of tips. Google estimates the accuracy of this tagging at 95%. Ben Zimmer, writing in The Atlantic, says “this kind of grammatical annotation greatly enhances the utility of the corpus for language researchers” – be they professional or casual.
Another development attracting immediate enthusiasm from commentators is the facility to compare British English directly with American English. You can do this by adding “:eng_us_2012” and “:eng_gb_2012” to your terms. In this plot, for example, we can compare color vs. colour on both sides of the Atlantic. Notice in U.S. texts the significant rise of color and corresponding dip in colour in the late 1820s and again in the 1840s – a result, at least in part, I imagine, of the publication of the first two editions of Noah Webster’s American Dictionary of the English Language. What else can we find?
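For anyone who wants to try other spelling pairs, the corpus suffixes can be attached in the same mechanical way. This is a rough sketch in Python; the function name `in_corpus` is hypothetical, and the only things taken from the Ngram Viewer itself are the two corpus labels quoted above.

```python
# Corpus labels as quoted in the post.
US = "eng_us_2012"
GB = "eng_gb_2012"


def in_corpus(word: str, corpus: str) -> str:
    """Restrict a term to one corpus, e.g. color -> color:eng_us_2012."""
    return f"{word}:{corpus}"


# Each spelling in each corpus becomes its own curve: four curves in total.
terms = [in_corpus(word, corpus)
         for corpus in (US, GB)
         for word in ("color", "colour")]
query = ",".join(terms)
print(query)
```

Swapping in another pair, such as center and centre, only requires changing the tuple of words.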