Google’s Ngram Viewer 2.0 – a new bag of tricks

Posted on October 30, 2012

Early last year I wrote about Google’s Ngram Viewer, a tool based on its books corpus that allows you to graph the use of words and phrases over time. For example, you can see at a glance how references to Plato and Aristotle compare over the last few centuries. (I get the impression they’re often mentioned together.) After its launch the Ngram Viewer quickly became popular with word history researchers and also with casual users.

Now Google has relaunched the Ngram Viewer in a more powerful and versatile form. It has improved the datasets and publisher metadata and added many more books to the corpus, so the results are more accurate and comprehensive than before. The interface remains much the same – you can modify searches by timeframe, degree of detail, and corpus type, including several different languages – but it comes with a whole new bag of tricks.

A significant innovation is the ability to search by part of speech. Say you want to look for a word as a verb, but it also functions as a noun. Just append “_VERB” to your search term – the capital letters are essential – and the Ngram Viewer filters accordingly. We can see, for instance, that experience as a verb has more than quadrupled in popularity since 1900 whereas the noun has risen more gradually. (Moving the cursor along the graph opens a temporary window with numerical data.)
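If you build these searches often, the tagging step is easy to script. Here is a rough Python sketch; the helper name tag_term is my own invention, and it only assembles the text you would paste into the search box rather than contacting Google in any way. Only the “_VERB” tag is taken from the example above; “_NOUN” is assumed by analogy, so check it against Google's list of tags before relying on it.

```python
def tag_term(word: str, pos: str) -> str:
    """Return the word with an upper-case part-of-speech suffix.

    Hypothetical helper: it builds a search string only and does not
    query the Ngram Viewer.
    """
    word = word.strip()
    pos = pos.strip().lstrip("_")
    if not word or not pos:
        raise ValueError("both a word and a part-of-speech tag are required")
    # The post notes that the capital letters are essential, so force them.
    return f"{word}_{pos.upper()}"


# Separate terms with commas to plot them on the same graph.
# "noun" is an assumed tag name here, not one quoted in the post.
query = ", ".join([tag_term("experience", "verb"), tag_term("experience", "noun")])
print(query)
```

Forcing the suffix to upper case in one place means a lower-case slip such as “_verb” never reaches the search box.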

Notice that I modified the verb experience with “* 10” to make the two curves more directly comparable. Ngram counts can now be added, subtracted, multiplied and divided to various ends. In other words, you can treat phrases “like components of a mathematical expression” to generate what Google calls Ngram compositions.
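The same idea extends to compositions. The sketch below, again in Python, wraps two operands and an operator into a single expression. The function name compose is hypothetical, and the surrounding parentheses are my assumption for keeping each expression grouped; the post itself shows only the bare “* 10”.

```python
ALLOWED_OPERATORS = {"+", "-", "*", "/"}


def compose(left: str, operator: str, right) -> str:
    """Join two operands with an arithmetic operator, wrapped in parentheses.

    Hypothetical helper that only builds the query text; the parentheses
    are an assumption, used to keep each expression unambiguous.
    """
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    return f"({left} {operator} {right})"


# Scale the verb by 10 so its curve is comparable with the noun's.
scaled_verb = compose("experience_VERB", "*", 10)
query = ", ".join([scaled_verb, "experience_NOUN"])
print(query)
```

Because compose returns a plain string, its result can be passed back in as an operand to build nested expressions.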

As well as verbs and nouns you can also search for adjectives, adverbs, pronouns, determiners, prepositions and more, using the tags listed on this helpful page of tips. Google estimates the accuracy of this tagging at 95%. Ben Zimmer, writing in The Atlantic, says “this kind of grammatical annotation greatly enhances the utility of the corpus for language researchers” – be they professional or casual.

Another development attracting immediate enthusiasm from commentators is the facility to compare British English directly with American English. You can do this by adding “:eng_us_2012” and “:eng_gb_2012” to your terms. In this plot, for example, we can compare color vs. colour on both sides of the Atlantic. Notice in U.S. texts the significant rise of color and corresponding dip in colour in the late 1820s and again in the 1840s – a result, at least in part, I imagine, of the publication of the first two editions of Noah Webster’s American Dictionary of the English Language. What else can we find?
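For anyone who wants to reproduce the color vs. colour comparison, here is a small Python sketch that generates the four search terms. The function name with_corpus is hypothetical and the code only prints the search string; the two corpus identifiers are the ones quoted above.

```python
from itertools import product

CORPORA = ["eng_us_2012", "eng_gb_2012"]
SPELLINGS = ["color", "colour"]


def with_corpus(term: str, corpus: str) -> str:
    """Attach a corpus identifier to a term, e.g. color:eng_us_2012.

    Hypothetical helper that only builds the query text.
    """
    return f"{term}:{corpus}"


# Pair every corpus with every spelling: US terms first, then British.
terms = [with_corpus(word, corpus) for corpus, word in product(CORPORA, SPELLINGS)]
query = ", ".join(terms)
print(query)
```

Swapping other pairs into SPELLINGS gives the same four-way comparison for any British and American variants.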

Comments (6)
  • Nice summary, Stan, and I love the color/colour Ngram: a very graphic illustration of the power of a dictionary to influence language, the likes of which we will never see again (and I agree with you that Mr. Webster’s work was the cause of it).

    Posted by Orin Hargraves on 30th October, 2012
  • Once again, those who no longer value physical face-to-face interaction with other humans, but prefer to text. An interesting territory, where, lazy spelling has led to lazy speech. What need of meaningful thought given in well spoken conversation? Just saying.

    Posted by Zena Putnam on 31st October, 2012
  • Thanks, Orin. The effect Webster’s dictionary had is striking – and, as you say, not to be repeated, at least as far as English is concerned.

    Zena: I’m afraid I don’t know what you mean. There’s no reference to texting or suggestion of “lazy spelling” in the article. Could you clarify, or give examples?

    Posted by Stan on 31st October, 2012
  • [...] Dictionary Blog, Orin Hargraves delved into miscreant word behavior. Stan Carey updated us on Google’s Ngram Viewer 2.0, and on his own blog, explored would of, could of, might of, must of and ancient Irish names. Kory [...]

    Posted by This Week’s Language Blog Roundup: Superstorm, Romensia, and more | Wordnik on 2nd November, 2012
  • Hi. I love the new features, especially being able to compare British and American corpora. But we seem to have lost the ability to right click and save the graph as an image, which made it really easy to use on blogs, etc. Or am I missing something? I haven’t been able to find any information about this anywhere. Do you happen to know if there is any way round this (other than screen-printing and cropping)?

    Posted by Warsaw Will on 20th November, 2012
  • Hi, Will. I’ve always used a separate screengrab program for this (one that doesn’t require cropping), so I didn’t know the right-click-and-save option had been removed. Maybe a reader can help us out on this.

    Posted by Stan on 21st November, 2012