How to Use Google Ngram More Effectively

An example of a Google Ngram

Language and linguistic studies often need data on how words are used, especially over time. Research takes effort, so tools that hand you the data you need are welcome. The Google Ngram Viewer is a quick way to find word trends throughout the Google Books library.

In this post, we show you how to use Google Ngram more effectively. First, let’s introduce you to the tool.

Introducing Google Ngram

Google maintains a multilingual database of published language. By scanning books en masse, the search giant is able to process the text and provide statistics based on the frequency of words.

With the Google Ngram Viewer search tool, you can search through this data. By comparing the relative popularity of words, you can map how language and culture have changed over time.

However, the Google Ngram tool can do much more than simply report word frequency, as we’ll get onto.

Before we get into advanced “tactics,” let’s run through how to carry out a basic search. From the Google Ngram page, type a keyword into the search box.

Google Ngram's search text.

If you want to include all capitalizations of a word, tick the Case-Insensitive button. A case-insensitive search for “tech” would include both “Tech” and “tech.”

Below the search box, you can also set parameters such as the date range and “smoothing.” The latter value averages each year’s figure with its neighboring years to remove atypical spikes and dips from your data. Lower smoothing values stay closer to the raw yearly data, while higher values reveal only the broader trends.
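To make the smoothing setting more concrete, here is a minimal Python sketch of a centered moving average, which is how the setting is usually described: a smoothing of 1 averages each year with one year on either side. The function name, the shrinking window at the edges, and the sample values are our own assumptions for illustration, not Google's code or data.

```python
def smooth(values, smoothing):
    """Average each value with up to `smoothing` neighbors on each side.

    Assumption: near the start and end of the series the window simply
    shrinks to whatever neighbors exist.
    """
    if smoothing < 0:
        raise ValueError("smoothing must be non-negative")
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - smoothing)
        hi = min(len(values), i + smoothing + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Made-up yearly values with one atypical spike in the middle.
raw = [1.0, 2.0, 9.0, 2.0, 1.0]
print(smooth(raw, 0))  # a smoothing of 0 leaves the raw data untouched
print(smooth(raw, 1))  # the spike is spread across its neighbors
```

With a smoothing of 0 you get the raw figures back, and each step up widens the window and flattens the line a little more.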

How to Select a “Corpus”

The corpus is the text collection that the Ngram Viewer will examine. The default of “English” is acceptable for casual browsing, but its results can skew toward academic writing.

Choosing a corpus in Google Ngram.

“English Fiction” will more closely reflect common language. The standard “English” corpus can be non-fiction heavy with plenty of technical words.

While the deeper meaning behind your choice of corpus is beyond the scope of this piece, Google offers a brief insight into the right choice for you.

Carrying Out Advanced Searches

By using additional search words, you can create complex comparisons. To do this, separate each term with a comma.

Searching for multiple keywords.

The Ngram Viewer will display the relative frequency of your search terms in a single graph. Here, you can hover over the graph’s lines to see precise data points.
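If you run the same comparisons often, you could assemble the search as a link instead of typing it each time. The sketch below joins terms with commas, as described above, and encodes them with Python's standard library. The endpoint and the parameter names (content, year_start, year_end, smoothing, corpus) are assumptions based on what the viewer shows in the browser's address bar, and the default years are placeholders, so check them against a search you have run by hand.

```python
from urllib.parse import urlencode

# Assumed endpoint, copied from the address bar of a manual search.
BASE_URL = "https://books.google.com/ngrams/graph"


def build_ngram_url(terms, year_start=1800, year_end=2019,
                    smoothing=3, corpus=None):
    """Return a viewer link for a comma-separated comparison (hypothetical helper)."""
    params = {
        "content": ",".join(terms),  # commas separate the search terms
        "year_start": year_start,
        "year_end": year_end,
        "smoothing": smoothing,
    }
    if corpus is not None:
        # Corpus identifiers vary between data releases, so none is hard-coded.
        params["corpus"] = corpus
    return f"{BASE_URL}?{urlencode(params)}"


url = build_ngram_url(["telephone", "telegraph", "radio"])
print(url)
```

The helper only builds a link to open in a browser. It does not fetch or parse any data.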

Targeting a single data point.

You can also use an asterisk in your search terms as a wildcard. For example, “Bachelor of *” would return results for many Bachelor’s degrees.

Using wildcards in search terms.

To find all the inflections of a term, append the “_INF” modifier. For example, “book_INF” would return results for “book,” “books,” “booked,” and “booking.”

Finding inflections in Google Ngram.

If a word can act as more than one part of speech, you can be more specific using part-of-speech tags. The valid tags in Google’s database include the following:

  • _NOUN_: noun (book, water, city)
  • _VERB_: verb (run, water, think)
  • _ADJ_: adjective (fast, large, smart)
  • _ADV_: adverb (quickly, later, always)
  • _PRON_: pronoun (their, it, we)
  • _DET_: determiner or article (a, an, the)
  • _ADP_: adposition (prepositions and postpositions)
  • _NUM_: numeral (first, second, fifth)
  • _CONJ_: conjunction (and, nor, but)
  • _PRT_: particle, which is a catchall, rarely-used category for other word functions

Each of these can be combined into phrases. For example, “_ADJ_ boy” would return two-word phrases in which an adjective comes before “boy.”

To pin one search term to a part of speech, append the tag to the word with a single underscore and no trailing underscore, as in “water_VERB.” To include every part of speech for a given word, use the wildcard operator after the underscore, as in “water_*.”
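The three tag patterns above are easy to mix up, so here is a small Python sketch that builds each one as a string. The helper names are hypothetical, and the code only assembles search text to paste into the viewer. It doesn't contact Google.

```python
# Part-of-speech tags accepted by the viewer.
POS_TAGS = {"NOUN", "VERB", "ADJ", "ADV", "PRON",
            "DET", "ADP", "NUM", "CONJ", "PRT"}


def _check(pos):
    """Normalize a tag name and reject anything outside POS_TAGS."""
    pos = pos.upper()
    if pos not in POS_TAGS:
        raise ValueError(f"unknown part-of-speech tag: {pos}")
    return pos


def tag_word(word, pos):
    """Pin a word to one part of speech: word, underscore, tag, no trailing underscore."""
    return f"{word}_{_check(pos)}"


def pos_placeholder(pos):
    """Stand-alone tag that matches any word of that type: underscores on both sides."""
    return f"_{_check(pos)}_"


def all_tags(word):
    """Ask for every part of speech of a word with the wildcard."""
    return f"{word}_*"


print(tag_word("water", "verb"))        # water_VERB
print(pos_placeholder("adj") + " boy")  # _ADJ_ boy
print(all_tags("water"))                # water_*
```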

Functional Variables, Compositions, and Dependencies

Using functional variables in Google Ngram.

Functional variables let you search by the function or placement of words.

  • _ROOT_ is a placeholder for the root of the sentence’s parse tree. It doesn’t stand for a particular word or position. The word attached to it is typically the sentence’s main verb.
  • _START_ indicates the beginning of a sentence. (“_START_ President Obama” returns only sentences that start with the phrase “President Obama.”)
  • _END_ indicates the end of a sentence. (“_ADP_ _END_” returns sentences that end in prepositions.)

By combining search terms with arithmetic operators, you can perform simple mathematical analysis with values for term frequency:

  • + adds multiple expressions into one search term
  • - subtracts the expression on the right from the expression on the left, providing a quick way to compare the relative use of two search terms.
  • / divides the expression on the left by the expression on the right
  • * multiplies the expression on the left by the number on the right, which makes it easier to compare ngrams of widely varied frequency. Make sure to enclose the whole ngram in parentheses to avoid having the asterisk parsed as a wildcard character.
  • : (a colon) searches for the ngram on the left within the corpus on the right.
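To picture what the arithmetic operators do, it can help to treat each search term as a list of yearly frequencies and apply the operation year by year. The Python sketch below does that for +, -, / and *. The function names and the numbers are made up for illustration and are not real Ngram data. The viewer performs these calculations for you.

```python
def add(left, right):
    """+ : sum two series year by year, merging them into one line."""
    return [a + b for a, b in zip(left, right)]


def subtract(left, right):
    """- : take the right series away from the left one."""
    return [a - b for a, b in zip(left, right)]


def divide(left, right):
    """/ : ratio of left to right; NaN where the right value is zero."""
    return [a / b if b else float("nan") for a, b in zip(left, right)]


def scale(series, factor):
    """* : multiply a series by a number so a rare term is visible next to a common one."""
    return [a * factor for a in series]


# Hypothetical yearly frequencies for two terms over three years.
common = [0.5, 0.25, 0.75]
rare = [0.25, 0.125, 0.25]

print(add(common, rare))       # combined usage of both terms
print(subtract(common, rare))  # how far ahead the common term is
print(divide(common, rare))    # relative popularity, year by year
print(scale(rare, 4))          # the rare term brought up to a similar scale
```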

Finally, you can set dependencies with “=>” to search linguistic relationships.

Using dependencies in Google Ngram.

For example, “car=>fast” would return results where “fast” was grammatically dependent on, or modifying, the word “car.” Dependencies can be combined with other advanced search operations, such as wildcards and part-of-speech tags.

Conclusion

Searching for word trends has many academic applications. A quick way to find the information you need is Google’s Ngram tool. The good news is that it does more than basic searches: you can apply powerful modifiers to home in on the information you need.

None of Google Ngram’s functionality would be possible without the search engine’s advanced grunt under the hood. Are you impressed by what the Google Ngram tool can do? Let us know in the comments section below!

Tom Rankin
