Dec 17 2010

Google Books Ngram Viewer

This is my new favorite internet toy – the Google Books Ngram Viewer. You may already know that Google Books is a project to digitize as many books as possible. Many of the recent books are still under copyright, so they cannot be made available online for free (authors and publishers have to eat too). But the words can be made available. The Ngram Viewer is a great example of the power of computers and the internet to facilitate research and human knowledge.

Google currently has 5,195,769 books digitized – this is a massive storehouse of knowledge about 400 years of human culture. The Ngram Viewer lets you search on words, and it plots a graph of how often those words appear, year by year, in the books it has digitized. This allows you to see trends over time. The obvious use is tracking word usage, which is a boon to etymologists, but those words also have meanings, and those can be tracked too. There are multiple variables involved, of course, but this is still a powerful window into the reflection of human culture in the written word.
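For the curious, the counting behind such a graph can be sketched in a few lines of Python. This is only my rough illustration, not Google's actual method: the function name `yearly_frequency`, the toy corpus, and the choice to divide by the total number of words in each year are all my own assumptions.

```python
from collections import defaultdict


def yearly_frequency(corpus, term, case_sensitive=True):
    """Return {year: share of that year's words that match `term`}.

    `corpus` is an iterable of (year, text) pairs. Words are split on
    whitespace only, which is far cruder than real tokenization.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    if not case_sensitive:
        term = term.lower()
    for year, text in corpus:
        words = text.split()
        if not case_sensitive:
            words = [w.lower() for w in words]
        totals[year] += len(words)
        hits[year] += sum(1 for w in words if w == term)
    # Skip years with no words at all to avoid dividing by zero.
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}


# Made-up miniature "library", purely for illustration.
toy_corpus = [
    (1850, "faith and reason and faith"),
    (1950, "science and faith and science and science"),
]

faith_by_year = yearly_frequency(toy_corpus, "faith")
science_by_year = yearly_frequency(toy_corpus, "science")
```

Plotting `faith_by_year` against `science_by_year` would give a (very small) version of the kind of graph the viewer draws. The `case_sensitive` switch is there because a case-sensitive search treats a capitalized and a lowercase spelling as two different words.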

This is already the source of a great deal of research – which goes beyond simply searching on words to comparing multiple searches and looking for trends. For example, one researcher compared the use of names of famous people to track their fame. He found that over time people tend to become famous more quickly and at a younger age, and their fame fades faster as well.

I decided to do a little quick research myself. Here’s what I found:

First – how popular is skepticism in the last two hundred years?

It seems that people are becoming more skeptical over time, with a huge uptick since 1900. Skeptics emerge in 1820 and then remain fairly steady. Meanwhile, references to the paranormal and pseudoscience make their appearance around 1940 and steadily increase, but are still dwarfed by references to being skeptical. What does this actually mean? That’s the hard part – who knows. That would take much more triangulation and investigation. But these raw numbers are still interesting.

How have our favorite pseudosciences fared over the last 200 years? Let’s see:

Not surprisingly, references to UFOs appear around 1950 and steadily increase to the present time. I would have thought there would be more ups and downs, but it appears publishing books about UFOs continues to be an increasing market. Around the same time Bigfoot comes into popular culture. Interest in Bigfoot is much lower than interest in UFOs, and seems to have leveled off.

ESP has a blip around 1900 but interest does not take off until around 1930. Interestingly, the popularity of ESP seems to peak in the late 1970s and has dropped off considerably since then.

Next up – evolution vs creationism:

Not surprisingly, reference to “evolution” takes off after the publication by Charles Darwin of On the Origin of Species in 1859. Writing about evolution dipped perceptibly after the Scopes “Monkey trial” in 1925, but then recovered in the 1940s. Today it continues to increase steadily. By contrast, references to “Creation” have been flat, and have been losing to “evolution” since 1859. The search is case-sensitive. I also searched on “creation” with a small “c”, but obviously this word can refer to much more than the origin of life, so the result is not really meaningful. I also searched on “creationism” and “intelligent design”, and these terms barely register in the last 30 years.

Finally, I ran the ultimate cultural death match – science vs faith.

Interest in science has slowly but steadily increased over the last two hundred years. Faith, on the other hand, has plummeted since around 1845. I don’t have the background to put that into any scholarly context – but it is interesting. The streams cross around 1925 (coincidentally, also the date of the Scopes trial – not sure of the significance), and science has been beating faith for the last 85 years – go science!
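Rather than eyeballing where two lines cross, you could find the crossover year with a few lines of Python. This is just a sketch under my own assumptions: the function name `first_crossover` is hypothetical, and the numbers below are invented for illustration, not real Ngram data.

```python
def first_crossover(series_a, series_b):
    """Return the first year in which series_a rises above series_b
    after having been at or below it in the previous shared year.
    Returns None if that never happens."""
    years = sorted(set(series_a) & set(series_b))
    for prev, cur in zip(years, years[1:]):
        was_behind = series_a[prev] <= series_b[prev]
        now_ahead = series_a[cur] > series_b[cur]
        if was_behind and now_ahead:
            return cur
    return None


# Invented values purely for illustration, NOT real Ngram data.
rising = {1900: 1.0, 1910: 2.0, 1920: 3.0, 1930: 4.0}
falling = {1900: 4.0, 1910: 3.5, 1920: 3.0, 1930: 2.5}

crossover_year = first_crossover(rising, falling)
```

With real data the two series would first need to be sampled on the same years, and probably smoothed, since a single noisy year can produce a spurious crossing.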

Obviously my quickie searches are of limited value as actual research, because there is no context or statistical analysis. But I did this in about 5 minutes, just to show the power (and fun) of this new tool. In the hands of actual researchers who have hours to spend on thoughtful analysis, imagine what this research tool can be used for.

This new tool represents the power of the digital information age – increasingly the drudgery of research, and its cost in time and other resources, is being lowered, freeing individuals to use their pure creativity. Anyone can use this site to do research – all you need is an idea.
