ergophobe - 5:37 pm on Dec 17, 2010 (gmt 0)
Interesting tool. Some random thoughts.
I've used the vaguely similar ARTFL project tools for 15 years or so to get the equivalent in French.
The ARTFL text database is tiny in comparison - just a few thousand texts. However, they are verified and based on good editions. So searches tend to be more biased toward high culture than those in the ngram project.
For cultural studies, I generally find the ARTFL tool more interesting for many reasons:
- regular expression searches
- small, medium and broad context (line, paragraph, page; the whole work for out-of-copyright works)
- full attribution (author, work, date)
- filtering (by author, date)
I guess the ngram tool ultimately has "context" too, if you take an ngram and search for it in Google Books.
I spend hours per day in Google Books and doing text-based searches is a real art. As books get older, especially yellowing books from the 1850-1950 period (the worst for that problem) and with older fonts, the OCR tends to be poor. As you go even further back and spellings get less consistent, the OCR tends to mangle words more often.
So it makes me think the data gets worse as you go back in time. Words that commonly had a lot of kerning, or letters that were easily confounded, would, I think, show a certain evolution in the graphs even without any change in usage, just because of changes in printing techniques.
Nevertheless, I do find it interesting to be able to do searches like this:
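To put rough numbers on that hunch, here is a minimal Python sketch. It assumes OCR errors hit each character independently, which is a simplification, and the accuracy figures and function names are made up for illustration, not taken from Google's data or tooling. It holds real usage constant and shows how the counted frequency would still move with scan quality.

```python
def word_survival(word: str, char_accuracy: float) -> float:
    """Chance that every character of `word` is read correctly,
    assuming independent per-character errors (a simplification)."""
    return char_accuracy ** len(word)


def apparent_frequency(true_freq: float, word: str, char_accuracy: float) -> float:
    """Frequency an ngram count would report if misread copies are simply lost."""
    return true_freq * word_survival(word, char_accuracy)


# Hypothetical per-character accuracies, invented for illustration only.
accuracy_by_scan_quality = {"poor scan": 0.95, "fair scan": 0.98, "clean scan": 0.999}

TRUE_FREQ = 1e-5  # held constant: real usage never changes in this toy example

for label, acc in accuracy_by_scan_quality.items():
    print(f"{label}: {apparent_frequency(TRUE_FREQ, 'liberty', acc):.2e}")
```

Under this toy model longer words lose more, since every extra character is another chance for a misread, so two words with identical real usage could still trace different curves.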
[ngrams.googlelabs.com...] (be nicer, lose weight)
[ngrams.googlelabs.com...] (liberty, safety)
[ngrams.googlelabs.com...] (f**k - very interesting and something I've seen a lot as a historian - people were more prudish between 1850 and 1950 than they were before or since)
For those of us who are colorblind, you can't really do searches with more than three terms.