Princetonbot

I found this in logs while looking for something else. It was active a little over a year ago, from 29 January through 18 March 2016, disappearing as quickly as it had appeared. During that time it only picked up images--a variety of them, on two different sites, but it had one particular favorite directory that it visited especially often.

IP: 128.112.155.170-173 (128.112 is Princeton, and what do you bet 128.112.155 is the Computer Science department? The 170-173 is odd, since four addresses would normally be a /30 block and this range doesn't line up with one, but four consecutive addresses are too many to be coincidental.)
Requests: assorted image files
Referer: as if human (that is, whatever page the image belongs to--but they never got the page itself)
UA:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/600.1.4 (KHTML, like Gecko) Safari/600.1.4 (compatible; Princetonbot/1.0; +http://http://tigress-web.princeton.edu/~fy/bot.html)
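For anyone who wants to check the IP range above against their own logs, here is a minimal sketch using Python's standard ipaddress module. The two endpoint addresses come from my logs; the function names are my own invention for illustration.

```python
import ipaddress

# Endpoints of the range seen in the logs.
FIRST = ipaddress.IPv4Address("128.112.155.170")
LAST = ipaddress.IPv4Address("128.112.155.173")


def covering_blocks(first=FIRST, last=LAST):
    """Return the smallest list of CIDR blocks that exactly covers first..last."""
    return list(ipaddress.summarize_address_range(first, last))


def in_observed_range(ip):
    """True if the dotted-quad string ip falls inside the observed range."""
    return FIRST <= ipaddress.IPv4Address(ip) <= LAST


if __name__ == "__main__":
    for block in covering_blocks():
        print(block)
```

Run as a script, this prints two /31 blocks (128.112.155.170/31 and 128.112.155.172/31) rather than one tidy block, which is why the range looks odd. For blocking or flagging, in_observed_range("128.112.155.171") is the simpler test.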

(Can you guess what I was searching for that led to this accidental find?)

Further pawing through logs reveals that for a couple weeks earlier in January 2016 they used our old friend Chrome 34 for similar requests:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36
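Since the January requests carried a plain browser UA, searching for the bot's name won't find them. The behavior is a better tell: images requested with a believable referer, by a client that never fetched the referring page. Here is a rough sketch of that check. It assumes Apache-style combined log format; the regex, the extension list and the function name are all my own assumptions, so adjust to taste.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Assumed layout: combined log format
# ip ident user [time] "METHOD path PROTO" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" \d{3} \S+ '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)
IMAGE_EXT = (".jpg", ".jpeg", ".png", ".gif")  # assumption: extend as needed


def image_only_clients(lines):
    """Return IPs that sent image requests with a referer but never
    requested any of the referring pages themselves."""
    fetched = defaultdict(set)   # ip -> paths this client requested
    referred = defaultdict(set)  # ip -> referer paths sent with image requests
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that don't fit the assumed format
        ip = m["ip"]
        path = urlsplit(m["path"]).path  # drop any query string
        fetched[ip].add(path)
        referer = m["referer"]
        if path.lower().endswith(IMAGE_EXT) and referer not in ("", "-"):
            referred[ip].add(urlsplit(referer).path)
    # Flag clients where none of the claimed referring pages was ever fetched.
    return {ip for ip, pages in referred.items() if not (pages & fetched[ip])}
```

This only compares paths, so a referer pointing at some other site's page will also get flagged; it's a starting point for eyeballing, not a verdict. It also holds everything in memory, which is fine for a day's log and less fine for a year's.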

Although I haven't seen them in over a year, the page in the UA is still there, telling me--no surprises here--that

Princetonbot is an image crawler from Princeton University. It collects Internet images for an on-going research project to further our understanding in big data and deep learning.

We obtained the list of image URLs from popular image search engines, so we are not going to crawl web pages and we don’t download images with prohibited access from search engines. Also, we randomize our image access to a particular website to avoid peak traffic to the remote image server.

There's also an “Opt-Out” paragraph, but they don’t seem to have heard of robots.txt. Apparently they think that if Googlebot-Image is allowed to crawl it, then so are they.
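For the record, here is what a robots.txt rule for them would look like, and how a well-behaved crawler would read it. This is only a sketch using Python's standard urllib.robotparser. The "Princetonbot" token is my assumption based on the UA string, and nothing here says the bot actually reads robots.txt; the evidence suggests it doesn't.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out Princetonbot, leave everyone else alone.
ROBOTS_TXT = """\
User-agent: Princetonbot
Disallow: /

User-agent: *
Disallow:
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url.

    Pass the short product token (e.g. "Princetonbot"), not the full
    Mozilla/5.0 string: the parser only looks at the text before the
    first slash.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://example.com/pics/cat.jpg"  # placeholder URL
    print(allowed("Princetonbot", url))
    print(allowed("Googlebot-Image", url))
```

With the rules above, Princetonbot is refused and Googlebot-Image is let through, which is exactly the distinction they seem to think doesn't exist. If the bot ignores robots.txt, the fallback is blocking by IP or UA at the server.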

:: insert nasty crack about The Princeton Personality here ::

lucy24

keyplyr
