brandwatch magpiecrawler

Just out of curiosity...

185.25.32.33 - - [03/Jan/2017:16:28:28 -0800] "GET /robots.txt HTTP/1.1" 200 1420 "-" "robots" 
185.25.32.33 - - [03/Jan/2017:16:28:30 -0800] "GET /ebooks/ HTTP/1.1" 200 10993 "http://example.com/2012/05/page-name/" "magpie-crawler/1.1 (U; Linux amd64; en-GB; +http://www.brandwatch.net)"

(Note the slightly irritating quirk of using a different UA for robots.txt. They appear to be compliant--I wouldn't have let them in otherwise--though it's hard to be sure when they only request individual pages which happen to be crawlable anyway.)
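For anyone who wants to check the same quirk in their own logs, here is a rough Python sketch. It assumes the standard Apache combined log format, as in the two lines above. The function names (`agents_by_ip`, `mismatched`) are made up for illustration and are not from any tool. It lists the IPs whose robots.txt requests used a user agent that never appears on their other requests.

```python
import re
from collections import defaultdict

# Assumed layout: Apache combined log format
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

def agents_by_ip(lines):
    """Map each IP to the user agents it sent for robots.txt and for other paths."""
    seen = defaultdict(lambda: {"robots": set(), "pages": set()})
    for line in lines:
        m = LOG_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not fit the assumed format
        kind = "robots" if m["path"] == "/robots.txt" else "pages"
        seen[m["ip"]][kind].add(m["ua"])
    return seen

def mismatched(seen):
    """Return IPs whose robots.txt UA never appears among their page-request UAs."""
    return sorted(
        ip for ip, ua in seen.items()
        if ua["robots"] and ua["pages"] and not (ua["robots"] & ua["pages"])
    )

# The two log lines quoted in this post
sample = [
    '185.25.32.33 - - [03/Jan/2017:16:28:28 -0800] "GET /robots.txt HTTP/1.1" 200 1420 "-" "robots"',
    '185.25.32.33 - - [03/Jan/2017:16:28:30 -0800] "GET /ebooks/ HTTP/1.1" 200 10993 '
    '"http://example.com/2012/05/page-name/" '
    '"magpie-crawler/1.1 (U; Linux amd64; en-GB; +http://www.brandwatch.net)"',
]
seen = agents_by_ip(sample)
print(mismatched(seen))
```

A mismatch by itself proves nothing about compliance; it only shows which visitors would need a robots.txt rule written against a different name than the one they crawl under.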

Looking it up, I find only some very old discussions, notably
[webmasterworld.com...]
and this slightly more recent thread
[webmasterworld.com...]

It seems to be a distributed crawler; I've seen them regularly from 5.102.174.abc, 94.228.34.203 (their favorite), and 185.25.32.abc.
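If you want to tag these visits in a log script, something like the following would do. It is only a sketch: I am reading "abc" as "any final octet" and writing those two ranges as /24 blocks, which is my assumption based on what I have seen, not a published list from the crawler's operator. The name `in_observed_ranges` is likewise just illustrative.

```python
import ipaddress

# Ranges observed in my own logs; the /24 widths are an assumption
OBSERVED = [
    ipaddress.ip_network(net)
    for net in ("5.102.174.0/24", "94.228.34.203/32", "185.25.32.0/24")
]

def in_observed_ranges(ip):
    """True if the address falls inside one of the ranges listed above."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in OBSERVED)

print(in_observed_ranges("185.25.32.33"))
print(in_observed_ranges("94.228.34.204"))
```

This uses only the standard-library `ipaddress` module, so the list is easy to widen or narrow as more addresses turn up.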

What prompted my curiosity is the referers. In the past they have almost always come in from a particular site's RSS feeds, generally requesting some newly added file. The latest ones are from a site I visit, requesting the page linked from that site's Forums--but the exact pages named in the referers aren't ones I've ever visited, let alone posted to. Has it got something to do with advertising on the site? (A final interesting detail, whether relevant or not, is that the referring site is the only site that has ever persuaded me to poke a hole in my adblocker. Literally the only one.*)

* Details available on request. I was impressed, frankly.

