brandwatch magpiecrawler

lucy24

7:30 pm on Jan 5, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Just out of curiosity...
185.25.32.33 - - [03/Jan/2017:16:28:28 -0800] "GET /robots.txt HTTP/1.1" 200 1420 "-" "robots" 
185.25.32.33 - - [03/Jan/2017:16:28:30 -0800] "GET /ebooks/ HTTP/1.1" 200 10993 "http://example.com/2012/05/page-name/" "magpie-crawler/1.1 (U; Linux amd64; en-GB; +http://www.brandwatch.net)"
(Note the slightly irritating quirk of using a different UA for robots.txt. They appear to be compliant--I wouldn't have let them in otherwise--though it's hard to be sure when they only request individual pages which happen to be crawlable anyway.)
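For anyone who wants to spot this kind of UA switching in their own logs, here is a minimal sketch. It assumes the logs are in the Apache/NCSA "combined" format, as the two lines above appear to be. The names `LOG_RE`, `parse_line` and `agents_by_ip` are hypothetical, made up for this example. It groups user-agent strings by IP address so that an address presenting more than one UA stands out.

```python
import re

# Assumed layout (Apache/NCSA "combined"):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    m = LOG_RE.match(line.strip())
    return m.groupdict() if m else None


def agents_by_ip(lines):
    """Map each client IP to the set of user-agent strings it sent."""
    seen = {}
    for line in lines:
        rec = parse_line(line)
        if rec:
            seen.setdefault(rec["ip"], set()).add(rec["agent"])
    return seen


# The two log lines quoted in the post.
SAMPLE = [
    '185.25.32.33 - - [03/Jan/2017:16:28:28 -0800] "GET /robots.txt HTTP/1.1" '
    '200 1420 "-" "robots"',
    '185.25.32.33 - - [03/Jan/2017:16:28:30 -0800] "GET /ebooks/ HTTP/1.1" '
    '200 10993 "http://example.com/2012/05/page-name/" '
    '"magpie-crawler/1.1 (U; Linux amd64; en-GB; +http://www.brandwatch.net)"',
]

if __name__ == "__main__":
    for ip, agents in agents_by_ip(SAMPLE).items():
        if len(agents) > 1:  # same address, more than one UA
            print(ip, sorted(agents))
```

In practice you would feed it the lines of an access log file instead of `SAMPLE`. Shared proxies and NAT will also show several UAs per address, so treat a hit as a prompt to look closer, not as proof.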

Looking it up, I find only some very old discussions, notably
[webmasterworld.com...]
and in this slightly more recent thread
[webmasterworld.com...]

It seems to be a distributed crawler; I've seen them regularly from 5.102.174.abc, 94.228.34.203 (their favorite), and 185.25.32.abc.
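A rough way to check whether a given visitor falls in those observed ranges, using Python's standard `ipaddress` module. Big assumption: "abc" is read here as "any final octet", so each obscured range is treated as a /24. That is only a guess at what was seen in the logs, not the crawler's actual allocation. `OBSERVED_NETWORKS` and `from_observed_range` are hypothetical names.

```python
import ipaddress

# Assumption: "abc" means the whole last octet varied, hence /24.
# 94.228.34.203 was reported as a single address, hence /32.
OBSERVED_NETWORKS = [
    ipaddress.ip_network(n)
    for n in ("5.102.174.0/24", "94.228.34.203/32", "185.25.32.0/24")
]


def from_observed_range(addr):
    """True if addr lies inside one of the ranges listed above."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in OBSERVED_NETWORKS)


if __name__ == "__main__":
    print(from_observed_range("185.25.32.33"))
```

This only tells you an address matches what was seen before; a distributed crawler can turn up from new ranges at any time.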

What prompted my curiosity is the referers. In the past they have almost always come in from a particular site's RSS feeds, generally requesting some newly added file. The latest ones are from a site I visit, requesting the page linked from that site's Forums--but the exact pages named in the referers aren't ones I've ever visited, let alone posted to. Has it got something to do with advertising on the site? (A final interesting detail, whether relevant or not, is that the referring site is the only site that has ever persuaded me to poke a hole in my adblocker. Literally the only one.*)


* Details available on request. I was impressed, frankly.

keyplyr

8:41 pm on Jan 5, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I block them. They offer me nothing while using my resources to enhance their product.

lucy24

10:29 pm on Jan 5, 2017 (gmt 0)




Even after looking at their website-- which I did before posting-- I'm darned if I can figure out what their product is :(

keyplyr

11:25 pm on Jan 5, 2017 (gmt 0)




Brandwatch watches brands (ahem) by scraping data about those brands from sites & social resources (mentions, clicks, ads served, purchases, etc.), then aggregates that data into products sold to its customers.

The product is a service (API) that displays that information, plus tools.

But you're correct about their vagueness. That seems to be a business model in many niches.

lucy24

5:32 am on Jan 6, 2017 (gmt 0)




Well, if they're poring over my site to see whose ads I carry, they can continue poring.*

::snrk::

* Vague mental association here with P. G. Wodehouse:
“I don’t get your drift.”
“I will continue snowing.”