Garlik? A UK outfit

         

tangor

4:39 am on Jun 21, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



New one for me... info:


178.17.32.nn GET /robots.txt "GarlikCrawler/1.1 (http*//garlik*com/, crawler@garik*com)"
(edited with asterisks but the quotes do exist)

Garlik was founded by Mike Harris, founding CEO of Egg plc, former Egg CIO Tom Ilube, and former British Computer Society president Professor Nigel Shadbolt. As the first company to develop a web-scale commercial application of semantic technology, Garlik enables consumers to find and understand what personal information is in the public domain about them and manage how their identities appear online.


The crawler does observe robots.txt, so it might be a non-issue, but do we need another "industry" sucking up our bandwidth dollars?

dstiles

10:12 pm on Jun 21, 2011 (gmt 0)

First seen 4th June.

178.17.32.0/20 is blocked as a server range. 178.17.32.0/24 is treated as a "bot" (because of high-ish activity) with the instruction "kill".
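
For anyone who wants to test log entries against those two ranges, here is a minimal Python sketch using the standard ipaddress module. The function name and the "bot" / "blocked" / "allowed" labels are made up for illustration; they are not taken from any particular firewall or server config.

```python
import ipaddress

# Ranges quoted above, written out in full CIDR form.
BLOCKED_RANGE = ipaddress.ip_network("178.17.32.0/20")  # 178.17.32.0 - 178.17.47.255
BOT_RANGE = ipaddress.ip_network("178.17.32.0/24")      # 178.17.32.0 - 178.17.32.255


def classify_ip(addr: str) -> str:
    """Return a hypothetical label for a client address."""
    ip = ipaddress.ip_address(addr)
    # Check the narrower /24 first, since it sits inside the /20.
    if ip in BOT_RANGE:
        return "bot"
    if ip in BLOCKED_RANGE:
        return "blocked"
    return "allowed"
```

Calling classify_ip() on an address from a raw access log line tells you which of the two rules would catch it.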

They can't spell, either. :)

Pfui

4:46 pm on Jun 22, 2011 (gmt 0)

I've only seen the following string from 178.17.32.0/20 --

GarlikCrawler/1.1 (http://garlik.com/)

-- but by any name Garlik is absolutely block-worthy, imho, because it does not routinely honor robots.txt (thus it only gets to see a full Disallow robots.txt).
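
A rough sketch of the "full Disallow" treatment, in Python, assuming you generate robots.txt dynamically and can see the User-Agent header. The names robots_for and allows are hypothetical, and matching on the substring "garlikcrawler" is my assumption based on the UA strings seen so far. The second helper uses the standard urllib.robotparser module only to confirm what a well-behaved parser would conclude from each body.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt body that disallows everything for every agent.
DISALLOW_ALL = "User-agent: *\nDisallow: /\n"


def robots_for(user_agent: str, normal_body: str) -> str:
    """Pick the robots.txt body to serve (hypothetical helper).

    Assumption: the crawler keeps identifying itself with 'GarlikCrawler'.
    """
    if "garlikcrawler" in user_agent.lower():
        return DISALLOW_ALL
    return normal_body


def allows(body: str, agent: str, path: str) -> bool:
    """Report whether a compliant parser would permit `agent` to fetch `path`."""
    parser = RobotFileParser()
    parser.parse(body.splitlines())
    return parser.can_fetch(agent, path)
```

Of course this only matters for a bot that actually honors what it reads, so it pairs with an IP or UA block rather than replacing one.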

The crawler/company also uses fake, quasi log-spamming, and outright suspicious referrers. For example:

A. They use fake referrers with their robots.txt requests:

URI: /robots.txt
REF: http://somesitenamehere.blogspot.com/robots.txt

URI: /robots.txt
REF: http://www.anothersitenamehere.blogspot.com/robots.txt

B. They call it a crawler, but additional fake referrers are in query format:

URI: /dir/filename.html (...disallowed in robots.txt, ahem)
REF: http://subd.yetanothersitenamehere.com/bsearch/bsearch.do?q=[ASCII omitted]

Last but not least...

C. They've used a URI=REF pattern I've only ever seen with compromised machines/botnets/spambots:

URI: /dir/filename.html
REF: /dir/filename.html
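
The three patterns (A, B, C) can be roughed out as a log check in Python. This is only a sketch: the function name and flag strings are invented here, and pattern B in particular is a weak signal on its own, since real visitors arrive from search result pages all the time. It is only odd coming from something that calls itself a crawler.

```python
from typing import List
from urllib.parse import urlsplit


def referrer_flags(uri: str, referrer: str) -> List[str]:
    """Return hypothetical flag names for a request's URI/referrer pair."""
    flags: List[str] = []
    if not referrer:
        return flags

    # Pattern A: a robots.txt request that carries any referrer at all.
    if uri == "/robots.txt":
        flags.append("referrer-on-robots")

    # Pattern B: the referrer is a query-style URL (has a ?query part).
    if urlsplit(referrer).query:
        flags.append("query-referrer")

    # Pattern C: the referrer is identical to the requested URI.
    if referrer == uri:
        flags.append("uri-equals-referrer")

    return flags
```

Feeding it the URI and REF fields from each log line gives a quick list of which patterns a request trips, if any.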

[ Insert witticism about Garlik having bad breath here:) ]

walkman

4:53 pm on Jun 22, 2011 (gmt 0)



Banned them too.

Pfui

2:52 am on Jul 13, 2011 (gmt 0)

UA change/edit/whatever. OP's reported quote marks gone, but tpyo'd @garik still there...
(insert barely stifled guffaw here)

178.17.32.73
GarlikCrawler/1.1 (http://garlik.com/, crawler@garik.com)

Also gave a fake ref, some blog's privacy policy, for robots.txt and an html page. Go 'way.
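
Since the UA has now shown up with and without quote marks, and with and without the email part, matching on the product token alone seems safer than matching the whole string. A small Python sketch, with a hypothetical function name and a pattern that assumes the "GarlikCrawler/<version>" token stays put:

```python
import re
from typing import Optional

# Match only the product token and version; ignore quotes and the comment part.
UA_PATTERN = re.compile(r"GarlikCrawler/(?P<version>\d+(?:\.\d+)*)", re.IGNORECASE)


def garlik_version(user_agent: str) -> Optional[str]:
    """Return the crawler version if the UA names GarlikCrawler, else None."""
    match = UA_PATTERN.search(user_agent)
    return match.group("version") if match else None
```

All three variants reported in this thread contain the same token, so a check like this should catch them regardless of the surrounding punctuation.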