
Forum Moderators: Ocean10000 & incrediBILL & keyplyr


Fake Google or code.google.com ?

89.164.163.72

     
8:26 am on Nov 22, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 26, 2006
posts:1619
votes: 0


GET /robots.txt -

crawler4j+(http://code.google.com/p/crawler4j/)

89.164.163.72 | HR | Zagreb, Grad Zagreb, Croatia | 45.8, 16 | ISKON INTERNET d.d. za informatiku i telekomunikac | Iskon Internet d.d. | iskon.hr
8:44 pm on Nov 23, 2012 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3148
votes: 4


DSL line in Croatia? Hmm. Probably not G. :)
12:44 am on Nov 24, 2012 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:8958
votes: 409


Not Googlebot, but could very well be a private customer developing something on the Google platform. The URL matches.
12:56 am on Nov 24, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5460
votes: 3


crawler
5:12 am on Nov 24, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 26, 2006
posts:1619
votes: 0


Yeah, but if they're developing something on the Google platform, what exactly does that mean? A crawler for their own benefit?

Can we block [code.google.com...] ?
5:49 am on Nov 24, 2012 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:8958
votes: 409


Crawler4j is an open source Java crawler which provides a simple interface for crawling the Web. You can set up a multi-threaded web crawler in 5 minutes!

So you can download it, and use it to crawl web documents. This does nothing in itself. The data you retrieve still needs to be processed. This tool does not do that for you.

I block many terms found in UAs including: spider, crawler, scrape, download, etc. But there are some beneficial actors that may also include some of these terms, so you need to allow the ones you like.
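
As a rough illustration of the term-based blocking described above, here is a minimal Python sketch. The names `BLOCKED_TERMS`, `ALLOWED_AGENTS`, and `is_blocked` are made up for this example, and the allowed entry is only a placeholder for whichever crawlers you decide to keep. It matches on the user agent string alone, which anyone can spoof, so treat it as a first filter rather than proof of identity.

```python
# Terms taken from the post above; the allow list is a hypothetical example.
BLOCKED_TERMS = ("spider", "crawler", "scrape", "download")
ALLOWED_AGENTS = ("baiduspider",)  # assumption: a crawler you chose to allow


def is_blocked(user_agent: str) -> bool:
    """Return True if the user agent contains a blocked term and is not allowed."""
    ua = user_agent.lower()
    # Agents you like are let through even if they contain a blocked term.
    if any(good in ua for good in ALLOWED_AGENTS):
        return False
    return any(term in ua for term in BLOCKED_TERMS)


if __name__ == "__main__":
    print(is_blocked("crawler4j (http://code.google.com/p/crawler4j/)"))
    print(is_blocked("Mozilla/5.0 (compatible; Baiduspider/2.0)"))
```

The allow check runs first so that a wanted crawler is never caught by a generic term such as "spider".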
9:56 pm on Nov 24, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 26, 2006
posts:1619
votes: 0


Yeah, I didn't like them snooping on our ecommerce site. So many competitors use this type of stuff to grab our pricing and then beat us by a penny that it's not even funny.
10:35 pm on Nov 24, 2012 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14663
votes: 99


Just block "code.google.com" found in any user agent and you'll solve this problem once and for all.

I actually block anything with "http" or "www" in the user agent (applied after the initial whitelist has been processed, of course), which stops just about everything that actually advertises who they are.
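
The ordering matters here: the whitelist is checked first, and only then are the "http" / "www" / "code.google.com" rules applied. A minimal Python sketch of that order follows. `WHITELIST`, `SELF_IDENTIFYING`, and `classify` are hypothetical names, and the whitelist entries are placeholders. A real whitelist would presumably also verify the source IP, since a user agent string on its own can be faked.

```python
# Hypothetical whitelist; in practice this would be your own vetted list.
WHITELIST = ("googlebot", "bingbot")
# Markers from the post: agents that advertise a URL in their user agent.
SELF_IDENTIFYING = ("code.google.com", "http", "www")


def classify(user_agent: str) -> str:
    """Return "allow" or "block" for a user agent string."""
    ua = user_agent.lower()
    # Step 1: whitelisted agents pass, even though they usually carry a URL.
    if any(name in ua for name in WHITELIST):
        return "allow"
    # Step 2: anything else that advertises a URL is blocked.
    if any(marker in ua for marker in SELF_IDENTIFYING):
        return "block"
    # Step 3: no rule matched; let it through to any later checks.
    return "allow"


if __name__ == "__main__":
    print(classify("crawler4j (http://code.google.com/p/crawler4j/)"))
```

Without the whitelist step, the "http" rule would also catch the major search engine bots, which typically include an info URL in their user agent.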
 
