
Google SEO News and Discussion Forum

    
Strange hits from Google's IP range trip my anti-scraper
bcc1234

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3408263 posted 5:21 am on Jul 30, 2007 (gmt 0)

I'm getting hits from 72.14.192.0/18, which seems to belong to Google. But what's weird is that reverse DNS doesn't respond with the usual PTR record "*.googlebot.com."

On top of that, the request headers sent by the client usually have an X-FORWARDED-FOR header with some Comcast IP.

The clients from that range don't break the robots.txt restrictions but do hit hidden links on occasion.

Because reverse DNS is not set up the way it is for the usual Googlebot, such hits trip the anti-scraping protection.

Is there a way for Google to either confirm or deny that it is their range?

I wouldn't mind adding it to the whitelist, but I would like to make sure those are real Google-related hits.
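
For readers who want to automate the check described in this post, here is a minimal Python sketch. It tests whether an address falls in 72.14.192.0/18 and whether it passes a forward-confirmed reverse DNS lookup ending in ".googlebot.com". The function names and the injectable `reverse`/`forward` resolver parameters are made up for illustration, and the range itself is simply the one quoted above, not a confirmed list of Google addresses.

```python
import ipaddress
import socket

# Range quoted in the post; ownership of the whole block is an assumption.
SUSPECT_RANGE = ipaddress.ip_network("72.14.192.0/18")


def in_suspect_range(ip):
    """Return True if ip falls inside 72.14.192.0/18."""
    return ipaddress.ip_address(ip) in SUSPECT_RANGE


def is_verified_googlebot(ip, reverse=socket.gethostbyaddr,
                          forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check.

    The PTR name must end in .googlebot.com, and that name must
    resolve back to the same IP. The resolvers are parameters so the
    logic can be exercised without touching the network.
    """
    try:
        host = reverse(ip)[0].rstrip(".").lower()
    except OSError:
        return False  # no PTR record, or the lookup failed
    if not host.endswith(".googlebot.com"):
        return False
    try:
        addresses = forward(host)[2]
    except OSError:
        return False
    return ip in addresses
```

A hit that is in the range but fails the Googlebot check matches what is described here: Google-side address space, but not the crawler.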

 

Bones

10+ Year Member



 
Msg#: 3408263 posted 10:51 am on Jul 30, 2007 (gmt 0)

You might want to do a search for Google Web Accelerator.

theBear

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3408263 posted 12:39 pm on Jul 30, 2007 (gmt 0)

Is the agent really Googlebot?

That is a Google IP, and the Web Accelerator would cause a prefetch of hidden links. Because of that, it basically acts like a bot, and thus your defense system jabbered at you.

The prefetch can be turned off. It takes just a few lines in .htaccess; you should be able to find them if you do a search.
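
The .htaccess lines themselves are not quoted in this thread. As a rough illustration of the idea, here is a Python sketch that assumes prefetch requests are marked with an "X-moz: prefetch" request header, which is how the accelerator's prefetching was generally described at the time; treat that header name as an assumption and check it against your own logs. The function names are hypothetical.

```python
def is_prefetch(headers):
    """Return True if the request carries an 'X-moz: prefetch' header.

    Header names are case-insensitive, so they are lowercased
    before the lookup.
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    return normalized.get("x-moz", "").strip().lower() == "prefetch"


def response_status(headers):
    """Refuse prefetch requests with 403 and serve everything else."""
    return 403 if is_prefetch(headers) else 200
```

Refusing only the prefetch means a real visitor who later clicks the link still gets the page, while hidden trap links are never preloaded on that visitor's behalf.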

bcc1234

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3408263 posted 12:57 pm on Jul 30, 2007 (gmt 0)

Not a Googlebot user agent.
That's what I thought: it's some kind of Google proxy or gateway.

At first, I thought those were human reviewers working for Google, but because they seemed to hit the trap URLs all too often, I wasn't sure.

Do you know if it's possible at all to use that accelerator as a proxy? In other words, can a scraper use it somehow to copy content?

If not, then I'll just add the whole range to the whitelist and be done with it. If yes, then it gets more tricky.
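
One middle ground between whitelisting the whole range and blocking it is to count hits against the forwarded client instead of the proxy address. The Python sketch below assumes the X-Forwarded-For header mentioned earlier in the thread carries the original client IP as its first entry, and it only honors the header when the connecting address is inside 72.14.192.0/18. The function name and the plain dict of headers are illustrative, and the header is only as trustworthy as whatever sits in that range.

```python
import ipaddress

# Range from the first post; treated here as the proxy's address space.
PROXY_RANGE = ipaddress.ip_network("72.14.192.0/18")


def effective_client_ip(remote_addr, headers):
    """Pick the IP the anti-scraper should count hits against.

    Direct connections are counted as themselves. Connections from
    the proxy range are counted against the first X-Forwarded-For
    entry, falling back to the proxy IP if the header is missing
    or malformed.
    """
    if ipaddress.ip_address(remote_addr) not in PROXY_RANGE:
        return remote_addr
    forwarded = headers.get("X-Forwarded-For", "")
    first = forwarded.split(",")[0].strip()
    try:
        return str(ipaddress.ip_address(first))
    except ValueError:
        return remote_addr
```

With this in place, a scraper routing requests through the range would still accumulate trap hits under the forwarded address, as long as the proxy reports it honestly.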

bcc1234

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3408263 posted 1:09 pm on Jul 30, 2007 (gmt 0)

That's the latest user agent I'm seeing:

"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; (R1 1.5); .NET CLR 1.1.4322)"

So blocking by UA won't work.

Also, it seems like all hits are fetched through those IPs, not just the pre-loading stuff. I'm not sure; maybe that's how it's supposed to work.

theBear

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3408263 posted 1:12 pm on Jul 30, 2007 (gmt 0)

That is exactly how prefetch works.

Search for it on WebmasterWorld; there are ways to turn it off at the server end.

Here ya go:

[webmasterworld.com...]

You may want to search a bit further, depending on what you want to do with prefetch; there are other ways to handle it.

But the information is out there.

[edited by: theBear at 1:20 pm (utc) on July 30, 2007]

bcc1234

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3408263 posted 1:27 pm on Jul 30, 2007 (gmt 0)

OK, thanks.

Do you know if it can be used by scrapers in some way?
Like faking requests to the accelerator and pretending to be a toolbar so that Google does the fetching?

