On top of that, the request headers sent by the client usually have an X-FORWARDED-FOR header with some Comcast IP.
The clients from that range don't break the robots.txt restrictions but do hit hidden links on occasion.
Because DNS for that range is not set up the way it is for the usual Googlebot, such hits trip the anti-scraping protection.
Is there a way for Google to either confirm or deny that it is their range?
I wouldn't mind adding it to the whitelist, but I would like to make sure those are real Google-related hits.
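A common way for an anti-scraping setup to tell real Googlebot traffic apart is a forward-confirmed reverse DNS check. It looks up the PTR record for the IP, requires a Google hostname, then resolves that hostname and confirms it maps back to the same IP. The Python sketch below assumes that is roughly what the protection here does. The suffix list (googlebot.com and google.com, the domains Google documents for verifying Googlebot) and the function name are assumptions. The resolvers are injectable only so the check can be tried without network access. A range whose PTR records do not fall under those domains fails this check, which matches what is described above.

```python
import socket
from typing import Callable, Iterable, List, Tuple

# Shape returned by socket.gethostbyaddr / socket.gethostbyname_ex:
# (hostname, aliases, addresses).
HostInfo = Tuple[str, List[str], List[str]]

# Assumption: domains treated as "real Google". Whether the accelerator
# range resolves under these is exactly the open question in this thread.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_google_host(
    ip: str,
    reverse: Callable[[str], HostInfo] = socket.gethostbyaddr,
    forward: Callable[[str], HostInfo] = socket.gethostbyname_ex,
    suffixes: Iterable[str] = GOOGLE_SUFFIXES,
) -> bool:
    """Forward-confirmed reverse DNS: PTR lookup, suffix check, A lookup."""
    try:
        hostname, _aliases, _addrs = reverse(ip)
    except OSError:
        # No PTR record or resolver failure: cannot verify.
        return False
    hostname = hostname.rstrip(".").lower()
    if not hostname.endswith(tuple(suffixes)):
        return False
    try:
        _name, _aliases, addresses = forward(hostname)
    except OSError:
        return False
    # The forward lookup must point back at the same IP; otherwise anyone
    # who controls their own reverse zone could claim a Google hostname.
    return ip in addresses
```

In production the defaults hit the system resolver twice per check, so caching the result per IP is probably worth doing.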
That is a Google IP. The Web Accelerator prefetches hidden links, so it basically acts like a bot, and that is why your defense system jabbered at you.
The prefetch can be turned off; it takes just a few lines in .htaccess. You should be able to find them if you do a search.
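The .htaccess recipes for this usually use mod_rewrite to match the `X-moz: prefetch` request header. Firefox sends that header for link prefetches, and the Web Accelerator is widely reported to send it as well. If the site is served by a Python app instead, the same idea can be sketched as WSGI middleware. The function name and the 403 body are made up for illustration. Whether every accelerator prefetch carries that header is an assumption worth checking against your own logs.

```python
from typing import Callable, Iterable

# Loose WSGI application type: (environ, start_response) -> body chunks.
WSGIApp = Callable[[dict, Callable], Iterable[bytes]]


def block_prefetch(app: WSGIApp) -> WSGIApp:
    """Wrap a WSGI app and refuse requests marked as prefetches.

    Assumption: prefetch requests carry an 'X-moz: prefetch' header,
    which WSGI exposes as environ['HTTP_X_MOZ'].
    """
    def wrapper(environ: dict, start_response: Callable) -> Iterable[bytes]:
        if environ.get("HTTP_X_MOZ", "").strip().lower() == "prefetch":
            body = b"Prefetching is not allowed on this site.\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Normal requests pass straight through to the wrapped app.
        return app(environ, start_response)

    return wrapper
```

Refusing the prefetch means the accelerator never requests the hidden links on its own, so only a user who actually follows a link reaches the trap.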
At first I thought those were human reviewers working for Google, but because they seemed to hit the trap URLs all too often, I wasn't sure.
Do you know if it's possible at all to use that accelerator as a proxy? In other words, can a scraper use it somehow to copy content?
If not, then I'll just add the whole range to the whitelist and be done with it. If yes, then it gets more tricky.
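If the range can be used as a proxy, whitelisting it wholesale would also whitelist any scraper behind it. One possible middle ground is to skip the DNS check for the whitelisted range but still count trap hits against the client address that the proxy reports in X-Forwarded-For. The Python sketch below shows that approach. The range is a hypothetical placeholder (192.0.2.0/24 is a documentation-only network), since the thread never gives the real one. The helper names are also made up.

```python
import ipaddress
from typing import Iterable, Optional, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Hypothetical placeholder range: the real one is not stated in the thread.
WHITELIST = (ipaddress.ip_network("192.0.2.0/24"),)


def is_whitelisted(ip: str, networks: Iterable[Network] = WHITELIST) -> bool:
    """Return True if ip is a valid address inside any whitelisted network."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # Malformed addresses are never whitelisted.
        return False
    return any(addr in net for net in networks)


def trap_key(remote_addr: str, x_forwarded_for: Optional[str]) -> str:
    """Return the address that trap-URL hits should be counted against."""
    if x_forwarded_for and is_whitelisted(remote_addr):
        hops = [hop.strip() for hop in x_forwarded_for.split(",") if hop.strip()]
        if hops:
            # The rightmost entry was appended by the whitelisted proxy
            # itself; entries further left come from the client and can
            # be forged.
            return hops[-1]
    # Not a trusted proxy, or no header: use the connecting address.
    return remote_addr
```

The header is only trusted when the connection itself comes from the whitelisted range. From any other address, X-Forwarded-For is just client-supplied text and is ignored.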
Search for it on WebmasterWorld; there are ways to turn it off at the server end.
Here ya go:
[webmasterworld.com...]
You may want to search a bit further; depending on what you want to do with prefetch, there are other ways to handle it.
But the information is out there.
[edited by: theBear at 1:20 pm (utc) on July 30, 2007]