Forum Moderators: open
My other question: if we do the forward/reverse lookup as Google suggests, we found from a few days' data that all Google accesses seem to come from 66.249.x.x. Can I trust that? We don't want to allow only that IP range and then have Google start crawling from a different range that we're blocking.
Fake Googlebot IPs we detected in a few days' log:
<IP list removed>
[edited by: incrediBILL at 2:27 pm (utc) on July 14, 2009]
[edit reason] Obscured IPs, too many to edit so removed list [/edit]
Just validate Google via the round-trip DNS check, like they suggest, and all the fakes are a non-issue, discarded at the door.
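The round-trip check described above can be sketched in a few lines of Python. This is a minimal illustration, not production code; the function name `is_real_googlebot` is my own, and the accepted domain suffixes follow Google's published guidance (googlebot.com / google.com):

```python
import socket

def is_real_googlebot(ip):
    """Round-trip DNS verification: reverse-resolve the IP, check the
    hostname is in googlebot.com or google.com, then forward-resolve
    that hostname and confirm it maps back to the same IP."""
    try:
        host, _, _ = socket.gethostbyaddr(ip)        # reverse lookup
    except socket.herror:
        return False
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        _, _, addrs = socket.gethostbyname_ex(host)  # forward lookup
    except socket.gaierror:
        return False
    return ip in addrs                               # round trip must match
```

Anything that claims to be Googlebot in its user-agent but fails this check can be safely rejected, regardless of which IP range it arrives from.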
FYI, often the fake googlebots are real googlebots trying to crawl via a proxy server after being tricked by the proxy owner trying to hijack your content.
So you are saying to block anything outside this 66.249.x.x range?
So if it is "googlebots trying to crawl via a proxy server", do you mean these are spam websites that try to steal our content and pretend it is hosted at those other IP locations? In that case it sounds like blocking them is definitely a good idea, right?
123(first octet).456(second octet).789(third octet).012(fourth octet)
So you are saying to block anything outside this
I'm saying I do!
Whether you choose to do so depends on what is beneficial or detrimental to your own website(s).
wizboy, you're right in your assumption about what these fake googlebots typically do.
Personally, I deny access to anything claiming to be googlebot unless it passes a full round-trip DNS lookup. Whether anyone else does that is up to them, as Don so rightly stated.