Forum Moderators: coopster

How to allow googlebot to spider?

bhuether

2:46 pm on Mar 13, 2008 (gmt 0)

10+ Year Member



I have a bad bot script. There is some code to allow whitelisted IPs to spider. It reads

/* whitelist: end processing and exit */

if (preg_match("/10\.22\.33\.44/",$REMOTE_ADDR)) { exit; }

if (preg_match("/Super Tool/",$HTTP_USER_AGENT)) { exit; }

/* end of whitelist */

In my blacklist log I saw an entry for googlebot. I looked up the IP and it really does appear to be google. I imagine googlebot uses many IPs. So how do I prevent blacklisting googlebot? For that matter, why is googlebot going to a folder that I exclude in robots.txt?

thanks,

brian

whoisgregg

5:09 pm on Mar 13, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google publishes a procedure for confirming a visitor is truly googlebot at:
[google.com...]

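Doing this in PHP just takes gethostbyaddr() [php.net] and gethostbyname() [php.net]: a reverse lookup on the visitor's IP, then a forward lookup on the resulting hostname to confirm it maps back to the same IP. :)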
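Here is a rough sketch of that check, in case it helps. The function name is_verified_googlebot() and the two optional lookup parameters are made up for illustration (they only exist so the function can be tried without live DNS), and the accepted domains (googlebot.com and google.com) are my assumption about what Google's procedure expects, so check them against the page linked above before relying on this.

```php
<?php
// Sketch only: confirm a claimed Googlebot by reverse DNS, then forward DNS.
// is_verified_googlebot() and its lookup parameters are hypothetical names,
// not part of the original bad-bot script.
function is_verified_googlebot(
    string $ip,
    ?callable $reverseLookup = null,
    ?callable $forwardLookup = null
): bool {
    // Default to PHP's built-in DNS functions.
    $reverseLookup = $reverseLookup ?? 'gethostbyaddr';
    $forwardLookup = $forwardLookup ?? 'gethostbyname';

    // Step 1: reverse lookup. gethostbyaddr() returns the unmodified IP
    // when no hostname is found, and false on malformed input.
    $host = $reverseLookup($ip);
    if ($host === false || $host === $ip) {
        return false;
    }

    // Step 2: the hostname must end in a Google crawler domain
    // (assumed here to be googlebot.com or google.com).
    if (!preg_match('/\.(googlebot|google)\.com$/i', $host)) {
        return false;
    }

    // Step 3: forward lookup. gethostbyname() returns the IPv4 address,
    // or the hostname unchanged on failure, so a mismatch means "not verified".
    return $forwardLookup($host) === $ip;
}
```

In the whitelist section this could be called as if (is_verified_googlebot($_SERVER['REMOTE_ADDR'])) { exit; }. Note that the hostname comes from a DNS lookup on the visitor's IP address, not from the user agent string, which anyone can fake. DNS lookups can be slow, so it is probably worth running the check only when the user agent claims to be Googlebot.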

"why is googlebot going to a folder that I exclude in robots.txt"

Probably isn't googlebot. Have you tried validating your robots.txt with them?

bhuether

6:30 pm on Mar 13, 2008 (gmt 0)

10+ Year Member



What is the best way to extract the hostname from the UA string? Or is it a server variable? I am new to this stuff...