Forum Moderators: goodroi

Google, robots, disallow

   
3:00 pm on May 3, 2011 (gmt 0)

10+ Year Member



One entry in my robots.txt is shown below, yet today, two Google IP addresses (64.233.172.18, 74.125.75.17) visited only two files in a directory named /jscript/.

There was no Google-related U-A in the log entry (Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7), but shouldn't Google be adhering to the below rules regardless of the IP or U-A that they use?

User-agent: *
Disallow: /j
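
For reference, here is a rough sketch of how a crawler that honors robots.txt would read that entry, using Python's standard urllib.robotparser. The file names (menu.js, index.html) are made up for illustration. The rule is a prefix match, so it covers every path starting with /j, but it only tells compliant crawlers what to skip; it does not block any request at the server.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt entry quoted above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /j
"""


def is_allowed(user_agent: str, path: str) -> bool:
    """Return True if a robots.txt-compliant client may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


# Hypothetical paths, for illustration only.
print(is_allowed("Googlebot", "/jscript/menu.js"))  # False: path starts with /j
print(is_allowed("Googlebot", "/index.html"))       # True: no rule matches
```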


Thanks in advance.
5:44 pm on May 3, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



but shouldn't Google be adhering to the below rules regardless of the IP or U-A that they use?

No, actually, and you cannot tell whether it was a human or a bot just because the IP is allocated to Google. Robots.txt rules are "guidelines", and there are ways to force even the popular spiders to go through restricted folders and scripts. There are also various Google services that regular visitors could use to retrieve content from your site (e.g. translation tools), and those requests can even be automated.

One way to get around it, to a certain extent, is to set up a cookie and check it on the server end, for example with a redirect or something along those lines. If no cookie is present, don't allow access to these scripts. If they're JS files, you may have to wrap them with a server script that checks the cookie value.
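
A minimal sketch of that wrapper idea, written as a small Python WSGI app. This is an illustration under assumptions, not a drop-in solution: the cookie name and value, the in-memory SCRIPTS table, and the file name are all hypothetical, and your regular pages would need to set the cookie before the browser requests the script.

```python
from http.cookies import SimpleCookie

COOKIE_NAME = "visited"   # hypothetical cookie set by your normal pages
COOKIE_VALUE = "1"        # hypothetical expected value
# Hypothetical stand-in for the real files under /jscript/.
SCRIPTS = {"/jscript/menu.js": b"/* menu code */"}


def has_valid_cookie(cookie_header: str) -> bool:
    """Check the raw Cookie request header for the expected cookie."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(COOKIE_NAME)
    return morsel is not None and morsel.value == COOKIE_VALUE


def app(environ, start_response):
    """Serve a script only when the request carries the expected cookie."""
    body = SCRIPTS.get(environ.get("PATH_INFO", ""))
    if body is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    if not has_valid_cookie(environ.get("HTTP_COOKIE", "")):
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"forbidden"]
    start_response("200 OK", [("Content-Type", "application/javascript")])
    return [body]
```

A fixed cookie value like this is easy to copy, so it only filters out clients that don't keep cookies at all, which fits the "to a certain extent" caveat above.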
6:01 pm on May 3, 2011 (gmt 0)

10+ Year Member



OK, not sure how I would go about all that but I'll look into it. Thanks for your reply.