
Sitemaps, Meta Data, and robots.txt Forum

    
Google, robots, disallow
cyberdyne

5+ Year Member

Msg#: 4307351 posted 3:00 pm on May 3, 2011 (gmt 0)

One entry in my robots.txt is shown below, yet today two Google IP addresses (64.233.172.18, 74.125.75.17) visited only two files, in a directory named /jscript/.

There was no Google-related U-A in the log entry (Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7), but shouldn't Google be adhering to the below rules regardless of the IP or U-A that they use?

User-agent: *
Disallow: /j


Thanks in advance.
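For reference, robots.txt `Disallow` values are matched as path prefixes, so `Disallow: /j` does cover `/jscript/` (and anything else starting with `/j`, such as a hypothetical `/jobs.html`). The sketch below checks this with Python's standard `urllib.robotparser`; the host `example.com` and the file names are placeholders. Note that the check runs in the client: a crawler only obeys the rule if it chooses to do this lookup before fetching.

```python
from urllib.robotparser import RobotFileParser

# The rule quoted in the post above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /j
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder host and paths; only the path part is compared against the rule.
for path in ["/jscript/menu.js", "/jobs.html", "/index.html"]:
    allowed = parser.can_fetch("Googlebot", "http://example.com" + path)
    print(f"{path}: {'allowed' if allowed else 'disallowed'}")
```

Running it prints `/jscript/menu.js: disallowed`, `/jobs.html: disallowed` and `/index.html: allowed`, which shows both that the rule applies to /jscript/ and that a short prefix like `/j` can block more than intended.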

 

enigma1

WebmasterWorld Senior Member 5+ Year Member

Msg#: 4307351 posted 5:44 pm on May 3, 2011 (gmt 0)

"but shouldn't Google be adhering to the below rules regardless of the IP or U-A that they use?"

No, actually. You cannot tell whether it was a human or a bot just because the IP is allocated to Google. Robots.txt rules are only "guidelines", and there are ways to force even the popular spiders to go through restricted folders and scripts. There are also various Google services that regular visitors can use to retrieve content from your site (e.g. translation tools), and these can even be automated.

One way to get around this, to a certain extent, is to set a cookie and check it on the server side, for example with a redirect or something along those lines. If no cookie is present, don't allow access to these scripts. If they're JS files, you may have to wrap them in a server-side script that checks the cookie value.
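A minimal sketch of the cookie check described above, written as a Python WSGI application using only the standard library. The cookie name `js_ok`, its value, the placeholder response bodies and the 403 response (used here instead of the redirect mentioned above) are all assumptions for illustration. Ordinary pages set the cookie, so a browser that loaded a page first sends it along with its script requests, while a client that requests /jscript/ files directly without cookies is refused. As the post says, this only works to a certain extent: any client that keeps cookies will still get through.

```python
from http.cookies import CookieError, SimpleCookie

# Hypothetical cookie name/value; a real site might use a signed or per-session token.
COOKIE_NAME = "js_ok"
COOKIE_VALUE = "1"
PROTECTED_PREFIX = "/jscript/"


def has_gate_cookie(cookie_header):
    """Return True if the Cookie request header carries the expected value."""
    try:
        cookies = SimpleCookie(cookie_header)
    except CookieError:
        return False  # malformed header: treat it as no cookie
    morsel = cookies.get(COOKIE_NAME)
    return morsel is not None and morsel.value == COOKIE_VALUE


def script_request_allowed(path, cookie_header):
    """Files under the protected prefix need the cookie; everything else is open."""
    if not path.startswith(PROTECTED_PREFIX):
        return True
    return has_gate_cookie(cookie_header)


def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    cookie_header = environ.get("HTTP_COOKIE", "")

    if not script_request_allowed(path, cookie_header):
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"Forbidden"]

    if path.startswith(PROTECTED_PREFIX):
        start_response("200 OK", [("Content-Type", "application/javascript")])
        # Placeholder: a real wrapper would read and return the actual .js file here.
        return [b"/* script body */"]

    # Ordinary page: set the cookie so the browser sends it with later script requests.
    start_response("200 OK", [
        ("Content-Type", "text/html"),
        ("Set-Cookie", f"{COOKIE_NAME}={COOKIE_VALUE}; Path=/"),
    ])
    return [b"<html><body>page</body></html>"]
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` can host it; on a live site the same check would normally go into whatever server-side language the site already uses.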

cyberdyne

5+ Year Member

Msg#: 4307351 posted 6:01 pm on May 3, 2011 (gmt 0)

OK, I'm not sure how I would go about all that, but I'll look into it. Thanks for your reply.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved