Forum Moderators: Robert Charlton & goodroi


Googlebot: robots.txt & trailing slashes


Makaveli2007

8:57 am on Oct 20, 2008 (gmt 0)




Hi,

I just came across a very old post (from 2005!) stating that Googlebot does whatever it likes and ignores robots.txt. On the other hand, I remember a Google spokesperson saying that they do respect robots.txt (and other SEOs I know say they've never had a problem with it). So, does Googlebot respect robots.txt or not?

Another thing I read in that thread: if you link to /folder/index.html (for example), Googlebot will automatically request /folder/ to see if it's there, so you should always link to folders with a trailing slash (/folder/) rather than to /folder/index.html. Is there any truth to this? Does it still hold true three years later? :)

Thanks!

tedster

11:40 am on Oct 20, 2008 (gmt 0)




Googlebot does usually respect robots.txt, but it doesn't re-check robots.txt before every crawl, so there can be a time lag before your changes take effect.
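To illustrate that time lag, here is a minimal Python sketch of a crawler that caches robots.txt and only re-reads it after a fixed lifetime. The 24-hour TTL, the `CachedRobots` class and the `fetch_robots_txt` callable are hypothetical assumptions for illustration, not a description of how Googlebot is actually built; the rule matching itself uses Python's standard `urllib.robotparser`.

```python
import time
from urllib.robotparser import RobotFileParser

# Assumption: a 24-hour cache lifetime, chosen for illustration only.
CACHE_TTL_SECONDS = 24 * 60 * 60


class CachedRobots:
    """Hypothetical crawler helper that re-reads robots.txt only after the TTL expires."""

    def __init__(self, fetch_robots_txt, ttl=CACHE_TTL_SECONDS, clock=time.time):
        self._fetch = fetch_robots_txt  # hypothetical callable returning the file's text
        self._ttl = ttl
        self._clock = clock
        self._parser = None
        self._fetched_at = 0.0

    def can_fetch(self, user_agent, url):
        now = self._clock()
        if self._parser is None or now - self._fetched_at >= self._ttl:
            parser = RobotFileParser()
            parser.parse(self._fetch().splitlines())
            self._parser, self._fetched_at = parser, now
        # Until the TTL expires, decisions come from the cached copy,
        # even if the live robots.txt has changed in the meantime.
        return self._parser.can_fetch(user_agent, url)


live = {"text": "User-agent: *\nDisallow:\n"}  # initially allows everything
fake_now = [0.0]
robots = CachedRobots(lambda: live["text"], clock=lambda: fake_now[0])
url = "https://example.com/private/"

print(robots.can_fetch("Googlebot", url))  # True: first fetch, nothing disallowed
live["text"] = "User-agent: *\nDisallow: /private/\n"  # webmaster edits the file
fake_now[0] = 3600.0  # one hour later
print(robots.can_fetch("Googlebot", url))  # True: stale cached copy still used
fake_now[0] = 25 * 3600.0  # past the TTL
print(robots.can_fetch("Googlebot", url))  # False: file re-read, new rule applies
```

The fake clock just makes the lag visible without waiting: the new `Disallow` line is ignored until the cached copy expires.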

Yes, the trailing slash problem can still cause difficulties. Google is working to prevent some of the most common canonical URL problems [webmasterworld.com], but they can still lead to less-than-optimal indexing. It is the webmaster's responsibility to understand their website's technology and eliminate as many potential sources of trouble as they possibly can.
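On the linking side, one way to follow the "always link to /folder/" advice is to normalise internal links before writing them out. A minimal Python sketch, assuming `index.html` is the folder's default document (other servers may use index.php, default.htm, etc.); `canonical_folder_url` is a hypothetical helper, not part of any library:

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: these file names are served as the folder's default document.
DEFAULT_DOCUMENTS = ("index.html",)


def canonical_folder_url(url):
    """Rewrite .../folder/index.html to .../folder/ so each page has one linked URL."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    for doc in DEFAULT_DOCUMENTS:
        # Match "/index.html" so names like "myindex.html" are left alone.
        if path.endswith("/" + doc):
            path = path[: -len(doc)]  # strip the file name, keep the trailing slash
            break
    return urlunsplit((scheme, netloc, path, query, fragment))


print(canonical_folder_url("https://example.com/folder/index.html"))
# https://example.com/folder/
```

Links that already end in a slash, or point at other files, pass through unchanged.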

g1smd

6:57 pm on Oct 22, 2008 (gmt 0)




If you disallow /folder/, you need to check what your website does when someone requests /folder.
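The reason: robots.txt rules match URL paths by prefix, so `Disallow: /folder/` covers /folder/ and everything under it, but not /folder itself. A quick check with Python's standard `urllib.robotparser` (the robots.txt content and example.com URLs are hypothetical; Python's parser is not Googlebot's, although both use prefix matching for rules like this):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks only the slash-terminated folder path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /folder/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ("/folder/", "/folder/index.html", "/folder"):
    allowed = parser.can_fetch("Googlebot", "https://example.com" + path)
    print(path, "->", "allowed" if allowed else "blocked")
# /folder/ -> blocked
# /folder/index.html -> blocked
# /folder -> allowed
```

Writing `Disallow: /folder` without the slash would block /folder as well, but by the same prefix rule it would also block unrelated paths such as /folder-old.html.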

The correct response is a 301 redirect to /folder/, but that isn't always what happens. Sometimes the content is returned with a "200 OK" status instead, and since /folder is not covered by the /folder/ rule, that URL will be crawled and indexed.
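A minimal sketch of the correct behavior as a plain Python WSGI application. The `FOLDERS` set and the response bodies are hypothetical; on a real site this redirect is usually handled by the web server's own configuration rather than by application code.

```python
# Hypothetical set of folder paths the site serves (stored without a trailing slash).
FOLDERS = {"/folder"}


def app(environ, start_response):
    """Minimal WSGI app: 301 /folder to /folder/, serve /folder/ with 200 OK."""
    path = environ.get("PATH_INFO", "/")
    query = environ.get("QUERY_STRING", "")

    if path in FOLDERS:
        # Permanent redirect to the slash-terminated URL, keeping any query string.
        location = path + "/" + ("?" + query if query else "")
        start_response("301 Moved Permanently", [("Location", location)])
        return [b""]

    if path.endswith("/") and path[:-1] in FOLDERS:
        start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
        return [b"<h1>Folder index</h1>"]

    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found"]
```

To try it locally, run `from wsgiref.simple_server import make_server; make_server("", 8000, app).serve_forever()` and request http://localhost:8000/folder: the response should be a 301 with `Location: /folder/`, never a 200 with the folder's content.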