Google ignores my robots.txt file.

Which format is correct?

         

salmo

11:28 am on May 15, 2003 (gmt 0)

While checking the backlinks for pages on a variety of sites, it has become apparent that certain pages are still being indexed despite being excluded by a robots.txt file. The format currently in use is:

User-agent: *
Disallow: /folder/
Disallow: /page.htm/
Disallow: /another_page.htm/

On checking Google's own robots.txt file, I see it reads as follows (I have quoted only part of the file):

User-agent: *
Disallow: /search
Disallow: /groups
Disallow: /images
Disallow: /catalogs
Disallow: /catalog_list
Disallow: /news
Disallow: /pagead/
... (more entries omitted) ...
Disallow: /microsoft?
Disallow: /unclesam?

I notice the lack of trailing / on most (but not all) entries. Which is the correct format, I wonder?

TallTroll

11:35 am on May 15, 2003 (gmt 0)

>> I notice the lack of trailing / on most (but not all) entries

Any entry WITH a trailing / is interpreted as a directory. Disallowing /page.htm/ therefore will not work, because Googlebot is looking for a directory called page.htm rather than the page itself. To disallow individual pages, just drop the trailing /, e.g. Disallow: /page.htm
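
For anyone who wants to check this locally, here is a minimal sketch using Python's standard-library robots.txt parser. It shows how prefix matching treats the two formats; it is not Googlebot's actual implementation, and www.example.com is a placeholder host rather than one of the sites from the thread. The helper name is_allowed is made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Rules as posted in the thread: page entries carry a trailing slash.
ORIGINAL_RULES = """\
User-agent: *
Disallow: /folder/
Disallow: /page.htm/
Disallow: /another_page.htm/
"""

# Same rules with the trailing slash dropped from the page entries.
FIXED_RULES = """\
User-agent: *
Disallow: /folder/
Disallow: /page.htm
Disallow: /another_page.htm
"""


def is_allowed(rules: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


# www.example.com is a placeholder host (an assumption, not from the thread).
for label, rules in (("original", ORIGINAL_RULES), ("fixed", FIXED_RULES)):
    for path in ("/page.htm", "/folder/index.htm"):
        url = "http://www.example.com" + path
        print(label, path, "allowed" if is_allowed(rules, url) else "blocked")
```

With the original rules, /page.htm comes out as allowed, because the URL path does not start with /page.htm/. With the trailing slash dropped it is blocked, while /folder/ entries are blocked either way.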