Newbie Robots.txt file Q

CoffeeMan

12:47 am on Oct 2, 2005 (gmt 0)

Hi Folks,

I am a newbie to robots.txt files and would like opinions on my file:

# All robots will spider the domain

User-agent: *

Disallow: /cgi-bin/
Disallow: /images/
Disallow: leftpage.html
Disallow: header.html
Disallow: footer.html

I recently moved from mostly framed pages to mostly non-framed pages.

Your comments are appreciated.

Regards,

Tom

[edited by: Woz at 1:04 am (utc) on Oct. 2, 2005]
[edit reason] No URLs please, see Tos#13 [/edit]

Lord Majestic

12:50 am on Oct 2, 2005 (gmt 0)

You may want to remove your domain name, as it's against the TOS to post it here.

Your robots.txt is fine apart from:

Disallow: leftpage.html
Disallow: header.html
Disallow: footer.html

Every single URL has to start with /; otherwise you should not count on it being matched.

CoffeeMan

1:12 am on Oct 2, 2005 (gmt 0)

Lord Majestic,

Sorry about the URL - it was an honest mistake - also, it looks like the eraser has already taken it out.

#1 So... my robots.txt file should look like this?

User-agent: *

Disallow: /cgi-bin/
Disallow: /images/

#2 - Can you elaborate further on:
"Every single URL has to start with /; otherwise you should not count on it being matched."

TIA,

Tom

Lord Majestic

3:57 pm on Oct 2, 2005 (gmt 0)

"Every single URL has to start with /; otherwise you should not count on it being matched."

URLs in Disallow statements should start with / because the robots.txt standard works by checking whether the path of the requested URL starts with the Disallow value. Since every URL path starts with /, a value without the leading / will never match, so the page won't be disallowed, and technically it will be all your fault.
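
To make the prefix check concrete, here is a minimal Python sketch of the matching rule described above. The function name is_disallowed and the example paths are made up for illustration, and the sketch ignores per-robot sections, Allow lines, and the wildcard extensions some crawlers support, so treat it as a rough model rather than how any particular robot is implemented.

```python
def is_disallowed(path, disallow_values):
    """Return True if path starts with any non-empty Disallow value."""
    # An empty Disallow value means "nothing is disallowed", so skip it.
    return any(path.startswith(value) for value in disallow_values if value)


# Values as posted: three of them lack the leading slash.
as_posted = ["/cgi-bin/", "/images/", "leftpage.html", "header.html", "footer.html"]

# The same values with the leading slash added (hypothetical corrected file).
corrected = ["/cgi-bin/", "/images/", "/leftpage.html", "/header.html", "/footer.html"]

print(is_disallowed("/leftpage.html", as_posted))    # False: prefix never matches
print(is_disallowed("/leftpage.html", corrected))    # True
print(is_disallowed("/images/logo.gif", as_posted))  # True
```

So rather than dropping those three lines, adding the leading / to each should be enough, assuming the pages sit in the root of the site.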

CoffeeMan

12:34 pm on Oct 3, 2005 (gmt 0)

LM...

Thanks for your help,