Forum Moderators: goodroi

Robots.txt question: disallowing files

     
7:08 pm on Oct 29, 2005 (gmt 0)

Full Member

10+ Year Member

joined:Feb 25, 2002
posts:311
votes: 0


Is it OK to disallow individual files, such as

Disallow: widgets.php

in your robots.txt? A syntax checker told me that you should only disallow directories.

8:54 pm on Oct 30, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 31, 2005
posts:1108
votes: 0


For starters, you should always have a leading /, since every requested URL path starts with one, so it should be

Disallow: /widgets.php

You can disallow any resource. In fact, what you are disallowing is any URL that starts with that string, so you could have

Disallow: /widgets

and it would disallow widgets.php, widgets.html, widgets.gif etc.
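
To make the prefix rule concrete, here is a minimal Python sketch. It assumes a crawler that does plain "starts with" matching on the URL path and nothing else (no Allow lines, no wildcards, no User-agent groups). The function name is_disallowed is made up for illustration and is not any real crawler's API.

```python
def is_disallowed(path: str, disallow_rules: list[str]) -> bool:
    """Return True if the URL path starts with any non-empty Disallow value.

    Assumption: plain prefix matching only. An empty Disallow value
    blocks nothing, so empty rules are skipped.
    """
    return any(rule and path.startswith(rule) for rule in disallow_rules)


# Hypothetical robots.txt content: "Disallow: /widgets"
rules = ["/widgets"]
for path in ["/widgets.php", "/widgets.html", "/widgets.gif", "/gadgets.php"]:
    print(path, "blocked" if is_disallowed(path, rules) else "allowed")

# Without the leading slash, the rule is not a prefix of any URL path.
print(is_disallowed("/widgets.php", ["widgets.php"]))
```

Under that assumption the three widgets URLs are blocked and /gadgets.php is allowed, while the rule without a leading slash matches nothing, since every path begins with /. Some real crawlers may be more forgiving about a missing slash, but it is safer not to count on that.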

1:56 am on Oct 31, 2005 (gmt 0)

New User

10+ Year Member

joined:Sept 5, 2005
posts:39
votes: 0


Would this line:

Disallow: /widgets

also disallow something like widgets-and-stuff.php? In other words, is putting /widgets in robots.txt equivalent to ls -l /widgets* at the OS prompt?

2:21 am on Oct 31, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 31, 2005
posts:1108
votes: 0


Yes.
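
As a rough check of the ls /widgets* analogy, this Python sketch compares plain prefix matching with a glob-style match of the rule plus a trailing *. It assumes the rule itself contains no glob characters such as ? or [, and it uses fnmatch, whose * also matches / (a real shell glob stops at directory boundaries, so the analogy is only approximate). The helper names are invented for the example.

```python
from fnmatch import fnmatchcase


def prefix_match(path: str, rule: str) -> bool:
    # Classic robots.txt behaviour: the rule is a plain path prefix.
    return path.startswith(rule)


def glob_match(path: str, rule: str) -> bool:
    # Glob view of the same rule: the rule followed by "match anything".
    # Assumption: rule has no glob metacharacters of its own.
    return fnmatchcase(path, rule + "*")


rule = "/widgets"
samples = ["/widgets-and-stuff.php", "/widgets", "/widgets/list.php", "/widget.php"]
for path in samples:
    print(path, prefix_match(path, rule), glob_match(path, rule))
```

For these sample paths the two checks agree on every line, including /widgets-and-stuff.php being blocked and /widget.php (no "s") being allowed.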
10:41 pm on Nov 1, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:July 23, 2005
posts:194
votes: 0


Do you need to close it with a / too? Like Disallow: /widget.php/
11:02 pm on Nov 1, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Aug 8, 2004
posts:1679
votes: 0


> Do you need to close it with a / too? Like Disallow: /widget.php/

No. That would not be correct, since a filename can't really end with /. And even if it's a directory, it is wise NOT to include the trailing /.

11:37 pm on Nov 1, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


> [even if it's a] directory, it is wise NOT to include the trailing /.

An interesting comment. What is the reason for this recommendation?

Jim

1:00 am on Nov 2, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 31, 2005
posts:1108
votes: 0


I think he is recommending this in case a bot finds a link to /adirectory without a trailing slash. That URL won't match Disallow: /adirectory/, so the bot will request it and will either be redirected to /adirectory/ or actually be served content from that directory. It is also possible that some bots request the URL given in a redirect without checking the new URL against robots.txt.
1:21 am on Nov 2, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Aug 8, 2004
posts:1679
votes: 0


> An interesting comment. What is the reason for this recommendation?

Dijkgraaf is spot on with this one. To add to that, some web servers seem NOT to issue a redirect, so a bot never gets a chance to re-check the new URL (with the slash) against robots.txt and thus unintentionally "violates" robots.txt. I ran into a few of these and ended up removing trailing slashes from robots.txt Disallow directives, to ensure that my bot won't crawl URLs that the webmaster clearly did not want crawled, even though technically it would have been the webmaster's fault.

Leaving off the trailing slash is the wisest approach because it catches the URL both with and without the slash.
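
The difference can be seen in a small Python sketch, again assuming simple prefix matching on the URL path. The directory name /adirectory comes from the example earlier in the thread; the function name blocked and the file name page.html are just for illustration.

```python
def blocked(path: str, rule: str) -> bool:
    # Assumption: a Disallow value is treated as a plain path prefix.
    return path.startswith(rule)


paths = ["/adirectory", "/adirectory/", "/adirectory/page.html"]

for rule in ["/adirectory/", "/adirectory"]:
    print("Disallow:", rule)
    for path in paths:
        print("   ", path, "blocked" if blocked(path, rule) else "NOT blocked")
```

With the trailing slash in the rule, the bare /adirectory URL slips through while the other two are blocked. Without it, all three are blocked. The trade-off is that the shorter rule also catches unrelated names that merely begin with the same string, such as /adirectory-old, so it is worth checking that nothing you want crawled shares the prefix.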

1:37 am on Nov 2, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Thanks for the very good points.

Fault-tolerant is good!

Jim