| 8:54 pm on Oct 30, 2005 (gmt 0)|
For starters, you should always have a leading /, since every requested URL starts with one, so it should be
You can disallow any resource. In fact, what you are disallowing is any URL starting with that item, so you could have
Disallow: /widgets
and it would disallow widgets.php, widgets.html, widgets.gif, etc.
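As a rough illustration of the prefix behaviour described above, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the bot name `ExampleBot`, and the helper `blocked` are assumptions made up for this example, not anything taken from a real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /widgets
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def blocked(path):
    """Return True if the example rules disallow this path (hypothetical helper)."""
    return not parser.can_fetch("ExampleBot", path)


# Every path that starts with /widgets is caught by the single rule.
for path in ("/widgets.php", "/widgets.html", "/widgets-and-stuff.php", "/gadgets.php"):
    print(path, "blocked" if blocked(path) else "allowed")
```

Other crawlers may implement matching differently, so treat this as one parser's interpretation rather than a guarantee about every bot.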
| 1:56 am on Oct 31, 2005 (gmt 0)|
Would this line:
also disallow something like widgets-and-stuff.php? In other words, is putting /widgets in robots.txt equivalent to ls -l /widgets* at the OS prompt?
| 2:21 am on Oct 31, 2005 (gmt 0)|
| 10:41 pm on Nov 1, 2005 (gmt 0)|
Do you need to close it with a / too? Like Disallow: /widget.php/
| 11:02 pm on Nov 1, 2005 (gmt 0)|
|do you need to close it with a / too? like Disallow: /widget.php/ |
No, that would not be correct, since a filename can't really end with /. Even if it's a directory, it is wise NOT to include the trailing /.
| 11:37 pm on Nov 1, 2005 (gmt 0)|
> [even if it's a] directory it is wise NOT to include the trailing /.
An interesting comment. What is the reason for this recommendation?
| 1:00 am on Nov 2, 2005 (gmt 0)|
I think he is recommending this in case a bot finds a link to /adirectory without a trailing slash. That URL won't match Disallow: /adirectory/, so the bot will request it and will either be redirected to /adirectory/ or be served content from that directory directly. It is also possible that some bots request the URL given in a redirect without checking the new URL against robots.txt.
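To make the redirect scenario concrete, here is a small sketch of a bot-side check that tests every hop of a redirect chain against the Disallow rules. It does no network access; the rule list, the redirect chain, and the function names `is_disallowed` and `first_blocked_hop` are all hypothetical, and the matching is a simplified prefix test rather than any particular crawler's implementation.

```python
def is_disallowed(path, disallow_rules):
    """Prefix match: a path is blocked if it starts with any non-empty rule."""
    return any(rule and path.startswith(rule) for rule in disallow_rules)


def first_blocked_hop(redirect_chain, disallow_rules):
    """Return the first path in a redirect chain that the rules block, else None."""
    for path in redirect_chain:
        if is_disallowed(path, disallow_rules):
            return path
    return None


rules = ["/adirectory/"]                  # rule written with a trailing slash
chain = ["/adirectory", "/adirectory/"]   # link without slash, then the redirect target

# The first request slips past the rule because it lacks the trailing slash.
print(is_disallowed(chain[0], rules))

# A bot that re-checks the redirect target stops at the second hop.
print(first_blocked_hop(chain, rules))
```

A bot that only checks the first URL would fetch the directory here, which is the failure mode described in the post.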
| 1:21 am on Nov 2, 2005 (gmt 0)|
|An interesting comment. What is the reason for this recommendation? |
Dijkgraaf is spot on with this one. To add: some web servers seem NOT to issue a redirect, so the bot never gets a chance to re-check the new URL (with the slash) against robots.txt and thus unintentionally "violates" robots.txt. I had a few of these cases and ended up removing trailing slashes from robots.txt Disallow directives to ensure that my bot won't crawl URLs that the webmaster clearly did not want crawled, even though technically it would have been the webmaster's fault.
Not specifying trailing slashes is the wisest way because it catches all possibilities.
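The slash-stripping approach can be sketched roughly as follows. This is only an illustration under assumptions: `strip_trailing_slash` and `is_disallowed` are hypothetical helpers, the rules are invented, and the matching is a plain prefix test. Note that the root rule "/" has to be left alone, because an empty Disallow value conventionally means nothing is blocked.

```python
def strip_trailing_slash(rule):
    """Drop trailing slashes from a Disallow path, but keep the root rule "/"."""
    if len(rule) > 1 and rule.endswith("/"):
        return rule.rstrip("/") or "/"
    return rule


def is_disallowed(path, rules):
    """Prefix match: a path is blocked if it starts with any non-empty rule."""
    return any(rule and path.startswith(rule) for rule in rules)


original = ["/adirectory/"]
tolerant = [strip_trailing_slash(rule) for rule in original]

# Compare the rule as written with the slash-stripped version.
for path in ("/adirectory", "/adirectory/", "/adirectory/page.html"):
    print(path, is_disallowed(path, original), is_disallowed(path, tolerant))
```

The trade-off is that the stripped rule also matches siblings such as /adirectory-old.html, in the same way that /widgets matches widgets-and-stuff.php, so it errs on the side of crawling less.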
| 1:37 am on Nov 2, 2005 (gmt 0)|
Thanks for the very good points.
Fault-tolerant is good!