Who guarantees me that disallow keyword isn't ignored
bkausbk
5:49 pm on Apr 20, 2005 (gmt 0)
Who guarantees me that search engines don't ignore the "Disallow" keyword and instead crawl the disallowed paths explicitly to grab as much information as possible?
Lord Majestic
6:10 pm on Apr 20, 2005 (gmt 0)
Who guarantees me
Nobody.
bkausbk
9:52 pm on Apr 20, 2005 (gmt 0)
This means one should use the "Disallow" keyword as little as possible and not list files which may contain additional references to disallowed files, don't you think so too? I have an idea. I'll create a domain and a web site with one HTML file (not index.html, but something spiders wouldn't normally find on their own) which shouldn't be crawled. I'll create one entry in robots.txt to explicitly disallow this file. Then I'll create an index.html and submit the link to various search engines. After 1-2 months I'll search for this site in the various search engines to find out which search engine can be trusted. Yes, nice idea ;)
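The honeypot setup described above might look like this robots.txt (the file name here is a placeholder for whatever hidden page you create):

```
# robots.txt for the test domain (file name is hypothetical)
User-agent: *
Disallow: /honeypot-do-not-crawl.html
```

If that page later shows up in a search engine's index, that engine either ignored the Disallow rule or fetched the page anyway.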
Lord Majestic
10:09 pm on Apr 20, 2005 (gmt 0)
No - it means that Disallow is pretty much the best you can get, but nobody gives any guarantees: robots.txt is a voluntary convention that ethical search engines follow.
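To illustrate the voluntary nature of the convention: an ethical crawler checks robots.txt before every fetch, typically with something like Python's standard-library parser. The rules and URLs below are made up for the sketch; a bad bot simply skips this check entirely.

```python
# Sketch of how a well-behaved crawler honors robots.txt.
# Rules and URLs are hypothetical examples.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /secret-test-page.html
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A good bot asks before fetching; nothing forces a bad bot to do the same.
print(rp.can_fetch("GoodBot", "http://example.com/secret-test-page.html"))  # False
print(rp.can_fetch("GoodBot", "http://example.com/index.html"))             # True
```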
Reid
9:02 am on Apr 24, 2005 (gmt 0)
Disallow is only a tool for controlling good bots. You are not blocking them; you are just telling them what not to crawl. For bad bots you need other tools, such as .htaccess if you are on Apache.
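For example, with Apache's mod_rewrite enabled, an .htaccess rule can refuse requests whose User-Agent matches a known bad bot (the bot name here is a placeholder):

```
# .htaccess: return 403 Forbidden to a misbehaving bot (name is hypothetical)
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} BadBot [NC]
RewriteRule .* - [F,L]
```

Unlike robots.txt, this is enforced by the server itself, so it works even against crawlers that ignore the Disallow convention (though bots can still spoof their User-Agent).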