Alternative Future

msg:1528515 | 4:33 pm on Feb 19, 2004 (gmt 0) |
Hiya, is it not Disallow: / rather than the *? The * might work; I'm open to being corrected on this one! -gs
|
Essel

msg:1528516 | 4:35 pm on Feb 19, 2004 (gmt 0) |
You're correct according to [robotstxt.org...] Thanks. The bit I'm unsure about is whether I can do:

User-Agent: this, that, google, lycos, another
Disallow: /
|
Alternative Future

msg:1528517 | 4:37 pm on Feb 19, 2004 (gmt 0) |
Don't think so. I just checked the WW one and some other larger websites, and they list each one on an individual basis...

Edit: according to the link you gave, you could use the * for all known robots, i.e.

User-agent: *
Disallow: /

This would ban all known robots that obey the robots.txt. -gs
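To illustrate the catch-all record, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It only shows how one compliant parser reads the file; the helper name `is_allowed` and the `www.example.com` URL are made up for the example, and real crawlers may interpret the file differently.

```python
from urllib.robotparser import RobotFileParser

# Catch-all record: applies to any robot that has no record of its own.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Hypothetical helper: ask the stdlib parser whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Every agent name falls through to the "*" record, so none may fetch anything.
blocked_everywhere = not any(
    is_allowed(ROBOTS_TXT, agent, "http://www.example.com/page.html")
    for agent in ("googlebot", "scooter", "lycos")
)
print(blocked_everywhere)
```

This only tells you what a well-behaved robot would conclude from the file; it does not stop a robot that ignores robots.txt.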
|
bakedjake

msg:1528518 | 4:43 pm on Feb 19, 2004 (gmt 0) |
AF, according to A Standard for Robot Exclusion [robotstxt.org], you are correct. It should be:

User-agent: googlebot
Disallow: /

User-agent: scooter
Disallow: /

User-agent: lycos
Disallow: /
|
pageoneresults

msg:1528519 | 5:04 pm on Feb 19, 2004 (gmt 0) |
It really should be...

User-agent: googlebot
Disallow: /

User-agent: scooter
Disallow: /

User-agent: lycos
Disallow: /
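As a rough check of the one-record-per-robot layout, the sketch below feeds it to Python's standard-library `urllib.robotparser`. The agent name `otherbot` and the `www.example.com` URL are placeholders invented for the example, and this reflects only how that one parser behaves.

```python
from urllib.robotparser import RobotFileParser

# One record per robot, with a blank line between records.
ROBOTS_TXT = """\
User-agent: googlebot
Disallow: /

User-agent: scooter
Disallow: /

User-agent: lycos
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

URL = "http://www.example.com/index.html"  # placeholder URL

# Map each agent name to whether the parser would let it fetch URL.
results = {
    agent: parser.can_fetch(agent, URL)
    for agent in ("googlebot", "scooter", "lycos", "otherbot")
}
print(results)
```

The three named robots are refused, while `otherbot` (which has no record and no `*` record to fall back on) is still allowed.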
|
Essel

msg:1528520 | 5:04 pm on Feb 19, 2004 (gmt 0) |
"This would ban all known robots that obey the robots.txt"

Is it possible to ban everything except Examplebot? Does this work?

Allow: Examplebot
|
bakedjake

msg:1528521 | 5:07 pm on Feb 19, 2004 (gmt 0) |
por: that's what I meant. ;-)

It depends on whether Examplebot honors the Allow directive. Don't forget that robots.txt is not access control; honoring it is voluntary on the robot's part. Not all spiders read robots.txt, and some spiders accept proprietary parameters in robots.txt.
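One way to sidestep the Allow question is the pattern from the robotstxt.org standard: give the favoured robot its own record with an empty Disallow, then ban everyone else with the catch-all. The sketch below checks that layout with Python's standard-library `urllib.robotparser`; `otherbot` and the `www.example.com` URL are placeholders, and whether the real Examplebot reads the file this way is an assumption.

```python
from urllib.robotparser import RobotFileParser

# An empty Disallow means "nothing is disallowed" for that robot.
# The "*" record then covers every robot without a record of its own.
ROBOTS_TXT = """\
User-agent: Examplebot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

URL = "http://www.example.com/index.html"  # placeholder URL

examplebot_ok = parser.can_fetch("Examplebot", URL)
otherbot_ok = parser.can_fetch("otherbot", URL)
print(examplebot_ok, otherbot_ok)
```

As noted above, this is only what a compliant parser concludes. A spider that never reads robots.txt is unaffected, so anything that must stay private needs real access control on the server.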
|
pageoneresults

msg:1528522 | 5:11 pm on Feb 19, 2004 (gmt 0) |
Here's a great topic from jdMorgan regarding the robots.txt file... Put your robots.txt on a diet [webmasterworld.com]
|