Yahoo! Slurp Now Supports Wildcards in robots.txt


StupidScript - 7:23 pm on Nov 7, 2006 (gmt 0)


1) Does anyone have a guess as to how Y's spider would behave by default? If a URL is neither Disallowed nor Allowed, would the spider crawl it? If so, wouldn't that make Allow pretty meaningless? After all, if it's not Disallowed ...
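For reference, here is a minimal sketch of how Allow/Disallow precedence is commonly resolved. It assumes the conventions later written into RFC 9309: the most specific (longest) matching rule wins, Allow wins a tie, an empty Disallow blocks nothing, and a URL that matches no rule may be crawled. Whether Slurp behaved exactly this way in 2006 is an assumption, and the `is_allowed` function and sample rules are hypothetical. Under these assumptions, Allow mostly matters as an exception carved out of a broader Disallow.

```python
from typing import List, Tuple

# Hypothetical rule representation: ("allow" or "disallow", path prefix).
Rule = Tuple[str, str]


def is_allowed(rules: List[Rule], url_path: str) -> bool:
    """Resolve Allow/Disallow by longest matching prefix (assumed convention)."""
    best_len = -1
    allowed = True  # no matching rule -> crawling is permitted by default
    for kind, prefix in rules:
        if not prefix:
            continue  # an empty value ("Disallow:") restricts nothing
        if url_path.startswith(prefix):
            n = len(prefix)
            # A longer (more specific) rule overrides a shorter one;
            # on an exact tie in length, Allow takes precedence.
            if n > best_len or (n == best_len and kind == "allow"):
                best_len = n
                allowed = kind == "allow"
    return allowed


rules = [("disallow", "/private/"), ("allow", "/private/press/")]
print(is_allowed(rules, "/index.html"))             # True: no rule matches
print(is_allowed(rules, "/private/notes.html"))     # False: Disallow matches
print(is_allowed(rules, "/private/press/a.html"))   # True: longer Allow wins
```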

2) Can anyone explain how or why the /*?sessionid example would work? Does anyone actually have filenames on their server that include a query string? What's the point, and why is this instruction a useful addition to robots.txt, which is meant to tell spiders/bots where they can and can't crawl?
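A minimal sketch of how a wildcard rule such as /*?sessionid can be evaluated. It assumes the crawler matches each rule from the start of the requested URL's path plus query string (not against filenames on disk), with `*` matching any run of characters and a trailing `$` anchoring the end of the URL. The helper names `pattern_to_regex` and `rule_matches` are hypothetical; this is not Yahoo!'s actual implementation.

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt wildcard pattern into a compiled regex (assumed syntax)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal characters such as '?' and '.', and turn each '*' into '.*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def rule_matches(pattern: str, path_and_query: str) -> bool:
    """True if the rule matches from the start of the path (query string included)."""
    return pattern_to_regex(pattern).match(path_and_query) is not None


print(rule_matches("/*?sessionid", "/products/list.php?sessionid=ABC123"))  # True
print(rule_matches("/*?sessionid", "/products/list.php"))                   # False
print(rule_matches("/*.pdf$", "/docs/manual.pdf"))                          # True
```

On that reading, the example would presumably be aimed at dynamic URLs carrying a session ID, which can otherwise be crawled as many duplicate copies of the same page.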

Thanks. It seems like a lot of noise about some fairly useless proprietary modifications to the standard. (I know GGL and MSN have their own "standards", too, but that doesn't make it less irritating.)

[edited by: StupidScript at 7:27 pm (utc) on Nov. 7, 2006]


Thread source: http://www.webmasterworld.com/robots_txt/3144662.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com