Yahoo! Slurp Now Supports Wildcards in robots.txt


StupidScript - 11:05 pm on Nov 6, 2006 (gmt 0)


Disallow: /*?sessionid

Is that supposed to keep Slurp from crawling URLs it finds hard-coded in links on someone else's site?

AFAIK, this type of dynamic parameter would be appended to the URI during the visit, and not hard-coded into the filename, so I wonder where it would come into play.

I ask because I regularly see various robots arriving via a link that an excited visitor has posted on their own site ... with this type of tracking parameter included, and it messes with a couple of things: bot identification and session management.
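To make the question concrete, here is a rough sketch of how a crawler might test a URL path against a rule like the one above. It assumes that '*' matches any run of characters and that a trailing '$' anchors the end of the URL, with plain prefix matching otherwise. The function names are made up for illustration, and this is not Slurp's actual matching code.

```python
import re

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: '*' matches any run of characters and a trailing '$'
    anchors the end of the URL; everything else is a literal prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*'
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_disallowed(url_path: str, pattern: str) -> bool:
    # re.match anchors at the start of the path, giving prefix semantics
    return robots_pattern_to_regex(pattern).match(url_path) is not None

rule = "/*?sessionid"
for path in [
    "/page.php?sessionid=abc123",    # parameter directly after '?'
    "/page.php",                     # no parameter at all
    "/page.php?id=5&sessionid=abc",  # parameter is not first in the query
]:
    print(path, is_disallowed(path, rule))
```

Under these assumptions the rule is applied to the requested URL itself, so it would not matter whether the parameter was appended during a visit or hard-coded in a link on another site. Note that the literal "?sessionid" only matches when sessionid is the first query parameter.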

<edit>A personal note: I don't mind added functionality, but this strikes me as a little political. Why would Yahoo implement its own set of directives? Why not go through the proper channels and get ALLOW and this type of wildcard use into the standard [robotstxt.org]? It's really irritating when companies roll out their own personal extensions to a standard. It's almost as if they don't care about the infrastructure and just want some press. We should expect to see threads in here along the lines of "Hey ... bots are crawling my site even though I used ALLOW and wildcards to limit them!"</edit>

[edited by: StupidScript at 11:13 pm (utc) on Nov. 6, 2006]


Thread source: http://www.webmasterworld.com/robots_txt/3144662.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com