Forum Moderators: goodroi
Now that Yahoo recognizes wildcards, I have a few annoying inbound links that I'd like to get removed from the index.
The webmaster has appended a referrer parameter on to my domain name, and since that URL gives a 200 response, Yahoo has indexed it.
So the question is:
User-agent: Slurp
Disallow: /?$
Disallow: /*.htm$?
Will this stop this:
Example.com/?ref=anothersite
Example.com/index.htm?ref=anothersite
from being indexed in Yahoo?
Obviously I don't want my home page disallowed, or the files ending in .htm, and I do have some .asp pages containing a query string that I want crawled.
Just to make it clear,
Example.com/
Example.com/index.htm
Example.com/index.asp?id=120
Should be crawled.
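One way to sanity-check rules like this before deploying them is to simulate the matching. This is a rough Python sketch, not Yahoo's actual matcher: it assumes "*" matches any sequence of characters and a trailing "$" anchors the pattern at the end of the path+query, and the two Disallow patterns used here (/*?ref= and /*.htm?) are my own guess at rules that would block the ?ref= URLs while leaving the three URLs above crawlable.

```python
import re

def slurp_pattern_to_regex(pattern):
    """Translate a Slurp-style Disallow pattern into a regex.
    Assumes '*' means "any sequence of characters" and a trailing
    '$' anchors the pattern at the end of the URL path+query."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then restore '*' as '.*'
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile("^" + body + ("$" if anchored else ""))

def disallowed(path, patterns):
    return any(slurp_pattern_to_regex(p).search(path) for p in patterns)

# Hypothetical rules aimed at the ?ref= URLs (not the exact ones from the post):
patterns = ["/*?ref=", "/*.htm?"]

for path in ["/?ref=anothersite", "/index.htm?ref=anothersite",
             "/", "/index.htm", "/index.asp?id=120"]:
    print(path, "->", "blocked" if disallowed(path, patterns) else "crawlable")
```

Under those assumptions, the two ?ref= URLs come out blocked and the three URLs that should be crawled come out crawlable.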
Any advice would be greatly appreciated
Vimes.
Exactly, but they do state that a robots.txt file should also be implemented so these URLs aren't crawled/indexed again.
Delete URL: Delete URL works independently of the other two options. Use it, and pages will continue to be crawled. However, similar to the meta robots tag using noindex, they won't get indexed.
The other two options being the robots.txt file and meta tags. But they don't specify a length of time, and what if they remove my homepage? One of these URLs is the homepage with /?ref=anothersite appended.
Too risky for me at the moment with very little background to go on from Tim and Yahoo. I'd prefer to remove it by the more orthodox method of a robots.txt file, since I've got control over the robots.txt file, and the tool is actually aimed at websites that don't. Danny's post clears up some of the apprehension. This will sound horrible, but I'd prefer to read about somebody else's mistake rather than have my website removed; I'm sure you'll understand that one.
Will this be correct for the robots.txt file:
User-agent: Slurp
Disallow: /?$
Disallow: /*.htm$?
or should that last line be entered like this:
Disallow: /*.htm?$
Will it disallow these:
Example.com/?ref=anothersite
Example.com/index.htm?ref=anothersite
and allow these
Example.com/
Example.com/index.htm
Example.com/index.asp?id=120
Vimes.
For Yahoo Slurp, the "*" wildcard matches any string of characters within the URL, and "$" tells Slurp that the pattern must match at the end of the URL string.
User-agent: Slurp
Disallow: /*.htm?* - tells Slurp to skip all URLs with ".htm?" anywhere in the URL string
Disallow: /*?ref=* - tells Slurp to skip all URLs with "?ref=" anywhere in the URL string
Disallow: /*?ref=example.com$ - tells Slurp to skip all URLs with "?ref=example.com" at the END of the URL string
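To make those three rules concrete, here is a small Python simulation of that matching behaviour. It is my own approximation of Slurp's matcher, not anything official: it assumes "*" means any sequence of characters and a trailing "$" means end of the URL string, and the sample URLs are made up for illustration.

```python
import re

def to_regex(pattern):
    """Approximate Slurp matching: '*' = any characters,
    a trailing '$' = end of the URL string."""
    anchored = pattern.endswith("$")
    body = re.escape(pattern[:-1] if anchored else pattern).replace(r"\*", ".*")
    return re.compile("^" + body + ("$" if anchored else ""))

rules = ["/*.htm?*", "/*?ref=*", "/*?ref=example.com$"]

for url in ["/page.htm?x=1",              # caught by /*.htm?*
            "/page.asp?ref=othersite",    # caught by /*?ref=*
            "/page.asp?ref=example.com",  # caught by /*?ref=* and the $-anchored rule
            "/page.htm"]:                 # matches none of the rules
    hits = [r for r in rules if to_regex(r).search(url)]
    print(url, "->", hits or "allowed")
```

Note that under this reading, the $-anchored rule would not catch something like /page.asp?ref=example.com&x=1, because "?ref=example.com" is no longer at the end of the string.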