Forum Moderators: goodroi


Slurps and wildcards

         

Vimes

3:54 am on Feb 1, 2007 (gmt 0)

10+ Year Member



Hi,

Now that Yahoo recognizes wildcards, I have a few annoying inbound links whose URLs I'd like to keep from being indexed.

The linking site's webmaster has appended a referrer parameter to my domain name, and since that URL returns a 200 response, Yahoo has indexed it.

So the question is:

User-agent: Slurp
Disallow: /?$
Disallow: /*.htm$?

Will this stop these:

Example.com/?ref=anothersite
Example.com/index.htm?ref=anothersite

from being indexed in Yahoo?

Obviously I don't want my home page disallowed, or the files ending in .htm, and I do have some .asp pages containing a query string that I still want crawled.

Just to make it clear,

Example.com/
Example.com/index.htm
Example.com/index.asp?id=120

Should be crawled.
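
To make it concrete, here's a quick Python sketch of how I'm reading the wildcard rules -- treating "*" as "any characters" and a trailing "$" as "end of the URL" -- run against my two Disallow lines and the URLs above. It's only my guess at how Slurp actually matches, which is really what I'm asking:

import re

def naive_slurp_match(pattern, path):
    # My guess: "*" matches any run of characters, and a "$" at the
    # very end of a pattern anchors it to the end of the URL.
    regex = re.escape(pattern).replace(r'\*', '.*')
    if regex.endswith(r'\$'):
        regex = regex[:-2] + '$'
    return re.match(regex, path) is not None

want_blocked = ['/?ref=anothersite', '/index.htm?ref=anothersite']
want_crawled = ['/', '/index.htm', '/index.asp?id=120']

# I only know how to handle "$" at the very end of a pattern, so the
# "$" in the middle of my second line is exactly the bit I'm unsure about.
for rule in ['/?$', '/*.htm$?']:
    for path in want_blocked + want_crawled:
        print(rule, path, naive_slurp_match(rule, path))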

Any advice would be greatly appreciated

Vimes.

goodroi

1:15 pm on Feb 1, 2007 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Another way to do this is with Yahoo Site Explorer. It now allows you to delete URLs on your site from their index.

Vimes

3:07 am on Feb 2, 2007 (gmt 0)

10+ Year Member



Hi,

Exactly, but they do state that a robots.txt file should also be in place so these URLs aren't crawled/indexed again.

Delete URL: Delete URL works independently of the other two options. Use it, and pages will continue to be crawled. However, similar to the meta robots tag using noindex, they won't get indexed.

The other two options being the robots.txt file and meta tags. But they don't specify a length of time, and what if they remove my homepage? One of these URLs is the homepage plus a ?ref=anothersite query string.
That's too risky for me at the moment with very little background to go on from Tim and Yahoo. I'd prefer to remove it by the more orthodox method of a robots.txt file, as I've got control over that file, and the tool is really aimed at sites that don't. Danny's post clears up some of the apprehension, but this will sound horrible: I'd rather read about somebody else's mistake than have my own website removed. I'm sure you'll understand that one.

Will this be correct for the robots.txt file:
User-agent: Slurp
Disallow: /?$
Disallow: /*.htm$?

or should that last line be entered like this:
Disallow: /*.htm?$

Will it disallow these:

Example.com/?ref=anothersite
Example.com/index.htm?ref=anothersite

and allow these:

Example.com/
Example.com/index.htm
Example.com/index.asp?id=120

Vimes.

goodroi

2:23 pm on Feb 2, 2007 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Sorry Vimes, I didn't pay close enough attention.

For Yahoo Slurp, the "*" wildcard stands for any string of characters (at the start or the end of the pattern), and the "$" tells Slurp that the match must be at the end of the URL string.

User-agent: Slurp
Disallow: /*.htm?* - will tell Slurp to skip all URLs with .htm? anywhere in the URL string

Disallow: /*?ref=* - will tell Slurp to skip all URLs with ?ref= anywhere in the URL string

Disallow: /*?ref=example.com$ - will tell Slurp to skip all URLs with ?ref=example.com at the END of the URL string
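
If it helps to see that in action, here is a rough Python sketch of those rules as I described them (just my own approximation of Slurp's matching, not anything official from Yahoo), run against the URLs from your post:

import re

def slurp_match(pattern, path):
    # "*" matches any run of characters; a "$" at the end of the
    # pattern means the match has to reach the end of the URL string.
    regex = re.escape(pattern).replace(r'\*', '.*')
    if regex.endswith(r'\$'):
        regex = regex[:-2] + '$'
    return re.match(regex, path) is not None

rules = ['/*.htm?*', '/*?ref=*', '/*?ref=example.com$']
paths = ['/?ref=anothersite', '/index.htm?ref=anothersite',
         '/', '/index.htm', '/index.asp?id=120']

for rule in rules:
    for path in paths:
        print(rule, path, 'blocked' if slurp_match(rule, path) else 'crawled')

On that reading, the /*?ref=* rule is the one that catches both of your ?ref= URLs while leaving /, /index.htm and the .asp page with its query string alone -- but run it against your own URLs before relying on it.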

Vimes

1:36 am on Feb 3, 2007 (gmt 0)

10+ Year Member



Ah, OK.
Thanks for clearing that up, goodroi.

Vimes.