MSNbot Does Not Support Wildcards

so what will it do when it sees this?


WiseWebDude

3:09 pm on Nov 21, 2007 (gmt 0)

10+ Year Member



Say I have this in my robots.txt file for Yahoo and Google because they DO support wildcards:

User-agent: *
Disallow: /*?p=

What would MSN do with this? Would it just try to match www.example.com/*?p= literally and effectively ignore the rule, or would it stop at the * and disallow the entire site? Here is what the MSN robots.txt validator tells me when I check:

Line #3: Disallow: /*?p=
Warning: MSNBOT doesn't support wildcard characters.
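
For what it's worth, a parser with no wildcard support can be tested directly. The sketch below uses Python's urllib.robotparser, which only does literal prefix matching, as a stand-in for such a bot; whether msnbot falls back the same way is an assumption, not something the validator message confirms.

import urllib.robotparser

# A wildcard-unaware parser as a stand-in for msnbot: Python's
# urllib.robotparser keeps each Disallow value as a literal prefix.
rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /*?p=",
])

# The rule becomes the literal prefix "/*?p=", which virtually no real
# URL path starts with, so it blocks nothing...
print(rp.can_fetch("msnbot", "http://www.example.com/page?p=2"))  # True
# ...and it does not stop at the * and disallow the whole site either.
print(rp.can_fetch("msnbot", "http://www.example.com/"))          # True

Under that (assumed) literal interpretation, the line is simply dead weight for a non-wildcard bot: nothing is blocked, and nothing extra is allowed.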

encyclo

2:52 am on Nov 22, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Don't risk it. Use separate entries for Google/Yahoo and an entry without wildcards for msnbot.
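
For example, something like this (a sketch: the /*?p= pattern is carried over from the question above, and the literal path in the msnbot section is a hypothetical placeholder, since without wildcards you can only disallow fixed prefixes):

User-agent: Googlebot
Disallow: /*?p=

User-agent: Slurp
Disallow: /*?p=

User-agent: msnbot
Disallow: /page.php # hypothetical literal prefix; adjust to your own URLs

Keep in mind that once a bot matches a named User-agent section, it ignores the * section entirely, so each named section needs the full set of rules for that bot.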

blend27

3:27 pm on Nov 23, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I am totally confused here, and it is getting a bit frustrating. Here is why:

1. Several pages have these tags on the page:

<meta NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<meta NAME="ROBOTS" CONTENT="NOARCHIVE">

Yet they are listed when the site:mydomain.tld command is issued.

2. Robots.txt clearly states:

User-agent: *
Disallow: /page1.html
Disallow: /page2.html

Yet both of those links are on the first page when the site:mydomain.tld command is issued.

3. Robots.txt clearly states:

User-agent: *
Disallow: /folder1/

Yet the bot still attempts to crawl pages in that folder, and they end up listed as links, but with no snippet and under HTTPS (see the sketch at the end of this post). These pages also contain:

<meta NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<meta NAME="ROBOTS" CONTENT="NOARCHIVE">

4. A URI that was removed from our site more than 3 years ago, responds "410 Gone" to every request, and has not been linked from our site since is still listed when the site:mydomain.tld command is issued.

5. Robots.txt clearly states:

User-agent: *
Disallow: /ShopingCart.html # Disallowed Since November 20th of 2004

Why is ShopingCart.html listed in results?

6. When the site:mydomain.tld command is issued, the results say 3010 pages. I don't have that many pages and never did.

None of these issues are happening with googlebot or slurp (well, slurp has its own freaky moments too, but not this bad, not like this).

This has all been happening as of a month or two ago.
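
Worth noting on points 2 and 3: Disallow only stops crawling, not indexing. A bot that honors the Disallow never fetches the page, so it never sees the meta noindex inside it, and the engine can still list the bare URL (no snippet) from links found elsewhere. A minimal sketch of that logic, with an illustrative URL and bot name:

import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /folder1/",
])

def crawl(url):
    if not rp.can_fetch("msnbot", url):
        # The page is never fetched, so the meta noindex in its HTML is
        # never seen; the engine can still list the bare URL if other
        # sites link to it.
        return "listed as URL-only reference (no snippet)"
    # Only a fetched page can have its meta robots tags honored.
    return "fetched; meta noindex would be honored"

print(crawl("http://mydomain.tld/folder1/page.html"))
# -> listed as URL-only reference (no snippet)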

WiseWebDude

9:36 pm on Nov 28, 2007 (gmt 0)

10+ Year Member



Blend27, yeah, that would be frustrating. I actually haven't seen ANY disallowed content from our site in MSN, but Yahoo is chock FULL of disallowed content. Also, I went ahead and tested the wildcard effects in MSN, Yahoo, and Google. The site was crawled with no problems at all, so it looks like MSN doesn't mishandle wildcards even though it doesn't "recognize" them (it just ignores those lines). Good news. I sent an e-mail to the Live Search Webmaster Center requesting that they start recognizing wildcards, though, and showed them why it would be good for them and for us... hopefully they will someday.