Our robots.txt contains:
User-agent: *
Disallow: /*?p=
What would MSN do with this? Would it just try to match www.example.com/*?p= literally and ignore the rule, or would it stop at the * and disallow the entire site? Here is what the MSN robots.txt validator tells me when I check:
Line #3: Disallow: /*?p=
Warning: MSNBOT doesn't support wildcard characters.
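For now I am thinking of giving msnbot its own section, since a crawler that finds a group matching its own name is supposed to ignore the catch-all group. A rough sketch (/catalog.html is just a hypothetical placeholder for whatever script actually takes the ?p= parameter):

# msnbot doesn't do wildcards, so give it a plain path prefix instead
User-agent: msnbot
Disallow: /catalog.html

# everyone else keeps the wildcard rule (Googlebot and Slurp are said to handle it)
User-agent: *
Disallow: /*?p=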
Beyond the wildcard question, here is what else I am seeing from MSN:

1. Several pages contain both of these tags:
<meta NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<meta NAME="ROBOTS" CONTENT="NOARCHIVE">
Yet they are listed when a site:mydomain.tld query is issued.
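One thing I plan to try here, in case msnbot only honors the first robots meta tag it finds (that part is just a guess on my part): collapse the two tags into a single combined one, which is valid either way:

<meta NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW, NOARCHIVE">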
2. Robots.txt clearly states:
User-agent: *
Disallow: /page1.html
Disallow: /page2.html
Yet both of those pages appear on the first page of results when a site:mydomain.tld query is issued.
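From what I understand, Disallow only stops a compliant bot from fetching a page; the engine can still list the bare URL if other sites link to it. So if the goal is to have these two pages dropped outright, one workaround I am considering is removing the Disallow lines and letting the bot fetch a noindex tag instead:

<meta NAME="ROBOTS" CONTENT="NOINDEX">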
3. Robots.txt clearly states:
User-agent: *
Disallow: /folder1/
Yet pages in that folder are still being requested, then listed as bare links with no snippet, and under HTTPS at that. (Since robots.txt is fetched per host and scheme, could msnbot be reading a different robots.txt for the HTTPS version of the site?) These pages also contain:
<meta NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<meta NAME="ROBOTS" CONTENT="NOARCHIVE">
4. A URI that was removed from our site more than three years ago, returns "410 Gone" for every request, and has not been linked from our site since, is still listed when a site:mydomain.tld query is issued.
5. Robots.txt clearly states:
User-agent: *
Disallow: /ShopingCart.html # Disallowed Since November 20th of 2004
Why is ShopingCart.html still listed in the results?
6. When a site:mydomain.tld query is issued, the result count says 3010 pages. I don't have that many pages and never did. (Could http/https duplicates and ?p= parameter variants be inflating that count?)
None of these issues happen with Googlebot or Slurp (well, Slurp has its own freaky moments too, but nothing this bad). All of this started within the last month or two.