Forum Moderators: mack
In all cases, I've been able to locate backlinks for these pages... In one case, it was an old affiliate link to what a client had turned into a test page (something he shouldn't have done, but he figured it was a blocked page, so 'why not?'). On another "blocked" page, there was a link to one of our PPC landing pages.
Because Google will index urls to pages blocked by robots.txt if the links to these pages are exposed, I've been using the meta robots tag instead of robots.txt to block such pages...
<meta name="robots" content="noindex, nofollow">
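If you want to confirm that a page really carries the tag before blaming the engine, here is a minimal Python sketch using only the standard library. The class and function names are my own, purely illustrative, and it only reads the generic "robots" meta tag, not engine-specific ones.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None when written without a value.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        for part in attr.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html):
    """Return the set of robots directives found in the given HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(robots_directives(page))
```

A page with no robots meta tag gives an empty set, which engines treat as "index, follow" by default.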
It works on Google. It seems, though, that MSN might be doing things the opposite way, even though they state that they do observe the "noindex":
MSN Live Search - Site Owner Help [search.live.com]
Use metadata tags to control page indexing and link crawling
You can allow MSNBot to crawl your website and still restrict access to specific web pages and documents by using the noindex and nofollow meta tags within the page code. The noindex tag allows the web page to be retrieved by MSNBot, but blocks indexing of its content.
What I'm seeing suggests not only that MSN is not currently observing the robots noindex meta tag, at least not in a way that's consistent with Google's observance... but also that MSN is continuing to have huge problems making quality discriminations among pages and among inbound links. These pages with these links should never be appearing in the index, let alone ranking.
Has anyone else seen this?
Beyond that... and maybe MSN Dude will step in... how can we get our landing pages out of MSN search?
One thing that promotes clarity in these discussions is to distinguish between robots fetching a page (controlled by robots.txt) and search engines listing (indexing) a page (controlled by the on-page meta-robots tag). A common problem is that if the page is Disallowed in robots.txt, then a robots.txt-compliant robot can't fetch it to see the meta-robots tag on that page. In that case, the result is that the page may be listed in the SE index as URL-only or URL-with-link-text if a link to the page is found.
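To make the distinction concrete, here is a rough Python sketch of the logic described above. It uses the standard library's urllib.robotparser to decide whether a compliant robot may fetch a URL. The function name listing_outcome, the outcome strings, the sample robots.txt and the example.com URL are all assumptions made for illustration; real engines will differ in the details.

```python
from urllib.robotparser import RobotFileParser


def listing_outcome(robots_txt, user_agent, url, page_has_noindex, has_inbound_link):
    """Model what a robots.txt-compliant engine can do with one URL."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())

    if not rules.can_fetch(user_agent, url):
        # The robot never fetches the page, so it never sees any
        # meta-robots tag on it. Only the link itself is known.
        if has_inbound_link:
            return "url-only listing possible"
        return "not listed"

    # The page was fetched, so the on-page tag can be read and obeyed.
    if page_has_noindex:
        return "not listed"
    return "full listing possible"


# Hypothetical robots.txt and URL, for illustration only.
ROBOTS_TXT = "User-agent: *\nDisallow: /landing/\n"
URL = "http://www.example.com/landing/page.html"

# Disallowed in robots.txt: the noindex tag is never seen.
print(listing_outcome(ROBOTS_TXT, "msnbot", URL, True, True))
# No robots.txt block: the noindex tag is fetched and can be honored.
print(listing_outcome("", "msnbot", URL, True, True))
```

The point of the sketch is the first branch: once the fetch is disallowed, the value of page_has_noindex no longer matters.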
Jim
...A common problem is that if the page is Disallowed in robots.txt, then a robots.txt-compliant robot can't fetch it to see the meta-robots tag on that page. In that case, the result is that the page may be listed in the SE index as URL-only or URL-with-link-text if a link to the page is found.
Jim - Thanks....
Yes, your latter point is something that needs to be emphasized to webmasters. Using the meta robots tag and then obscuring it with robots.txt is a common problem. In my experience, the subject has led to several debates with clients' webmasters, where I've wanted to use the meta robots tag and have asked the webmaster to drop the robots.txt. It frightens them.
Unfortunately, the situation with MSN right now is not making matters easier. In the case I cite in my post above, I've specifically "been using the meta robots tag instead of robots.txt to block such pages..."
And what to do if Google does it one way and MSN chooses to do it another? The engines do need to get on the same page about this (no pun intended). I know they all talk to each other. This needs to go at the top of their list, and MSN needs to fix its problems immediately.
And again, how in the world did they ever decide that these pages were worthy of ranking?
> how in the world did they ever decide that these pages were worthy of ranking?
Unique content? :)
As to your question of what to do if they do things differently or are broken, either live with it or maybe cloak the pages with a password required for robot user-agents. No intent to deceive -- just keep out, thanks.
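A bare-bones Python sketch of that idea, in case it helps: decide the HTTP status from the User-Agent header and answer 401 to anything that looks like a robot and has not supplied credentials. The token list, the function name and the has_valid_credentials flag are assumptions for illustration only. User-Agent strings can be faked, so treat this as a polite "keep out" sign rather than security.

```python
# Substrings assumed to identify major crawlers; adjust to taste.
ROBOT_TOKENS = ("googlebot", "msnbot", "slurp")


def status_for(user_agent, has_valid_credentials=False):
    """Return the HTTP status to send for a request to a protected page."""
    ua = (user_agent or "").lower()
    is_robot = any(token in ua for token in ROBOT_TOKENS)
    if is_robot and not has_valid_credentials:
        # Robots are asked for a password they do not have.
        return 401
    # Ordinary visitors get the page as usual.
    return 200


print(status_for("msnbot/1.0 (+http://search.msn.com/msnbot.htm)"))
print(status_for("Mozilla/5.0 (Windows; U; Windows NT 5.1)"))
```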
Jim
One page in particular is *not* a page with anything at all, it's a redirect URL for an affiliate link through a popular aggregating service...
Marcia - Not sure exactly what you're referring to here. I should have said "url," not "page," with regard to ranking... but are you suggesting that this is akin to the old 302 "hijacking" problem, and that it's a click-tracking page that's ranking?
If I remember correctly how those looked, it's not the same... since the result is clustered with another of our pages, not one of the linking site's pages.
xyz.example.com
The URL under that reads:
xyz.example.com/robots_txt_excluded_subdirectory/jump.php?sid=12345678abcdefg...
It's a long tail search that's got only a few hundred pages returned, but it's so way off it's as though you were searching for imported canned kumquats and when you clicked on that link you arrived at a page selling snowplows.
are you suggesting that this is akin to the old 302 "hijacking" problem, and that it's a click-tracking page that's ranking?
But yes, this is a link that's going through a tracking page on a third party site, and the link that MSN's crawler grabbed and listed is JAVASCRIPT.
[edited by: Marcia at 9:33 am (utc) on Mar. 11, 2007]
xyz.example.com/robots_txt_excluded_subdirectory/jump.php?sid=12345678abcdefg...
It's a long tail search that's got only a few hundred pages returned, but it's so way off it's as though you were searching for imported canned kumquats and when you clicked on that link you arrived at a page selling snowplows.
Marcia - Yes, that "jump.php..." does remind me of the good old days of 302 "page-jacking."
That's not what I think I'm seeing at all, except that MSN may be badly behaved.
The results I'm talking about are for searches that return about 2 million and 25 million pages respectively in Google, and about 100,000 and 1.5 million in MSN (interesting difference). Nothing long tail about these. It took a year or two or three to achieve the results we have on Google, with a fair number of genuinely good links. It's unbelievable that on the basis of one tracking string link, or a spidered major search engine ad, MSN would put these in the top 10.
The results are spot on for the site, albeit they're not the pages (or urls) you'd expect to rank, particularly since we've got a meta robots noindex tag on the pages.
The serp listing is our domain as the title line, and then below it simply the urls that a few linkers happened to use... in this case either an old affiliate link to us...
domain.com/pagename?XYID=F3456q789 on one...
...or, in the other, the url of one of our pay per click landing pages (with no tracking string), which someone who found us through an ad used in an article.
So, I'm not seeing it as the same situation, except as an example that MSN has got some cleaning up to do.
> It scares them.
If bandwidth consumption by 'bots is a concern, then it might legitimately do so...
Jim - Talking about the habits of clients wanting to keep their robots.txt. I think it's more the departure from the norm that frightens them. Many are willing to add the robots meta but argue a lot about dropping the robots.txt. It can't be bandwidth... we're generally not talking about that many pages.
As I said, examples like this don't help. It's hard to explain to a marketing manager, particularly if you'd argued with his IT guy to get him to drop the robots.txt.
either an old affiliate link to us...
domain.com/pagename?XYID=F3456q789 on one...
I've always seen occasional affiliate URLs crop up at MSN, and they'll rank for the search term too, with whoever the link originates from ending up getting paid commissions on the sales.
All around, MSN really needs to work very HARD on how they handle redirects, both 301 and 302.