|Live has hijacking problems with 302's and Cold Fusion|
The wrong URLs, page titles and description snippets are being listed in the SERPs. The pages the links are on, which are the pages that were crawled and should be indexed, are not being indexed or cached. The root URL of the target site is used as the page title, the anchor text is used as the snippet, and instead of the actual page being listed and linked to, the URL of the link is listed and goes to the linked-to page. You never get to see the page that was crawled.
It's ending up hijacking tons of pages.
MSNdude's stickymail is full, so details can't be given to them, but it's happening with LOADS of pages that are having their listings hijacked - including all the links from a certain affiliate network that uses Cold Fusion technology on IIS.
I can't believe the problem is because it is ColdFusion (or IIS for that matter). It must be the 302s of the hijacker regardless of the platform.
Hey ho... the others have been through 302 hell. I guess it was their turn.
I've seen an increasing number of hijackings too. Doesn't seem to be CF specific though.
MSN has serious issues now with their fake referrals, missing homepages and others.
Oh you have got to be kidding, now we have to worry about MS getting duped by that.
Honestly I think the engines need to treat a 302 redirect as a NO INDEX since it's a "temporary move". That will fix that.
|It must be the 302s of the hijacker |
LOL.. there isn't any hijacker, it's me! This is happening with links I myself put up on my own pages. Instead of the page it's on being indexed, the URL of my page is gone from the SERPs and the URL of the link inside the <a href= is being indexed and replacing my page.
The only upside is that it's only happening on some, not all, and once it's figured out why some are affected and not others, it's adding a layer of transparency to the algo that can be utilized, or seriously exploited, depending on whose hands it falls into.
[edited by: Marcia at 7:36 am (utc) on Nov. 23, 2007]
Marcia, is this when you use <cfheader with statuscode="302" or <cflocation?
blend27, my pages are just plain old static HTML pages. But they are also indexing the URLs for some banner/ppc ads AND Adwords Adlink URLs - not the target URL, the actual Adwords URL with www.google.com as the page title and with google.com in the URL they're listing.
They seem to be getting mixed up in how they handle the anchor text on links when they fetch the URLs.
Too bad MSN doesn't monitor forums to see when there are bugs they need to take care of that are on their end, not the users' and not the webmasters'.
Just a follow-up: it's still happening (Yahoo also indexing those URLs).
I've had someone quite "tech-savvy" (to say the least) check out those URLs and his verdict is that it's a 302 problem with those URLs showing up in the Live SERPs.
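For anyone who wants to run the same kind of check on their own outbound links, here is a minimal sketch that asks a URL for its first response only, without following the redirect, and reports the status code and Location header. The function names `first_hop` and `describe` are made up for illustration, and the wording in `describe` is the conventional reading of the status codes, not a statement of how Live actually handles them.

```python
import http.client
from urllib.parse import urlsplit


def first_hop(url, timeout=10):
    """Return (status, location) for the first response only.

    http.client never follows redirects, so a 302 is reported as a 302
    instead of being silently replaced by the final page.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def describe(status):
    """Conventional meaning of the redirect codes (not Live's actual logic)."""
    if status == 301:
        return "permanent redirect: the target URL is normally the one listed"
    if status in (302, 307):
        return "temporary redirect: the engine must choose which URL to list"
    if 200 <= status < 300:
        return "no redirect: the URL serves a page itself"
    return "other status"
```

Calling `first_hop()` on a link URL copied out of the SERPs and passing the status to `describe()` shows whether that link answers with a 302, which is the case the verdict above points to.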
I tried a jump script, but they indexed the jump.php? redirect URL verbatim instead of a web page.
So what's the alternative? Give humans a page with links but give MSN pages without links, just the same text, so they won't index the <a href's instead of indexing web pages like search engines usually do?
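To make clear what a jump script of this kind does, here is a rough sketch of one. It is written in Python as a stand-in for jump.php, and the `TARGETS` table, the `id` parameter and the example.com address are all hypothetical. It answers every known link ID with a 302 and a Location header, so a crawler fetching the jump URL gets no page content at all, only a pointer to somewhere else.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit, parse_qs

# Hypothetical lookup table: link ID -> destination URL.
TARGETS = {"1": "http://www.example.com/offer"}


class JumpHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        parts = urlsplit(self.path)
        link_id = parse_qs(parts.query).get("id", [""])[0]
        target = TARGETS.get(link_id)
        if parts.path != "/jump.php" or target is None:
            self.send_error(404)
            return
        # 302 = temporary redirect; the response has no body of its own.
        self.send_response(302)
        self.send_header("Location", target)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def make_server(port=0):
    """Bind the jump handler on localhost (port 0 picks a free port)."""
    return HTTPServer(("127.0.0.1", port), JumpHandler)
```

Running `make_server(8000).serve_forever()` and requesting `/jump.php?id=1` returns the 302. Since the jump URL has nothing to index except its redirect, which URL ends up listed is decided entirely on the engine's side.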
|I tried a jump script, but they indexed the jump.php? redirect URL verbatim instead of a web page. |
I had posted a couple months ago in another thread how LIVE has removed my HTML pages from their database and instead is showing a bunch of jump cgi, even though I explicitly disallow any and all indexing of my cgi-bin via my robots.txt file.
So they display exactly what I don't want, and removed what I DO want!
What's amazing to me is that this is Microsoft, not some shoestring budget startup by a couple people in a basement somewhere. Incredible.
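For reference, a disallow rule of the kind described can be sanity-checked with Python's standard `urllib.robotparser`. The rule text and the example.com URLs below are assumptions, not the poster's actual file, and `msnbot` is used as the crawler name. The check only confirms what a compliant crawler may fetch. Disallow is a crawl directive, so it does not by itself say whether a URL discovered through links can still be listed.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule text, not the poster's actual robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""


def is_crawl_allowed(robots_txt, user_agent, url):
    """True if a compliant crawler with this user agent may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs on a placeholder domain.
jump_url = "http://www.example.com/cgi-bin/jump.cgi?id=1"
page_url = "http://www.example.com/widgets.html"

print(is_crawl_allowed(ROBOTS_TXT, "msnbot", jump_url))
print(is_crawl_allowed(ROBOTS_TXT, "msnbot", page_url))
```

If the first check comes back as not allowed and the jump URLs are listed anyway, the rule is being read correctly and the listing is coming from somewhere other than a fetch of that URL, or the crawler is not honoring the file.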
|I had posted a couple months ago in another thread how LIVE has removed my HTML pages from their database and instead is showing a bunch of jump cgi, even though I explicitly disallow any and all indexing of my cgi-bin via my robots.txt file. |
I don't recall seeing that, but it's actually worse. Obviously, there's something drastically wrong somewhere in the process, between fetching and storing outlinks off pages and indexing the anchor text and hyperlinks of those verbatim, and not at all dealing correctly with 302's.
|So they display exactly what I don't want, and removed what I DO want! |
Not only that, but how about the user experience? People click on a page when they search to see what's on the page, not some URL they're redirected to from clicking on a search result. Someone could spend a week creating content, and the user will never see it. Maybe there's a CHOICE on the page for the user to make or be helped with, or CONTENT that they'd benefit from reading. Instead they're being hijacked by the search engine, and not being allowed to see what they should see.
[edited by: Marcia at 7:10 am (utc) on Jan. 8, 2008]