|Prominent site links to mine via a 302 redirect and takes over|
Hi guys, I have to restart an older thread of mine in which I can no longer post. I have more data now and want to continue the discussion.
Some time ago (approx. 6 months) I noticed that one of the most prominent sites in the industry had started linking to sites (including mine) via a redirect script that returns a 302 header. Upon further investigation, it actually returns 302 TWICE due to what seems like a programming error. From the site's page they link (I assume by mistake) to a non-www version of the redirect script, then their .htaccess rules kick in and 302 it into the www (canonical) version of the redirect script, and only then does it 302 redirect to my site's home page. I cannot pin down the timing because this has been going on for a year or so and I didn't notice quickly enough, but my site has lost all its PR (PR4 to PR0) and Google traffic is about 1% (that's right, 1/100th) of what it was a year ago.
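For anyone who wants to see a chain like that hop by hop, here's a minimal, self-contained sketch. It spins up a local test server that mimics the double 302 described above (the `/go`, `/www/go`, and `/target` paths are made up for the demo; this is not the prominent site's actual script) and then follows the redirects one request at a time so every status and Location header is visible:

```python
import http.client
import http.server
import threading

# Local stand-in for the chain described above:
#   /go      -> 302 to /www/go   (their .htaccess "canonical" redirect)
#   /www/go  -> 302 to /target   (their outbound link-tracking redirect)
# All paths are hypothetical, for the demo only.
class RedirectDemo(http.server.BaseHTTPRequestHandler):
    ROUTES = {"/go": "/www/go", "/www/go": "/target"}

    def do_GET(self):
        if self.path in self.ROUTES:
            self.send_response(302)
            self.send_header("Location", self.ROUTES[self.path])
            self.end_headers()
        else:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"final page")

    def log_message(self, *args):
        pass  # keep the demo quiet

def trace_chain(host, port, path):
    """Follow redirects one hop at a time, recording (status, Location)."""
    hops = []
    while True:
        conn = http.client.HTTPConnection(host, port)
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()
        conn.close()
        if resp.status in (301, 302):
            path = resp.getheader("Location")  # relative paths only, for the demo
            hops.append((resp.status, path))
        else:
            hops.append((resp.status, None))
            return hops

server = http.server.HTTPServer(("127.0.0.1", 0), RedirectDemo)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

hops = trace_chain("127.0.0.1", port, "/go")
print(hops)  # two 302 hops before the final 200
server.shutdown()
```

Against the live site, `curl -sIL <url>` shows the same thing: the status line and Location header for each hop in the chain.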
In addition, I have just found a rather alarming detail: I link to another site of mine (site #2) from the homepage of the suffering site (site #1). In the Webmaster Tools link page I now see the prominent site's redirect URL as the page that links to site #2, not the site #1 homepage that actually carries that link.
This greatly concerns me. I thought all 302-related issues had been fixed by Google more than a year ago, but I guess some of them still remain? In the reply I got when I first posted about it, Robert Charlton was kind enough to suggest that the SERP snippet may be all that's affected, but that no longer seems to be the case given this link reporting issue.
I am not saying this is the sole reason for the traffic drop, simply because there may be a compounding effect of several issues. But, needless to say, I've been over every reason I could think of and have already fixed everything I could find (like canonical URLs and such).
What would you guys suggest I do, considering that the chances of the prominent site changing the way they link out upon my request are slim to none?
Since they probably aren't going to change their script just for you, I would write a script that takes any incoming link from their site to its own page, then put a direct link to yours from that page.
It only takes a few minutes to write a polite note asking them to change or even remove the link, so be sure to do that even if your confidence is low that they'd change anything.
They should be made aware of the concern because if their method of redirecting is truly the cause of your problems it's likely happening to others as well.
Good luck with this.
I'm not sure that's technically possible: for a user with a browser it works itself out after a quick series of redirects.
For Googlebot though: how do I catch that Googlebot is following a link from a certain page? Googlebot does not supply referrer. Have I completely missed the point?
I agree with you 100%, but I want to get the facts together because, frankly, you can never be sure with Google about the cause of a problem.
So, do you think a redirect like that constitutes a problem? Because if that's the case, I can count at least half a dozen less important sites that link to mine this way. Clearly, this has to be VERY widespread. Also, there is no malicious intent on their part (I have no reason to suspect any, anyway). I assume they just want to count clicks out of the site.
My dilemma is: I can say "remove the link" and lose the referred traffic (not a huge amount, but considering the lack of Google traffic, I can use it), yet there is no guarantee it will fix any other trouble with the site.
Again, I'm trying to establish whether there is a consensus among the users of this reputable forum about the negative effect of 302 redirects, so I can bring it to the attention of the site's owner(s).
The 302 redirect is very bad news... unless those redirect script URLs are disallowed by the robots.txt file on the other site.
>> Googlebot does not supply referrer. Have I completely missed the point? <<
Google does not follow links from site to site like a person with a browser would.
Their system spiders a page and adds any new links that are found to a central database.
Their spidering system works through that list retrieving those URLs in turn.
If it's a legit site, and it sounds like it is, contact them by phone or email and explain the issue to them. They shouldn't be doing what they're doing. They should either use a 301 or should have the redirect script banned in the robots.txt file.
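If they want to keep the 302 for click counting, the robots.txt fix is a one-liner. Assuming (hypothetically) their redirect script lives at /out.php, something like:

```
User-agent: *
Disallow: /out.php
```

With the script disallowed, Googlebot never fetches the redirect at all, so there's no 302 for it to misattribute.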
It also seems to me that their own canonical URL fix should be sending a 301, not a 302. If you can confirm that this is true (and important), then maybe doing them the favor of telling them this is the grease that will get them to revisit their linking methods.
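For the canonical fix itself, the change is just the redirect code. A rough sketch of the usual mod_rewrite rule for this (example.com is a placeholder, and their actual .htaccess may differ):

```apache
RewriteEngine On
# Send non-www requests permanently (301) to the www host,
# instead of the temporary 302 they're issuing now
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```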
If that fails, you can always go medieval on them by documenting everything carefully and then sending a C&D notice.
Thanks for the great idea! Yes, of course 302 redirecting for canonical URL correction is setting yourself up for trouble, and these guys are doing just that. This is what I'm going to write to them about. I have checked what I could, and they do not yet appear to suffer from any of the penalties I know of (950, 30), but we all know that the road from a lively site to ruins is very short indeed.
Some additional info about the matter in case someone also suffers from a possibly well-meaning referrer:
#1 double 302 seems to be a part of the problem, as in
non-canonical ->302 to canonical ->302 to your site
#2 their second (canonical, www) 302 redirect page actually returns a 302 header AND about 600Kb(!) of HTML, which renders as their ENTIRE link library. It certainly looks like a programming error (the link redirect script seems to be home-made); I still don't believe there's foul play. However, I would question whether there is any use at all in returning HTML along with a redirect header.
#3 the redirect script is NOT excluded using robots.txt
#4 High PR (PR6) may be exaggerating the problem because less important and lower PR sites do not seem to take over
So, it looks like if you have a combination of some or all of these factors in play, 302 takeover can still take place.
1script, good point. I didn't think about that.
I really wish someone would come out with a "fix" for this type of redirection. We fight them all the time. Between them and the proxy servers, webmasters are spending so much of their time playing goalie rather than actually working on their sites.
I certainly hope no proxy ever gets PR6 or higher. The higher the rank/trust Google considers the redirecting site to have, the worse the problem must be for the sites being linked to. In addition, proxies should have a problem with the trust factor, since they don't have any real content at all, but the prominent site I was referring to above does have a ton of original content.
So, yes, technically this looks like what a malicious proxy owner might have done but fortunately there seem to be factors in play that would prevent a proxy (or at least an "ordinary" proxy) from achieving much using that technique.
I also have a link from a page on some Malaysian university website that links to me like that; it has some content but looks borderline spammy. Those guys failed to take over. However, on second thought, maybe they failed because they had to compete with THIS prominent site, which won. Scary thought, man. I would certainly hope it's not that easy.
Hopefully you'll be able to get through to the person in charge of this stuff and hopefully he will be able to understand the issues involved.
A month later, I see more sites creeping up to replace my pages. The site I started posting about is still there, higher than the others, but there is at least one more that is permanently taking the place of mine, and another that goes in and out.
Here is how it looks: I'm doing a search for the site's own domain name in quotes: "mysite.com". My own site is nowhere to be found, but there are titles on the first page that are my pages' titles.
Basically, you see my page's title and a description snippet but THEIR URL. In one case my page opens up after clicking on the Google SERP, but the one at the top even opens a page on their own site, meaning they have completely taken over.
I cannot get through to either of the site owners. It could be they've set this up with malicious intent or they simply don't understand implications or don't care or all of the above.
So, the question still remains: is there anything at all I can do from my end to avoid such 302-redirect takeover?
>A month later I see more sites creeping up to replace my pages.
I'll raise my hand and verify that 302 hijacking is still happening in the Google SERPs. And not only by prominent sites; I've seen some with affiliate tracking URLs indexed with the title & description from the hijacked site. And those aren't prominent domains, they're dummy domains (abcdxyz.com) used only for network tracking.
302 hijacks are still alive and kicking.
I hijacked myself with a 302 last year - after buying a domain name for my web store (a subdirectory of my supplier's domain) I found out I could only do a 302 redirect from that registrar. For quite a while I had BOTH my domain and the real storefront ranking on the first page of search results, then the storefront eventually dropped out, leaving the 302 domain name that doesn't even have its own website. Works for me either way; it was a bonus when I had two spots.
|Works for me either way; it was a bonus when I had two spots. |
Lots of people think it's a bonus, but content appearing under more than one url can lead to major problems. It splits your inbound link vote, so you might end up having two results showing up on page 3 rather than one on page 1. Ultimately, Google will rank just one url or the other, and it may not be the url you would choose.
It's also much better to be ranking on a domain you own rather than on a subdomain of someone else's site, where you don't control server access and aren't able to do redirects.
I guess I am still not clear about why this is happening at all. A 302 is supposed to be a temporary redirect. So where is the logic in putting a temporary location in the place of the final, permanent one? Is this strictly a bug, or is there some legitimate logic in Google's treatment of 302 redirects?
I've never understood why Google has had such a long-term problem with the cross-domain 302 hijack. The root of it is that they don't apply one ironclad rule: always index the target url of a cross-domain 302. Instead, they make exceptions.
Matt Cutts discussed some reasons for this (over two years ago!) in this blog post [mattcutts.com]. He uses an example of the SF Giants and mlb.com to illustrate why Google sometimes makes an exception. But apparently the logic for deciding those exceptions leans toward stronger sites. Matt summarized:
|Now you see the trade-offs. Go with the destination 100% of the time and you’ll get some ugly urls (but never any hijacking). On the other hand, if you sometimes return the source url you can show nicer urls (but with the possibility of source pages showing up when they shouldn’t). |
At the time Matt wrote his blog post, 302 hijacking had already bedevilled us for several years. And it is still frustrating today, even if it's not quite so widespread as it once was. There must be a more complex issue involved than I can see.
It seems that the search experience of the average Google end user is more important than the harsh effect on website businesses that end up as casualties. OK, I get that - it's long term thinking. It focuses, above all else, on the core business issue of keeping a maximum number of end users happy -- and website owners are not, exactly, a typical end user.
But after all these years, the 302 hijack problem is also getting to be quite long term. Even if you diversify so you do not depend 100% on Google, the sudden appearance of a 302 hijack can still hurt. I really hoped that Google would nail this one by now.
I say give the SF Giants an ugly url, if that's what it takes. They will deal with it, but the average website owner is often left without much recourse.
Speaking of recourse, you can report bad search results through the "Dissatisfied? Help us improve" link at the bottom of the SERP, and also through a reconsideration request. Even if you don't get a reply, Google tells us that someone always reads the feedback - so give them feedback. Just keep rants (like this one) out of it ;)
Thanks, Ted. I have already reported the search result that gets my title and description but forwards to their site. It's been a few days, so we'll see if they do respond and take any action at all.
As far as Matt's explanation about ugly URLs goes, I don't buy it, because the one that comes up is much uglier than mine. Mine is static HTML; theirs is dynamic with a bunch of parameters.
I do believe it indicates that something is wrong with my own site, too (a -30 penalty or something like that is in its 17th month and counting :-( ). The penalty made it easier for another site to take over, so it's just another penalty-related problem for me to agonize over.
|I do believe it indicates that something is wrong with my own site, too (a -30 penalty or something like that is in its 17th month and counting :-( ). The penalty made it easier for another site to take over, so it's just another penalty-related problem for me to agonize over. |
Yes, I think you are right. For me what you describe is usually only a problem with a new or penalized page - something with very weak ranking power on its own. So it may be more of a symptom of your root problem rather than a cause.
It's the same old drill to fix - get as many natural (or at least natural looking!) links as you can to that page from high ranking sites, preferably in your niche, and clean up anything spammy on your site.
The "weakened site" theory makes a lot of sense - Matt also blogged about that in relation to proxy server hijacks. Restoring trust can be a difficult job, especially if you aren't clear about how the loss of trust came about. But especially in cases like the -30, removal almost always takes a manual action. The algo will not remove it automatically once it's in place.
|webmasters are spending so much of their time playing goalie than actually working on their site. |
So true. As a less talented webmaster, I am facing challenges I have little confidence in overcoming while content and development stagnate.
>>The "weakened site" theory makes a lot of sense - Matt also blogged about that in relation to proxy server hijacks. Restoring trust can be a difficult job, especially if you aren't clear about how the loss of trust came about.
Was there something specific posted about how to restore trust when sites get killed by proxy hijackers?
>>But especially in cases like the -30, removal almost always takes a manual action. The algo will not remove it automatically once it's in place.
So would it take submitting a complete case history with the request, including telling which sites did the hijacking?
|Was there something specific posted about how to restore trust when sites get killed by proxy hijackers? |
Not that I saw, Marcia. Matt just seemed to be saying that a site's ranking positions were not likely to be hijacked unless the site was previously weakened somehow in Google's scoring. As I remember, he was speaking specifically about proxy server urls in that case, not 302s.
|So would it take submitting a complete case history with the request |
In the case of a -30 penalty, yes I think so. But not necessarily with a 302 hijack. A quick summary would probably get better results in that case.