Forum Moderators: Robert Charlton & goodroi
I used the email address provided by GoogleGuy and included ‘canonicalpage’ in the subject line. Also included in each email were the specifics and details of the URLs redirecting to my sites, as GoogleGuy suggested.
Here are the replies from Google:
Reply 1:
“Thank you for bringing this to our attention. We understand your concern about the inclusion and ranking of your site. Please note that there is almost nothing a competitor can do to harm your ranking or have your site removed from our index.”
Reply 2:
“If you are concerned about another site linking to your site, we suggest contacting the webmaster for the site in question.”
The two separate replies contradict each other. The first reply is an outright denial that your site can be damaged by a competitor, though you could argue that “almost nothing” is a disclaimer.
The second reply from Google suggests that you can be harmed by a competitor and that the onus is on you to get the redirects taken down by the hijackers.
Unfortunately, in my case the Romanian and Russian sites (which are intentionally using 302 redirects, my title and description in their URLs, and a cached version of my page) don't answer their email, and neither do their hosting companies.
If only 30 examples were submitted, then why should they really believe the problem is widespread?
I find that one nearly impossible to believe. I damn sure sent one. I know somebody else who did too, not at my urging. I find it hard to believe that there's a 50-page thread at WW, it's discussed in every other forum as well, there are literally hundreds of reports within the SEO community, and yet only 30 people sent an email to the special and secret email address that GoogleGuy set up for us.
Here's my impression of GoogleGuy's mail filter
[root@google]# more /etc/procmailrc
:0:
* ^Subject:.*canonicalpage
/dev/null
surely there are at least 10 readers here who understand the problem and are qualified to write code of good quality.
so how come google does not just *302* some volunteers from this forum to work on this problem and this problem only. hey, 1000 treasury shares is as good as cash :)
the qualification process would be as simple as someone on the inside reading through every single thread here, looking for the people who, from the outside, seem to be on the right track. imagine what they could do with the source code of the relevant classification system in front of them.
hey you, yes you! a phd is not a prerequisite for exercising common sense and writing good code.
++
hey you, yes you! a phd is not a prerequisite for exercising common sense and writing good code.
No arguments there. Not only have I worked with a bunch of them, I've even managed a few, and they are horrible at meeting deadlines because they overthink and over-analyze everything.
Not that they are all bad, a couple I've known were truly genius, the rest were just book smart.
When other sites' URLs appear in Google SERPs via a 302 redirect and contain my copyrighted information in their cache, have I been hijacked?
If clicking on the link in Google takes the user to your site, then your site has not been hijacked.
There are some people whose sites have genuinely been hijacked. Unfortunately, a lot of people who don't understand the issues are seeing the "302 incorrect snippet problem" affecting their site and saying their site's been hijacked when it hasn't.
When Google indexes a porn site and includes my domain name in the snippet, should I be the least little bit concerned?
I wouldn't be. If you don't like it, Google offers a URL removal service.
[edited by: mrMister at 10:47 am (utc) on Mar. 24, 2005]
mrMister, don't take this personally, but both statements in msg #41 are totally wrong. They're not even close to being accurate. Perhaps you have your own particular reasons not to be worried about the last one, but those are your personal reasons, and they are not valid for the rest of the webmaster community.
>> Slashdot thread
Did anybody besides me notice this:
one engineer proposed a way that might help these sites, and he's got a testset of sites that would be affected by changes in how we canonicalized urls. A few of us have been looking through it to see if we can improve things, but please know that this is not a wildfire issue that will result in the web melting down.
and this:
We are collecting data from user support to build up a testset for checking any changes we want to try.
So, they're working on it :-)
>> If clicking on the links takes the user to your site, then it's not a hijack.
>> I wouldn't be. If you don't like it, Google offers a URL removal service.
mrMister, don't take this personally, but both statements in msg #41 are totally wrong. They're not even close to being accurate. Perhaps you have your own particular reasons not to be worried about the last one, but those are your personal reasons, and they are not valid for the rest of the webmaster community.
I've re-read those statements and they are both correct. I'll re-state them in case there's any confusion.
A hijack only occurs when a user goes to a different site than the one named in the title of the Google SERPs. An incorrect snippet is not a hijacking. They are two separate (albeit related) issues.
I don't understand why you think those statements are wrong. Could you please explain why they are wrong?
You can request your URL to be removed with the Google URL removal tool if you are concerned about the incorrect snippet.
If URLa did a 302 redirect to URLb, then this is a temporary redirect. URLa is saying that the content temporarily resides at URLb. There is no reason to include URLa in the search results though. Google could quite easily include URLb in the results with its associated content being cached and indexed.
...and even passing the PR of URLa to URLb.
Exactly my thoughts..
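The 302 mechanics being discussed above can be sketched with a tiny local HTTP server: URLa answers with a 302 and a Location header naming URLb, which is URLa's way of saying the content temporarily resides at URLb. This is only an illustration; the hostname in the Location header is made up.

```python
# Sketch of 302 ("Found", i.e. temporary redirect) semantics.
# The redirecting URL (URLa) answers with status 302 and a Location
# header pointing at the target (URLb). "example-b.test" is invented.
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # URLa saying: "the content temporarily resides at URLb"
        self.send_response(302)
        self.send_header("Location", "http://example-b.test/page.html")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet

server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("GET", "/redirecting-url")
resp = conn.getresponse()
print(resp.status)                  # 302
print(resp.getheader("Location"))   # http://example-b.test/page.html
server.shutdown()
```

Nothing in the 302 response itself tells a crawler to keep URLa in the index; which URL ends up in the SERPs is purely the search engine's canonicalization choice.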
But here is what GG says about it:
"
Here's the skinny on "302 hijacking" from my point of view, and why you pretty much only hear about it on search engine optimizer sites and webmaster forums. When you see two copies of a url or site (or you see redirects from one site to another), you have to choose a canonical url. There are lots of ways to make that choice, but it often boils down to wanting to choose the url with the most reputation. PageRank is a pretty good proxy for reputation, and incorporating PageRank into the decision for the canonical url helps to choose the right url.
So they blindly follow their PageRank reasoning, even when it defies logic.
As many pointed out, the redirected site (the one doing 302 to another place) should be out of serps.
Period.
You can request your URL to be removed with the Google URL removal tool if you are concerned about the incorrect snippet.
Some web sites have thousands of pages that have been scraped.
It's not anyone's job to police and correct Google's SERPs.
Google needs to get their (publicly traded) s**t together and fix this problem internally.
"Read the post properly next time."
He quoted you correctly; you re-quoted yourself wrong.
LOL. fair point, I mis-read his post! I apologise to Buckworks for that.
He's still talking twaddle though
That episode cost me several thousand visitors a day, for weeks. It was only fixed because a friend far more technically skilled than I wrote some code for my .htaccess file that would feed Googlebot a 404 if it followed the link from the redirecting site. Rankings and traffic returned to normal after Google picked that up.
The funny thing with him is that a bit of "snake oil" cured his problem.
the .htaccess fix could not have solved his problem, because Googlebot does not send an HTTP Referer header when it crawls, so there is no way for the server to tell that it "followed the link" from the redirecting site.
His solution is technically incapable of solving the hijacking problem. Therefore, it sounds to me as if something else was the cause of his loss of traffic.
Either that or the technical guy's code worked as a placebo!
If his technical guy tells him the emperor is wearing new clothes, who am I to disagree!
Yes of course, no problem. Basically, what you are referring to is a hijack, no doubt about that. This is the extreme case in which the clickstream of the searcher is getting hijacked (perhaps we could call this a "user hijack"). But, where you are wrong is in saying that this is the only thing that is a hijack.
It's sort of like this: if i lend you money, that money is no longer in my pocket. You can choose to pay me back, or you can choose not to. No matter what you choose later on, from the moment i have lent you the money, that money is no longer in my pocket. In this example, "hijack" refers to "my money being out of my pocket" - it does not refer to what you intend to use the money for, or whether you will ever pay me back.
Now, if we go back to the SERPs, it's not that the snippets are wrong. The snippets are the right ones, and so is the page size, the headline, the SERP position, and the Google cache. The only thing that is wrong is the URL used for the individual result.
It's a multi-step process where the thing you define as a hijack is only one of several possible outcomes, sort of like this:
While step four is optional, the other steps are not. Although it is optional, it does indeed happen, and i do agree that this is the worst case.
It is not the only case, as hijacking (as defined by "hijacking the URL of another web page in the SERPS") is damaging in the other cases as well. Not all of them will be damaging to the searcher, and not all of them will be damaging to all webmasters, but all are part of this hijacking issue. The hijack is established in step one above, regardless of later outcome.
This whole chain of events can be executed either by using a 302 redirect, a meta refresh with a zero second redirect time, or by using both in combination.
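Of the two mechanisms just mentioned, the zero-second meta refresh lives in the page markup rather than the HTTP response, so it can be spotted by parsing the HTML. A minimal sketch, with an invented page and target URL:

```python
# Detect a zero-second meta refresh, the in-page equivalent of a
# redirect. The HTML and the target URL below are made up examples.
from html.parser import HTMLParser

class MetaRefreshFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.refresh_url = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if (tag == "meta"
                and attrs.get("http-equiv", "").lower() == "refresh"):
            # content typically looks like "0;url=http://target/"
            delay, _, rest = attrs.get("content", "").partition(";")
            if delay.strip() == "0" and rest.lower().startswith("url="):
                self.refresh_url = rest[4:].strip()

html = ('<html><head><meta http-equiv="refresh" '
        'content="0;url=http://target.example/page.html"></head></html>')
finder = MetaRefreshFinder()
finder.feed(html)
print(finder.refresh_url)  # http://target.example/page.html
```

From the crawler's side, a zero-second refresh looks much like a redirect to the target URL, which is why it can feed the same canonicalization problem as a 302.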
Now, by my definition of hijacking (taking control of another entity), simply having the wrong URL listed is not hijacking.
However, by implementing your step 4 it becomes a hijack.
Lets get this clear...
I do not consider an incorrect URL in Google's listings to be a hijack. If the site owner wants that page removed, they can do so with Google's URL removal tool.
If the page has been hijacked (step 4), then there <b>is</b> cause for concern. However actual hijackings are very rare, and in most cases, people who claim to have had their site hijacked are only suffering from an incorrect URL.
The particular type from step four is still rare. Step five isn't rare anymore. While it's good that step four is rare, both of them s*ck.
Also, in order to achieve step four, deliberate action has to be performed by the hijacker. This is not the case with the other steps, as harm can be caused purely by accident.
>> and in most cases, people who claim to have had their site hijacked are only suffering from an incorrect URL.
This, by itself, will be seen as very wrong and indeed harmful (to the "brand" or "trademark") by some, even in the cases where it has no influence on the searcher or the website.
I agree that there is some amount of panic at play here, but i do feel that it is justified, as (a) everyone and their dog seems to be hit "at random", (b) we (as webmasters) do not know for certain what it takes for these problems to escalate and affect traffic to the whole site, and (c) there is no proven recovery method that works in all cases.
I suspect that it's coincidence.
The human brain is very good at creating relationships between things where there is none.
Just like the guy that used some snake oil to fix his site. Because the increase in traffic happened at the same time as his supposed fix, he assumed that it was a fix.
Feel free to disagree by all means. Webmasters that have experienced this certainly don't. As to the vagueness, i know, and i'm sorry about this (i am still working on it, though). I would be the first to put out the exact details in public if they were known to me - you can only know so much about what happens inside Google if you do not work there.
Depending on number of successful hijacks (or some other measure of "severity" only known to Google) the SERP traffic to the other webmaster dries up and disappears, because all his pages (not just the hijacked one(s)) are now "contaminated" and no longer show up for relevant searches.
mrMister, let me explain how this could happen.
I want to rule keyword1.
So I 302 a script injection into the top 300 sites returned in a google search.
When Google gets around to re-spidering from its database, the script points to all combinations of the site's domain name and IP address.
Now I've enlisted googlebot to do my dirty work.
Google now splits the site creating multiple copies of many pages including the home page of the site.
If any of the sites have so-called parked domains, they get added to the mix, and if perchance there are valid external links to those parked domains that used to be counted as belonging to the original site, the PR of the site gets split.
If the site's PR gets split, it affects all downstream sites linked to by the site with the active injected bad script.
In other words it cascades.
Folks, get those 301 redirects installed and checked, then clean up the mess and keep watching it.
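For anyone following that advice, a common .htaccess sketch for consolidating onto one canonical host with a 301 looks like this. This assumes Apache with mod_rewrite enabled, and "example.com" is a placeholder for your own domain:

```apache
# Permanently (301) redirect the bare domain to the www host so that
# only one canonical URL of each page exists. "example.com" is a
# placeholder domain for illustration.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

Unlike a 302, the 301 tells crawlers the move is permanent, so the destination URL is the one that should be kept as canonical.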
Many of these businesses had their main pages unintentionally hijacked by directories, exactly as Claus spelled out.
Yes, their page would come up when you clicked on the redirect URL. But the cache now was for the redirect URL. If you went to the original URL, in most cases the toolbar showed no PR, and if you checked the cache, Google said the page didn't exist.
After numerous emails and even calls to a few well-placed Googlers yielded squat, our only solution was to contact the directories that were unknowingly hijacking these pages, spend hours explaining the problem to them, pull all the listings from these directories and remove the 302 url from Google.
Most sites came back within a month. If you have a problem calling this hijacking, please provide a more descriptive term and I'll be the first to adopt it.
I spoke with Matt Cutts at PubCon and gave him concrete examples of this. I got the impression that he was surprised that this was happening to mom-and-pops. IMO, Google earlier adopted the position that the majority of the 302 issues were deliberate acts of terrorism by one spammer site on another, and that these sites probably deserved it. The rest was just an acceptable amount of collateral damage.
Now they know, with the rising number of scrapers (ironically spawned as a result of AdWords), that the problem does indeed affect legit businesses (a label they don't want to attribute to anyone who frequents a forum).
Most sites came back within a month. If you have a problem calling this hijacking, please provide a more descriptive term and I'll be the first to adopt it.
I'd describe it as the Google 302 canonical page bug.
The problem I'm seeing is that a lot of people see this bug with their listing in Google and they hear of the hijacking issue with 302 redirects and assume that their site has been hijacked.
Their site has not been hijacked. Their traffic is not being stolen. They are confusing the issue by calling it hijacking.
There are some genuine hijackings where the traffic is being stolen. Unfortunately people are getting confused by the whole issue. There are nowhere near as many hijacked pages as are being complained of in these forums and others.
These are two different (but related) issues.
The Google 302 hijacking issue is the equivalent of someone barging in to the cockpit of an aeroplane and taking control of its path.
The Google 302 canonical page bug is the equivalent of someone spraypainting the name of their organisation on the side of the aeroplane.
Yes they're both cause for concern.
However, the hijacking is a far bigger concern, and the people who are suffering from just the Google 302 canonical page bug aren't helping matters by declaring that they've been hijacked.
If the site owner wants that page removed, they can do so with Google's URL removal tool.
Google does not allow you to remove someone else's URL. Also, note that it's not the PAGE that you'd want to remove, just the redirecting URL.
MrMister, I'm the first to admit that I'm not strong on the technical side, so it may well be that the problem I experienced went away because of something other than the bit of code I posted. However, no one had any idea what else to try; it was a last-ditch effort. If you're right, that makes the 302 redirect situation even scarier, because it means an average webmaster is truly helpless when it happens to them.
Their traffic is not being stolen.
When it happened to me, my content was attributed to someone else's URL and it ranked significantly lower in relevant searches than when it was properly attributed to my own URL. The loss in traffic amounted to thousands of visitors per day. I call that traffic being stolen.
The Google 302 hijacking issue is the equivalent of someone barging in to the cockpit of an aeroplane and taking control of its path. The Google 302 canonical page bug is the equivalent of someone spraypainting the name of their organisation on the side of the aeroplane.
You think this is just graffiti?
The canonical page bug results in Google booting the spraypainted aeroplane out of their index. No Google indexing, no Google visitors. It's not just graffiti, it's akin to grounding the plane. The traffic isn't always hijacked. In some cases the traffic is eliminated.
Not only have I worked with a bunch of them, I've even managed a few, and they are horrible at meeting deadlines because they overthink and over-analyze everything. (re: PhDs)
and
Did anybody besides me notice this:
one engineer proposed a way that might help these sites, and he's got a testset of sites that would be affected by changes in how we canonicalized urls. A few of us have been looking through it to see if we can improve things, but please know that this is not a wildfire issue that will result in the web melting down.
quiet! rocket scientists at work
but it ain't rocket science fer gawds sake!