Sometimes, an HTTP status 302 redirect or an HTML META refresh causes Google to replace the redirect's destination URL with the redirect URL. The word "hijack" is commonly used to describe this problem, but redirects and refreshes are often implemented for click counting, and in some cases lead to a webmaster "hijacking" his or her own URLs.
Normally in these cases, a search for cache:[destination URL] in Google shows "This is G o o g l e's cache of [redirect URL]" and oftentimes site:[destination domain] lists the redirect URL as one of the pages in the domain.
Also link:[redirect URL] will show links to the destination URL, but this can happen for reasons other than "hijacking".
Searching Google for the destination URL will show the title and description from the destination URL, but the title will normally link to the redirect URL.
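For anyone who wants to check which kind of forward a given URL uses, here is a minimal Python sketch. It assumes you already have the response's status code, headers and body in hand (the fetching itself is left out). The name classify_redirect and the regular expression are illustrative only, not something Google or any particular tool uses.

```python
import re

# Rough pattern for <meta http-equiv="refresh" content="0; url=...">.
# Assumption: http-equiv appears before content; this is not a full HTML parser.
META_REFRESH = re.compile(
    r'<meta[^>]+http-equiv=["\']?refresh["\']?[^>]*'
    r'content=["\']?\s*\d+\s*;\s*url=([^"\'>\s]+)',
    re.IGNORECASE,
)


def classify_redirect(status, headers, body=""):
    """Return (kind, target) describing how a response forwards the visitor.

    headers is a plain dict; the key "Location" is looked up exactly as spelled.
    """
    location = headers.get("Location")
    if status in (301, 302) and location:
        kind = "permanent (301)" if status == 301 else "temporary (302)"
        return kind, location
    match = META_REFRESH.search(body)
    if match:
        return "meta refresh", match.group(1)
    return "none", None
```

A click counter that answers with status 302 and a Location header would come back as "temporary (302)", which is the case this thread is mostly about.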
There has been much discussion on the topic, as can be seen from the links below.
How to Remove Hijacker Page Using Google Removal Tool [webmasterworld.com]
Google's response to 302 Hijacking [webmasterworld.com]
302 Redirects continues to be an issue [webmasterworld.com]
Hijackers & 302 Redirects [webmasterworld.com]
Solutions to 302 Hijacking [webmasterworld.com]
302 Redirects to/from Alexa? [webmasterworld.com]
The Redirect Problem - What Have You Tried? [webmasterworld.com]
I've been hijacked, what to do now? [webmasterworld.com]
The meta refresh bug and the URL removal tool [webmasterworld.com]
Dealing with hijacked sites [webmasterworld.com]
Are these two "bugs" related? [webmasterworld.com]
site:www.example.com Brings Up Other Domains [webmasterworld.com]
Incorrect URLs and Mirror URLs [webmasterworld.com]
302's - Page Jacking Revisited [webmasterworld.com]
Dupe content checker - 302's - Page Jacking - Meta Refreshes [webmasterworld.com]
Can site with a meta refresh hurt our ranking? [webmasterworld.com]
Google's response to: Redirected URL [webmasterworld.com]
Is there a new filter? [webmasterworld.com]
What about those redirects, copies and mirrors? [webmasterworld.com]
PR 7 - 0 and Address Nightmare [webmasterworld.com]
Meta Refresh leads to ... Replacement of the target URL! [webmasterworld.com]
302 redirects showing ultimate domain [webmasterworld.com]
Strange result in allinurl [webmasterworld.com]
Domain name mixup [webmasterworld.com]
Using redirects [webmasterworld.com]
redesigns, redirects, & google -- oh my [webmasterworld.com]
Not sure but I think it is Page Jacking [webmasterworld.com]
Duplicate content - a google bug? [webmasterworld.com]
How to nuke your opposition on Google? [webmasterworld.com] (January 2002 - when Google's treatment of redirects and META refreshes was worse than it is now)
Hijacked website [webmasterworld.com]
Serious help needed: Is there a rewrite solution to 302 hijackings? [webmasterworld.com]
How do you stop meta refresh hijackers? [webmasterworld.com]
Page hijacking: Beta can't handle simple redirects [webmasterworld.com] (MSN)
302 Hijacking solution [webmasterworld.com] (Supporters' Forum)
Location: versus hijacking [webmasterworld.com] (Supporters' Forum)
A way to end PageJacking? [webmasterworld.com] (Supporters' Forum)
Just got google-jacked [webmasterworld.com] (Supporters' Forum)
Our company Lisiting is being redirected [webmasterworld.com]
This thread is for further discussion of problems due to Google's 'canonicalisation' of URLs, when faced with HTTP redirects and HTML META refreshes. Note that each new idea for Google or webmasters to solve or help with this problem should be posted once to the Google 302 Redirect Ideas [webmasterworld.com] thread.
<Extra links added from the excellent post by Claus [webmasterworld.com]. Extra link added thanks to crobb305.>
[edited by: ciml at 11:45 am (utc) on Mar. 28, 2005]
1.http://site.com (no desc)
Is that a sign of a duplicate content penalty? I thought so and made the mistake of removing [site.com...] with the Google removal tool, which also removed the canonical one.
If you have more sites with these two listings what should you do to fix that?
Only a few have done this incorrectly. Others who did it correctly still saw their whole site drop from the index. There are still the many who didn't do a darned thing and dropped from the index, and there are many sites of authority that didn't do a darned thing and dropped from the index also.
So a select few screwed up and removed their own site, but what about the rest? Spam filter? Look at the results, for goodness sake! Much scraper spam crap, and authorities missing.
1.http://site.com (no desc)
Is that a sign of duplicate content penalty?
I would say no. It seems Google knows they are for the same page, if removing one using the removal tool removes both (almost as if Google treats them as the same URL, despite toolbar PR differences, just taking one as canonical and devaluing the other to avoid duplicate appearances in the serps). This was not true of the 302 removals, as the 302 URLs were distinctly different from the canonical, therefore enabling you to remove the 302 without hurting the intended.
My 4 year old website has always had both versions listed, the www version always had the highest pagerank, and G devalued the other. That would be my logic, but I believe some have used 301s to redirect one to the other just in case. It just seems Google has likely advanced beyond penalizing for having a www version and non-www version in the serps.
1. Install a 301 rewrite rule set.
2. Add a randomly changing piece of content, related to the page context and the site, to the pages, starting with your homepage.
The first will heal the site and the second will break the duplicate content problem.
Note that number 2 is also theorised (by some folks, not me) to be non-functional in the case of a so-called 302 hijack.
We are happy with our progress. A large high-PR site is apt to have been classified by Google as a spammer, so your mileage may depend upon how far along the duplication process is.
You may have to ask Google as noted in msg #116 in this thread by GoogleGuy to reinclude your site.
[edited by: theBear at 7:58 pm (utc) on April 20, 2005]
Let's get this straight. It isn't a "penalty" as such; it is simply that, when faced with multiple URLs delivering the exact same content, Google wants to list only one of them.
Google chooses one URL to list and drops the others. On the way out you may see some, or all, of the others as URL-only listings for a while.
The wider problem for a site is that page1.html might be associated with domain.com and page2.html might be associated with www.domain.com and so on. This can have consequences for the way that PR is distributed around your site, and you can see such split PR in operation on many such sites.
That is, if domain.com/page1.html links to domain.com/page2.html but for page2.html Google actually lists www.domain.com/page2.html, then that latter page isn't getting any PR from page1.html, is it?
You'll see various pages switch allegiance from domain.com to www.domain.com, and back, on a random basis, and all sorts of other strange effects.
If one of the versions becomes a Supplemental Result then you could be in bigger trouble. Google does not update the search index or the snippet for those, and your page might start being returned as a result based on old content: for content that is no longer on the real page and no longer in the displayed cache either.
Google used to be able to consolidate listings and merge PR, and used to do this every few months. I haven't seen that happening since at least last Summer.
You can help the situation simply by using a 301 redirect from non-www to www and that will eventually fix the problem.
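As a rough illustration of what such a rule does, here is a minimal Python sketch. It assumes www.example.com is the preferred host. CANONICAL_HOST and canonical_redirect are made-up names, and on a real site this would normally live in the web server's rewrite configuration rather than in application code.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.example.com"  # assumption: the www form is the preferred one


def canonical_redirect(url, canonical_host=CANONICAL_HOST):
    """Return (301, new_url) if url is on the bare (non-www) host, else None.

    Simplification: any port in the original URL is dropped.
    """
    parts = urlsplit(url)
    # Derive the bare host from the canonical one, e.g. "example.com".
    if canonical_host.startswith("www."):
        bare_host = canonical_host[4:]
    else:
        bare_host = canonical_host
    if parts.hostname == bare_host and parts.hostname != canonical_host:
        new_url = urlunsplit(
            (parts.scheme, canonical_host, parts.path or "/", parts.query, parts.fragment)
        )
        return 301, new_url
    return None  # already canonical, or a host we do not handle
```

The point of using 301 rather than 302 is that it tells the crawler the move is permanent, so only the www URL should stay listed.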
As for removal, it seems that a request to take out anything with domain.com in it also takes out www.domain.com at the same time. Unrelated to the 302 problem that everyone else here is asking about, I have a friend who uses www on all the URLs of his site. In fact the non-www version cannot even be accessed. However, there was a URL-only listing for thesite.com/ in Google a few weeks ago when doing a site: search. I used the tool to get rid of that rogue result and the www index page disappeared too. It's no big deal as it is only really a splash page (sooo 1998) and the rest of the site is unharmed. I'm still wondering where Google got the non-www result from. The URL cannot even be accessed. There is nothing there.
[edited by: g1smd at 7:48 pm (utc) on April 20, 2005]
Did your site completely disappear from the serps? When you searched for your company name, where did it rank in the serps prior to making the changes you mentioned above? I have seen my company name jump from position 75+ to position 4. So, there are still some dup penalty issues, despite my changing all the content several times since the first of the year.
No, the site remained visible, but its traffic was sinking fast.
We caught it before it totally crashed.
We had from 2 to 5 copies of 750 pages.
Pre 301 insertion, we also had a number of 302 leeches (a problem I consider completely fixable).
It could cause a trickle-down effect and, bang, the whole site is gone.
If google is able to associate with and without www, it seems that they may have associated 302's in some way, then if the 302 goes supplemental they could both go or revert to older associations?
The main problem, and there is no reason to dance away from this, is Google's failure to handle this issue properly, both from a technology standpoint to start with, and then from a deal-with-webmasters one later. Google has adopted a cloak of secrecy about its many technological problems, and in doing so leaves webmasters more or less staggering around in the dark trying to fix problems caused by Google.
Of course Google owes nobody nothing, but that is not the point. The point is Google's database and search results are crippled by the combination of their bad technology (and thinking) and their cloak of secrecy.
If Google wants better search results, Google needs to learn from its mistakes. A small bit of evidence has shown up suggesting they have learned some, but more evidence abounds to suggest they have much more to learn.
<Of course Google owes nobody nothing, but that is not the point.>
You are right. But honest, decent publishers who follow Google's own webmaster guidelines expect at least fairness from Google. And it's not fair at all to remove or penalize sites just because somebody has decided to hijack their contents against the will of the owners of these sites.
"a particular page returned in the search results might not be a supplemental result for all search queries that it could be returned for".
This is what was so scary to me since my site is currently returned as a supplemental page for my own name.
Some tour companies make their money guiding people around national parks. If they were to start forest fires, cause accidents, or lose their clients in the woods, they would be held accountable.
Then, there's Google. Overnight, it can ruin years of work of thousands of webmasters without as much as an apology. That might be legal but it ain't ethical. As a company, Google has some growing up to do.
Professional business conduct
To the same degree that Microsoft, the Telephone Company, and many others owe us. I built my business around needing the telephone company, Microsoft Windows and many other services. I give Google more than $200K income per year - I expect better from them.
Do I get calls from Microsoft, the telephone company, and others to help improve my business? YES! And I generate a lot less revenue for them.
I don't want Google to call me. I just don't want them to destroy my business overnight unless I have done evil.
I have to say that I was hesitant since this is like admitting that I did something wrong - which I never did. The site is approaching its 1st birthday in the end of May. Every night I write till I can't stand it any longer. At least one article a day, now 850 pages - over 500,000 words.
I'm starting to see good traffic from Yahoo. MSN has me on page 1 for a 230 million term (which I have to admit is probably an overstatement of my site's importance for this particular word). And my wife thinks I'm nuts when I try to explain why she can't find my website in Google (it's at 213 tonight, showing as supplemental).
I figure if I give up now, it's like admitting defeat. I'm just not a quitter, so... it's back to writing - then maybe a Margarita!
If Google lifts the penalty, I will make one promise... You'll see me at the next Conference (although I will be writing later that evening...)
For the person who asked about the url removal tool: it's removal for six months, not 90 days. I understand how someone thought it might help to try the url removal tool, but please don't use it on one's own site. arubicus, did you say you saw weird behavior with www vs. non-www or trailing slashes vs. without? Could you submit something to google.com/support so I can try to get someone to check it out? Use arubicus in the form somewhere. I'm going to be gone Friday and this weekend, but I'd be curious to hear of any remaining canonicalization issues.
Ugh. Very bleary. Going to bed now..
I have sent mine with my handle.
I admit I might have a problem with duplicate content - but trying to add lots of reviews etc.
But the whole site has gone - but the way it has gone I am wondering if it is a canonical url non-www problem.
Okay, it's not a 302 one, but I've been thinking of it as a similar "canonicalization issue"...
Google's database is overflowing with URL listings like:
where there is also a normal listing for
These occur from the trifecta of the unfortunate Google policy of URLs-are-pages combined with the bajillion puke scraper sites that scrape search results, where both Yahoo and MSN display results without the trailing slash.
It's my experience that when a page gets a second URL only listing, it drops in the results, which would then end up penalizing pages without a file extension, particularly if they are popular and get scraped often.
These URL only links fade fairly quickly, but still it would be nice to see Google recognize and combine these with the canonical page, rather than seemingly demerit the canonical page.
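To make the "combine these with the canonical page" idea concrete, here is a minimal Python sketch of folding the slash and no-slash forms of a URL together. It rests on a simplifying assumption of mine: a last path segment without a dot is a directory-style page. The function name with_trailing_slash is made up, and nothing here is claimed to be how Google does or should do it.

```python
from urllib.parse import urlsplit, urlunsplit


def with_trailing_slash(url):
    """Append "/" when the last path segment has no file extension.

    Assumption: a segment without a dot (e.g. "widgets") is a directory-style
    page, so "/widgets" and "/widgets/" are the same page.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    last_segment = path.rsplit("/", 1)[-1]
    if last_segment and "." not in last_segment:
        path += "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

With this, a scraped link to http://www.example.com/widgets and the normal listing http://www.example.com/widgets/ map to the same string, so they could be counted as one page instead of two.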
I believe I may have been confused about the re-inclusion thing. My site does show up doing a site command and it does show up when I do a www.my-site.com with no commands. However, it shows up with no title or description.
I presume then that a re-inclusion will not do any good since the site is in the index. The problem I have is that it is no longer in the ranking which is the real underlying problem. And it is a site which was in the top 5 for a 3 million page plus keyword for years.
And yes, we did see other domains listed using the site command. Now they are gone, but everything is still listed as supplemental.
Do I wait? Do a re-inclusion? Can it hurt to do a re-inclusion?
<Very few people used the url removal tool to take out their own sites, so I can try to gather some people into one group and ask someone if we can do anything on our end.>
Though I'm not among those who took out their sites by mistake, I wish to thank you on their behalf for taking care of this matter.
Very kind of you GG. Much appreciated.
Although this is not about redirection, I used it to remove some duplicate domain.com versions of my pages via noindex tags, including my home page. Unfortunately, as I now know, it removes the www versions as well.
It's been 20 days or so and despite being re-spidered the www pages have not reappeared.
The 'Remove Individual pages' section of the google help page does not have a footer, similar to 'Remove your website', indicating a 90 day period.
Does anybody know whether the 90 days / 6 months applies to individual pages removed using noindex tags?
Hard to define (as I am not an expert)
Following thread may help - read GG posts:-
But basically as I understand it Google finds the main url of the site (which is normally the page with the highest page rank) and perhaps where the site crawl starts.
However, I think sometimes the wrong URL can be picked. (E.g. if you have the site on the non-www as well, or your homepage is something like www.domain.com/home.php?sid=122323, or all your links point to another page and therefore that page is seen as the most important and hence the canonical URL.)
Not 100% sure though
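For the session-id case mentioned above (home.php?sid=122323), one thing a webmaster could try is linking only to the URL with the session parameter stripped. This is a minimal Python sketch under my own assumptions: the parameter is called sid, and SESSION_PARAMS and strip_session_params are made-up names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

SESSION_PARAMS = {"sid"}  # assumption: names of the session parameters to drop


def strip_session_params(url, session_params=SESSION_PARAMS):
    """Return url with session-style query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in session_params
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )
```

Every visitor's copy of the homepage link then collapses to the same address, which gives the crawler fewer candidate URLs to choose between.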