|Google's 302 Redirect Problem|
(Continuing from Google's response to 302 Hijacking [webmasterworld.com] and 302 Redirects continues to be an issue [webmasterworld.com])
Sometimes, an HTTP status 302 redirect or an HTML META refresh causes Google to replace the redirect's destination URL with the redirect URL. The word "hijack" is commonly used to describe this problem, but redirects and refreshes are often implemented for click counting, and in some cases lead to a webmaster "hijacking" his or her own URLs.
Normally in these cases, a search for cache:[destination URL] in Google shows "This is G o o g l e's cache of [redirect URL]" and oftentimes site:[destination domain] lists the redirect URL as one of the pages in the domain.
Also link:[redirect URL] will show links to the destination URL, but this can happen for reasons other than "hijacking".
Searching Google for the destination URL will show the title and description from the destination URL, but the title will normally link to the redirect URL.
There has been much discussion on the topic, as can be seen from the links below.
How to Remove Hijacker Page Using Google Removal Tool [webmasterworld.com]
Google's response to 302 Hijacking [webmasterworld.com]
302 Redirects continues to be an issue [webmasterworld.com]
Hijackers & 302 Redirects [webmasterworld.com]
Solutions to 302 Hijacking [webmasterworld.com]
302 Redirects to/from Alexa? [webmasterworld.com]
The Redirect Problem - What Have You Tried? [webmasterworld.com]
I've been hijacked, what to do now? [webmasterworld.com]
The meta refresh bug and the URL removal tool [webmasterworld.com]
Dealing with hijacked sites [webmasterworld.com]
Are these two "bugs" related? [webmasterworld.com]
site:www.example.com Brings Up Other Domains [webmasterworld.com]
Incorrect URLs and Mirror URLs [webmasterworld.com]
302's - Page Jacking Revisited [webmasterworld.com]
Dupe content checker - 302's - Page Jacking - Meta Refreshes [webmasterworld.com]
Can site with a meta refresh hurt our ranking? [webmasterworld.com]
Google's response to: Redirected URL [webmasterworld.com]
Is there a new filter? [webmasterworld.com]
What about those redirects, copies and mirrors? [webmasterworld.com]
PR 7 - 0 and Address Nightmare [webmasterworld.com]
Meta Refresh leads to ... Replacement of the target URL! [webmasterworld.com]
302 redirects showing ultimate domain [webmasterworld.com]
Strange result in allinurl [webmasterworld.com]
Domain name mixup [webmasterworld.com]
Using redirects [webmasterworld.com]
redesigns, redirects, & google -- oh my [webmasterworld.com]
Not sure but I think it is Page Jacking [webmasterworld.com]
Duplicate content - a google bug? [webmasterworld.com]
How to nuke your opposition on Google? [webmasterworld.com] (January 2002 - when Google's treatment of redirects and META refreshes was worse than it is now)
Hijacked website [webmasterworld.com]
Serious help needed: Is there a rewrite solution to 302 hijackings? [webmasterworld.com]
How do you stop meta refresh hijackers? [webmasterworld.com]
Page hijacking: Beta can't handle simple redirects [webmasterworld.com] (MSN)
302 Hijacking solution [webmasterworld.com] (Supporters' Forum)
Location: versus hijacking [webmasterworld.com] (Supporters' Forum)
A way to end PageJacking? [webmasterworld.com] (Supporters' Forum)
Just got google-jacked [webmasterworld.com] (Supporters' Forum)
Our company Lisiting is being redirected [webmasterworld.com]
This thread is for further discussion of problems due to Google's 'canonicalisation' of URLs, when faced with HTTP redirects and HTML META refreshes. Note that each new idea for Google or webmasters to solve or help with this problem should be posted once to the Google 302 Redirect Ideas [webmasterworld.com] thread.
<Extra links added from the excellent post by Claus [webmasterworld.com]. Extra link added thanks to crobb305.>
[edited by: ciml at 11:45 am (utc) on Mar. 28, 2005]
GG - Yeah I will send it in here in a little bit.
I'm about to implement exit trackers throughout a site. Can someone confirm what kind will NOT cause problems for the sites I am linking to? I would prefer to use header based redirection as opposed to script or meta redirection.
Hi Googleguy, Thank you very much for not forgetting me.
I have the same problem as "GuinnessGuy": the URL only shows when searched for as www.sitename.com, not even for its "unique name". A search for site:sitename.com brings 50 results total, down from roughly 12,000; of those 50, half are URL-only, and the rest are listed partly as site.com/pages and partly as www.site.com/pages. Please help!
About the email: I had sent an email to webmaster@google, and soon after I got a reply to send an email through [google.com...], which I did, but I chose "My site disappeared from the search results or dropped in ranking". I guess I was supposed to send it under "Adding my site to Google", which I have now done.
I have also mentioned my WebmasterWorld handle in the message. Was I supposed to mention it in the subject line, as in "reinclusion request - webmaster world handle - illusionist"? Please advise...
vincevincevince, afaik, if you want to send a header out with the redirect and avoid hijacking, a "301 Moved Permanently" is the best option. It's only that the default in PHP is to send a 302 if you just specify "Location: www.example.com", so lots of scripts use that.
One could very well say that a "302 Moved Temporarily" is actually the right code to use, since if you maintain a list of links and one of those link targets moves, you will correct that link to point to the new place, and hence it was not permanent. For directories this happens so often that it wouldn't be wrong to view every single link as a temporary location. Then again - just like an address IRL, some tend to keep the same ones for a long time.
So, if you just want to avoid 302'ing some sites, use a 301. If you really want to do the right thing, make an assessment of the sites you link out to, and judge whether you think their URLs are mostly temporary or mostly permanent (are they likely to change, ever?), then choose 301 or 302 based on that. My personal experience tells me that most URLs change.
If you use a 302, it is probably best to put the URL of your redirect script in your robots.txt file - the downside of this is that the search engines will not be able to follow these links, but the upside is that they will not cause any problems.
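As a footnote to the advice above: the only difference between the two options is the status code the redirect script sends along with the Location header. A minimal sketch of a header-based exit tracker in Python (the /out path is a made-up example, and the logging is left out; this just shows where the 301-vs-302 choice lives):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

class ExitTracker(BaseHTTPRequestHandler):
    """Handles hypothetical /out?url=... click-tracking links."""

    def do_GET(self):
        target = parse_qs(urlparse(self.path).query).get("url", [""])[0]
        # A real tracker would log the click here before redirecting.
        # A 301 tells crawlers the target is the canonical location;
        # a 302 is what many scripts send by default, and is the status
        # implicated in the "hijacking" discussed in this thread.
        self.send_response(301)
        self.send_header("Location", target)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the example quiet
```

To serve it: HTTPServer(("", 8000), ExitTracker).serve_forever(). Whether 301 or 302 is "right" for your outbound links remains the judgement call described above.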
I've missed the last few days, but I think it's important to add that crobb305 was rather specific in his description of submitting a URL that links to a 'hijacked' URL. It is often the case that people get the wrong idea, often via an Internet version of 'Chinese Whispers'.
As I wrote in that thread [webmasterworld.com], "The next question, is how quickly the benefit of the backlinks will be applied to the rightful URL". Time will tell. I think a lot of webmasters should take note of GoogleGuy's experience that in many cases a spam penalty had reduced PageRank, leading to a change in the listed URL.
|url canonicalization is definitely on our radar now |
Thank you. I was just trying to offer some help for people like myself who wanted to get a jump on the 302 removal. I hated searching with the site: command and seeing a dozen of those unrelated URLs listed. Sure, the hijacked page may have previously seen a decline in PageRank, making it vulnerable to the "hijacking", as Googleguy mentioned. But I didn't know that at the time. All the signs of a true hijacking were there, as I have stated many times before, and my efforts to help the situation seemed reasonable and were well intended. Googleguy has been a big help around here for years, so I believe things will get better for all of us who have persisted in developing a good website.
Most people followed the procedure correctly, but I am sure Google would rather it had never been mentioned. At any rate, here we are, and I think most will be happy with Google's progress in a couple of months, and that this will all be a thing of the past. Fingers crossed :)
Have a good weekend.
|I submitted to support using the button "My site disappeared from the search results or dropped in ranking" i guess i was supposed to send it at "Adding my site to Google" which i did now. |
Does this matter really? Did GG say either way? I imagine it will get to the same place but who knows.
Should nicknames be included in subject lines?
> But, I didn't know that at the time
Either way, the removal of a 302 URL pointing at you should not be a problem. The danger is that someone ignores your cautions - e.g. by leaving the robots exclusion too long or submitting the wrong URL.
> this will all be a thing of the past. Fingers crossed :)
I'm sure I said something similar in 2002. :-) The good thing is that the issue is getting more attention now.
As some people mentioned in this thread, there is a particular problem with URL Console.
Fact one: www.site.com and site.com are different URLs for Google (as we see in the index).
Fact two: they are the same URL for URL Console.
In November 2004, I made a mistake and removed www.site.com to avoid a duplicate. I used meta tags, and made them show noindex only on the version I wanted to remove. I intended to leave site.com, but both have been gone since then. Googlebot crawls it at least once a day and follows all new links I place on the index page, but the page itself is still missing from the index.
After 90 days, I e-mailed Google and received an automated answer saying the site would be gone for 90 days. I e-mailed again, explaining that 90 days had already passed (I cited the date from the URL Console), and that I actually didn't intend to remove the site, but only a duplicate.
A few weeks after that, a WW member noticed that the URL Console now says about six months. Shortly after that, I was removing an outdated domain using robots.txt, and noticed that the URL Console listed the www and non-www versions separately on the list of "pending" requests. I submitted only the www version (the only one that had been indexed from this particular domain), but the URL Console added the non-www version to the "pending" list. I realised this is a new thing, as I have used the Console many times and it used to show only the version of the URL that was submitted. It's much better that we are warned now; however, I wish it weren't after submitting the URL.
I see that Google has realised the problem, maybe because of my mail, or maybe because of someone else's. But this information shown in the Console, very honest indeed, is not a complete solution. I trust that Google is working on something better.
The ideal solution would require two things:
First: the URL Console should either treat www and non-www versions separately, or perhaps the Google URL indexer should merge www and non-www at the very start, just as it does with / and /index.html. But both should obey the same rules, to avoid confusion.
Second: the URL Console should have one more feature implemented - the ability to cancel the six-month absence of a URL from the index. I can see the list of URLs I have removed during the last six months, so it's just a matter of a button next to each URL saying "Cancel the removal" or "Reinclude".
I'd really wish to hear GoogleGuy's comment on these two suggestions.
|Googleguy: I'd be curious to hear of any remaining canonicalization issues |
I sent an example in a few days ago. The redirecting site does not use 302s, but uses other methods that cause pages to be hijacked, and is still showing canonicalization issues in the current serps. I did not include a handle. I can post the support ticket number, if that is allowed.
|Googleguy: If you do a reinclusion request, please include your handles so I can try to get someone to find you. |
In addition to the aforementioned example, I also sent in a reinclusion request. I did not include a webmasterworld handle. What happens to requests that don't have handles?
May I make a suggestion?
Some of this is a bit off topic but it needs to be said.
What I see taking place is that Google sits back and does not come out publicly on topics that are important.
Topics such as how PR is passed, or this 302 issue.
When you do this, these issues take on a life of their own and the darnedest things start happening.
The passing-of-PR issue, for instance: SEOs claim that direct links will pass PR where 302 links will not. I think Google should clear this up.
I think you have done a good job by adding that piece on being careful with SEOs, since there are many bad ones out there.
But you need to go one step more in my opinion.
Clear up the issue of a redirect to www for a website and how it will affect the ranking.
Clear up the issue of a direct link versus a non-direct link.
Does having a text entry have the same effect as a link?
That is, a plain-text "http://mydomain.com" versus an actual <a href="..."> link.
I think the PR ranking number is overrated, but many believe it's very important - so much so that we are now seeing offers for links on high-PR sites for over $100 a month.
So since Google is the source of this ranking, it should state what its relevance is!
And most important put out something on this topic of a 302 issue.
Google is number one and as such the pressure on Google is much higher than any other SE.
I believe you need to come up with some way to communicate important things such as these topics, or you will find a crisis takes place that becomes a mess.
By not clearing up these issues you allow those undesirable SEO's and others to claim things which are not true.
That in turn fuels even more problems by those following bad advice.
I am seeing the WEB becoming more and more a jungle.
And the major search engines are part of the problem rather than being a solution to it.
How many websites targeting Google AdSense will it take before you start to clamp down on it, for instance? If this is not addressed, we will see a search engine war soon, and that will have far greater effects on website owners than this fever of a 302 problem.
In my viewpoint google is allowing this spam by not eliminating these targeted websites from being adsense partners.
And it is nothing more than Spam when a site is designed with only one thing in mind - targeting adsense keywords!
And if I was Yahoo or MSN I would be thinking of blacklisting all websites that contain Adsense.
If that happens then where are we?
Today I have noticed that some of my pages have regained their pre-allegra positions on the serps for the same competitive keyphrases. Accordingly the traffic from Google to my site is more targeted today and again I started receiving visitors who like the design of my AdSense spots :-)
I know that this might change as soon as tonight or tomorrow, and my pages' positions on the SERPs might decline again. But for now I shall forget the Rotating Algos and enjoy "RESELLER's DAY" :-)
Have any of you who have been affected by allegra noticed something similar today?
There are some bugs in the URL console and support areas of Google.
Along with the lumping together of non-www and www even when you specifically stated only one of them, try these:
- Removal is quoted as 90 days in some places and 6 months in others;
- After removal, Google sends an email to firstname.lastname@example.org (as well as to the email address you used to sign in to the URL console itself) saying what has been done. The email to the webmaster says:
"From : "Google URL Console" <ur1-remove@goog1e,com>
The following urls/messages have been removed. Please contact goog1ebot@goog1e,com if you do not approve:
If you reply to that message, to the address that they indicated to use, then you get this in response:
"Thank you for writing to Google. We'd like to assist you, but we only respond to messages submitted through our online contact form. Please
visit [google.com...] to submit your message, and we'll get back to you soon. We apologize for any inconvenience, and we look forward to hearing from you."
|GoogleGuy mentioned: |
NOTE: Do not submit your own site to our url removal tool in attempt to force a canonical url. I repeat, do not submit your own site to our url removal tool. Using the url removal tool was some idea that a WebmasterWorld member came up with and started talking about.
What I think is that this is the result of an unwanted combination of circumstances, and of posts in different threads about different subjects. In several 302 threads, the use of the URL removal tool was discussed as a way to remove hijacks. Because 302 hijacking has not been a big issue for me, I hadn't followed those threads intensively, so I wasn't fully aware of the discussions taking place there and the important role of the URL removal tool in solving the hijacking issue.
At almost the same time I posted in the supporters forum about a specific www vs. non-www problem of a fellow member--and the possible duplicate content problem involved. In that thread I mentioned the use of the URL removal tool to clean up supplemental results. My intention there was to use the tool as intended, i.e. to remove old and obsolete entries from the Google index. In the past I have used the removal tool successfully to remove obsolete URLs from deleted pages that had been supplemental for quite some time. That was the scope in which I wrote that sentence, but it may not have been the scope in which others read it.
I didn't mention tricks like temporarily adding "noindex" meta tags to remove just one of the versions, because I wasn't aware that people would combine the advice in the hijack threads with mine about another subject to try to remove one of two current versions of their website. Like most of us, I also didn't know that to the URL removal tool the www and non-www URLs are identical.
That evening I sent a private e-mail to the fellow member with the www/non-www problem saying that it might be best to first wait until the effect of the 301 was visible before using the removal tool, and went to bed to forget the whole issue.
Now that I have seen GG's specific warning not to use the URL removal tool on your own site to fix canonical URL problems, I am afraid that some people have combined my advice with the trick in the hijack threads as a last resort to try to get back into the SERPs again. With disastrous results. The hijack removal trick, used on your own domain in an effort to resolve a www vs. non-www issue, in combination with Google's interpretation that www.domain.com is identical to domain.com, has sent websites on a six-month journey to Neverland.
If anyone was triggered by my post in the supporters forum about using the URL removal tool, and started to experiment with it to remove one of two versions of their site or homepage, I would like to apologize. It was not my intention, and if I had had the slightest idea that a specific use of the removal tool could harm sites in the way that is now known, I would certainly have made that clear in my post, or not have posted that advice at all.
The advice about 302 hijacks, as offered, was clear, unambiguous, and well worded. If people did not read it correctly then that is not your fault. GoogleGuy has already stated that only "a small number" of people messed up. For removing the "badsite" it looks like it did exactly what was intended.
You wrote your post in good faith trying to help, and your post should be taken as such.
My friend.. it's not your fault that some fellow members ended up removing their own sites by mistake.
Luckily GG might be able to help them, and then this part of the general problem will be solved.
[edited by: reseller at 11:01 pm (utc) on April 21, 2005]
For those of you wondering how we are going to find these redirects and scraper sites now that they are not appearing in our site command:
Besides using the allinurl command: with several different sites I have checked in GoogleRanking for keyword ranking, I have noticed that another site will sometimes come up instead of the one I had input into the tool. This might be a bug in the tool, because it gives out the wrong site. However, the site that is listed always has some kind of redirect to the site in question. I've seen the page captured in frames with a PHP redirect, the title of the site used in their site, the name of the site used multiple times on the page to steal rank, and other means to steal your content/rank. You might want to check this out for those sites that are ailing.
|You wrote your post in good faith trying to help, and your post should be taken as such. |
|can someone tell me what canonicalization stands for, sorry to be so dumb |
A canonical URL is the actual URL of the page; any pages linking to it or redirecting to it are non-canonical.
Canonical - where the page actually lives.
Canonicalization - would mean 'looking at a mess of redirects and figuring out where the actual content lives.'
Canonical link - a full address: ht*p://w*w.site.com/page.htm
That is a canonical address - the actual address.
An example of a non-canonical URL would be a relative URL such as ../page.htm, which is an address relative to the page the link is on.
I think GoogleGuy is using the term 'canonicalization' in the sense of deciding where the content resides in a redirect situation (from a search engine perspective), or where the site resides, or which site gets credit for the content.
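To make the definitions above concrete, here is a toy sketch (my own illustration, not Google's actual algorithm) of the kind of normalisation a canonicaliser performs: resolving relative links against the page they sit on, folding www and non-www hosts together, and folding / and /index.html together:

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

def canonicalize(base_url, link):
    """Resolve a possibly-relative link and normalise host and path.

    Illustrative only: real search-engine canonicalisation also has to
    follow redirect chains and decide which duplicate URL 'wins'.
    """
    absolute = urljoin(base_url, link)            # ../page.htm -> full URL
    parts = urlsplit(absolute)
    host = parts.netloc.lower()
    if host.startswith("www."):                   # fold www/non-www together
        host = host[4:]
    path = parts.path or "/"
    if path.endswith("/index.html"):              # fold / and /index.html
        path = path[:-len("index.html")]
    return urlunsplit((parts.scheme, host, path, parts.query, ""))
```

For example, canonicalize("http://www.site.com/dir/page.htm", "../page2.htm") yields "http://site.com/page2.htm" - one canonical form for several non-canonical ways of writing the same address.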
I posted earlier that I had used the url console to remove some duplicate domain.com versions of my pages via noindex tags, including my home page. Unfortunately, as we now know, it removes the www versions as well.
A few days ago I also sent a message via the 'Removing information from Googles search results' option on the Google contact page explaining what I had done and asking when the www version of the pages would reappear.
I am glad to say that this has resulted in some success, as my www home page has been reinstated. (The non-www version is also back, with its cache from May 04.) My status page in the URL console now shows 'request denied' against the domain.com entry, having shown 'completed' for the last 20 days.
To lammert, and any others, I would like to point out that I was not prompted to do what I did by anything anybody else might have said. I have used the console before, but I was ignorant of the www/non-www connection.
Losing the home page in Google is inconvenient but I don't remember having had any traffic from Google to any of the pages that were removed, whether www or non-www, since December anyway.
I only posted here (first time) because I was concerned that there might be a perception that only WebmasterWorld regulars had suffered thru their use of the url console.
So I am relieved to see that my request thru the published contact process has been acted upon.
Your case is great news. I'll try that right away.
I can only think that Google is the only one to blame for the www and non-www merge in the URL console. How could you tell it was the same to the tool and different for the search engine? Was there any indication of that?
It is sad that Google likes to keep everything secret. They don't tell you why you are banned, or whether you are banned at all. Their software has bugs (302s, URL console, etc.) and we have to work hard to convince them that that is the case. They think they are too smart to make mistakes, so it has to be our problem.
We follow their guidelines and they don't really care. If our sites vanish they will always find new ones to fill our place.
There was no indication of the www non-www connection.
But I think it's just an unfortunate outcome due to the tool being used, at least in my case, to clean up site listings when it was presumably not intended for that purpose.
More likely it was originally provided for people to quickly suppress content that they did not want shown. In which case suppressing all *.domain.com pages was probably not an issue.
Hopefully it will either be changed or at least have the page text clarified.
You are clarifying lots of queries here.. Thanks.
I need your help... I have tried almost everything mentioned in this section. I have a site which was working very well in Google until December 2004.
Suddenly all its rankings vanished and my site's pages turned into blue links (links without title or description), some with supplemental results... Slowly, pages also dropped out of Google's database..
My site had not been updated in a long time, so in March I made some updates, like adding content, implementing absolute linking, etc. On April 12th my site was crawled and all my links (except the home page) were back with titles & descriptions...
But sadly, on April 16th all my links turned into blue links again.
Now the status, out of 40 pages showing on site:www.site.com, is:
30 are blue links
8 are links with cache dates of 11 & 12 April
2 are links showing supplemental results
Hey, I have to be away from a computer today and this weekend. I'll check in when I get back though.
I did not get an auto-reply for my site - as I have said, it does have content which appears on other sites, as it is a shopping review site, which may have caused the problem - however, the more I look around, the more sites I see affected in the same way. Bluefind today, for example (although perhaps not a good example), and other users I have had sticky conversations with (which are much better examples).
Anyway may be worth a look when you get back.
Have a nice weekend GoogleGuy!
I hope you'll comment on my suggestions in msg #249 after you're back.
OK, so some guys messed up by trying to remove the non-w*w version of their homepage and ended up removing their entire site.
I have a few old pages appearing in the SERP's that no longer exist. I also have 2 or 3 cgi url's that got in before I blocked my cgi-bin from robots.
Would it be a bad move to remove those URL's now?
I don't want to risk 6 months in solitary but I sure would like those pages gone.
|I have a few old pages appearing in the SERP's that no longer exist. I also have 2 or 3 cgi url's that got in before I blocked my cgi-bin from robots. Would it be a bad move to remove those URL's now? I don't want to risk 6 months in solitary but I sure would like those pages gone. |
I had the same thing -- pages that have been extinct for over 2 years on one site, and extinct pages still in the index after redesigning another. So I did the URL removal thing on them, but it took a few emails back and forth with Google to get it to work; they finally dumped them manually when I asked when they would disappear. No harmful effects. They are no longer appearing in the site command.
A friend of mine has been given the right run-around by Google.
He used the URL console to remove domain.co.uk from the index. The URL does not exist. It cannot be accessed, but there it was in the index as a URL only listing.
Both the domain.co.uk and the www.domain.co.uk listings were removed. He received an automated email that was sent to email@example.com that basically said "we have removed domain.co.uk from the index" and to "write back if that was a problem".
He did not want the www version to be removed. He wrote back to the specified email address, and then received an automatic response stating that the email address was no longer in use and to use the support form instead.
He filled in the support form, and next day received a standard response explaining that crawling and indexing the web was an automated process, and that there was nothing they could do, that he should check the webmaster guidelines, and thanks for using Google. This answer had nothing to do with the original question.
Sending back a complaint got a better response, but still saying that the site would be gone for 90 days (does that now confirm the timeframe against the details on their web site, which says 90 days in one part and six months in another?), and there was nothing they could do to fix that.
You had yours fixed, but another Google bod is now saying that it can't be done. It obviously can be, but not everyone at Google is working to the same plan.
[edited by: g1smd at 11:46 pm (utc) on April 22, 2005]
If you want to get rid of the /cgi-bin stuff from Google's index, simply disallow that folder in your robots.txt file, then submit the URL of the robots.txt file to their URL console. That will fix it. Specify the robots file as domain.com/robots.txt then go back and do it again as www.domain.com/robots.txt too. It will not harm the rest of your site.
Actually, I would recommend that everyone here does that. I checked 10 sites and on 9 of them Google had listed URLs that were disallowed by the robots.txt of the site. I submitted the robots file for each site to the URL console, and the pages are not listed anymore!
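Before submitting a robots.txt URL to the removal console, it may be worth sanity-checking locally that the file really disallows what you expect. Python's standard robotparser understands the same basic Disallow syntax (the example.com domain and paths below are placeholders):

```python
from urllib.robotparser import RobotFileParser

# A robots.txt of the kind described above (placeholder content)
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# /cgi-bin/ URLs should be blocked; ordinary pages should be untouched
cgi_blocked = not parser.can_fetch("Googlebot", "http://www.example.com/cgi-bin/search.cgi")
page_allowed = parser.can_fetch("Googlebot", "http://www.example.com/page.htm")
```

If cgi_blocked and page_allowed both come out true, the file is at least syntactically doing what you intend before you hand its URL to the removal tool.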
|then submit the URL of the robots.txt file to their URL console |
You meant to the URL submission console?
Not the url removal console?