Forum Moderators: Robert Charlton & goodroi

Google's 302 Redirect Problem

         

ciml

4:17 pm on Mar 25, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



(Continuing from Google's response to 302 Hijacking [webmasterworld.com] and 302 Redirects continues to be an issue [webmasterworld.com])

Sometimes, an HTTP status 302 redirect or an HTML META refresh causes Google to replace the redirect's destination URL with the redirect URL. The word "hijack" is commonly used to describe this problem, but redirects and refreshes are often implemented for click counting, and in some cases lead to a webmaster "hijacking" his or her own URLs.

Normally in these cases, a search for cache:[destination URL] in Google shows "This is Google's cache of [redirect URL]" and oftentimes site:[destination domain] lists the redirect URL as one of the pages in the domain.

Also link:[redirect URL] will show links to the destination URL, but this can happen for reasons other than "hijacking".

Searching Google for the destination URL will show the title and description from the destination URL, but the title will normally link to the redirect URL.

There has been much discussion on the topic, as can be seen from the links below.

How to Remove Hijacker Page Using Google Removal Tool [webmasterworld.com]
Google's response to 302 Hijacking [webmasterworld.com]
302 Redirects continues to be an issue [webmasterworld.com]
Hijackers & 302 Redirects [webmasterworld.com]
Solutions to 302 Hijacking [webmasterworld.com]
302 Redirects to/from Alexa? [webmasterworld.com]
The Redirect Problem - What Have You Tried? [webmasterworld.com]
I've been hijacked, what to do now? [webmasterworld.com]
The meta refresh bug and the URL removal tool [webmasterworld.com]
Dealing with hijacked sites [webmasterworld.com]
Are these two "bugs" related? [webmasterworld.com]
site:www.example.com Brings Up Other Domains [webmasterworld.com]
Incorrect URLs and Mirror URLs [webmasterworld.com]
302's - Page Jacking Revisited [webmasterworld.com]
Dupe content checker - 302's - Page Jacking - Meta Refreshes [webmasterworld.com]
Can site with a meta refresh hurt our ranking? [webmasterworld.com]
Google's response to: Redirected URL [webmasterworld.com]
Is there a new filter? [webmasterworld.com]
What about those redirects, copies and mirrors? [webmasterworld.com]
PR 7 - 0 and Address Nightmare [webmasterworld.com]
Meta Refresh leads to ... Replacement of the target URL! [webmasterworld.com]
302 redirects showing ultimate domain [webmasterworld.com]
Strange result in allinurl [webmasterworld.com]
Domain name mixup [webmasterworld.com]
Using redirects [webmasterworld.com]
redesigns, redirects, & google -- oh my [webmasterworld.com]
Not sure but I think it is Page Jacking [webmasterworld.com]
Duplicate content - a google bug? [webmasterworld.com]
How to nuke your opposition on Google? [webmasterworld.com] (January 2002 - when Google's treatment of redirects and META refreshes was worse than it is now)

Hijacked website [webmasterworld.com]
Serious help needed: Is there a rewrite solution to 302 hijackings? [webmasterworld.com]
How do you stop meta refresh hijackers? [webmasterworld.com]
Page hijacking: Beta can't handle simple redirects [webmasterworld.com] (MSN)

302 Hijacking solution [webmasterworld.com] (Supporters' Forum)
Location: versus hijacking [webmasterworld.com] (Supporters' Forum)
A way to end PageJacking? [webmasterworld.com] (Supporters' Forum)
Just got google-jacked [webmasterworld.com] (Supporters' Forum)
Our company Lisiting is being redirected [webmasterworld.com]

This thread is for further discussion of problems due to Google's 'canonicalisation' of URLs, when faced with HTTP redirects and HTML META refreshes. Note that each new idea for Google or webmasters to solve or help with this problem should be posted once to the Google 302 Redirect Ideas [webmasterworld.com] thread.

<Extra links added from the excellent post by Claus [webmasterworld.com]. Extra link added thanks to crobb305.>

[edited by: ciml at 11:45 am (utc) on Mar. 28, 2005]

zeus

7:54 pm on Apr 23, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



By mistake I removed my site through the removal tool: I left a / with no text after it. The next day, when I saw that my whole site was gone from Google, I changed my robots.txt again.

Over the last few days I have seen some of my site in the SERPs again, still only 5-10% like before, so now I'm not sure whether these are just Google's old SERPs or my site is really back again. Today Googlebot was by 14 times; I don't think it would do that if the site was removed completely with the removal tool.

Reid

10:37 pm on Apr 23, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Today Googlebot was by 14 times; I don't think it would do that if the site was removed completely with the removal tool.

It sounds like your site is getting crawled again.
You could check your log files and see where it goes.

zeus

11:06 pm on Apr 23, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Maybe you are right, because if the removal tool's 6-month period were in effect, Googlebot would only have visited the main page and robots.txt, I think.

I have never checked where the robot is going; I'm not even sure how to do that. I use Urchin.

joeduck

11:57 pm on Apr 23, 2005 (gmt 0)

10+ Year Member



Dear Mr. GG -

At your and Google support's suggestion we have used 301 redirects to fix 302 and canonical problems.

However, our pages with the 301s still show up even after the new ones have been spidered, i.e. the index now shows even more unintentional duplicate content than before (20k+ pages across 3 geographic sections of our site).

Any suggestions how to avoid duplicate penalties here?

We don't want to kill the OLD 301 redirected pages because they have links and PR we want to pass to the new pages.

arubicus

12:19 am on Apr 24, 2005 (gmt 0)

10+ Year Member



joe - I recently asked the Google team, through the support form, to email me so that I can fully report concerns about 301/302 redirects similar to the ones you mention. I tried to raise the concerns that other webmasters have been reporting and that we have been experiencing too, and to show them examples from our own site. I don't know if they will reply with any explanation, but the things I have been seeing are kind of strange. For example, external redirects are indexed under the redirecting URL but with the content that contains the URL. This by itself could create 2000+ pages of duplicate content. There are more examples but I won't go into them.

I believe that 302 and 301 redirects were both affected by the "fix" that they attempted. It may be that it is fixed and several months of crawling are needed to clean out the index. Or the fix may need further tweaking or a complete overhaul on their end. We just don't know.

g1smd

12:52 am on Apr 24, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



>> We don't want to kill the OLD 301 redirected pages because they have links and PR we want to pass to the new pages. <<

If your server is correctly set up then if anything attempts to access yourdomain.com it will be redirected to www.yourdomain.com with a HTTP status of "301 Moved". The content at the old location will not actually be accessed at all.
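One way to confirm that the non-www address really answers with a 301 is to request it without following the redirect and look at the status line and Location header. The following is only a rough sketch using Python's standard library; the function names are made up for illustration, and yourdomain.com stands in for your own host.

```python
import http.client
from urllib.parse import urlsplit


def fetch_redirect(url, timeout=10):
    """Send one HEAD request and return (status, Location header).

    http.client does not follow redirects, so the status returned is
    the one the server sent for this exact URL.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def is_permanent_redirect_to(url, expected_prefix):
    """True if url answers 301 and points at expected_prefix (hypothetical helper)."""
    status, location = fetch_redirect(url)
    return status == 301 and location is not None and location.startswith(expected_prefix)
```

For example, is_permanent_redirect_to("http://yourdomain.com/", "http://www.yourdomain.com/") should come back True on a server set up as described. A 302 or a 200 here would mean the redirect is not doing what you expect.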

If Google is crawling only the new URLs then it will not have seen the redirect. You need to make a list of all the URLs in the index that should not be there and make a page of links pointing to those URLs. Put that big list on another site. Google will find the list, crawl the URLs, and see the redirects. Allow at least a few weeks for the old URLs to drop out of the index after that.
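The page of links can be generated rather than typed by hand. This is a minimal sketch, assuming you already have the unwanted URLs collected in a plain list; build_links_page is a hypothetical helper, not part of any tool mentioned in this thread.

```python
from html import escape


def build_links_page(urls, title="Old URLs"):
    """Return a bare HTML page with one link per URL.

    Each URL is HTML-escaped so that characters such as & in query
    strings do not break the markup.
    """
    items = "\n".join(
        '<li><a href="{0}">{0}</a></li>'.format(escape(u, quote=True))
        for u in urls
    )
    return (
        "<!DOCTYPE html>\n<html>\n<head><title>{}</title></head>\n"
        "<body>\n<ul>\n{}\n</ul>\n</body>\n</html>\n"
    ).format(escape(title), items)
```

Write the returned string to an .html file and upload it to the other site. Whether and how quickly Google crawls the listed URLs is not something the script can influence.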

arubicus

1:18 am on Apr 24, 2005 (gmt 0)

10+ Year Member



joe - "However, our pages with the 301s still show up even after the new ones have been spidered - ie the index now shows even more unintentional duplicate content than before (20k+ pages across 3 geographic sections of our site)."

Take a look at the caches of some of the old 301s, if they show. Check the date.

What we have seen is that Google has spidered 301s and, instead of just showing the URL, it has been showing the full title, description and cached content of the new pages they redirect to. This cached content is of our newest design, which the 301 URL has never served at all. Under a 301 this shouldn't happen. A 301 says page A is now gone and B is the new location: get rid of A, show B.

We have seen a bunch of non-www URLs fully listed (with cached pages) even though they have always been under a 301 redirect. They have ALWAYS returned a 301.

claus

2:31 am on Apr 24, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> 301

Normally (1) a 301 redirect will take at least one full month to propagate to the SERPs. If they have changed the way they handle 301s, I do not believe that this time frame is now shorter than before.

So, a 301 takes time. This, of course, does not explain wrong listings for URL's that have always been 301's.

---
1) That is, at least until a few months ago. I haven't done any 301's in a few months.

arubicus

4:58 am on Apr 24, 2005 (gmt 0)

10+ Year Member



These 301 redirects have been up for 2 1/2 years, some even 3. I believe that the recent updates caused some sort of problem with both 301s and 302s. This does not mean it is not fixed. It may mean a simple crawl can sort this thing out. But if there are penalties from this, then getting the penalty lifted, and even a complete crawl, would be a mess of its own.

Reid

5:37 am on Apr 24, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have never checked where the robot is going; I'm not even sure how to do that. I use Urchin.


What you need to do is see if you can download the raw log files from the server.
Pick a date when you know Googlebot showed up and download the log file for that day.
Unzip it and open it in WordPad.

Take a look here for info about how to read it
[webmasterworld.com...]

You can just do a quick scan and see what files Googlebot is requesting.
If it is continually requesting the same files you have a problem, but if it is requesting a lot of different files then you can break open the champagne.
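If scanning by eye gets tedious, a few lines of script can do the counting. This is only a sketch: it assumes the log is in the common Apache "combined" format (request line, status, size, referer, user agent) and simply looks for "googlebot" in the user-agent field; your server's format may differ, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Assumed layout: ... "GET /path HTTP/1.1" 200 1234 "referer" "user agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_requests(lines):
    """Count how often each path was requested by a Googlebot user agent."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "googlebot" in match.group("agent").lower():
            counts[match.group("path")] += 1
    return counts
```

Pass it the lines of the unzipped log, for example the open file object for that day's log, and print counts.most_common(20). A long list of different paths suggests a normal crawl; the same handful of files over and over is the problem case described above.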

This 467 message thread spans 47 pages.