Forum Moderators: Robert Charlton & goodroi


Google's 302 Redirect Problem

         

ciml

4:17 pm on Mar 25, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



(Continuing from Google's response to 302 Hijacking [webmasterworld.com] and 302 Redirects continues to be an issue [webmasterworld.com])

Sometimes, an HTTP status 302 redirect or an HTML META refresh causes Google to replace the redirect's destination URL with the redirect URL. The word "hijack" is commonly used to describe this problem, but redirects and refreshes are often implemented for click counting, and in some cases lead to a webmaster "hijacking" his or her own URLs.

Normally in these cases, a search for cache:[destination URL] in Google shows "This is G o o g l e's cache of [redirect URL]" and oftentimes site:[destination domain] lists the redirect URL as one of the pages in the domain.

Also link:[redirect URL] will show links to the destination URL, but this can happen for reasons other than "hijacking".

Searching Google for the destination URL will show the title and description from the destination URL, but the title will normally link to the redirect URL.
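As a rough illustration of the two mechanisms described above, the sketch below classifies an already-fetched response as an HTTP 302 redirect, an HTML META refresh, or neither. The function name `find_redirect` and the sample values are made up for illustration. It only inspects the response and says nothing about how Google itself treats such URLs.

```python
import re

# Hypothetical helper, for illustration only. META refresh detection is a
# simple regular expression here, not a full HTML parser.
META_REFRESH = re.compile(
    r'<meta[^>]+http-equiv=["\']?refresh["\']?[^>]*'
    r'content=["\']?\s*\d+\s*;\s*url=([^"\'>\s]+)',
    re.IGNORECASE,
)

def find_redirect(status, headers, body=""):
    """Return (kind, target) for a 302 or META refresh, else (None, None)."""
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    if status == 302 and "location" in lowered:
        return ("302", lowered["location"])
    match = META_REFRESH.search(body)
    if match:
        return ("meta-refresh", match.group(1))
    return (None, None)
```

For example, `find_redirect(302, {"Location": "http://www.example.com/page.html"})` reports a 302 pointing at that destination URL, while a page with no redirect of either kind gives `(None, None)`.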

There has been much discussion on the topic, as can be seen from the links below.

How to Remove Hijacker Page Using Google Removal Tool [webmasterworld.com]
Google's response to 302 Hijacking [webmasterworld.com]
302 Redirects continues to be an issue [webmasterworld.com]
Hijackers & 302 Redirects [webmasterworld.com]
Solutions to 302 Hijacking [webmasterworld.com]
302 Redirects to/from Alexa? [webmasterworld.com]
The Redirect Problem - What Have You Tried? [webmasterworld.com]
I've been hijacked, what to do now? [webmasterworld.com]
The meta refresh bug and the URL removal tool [webmasterworld.com]
Dealing with hijacked sites [webmasterworld.com]
Are these two "bugs" related? [webmasterworld.com]
site:www.example.com Brings Up Other Domains [webmasterworld.com]
Incorrect URLs and Mirror URLs [webmasterworld.com]
302's - Page Jacking Revisited [webmasterworld.com]
Dupe content checker - 302's - Page Jacking - Meta Refreshes [webmasterworld.com]
Can site with a meta refresh hurt our ranking? [webmasterworld.com]
Google's response to: Redirected URL [webmasterworld.com]
Is there a new filter? [webmasterworld.com]
What about those redirects, copies and mirrors? [webmasterworld.com]
PR 7 - 0 and Address Nightmare [webmasterworld.com]
Meta Refresh leads to ... Replacement of the target URL! [webmasterworld.com]
302 redirects showing ultimate domain [webmasterworld.com]
Strange result in allinurl [webmasterworld.com]
Domain name mixup [webmasterworld.com]
Using redirects [webmasterworld.com]
redesigns, redirects, & google -- oh my [webmasterworld.com]
Not sure but I think it is Page Jacking [webmasterworld.com]
Duplicate content - a google bug? [webmasterworld.com]
How to nuke your opposition on Google? [webmasterworld.com] (January 2002 - when Google's treatment of redirects and META refreshes was worse than it is now)

Hijacked website [webmasterworld.com]
Serious help needed: Is there a rewrite solution to 302 hijackings? [webmasterworld.com]
How do you stop meta refresh hijackers? [webmasterworld.com]
Page hijacking: Beta can't handle simple redirects [webmasterworld.com] (MSN)

302 Hijacking solution [webmasterworld.com] (Supporters' Forum)
Location: versus hijacking [webmasterworld.com] (Supporters' Forum)
A way to end PageJacking? [webmasterworld.com] (Supporters' Forum)
Just got google-jacked [webmasterworld.com] (Supporters' Forum)
Our company Lisiting is being redirected [webmasterworld.com]

This thread is for further discussion of problems due to Google's 'canonicalisation' of URLs, when faced with HTTP redirects and HTML META refreshes. Note that each new idea for Google or webmasters to solve or help with this problem should be posted once to the Google 302 Redirect Ideas [webmasterworld.com] thread.

<Extra links added from the excellent post by Claus [webmasterworld.com]. Extra link added thanks to crobb305.>

[edited by: ciml at 11:45 am (utc) on Mar. 28, 2005]

esllou

8:55 pm on May 3, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I used the meta tag to get rid of three 302s, and the tag was on my page for a total of ten seconds.

So something is not correct in the previous posts...

g1smd

10:25 pm on May 3, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



>> It would be interesting to find out how many of those 8 quadrillion pages are unique, and how many exist only in the mind of G. <<

For the ODP they have stretched 650 000 categories, 650 000 category charters, 70 000 profiles, and 2 000 informational pages into more than 11 000 000 listings. Where did the additional 9 600 000 entries come from?
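The arithmetic behind that question can be checked in a few lines. The figures are the ones quoted in the post; the variable names are just labels chosen for this sketch.

```python
# Figures quoted in the post above.
categories = 650_000
charters = 650_000
profiles = 70_000
info_pages = 2_000
reported_listings = 11_000_000

real_pages = categories + charters + profiles + info_pages
unexplained = reported_listings - real_pages

print(real_pages)    # 1372000
print(unexplained)   # 9628000
```

So roughly 1.4 million real pages account for the listings, which leaves about 9.6 million entries unexplained.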

The site: command is now truly broken by Google trying to filter 302 redirects out of the results (rather than removing them from the database). You cannot get to see 1000 results for any search term, even those reporting millions of matches.

.

>> Since one of the heuristics to pick a canonical site was to take PageRank into account.. <<

Yes, but they should be comparing PR of real pages, not the PR of the entry point of a redirect, that entry point being just a URL. The redirect-start-URL is not a real page.

.

>> The problem is not consistent. The only consistent thing is Google calls links "pages". As long as they do that, problems of many kinds will occur. <<

Yes, you can also link to a page, add whatever dynamic strings you want, and totally rename the target page in the SERPs if the linking page has enough PR, for example www.yoursite.com/shiny.widgets.html?this-product-is-junk-do-not-buy-it. It works, and that is scary. Google doesn't ask the target server what the page is called; it lists the page as having whatever name was on the link that it followed to get to it.

I did that to a page on a site that had information that was four years out of date on it. The webmaster refused to admit that printing very old contact information, where nearly every telephone number and email address in it had an error, was wasting people's time. I replaced the URL in the SERPs with www.domain.com/contact.list.html?this-page-is-four-years-out-of-date and linked to it from two PR 6 pages, and within a week the URL was changed in the SERPs. After a further 6 months, the site owner eventually updated the page information with what had been emailed to him every 3 months for the last 3 years.
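To make the "one page, many URLs" point concrete, here is a minimal sketch that collapses such a link back to the underlying page by dropping the query string. The `http://` scheme and the function name `strip_query` are assumptions added for illustration. This is only safe on sites where the query string does not select content, and it is not a claim about what Google does internally.

```python
from urllib.parse import urlsplit, urlunsplit

def strip_query(url):
    """Drop the query string and fragment, keeping scheme, host and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

linked = "http://www.domain.com/contact.list.html?this-page-is-four-years-out-of-date"
print(strip_query(linked))  # http://www.domain.com/contact.list.html
```

Both the plain URL and the decorated one reduce to the same string, which is the sense in which they are one page listed under two names.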

steveb

1:07 am on May 4, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"Where did the additional 9 600 000 entries come from?"

For starters, addurl, updateurl, applytoedit, reportabuse, editcat "pages".

They also now seem to be calling the lowercase versions "pages".

Reid

3:57 am on May 4, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



walkman:
Reid,
didn't GoogleGuy say not to try this ;)?

msg#163 by googleguy

steveb:NOTE: Do not submit your own site to our url removal tool in attempt to force a canonical url. I repeat, do not submit your own site to our url removal tool. Using the url removal tool was some idea that a WebmasterWorld member came up with and started talking about. I just talked with user support about a reinclusion request, and using the url removal tool on your own site will *not* help. All it will do is remove your site for six months.

And then, after he understood what we were doing with the removal tool and that some guys had screwed up and removed their own sites, msg#232:

Very few people used the url removal tool to take out their own sites, so I can try to gather some people into one group and ask someone if we can do anything on our end.
For the person who asked about the url removal tool: it's removal for six months, not 90 days. I understand how someone thought it might help to try the url removal tool, but please don't use it on one's own site. arubicus, did you say you saw weird behavior with www vs. non-www or trailing slashes vs. without?

steveb:

"If you remove the META tag, 404 or alter your robots.txt before removalbot visits, you will get 'request denied'."
Definitely not true of the META tag. You can (and should) remove it immediately... so the tag would only be on the page for five seconds or so.

Yeah, I believe you are right about that: with options 2 and 3 (META or 404) you get instant results, but with option 1 (robots.txt) you have to wait for the bot.
That's why I like option 1, because it tells you what it is going to do (so you still have a chance to change robots.txt if you want).
failsafe robots.txt: (to cause all your removal requests to be denied)
user-agent: *
disallow:
the other options 2 and 3 merely tell you what you've done already.

shurik

BTW, did anyone try Disallow: /?
Maybe I got delisted for doing it?

Yes, if you put
disallow: /
in your robots.txt file and submitted the URL of your robots.txt file into option 1 of the removal tool, you have successfully removed your entire site from Google for 6 months.
What did GoogleGuy say to do? Submit a reinclusion request explaining how you accidentally removed your site, and put attn: googleguy on it.

claus

7:46 am on May 4, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> dmoz, lowercase

Funny, I didn't notice this before. Steveb, you're right: URLs like that are an open invitation for duplicate "page" creation.

I'm sure you can multiply the number of real dmoz pages by at least two due to different spelling of the URLs in links. Yet another case (pun not intended) where a URL does not equal a page.
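A quick way to spot this kind of duplication in a list of crawled or linked URLs is to group the ones that differ only by letter case. This is just a sketch: the sample URLs are made up, and since URL paths are case-sensitive in general, two variants are only real duplicates if the server returns the same content for both.

```python
from collections import defaultdict

def group_case_variants(urls):
    """Group URLs that differ only by letter case; return groups of 2 or more."""
    groups = defaultdict(list)
    for url in urls:
        groups[url.lower()].append(url)
    return [variants for variants in groups.values() if len(variants) > 1]

# Made-up sample URLs, for illustration only.
sample = [
    "http://dmoz.org/Computers/Internet/",
    "http://dmoz.org/computers/internet/",
    "http://dmoz.org/Arts/",
]
print(group_case_variants(sample))
```

Here the two spellings of the Computers/Internet URL end up in one group, and the Arts URL is not reported because it has no case variant.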

joeduck

8:32 am on May 4, 2005 (gmt 0)

10+ Year Member



Shurik -

If you put the line you indicated (disallow: / ) in your robots.txt, you were probably deleted from the indexes. It tells the bots "do not index me", and if used with the robots exclusion tool at Google it removes all pages of your site from the Google index in less than 48 hours.

Remove that line from your robots.txt!

Then submit a reinclusion request via google.com/support/

zeus

6:27 pm on May 4, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I was also hit by all this hijacking. My question is: why do we sometimes see a hit in the logs from Google for our old main keyword, at the same ranking as before the hijacking, but only once and then nothing? Is it another server or filter, or what is it? I don't think it is flux, because then we would see it more often, every day.

Every time it happens it gets my hopes up, but after a few minutes I remember: oh, I have seen this before.

Shurik

10:15 pm on May 4, 2005 (gmt 0)

10+ Year Member



joeduck, I didn't put "Disallow: /" in my robots.txt
I used "Disallow: /?" to remove dup pages of my index page that looked like www.mysite.com/?a=1
And i have submitted 10 reinclusion requests by now.
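The difference between the three robots.txt variants discussed in this thread (an empty disallow, disallow: / and Disallow: /?) comes down to plain prefix matching. Here is a minimal sketch of that matching, assuming a single user-agent: * record. The function names are made up, it ignores Allow lines and wildcards, and it does not claim to show how Google's removal tool actually read these files at the time.

```python
def disallowed_prefixes(robots_txt, agent="*"):
    """Collect Disallow values from the record for the given user-agent."""
    prefixes = []
    applies = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            applies = (value == agent)
        elif field == "disallow" and applies and value:
            # An empty Disallow value blocks nothing, so it is skipped.
            prefixes.append(value)
    return prefixes

def is_blocked(robots_txt, path):
    """True if path (including any query string) starts with a blocked prefix."""
    return any(path.startswith(p) for p in disallowed_prefixes(robots_txt))

failsafe = "user-agent: *\ndisallow:"
whole_site = "user-agent: *\ndisallow: /"
query_only = "User-agent: *\nDisallow: /?"

print(is_blocked(failsafe, "/index.html"))    # False
print(is_blocked(whole_site, "/index.html"))  # True
print(is_blocked(query_only, "/?a=1"))        # True
print(is_blocked(query_only, "/index.html"))  # False
```

Under this reading, Disallow: /? only covers URLs that begin with /? such as /?a=1, and leaves the rest of the site alone, unlike disallow: / which covers everything.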

walkman

10:19 pm on May 4, 2005 (gmt 0)



"And i have submitted 10 reinclusion requests by now. "

when was that Shurik? did you make sure the site was clean (by G standards)?

Shurik

10:40 pm on May 4, 2005 (gmt 0)

10+ Year Member



walkman, I have been sending re-inclusion requests since mid January, about every 2 weeks. I even received 2 replies from Google reassuring me that the site was not penalized and that my disappearance may be due to "...natural index fluctuations". I have never seen any standards from Google, only recommendations. From my perspective the site was always clean.