Sometimes, an HTTP status 302 redirect or an HTML META refresh causes Google to replace the redirect's destination URL with the redirect URL. The word "hijack" is commonly used to describe this problem, but redirects and refreshes are often implemented for click counting, and in some cases lead to a webmaster "hijacking" his or her own URLs.
Normally in these cases, a search for cache:[destination URL] in Google shows "This is G o o g l e's cache of [redirect URL]" and oftentimes site:[destination domain] lists the redirect URL as one of the pages in the domain.
Also link:[redirect URL] will show links to the destination URL, but this can happen for reasons other than "hijacking".
Searching Google for the destination URL will show the title and description from the destination URL, but the title will normally link to the redirect URL.
There has been much discussion on the topic, as can be seen from the links below.
How to Remove Hijacker Page Using Google Removal Tool [webmasterworld.com]
Google's response to 302 Hijacking [webmasterworld.com]
302 Redirects continues to be an issue [webmasterworld.com]
Hijackers & 302 Redirects [webmasterworld.com]
Solutions to 302 Hijacking [webmasterworld.com]
302 Redirects to/from Alexa? [webmasterworld.com]
The Redirect Problem - What Have You Tried? [webmasterworld.com]
I've been hijacked, what to do now? [webmasterworld.com]
The meta refresh bug and the URL removal tool [webmasterworld.com]
Dealing with hijacked sites [webmasterworld.com]
Are these two "bugs" related? [webmasterworld.com]
site:www.example.com Brings Up Other Domains [webmasterworld.com]
Incorrect URLs and Mirror URLs [webmasterworld.com]
302's - Page Jacking Revisited [webmasterworld.com]
Dupe content checker - 302's - Page Jacking - Meta Refreshes [webmasterworld.com]
Can site with a meta refresh hurt our ranking? [webmasterworld.com]
Google's response to: Redirected URL [webmasterworld.com]
Is there a new filter? [webmasterworld.com]
What about those redirects, copies and mirrors? [webmasterworld.com]
PR 7 - 0 and Address Nightmare [webmasterworld.com]
Meta Refresh leads to ... Replacement of the target URL! [webmasterworld.com]
302 redirects showing ultimate domain [webmasterworld.com]
Strange result in allinurl [webmasterworld.com]
Domain name mixup [webmasterworld.com]
Using redirects [webmasterworld.com]
redesigns, redirects, & google -- oh my [webmasterworld.com]
Not sure but I think it is Page Jacking [webmasterworld.com]
Duplicate content - a google bug? [webmasterworld.com]
How to nuke your opposition on Google? [webmasterworld.com] (January 2002 - when Google's treatment of redirects and META refreshes was worse than it is now)
Hijacked website [webmasterworld.com]
Serious help needed: Is there a rewrite solution to 302 hijackings? [webmasterworld.com]
How do you stop meta refresh hijackers? [webmasterworld.com]
Page hijacking: Beta can't handle simple redirects [webmasterworld.com] (MSN)
302 Hijacking solution [webmasterworld.com] (Supporters' Forum)
Location: versus hijacking [webmasterworld.com] (Supporters' Forum)
A way to end PageJacking? [webmasterworld.com] (Supporters' Forum)
Just got google-jacked [webmasterworld.com] (Supporters' Forum)
Our company Listing is being redirected [webmasterworld.com]
This thread is for further discussion of problems due to Google's 'canonicalisation' of URLs, when faced with HTTP redirects and HTML META refreshes. Note that each new idea for Google or webmasters to solve or help with this problem should be posted once to the Google 302 Redirect Ideas [webmasterworld.com] thread.
<Extra links added from the excellent post by Claus [webmasterworld.com]. Extra link added thanks to crobb305.>
[edited by: ciml at 11:45 am (utc) on Mar. 28, 2005]
All I have is a bunch of sweat equity in my site and a wife that thinks I'm nuts. Did you ever try to explain this to an outsider?
I started this whole thing because people at work tell me that writing is a strength of mine. I'm asked to write on all kinds of topics, so why not write for myself?
When I was looking for my first job all those years ago, someone gave me a gift - The Psychology of Winning. A couple of things have stuck with me all these years, and one had to do with watching television...
Paraphrasing:
When you watch TV, you are watching entertainers working. They are getting rich because so many people are willing to sit idly by and watch them. Why invest in them, when you could invest in yourself?
I try to put all my time to good use, whether it be with my family or working on this project. I'd rather talk to a person than stare at the boob tube. I love to learn, and the TV just doesn't do it for me.
Is it okay to use redirects for statistics purposes when the redirect link goes through your cgi-bin AND you block all robots from links to your cgi-bin in your robots.txt file?
Google Removal Tool.
First you need an e-mail address to register; reply to the auto-generated response.
When you log in, you get an 'options' page.
Please keep in mind that submitting via the automatic URL removal system will cause a temporary, six months, removal of your site from the Google index. You may review the status of submitted requests in the column to the right.
There are 4 options:
1. "remove pages, subdirectories or images using a robots.txt file"
2. "Remove a single page using META tags"
3. "remove an outdated link"
4. "remove your usenet post from Google Groups"
The first option is the one to use to clean the cgi-bin URLs out of the Google index. It links to a page with a box where you type in the URL of your robots.txt file (an example is provided).
Before you do this, it is essential to check your robots.txt file for errors. Whatever is disallowed will be removed 'for six months', so if you have
User-agent: *
Disallow: /
your site will be removed from Google for six months,
but if you have
User-agent: *
Disallow: /cgi-bin/
all the 302s or "URL only" listings from your cgi-bin will be removed for 6 months, and they will never get indexed again as long as you leave that Disallow line there.
So it is critical to understand what your robots.txt is allowing and disallowing before you submit it to Google.
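One way to see exactly what a robots.txt blocks before submitting it is to test it locally. Below is a minimal sketch using Python's standard urllib.robotparser, which does plain path-prefix matching per the original robots.txt convention and not Google's own parser (so no * or $ wildcards). The www.example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the subset of `urls` that `robots_txt` disallows for `agent`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the rules as loaded,
    # so can_fetch() works without fetching anything over the network.
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(agent, u)]


# Hypothetical site URLs used only for this example.
SAMPLE_URLS = [
    "http://www.example.com/",
    "http://www.example.com/articles/page.html",
    "http://www.example.com/cgi-bin/redirect.cgi?id=42",
]

CGI_ONLY = "User-agent: *\nDisallow: /cgi-bin/\n"
WHOLE_SITE = "User-agent: *\nDisallow: /\n"

if __name__ == "__main__":
    print("cgi-bin rule blocks:", blocked_urls(CGI_ONLY, SAMPLE_URLS))
    print("site-wide rule blocks:", blocked_urls(WHOLE_SITE, SAMPLE_URLS))
```

With the cgi-bin rule only the redirect script is blocked, while "Disallow: /" blocks every URL on the site, which is exactly the six-month mistake described above.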
After you submit, you will be given a 'success' page with a 'view options' link on it.
This takes you back to the original 'options' page, where the 'grey area' shows what will be removed within 48 hours or so (when googlebot visits).
The items will show as pending, so if you messed up and something you don't want removed is pending, you can still alter your robots.txt before 'removalbot' visits.
Let's say it lists a bunch of stuff from your cgi-bin that you didn't want removed: you could alter your robots.txt file so that cgi-bin is allowed again.
Then you will get 'request denied', and nothing will be removed.
I would recommend running your robots.txt through a validator like the one here at WW.
Another option is a tool called 'poodle predictor', a good diagnostic tool for 'crawling your own site'. It does a good job of mimicking a search engine spider.
One guy had a site that was doing fine in MSN and Yahoo, but googlebot would just ask for robots.txt and / and then leave. All that was in the index was a 14-month-old cache of his 'under construction' homepage.
Well, poodle predictor showed a '500' for that page because the server wasn't returning a 'last modified' date. That was the problem with googlebot, so it would be a good idea to use that tool and make sure everything looks OK.
BTW - I did use this method to clean out my cgi-bin and it worked fine - and google still crawls my site.
I did alter the robots.txt file and got 'request denied', so I had to re-submit the non-existent 301s that I had disallowed in my robots.txt file, and then I got Google to remove them. I just didn't wait long enough (I waited 24 hrs) before cleaning the Disallow lines for files that don't exist out of my robots.txt file.
Let's say www.badguy.com/sites/site123 points to my page www.mysite/mypage.html.
I check his headers and, sure enough, there's an unauthorized 302 redirect.
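Checking the headers means looking at the raw response without letting the client follow the redirect. Below is a minimal sketch in Python using the standard http.client module, which never follows redirects, so a 302 shows up as a 302 with its Location header. The function names are hypothetical, and the network call is kept separate from the labelling logic.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit


def head_without_following(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Send a single HEAD request and return (status, Location header).

    http.client never follows redirects, so a 302 is reported as a 302
    instead of being silently replaced by the destination page.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def describe_redirect(status: int, location: Optional[str]) -> str:
    """Label a status code in the terms used in this thread."""
    if status in (301, 308):
        return f"permanent redirect to {location}"
    if status in (302, 303, 307):
        return f"temporary redirect to {location}"
    if status == 404:
        return "not found (404)"
    return f"status {status}"
```

For example, passing the result of head_without_following("http://www.badguy.com/sites/site123") to describe_redirect() would report whether that URL answers with a 302 and where it points. The same check confirms that a page really returns a 404 before relying on option #3.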
Suppose I ALTER my filename /mypage.html to something else, forcing temporary 404 errors.
THEN I use the G Remove Tool option #3 to "KILL THE LINK".
Would that effectively nuke the 302 redirect, for that one page at least?
It seems a lot safer; I'm just wondering if it would work at all. -Larry
Hi Reid: would option #3, "remove an outdated link", work?
I'm frankly scared of using robots.txt for these purposes.
Option #2 works if you put the META tag
<meta name="googlebot" content="noindex,nofollow">
in the page's <head> section.
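Before submitting a page under option #2, it helps to confirm the tag is really in the HTML that gets served. A minimal sketch using Python's standard html.parser; it collects directives from meta tags named "robots" or "googlebot", and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Set


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names,
        # so <META NAME=...> and <meta name=...> are treated alike.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() in ("robots", "googlebot"):
            for token in attr_map.get("content", "").split(","):
                if token.strip():
                    self.directives.add(token.strip().lower())


def robots_directives(html: str) -> Set[str]:
    """Return the lowercase robots/googlebot directives found in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives
```

A page carrying the tag above yields {"noindex", "nofollow"}; an empty set means the removal request would be denied.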
This option is for those who do not (or cannot) have a robots.txt file.
Option #3 works for pages that no longer exist (the URL must return a 404).
As far as removing 302 redirects that point at your page from another site goes: what we were doing was fooling the removal tool by making the target of the 302 return a 404 (or by putting the META tag on the target page) and then submitting the other guy's URL (the 302 pointing at your page) to the removal tool. So for 302s you are stuck with option 2 or 3, since the removalbot probably won't see your robots.txt file when it follows the 302 to your site. And you can't disallow a URL from another site in your robots.txt (though I would like to try this on the removal tool).
I did use the robots.txt method to remove some old non-existent files that reappeared after the last update.
I was using .htaccess to 301 these URLs to existing pages, and they came out of nowhere and appeared in the index.
I just disallowed these non-existent files in robots.txt and submitted it; it worked like a charm.
I'm not sure how it would take this, but I would like to try it:
disallow: ht*p://w*w.badguys302.php
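Whether the removal tool honours such a line is untested here, but under ordinary robots.txt matching a rule is compared only with the path of a URL on the host that served the file, and paths start with "/". A minimal sketch with Python's urllib.robotparser (not Google's parser) shows a full-URL rule matching nothing; the .example hosts are hypothetical stand-ins for the obfuscated URL above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for your own site; the second rule is the
# "full URL" idea from the post above, with made-up .example hosts.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: http://www.badguys302.example/redir.php
"""


def allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `robots_txt` lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # can_fetch() compares only the path (plus query) of `url` with each
    # rule; it never looks at the host name.
    return parser.can_fetch(agent, url)
```

Here allowed(ROBOTS_TXT, "http://www.mysite.example/cgi-bin/go.cgi") is False, but the redirect URL stays allowed, because no path can begin with "http://".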
Before submitting robots.txt to Google, it is critical that you know your robots.txt is flawless and that you understand EXACTLY what it does.
It's a lot like updating firmware. Scary but exhilarating.
Option #1: submit your robots.txt URL
Option #2: submit the URL of the page (with the META tag)
Option #3: submit the URL of the page (returning a 404)
After you submit, you get 'pending removal' in the grey bar on the options page of the removal tool.
You must leave the page or robots.txt in the state it was in until you get 'complete' status in the grey area of the removal tool 'options' page. Otherwise you will get 'request denied'.
1. Submit.
   Shows in the grey area as 'pending removal'.
2. Within 5 days, removalbot visits the robots.txt or page (whatever you submitted). It did mine within 48 hrs.
   Shows in the grey area as 'complete'.
If you remove the META tag or the 404 condition, or alter your robots.txt, before removalbot visits, you will get 'request denied'.
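The timing rule in the steps above can be written down as a toy model. This only illustrates the behaviour reported in this thread (the 48-hour and 5-day figures come from the posts); RemovalStatus and removal_status are hypothetical names, not anything Google exposes.

```python
from enum import Enum
from typing import Optional


class RemovalStatus(Enum):
    PENDING = "pending removal"
    COMPLETE = "complete"
    DENIED = "request denied"


def removal_status(hours_elapsed: float,
                   visit_at: float,
                   condition_lifted_at: Optional[float] = None) -> RemovalStatus:
    """Toy model of the status shown in the removal tool's grey area.

    visit_at: hours after submission when removalbot checks (posters report
        anything from about 48 hours up to 5 days).
    condition_lifted_at: hours after submission when the Disallow line,
        noindex tag or 404 was removed, or None if it was left in place.
    """
    if hours_elapsed < visit_at:
        return RemovalStatus.PENDING
    if condition_lifted_at is not None and condition_lifted_at < visit_at:
        return RemovalStatus.DENIED
    return RemovalStatus.COMPLETE
```

The earlier report of cleaning out robots.txt after 24 hours corresponds to removal_status(72, visit_at=48, condition_lifted_at=24), which comes out as DENIED.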
The robots.txt method is by far superior, because I was able to leave the 301s in my .htaccess file for the non-existent pages and just use robots.txt to remove them.
You can just leave robots.txt there as long as you like, but if I want to remove a 302 pointing at my index page, I don't want to have the META tag or the 404 condition on it for 5 days while waiting for removalbot. (What if the REAL googlebot visits?)
That is why, if
disallow: ht*tp://w*w.baguysURL.php
works on the removal tool, this would be the far better option.
Of course, you submit www.badsite.com/redir.php?url=www.yoursite.com to the Google console. You know perfectly well that you must not send www.yoursite.com, or you'll remove your own site.
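Because the redirect URL carries your own address in its query string, a careless substring check can confuse the two. A small sketch, assuming Python and the hostnames from the post above: safe_to_submit (a hypothetical helper) compares only the host part of the URL you are about to submit with your own host.

```python
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Hostname of a URL, tolerating the scheme-less form used in the thread."""
    if "://" not in url:
        url = "http://" + url
    # urlsplit() lowercases the hostname; the query string is ignored.
    return urlsplit(url).hostname or ""


def safe_to_submit(url: str, my_host: str) -> bool:
    """True only if `url` lives on a different host than your own site."""
    return host_of(url) != my_host.lower()
```

It compares exact hosts, so www.yoursite.com and yoursite.com count as different; if you serve both, check against each of them.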
But what if someone else submitted your site to the URL console during this time?
BTW, did anyone try Disallow: /?
Maybe I got delisted for doing it?