Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

This 467 message thread spans 16 pages: < < 467 ( 1 ... 5 6 7 8 9 10 11 12 13 14 [15] 16 > >     
Google's 302 Redirect Problem

 4:17 pm on Mar 25, 2005 (gmt 0)

(Continuing from Google's response to 302 Hijacking [webmasterworld.com] and 302 Redirects continues to be an issue [webmasterworld.com])

Sometimes, an HTTP status 302 redirect or an HTML META refresh causes Google to replace the redirect's destination URL with the redirect URL. The word "hijack" is commonly used to describe this problem, but redirects and refreshes are often implemented for click counting, and in some cases lead to a webmaster "hijacking" his or her own URLs.

Normally in these cases, a search for cache:[destination URL] in Google shows "This is G o o g l e's cache of [redirect URL]" and oftentimes site:[destination domain] lists the redirect URL as one of the pages in the domain.

Also link:[redirect URL] will show links to the destination URL, but this can happen for reasons other than "hijacking".

Searching Google for the destination URL will show the title and description from the destination URL, but the title will normally link to the redirect URL.

There has been much discussion on the topic, as can be seen from the links below.

How to Remove Hijacker Page Using Google Removal Tool [webmasterworld.com]
Google's response to 302 Hijacking [webmasterworld.com]
302 Redirects continues to be an issue [webmasterworld.com]
Hijackers & 302 Redirects [webmasterworld.com]
Solutions to 302 Hijacking [webmasterworld.com]
302 Redirects to/from Alexa? [webmasterworld.com]
The Redirect Problem - What Have You Tried? [webmasterworld.com]
I've been hijacked, what to do now? [webmasterworld.com]
The meta refresh bug and the URL removal tool [webmasterworld.com]
Dealing with hijacked sites [webmasterworld.com]
Are these two "bugs" related? [webmasterworld.com]
site:www.example.com Brings Up Other Domains [webmasterworld.com]
Incorrect URLs and Mirror URLs [webmasterworld.com]
302's - Page Jacking Revisited [webmasterworld.com]
Dupe content checker - 302's - Page Jacking - Meta Refreshes [webmasterworld.com]
Can site with a meta refresh hurt our ranking? [webmasterworld.com]
Google's response to: Redirected URL [webmasterworld.com]
Is there a new filter? [webmasterworld.com]
What about those redirects, copies and mirrors? [webmasterworld.com]
PR 7 - 0 and Address Nightmare [webmasterworld.com]
Meta Refresh leads to ... Replacement of the target URL! [webmasterworld.com]
302 redirects showing ultimate domain [webmasterworld.com]
Strange result in allinurl [webmasterworld.com]
Domain name mixup [webmasterworld.com]
Using redirects [webmasterworld.com]
redesigns, redirects, & google -- oh my [webmasterworld.com]
Not sure but I think it is Page Jacking [webmasterworld.com]
Duplicate content - a google bug? [webmasterworld.com]
How to nuke your opposition on Google? [webmasterworld.com] (January 2002 - when Google's treatment of redirects and META refreshes were worse than they are now)

Hijacked website [webmasterworld.com]
Serious help needed: Is there a rewrite solution to 302 hijackings? [webmasterworld.com]
How do you stop meta refresh hijackers? [webmasterworld.com]
Page hijacking: Beta can't handle simple redirects [webmasterworld.com] (MSN)

302 Hijacking solution [webmasterworld.com] (Supporters' Forum)
Location: versus hijacking [webmasterworld.com] (Supporters' Forum)
A way to end PageJacking? [webmasterworld.com] (Supporters' Forum)
Just got google-jacked [webmasterworld.com] (Supporters' Forum)
Our company Lisiting is being redirected [webmasterworld.com]

This thread is for further discussion of problems due to Google's 'canonicalisation' of URLs, when faced with HTTP redirects and HTML META refreshes. Note that each new idea for Google or webmasters to solve or help with this problem should be posted once to the Google 302 Redirect Ideas [webmasterworld.com] thread.

<Extra links added from the excellent post by Claus [webmasterworld.com]. Extra link added thanks to crobb305.>

[edited by: ciml at 11:45 am (utc) on Mar. 28, 2005]



 12:10 am on May 3, 2005 (gmt 0)

I agree, GoogleGuy owes us nothing. Anything he can do is a plus. What I do hope is that Google figures this thing out so that other webmasters don't have to sweat it. I don't make my living this way. I do feel bad for all those who are suffering because of this - those who really depend on the revenues to feed their families.

All I have is a bunch of sweat equity in my site and a wife that thinks I'm nuts. Did you ever try to explain this to an outsider?

I started this whole thing because people at work tell me that writing is a strength of mine. I'm asked to write on all kinds of topics, so why not write for myself?

When I was looking for my first job all those years ago, someone gave me a gift - The Psychology of Winning. A couple of things stuck with me all these years and one had to do with watching Television...


When you watch TV, you are watching entertainers working. They are getting rich because so many people are willing to sit idly by and watch them. Why invest in them, when you could invest in yourself?

I try to put all my time to good use, whether it be with my family or working on this project. I'd rather talk to a person than stare at the boob tube. I love to learn, and the TV just doesn't do it for me.


 1:41 am on May 3, 2005 (gmt 0)

Is it okay to use redirects for statistics purposes when the redirect link goes through your cgi-bin AND you block all robots from links to your cgi-bin in your robots.txt file?

If you block your cgi-bin in robots.txt then it won't get spidered anymore, but something from your cgi-bin may already have been crawled (even if it doesn't show up in the index).
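If you go the robots.txt route, it's worth testing the file against the URLs you care about before Google ever sees it. A minimal sketch using Python's standard-library robots.txt parser; the domain and script names here are made up for illustration:

```python
from urllib import robotparser

# Hypothetical robots.txt that blocks only the click-counting script;
# the domain and paths are examples, not anyone's real site.
rp = robotparser.RobotFileParser()
rp.modified()  # mark the rules as freshly loaded so can_fetch() trusts them
rp.parse([
    "User-agent: *",
    "Disallow: /cgi-bin/",
])

# The counting script is blocked from crawling...
print(rp.can_fetch("Googlebot", "http://www.example.com/cgi-bin/count.cgi?id=42"))
# ...but ordinary content pages are still crawlable.
print(rp.can_fetch("Googlebot", "http://www.example.com/widgets.html"))
```

Note this only checks crawling rules; as the post says, URLs crawled before the disallow was added can already be sitting in the index.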

Google Removal Tool:

First you need an e-mail address to register (reply to the auto-generated response).

When you log in, you get an 'options' page:

Please keep in mind that submitting via the automatic URL removal system will cause a temporary, six months, removal of your site from the Google index. You may review the status of submitted requests in the column to the right.

On the right of the page there is a grey area showing which URLs you have requested for removal, and their status:
pending (can't remember the exact phrase)
request denied

There are 4 options:
1. "remove pages, subdirectories or images using a robots.txt file"
2. "Remove a single page using META tags"
3. "remove an outdated link"
4. "remove your usenet post from Google Groups"

The first option is the one to use to clean the cgi-bin URLs out of the Google index. It links to a page with a box in which to type the URL of your robots.txt file (an example is provided).

Before you do this, it is essential to check your robots.txt file for any errors. Whatever is disallowed will be removed 'for six months', so if you have
disallow: /
your site will be removed from Google for six months, but if you have
disallow: /cgi-bin/
all the 302s or "URL only"s from your cgi-bin will be removed for 6 months, and they will never get indexed again if you leave that disallow there. So it is critical to understand what your robots.txt is allowing and disallowing before you submit it to Google.
After you submit, you are given a 'success' page with a 'view options' link. This takes you back to the original 'options' page, where you will see in the 'grey area' what will be removed within 48 hours or so (when Googlebot visits). The requests show as 'pending', so if you messed up and something is pending that you don't want removed, you can alter your robots.txt before 'removalbot' visits.
Let's say it found a bunch of stuff from your cgi-bin that you didn't want removed: you could alter your robots.txt file so that the cgi-bin is allowed again. Then you will get 'request denied' and nothing is removed.

I would recommend running your robots.txt through a validator, like the one here at WW.
Another option is a tool called 'Poodle Predictor', a good diagnostic for 'crawling your own site'; it does a good job of mimicking a search-engine crawler. One guy had a site that was doing fine in MSN and Yahoo, but Googlebot would just ask for robots.txt and / and then leave. All that was in the index was a 14-month-old cache of his homepage, 'under construction'.
Well, Poodle Predictor showed a '500' for that page, because the server wasn't returning a 'Last-Modified' date. That was the problem with Googlebot. So it would be a good idea to use that tool and make sure everything looks OK.
BTW - I did use this method to clean out my cgi-bin and it worked fine - and Google still crawls my site.
I did alter the robots.txt file too soon, got 'request denied', and had to re-submit the non-existent 301s that I had disallowed in my robots.txt file to get Google to remove them. I just didn't wait long enough (I waited 24 hrs) before cleaning the disallows for files that don't exist back out of my robots.txt file.


 2:19 am on May 3, 2005 (gmt 0)

Hi Reid: re "3. remove an outdated link" - would option #3 work?
I'm frankly scared of using robots.txt for these purposes.

Let's say www.badguy.com/sites/site123 points to my page.
I check his header and, sure enough, an unauthorized 302 redirect.

Suppose I ALTER my filename /mypage.html to something else, forcing temporary 404 errors.

THEN I use the G Remove Tool option #3 to "KILL THE LINK".
Would that effectively nuke the 302 redirect, for that one page at least?
It seems a lot safer; just wondering if it would work at all. -Larry


 11:38 am on May 3, 2005 (gmt 0)

Is it true that the hijacked pages still show in the SERPs with allintext or allinanchor for their terms?


 3:24 pm on May 3, 2005 (gmt 0)

Hi Reid: re "3. remove an outdated link" - would option #3 work? I'm frankly scared of using robots.txt for these purposes.

Option #2 works if you put the META tag
META name="googlebot" content="noindex,nofollow"
in the page header.
This option is for those who do not (or cannot) have a robots.txt file.

Option #3 works for pages which no longer exist (they must return a 404).

As for removing 302 redirects pointing at your page from another site: what we were doing was fooling the removal tool by making the target of the 302 return a 404 (or putting the META tag on the target page) and then submitting the other guy's URL (the 302 pointing at your page) to the removal tool. So for 302s you are stuck with option 2 or 3, since the removalbot probably won't be seeing your robots.txt file when it follows the 302 to your site, and you can't disallow a URL from another site in your own robots.txt (I would like to try this on the removal tool, though).
I did use the robots.txt method to remove some old non-existent files that reappeared after the last update. I was using .htaccess to 301 these URLs to existing pages, and they came out of nowhere and appeared in the index. I just disallowed these non-existent files in robots.txt and submitted it; it worked like a charm.
Not sure how the removal tool would take this, but I would like to try:
disallow: ht*p://w*w.badguys302.php

Before submitting robots.txt to Google it is critical that you know your robots.txt is flawless and that you understand EXACTLY what it does.
It's a lot like updating firmware: scary but exhilarating.
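For option #2, a quick sanity check is to parse the page your server actually serves and confirm the META tag is really in the head before submitting the URL. A sketch with Python's standard library; the sample page string is illustrative, and this is of course a rough approximation, not Google's own logic:

```python
from html.parser import HTMLParser

class MetaRobotsCheck(HTMLParser):
    """Detect a robots/googlebot META tag carrying 'noindex'."""
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        d = {k.lower(): (v or "") for k, v in attrs}
        if d.get("name", "").lower() in ("robots", "googlebot") \
                and "noindex" in d.get("content", "").lower():
            self.noindex = True

# A hypothetical page carrying the tag from the post above.
page = ('<html><head>'
        '<meta name="googlebot" content="noindex,nofollow">'
        '</head><body>...</body></html>')
checker = MetaRobotsCheck()
checker.feed(page)
print(checker.noindex)  # True: this page qualifies for removal option #2
```

Running the same check against the live URL (fetch, then feed the body) tells you whether the tag survived your templating before you gamble on the removal tool.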


 3:49 pm on May 3, 2005 (gmt 0)

Here is the problem, too - why some people were unsuccessful at removing 302s:

Option #1: submit your robots.txt URL
Option #2: submit the URL of the page (with the META tag)
Option #3: submit the URL of the page (returning a 404)

After you submit, you get 'pending removal' in the grey bar on the options page of the removal tool.
You must leave the page or robots.txt in the state it was in until you get 'complete' status in the grey area of the removal tool 'options' page. Otherwise you will get 'request denied'.

1. Submit.
It shows in the grey area as 'pending removal'.
2. Within 5 days, removalbot visits the robots.txt or the page (whatever you submitted) - it did mine within 48 hrs.
It then shows in the grey area as 'complete'.

If you remove the META tag or the 404, or alter your robots.txt, before removalbot visits, you will get 'request denied'.

The robots.txt method is by far superior, because I was able to leave the 301 in my .htaccess file for the non-existent pages and just use robots.txt to remove them.
You can leave robots.txt that way as long as you like, but if I want to remove a 302 pointing at my index page, I don't want the META tag or the 404 condition on it for 5 days waiting for removalbot (what if the REAL Googlebot visits?).
That is why, if
disallow: ht*tp://w*w.baguysURL.php
works on the removal tool, then that would be the far better option.


 4:24 pm on May 3, 2005 (gmt 0)

didn't GoogleGuy say not to try this ;)?


 7:00 pm on May 3, 2005 (gmt 0)

I would be more careful with the methods you describe. Using the removal tool requires you to leave your page/site as it is (returning a 404, placed in robots.txt, or carrying the meta noindex tag) until Googlebot crawls it.

Of course, you submit www.badsite.com/redir.php?url=www.yoursite.com to the Google console. You know perfectly well that you must not submit www.yoursite.com, or you'll remove your own site.

But what if someone else submitted your site to the URL console during this time?


 7:23 pm on May 3, 2005 (gmt 0)

walkman, I think no one expected GG to comment on individual cases of site disappearances. However, I think it is reasonable to expect some sort of follow-up from GG stating that he did all he could but, alas, it is beyond his control.

BTW, did anyone try Disallow: /?

Maybe I got delisted for doing it?


 7:50 pm on May 3, 2005 (gmt 0)

"If you remove the META tag, 404 or alter your robots.txt before removalbot visits, you will get 'request denied'."

Definitely not true of the META tag. You can (and should) remove it immediately... so the tag would only be on the page for five seconds or so.


 8:55 pm on May 3, 2005 (gmt 0)

I used the META tag to get rid of three 302s, and the tag was on my page for a total of ten seconds.

So something is not correct in the previous posts....


 10:25 pm on May 3, 2005 (gmt 0)

>> It would be interesting to find out how many of those 8 quadrillion pages are unique, and how many exist only in the mind of G. <<

For the ODP they have stretched 650 000 categories, 650 000 category charters, 70 000 profiles, and 2 000 informational pages into more than 11 000 000 listings. Where did the additional 9 600 000 entries come from?

The site: command is now truly broken by Google trying to filter 302 redirects out of the results (rather than removing them from the database). You cannot get to see 1000 results for any search term, even those reporting millions of matches.


>> Since one of the heuristics to pick a canonical site was to take PageRank into account.. <<

Yes, but they should be comparing PR of real pages, not the PR of the entry point of a redirect, that entry point being just a URL. The redirect-start-URL is not a real page.


>> The problem is not consistent. The only consistent thing is Google calls links "pages". As long as they do that, problems of many kinds will occur. <<

Yes. You can also link to a page, add whatever dynamic strings you want, and totally rename the target page in the SERPs, if the linking page has enough PR: www.yoursite.com/shiny.widgets.html?this-product-is-junk-do-not-buy-it - and it works; and that is scary. Google doesn't ask the target server what the page is called; it lists the page under whatever name was on the link it followed to get there.

I did that to a page on a site that had information on it that was four years out of date. The webmaster refused to admit that printing very old contact information, in which nearly every telephone number and email address had an error, was wasting people's time. I replaced the URL in the SERPs with www.domain.com/contact.list.html?this-page-is-four-years-out-of-date by linking to it from two PR 6 pages, and within a week the URL was changed in the SERPs. After a further 6 months, the site owner eventually updated the page with the information that had been emailed to him every 3 months for the last 3 years.
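The mechanics are easy to see: to a crawler that treats every distinct string as a distinct document, the query string is part of the URL's identity, even though most servers serve the same page regardless. A small sketch of that distinction, with the paths borrowed from the example above and an example.com host substituted in:

```python
from urllib.parse import urlsplit, urlunsplit

def strip_query(url):
    """Drop the query and fragment - the parts a linker can freely invent."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

real = "http://www.example.com/shiny.widgets.html"
jacked = "http://www.example.com/shiny.widgets.html?this-product-is-junk-do-not-buy-it"

print(real == jacked)               # False: two different "URLs" to a crawler
print(strip_query(jacked) == real)  # True: one and the same page on the server
```

A crawler that canonicalised by asking the server (or at least by stripping invented query strings) would not be renameable from the outside this way.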


 1:07 am on May 4, 2005 (gmt 0)

"Where did the additional 9 600 000 entries come from?"

For starters, addurl, updateurl, applytoedit, reportabuse, editcat "pages".

They also now seem to be calling the lowercase versions "pages".


 3:57 am on May 4, 2005 (gmt 0)

didn't GoogleGuy say not to try this ;)?

msg #163 by GoogleGuy:

steveb: NOTE: Do not submit your own site to our url removal tool in attempt to force a canonical url. I repeat, do not submit your own site to our url removal tool. Using the url removal tool was some idea that a WebmasterWorld member came up with and started talking about. I just talked with user support about a reinclusion request, and using the url removal tool on your own site will *not* help. All it will do is remove your site for six months.

And then, after he understood what we were doing with the removal tool (and some guys screwed up and removed their own sites), msg #232:

Very few people used the url removal tool to take out their own sites, so I can try to gather some people into one group and ask someone if we can do anything on our end.
For the person who asked about the url removal tool: its removal for six months, not 90 days. I understand how someone thought it might help to try the url removal tool, but please don't use it on one's own site. arubicus, did you say you saw weird behavior with www vs. non-www or trailing slashes vs. without?

"If you remove the META tag, 404 or alter your robots.txt before removalbot visits, you will get 'request denied'."
Definitely not true of the META tag. You can (and should) remove it immediately... so the tag would only be on the page for five seconds or so.

Yeah, I believe you are right about that: with options 2 and 3 (META or 404) you get instant results, but with option 1 (robots.txt) you have to wait for the bot.
That's why I like option 1: it tells you what it is going to do (so you still have a chance to change robots.txt if you want).
Failsafe robots.txt (to cause all your removal requests to be denied):
user-agent: *
The other options, 2 and 3, merely tell you what you've done already.

BTW, did anyone try Disallow: /?
Maybe I got delisted for doing it?

Yes: if you put
disallow: /
in your robots.txt file and submitted the URL of your robots.txt file into option 1 of the removal tool, you have successfully removed your entire site from Google for 6 months.
What did GoogleGuy say to do? Submit a reinclusion request explaining how you accidentally removed your site, and put 'attn: googleguy' on it.
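The gap between the "failsafe" file and the dangerous one can be demonstrated with any strict robots.txt parser. Here is a sketch using Python's standard library as a stand-in (how 2005-era removalbot actually parsed files may of course have differed); the domain is an example:

```python
from urllib import robotparser

def allowed(robots_lines, url):
    """Would a strict robots.txt parser let a crawler fetch this URL?"""
    rp = robotparser.RobotFileParser()
    rp.modified()  # mark the rules as loaded so can_fetch() trusts them
    rp.parse(robots_lines)
    return rp.can_fetch("Googlebot", url)

# The "failsafe" file from the post (an empty Disallow is equivalent to the
# bare User-agent line): it blocks nothing, so a removal request based on
# it has nothing to act on and gets denied.
print(allowed(["User-agent: *", "Disallow:"], "http://www.example.com/index.html"))

# The dangerous file: "Disallow: /" blocks every URL on the site, which is
# why submitting it took whole sites out for six months.
print(allowed(["User-agent: *", "Disallow: /"], "http://www.example.com/index.html"))
```

Running a check like this against every URL you care about, before the removal tool ever sees the file, is the cheap version of the "know EXACTLY what it does" advice above.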


 7:46 am on May 4, 2005 (gmt 0)

>> dmoz, lowercase

Funny, I didn't notice this before. steveb, you're right: URLs like that are an open invitation for duplicate "page" creation.

I'm sure you can multiply the number of real dmoz pages by at least two due to different spellings of the URLs in links. Yet another case (pun not intended) where a URL does not equal a page.


 8:32 am on May 4, 2005 (gmt 0)

Shurik -

If you put the line you indicated (disallow: /) in your robots.txt, you were probably deleted from the indexes. It tells the bots "do not index me", and when used with the robots exclusion tool at Google it removes all pages of your site from the Google index in less than 48 hours.

Remove that line from your robots.txt!

Then submit a reinclusion request via google.com/support/


 6:27 pm on May 4, 2005 (gmt 0)

I was also hit by all this hijacking, but my question/interest is this: why do we sometimes see a hit in the logs from our old main keyword ranking on Google, just as before the hijacking, but it only shows up once and then nothing? Is it another server, or a filter, or what? I don't think it is flux, because then we would see it more every day.

Every time it happens it gets my hopes up, but after a few minutes I remember: oh, I have seen this before.


 10:15 pm on May 4, 2005 (gmt 0)

joeduck, I didn't put "Disallow: /" in my robots.txt.
I used "Disallow: /?" to remove dup pages of my index page that looked like www.mysite.com/?a=1.
And I have submitted 10 reinclusion requests by now.


 10:19 pm on May 4, 2005 (gmt 0)

"And i have submitted 10 reinclusion requests by now. "

when was that Shurik? did you make sure the site was clean (by G standards)?


 10:40 pm on May 4, 2005 (gmt 0)

walkman, I was sending re-inclusion requests from mid-January, like every 2 weeks. I even received 2 replies from Google reassuring me that the site was not penalized and that my disappearance may be due to "...natural index fluctuations". I have never seen any standards from Google, only recommendations. From my perspective the site was always clean.


 10:49 pm on May 4, 2005 (gmt 0)

"I even received 2 replies from google reassuring me that the site was not penalized"

are you banned (as in NOT on the index) or just have bad rankings? Two very different things...


 12:00 am on May 5, 2005 (gmt 0)

Walkman -

Regarding Google support telling us "no penalty":

We did NOT lose the home page, but we did appear to lose about 100k indexed pages of about 350k total (though frankly I'm increasingly skeptical about learning much from "site:oursite.com").

Some are back in index now but Google traffic remains at about 5% of pre Feb 2 level. Yahoo traffic fairly stable.

Shurik -

I misread what you meant with that question mark. You had the mark in your robots.txt, as in: "Disallow: /?"

I don't know how the bot would interpret that instruction. If your site is completely gone, it appears it ignored the question mark. Search for "robots exclusion protocol" for details on the syntax.


 12:41 am on May 5, 2005 (gmt 0)

walkman, the site was completely gone in 3 days. From 2000 unique and indexed pages to absolutely nothing. The funny thing is that I still receive referrals from Google Images. And I think the links from my site are still potent, since I just built another site and put a link to it on the front page of my de-listed site. Within 3 days the new site ranked well for targeted non-competitive terms.

As for "Disallow: /?" - I read the specs before attempting it. Nothing special was mentioned about the "?" character, and Google's extended robots.txt syntax does not mention any special meaning of "?" either.
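One plausible reading of what happened is that the parser on Google's end normalized the trailing "?" away, turning the rule into Disallow: /. That is exactly what Python's standard-library parser does today; the snippet below shows that behavior as an illustration only, not as a claim about how Googlebot parsed files in 2005 (the domain is an example):

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.modified()  # mark the rules as loaded so can_fetch() trusts them
rp.parse([
    "User-agent: *",
    "Disallow: /?",  # intent: block only query-string URLs like /?a=1
])

# The stdlib parser normalizes the bare "/?" (a path with an empty query)
# down to "/", so BOTH of these are refused - the rule silently behaves
# like "Disallow: /".
print(rp.can_fetch("*", "http://www.example.com/?a=1"))
print(rp.can_fetch("*", "http://www.example.com/page.html"))
```

If different crawlers disagree on an edge case like this, the conservative move is to avoid it entirely, e.g. by not emitting the duplicate query-string URLs in the first place.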


 6:53 am on May 5, 2005 (gmt 0)

I think Google is now just a piece of dung for black-hats to toy with. 302ed or not (I really don't understand all this), a hobby site of about 1000+ pages that I've updated for years is completely gone practically overnight. It was getting about 2000 Google search hits a day. I don't even see it when I type in the URL.
All I see is hundreds of spam pages from site snippet cutters/stealers listed, with unique sentences and words stolen from my pages. Same thing when I type in last month's key phrases from my log files.
These listings that now show up are not even websites, just copied junk from hundreds of sites, with redirected stuff or other people's sentences hidden in them.
In the last year or so this engine has really become a not-so-funny joke, with somewhat of a mismanaged monopoly on search traffic. Surely better things are coming.
The old Google was actually pretty good, but I think spammers and bloggers really whacked it, and it's like a dying dog now.


 4:24 pm on May 5, 2005 (gmt 0)

Yep, link popularity would be a good idea in an ideal world. I think Google should stop patching what cannot work in the real world and move on to something radically new. Innovate or die!


 6:36 pm on May 5, 2005 (gmt 0)

>> >> "Where did the additional 9 600 000 entries come from?" << <<

>> For starters, addurl, updateurl, applytoedit, reportabuse, editcat "pages". <<

No. Definitely not. Those are all disallowed by the robots.txt file and they are not indexed.


 7:27 pm on May 5, 2005 (gmt 0)

What's Google's response time for "am I banned" type questions? Just the yes or no. I remember reading it was about a week or so. Still the same?


 1:19 am on May 6, 2005 (gmt 0)

"No. Definitely not. Those are all disallowed by the robots.txt file and they are not indexed."

Actually yes, definitely. You should know by now that *indexed* means nothing to Google. They count URLs.

1.28 million "pages" in the Google index due to the report abuse link alone:


 8:58 am on May 6, 2005 (gmt 0)

>> They count URLs.

Exactly. And robots.txt does not keep them from doing this, only from indexing the page.


 11:10 pm on May 6, 2005 (gmt 0)

I forgot that report abuse is on a separate subdomain. I assumed it was just another cgi.

None of the other pages you mentioned are indexed as far as I am aware. Still another 8 million to go then...


 1:36 am on May 7, 2005 (gmt 0)

add.cgi displays all of eight pages, but 885,000 are claimed.

apply.cgi shows seven pages, but 1.38 million are claimed.

update.cgi: 341,000. editcat.cgi: 100k.

Double everything for lowercase and capital letters, add a dash of Supplementals for deleted categories, and you start to see how Google's numbers are eight times off from the reality of actually indexed pages.
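The arithmetic behind the "eight times" estimate, using only the figures quoted earlier in this thread (the posters' estimates, not audited numbers), works out like this:

```python
# Real ODP pages, per the figures quoted earlier in the thread:
# categories + category charters + profiles + informational pages.
real_pages = 650_000 + 650_000 + 70_000 + 2_000   # = 1,372,000
claimed = 11_000_000                              # listings Google reported

extra = claimed - real_pages
print(extra)                        # the "additional" entries: 9,628,000 (~9.6 million)
print(round(claimed / real_pages))  # inflation factor: 8
```

Both the "additional 9 600 000 entries" and the "eight times off" figures in the thread are consistent with these inputs.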

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved