
Google SEO News and Discussion Forum

This 467 message thread spans 16 pages.
Google's 302 Redirect Problem
ciml




msg:732619
 4:17 pm on Mar 25, 2005 (gmt 0)

(Continuing from Google's response to 302 Hijacking [webmasterworld.com] and 302 Redirects continues to be an issue [webmasterworld.com])

Sometimes, an HTTP status 302 redirect or an HTML META refresh causes Google to replace the redirect's destination URL with the redirect URL. The word "hijack" is commonly used to describe this problem, but redirects and refreshes are often implemented for click counting, and in some cases lead to a webmaster "hijacking" his or her own URLs.

Normally in these cases, a search for cache:[destination URL] in Google shows "This is Google's cache of [redirect URL]", and oftentimes site:[destination domain] lists the redirect URL as one of the pages in the domain.

Also link:[redirect URL] will show links to the destination URL, but this can happen for reasons other than "hijacking".

Searching Google for the destination URL will show the title and description from the destination URL, but the title will normally link to the redirect URL.
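As an illustration of the click-counting case mentioned above, here is a minimal sketch of a redirect script written as a Python WSGI app. The `url` query parameter and the in-memory counter are assumptions for illustration, not any particular directory's code. What matters is the response: a `302 Found` with a `Location` header, which tells any client that the content only temporarily lives at the destination.

```python
from collections import Counter
from urllib.parse import parse_qs

# Hypothetical in-memory click counter; a real script would persist this.
CLICKS = Counter()


def click_counter_app(environ, start_response):
    """Count a click on ?url=..., then send the visitor on with a 302."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    target = query.get("url", [None])[0]
    if not target:
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"missing url parameter\n"]
    CLICKS[target] += 1
    # 302 Found: "the resource resides temporarily under a different URI".
    start_response("302 Found", [("Location", target)])
    return [b""]
```

Because the script answers with 302 rather than 301, a crawler that follows the specification literally may keep associating the destination's content with the script's own URL.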

There has been much discussion on the topic, as can be seen from the links below.

How to Remove Hijacker Page Using Google Removal Tool [webmasterworld.com]
Google's response to 302 Hijacking [webmasterworld.com]
302 Redirects continues to be an issue [webmasterworld.com]
Hijackers & 302 Redirects [webmasterworld.com]
Solutions to 302 Hijacking [webmasterworld.com]
302 Redirects to/from Alexa? [webmasterworld.com]
The Redirect Problem - What Have You Tried? [webmasterworld.com]
I've been hijacked, what to do now? [webmasterworld.com]
The meta refresh bug and the URL removal tool [webmasterworld.com]
Dealing with hijacked sites [webmasterworld.com]
Are these two "bugs" related? [webmasterworld.com]
site:www.example.com Brings Up Other Domains [webmasterworld.com]
Incorrect URLs and Mirror URLs [webmasterworld.com]
302's - Page Jacking Revisited [webmasterworld.com]
Dupe content checker - 302's - Page Jacking - Meta Refreshes [webmasterworld.com]
Can site with a meta refresh hurt our ranking? [webmasterworld.com]
Google's response to: Redirected URL [webmasterworld.com]
Is there a new filter? [webmasterworld.com]
What about those redirects, copies and mirrors? [webmasterworld.com]
PR 7 - 0 and Address Nightmare [webmasterworld.com]
Meta Refresh leads to ... Replacement of the target URL! [webmasterworld.com]
302 redirects showing ultimate domain [webmasterworld.com]
Strange result in allinurl [webmasterworld.com]
Domain name mixup [webmasterworld.com]
Using redirects [webmasterworld.com]
redesigns, redirects, & google -- oh my [webmasterworld.com]
Not sure but I think it is Page Jacking [webmasterworld.com]
Duplicate content - a google bug? [webmasterworld.com]
How to nuke your opposition on Google? [webmasterworld.com] (January 2002 - when Google's treatment of redirects and META refreshes was worse than it is now)

Hijacked website [webmasterworld.com]
Serious help needed: Is there a rewrite solution to 302 hijackings? [webmasterworld.com]
How do you stop meta refresh hijackers? [webmasterworld.com]
Page hijacking: Beta can't handle simple redirects [webmasterworld.com] (MSN)

302 Hijacking solution [webmasterworld.com] (Supporters' Forum)
Location: versus hijacking [webmasterworld.com] (Supporters' Forum)
A way to end PageJacking? [webmasterworld.com] (Supporters' Forum)
Just got google-jacked [webmasterworld.com] (Supporters' Forum)
Our company Listing is being redirected [webmasterworld.com]

This thread is for further discussion of problems due to Google's 'canonicalisation' of URLs, when faced with HTTP redirects and HTML META refreshes. Note that each new idea for Google or webmasters to solve or help with this problem should be posted once to the Google 302 Redirect Ideas [webmasterworld.com] thread.

<Extra links added from the excellent post by Claus [webmasterworld.com]. Extra link added thanks to crobb305.>

[edited by: ciml at 11:45 am (utc) on Mar. 28, 2005]

 

bose




msg:732620
 5:15 pm on Mar 25, 2005 (gmt 0)

Thanks for posting such a comprehensive list of 302 related threads.

There goes my weekend, Calum. :)

Trawler




msg:732621
 5:02 pm on Mar 25, 2005 (gmt 0)

<Note: in reply to [webmasterworld.com...] >

g1smd

RE: If URLa did a 302 redirect to URLb, then this is a temporary redirect. URLa is saying that the content temporarily resides at URLb. There is no reason to include URLa in the search results though. Google could quite easily include URLb in the results with its associated content being cached and indexed.
___

From my personal observations of 302s, which I use readily on my own domains:

That is exactly the way it worked right up to and just after Florida. Link popularity and PageRank were passed to URLb. Then came a change in later updates. The change is in the algo and how it determines the "better page". Apparently the algo is looking for the higher authority (via backlinks / PageRank / whatever). If URLa has more backlinks / PageRank / whatever than URLb, then the content is indexed under URLa. The funny thing is that this flawed reasoning produces bizarre search results.

As this began to take place, many black hat webmasters figured out how to beat Google using the 302. That is why it is now getting out of hand and will continue to get worse.

They either fix it or the black hats will fill the top 50 in just about any search area they go after.

For Google to backtrack on this method of ranking results would set them back to pre-Florida. From what I can see, Yahoo cares less about a higher authority and just indexes URLb (perhaps with a penalty). MSN follows Google somewhat and indexes URLb, but assigns much less weight to the backlinks / PageRank / whatever of URLa.

[edited by: ciml at 6:01 pm (utc) on Mar. 25, 2005]
[edit reason] Note added. [/edit]

Reid




msg:732622
 7:09 am on Mar 26, 2005 (gmt 0)

Well, the hijacker keeps the link active even when I temporarily disable my entire site. So Google won't let me delete the link or cache. I've sent letters to the offender, their hosting company and to Google. Of course no response from anyone yet... what a nightmare.

Net warrier, this means only one thing: the hijacker has a copy of your page on his site. That is, if the link goes to your page even when your site is down.

I would trace in detail where this link goes (supposedly to you) and where this page actually resides. Make a detailed record of the whole set-up and start threatening people.
This is not a 302 redirect bug, this is blatant hijacking. Is the guy running AdSense? Threaten to report him. Threaten to report his host to the proper authorities if they don't take action.
Make sure you've got the URL of your page on his server first, though. As soon as you start raising crap, that page will be gone.

Reid




msg:732623
 7:19 am on Mar 26, 2005 (gmt 0)

Oh, and another thing that is very important, Net warrier: if this guy does have a copy of your page on another server, then you want that link to return a 404 so you can remove it from Google's index. Don't settle for anything less.

4crests




msg:732624
 5:40 pm on Mar 26, 2005 (gmt 0)

So, my site was hijacked last week. When it happened, the hijacking page replaced my index page in the serps. I used the Google Removal tool. Worked like a charm. However, after 4 days, my original index page still isn't showing back up in the index. My index page is linked on several pages that get cached by google every day. So, I would have thought it would have popped right back into the serps.

Anyone else have anything to say about this? How long have others had to wait before their site came back?

I'm getting worried.

Thankfully the other 8,750 pages of my site are all still in the index. It's only my main index page. But this main index page showed up near the top for many of my most important keywords, and now I'm not there.

4crests




msg:732625
 7:17 pm on Mar 26, 2005 (gmt 0)

My page had about 10 URLs that I had to get rid of with the Google Removal Tool. They all had a cache of my index page. The Google Removal Tool nicely removed them all.

I wonder if I somehow received a Duplicate Content penalty on my index page before I could get them removed.

When I type my page URL into Internet Explorer, the Google Toolbar still shows my site as a pagerank 5, as normal.

If I did receive a duplicate content penalty, would my PR show zero, or be shaded or something? Or is it normal for it to retain its PR?

Am I totally off track on this?

epptom




msg:732626
 9:22 pm on Mar 26, 2005 (gmt 0)

+1 emails to google reporting this - I just discovered a competitor doing this to my site.

An extra entry in robots.txt enabling/disabling 302's to/from external sites would take care of this...

excell




msg:732627
 1:11 am on Mar 27, 2005 (gmt 0)

Please excuse me if this is the wrong thread and if this has been answered before (I've tried looking, but there is so much noise that I cannot find the answer on what to do next). I'm trying to find out if I am doing something wrong with email to webmaster @ google.com... I sent it with the following subject line "Attn: Googleguy Re: 302s canonicalpage & possible google copyright infringemnet" but got an answer from help @ google.com telling me to use the forms on the contact page. Does that mean GoogleGuy will not see my e-mail? Thanks

surfgatinho




msg:732628
 4:38 pm on Mar 27, 2005 (gmt 0)

Seems Google has really cleared up the SERPs and got rid of the skyscraper redirect sites - NOT!

I have just found two identical (not similar) pages from 2 different skyscraper redirect sites next to each other in the SERPs of a competitive KW.

Does this mean the dup filter is broken now, or do 302 sites just get special treatment all round?

I never thought I'd hear myself say it but 'Google is broke'!

jeanluc




msg:732629
 5:21 pm on Mar 28, 2005 (gmt 0)

Epptom wrote:
An extra entry in robots.txt enabling/disabling 302's to/from external sites would take care of this...

In my opinion, it would be even better to create a new option for the META ROBOTS tag, like this:
<META NAME="robots" CONTENT="noredir">

1.) This would allow webmasters to be able to deal with redirections due to the ignorance or abuse of other webmasters.

2.) Redirection 302 would still be accepted as such by search engines when authorized by the webmaster of the target page.

3.) The trick with a temporary "noindex, nofollow" is quite interesting, but it is obviously too tricky to be a real long-term solution. The worst side of it is that it creates a lot of unproductive work for the victim.

An advantage of this NOREDIR is that it could work with all search engines. Not only Google suffers from the 302 redirection.

With the recent introduction of NOFOLLOW, the search engines proved they were able to create new attributes to make their work better. Why not a NOREDIR? It would give a competitive advantage to the search engines supporting it.
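To make the proposal concrete, here is a sketch of how a crawler might honor such a directive. It is hypothetical throughout: `noredir` is Jean-Luc's suggestion, not a value any search engine recognizes, and `listing_url` is an illustrative name. It parses the META ROBOTS tags of the redirect target with Python's standard `html.parser`, and keeps the redirecting URL as the listing only when the target has not opted out.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the tokens of every <meta name="robots" content="..."> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        values = {name: (value or "") for name, value in attrs}
        if values.get("name", "").lower() == "robots":
            for token in values.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def listing_url(redirect_url, target_url, status, target_html):
    """Choose which URL to list, under the hypothetical 'noredir' rule."""
    parser = RobotsMetaParser()
    parser.feed(target_html)
    if status == 302 and "noredir" not in parser.directives:
        return redirect_url  # the behavior complained about in this thread
    return target_url        # target opted out, or the redirect was not a 302
```

One reason to put the directive on the target page, as proposed, is that only the target's webmaster controls it; the site doing the redirecting cannot switch it off.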

Jean-Luc

4crests




msg:732630
 9:15 pm on Mar 28, 2005 (gmt 0)

My site has been missing from Google since I used the Google Removal Tool. However, I see it showing once in a while on a certain data center. (Yes, I did remove the "NOINDEX" metatag immediately.)

But, when i hit the Cache button, it shows the following:

"Your search - cache:ehkm_NZ0P5kJ:shop.store.yahoo.com/widgets/ - did not match any documents"

Anyone know what this means?
Why isn't there a cache of my site?

steveb




msg:732631
 10:06 pm on Mar 28, 2005 (gmt 0)

If you removed a page from Google like: store.com/widgets/

Then store.com/widgets/ will be out of the results for some time. (A Supporters thread suggests 90 days.)

Reid




msg:732632
 11:17 pm on Mar 28, 2005 (gmt 0)

My site has been missing from Google since I used the Google Removal Tool. However, I see it showing once in a while on a certain data center. (Yes, I did remove the "NOINDEX" metatag immediately.)

1 Was it there before you used the removal tool?

2 What did you remove?

4crests




msg:732633
 11:28 pm on Mar 28, 2005 (gmt 0)

Well, my site just reappeared in Google.

I'm so happy.

It was out for about a week. My main index page had been hijacked by one of my affiliates. It totally replaced my index page in Google. I used the Google Removal Tool to remove the offending site. I also found about 9 other offending sites at the same time and removed them too. They all cached back to my index page, so removal was simple. But after removal, my index page still wasn't back. I wrote to every email address I could possibly find at Google and asked for reinclusion. I'm not sure if it just naturally took a week to come back on its own, or if the emails I wrote did the trick.

Regardless, I'm going to go celebrate!

frritchi




msg:732634
 12:31 am on Mar 29, 2005 (gmt 0)

Given that a 302 between domains is so likely to be a "link" (text someone can click on to get from site A to B), shouldn't Google be treating these as links if they wish PR to be accurate? Most of the large directories (a good source of popularity and reputation) use 302s for linking. Whether or not a page jacking problem exists, it seems to me a 302 should be treated as a link, especially by a search engine placing so much emphasis on links in its algorithm.

Reid




msg:732635
 3:43 am on Mar 29, 2005 (gmt 0)

Good to hear, 4crests. I bet it just took time regardless of your e-mails.

302 should be a link: I agree, but as it stands right now, some types of cross-domain 302s are treated as a temporary location of your home page.

window




msg:732636
 6:39 am on Mar 29, 2005 (gmt 0)

I have a site <www.example.com>, which was ranking very well in Google for certain competitive keyphrases. Suddenly all my rankings were gone in January.
After reading all the posts about this problem, I suspect I may have been hijacked...

Please guide me on what I can do. Below is what I am doing to cope with this problem:

1. Sending requests to Google's automatic URL removal system to remove those URLs whose:
a. Links are showing as supplemental results.
b. Links are directing to some other site.
c. Links are not showing any title & description.

2. Modifying our content on all the sites, so that Google finds some fresh content and may crawl our site.

3. Modifying the size of images on the pages in order to change the size of the page, again to invite Google to crawl our site.

4. Implementing a Day/Time function on our site, so that it updates regularly.

5. Adding a DTD statement to all the pages of the websites.

6. Placing absolute internal links on your web site's internal pages (i.e. including the full domain name in links that point from one page of your site to another page on your site).

7. Creating a 404 error page for our sites.

Should I do these changes?
What else can I do to get back my rankings?

[edited by: ciml at 9:56 am (utc) on Mar. 29, 2005]
[edit reason] Examplified [/edit]

claus




msg:732637
 7:26 am on Mar 29, 2005 (gmt 0)

These very large redirect threads have been active for more than a month now. As talk about removing redirect URLs with the remove tool started quite early, i am wondering:

Has anybody removed a redirect URL from somebody else's site and seen it come back (yet)?

Reid




msg:732638
 10:17 am on Mar 29, 2005 (gmt 0)

Claus - I have been following this thread and yes there have been 2 or 3 who have succeeded in bringing their sites back from oblivion by removing the 302's.
See post #16 above for instance.
"about a week" seems to be the common timeframe.

Window, the first step is to do a site:www.yoursite search.

Any URLs in there which are not yours should be removed. Usually they will have your page as a cache (an old copy). They will also have your title and description but a different URL. Get rid of those.

1. Place a <META name="googlebot" content="noindex, nofollow"> in the head of the page being hijacked.
2. Use the URL removal tool to remove the hijackers' links.
3. Remove the META tag so google can crawl the page again.
4. Try to get the webmasters who own the offending links to remove them before they get indexed again.
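Step 3 is easy to forget, and a forgotten tag keeps the page out of the index. A small check like the one below can confirm whether the page you are serving still carries the tag. It is a sketch using Python's standard `html.parser`; `page_blocks_googlebot` is a hypothetical helper name, and fetching the live page is left out.

```python
from html.parser import HTMLParser


class NoindexCheck(HTMLParser):
    """Flags a page whose robots or googlebot META tag contains 'noindex'."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        values = {name: (value or "").lower() for name, value in attrs}
        if tag == "meta" and values.get("name") in ("robots", "googlebot"):
            tokens = {t.strip() for t in values.get("content", "").split(",")}
            if "noindex" in tokens:
                self.noindex = True


def page_blocks_googlebot(html):
    """True while the page still tells Googlebot not to index it."""
    checker = NoindexCheck()
    checker.feed(html)
    return checker.noindex
```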

If Googlebot doesn't come around much, it may be a good idea to at least modify your home page. Make sure your server sends a 'Last-Modified' header (not always possible). If your site was getting crawled fine before, the drop is likely because of those foreign links doing 302s to your home page: Googlebot is going to the other sites to find your home page and is getting lost.
I have seen 2 or 3 people with this same problem; Googlebot came back munching away and the SERPs recovered about a week after they removed those URLs.
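For dynamic pages, 'last modified' support usually has to come from the page script itself. The sketch below is a Python WSGI app that sends a `Last-Modified` header and answers a matching `If-Modified-Since` with `304 Not Modified`. The fixed `PAGE_MTIME` and page body are placeholders; a real site would take the time from its file or database. Whether this changes how often Googlebot visits is Reid's suggestion, not something the code can show.

```python
from datetime import datetime, timezone
from email.utils import formatdate, parsedate_to_datetime

# Placeholder: when the page content last changed.
PAGE_MTIME = datetime(2005, 3, 29, 10, 0, 0, tzinfo=timezone.utc)
PAGE_BODY = b"<html><head><title>Widgets</title></head><body></body></html>"


def home_page_app(environ, start_response):
    """Serve the page with Last-Modified and honor If-Modified-Since."""
    last_modified = formatdate(PAGE_MTIME.timestamp(), usegmt=True)
    since = environ.get("HTTP_IF_MODIFIED_SINCE")
    if since:
        try:
            if parsedate_to_datetime(since) >= PAGE_MTIME:
                start_response("304 Not Modified",
                               [("Last-Modified", last_modified)])
                return [b""]
        except (TypeError, ValueError):
            pass  # unparseable date: fall through and send the full page
    start_response("200 OK", [("Content-Type", "text/html"),
                              ("Last-Modified", last_modified)])
    return [PAGE_BODY]
```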

Reid




msg:732639
 10:58 am on Mar 29, 2005 (gmt 0)

What can you do to prevent 302 hijacks?
Apparently there is nothing you can do.

There have been some suggestions:
1. Put dynamic content on the page.
2. Use canonical URLs for internal links.

I don't know if any of these things can prevent the 302 problem from happening. After all when googlebot finds a 302 link on another website there is little you can do to change that fact.

Myself, I still use relative links for internal cross-linking, but I have a <base href=...> tag on every page. I wouldn't use relative linking without that tag. It seems that any page is vulnerable, especially if it is a deliberate hijack.
I don't understand how canonical links can shield a page from hijacks.
Dynamic content- the hijacking links seem to keep the original cache and never update it. I had 2 of these to deal with myself where the current page is completely different than the cached hijacking page. It never gets re-cached so how would dynamic content help?
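The effect of the `<base href>` tag Reid mentions can be seen with Python's `urllib.parse.urljoin`, which applies the standard relative-reference rules. The host names are placeholders and `resolve` is a hypothetical helper. Without a base, a relative link resolves against whatever URL the page was fetched under (here the non-www host); with one, it always resolves against the base.

```python
from urllib.parse import urljoin

# Placeholder value of <base href="..."> in the page head.
BASE_HREF = "http://www.example.com/widgets/"


def resolve(link, fetched_from, base_href=None):
    """Resolve a relative link the way an HTML client would."""
    return urljoin(base_href or fetched_from, link)
```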

A custom 404 page? completely useless for preventing hijacks

DTD tag? Nothing to do with hijacks.

Before googlebot even fetches your page it 'already knows' that you are only a temporary location for the hijackers page. You could feed it a 301 if it was possible to know when googlebot is going to show up for this one time fetch but that is impossible.

claus




msg:732640
 11:00 am on Mar 29, 2005 (gmt 0)

Oh, i didn't mean that the hijacked sites came back. I'll try rephrasing:

Has anybody managed to remove a wrong URL on somebody else's site from Google (ie. a 302 redirect URL) and seen the wrong URL (the redirect script) come back in the serps?

accidentalGeek




msg:732641
 2:37 pm on Mar 29, 2005 (gmt 0)

HTTP Filter Defense. Take 2.

After aleksl demonstrated why my first take [webmasterworld.com] failed to solve the problem and I came up with a couple of ways that a determined attacker could hijack a Google listing without using a 302, I left the problem alone for a couple of days and focused on other things. This morning, a variation of the defense popped into my head and I'd like to see if aleksl (or anyone else) can shoot this one down.

Remember that this deals with HTTP 302 hijackings only, some of which appear to be accidental. It's useless against more clever or more brutish attacks.

Set up a filter on the Web server which intercepts all inbound requests and does the following:

  1. If the client is definitely not a googlebot (or any other targeted robot), take no action. Allow the request to be processed normally.
  2. If the URL contains a special code that we provide for robots (see the next step), and if the code has not yet been used in a request, take internal steps to ensure that this code is not used again. Then allow the request to be processed normally.
  3. If the client might be a targeted robot, present it with a dynamic splash screen that contains an ordinary hyperlink. The hyperlink contains the URL it requested in absolute form along with a code as a GET parameter. The code will be generated by our filter. This splash screen may contain a bit of text that we don't mind being indexed under the attacker's page. It should not contain the name of our organization or anything else that we don't want indexed under the attacker's page.

I dislike splash screens as much as the next guy, or maybe more than the next guy. The idea here is not to create a splash screen that everyone sees. It's to dynamically create one for a robot that might have been referred to this page by a 302 link. Because robots do not provide meaningful referrer headers, there is no direct way to tell how they arrived at this page. Because they hit a page, index it, and return for the hyperlinks at some later time, we cannot use a timing mechanism like the one I described in my first take.
Therefore, I think our best bet is to decorate the HTTP request. An unfortunate side effect of this will be that the decoration will appear in the Google listing for this page. This should have no technical effect because the filter will notice that users referred by google are not robots and will let the request straight through. If the Web site contains scripts that rely on GET parameters, it might be a good idea for the filter to strip its code from the parameter list before letting the request through -- just to be safe.

This approach requires a slightly more sophisticated filter than the one I described in my first take because it will need to generate, track, and evaluate codes for one time passes. Because filters on most Web servers are necessarily stateless, the codes will need to be stored in a file, database, or some sort of session agent. There's a performance hit associated with this, but it should apply only to clients that might be robots. This should minimize its impact.
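A minimal sketch of the filter described above, written as Python WSGI middleware. Several details are assumptions: the parameter name `rc`, treating any user agent containing "googlebot" as a possible robot, and keeping issued codes in an in-memory set (the post suggests a file, database or session agent so the codes survive across server processes). Whether this actually defeats a 302 hijack was still an open question in the thread; the sketch only shows the mechanics of the three steps.

```python
import secrets
from html import escape
from urllib.parse import parse_qsl, urlencode

CODE_PARAM = "rc"                # hypothetical name of the one-time code parameter
ROBOT_MARKERS = ("googlebot",)   # assumed user-agent substrings of targeted robots


class OneTimePassFilter:
    """Sketch of the 'take 2' filter as WSGI middleware."""

    def __init__(self, app, site_root):
        self.app = app
        self.site_root = site_root.rstrip("/")
        self.issued = set()  # codes handed out and not yet used

    def _might_be_robot(self, environ):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        return any(marker in agent for marker in ROBOT_MARKERS)

    def __call__(self, environ, start_response):
        params = parse_qsl(environ.get("QUERY_STRING", ""), keep_blank_values=True)
        code = dict(params).get(CODE_PARAM)
        rest = [(k, v) for k, v in params if k != CODE_PARAM]
        # Strip our code so the site's own scripts never see it.
        clean = dict(environ, QUERY_STRING=urlencode(rest))

        if not self._might_be_robot(environ):      # step 1: not a robot
            return self.app(clean, start_response)
        if code in self.issued:                    # step 2: valid, unused code
            self.issued.discard(code)              # good for one request only
            return self.app(clean, start_response)

        # Step 3: a splash page whose only link is our own absolute URL
        # plus a fresh one-time code.
        new_code = secrets.token_urlsafe(8)
        self.issued.add(new_code)
        link = "%s%s?%s" % (self.site_root, environ.get("PATH_INFO", "/"),
                            urlencode(rest + [(CODE_PARAM, new_code)]))
        body = '<html><body><a href="%s">Continue</a></body></html>' % escape(link)
        start_response("200 OK", [("Content-Type", "text/html")])
        return [body.encode("utf-8")]
```

Usage: wrap the site's existing WSGI application, e.g. `application = OneTimePassFilter(site_app, "http://www.example.com/")`, where `site_app` stands for whatever app already serves the pages.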

I believe that this defense will succeed where my first take failed because the robot now receives harmless content that it can associate with the 302 referrer. The dynamic splash screen gets indexed under the attacker's listing, but the content is a hyperlink that points to our site. As far as the robot is concerned, this is an ordinary static hyperlink to a completely different Web server, not part of the 302 redirect. If it follows this link, it should index the content it finds under our listing, not that of the attacker.

Does this approach hold more promise than my first take?

theBear




msg:732642
 3:22 pm on Mar 29, 2005 (gmt 0)

Reid asks:

"Dynamic content- the hijacking links seem to keep the original cache and never update it. I had 2 of these to deal with myself where the current page is completely different than the cached hijacking page. It never gets re-cached so how would dynamic content help?"

The dynamic content prevents the duplicate content filter from tripping in the first case.

At least that is one of the theories, (one that may account for why one of the sites I work on didn't totally tank).

There is also the theory set forth by others that a 302 hijack is an automatic permanent dup content problem because Google says the target pages are the same so the content must be the same so filter this sucker always.

Since we don't have access to the crown jewels of Google we will never know for sure.

Reid further asks:

"I don't understand how canonical links can shield a page from hijacks."

Once again only a theory here as is all of what anyone says on this site.

If the 302 injection causes a site split (plays with Google's canonical page determination subroutines, or inserts links for the bots to follow), then the 301 rewrite rules prevent the site split, thus preventing massive duplicate content problems. Use of relative links is implied here, of course.
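The 301 rewrite rules are often written as Apache mod_rewrite rules. To keep one language across these examples, the sketch below expresses the same idea as Python WSGI middleware: any request for a non-canonical host name gets a `301 Moved Permanently` to the canonical one. `CANONICAL_HOST` is a placeholder, and the sketch assumes the site is mounted at the server root (it ignores `SCRIPT_NAME`).

```python
from urllib.parse import quote

CANONICAL_HOST = "www.example.com"  # placeholder: your preferred host name


def canonical_host_middleware(app):
    """Wrap a WSGI app so other host names get a 301 to CANONICAL_HOST."""

    def wrapper(environ, start_response):
        host = environ.get("HTTP_HOST", CANONICAL_HOST).split(":")[0].lower()
        if host != CANONICAL_HOST:
            path = quote(environ.get("PATH_INFO", "/") or "/")
            location = "http://" + CANONICAL_HOST + path
            query = environ.get("QUERY_STRING", "")
            if query:
                location += "?" + query
            start_response("301 Moved Permanently", [("Location", location)])
            return [b""]
        return app(environ, start_response)

    return wrapper
```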

Please note that any page of a site would be subject to replication even if nonrelative hrefs were used, but it would be on a page by page basis and would self correct in time (maybe not fast enough however).

Remember, this is all theory.

Marcia




msg:732643
 4:07 pm on Mar 29, 2005 (gmt 0)

It's being compounded by meta refresh, sometimes being used on its own and sometimes together with 302.

grail




msg:732644
 5:19 pm on Mar 29, 2005 (gmt 0)

FAGIN

(spoken) You see, Oliver...

(sung) In this life, one thing counts
first page serps, large amounts
I'm afraid these don't grow on trees,
You've got to link with three oh two

You've got to link with three oh two, boys,
You've got to link with three oh two.

BOYS

Large amounts don't grow on trees.
You've got to link with three oh two.

FAGIN

(spoken) Let's show Oliver how it's done, shall we, my dears?

(sung) Why should we break our backs
Stupidly paying tax?
Better get some adsense income
Better link with three oh two.

You've got to link with three oh two, boys
You've got to link with three oh two.

BOYS

Why should we all break our backs?
Better link with three oh two.

FAGIN

(spoken) Who says crime doesn't pay?

(sung) Widget Website, what a crook!
Gave away, what he took.
Charity's fine, subscribe to mine.
Get out and join adsense too

You've got to join adsense too, boys
You've got to join adsense too.

BOYS

Widget Website was far too good
He had to join adsense too.

FAGIN

Take a tip from scraper sites
they can rip what they likes.
I recall, they started small
then they link with three oh two.

You've got to link with three oh two, boys
You've got to link with three oh two.

BOYS

We can rank like scraper sites
If we link with three oh two.

FAGIN

(spoken) Stop thief!

Dear old gent passing by
Something nice takes his eye
Everything's clear, attack the rear
Get in and link with three oh two.

You've got to link with three oh two, boys
You've got to link with three oh two.

BOYS

Have no fear, attack the rear
Get in and link with three oh two.

FAGIN

When scraper see content rich,
adsense thumbs start to itch
now they rank some page of mine
they have link with three oh two.

You've got to link with three oh two, boys
You've got to link with three oh two.

BOYS

Just to find some page of mine

FAGIN AND BOYS

We have to link with three oh two!

Reid




msg:732645
 7:40 am on Mar 30, 2005 (gmt 0)

That's a funny poem, Grail, but although the 302 hijackers may enjoy stealing other people's hard work, a true SEO strategy is to build a good website.
Vultures come and go but the REAL website will steadily get better and better.
So have fun while it lasts but don't forget "what comes around goes around"

grail




msg:732646
 9:53 am on Mar 30, 2005 (gmt 0)


Just to explain to those unfamiliar with 'fagin'. That was just a joke to be sung to the tune of "Pick a pocket or two" from "Oliver Twist".

It was meant to 'take the mickey' out of google/adsense/fagin not antagonise the victims of 302.

vincentg




msg:732647
 2:58 pm on Mar 30, 2005 (gmt 0)

The concern on 302 is in my opinion being blown way out of proportion.

I am seeing posts proposing bots to try to defend against such a thing.

The web does not need more bots!
Bots written by non-professionals are only going to cause problems.

I have seen no hard facts to support this claim but I will not dismiss it as a possibility.

First, a 302 by itself will not harm a website, according to those who have brought this topic to life.

If you do not make it clear what the problem is, you will touch off a frenzy to remove 302 links everywhere.

Website owners that do this will in fact be hurting their PR rather than helping it.

All links are important and just removing links due to a scare based on a 302 Google problem is not an answer.

I run a website that does a redirect rather than a direct link. There is nothing wrong with this.
Yahoo does a redirect, as do PPC search engines and others.

I am listed in Yahoo and they have not hurt my PR, and I am listed in many PPC engines which have not affected my PR either.

My Directory is listed in other Directories which use redirects and again I have no problem.

If there are websites that cause a problem then I say post them here or bring a Google Rep into the Forum to clear this up!

Vincent G. Click4choice

accidentalGeek




msg:732648
 10:54 pm on Mar 30, 2005 (gmt 0)

Vincent, I get the sense from your post that, like nearly everyone who uses the World Wide Web today, you are unfamiliar with the HTTP protocol. I don't intend this as an insult. One sure sign that a technology has matured is that you don't need to be a geek to use it.

You can build a fine static web site without even knowing that HTTP exists. You can build a fine dynamic one with very little knowledge of the protocol. However, when it comes down to the expected behavior of clients (usually web browsers) when faced with various HTTP response codes, it's time to bring out the official protocol specification [webmasterworld.com] and get geeky.

HTTP Response Codes: a simple overview
At its most basic level, HTTP is a simple request-response protocol where the client sends one request and the server returns one response.(1) The HTTP specification details what constitutes a valid request and a valid response. A valid response will always contain exactly one numeric status code. The status code contains exactly three digits, and the first digit places the response into one of five broad categories (1xx through 5xx). The third category (status codes 300-307) covers various types of redirects, which can be used to inform a client that the content is available in some other location or must be accessed using some other means.
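In code, the class is simply the first digit. A sketch in Python, using the class names from the HTTP/1.1 specification (RFC 2616, section 6.1.1):

```python
# The five response classes defined by HTTP/1.1, keyed by first digit.
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(code):
    """Return the class name of a three-digit HTTP status code."""
    if not 100 <= code <= 599:
        raise ValueError("expected a three-digit status code from 100 to 599")
    return STATUS_CLASSES[code // 100]
```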

Different Types of Redirect (300-307)
The key here is that HTTP 1.1 specifies seven different kinds of redirect (Count seven rather than eight because 306 is unspecified). These redirect codes tell the client something about the nature of the redirect. But here's where it gets tricky. The specification does not dictate exactly what the client is expected to do with the response. In some places, the specification recommends an action (note the word "SHOULD" in the specification), but ultimately the client is free to do whatever it would like.
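Python's standard `http.HTTPStatus` enumeration can list the 3xx codes; filtering to 300-307 gives the seven HTTP/1.1 ones (recent Python versions also include 308, which was defined later). The temporary/permanent grouping in the comments paraphrases the specification's descriptions of 301, 302 and 307.

```python
from http import HTTPStatus

# Paraphrasing RFC 2616, section 10.3:
TEMPORARY = {302, 307}  # "resides temporarily": keep using the Request-URI
PERMANENT = {301}       # "moved permanently": use the new URI from now on


def http11_redirect_codes():
    """Return (code, phrase) pairs for the 3xx codes defined in HTTP/1.1.

    306 is reserved and unused, so HTTPStatus has no member for it.
    """
    return [(s.value, s.phrase) for s in HTTPStatus if 300 <= s.value <= 307]
```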

From your post, it seems to me that you were confused into thinking that all HTTP redirects were the same. A quick review of the specification will show that they are not. The problem we're facing with "hijacked" Google listings results from the way that a particular HTTP client, a googlebot, handles a particular kind of redirect, HTTP 302:


302 Found
The requested resource resides temporarily under a different URI. Since the redirection might be altered on occasion, the client SHOULD continue to use the Request-URI for future requests. This response is only cacheable if indicated by a Cache-Control or Expires header field.

In other words, 302 means that the "resource" (generally a Web page) is temporarily unavailable at the requested URL, but can be found at some other URL.

Robots and HTTP 302
When confronted with a 302, googlebot uses the redirect to load the resource and then indexes it under the original URL. This behavior makes perfect sense if we make the assumption that the resource will eventually return home and be available at its original URL. After all, this is what the response code indicates.
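A toy model makes the consequence concrete. This is not Google's code; it is a simplified sketch of the behavior described above, over a made-up dictionary standing in for the web. A 302 (or 307) leaves the listing on the original URL, while a 301 moves it. Note that the redirect URL and the real URL end up with the same content, which is where any choice between them, or duplicate filtering, comes in.

```python
# Made-up web: URL -> (status, Location for redirects or content for 200).
WEB = {
    "http://directory.example/go?id=42": (302, "http://www.example.com/"),
    "http://old.example/": (301, "http://www.example.com/"),
    "http://www.example.com/": (200, "Widgets from Example Co."),
}


def crawl(url, web, max_hops=5):
    """Fetch url, following redirects; return (url to list, content)."""
    listed = url
    for _ in range(max_hops):
        status, payload = web[url]
        if status == 200:
            return listed, payload
        if status == 301:
            listed = url = payload  # permanent: the listing moves too
        elif status in (302, 307):
            url = payload           # temporary: keep listing the original URL
        else:
            raise ValueError("unexpected status %d" % status)
    raise RuntimeError("too many redirects")
```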

However, this assumption introduces an element of trust. The client must trust that the server is telling the truth. That is, a) the resource normally lives at the requested URL and b) the resource is also available at some other location on a temporary basis.

This required element of trust creates an opening for a misbehaving or malicious Web server to "hijack" a Google listing. It needs only to issue a 302 redirect to some other Web server. A googlebot will assume that the content it finds on the other end of the redirect really belongs on the first Web server. Note that this is not a stupid assumption on the part of the googlebot. The assumption is built into the specification of response code 302. My guess is that this code was specified in a more innocent era when systems generally trusted one another. I doubt that the architects of HTTP 1.0 had robots in mind, and there's no noticeable consequence if an ordinary web browser follows a 302 redirect that really should have been a 301 ("moved permanently"). The only difference should be in the way that the browser maintains its cache.

If you're Google or the maintainer of some other crawler robot, this situation presents a problem that is difficult to solve. How do you avoid getting duped by an incorrect 302 without behaving in a way inconsistent with the specification? Put more simply, how do you fix the current problem without breaking all sorts of well-established systems, some of which you won't know about until the complaints come flooding in?

Scope of the Problem
From what I've read, people who study such things have known about this vulnerability for several years. However, it has recently become more widely known and the number of reported exploitations has been on the rise. The aspect about this vulnerability that bothers me the most is how easy it is to exploit. It takes virtually no expertise. An exploit can be achieved with a line or two of server-side script code. The effect of the exploit is that the target's listing in Google (and other search engines) will be replaced by one that contains the target's content and a hyperlink to any arbitrary URL that the attacker designates. If you run a site that helps small children learn to read, an attacker can make your Google listing point to a porn site. If you run a banking site, an attacker can make your Google listing point to a phishing site.
The push toward a solution may be mitigated by the fact that there are other attacks that achieve the same result and are much more difficult to detect and likely impossible to defend against.

Possible Defenses
I've seen a number of proposals for defending a site against being indexed by robots that were referred by an HTTP 302. Most of them involve tweaking content or deploying meta-tags. In my view, these are unlikely to be effective because they operate at a higher level than the problem. It's like trying to deal with a flooded basement when your sump and hoses are stuck on the third floor. Because this is a protocol-level problem, I believe that effective solutions are to be found on the protocol level. I proposed a couple of solutions earlier. Aleksl demonstrated that my first solution was doomed from the start. I haven't seen any response to my second.

-----------
1. This statement is correct for HTTP 1.0. It's an oversimplification for HTTP 1.1 which introduces some flow control and allows multiple requests and responses per connection.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved