Solutions for 302 Redirects and META Refreshes in Google

Forum Moderators: Robert Charlton & goodroi

Ideas for Google or webmasters to help with the "hijacking".

ciml

4:14 pm on Mar 25, 2005 (gmt 0)

We have had plenty of discussion on the META refresh and 302 redirect issue, and this thread is intended as a repository for each of the ideas to help or solve the problem.

If you have an idea that Google could use to alleviate this problem, or that a webmaster could use to fix or avoid this problem, please post it here.

Each post should contain only one idea. Each idea should have only one post. There's no need for a long code example, just the mechanism.

Any followup discussion belongs in the Google's 302 Redirect Problem [webmasterworld.com] thread, not here.

claus

12:04 pm on Mar 27, 2005 (gmt 0)

>> CN++, M$, Or*cle, S*n, Newswe*k, IB*, and Genital Motors

I haven't checked all those, but it's common for large-ish sites, as you noted. A lot of news sites do this as well. They don't do this in order to redirect www to non-www (or the other way round) - some of them do have www and non-www duplicate issues anyway (a 302 does not help with this).

What they do is to use the 302 redirect as it's supposed to be used. On their front page, they always redirect to the most recent version. As that version can change URL, they want the browser/user/spider to keep the main page URL for next visit, but look up the content on the newest URL.
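To make the mechanism concrete, here is a minimal sketch of that front-page behaviour. The path in `LATEST_EDITION` and the `respond` helper are made up for illustration; real sites would do this in their server configuration or application code.

```python
# Hypothetical URL of the newest edition; it changes whenever a new one is published.
LATEST_EDITION = "/editions/2005-03-27.html"


def respond(path):
    """Return (status code, headers) for a request path."""
    if path == "/":
        # 302: the front page URL stays the address to remember;
        # only the current content lives at the Location target.
        return 302, {"Location": LATEST_EDITION}
    # Any other path is served directly in this sketch.
    return 200, {}


print(respond("/"))
```

Because the redirect is temporary, a client is expected to come back to "/" next time rather than to the dated URL.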

DaveAtIFG

6:34 pm on Mar 27, 2005 (gmt 0)

These include, CN++, M$, Or*cle, S*n, Newswe*k, IB*, and Genital Motors

What claus said, and those sites all enjoy substantial PageRank, which makes them extremely difficult to PageJack.

As GoogleGuy said:

PageRank is a pretty good proxy for reputation, and incorporating PageRank into the decision for the canonical url helps to choose the right url.

GuinnessGuy

11:05 pm on Mar 27, 2005 (gmt 0)

Hi Claus,

Interesting observation. But not all the sites I listed seem to have in mind what you are saying; some just have a target of www.somedomain.com rather than a specific file that should be captured. So if you type in the www version there is no redirection.

One thing that confuses me with regard to their 302 status codes is that some will say, "Object Moved", while others say, "Moved Temporarily". What is the difference from a technical standpoint, or are these descriptors supplied by the coders for their own personal reasons?

Too bad these big boys can't fall prey to this hi-jacking. We'd get a lot more attention from Google if they could.

Question: If I notice that the hi-jacker's site is returning a 404, can I then use the removal tool just as if my site were delivering a 404? Or does it have to be my site that is showing a 404 to do this magic?

GuinnessGuy

macdave

3:03 pm on Mar 28, 2005 (gmt 0)

It's been suggested several times in this thread and elsewhere that adding dynamic content to your pages can help to prevent a hijack by avoiding potential duplicate content. However, the more I look into this issue, the more I believe the hijacking problem is about duplicate URLs and not about duplicate content.

Consider an internal page that has been hijacked:

hijacker.com/link.php?id=1234 --> 302 redirect --> mysite.com/somepage.html

Search for

site:mysite.com

and you get a bunch of pages from mysite.com, plus the hijacking URL from hijacker.com. Classic sign of the hijack.

Now search for

site:mysite.com inurl:mysite.com/somepage.html

and the results still include the hijacker.com URL, even though it (hijacker.com/link.php?id=1234) doesn't contain your site name or page name.

This tells me that for each "page," Google stores at least two URLs in its index:
1) Display URL: The URL that is displayed and linked to in the SERPs
2) Content URL: The URL that is queried by "inurl" and "allinurl" searches

When determining which URL to display, there's no need for Google to even consider duplicate content. It sees that there is more than one "page" with the same Content URL, assumes duplicate content based on that fact alone, and then chooses a Display URL based on the factors GG mentioned in his Slashdot post. Game over.

(This makes sense when you consider that the cache and the search index are essentially two different systems. The search index only knows about Content URLs. The SERP display and cache systems only know about Display URLs, and some piece in between links Content URLs with Display URLs. This duality would appear to be pretty deeply ingrained in Google's multi-tiered search architecture, and hence may not be as easy a problem to sort out as we'd like to think. Though

if (DisplayURL == ContentURL) { it's the canonical URL }

would be a pretty obvious fix...)

In this context, the

<base href="http://mysite.com/somepage.html">

suggestion makes a lot of sense, and is certainly a factor that Google could be considering as part of its canonicalization algo.
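A minimal sketch of the Display URL / Content URL idea and the "obvious fix" described above. This is only a model of the hypothesis in this post, not Google's actual data structures; `IndexEntry` and `pick_canonical` are invented names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class IndexEntry:
    display_url: str  # URL shown and linked to in the SERPs
    content_url: str  # URL matched by "inurl" / "allinurl" searches


def pick_canonical(entries: List[IndexEntry], content_url: str) -> Optional[IndexEntry]:
    """Among entries sharing a Content URL, prefer the one whose
    Display URL equals its Content URL."""
    candidates = [e for e in entries if e.content_url == content_url]
    for entry in candidates:
        if entry.display_url == entry.content_url:
            return entry
    # No self-referencing entry: fall back to the first candidate, if any.
    return candidates[0] if candidates else None


entries = [
    IndexEntry("hijacker.com/link.php?id=1234", "mysite.com/somepage.html"),
    IndexEntry("mysite.com/somepage.html", "mysite.com/somepage.html"),
]
canonical = pick_canonical(entries, "mysite.com/somepage.html")
print(canonical.display_url)
```

With this rule the hijacking URL can never win as long as the real page is also in the index, regardless of which entry was found first.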

g1smd

7:45 pm on Mar 28, 2005 (gmt 0)

You might be right about the two URLs. Is this related observation any help?

An informational site in a foreign country that I tried to get some incorrect content modified on, updated their content recently. There are two URLs for the content, but they represent the same physical harddrive space.

Both URLs showed in the results for a while, but when you clicked on "cache" for www.domain.it/keyword/ the text above the cached copy said "this is Google's cache of keyword.otherkeyword.it/somefolder/" - Google knows that they are the same content, has only one cache copy, has identified the cached copy under one URL, and then redirects calls for the "other" URL back to this one.

claus

7:09 am on Mar 29, 2005 (gmt 0)

GuinnessGuy, discussion should be in this thread [webmasterworld.com] instead, but a quick reply: (A) Some very large companies with lots of highly paid employees just don't get it - they use the 302 where they should use a 301. It's wrong even though they're big. (B) As for "Object Moved" i believe what you saw is the difference between Apache and Microsoft IIS. But, you can configure your redirect to display whatever message you want to.
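To illustrate point (B): the text after the number in an HTTP status line is a free-form reason phrase, and clients act on the numeric code. A small sketch follows; the sample status lines are written out by hand for illustration, not captured from any particular server, and `parse_status_line` is a made-up helper.

```python
def parse_status_line(line):
    """Split an HTTP status line into (version, status code, reason phrase)."""
    parts = line.rstrip("\r\n").split(" ", 2)
    version, code = parts[0], int(parts[1])
    # The reason phrase is optional free text and may be empty.
    reason = parts[2] if len(parts) > 2 else ""
    return version, code, reason


samples = [
    "HTTP/1.1 302 Object Moved\r\n",
    "HTTP/1.1 302 Moved Temporarily\r\n",
    "HTTP/1.1 302 Found\r\n",
]
for line in samples:
    version, code, reason = parse_status_line(line)
    print(code, repr(reason))
```

All three parse to the same code, 302, so they are the same redirect from a technical standpoint; only the human-readable wording differs.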

claus

9:29 am on Mar 30, 2005 (gmt 0)

Just thought i would add this thread from May 2004: What about those redirects, copies and mirrors? [webmasterworld.com]. I think the opening post should be mandatory reading at the search engines, but then again i'm biased as i wrote it ;)

window

11:10 am on Apr 5, 2005 (gmt 0)

I am seeing some bad links appearing alongside my site's pages. When I search my site using the "allinurl" command, links (with %20 or www.www) that I never created appear alongside my site's pages.
Links from some other sites also appear, such as:

othersite.com/links/index.php3?mode=update_ link&link=http://www.mysite.com%2F - 30k - Supplemental Result - Cached - Similar pages

I sent a request to Google at help@google.com with the subject "canonicalpages". They replied:
"We'd like to assist you, but we only respond to messages submitted through our online contact form. Please visit [google.com...] to submit your message, and we'll get back to you soon."

I can't remove these URLs with the removal tool, as these pages don't exist.

What should I do now?

broker_boy

10:40 am on Apr 11, 2005 (gmt 0)

I had the same problem and had to email them using that form. The problem was that 1000 characters was not enough to explain the issue fully.

Cheers,
