Forum Moderators: Robert Charlton & goodroi

Google's 302 Redirect Problem

         

ciml

4:17 pm on Mar 25, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



(Continuing from Google's response to 302 Hijacking [webmasterworld.com] and 302 Redirects continues to be an issue [webmasterworld.com])

Sometimes, an HTTP status 302 redirect or an HTML META refresh causes Google to replace the redirect's destination URL with the redirect URL. The word "hijack" is commonly used to describe this problem, but redirects and refreshes are often implemented for click counting, and in some cases lead to a webmaster "hijacking" his or her own URLs.

Normally in these cases, a search for cache:[destination URL] in Google shows "This is G o o g l e's cache of [redirect URL]" and oftentimes site:[destination domain] lists the redirect URL as one of the pages in the domain.

Also link:[redirect URL] will show links to the destination URL, but this can happen for reasons other than "hijacking".

Searching Google for the destination URL will show the title and description from the destination URL, but the title will normally link to the redirect URL.
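For readers who want to check how a particular URL redirects, here is a minimal Python sketch (not from the thread) that classifies an already-fetched response as an HTTP redirect, a META refresh, or neither. The name `classify_redirect` and the regular expression are illustrative assumptions; real-world HTML would warrant a proper parser.

```python
import re

# Illustrative pattern for <meta http-equiv="refresh" content="0; url=...">.
# A regex is a simplification; it assumes http-equiv appears before content.
META_REFRESH = re.compile(
    r'<meta[^>]+http-equiv=["\']?refresh["\']?[^>]*'
    r'content=["\']?\s*\d+\s*;\s*url=([^"\'>\s]+)',
    re.IGNORECASE,
)

def classify_redirect(status, headers, body=""):
    """Return (kind, target) for a fetched response.

    kind is "301", "302", "meta-refresh", or "none".
    headers is a dict of response headers; body is the HTML text.
    """
    location = headers.get("Location")
    if status in (301, 302) and location:
        return str(status), location
    match = META_REFRESH.search(body)
    if match:
        return "meta-refresh", match.group(1)
    return "none", None
```

A "302" or "meta-refresh" result pointing at someone else's domain is the kind of link that the symptoms above are usually traced back to.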

There has been much discussion on the topic, as can be seen from the links below.

How to Remove Hijacker Page Using Google Removal Tool [webmasterworld.com]
Google's response to 302 Hijacking [webmasterworld.com]
302 Redirects continues to be an issue [webmasterworld.com]
Hijackers & 302 Redirects [webmasterworld.com]
Solutions to 302 Hijacking [webmasterworld.com]
302 Redirects to/from Alexa? [webmasterworld.com]
The Redirect Problem - What Have You Tried? [webmasterworld.com]
I've been hijacked, what to do now? [webmasterworld.com]
The meta refresh bug and the URL removal tool [webmasterworld.com]
Dealing with hijacked sites [webmasterworld.com]
Are these two "bugs" related? [webmasterworld.com]
site:www.example.com Brings Up Other Domains [webmasterworld.com]
Incorrect URLs and Mirror URLs [webmasterworld.com]
302's - Page Jacking Revisited [webmasterworld.com]
Dupe content checker - 302's - Page Jacking - Meta Refreshes [webmasterworld.com]
Can site with a meta refresh hurt our ranking? [webmasterworld.com]
Google's response to: Redirected URL [webmasterworld.com]
Is there a new filter? [webmasterworld.com]
What about those redirects, copies and mirrors? [webmasterworld.com]
PR 7 - 0 and Address Nightmare [webmasterworld.com]
Meta Refresh leads to ... Replacement of the target URL! [webmasterworld.com]
302 redirects showing ultimate domain [webmasterworld.com]
Strange result in allinurl [webmasterworld.com]
Domain name mixup [webmasterworld.com]
Using redirects [webmasterworld.com]
redesigns, redirects, & google -- oh my [webmasterworld.com]
Not sure but I think it is Page Jacking [webmasterworld.com]
Duplicate content - a google bug? [webmasterworld.com]
How to nuke your opposition on Google? [webmasterworld.com] (January 2002 - when Google's treatment of redirects and META refreshes was worse than it is now)

Hijacked website [webmasterworld.com]
Serious help needed: Is there a rewrite solution to 302 hijackings? [webmasterworld.com]
How do you stop meta refresh hijackers? [webmasterworld.com]
Page hijacking: Beta can't handle simple redirects [webmasterworld.com] (MSN)

302 Hijacking solution [webmasterworld.com] (Supporters' Forum)
Location: versus hijacking [webmasterworld.com] (Supporters' Forum)
A way to end PageJacking? [webmasterworld.com] (Supporters' Forum)
Just got google-jacked [webmasterworld.com] (Supporters' Forum)
Our company Lisiting is being redirected [webmasterworld.com]

This thread is for further discussion of problems due to Google's 'canonicalisation' of URLs, when faced with HTTP redirects and HTML META refreshes. Note that each new idea for Google or webmasters to solve or help with this problem should be posted once to the Google 302 Redirect Ideas [webmasterworld.com] thread.

<Extra links added from the excellent post by Claus [webmasterworld.com]. Extra link added thanks to crobb305.>

[edited by: ciml at 11:45 am (utc) on Mar. 28, 2005]

larryhatch

11:28 pm on Apr 18, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here's one, completely new to me:

Doing a site:mysite check, I found 4 of my pages with title only, no description.

I did a Copyscape check on those looking for scrapers.

One page wasn't scraped really, just my anchor-text and a snippet,
but get this: the hypertext LINK reads:
<a href="/go/jjj.yneelungpu.arg"> #*$!x Map: Eastern Hemisphere</a>

jjj.yneelungpu.arg?

I put that into the address bar and of course there was no such URL or TLD.

The 'linking' site had similar baby-talk URLs for other sites besides mine.

I'm used to the /scrapings.php/site#123 type ripoffs, but what is this?

What are the mechanics of the "/go" part of such a hyperlink? -Larry

claus

11:34 pm on Apr 18, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Okay, i might have been a little too early with the optimism, so perhaps i should not call it solved yet. It's a very clear indication that something is being done actively, though.

I agree 100% that "the proof will be in the pudding [webmasterworld.com]" so let's see some of those sites come back before we jump for joy.

If this "data wash" is anything like a real update (it should be similar regarding the amount of data to update) then it will take time as batches run and data is shifted between DCs and such. For the sites to come back, a real update is probably needed on the cleaned data as well, so it might still take a while.

claus

11:37 pm on Apr 18, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> jjj.yneelungpu.arg?

It's just a file name. It can be anything, even a php script. It's probably an ID field in a database using letters instead of numbers (for whatever reason).

(eg. a rewrite of "go.php?jjj&yneelungpu&arg")

[edited by: claus at 11:49 pm (utc) on April 18, 2005]
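A side observation, not made in the thread: the token in larryhatch's example is not random. Run through ROT13 it reads www.larryhatch.net, so one possibility is that the /go/ script simply obfuscates the destination host. The Python sketch below is a guess at how such a handler could work; `resolve_go_path` is a hypothetical name, and the 302 response is an assumption about that site's behaviour, not something the thread confirms.

```python
import codecs

def resolve_go_path(path):
    """Map a /go/<token> path to a (status, location) pair.

    Assumes the token is the destination host encoded with ROT13,
    which is only a guess about the linking site's script.
    """
    prefix = "/go/"
    if not path.startswith(prefix):
        return 404, None
    token = path[len(prefix):]
    host = codecs.decode(token, "rot_13")  # ROT13 is its own inverse
    return 302, "http://" + host + "/"

print(resolve_go_path("/go/jjj.yneelungpu.arg"))
```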

theBear

11:48 pm on Apr 18, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Claus,

I agree something is being done.

What follows is highly speculative: things that could have happened or could still happen, but that we don't know and may never know. In other words, take it with a dump truck load of salt.

However getting the baby out of the drain pipe where it was tossed isn't a simple matter.

Sites may have been split because of this, that means that PR probably took hits and sites downstream got a kick in the head as well.

Then folks who said I'll wait got classified as spammers and then those sites started losing pages.

Then there were all those new 301's that while they prevented one form of site cancer may have tripped the new link addition filters.

So damned if you do, damned if you don't.

GoogleGuy

11:59 pm on Apr 18, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



We've been steadily improving our heuristics for 302s based on the feedback that you've sent us. There have been two recent changes that I know of. We changed things so that site: won't return results from other sites in the supplemental results. We are also changing some of the core heuristics for the results for 302s. I believe that most of these changes are out, but there may be a few more in the pipeline.

Note that for inurl: and allinurl: searches, results from other sites are perfectly valid. So if you own yoursite.com and do a search allinurl:www.yoursite.com, it's a completely valid result to get a url from www.someothersite.com/resources?url=www.yoursite.com, for example. That's how inurl: and allinurl: are supposed to work--they match all docs with the requested terms in the url, not just docs on www.yoursite.com. That doesn't imply any problem/hijacking/issue; just that someone else had your domain name in their url.
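To make the distinction concrete, here is a deliberately simplified Python model of the two operators. It is a sketch only: the function names are made up for illustration, and plain substring matching is an assumption, since the thread does not describe how Google actually tokenises URLs.

```python
from urllib.parse import urlsplit

def matches_allinurl(url, terms):
    """True if every term appears somewhere in the URL (host, path or query)."""
    lowered = url.lower()
    return all(term.lower() in lowered for term in terms)

def matches_site(url, domain):
    """True only if the URL's host is the domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)
```

Under this model, http://www.someothersite.com/resources?url=www.yoursite.com satisfies `matches_allinurl` for the term www.yoursite.com but fails `matches_site` for yoursite.com, which is the behaviour described above.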

Thank you for the feedback that people have given us about 302s. I'd be interested to hear if anyone sees a result where site:yoursite.com returns urls from domains other than yoursite.com. You might want to wait another few days before checking though, to give things time to get fully out. I have to duck out right now, but I'll try to stop by and give more details as things are more fully deployed.

larryhatch

12:18 am on Apr 19, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hello GG:

Thanks for a fast, long-awaited response.

It's good that other sites are removed from the site: search,
but it also kills our main way of finding malicious 302 redirects.

"We are also changing some of the core heuristics for the results for 302s."

That, I think, is the burning question!

Will the heuristics changes prevent a malicious site from scoring for my original content?
Will these changes pass PR (etc.) thru to the actual pages with the content?
Will the 302-jackers be derated if not penalized?

Since there remain legitimate uses for the 302 redirect,
is there a simple way to only credit those redirected to the same domain?
That alone might solve most of the problem. -Larry
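As an illustration of what a same-domain test could look like, here is a small Python sketch. It says nothing about how Google handles this; `is_same_site_redirect` is a hypothetical helper, and ignoring a leading "www." is a simplifying assumption.

```python
from urllib.parse import urljoin, urlsplit

def is_same_site_redirect(source_url, location):
    """True if a redirect's Location stays on the source URL's host.

    Compares hostnames exactly, ignoring a leading "www."; treating
    subdomains or related domains as "the same site" would need more rules.
    """
    target = urljoin(source_url, location)  # Location may be relative

    def host(url):
        name = (urlsplit(url).hostname or "").lower()
        return name[4:] if name.startswith("www.") else name

    return host(source_url) == host(target)
```

A click-counting redirect from one domain to another would fail this test, while a redirect from /old to /new on the same host would pass it.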

Marval

12:24 am on Apr 19, 2005 (gmt 0)

10+ Year Member



I agree about the core changes. I also agree that taking away the ability to see those other sites with the site: command might not be the best solution, as many of us have used it to Google's benefit as well as our own to remove people who were scraping.

I also noticed that the algo for the site: command may have a little glitch: you get the total number of results (pages for the site) on the first page (e.g. 125 pages), and that number shrinks as you go deeper into the results, losing 1 or 2 pages from the total at the top for every 10 results. I've tried it and replicated it across a few domains, but of course others may not see the same thing.

I was also wondering why the &filter=0 behaviour was changed. It no longer seems to have the effect it was originally designed for and that was talked about here a little over a year ago. Does Google still have a similar filter working that is just not accessible using that command? It was very useful for seeing whether Google considered something from your site a duplicate result, which again helped us find scrapers and report them.

g1smd

12:31 am on Apr 19, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



>> We changed things so that site: won't return results from other sites in the supplemental results. <<

Why only in supplemental results? What about normal results too?

Not showing the sites in the search results is not the same as not having the rogue URLs in the database. Not the same by a very long way.

Much of the stuff that has been seen in recent results should not have even made it into your database.

Why can't this be fixed by going back to how things were a few years ago? And, dare I mention the logic that Yahoo applies to 301 and 302 redirects, and the different way that they treat onsite and offsite redirects now?

As for the search I mentioned above, is it really true that out of 1.2 million pages, Google only has 950 pages indexed for the ODP now (but says there are 11 million when you first look)? This filtering of the results served, rather than cleaning of the internal dataset, seems to have some flaws perhaps?

larryhatch

12:58 am on Apr 19, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



(copied from a dictionary site. -LH)

HEURISTICS - This describes a set of rules developed to attempt to solve problems when a specific algorithm cannot be designed.
For example, if the problem is "When do you eat food?", if you answer, "When I'm hungry" then you would have to eat immediately every single time you were hungry.
Instead, we follow heuristics to determine when to eat by gauging our hunger level, the situation we are in, and our ability to get food. As you can imagine, heuristics are very important for solving artificial intelligence problems.

GoogleGuy

1:06 am on Apr 19, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



g1smd, the changes had already been applied for the main results. The web has changed over time, but url canonicalization is definitely on our radar now--contacting at google.com/support with the title of "canonicalpage" will make it to engineers who read the reports and suggestions.

larryhatch, I believe the answers are yes, I'm not sure given the current heuristics, and yes. Marval, if someone is doing 302s to your site, you might be able to find redirecters by looking in your server logs for unusual referrers. I'll ask about filter=0. There's been some index changes lately, but I hadn't heard about any changes with filter=0.
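The log check suggested above could be sketched in Python roughly as follows. This assumes the server writes Apache-style Combined Log Format; `external_referrers` is a hypothetical helper, and what counts as an "unusual" referrer is left to the reader.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Combined Log Format: ... "request" status bytes "referrer" "user-agent"
LOG_LINE = re.compile(r'"[^"]*" \d{3} \S+ "(?P<referrer>[^"]*)" "[^"]*"$')

def external_referrers(lines, own_domain):
    """Count referrer hosts that are not own_domain or its subdomains."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line.strip())
        if not match:
            continue  # not in the expected format
        host = (urlsplit(match.group("referrer")).hostname or "").lower()
        if not host or host == own_domain or host.endswith("." + own_domain):
            continue  # empty ("-") or internal referrer
        counts[host] += 1
    return counts
```

Calling `external_referrers(open("access.log"), "yoursite.com").most_common(20)` would list the most frequent outside referrers, where access.log stands in for whatever your own log file is called.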

This 467-message thread spans 47 pages.