Sometimes, an HTTP status 302 redirect or an HTML META refresh causes Google to replace the redirect's destination URL with the redirect URL. The word "hijack" is commonly used to describe this problem, but redirects and refreshes are often implemented for click counting, and in some cases lead to a webmaster "hijacking" his or her own URLs.
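For readers unfamiliar with the two mechanisms, here is a minimal sketch (hostnames are placeholders, not real sites) of what an HTTP 302 response and an HTML META refresh look like; the 302's Location header and the refresh's url= value are what the search engine must decide to canonicalize on:

```python
# Sketch of the two redirect mechanisms discussed above: an HTTP 302
# response and an HTML META refresh page. All URLs are placeholders.

def http_302(destination):
    """Build a minimal HTTP 302 response redirecting to `destination`.

    A 302 means "found, but temporarily elsewhere" -- which is why a
    crawler may keep the redirecting URL as canonical instead of the
    destination, the root of the hijacking problem discussed here."""
    return (
        "HTTP/1.1 302 Found\r\n"
        f"Location: {destination}\r\n"
        "Content-Length: 0\r\n"
        "\r\n"
    )

def meta_refresh(destination, delay=0):
    """Build an HTML page that redirects client-side via META refresh."""
    return (
        "<html><head>"
        f'<meta http-equiv="refresh" content="{delay};url={destination}">'
        "</head><body></body></html>"
    )

print(http_302("http://www.example.com/"))
print(meta_refresh("http://www.example.com/"))
```

Either mechanism can be wrapped around a click counter, which is why so many of the cases in the threads below started out as innocent tracking links.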
Normally in these cases, a search for cache:[destination URL] in Google shows "This is G o o g l e's cache of [redirect URL]" and oftentimes site:[destination domain] lists the redirect URL as one of the pages in the domain.
Also link:[redirect URL] will show links to the destination URL, but this can happen for reasons other than "hijacking".
Searching Google for the destination URL will show the title and description from the destination URL, but the title will normally link to the redirect URL.
There has been much discussion on the topic, as can be seen from the links below.
How to Remove Hijacker Page Using Google Removal Tool [webmasterworld.com]
Google's response to 302 Hijacking [webmasterworld.com]
302 Redirects continues to be an issue [webmasterworld.com]
Hijackers & 302 Redirects [webmasterworld.com]
Solutions to 302 Hijacking [webmasterworld.com]
302 Redirects to/from Alexa? [webmasterworld.com]
The Redirect Problem - What Have You Tried? [webmasterworld.com]
I've been hijacked, what to do now? [webmasterworld.com]
The meta refresh bug and the URL removal tool [webmasterworld.com]
Dealing with hijacked sites [webmasterworld.com]
Are these two "bugs" related? [webmasterworld.com]
site:www.example.com Brings Up Other Domains [webmasterworld.com]
Incorrect URLs and Mirror URLs [webmasterworld.com]
302's - Page Jacking Revisited [webmasterworld.com]
Dupe content checker - 302's - Page Jacking - Meta Refreshes [webmasterworld.com]
Can site with a meta refresh hurt our ranking? [webmasterworld.com]
Google's response to: Redirected URL [webmasterworld.com]
Is there a new filter? [webmasterworld.com]
What about those redirects, copies and mirrors? [webmasterworld.com]
PR 7 - 0 and Address Nightmare [webmasterworld.com]
Meta Refresh leads to ... Replacement of the target URL! [webmasterworld.com]
302 redirects showing ultimate domain [webmasterworld.com]
Strange result in allinurl [webmasterworld.com]
Domain name mixup [webmasterworld.com]
Using redirects [webmasterworld.com]
redesigns, redirects, & google -- oh my [webmasterworld.com]
Not sure but I think it is Page Jacking [webmasterworld.com]
Duplicate content - a google bug? [webmasterworld.com]
How to nuke your opposition on Google? [webmasterworld.com] (January 2002 - when Google's treatment of redirects and META refreshes were worse than they are now)
Hijacked website [webmasterworld.com]
Serious help needed: Is there a rewrite solution to 302 hijackings? [webmasterworld.com]
How do you stop meta refresh hijackers? [webmasterworld.com]
Page hijacking: Beta can't handle simple redirects [webmasterworld.com] (MSN)
302 Hijacking solution [webmasterworld.com] (Supporters' Forum)
Location: versus hijacking [webmasterworld.com] (Supporters' Forum)
A way to end PageJacking? [webmasterworld.com] (Supporters' Forum)
Just got google-jacked [webmasterworld.com] (Supporters' Forum)
Our company Lisiting is being redirected [webmasterworld.com]
This thread is for further discussion of problems due to Google's 'canonicalisation' of URLs, when faced with HTTP redirects and HTML META refreshes. Note that each new idea for Google or webmasters to solve or help with this problem should be posted once to the Google 302 Redirect Ideas [webmasterworld.com] thread.
<Extra links added from the excellent post by Claus [webmasterworld.com]. Extra link added thanks to crobb305.>
[edited by: ciml at 11:45 am (utc) on Mar. 28, 2005]
I can confirm that the "jacker" URLs within a site: view are no longer showing.
I had a sticky from a fellow member I was working with; he went looking for the 302s I had stickied to him that were showing up as part of his site.
I also confirmed that the leeches attached to one of our sites no longer show up in a site: search.
And a certain Drudge no longer has any attached to his site. In fact, I looked at 15 sites that I knew had leeches, and they were all gone.
Now is the problem fixed?
I don't know; it could just be hidden.
I'll be happy to recant when and if my G referrals start topping AltaVista referrals again.
LOL, nothing like putting it in their face.
I shall have to do some searches a bit later. There were whole piles of sites that I was aware of that got hit.
I stopped after looking at 15 of them, so I don't think we had a hand edit done.
Although I did tell Google how to clean up the mess, I don't think they would pay a clerk to sit at a computer and hand-delete that many entries,
when one PhD programmer type could do far more damage automagically, if you get my drift.
If anyone from GOOG should read this: Thanks a lot :)
Now it will be interesting to see at which rate the sites that were hit will surface again :)
My ex-hijacked site now appears 11th in a search for its site name; better than not at all, but not first, as it did for four years. Other example searches include going from 21st to out of the top 1000, and from 1st to 60th.
They may be handling the technical part now, but the damage to their index continues. In the above site's case, it appears this four-year-old site is being treated as if sandboxed, that is, brand new, after the hijacking URLs were removed.
I also confirmed that the leeches attached to one of our sites no longer show up in a site: search.
Now is the problem fixed?
That would be a big relief!
However, I spotted something else today while trying to figure out exactly what is happening on a site I manage, which I will call the "target site".
I was checking the ranking of the site name for the target site with a tool that gives the Google ranking for keywords, and found a PHP redirect to the site in the results, instead of the target site's site name; it's ranking #62 in Google.
I checked the site: and allinurl: commands and found nothing there.
But the site, while fully indexed since early January and with at least 50 quality links, is not ranking for its major keywords in Google, and particularly not for its site name (though it is doing fine in Yahoo), and the site is almost six months old. By now it should be ranking #1 for at least the site name. Hence my suspicion that there is a redirect affecting this site.
I checked out the redirecting site, and while there is a real URL on the page to the target site, it is NOT an active link, yet there are other links for comments, votes, and the site name, which all contain a PHP redirect. So I'm wondering why they purposely added the real URL? Possibly to make it APPEAR legit to the unknowing, or for Google to pick up the URL?
So now, along with the above discussion, I'm hoping Google has indeed removed the redirects from its index. BUT what if it has just removed the evidence?
I was able to find that site with the redirect by searching Google for the site name and looking for #62 in the results. It's still in the results with a redirect pointing to the target site, but it's not in the site: search.
As far as traffic goes, my most affected site has been doing better than before since the latter part of March.
The newer site is either still sandboxed or not optimized correctly. I get a handful of referrals each day, but my main traffic comes from links there, so it's no big deal.
What about this then?
A site:dmoz.org search says there are 11 million results, but you can't get beyond 950 results however hard you might try to do so.
What does that 11 million figure represent anyway? There are only 600,000 categories, 600,000 category charters, 60,000 editor profiles, and a few hundred guidelines and informational pages to index, making only about 1.2 million real pages.
All they had to do was disallow non-mysite.net URLs from the results of searches for site:mysite.net.
That's way faster and easier than actually discrediting 302 hijacks.
One indication: site:mysite.net indicates 153 pages.
I got to about 147, and it stopped listing them, saying "similar results were not shown."
I clicked to see the full list. I STILL got 147, and
the "similar results" option disappeared.
What and where are those last 6 links?
That's about how many phony 302's I had previously.
The proof will be in the SERPs, but not in my case.
302s didn't affect me that badly once I got some kicked out. -Larry
Doing a site:mysite check, I found 4 of my pages with title only, no description.
I did a Copyscape check on those looking for scrapers.
One page wasn't scraped really, just my anchor-text and a snippet,
but get this: the hypertext LINK reads:
<a href="/go/jjj.yneelungpu.arg"> #*$!x Map: Eastern Hemisphere</a>
I put that into the address bar and of course there was no such URL or TLD.
The 'linking' site had similar baby-talk URLs for other sites besides mine.
I'm used to the /scrapings.php/site#123 type ripoffs, but what is this?
What are the mechanics of the "/go" part of such a hyperlink? -Larry
I agree 100% that "the proof will be in the pudding [webmasterworld.com]" so let's see some of those sites come back before we jump for joy.
If this "data wash" is anything like a real update (it should be similar regarding the amount of data to update) then it will take time as batches run and data is shifted betweeen DC's and such. To make the sites come back a real update is probaly needed on the cleaned data as well, so it might still take a while.
It's just a file name. It can be anything, even a PHP script. It's probably an ID field in a database using letters instead of numbers (for whatever reason).
(e.g. a rewrite of "go.php?jjj&yneelungpu&arg")
[edited by: claus at 11:49 pm (utc) on April 18, 2005]
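As a side note on the odd hostname itself: the string in Larry's link decodes cleanly under ROT13 (each letter rotated 13 places in the alphabet), so the "baby-talk" URL may simply be a trivially obfuscated copy of the real destination rather than a random ID. A quick check:

```python
import codecs

# Hostname quoted from the hyperlink a few posts above.
encoded = "jjj.yneelungpu.arg"

# ROT13 rotates letters 13 places; dots and other symbols pass through.
decoded = codecs.decode(encoded, "rot13")
print(decoded)  # -> www.larryhatch.net
```

Why a redirect script would bother encoding the target is anyone's guess; possibly just to keep the raw URL out of crawlers' sight.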
I agree something is being done.
What follows is all maybes and could-haves; we don't know, and may never know. In other words, take it with a dump-truck load of salt.
However getting the baby out of the drain pipe where it was tossed isn't a simple matter.
Sites may have been split because of this, that means that PR probably took hits and sites downstream got a kick in the head as well.
Then folks who said I'll wait got classified as spammers and then those sites started losing pages.
Then there were all those new 301s that, while they prevented one form of site cancer, may have tripped the new link-addition filters.
So, damned if you do, damned if you don't.
Note that for inurl: and allinurl: searches, results from other sites are perfectly valid. So if you own yoursite.com and do a search allinurl:www.yoursite.com, it's a completely valid result to get a url from www.someothersite.com/resources?url=www.yoursite.com, for example. That's how inurl: and allinurl: are supposed to work--they match all docs with the requested terms in the url, not just docs on www.yoursite.com. That doesn't imply any problem/hijacking/issue; just that someone else had your domain name in their url.
Thank you for the feedback that people have given us about 302s. I'd be interested to hear if anyone sees a result where site:yoursite.com returns urls from domains other than yoursite.com. You might want to wait another few days before checking though, to give things time to get fully out. I have to duck out right now, but I'll try to stop by and give more details as things are more fully deployed.
Thanks for a fast, long-awaited response.
It's good that other sites are removed from the site: search,
but it also kills our main way of finding malicious 302 redirects.
"We are also changing some of the core heuristics for the results for 302s."
That, I think, is the burning question!
Will the heuristics changes prevent a malicious site from scoring for my original content?
Will these changes pass PR (etc.) thru to the actual pages with the content?
Will the 302-jackers be derated if not penalized?
Since there remain legitimate uses for the 302 redirect,
is there a simple way to only credit those redirected to the same domain?
That alone might solve most of the problem. -Larry
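Larry's same-domain idea could be approximated with a simple check. This is only a sketch: the comparison below naively takes the last two hostname labels as the "domain", and a real implementation would need the Public Suffix List (e.g. example.co.uk breaks the naive rule):

```python
from urllib.parse import urlsplit

def registrable_domain(url):
    """Crude 'same site' key: the last two hostname labels.

    A real implementation would consult the Public Suffix List;
    this naive version mishandles suffixes like .co.uk."""
    host = urlsplit(url).hostname or ""
    return ".".join(host.lower().split(".")[-2:])

def credit_redirect(source_url, target_url):
    """Only treat a 302 as an on-site canonical hint when both URLs
    share a registrable domain; off-site 302s get no such credit."""
    return registrable_domain(source_url) == registrable_domain(target_url)

print(credit_redirect("http://www.mysite.net/go?x", "http://mysite.net/page"))   # same site
print(credit_redirect("http://scraper.example/go?x", "http://mysite.net/page")) # off-site
```

The design choice here is conservative: legitimate same-domain click counters keep working, while a redirect from a foreign domain can never claim the destination URL.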
I also noticed that the algo for the site: command may have a little glitch: you can get the total number of results (pages for the site) on the first page (e.g. 125 pages), and that number reduces itself as you go deeper into the results, losing 1 or 2 pages from the total every 10 results. I've tried it and replicated it across a few domains, but of course others may not see the same thing.
I was also wondering why the &filter=0 parameter was changed; it no longer seems to have the effect it was originally designed for, as talked about here a little over a year ago. It seems Google still has a similar filter working, but it's not accessible using that command? It was very useful for seeing whether Google considered something from your site a duplicate result, again helping us find scrapers and report them.
Why only in supplemental results? What about normal results too?
Not showing the sites in the search results is not the same as not having the rogue URLs in the database. Not the same by a very long way.
Much of the stuff that has been seen in recent results should not have even made it into your database.
Why can't this be fixed by going back to how things were a few years ago? And, dare I mention the logic that Yahoo applies to 301 and 302 redirects, and the different way that they treat onsite and offsite redirects now?
As for the search I mentioned above, is it really true that out of 1.2 million pages, Google only has 950 pages indexed for the ODP now (but says there are 11 million when you first look)? This filtering of the results served, rather than cleaning of the internal dataset, seems to have some flaws perhaps?
HEURISTICS - This describes a set of rules developed to attempt to solve problems when a specific algorithm cannot be designed.
For example, if the problem is "When do you eat food?", if you answer, "When I'm hungry" then you would have to eat immediately every single time you were hungry.
Instead, we follow heuristics to determine when to eat by gauging our hunger level, the situation we are in, and our ability to get food. As you can imagine, heuristics are very important for solving artificial intelligence problems.
larryhatch, I believe the answers are yes, I'm not sure given the current heuristics, and yes. Marval, if someone is doing 302s to your site, you might be able to find redirecters by looking in your server logs for unusual referrers. I'll ask about filter=0. There's been some index changes lately, but I hadn't heard about any changes with filter=0.
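Matt's server-log suggestion could be sketched as below, assuming the common Apache/Nginx "combined" log format; the hostname and path in the sample line are made up for illustration:

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Matches the referrer field of an Apache/Nginx "combined" log line:
# request in quotes, status code, size, then the quoted referrer.
REFERRER_RE = re.compile(r'"[A-Z]+ [^"]+" \d{3} \S+ "([^"]*)"')

def referrer_hosts(log_lines):
    """Tally referrer hostnames. Unfamiliar hosts that keep sending
    traffic through /go- or ?url=-style links are redirect candidates
    worth checking by hand."""
    hosts = Counter()
    for line in log_lines:
        m = REFERRER_RE.search(line)
        if m and m.group(1) not in ("", "-"):
            host = urlsplit(m.group(1)).hostname
            if host:
                hosts[host] += 1
    return hosts

sample = [
    '1.2.3.4 - - [18/Apr/2005:10:00:00 +0000] "GET /page.html HTTP/1.1" '
    '200 512 "http://scraper.example/go?url=www.mysite.net" "Mozilla/4.0"',
]
print(referrer_hosts(sample))
```

This only surfaces redirecters whose visitors actually click through, but that is exactly the subset doing you the most visible harm.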
I was looking at a site today where, as near as I could tell, every page was marked supplemental. Up until recently, this site had a large number of 302 leeches within its site: view.
It doesn't look like the owner of that site stands a chance.
Assuming (and I know what that does) that the problem was "duplicate content due to the 302's", how long will it take for his site to recover? Will it recover?
The unrelated URLs that were once showing in a site: search are gone. But how long will dup penalties last? Are we still looking at 90-day penalties from this point on? As Dayo_UK says
steveb raises a point I am concerned with also.
If Google does sort the problems with the index, will previously established sites be sandboxed?
Are provisions being made to allow sites to return to the serps after being penalized for problems beyond their control? :)
If the home page is still indexed with title and description, do you still need to do a reinclusion request? Seems they will say "your site is already indexed".
Many, many, many webmasters have been adversely affected by recent updates also. Many sites have gone supplemental or are even losing bulk pages. Is this a short-term side effect of the recent changes and 302 fixes, or should those webmasters look to their own site as the cause of the problem? Don't want to try to fix something that isn't broken.
I am also curious about old 301 redirects: why is new content being cached/displayed under the old redirected URL and not the new one? Could this cause any problems?
As soon as a site gets to around 6,000-plus page views, it gets hijacked and killed? Nothing can ever do well in this environment, and maybe this is the idea?