302 Redirects Continue to Be an Issue
It is now 100% certain that any site can destroy low- to mid-range PageRank sites by getting Googlebot to pick up a 302 redirect served by a script (PHP, ASP, CGI, etc.), backed by an unseen, randomly generated meta refresh page pointing to an unsuspecting site. In many cases the encroaching site actually writes your website's URL into a 302 redirect on its own server. This is a flagrant violation of copyright and a manipulation of search engine robots, geared to exploit and destroy websites and to artificially inflate the ranking of the offending sites.
Many unethical webmasters and site owners are already creating thousands of TEMPLATED (ready to go) SKYSCRAPER sites fed by affiliate companies' immense databases. These companies, which have your website info in their databases, feed your page snippets, without your permission, to vast numbers of the skyscraper sites. A carefully adjusted PHP-based redirection script that issues a 302 redirect to your site, with an affiliate click checker included in the script, then goes to work. What is very sneaky is the randomly generated meta refresh page, which can only be detected with a good header interrogation tool.
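For anyone without a header interrogation tool to hand, here is a rough Python sketch of what such a tool reports for a single hop: either an HTTP redirect (status plus Location header) or a meta refresh buried in the page body. The function names and the scraper.example URL are made up for illustration, and the meta refresh pattern is a simple heuristic that assumes http-equiv comes before content in the tag, so treat it as a starting point only.

```python
import http.client
import re
from urllib.parse import urljoin, urlsplit

# Heuristic pattern for <meta http-equiv="refresh" content="0;url=...">
META_REFRESH = re.compile(
    r'<meta[^>]+http-equiv\s*=\s*["\']?refresh["\']?[^>]*'
    r'content\s*=\s*["\']?\s*\d+\s*;\s*url\s*=\s*([^"\'>\s]+)',
    re.IGNORECASE,
)

REDIRECT_CODES = (301, 302, 303, 307, 308)


def describe_hop(url, status, headers, body=""):
    """Classify one response: HTTP redirect, meta refresh, or plain page."""
    headers = {name.lower(): value for name, value in headers.items()}
    if status in REDIRECT_CODES and "location" in headers:
        return {"kind": f"http-{status}",
                "target": urljoin(url, headers["location"])}
    match = META_REFRESH.search(body)
    if match:
        return {"kind": "meta-refresh",
                "target": urljoin(url, match.group(1))}
    return {"kind": "page", "target": None}


def fetch_hop(url):
    """Fetch one URL WITHOUT following redirects and classify the response."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("GET", target,
                     headers={"User-Agent": "header-check-sketch"})
        response = conn.getresponse()
        body = response.read(65536).decode("latin-1")
        return describe_hop(url, response.status,
                            dict(response.getheaders()), body)
    finally:
        conn.close()
```

Calling fetch_hop() on a suspect link and then again on whatever target it reports lets you walk the chain one hop at a time, which is where a meta refresh hiding behind a redirect script would show up.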
Googlebot and MSNBot follow these PHP scripts to either an internal sub-domain containing the 302 redirect or a server-side redirect, and “BANG”, down goes your site if it has a PageRank below the offending site's. Your index page is crippled because Googlebot and MSNBot now consider your home page at best a supplemental page of the offending site. The offending site's URL that contains your URL is indexed as belonging to the offending site. The offending site knows that Google does not reveal all links pointing to your site and takes a couple of months to update, so an INURL:YOURSITE.COM search will not be of much help in tracing them for a long time. Note that these scripts mostly use your URL stripped, i.e. without the WWW, making detection harder. This also causes Googlebot to generate another URL listing for your site, which can be seen as duplicate content. A 301 redirect resolves at least the short-URL problem, relieving Google of deciding which of your site's two URLs to index higher (more often the one with the higher linked PageRank).
Your only hope is that your PageRank is higher than the offending site's. This alone is no guarantee, because the offending site will have targeted many higher-PageRank sites within its system on the off chance that it strips at least one of them. This is reinforced by hundreds of other hidden 301 permanent redirects to sites of PageRank 7 or above, again in the hope of stripping a high-PageRank site, which would then let their scripts hijack more efficiently. Sadly, supposedly ethical big-name affiliates are involved in this scam; they know it is going on, and Google AdWords is probably the main target for revenue. Though I am sure Google does not approve of its AdSense program being used in such a manner.
Many such offending sites have no e-mail contact, a hidden WHOIS and no telephone number. Even if you were to contact them, you will find in most cases that the owner or webmaster cannot remove your links from their site because the feeds come from affiliate databases.
There is no point in contacting GOOGLE or MSN, because this problem has been around for at least 9 months; only now it is escalating at an alarming rate. All sites with a PageRank of 5 or below are susceptible; if your site is a 3 or 4, then be very alarmed. A skyscraper site only needs to create child-page linking to get PageRank 4 or 5, without the need to strip other sites.
Caution: trying to exclude them via robots.txt will not help, because these scripts are able to change almost daily.
Trying to remove a link through Google that looks like
new.searc**verywhere.co.uk/goto.php?path=yoursite.com%2F will result in your entire website being removed from Google’s index for an indefinite period of time, at least 90 days, and you cannot get re-indexed within that period.
I am working on an automated 302 REBOUND SCRIPT to trace and counteract an offending site. This script will spider and detect all pages, including sub-domains, within an offending site and blast all of its pages, including dynamic pages, with a 302 or 301 redirect. Hopefully it will detect the feeding database and blast it with as many 302 redirects as it contains URLs. So, in essence, a programme in perpetual motion, creating millions of 302 redirects for as long as it stays on. As every page is a unique URL, the script will hopefully continue to create redirects and bombard a site that dynamically generates pages containing PHP, ASP or CGI redirecting scripts. A SKYSCRAPER site that is fed this way can have its server totally occupied by a single efficient spider that requests pages every split second, continually, throughout the day and week.
If the repeatedly spidered site is depleted of its bandwidth, it may then be possible to remove it via Google's URL removal tool. You only need a few seconds of 404 or 403 responses from the offending site for Google’s URL console to detect what it needs: either the site or the damaging link.
I hope I have been informative and have helped anybody who has a hijacked site whose natural revenue has been unfairly hit. Also note that your site may never regain its rank, even after the removal of the offending links. Talking to offending site owners often results in denial that they are causing problems; they say that they are only counting outbound clicks. And they seem reluctant to remove your links.... Yeah, pull the other one.
[edited by: Brett_Tabke at 9:49 pm (utc) on Mar. 16, 2005]
>> If "these hijackers" unknowingly picked a couple news authorities (esp. IT news)
Actually I'll admit to doing exactly that. But of course I didn't have the PR to turn over CNN and sites like that, and never managed to hijack a single one of them. It wasn't my intention either.
I do use 302s myself, although I'm a bit more careful than most. I've got them all robots.txt-ed and haven't created new ones for more than a year because of this stuff. As I update very frequently, that really makes running one particular website a whole lot more time-consuming, and it also limits what features I can offer my users. Finally, it's not very good for the SE spiders, because they see a lot of links that they just can't follow due to "robots.txt".
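To make the "robots.txt-ed" approach above concrete, here is a small Python sketch that checks a redirect script's URLs against a Disallow rule, using the standard library's robots.txt parser. The /go.php path and the example.com host are assumptions for illustration; the actual script name on any given site will differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all crawlers away from the
# redirect script while leaving the rest of the site crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /go.php
"""


def crawler_may_fetch(robots_txt, user_agent, url):
    """Return True if a well-behaved crawler is allowed to request url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, crawler_may_fetch(ROBOTS_TXT, "Googlebot", "http://example.com/go.php?id=123") comes back False, while ordinary pages on the same host stay fetchable. This only helps with crawlers that honour robots.txt, and it is the owner of the redirecting site who has to set it up, not the site being pointed at.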
The 302 is really just the most common way to do stuff like this, e.g. when you run a site that (a) tracks which links are the most popular, (b) doesn't want its links scraped, (c) has links stored in a database to check for 404s, or (d) whatever. I'm totally convinced that a large portion of hijackers don't even know that they are hijackers. And I'm just as convinced that others do it deliberately.
This is just as large a problem for webmasters wishing to use redirects legitimately on their site [webmasterworld.com] as it is for other webmasters who get hijacked. There are a lot of valid reasons to use a redirect script; it doesn't have to be hijacking-related at all.
So, seen from both sides, the way Google treats these 302s creates a lot of problems, and it simply sucks (being RFC compliant or not). These (and metas etc.) should simply be treated as a plain link, nothing else.
In particular to japanese and claus, thank you both for making this problem understandable even to someone like me. And thanks to everyone else who is contributing to my understanding. I tried to explain this problem to the site owner that I work for, and received a shrug: "What's it got to do with me?" For today, nothing, apparently. He's happy to be riding high at #1. When he disappears completely his tune will be different, of that I'm certain.
Back to my understanding: if I can understand the problem with my high-school dropout education, it's a mystery why a building full of PhDs cannot understand the problem, and why they cannot seem to find a resolution. Perhaps it is true, those things I've read here, that they just don't care. I really hope that's not the case.
Beyond those thoughts I really have nothing to contribute in the way of examples, problems or solutions. For now, I'm waiting, watching, and praying it doesn't happen to me too.
One thing I pick up from the marvellous attempts to write up the problem in an understandable way, is that the bad guys always strip the "www".
Could someone explain why?
Next question: would it help to use mod_rewrite in .htaccess to send any calls for the non-www page to "www" (viz. aimed at Googlebot)?
I've been using redirects for years, but never once knew about this issue, and I know for certain that many, many webmasters use them for no bad purposes. In PHP the most common script for this is header("Location: http://somesite.com"); which, as I understand it, produces a 302 redirect, which from what I read here causes the problems. So it's easy to see why many use it with no bad intentions.
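To illustrate the point about the default status, here is a minimal Python (WSGI) sketch of the same kind of click-counting redirect script, with the status code chosen explicitly instead of left to a default. The link table, the id parameter and the function name are all hypothetical; this is not anyone's actual script from this thread.

```python
from collections import Counter
from urllib.parse import parse_qs

# Hypothetical link table and click counter; a real script would
# normally read these from a database.
LINKS = {"1": "http://somesite.com/"}
CLICKS = Counter()


def redirect_app(environ, start_response, permanent=False):
    """Count the click, then redirect with an explicitly chosen status."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    link_id = query.get("id", [""])[0]
    target = LINKS.get(link_id)
    if target is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"unknown link"]
    CLICKS[link_id] += 1
    # 302 says "temporarily somewhere else"; 301 says "moved for good".
    status = "301 Moved Permanently" if permanent else "302 Found"
    start_response(status, [("Location", target)])
    return [b""]
```

The only difference between the two variants is the status line, which is why a webmaster who never looks at the raw headers can easily be sending 302s without knowing it.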
One reason (and possibly the biggest) people use redirects is to stop PR leakage. For example, if you want to link to some page from your high-PR site, but do not want to lose some PR to it, the solution would be to use a redirect script. I think Google even says this somewhere. This is why people adopt this method...
So now the webmaster cannot win: link directly and lose PR, or link indirectly and risk being called a Google hijacker, which is what most people on this thread are calling it :(.
So if nobody can come up with a solution, then you cannot blame anyone apart from google for this.
Sorry, I just had to get some of that off my chest :)
The page/site I speak of in message #24 in this thread [webmasterworld.com] previously held the #1 position for years in both Yahoo and Google for the exact same keyword.
The page has been dropped by Google, but remains #1 in Yahoo.
Of course it's always possible that some other factor is affecting it, but I don't think so, especially when you consider it has 5,000+ scraper pages linking to it.
Hmm, does everyone's site appear when you add &filter=0? Because my site does not show up, but maybe that's because PR is not fully distributed through the site, plus the loss of 90% of the pages indexed.
Another thing I noticed: one of the redirecting sites has a cache date of Nov. 3, which was the day my site lost its visitors. Could it be that this is the site that causes the main damage? I'm just thinking out loud here.
[edited by: zeus at 5:35 pm (utc) on Mar. 9, 2005]
Looking at my logs, and at Google results, I think a solution is already being put in place. A couple of months ago, I had a few thousand 302 redirect pages listed in Google from my various sites which linked to other sites. I just did a check now, and there are fewer than a hundred of them in the index. The redirect script was still there on my web server, although the sites themselves no longer used it, so Google wasn't getting a 404 on it, but they've actually removed it from the SERPs.
>> Hmm, does everyone's site appear when you add &filter=0? Because my site does not show up, but maybe that's because PR is not fully distributed through the site, plus the loss of 90% of the pages indexed.
>> Another thing I noticed: one of the redirecting sites has a cache date of Nov. 3, which was the day my site lost its visitors. Could it be that this is the site that causes the main damage? I'm just thinking out loud here.
I think you hit the nail on the head. The same thing happened when a 302 redirect to my site showed up. I went from 2000 searches per month to 200, and a search for "my site name" disappeared except with filter=0. I got them to drop the listing, but it's still in the SERPs and my site is nowhere to be found. It seems the real danger is when they zap your home page. It's like it zapped the PageRank out of my site. The Google toolbar shows PR3 (I know that's not a lot in the first place) but my DMOZ entry shows PR1.
What happens if, having identified a bad guy, we duplicate the bad guy's technique in reverse to a "newlocation" where we put a 302 to our original page?
Who wins, who loses?
As I said before, if I don't see any solution to this in this month's update, I will take the path to the dark side of the internet, meaning the new way of creating a site is to have a few 302 links to good sites and then add your usual content. A company cannot sit and wait for years until Google fixes this problem; we have to follow the trend, and if the whole index fills up with redirecting sites, then that must be the new way to create a site.
I have been doing that for months now. When my sites were taken down because Google insists on rewarding Mr. Blackhat for his 302 redirect, I figured if he can do it, so can I.
Works fine for me. All you need are a few throwaway domains and good typing fingers.
I use 302s on all my traffic domains to the throwaway and then link directly to my sites from there.
Google is wrong on the 302 issue. If I have traffic from undeveloped domains, why should I give that up?
"if the whole index fills up with redirecting sites, then that must be the new way to create a site"
Sad but true. You don't need to write content anymore... you can just creatively borrow existing content and get into the AdWords business.
Claus said this first about a year and a half ago... <quote> 302's should just count as any other link. </quote> IMHO It's so simple there must be an undisclosed issue why it can't just be done so simply.
>> the easy solution would be for google to look at the Location header in 302 pages,
>> and if it points to a different domain, just don't store that page in their db.
Imho, exactly so :)
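The rule quoted above is short enough to write out. What follows is only a sketch of that suggestion, not a description of how Google actually works: the function names are made up, and treating the www and non-www host as the same "domain" is my own assumption.

```python
from urllib.parse import urljoin, urlsplit


def _site(host):
    """Lower-case a host name and drop a leading 'www.' (an assumption)."""
    host = (host or "").lower()
    return host[4:] if host.startswith("www.") else host


def should_store_redirect_url(url, status, location):
    """Apply the quoted rule: don't store a 302 URL that points off-domain."""
    if status != 302 or not location:
        return True  # not a 302 redirect, so the rule doesn't apply
    target = urljoin(url, location)  # resolves relative Location values
    return _site(urlsplit(url).hostname) == _site(urlsplit(target).hostname)
```

Under this rule a URL like scraper.example/goto.php?path=yoursite.com%2F that answers with a 302 to yoursite.com would never be stored under the scraper's name, while a 302 between two pages of the same site would be left alone.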
>> the bad guys always strip the "www".
Not always, it could just as well be "go.php?1234567" or some other nonsense, ie. not even the domain name.
>> would it help to modrewrite in htaccess to "www" any calls for the non-www page
That's always a good thing to do. I'm not sure it will help with this exact problem but it will help with related problems, and it's worth a try. When faced with this choice (www vs non-www) do pick the flavour that has most inbound links or highest PR, this will make it easier for you.
I should add: lately, 301-redirecting by .htaccess has led to undesirable side effects, so I don't really want to recommend using these at all for the time being. Perhaps, when it's the same second-level domain, the danger is not so big, though. If you want to try, make sure you pick the most solid version and redirect the less solid one to that.
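For anyone who does want to try it, the decision a www rewrite makes can be sketched in a few lines of Python. This is not an .htaccess rule, just the logic one would implement, and www.example.com stands in for whichever version of your host name you decided was the more solid one.

```python
def canonical_redirect(host, path, canonical_host="www.example.com"):
    """Return (301, location) when a request used the other host variant.

    Returns None when the request is already on the canonical host, or
    when the host isn't one of this site's two variants at all.
    """
    host = host.lower().split(":")[0]  # ignore any :port suffix
    if host == canonical_host:
        return None
    if canonical_host.startswith("www."):
        bare = canonical_host[4:]
    else:
        bare = canonical_host
    if host in (bare, "www." + bare):
        return (301, f"http://{canonical_host}{path}")
    return None
```

The point of the 301 here is that both spellings of the host end up at a single URL, so there is only one version of each page for links and PR to collect on.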
>> One reason (and possible the biggest) people use redirects is to stop PR leakage
I disagree. Readers of this and related forums are a small minority. It might be a big motive for most of these readers, but not for the rest of webmasters, most of whom might think PR has to do with advertising.
Besides, I seem to recall that 302s did pass PR? At least some got PR through the Yahoo directory in the past, and that one used 302s. Anyone remember that?
I heard that some tracking software or even content-management software uses redirect scripts, so the people using the software are not even aware of what they are doing. You couldn't even begin to explain it to them. They are just happy with the software and are enjoying good PageRank because of it.
If y'all are serious about making this a newsworthy issue, the place to go would be the news desks at the WSJ or NY Times, and if they don't bite, some smaller papers like the St. Petersburg Times with a history of good investigative journalism.
What would be required to get a story in the news is a credible source or two. Some webmaster pissed because their gambling or booty site just got hijacked won't do it. A couple of folks with credibility and standing (like Brett) would be needed to provide quotes. This will be hard because they won't want to come forward.
Furthermore, the story would have to be explained in simple terms so that someone with ZERO knowledge of webmastering could understand the who, what, when, where, why and how of the problem and its consequences. I agree that sufficient resources would be diverted to the issue only under the pressure of public scrutiny (take a look at the security bugs in XP that Gates was forced to address rather than advancing Longhorn, though that was a much more widespread concern than this).
A few tips:
Don't deal with techy rags, because they'd be a dead end. Be prepared to have yourself quoted in superficial, sensationalist and distortion-filled accounts with titles like "The End of Google?", particularly in the conservative media, because Google employees are major Dem donors.
You could generate a story if you can explain it so that a five-year-old can understand it, but whoever is brave enough to be quoted should be prepared to be loathed by Google, which despite its problems, increasing greed, recent incompetence and unfair sandbox/filtering is still the best (not to mention the biggest) engine around.
I'm not really sure it would be worth it; better to circulate a petition, signed by credible folks, with an ultimatum that you'll be dedicated to advancing the story if Google doesn't make a statement here about fixing the problem with a clear, transparent, and verifiable roadmap.
Oh yeah that thing I mentioned earlier about tracking referrers and banning 302 redirects (like shooting yourself in the foot).
It was meant in the spirit of 'if it were possible' it would be like shooting yourself in the foot.
I have a solid contact at the WSJ relating to my other business. I would be willing to give him a call and connect him to someone who is going to give him the whole story. Unbiased. Truthfully. No plugs. I.e., I don't want to link him to an idiot and make myself look like one.
I don't have the time (prob because I've gotten around the issue) to deal with a whole story like this.
PM me if you're interested.
Can you harden your site against this attack by having some randomly generated content on every page? Your page would then be different from their page, and Google might not classify it as duplicate content. In that case you would still appear in the SERPs, but so would they, as two separate entries. Still a problem, but not as bad as losing your PR and your listing altogether.
K, I have a solution for this, and it involves....oh dear - Yahoo!
Can someone tell Yahoo that by placing 302-redirect links on their main page (PR-10) to all but the main Google page, and then cloaking it so regular users are sent to Yahoo.com, they can effectively hijack all but one of Google's pages?
Guys, it's not that simple. Not every 302 page is hijacked; in fact only a few are. I suspect those are pages that have different checksums at the time Gbot fetches them.
[edited by: aleksl at 8:24 pm (utc) on Mar. 9, 2005]
It's not 'news' anymore, but I think it would still be good for a thread like this to be on the homepage of WW. Mainly because it contains good wrap-ups of what's going on in a way that most anyone can understand, and it isn't filled with too much clutter (compared to the other threads). In a way, the fact that this is old news and hasn't shown any signs of correction is news in itself, no?
Google is not doing something "right". The explosive growth of the problem occurred here: "Searching 8,058,044,651 web pages"
Google is incompetently calling URLs "pages". All Google did was dramatically and stupidly increase the number of URLs it "searches".
URLs in html code are not web pages.
(PageRank is just assigned via href, so redirects do nothing to hoard PageRank.)
<<"Searching 8,058,044,651 web pages">>
"Searching 8,058,044,651 URLs"
From reading this thread and other websites dedicated to exposing 'googlejacking', it appears that Google has been making changes, because a lot of people are reporting that these types of SERPs have dropped off considerably.
Let's just hope the scumbags (the ones who do it for a living) feel it too, because (judging from the post that started this thread) this is going to be the next scumbag storm if Google doesn't deal with them in particular.
Reid - I'll just wait this month. If nothing is done by Google, I will change the way I make websites so that it follows the way Google wants it: redirecting links and good content. I have waited 6 months and nothing; many have waited longer.
I posted about this in November. [webmasterworld.com...]
Since then, several emails to Google have NOT resolved anything. My most recent email in February again went unheard.
Incidentally, I don't think pagerank makes you less vulnerable. One of my sites has a pagerank of 7 and was successfully hijacked by numerous 302's with lower pagerank (even on the parent domain).
Google will NOT do anything about this. Sorry. Those of us who have been victims of hijacking for nearly a year are still gone from the index. At one point, there were over 40 redirects to my home page. I manually removed most of those using the Google url-removal tool and setting my robots metatag to 'noindex'. However, if I do a site:mysite.com search, there are still several urls that are NOT mine showing up even though those urls no longer redirect to my site. I can't remove these using the removal tool since they no longer redirect to me. Those urls were last cached on November 2, and until Google revisits them, gbot will never know they no longer redirect to me. Google has been informed, but they do nothing, and don't care.
Absolutely pathetic, and EXTREMELY disappointing. Google loses some great content by banning my site for the actions of other webmasters.
one more thing:
When I search inurl:example.com, several urls of the following form are showing:
The second one above is not even a legitimate url. Why would Google index it? These urls point to nothing.
Google is full of useless urls that should not even be listed in the index. So I agree with the comments above about Google incorrectly referring to urls as pages.
I respect their policy of not manually touching serps to an extent. But the examples presented in this thread are ridiculous. They need to have some method of cleaning up serps without waiting for their algorithm to become "perfect" enough to detect this crap.
Those fake URLs are from other sites that link that way, but of course you are right in this matter, and I still think the only way to battle this is the NEWS.
I think it's time to point other internet news outlets (CNET, internet ....) to this post. It's interesting news for them, and WebmasterWorld also gets a little more popular.
Well, let's get on this. So far we have a WSJ contact; has anyone followed up with that?
Are there any other people with press contacts?
Does anyone else wanna do a little civil disobedience sit-in at a Google office with me?
Helps to know people... esp in the media. And if investors knew Google was allowing their serps to be taken over by redirect/spammy urls and eliminating good content urls, things might change...but I doubt it. They are too arrogant.
[edited by: crobb305 at 12:48 am (utc) on Mar. 10, 2005]
Stargeek, it would be great to have a few banners there, but I just cannot make it. I suggest anyone who is hit now starts finding internet news websites and sends them an email about this post. It only has to be a few lines; then they can look for themselves at what's going on. There is no crime in offering some info which is very real.