Forum Moderators: Robert Charlton & goodroi
Many unethical webmasters and site owners are already creating thousands of TEMPLATED (ready-to-go) SKYSCRAPER sites fed by affiliate companies' immense databases. These companies, which have your website info in their databases, feed your page snippets, without your permission, to vast numbers of these skyscraper sites. A carefully adjusted variant of a PHP-based redirection script, which issues a 302 redirect to your site and includes an affiliate click checker, then goes to work. What is very sneaky is the randomly generated meta refresh page, which can only be detected with a good header interrogation tool.
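A meta refresh sits in the returned HTML rather than in the HTTP status line, so a check that only looks at the status code can miss it. The following is a minimal Python sketch using the standard-library `html.parser`; it is not the header tool the post refers to, and the helper name `find_meta_refresh` is hypothetical. It only reports where the first refresh tag on a page points.

```python
import re
from html.parser import HTMLParser
from typing import Optional

# Matches the URL part of a refresh "content" value such as "0; URL=http://...".
_URL_IN_CONTENT = re.compile(r"url\s*=\s*['\"]?([^'\"\s]+)", re.IGNORECASE)


class _MetaRefreshFinder(HTMLParser):
    """Remembers the target of the first <meta http-equiv="refresh"> tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.target: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        # HTMLParser lowercases attribute names; attribute values may be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("http-equiv", "").strip().lower() != "refresh":
            return
        match = _URL_IN_CONTENT.search(attr_map.get("content", ""))
        if match:
            self.target = match.group(1)


def find_meta_refresh(page_html: str) -> Optional[str]:
    """Return the URL a page meta-refreshes to, or None if it has no refresh tag."""
    finder = _MetaRefreshFinder()
    finder.feed(page_html)
    finder.close()
    return finder.target


# prints http://www.example.com/
print(find_meta_refresh('<meta http-equiv="refresh" content="0; url=http://www.example.com/">'))
```

Feed it the body you fetched from a suspicious link; a non-None result tells you the page hands visitors (and possibly crawlers) on to another URL.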
Googlebot and msnbot follow these PHP scripts either to an internal sub-domain containing the 302 redirect or server-side, and “BANG”, down goes your site if its PageRank is below the offending site's. Your index page is crippled because Googlebot and msnbot now consider your home page, at best, a supplemental page of the offending site. The offending site's URL that contains your URL is indexed as belonging to the offending site. The offending site knows that Google does not reveal all links pointing to your site and takes a couple of months to update, so an inurl:yoursite.com search will not be of much help in tracing it for a long time. Note that these scripts mostly use your URL stripped, without the www, which makes detection harder. This also causes Googlebot to generate another URL listing for your site that can be seen as duplicate content. A 301 redirect at least resolves the short-URL problem, sparing Google from having to decide which of your site's two URLs to rank higher (more often the one with the higher linked PageRank).
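On an Apache server this kind of redirect is usually a mod_rewrite rule. Purely to show the logic, here is a minimal Python WSGI sketch, assuming `www.example.com` stands in for your own host; `canonical_host_redirect` is a hypothetical name. Any request whose Host header is not the www host gets a 301 to the same path on the www host.

```python
from typing import Callable

# Hypothetical: replace with your own canonical www host name.
CANONICAL_HOST = "www.example.com"


def canonical_host_redirect(app: Callable) -> Callable:
    """Wrap a WSGI app so any request not addressed to CANONICAL_HOST gets a 301.

    A request for http://example.com/page?a=1 is answered with
    "301 Moved Permanently" and Location: http://www.example.com/page?a=1,
    so only one URL per page remains for a crawler to index.
    """

    def wrapper(environ, start_response):
        # Host header without any ":port" suffix, lowercased for comparison.
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host and host != CANONICAL_HOST:
            location = "http://" + CANONICAL_HOST + (environ.get("PATH_INFO") or "/")
            query = environ.get("QUERY_STRING", "")
            if query:
                location += "?" + query
            start_response("301 Moved Permanently", [("Location", location)])
            return [b""]
        # Already on the canonical host: serve the page normally.
        return app(environ, start_response)

    return wrapper
```

The 301 (rather than a 302) is the point: it tells crawlers the bare-domain URL is permanently replaced, instead of leaving two candidate URLs for your home page.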
Your only hope is that your PageRank is higher than the offending site's. Even this is no guarantee, because the offending site will have targeted many higher-PageRank sites within its system on the off chance that it strips at least one of them. This is compounded by hundreds of other hidden 301 permanent redirects to sites of PageRank 7 or above, again in the hope of stripping a high-PageRank site, which would then let their scripts hijack more efficiently. Sadly, supposedly ethical big-name affiliates are involved in this scam; they know it is going on, and Google AdWords is probably the main target of revenue. Though I am sure Google does not approve of its AdSense program being used in such a manner.
Many such offending sites have no e-mail contact, a hidden WHOIS record and no telephone number. Even if you do contact them, you will find in most cases that the owner or webmaster cannot remove your links from their site because the feeds come from affiliate databases.
There is no point in contacting Google or MSN: this problem has been around for at least 9 months, and only now is it escalating at an alarming rate. All sites with a PageRank of 5 or below are susceptible; if yours is 3 or 4, be very alarmed. A skyscraper site need only build child-page linking to reach PageRank 4 or 5, without needing to strip other sites.
Caution: trying to exclude them via robots.txt will not help, because these scripts can change almost daily.
Trying to remove a link through Google that looks like
new.searc**verywhere.co.uk/goto.php?path=yoursite.com%2F will result in your entire website being removed from Google’s index for an indefinite period (at least 90 days), and you cannot get re-indexed within that time.
I am working on an automated 302 REBOUND SCRIPT to trace and counteract an offending site. This script will spider and detect all pages, including sub-domains, within an offending site and blast all of its pages, including dynamic pages, with a 302 or 301 redirect. Hopefully it will detect the feeding database and blast it with as many 302 redirects as it contains URLs. In essence, it is a program in perpetual motion, creating millions of 302 redirects for as long as it stays on. As every page is a unique URL, the script will hopefully keep going and bombard a site whose dynamically generated pages use PHP, ASP or CGI redirect scripts. A fed SKYSCRAPER site can have its server totally occupied by a single efficient spider that requests pages every split second throughout the day and week.
If the repeatedly spidered site runs out of bandwidth, it may then be possible to remove it via Google's URL removal tool. Google's URL console only needs a few seconds of 404 or 403 responses from the offending site to detect what it needs: either the site or the damaging link.
I hope I have been informative and have helped anybody with a hijacked site whose natural revenue has been unfairly hit. Also note that your site may never regain its rank, even after the offending links are removed. Talking to offending site owners often results in their denying that they are causing problems and saying that they are only counting outbound clicks. And they seem reluctant to remove your links....Yeah, pull the other one.
[edited by: Brett_Tabke at 9:49 pm (utc) on Mar. 16, 2005]
Twist, see msg #218; I summed up some suggestions there. Also, there's this new thread on one method of removing a redirect script from Google (you can also return a 404):
[webmasterworld.com...]
I can't say for certain, however:
If Google says that page is part of your site (it shows up in the site: search) and it does in fact link to such stuff, then what am I left to believe? It is a link to a script that can even return:
Links to bad locations, bad pictures, a page of very nasty words like #*$! that Brett's board autocensors. Before the word there was a nasty three-character sequence of the twenty-fourth letter of the English alphabet.
That one threw me for a loop when I first ran into it.
blah blah blah (php redirect code) http/yourdomain.com
Notice there is no colon and no double slash before yourdomain.com.
This isn't a typo either. All the links on this site had the same method of listing the url.
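When you go through allinurl: results like these, a small filter can flag which redirect-script links carry your domain in their query string, even in stripped or mangled forms. This is a minimal Python sketch using only `urllib.parse` and `re`; `embedded_targets` is a hypothetical helper, and links that hide the target behind an ID (like `out.cgi?id=4115` further down this thread) will not be caught, so those still need a header check.

```python
import re
from typing import List
from urllib.parse import parse_qsl, urlsplit


def embedded_targets(link: str, your_domain: str) -> List[str]:
    """Return the query-string values in `link` that point at your_domain.

    Catches the bare domain, the www form, subdomains and mangled forms such as
    "http/yourdomain.com". Percent-escapes like %2F are decoded first.
    """
    bare = your_domain.lower()
    if bare.startswith("www."):
        bare = bare[4:]
    # The domain must start the value or follow "/", ".", ":" or "=", and must
    # end the value or be followed by "/", "?", "#" or ":" (so "notyoursite.com"
    # does not count as a hit for "yoursite.com").
    pattern = re.compile(r"(^|[/.:=])" + re.escape(bare) + r"([/?#:]|$)", re.IGNORECASE)
    if "://" not in link:
        link = "http://" + link  # listings often omit the scheme
    return [
        value
        for _name, value in parse_qsl(urlsplit(link).query, keep_blank_values=True)
        if pattern.search(value)
    ]
```

For example, `embedded_targets("new.example.co.uk/goto.php?path=yoursite.com%2F", "www.yoursite.com")` returns `["yoursite.com/"]`.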
It shows up in the allinurl command of the target site.
Of course this site is managed by GoDaddy/domainsbyproxy.com so I doubt this link will ever be removed.
While digging deeper into one of the hijacker sites I did see some questionable adult content...
The site I mentioned above is also linked from one of those, and they linked to a "Christian" poet on my site with the same name as the owner of this site, obviously to use that name. Also, practically all the links on the page are to Christian sites with that name somewhere on them.
So far, I haven't seen my particular circumstance discussed, either here or in the Adsense forum. Here's what I'm finding in an allinurl search for my domain:
www.spammingsite.com/autolinks/?o=mydomain - 27k - Supplemental Result
and
www.nextspammingsite.com/cgi-bin/links/out.cgi?id=4115 - 26k - Supplemental Result
Both of these return a 302 in the header checker tool someone posted, so I'm assuming this is a hijack. Correct? Also, is "supplemental result" a good thing?
On the first example, when I click on the link, it returns my homepage, but with both my AdSense ads stripped out, one from the left sidebar and one from the center column. A fastclick ad is left in place. How did they do that? And how does it benefit them?
My site is 4 years old, used to be PR6, now PR5, with main subdirectory index pages also PR5. Search terms that I used to have on page 1 are now buried deep in Google results. Please advise - what do I do?
When you find the results from the suspected hijacking site in the G SERPs, does the listing use the title you use on your page, and does the cached version of that listing contain your content (as opposed to a scraper site with a portion of your content)? If the cache for their funky link is exactly your content, that's a hijack.
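To make that "exactly your content" test a little less eyeball-driven, here is a minimal Python sketch using `difflib.SequenceMatcher`, assuming you have already copied the visible text of your page and of the cached listing (for example by saving both from the browser). The function names are hypothetical, and any cut-off you pick (say 0.9) is an arbitrary assumption, not anything Google uses. A score near 1.0 fits "exactly your content", while a scraper page carrying only a snippet of your text should generally score much lower.

```python
import re
from difflib import SequenceMatcher
from typing import List


def words(text: str) -> List[str]:
    """Lowercase word tokens, so differences in spacing, case and punctuation are ignored."""
    return re.findall(r"\w+", text.lower())


def content_similarity(your_text: str, cached_text: str) -> float:
    """Similarity ratio between 0.0 and 1.0 of two page texts, compared word by word."""
    # autojunk=False stops difflib from discarding frequent words on long pages.
    matcher = SequenceMatcher(None, words(your_text), words(cached_text), autojunk=False)
    return matcher.ratio()
```

Comparing word lists rather than raw characters keeps the score from being inflated by the many letters any two English pages share.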
When you say you clicked the link and the Adsense was gone, but FC ad displayed -- was this the actual link or the cache link? I find that in my (non-hijacked) pages, the cache will show FC but usually drops out the Adsense, or gives PSA on Adsense, or ads related to "Cache." (!)
The first site example, by the way, is a spam site extraordinaire...you go to their domain and are hit with a page that is totally filled with hidden text words and a block at the bottom with non-hidden spam keyword stuffing.
Grrrrrr.....
The fact that these scumbags even appear in the same search results as my own drives me crazy!
What that means is that these 302 redirects have been in existence for months...my site is holding its own, although admittedly some search terms have been buried. Who knows how much better my standing in SERPS would have been without them?
My site focuses on crafts, most of which are designed by me, and I write the instructions and take the step-by-step photos to tell others how to reproduce the craft. This is first-rate original content...now being ripped off by a 302 redirect that anyone can use?
Twist - I posted about this problem in the Apache forum and Jim replied right away - I guess he doesn't read the google forums (or maybe gave up at post 400)
This is old hat to him; he seems quite confident that there is no solution available to us: Google needs to fix it, and it is a lot harder than you'd think.
Also, there are 2 more threads in Google News (same folder as this thread). One, like you described, is dedicated solely to discussing solutions (a lay-it-out-for-Google-engineers type of thing).
The other one is 'how to detect and remove 302 hijacks'
This thread has served its purpose already; this thread is the Google 302 forum. The word is out in the news and getting a lot of PR in SEO forums.
I'm not so sure I'd worry about anything in inurl: unless it looks really sketchy or is from x-rated or spammer etc.
302's are very common - for googlebot there are also a lot of 'internal' 302's on large servers for different functions unrelated to 'links'. So it is a big job.
Most tracking type 302 links are completely harmless as long as google recognizes them for what they are.
Sometimes a 302 method can mess up google (unknown to the webmaster creating them).
The real problem is that since the word got out (a year ago?), ALL the spammers and scumbag types are using 302's in ways they never thought possible.
There are 'get rich quick' e-books based on exploiting this bug with scraper sites - this is in the top 10 ways now.
There is "SEO" software that promises - no "guarantees" top SERP's "for whatever keyword you want" just punch in the keyword and you get instant 'top ranking pages'. What do you suppose this software is based on? Exploiting the 302 bug "because it's unfixable".
So it is a big problem, and it is going to get exponentially bigger. I can't wait to laugh at these 'get rich quick' schemes when they no longer work, or when Google is so saturated by them that they no longer pay because everybody is hijacking everybody.
Yahoo figured it out and Google will too. I think Google's algo is just a lot more complicated, and the big wigs just gave the order (about time) to fix it because it's in the news and being discussed; to big wigs, THAT is a problem. I bet they cringe when they look at this thread, 700 posts and not losing any steam.
It is intellectual dishonesty for GoogleGuy to tell us how to report hijacking problems when the reply you get back from them says:
"Thank you for bringing this to our attention. We understand your concern about the inclusion and ranking of your site. Please note that there is almost nothing a competitor can do to harm your ranking or have your site removed from our index.”
The two solutions to this problem are:
1) As Caleb posted today, Google should eventually fix it. They have fixed bigger problems in the past.
2) Take the law into your own hands to protect yourself in the interim, until (1) happens, if it ever does. Besides, this problem may extend to other search engines to an extent; there has been talk that this has happened. This would be akin to carrying your own guns in the days of the old Wild West, when vigilantes and gangs ruled. So I have a game plan that I intend to pursue, and my motive is that if I can convince others to do likewise, the effect will be stronger in numbers. It is as follows:
A) First priority is to choke off the flow of information about your vital signs that acts as bait for the 302’ers: namely your traffic rank and PR, including a jpg of your site and two 302’s back to you off low-PR pages (seemingly innocuous). Hit the spider that is feeding some of your strategic info to that service. If you follow the other threads this week on the subject, you will be able to figure out what to target. At a minimum, immediately robots.txt it. Hopefully you have a spider trap; if you don’t, build one for mission-critical sites. After you get that done, create a rewrite based on that spider’s USER-AGENT [OR] the owner’s IP REMOTE ADDRESS ([OR] means that if either condition is satisfied, the rule goes forward) to serve that spider, and only that spider and its web service, a page (other than your home page) that then does a meta refresh to the most disgusting image that you can find on the web. You can learn how to do this (the rewrite, not the disgusting) in the archives of the Apache Web Server forum. Then un-robots.txt the feeder spider. Don’t let your conscience get to you about hurting established web institutions when so many webmasters have been hurt badly by them on this issue; just hope that the hurting gets stopped.
B) Next step: go after the sleazy, PR-hogging puny directories that are 302’ing to you. You can find out which little directories are redirecting to you on a 200 versus a 302 basis by running their bogus little link to you through this tool on the Webmasterworld site: [searchengineworld.com...] Server Header checker. I'm afraid there are a lot more out there than the various diagnostic Google searches touted this week turn up. I found three more 302's against mine (no help from the Google search). Remember that the search engine spider, which does not send a REFERER, will follow the 302 to your home page anyway, and there is nothing you can do short term to stop that (as antisocial as that REFERER-cloaking on behalf of that spider may seem). You will be doing this to retaliate, with a disgusting and vile image (not porn, please), against the punk 302’ing, PR-hogging directory that is trying to look official to the public. Hopefully that will exceed even their comfort level, and they will make adjustments.
C) Identify your marketplace. Then IP-block the A-blocks of entire global regions that are not in your marketplace. Look to the Internet Assigned Numbers Authority A-block allocations for guidance in making your decisions. If a bandit across the globe cannot find your site, it may not be targeted.
D) Do the Base URL meta-tag thingy.
E) Look at the other search engines that will be eating into Google’s market share, and learn their optimization game so you can work with them in the future and won’t have to whine so much; a tiny little bit of that goes a long way on the other forums. Check 'em out!
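For step A, the post points to the Apache forum archives for the actual rewrite rule. Only to illustrate the [OR] condition itself, here is a minimal Python WSGI sketch. The bot token and IP address are hypothetical placeholders you would replace with what your own logs show, and, as a deliberate substitution, a matched request simply gets a plain 403 Forbidden rather than the redirect page the post describes.

```python
from typing import Callable

# Hypothetical placeholders: put the feeder spider's real user-agent token and
# the owner's IP addresses here, taken from your own access logs.
FEEDER_UA_TOKENS = ("examplefeederbot",)
FEEDER_IPS = frozenset({"192.0.2.10"})


def is_feeder_request(user_agent: str, remote_addr: str) -> bool:
    """True if EITHER the user agent OR the remote address matches ([OR] logic)."""
    ua = user_agent.lower()
    return any(token in ua for token in FEEDER_UA_TOKENS) or remote_addr in FEEDER_IPS


def block_feeder(app: Callable) -> Callable:
    """WSGI wrapper: matched requests get a plain 403, everyone else gets the site."""

    def wrapper(environ, start_response):
        if is_feeder_request(environ.get("HTTP_USER_AGENT", ""), environ.get("REMOTE_ADDR", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return app(environ, start_response)

    return wrapper
```

Matching on either field matters because a feeder can change its user agent more easily than the machine it runs from, and vice versa.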
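For step B, the check the post describes (does a directory's link to you answer with a 302 or a 200?) can be sketched in Python with the standard-library `http.client`, which never follows redirects on its own. This is not the Server Header checker the post links to; `check_status` is a hypothetical helper, the `User-Agent` string is made up, and the URL you pass must include the `http://` scheme.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit


def check_status(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Request `url` once, WITHOUT following redirects, and return (status, Location).

    Because http.client does not follow redirects, a 302 shows up here as 302
    instead of silently turning into the 200 of the page it points to.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        # GET rather than HEAD, since some redirect scripts answer HEAD differently.
        conn.request("GET", target, headers={"User-Agent": "header-check-sketch"})
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

A result such as `(302, "http://www.yoursite.com/")` for a directory's link is the situation the post is worried about; `(200, None)` means the link URL serves a page of its own.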
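For step C, the blocking decision itself is a range lookup. This minimal sketch uses Python's `ipaddress` module; the blocked list is a placeholder (10.0.0.0/8 is private address space, used only as a stand-in), and choosing which /8 allocations to list from the IANA IPv4 address space registry is left to you, as the post describes. In practice this check would usually live in the web server or firewall configuration rather than in application code.

```python
import ipaddress
from typing import Iterable, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Hypothetical placeholder list: substitute the /8 blocks you decide to exclude.
# 10.0.0.0/8 is private space and appears here only as a stand-in.
BLOCKED_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]


def is_blocked(remote_addr: str, networks: Iterable[Network] = BLOCKED_NETWORKS) -> bool:
    """Return True if the client address falls inside any blocked network."""
    addr = ipaddress.ip_address(remote_addr)
    # An IPv6 address is simply never "in" an IPv4 network, so mixed lists are safe.
    return any(addr in net for net in networks)
```

Keeping whole networks in the list, rather than individual addresses, is what makes "block an entire A-block" a one-line entry.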
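For step D, assuming the "Base URL meta-tag" means the HTML `<base href>` element (an element in the page head rather than a true meta tag) carrying your full canonical URL, the usual approach is simply to add it to your page templates. The minimal Python sketch below shows the same thing as a post-processing step; `add_base_href` is a hypothetical name and the regex-based insertion is a simplification that assumes a normal `<head>` tag.

```python
import html
import re


def add_base_href(page: str, base_url: str) -> str:
    """Insert <base href="..."> right after the opening <head> tag, unless one exists."""
    if re.search(r"<base\b", page, re.IGNORECASE):
        return page  # the page already declares a base URL
    tag = '<base href="{}">'.format(html.escape(base_url, quote=True))
    # \b keeps this from matching <header>; only the first <head> is touched.
    new_page, count = re.subn(
        r"<head\b[^>]*>", lambda m: m.group(0) + tag, page, count=1, flags=re.IGNORECASE
    )
    return new_page if count else page
```

For example, `add_base_href("<html><head><title>t</title></head></html>", "http://www.example.com/")` puts the base element directly after `<head>`.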
A little bit about my outlook on Google: I only do this for the love of my hobby and to supplement my income a little. I am not an IT professional. Heck, I’m only an engineer with an MS in it. I am highly specialized in research outside of IT. Three years ago I thought Google was the best, and I used it for my own searches. I lurked around this particular forum a lot and learned how to get to the top of their SERPS. Then, about 2 years ago, I saw Google’s competitors incorporating Google’s old PHD link-POP idea plus the newer competition’s idea of Direct Hit, a great idea involving “time spent at the site”. I began to follow the Apache Web Server Forum instead. Then Jim Morgan, the all-time guru, became moderator, and I continue to look and learn from him. I have come to believe that Google had a great new idea for its time long ago, but I also know that all great new ideas have their time in the sun. We haven’t seen anything great, new, and creative since; only War-Path. Overlooking, of course, their 8 gazillion megapixels or gigawatts, or whatever. And so when this issue is over, I will remove myself from this forum again, having no interest in following an Anachronism. I dig not your IPO, Mr. Secretive.
P.S. JD Morgan IS aware of the problem. It has been on the Apache Web Server Forum this week, and he has posted. This is not a problem solvable with Apache modules. My suggestions are a long shot at best and of questionable effectiveness. IMHO, they are ruthless but justified.