Many unethical webmasters and site owners are already creating thousands of TEMPLATED (ready-to-go) SKYSCRAPER sites fed by affiliate companies' immense databases. These companies, which have your website info in their databases, feed your page snippets, without your permission, to vast numbers of skyscraper sites. A carefully adjusted variant of a PHP-based redirection script, which issues a 302 redirect to your site and includes an affiliate click checker, then goes to work. What is very sneaky is the randomly generated meta refresh page, which can only be detected with a good header interrogation tool.
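For readers without a header interrogation tool, here is a minimal Python sketch of one. It requests a single URL without following redirects and reports the status code, any Location header and any meta refresh target found in the first 64 KB of the page. The function names and the user-agent string are made up for illustration, and a scraper that serves different content to different visitors may not show this script what it shows Googlebot.

```python
from html.parser import HTMLParser
from http.client import HTTPConnection, HTTPSConnection
from urllib.parse import urlsplit


class MetaRefreshFinder(HTMLParser):
    """Collects the target URLs of <meta http-equiv="refresh"> tags."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lower-cases tag and attribute names; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # content usually looks like "0; url=http://www.example.com/"
        _, _, rest = attr.get("content", "").partition(";")
        rest = rest.strip()
        if rest.lower().startswith("url="):
            rest = rest[4:]
        rest = rest.strip(" '\"")
        if rest:  # a refresh with no URL just reloads the same page
            self.targets.append(rest)


def find_meta_refresh(html):
    finder = MetaRefreshFinder()
    finder.feed(html)
    finder.close()
    return finder.targets


def check_url(url, timeout=10.0):
    """Fetch one URL WITHOUT following redirects.

    Returns (status, Location header or None, meta refresh targets).
    """
    parts = urlsplit(url)
    conn_cls = HTTPSConnection if parts.scheme == "https" else HTTPConnection
    conn = conn_cls(parts.netloc, timeout=timeout)
    path = (parts.path or "/") + ("?" + parts.query if parts.query else "")
    try:
        conn.request("GET", path, headers={"User-Agent": "header-check-sketch/0.1"})
        resp = conn.getresponse()
        # Only the start of the page is needed to spot a meta refresh.
        body = resp.read(65536).decode("latin-1")
        return resp.status, resp.getheader("Location"), find_meta_refresh(body)
    finally:
        conn.close()
```

A 302 status whose Location points at your own site, or a meta refresh target pointing there, is the pattern described above.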
Googlebot and MSNBOT follow these PHP scripts either to an internal sub-domain containing the 302 redirect or server-side, and "BANG", down goes your site if its pagerank is below that of the offending site. Your index page is crippled because Googlebot and msnbot now consider your home page, at best, a supplemental page of the offending site. The offending site's URL that contains your URL is indexed as belonging to the offending site. The offending site knows that Google does not reveal all links pointing to your site and takes a couple of months to update, so an INURL:YOURSITE.COM search will not help you trace it for a long time. Note that these scripts mostly apply your URL stripped, i.e. without the WWW, which makes detection harder. This also causes Googlebot to generate another URL listing for your site, which can be seen as duplicate content. A 301 redirect at least resolves the short-URL problem, relieving Google of deciding which of your site's two URLs to rank higher (more often the one with the higher linked pagerank).
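The 301 fix for the stripped, non-www URL can be sketched as WSGI middleware, assuming (hypothetically) that your site runs as a Python WSGI application and that example.com stands in for your domain. Requests for the bare host get a permanent redirect to the same path and query on the www host; everything else passes through untouched.

```python
from urllib.parse import quote

# Hypothetical domain: replace with your own.
BARE_HOST = "example.com"
CANONICAL_HOST = "www." + BARE_HOST


def redirect_target(environ):
    """Return the www URL to 301 to, or None if the request is already canonical."""
    host = environ.get("HTTP_HOST", "").split(":")[0].lower()
    if host != BARE_HOST:
        return None
    scheme = environ.get("wsgi.url_scheme", "http")
    # PEP 3333 passes PATH_INFO as bytes decoded with latin-1;
    # re-encode it and percent-quote it before putting it in a header.
    path = quote((environ.get("PATH_INFO") or "/").encode("latin-1"))
    query = environ.get("QUERY_STRING", "")
    return f"{scheme}://{CANONICAL_HOST}{path}" + (f"?{query}" if query else "")


def www_redirect(app):
    """WSGI middleware: permanently redirect example.com/... to www.example.com/..."""
    def wrapped(environ, start_response):
        target = redirect_target(environ)
        if target is None:
            return app(environ, start_response)
        start_response("301 Moved Permanently",
                       [("Location", target), ("Content-Length", "0")])
        return [b""]
    return wrapped
```

Wrap your application once, e.g. `app = www_redirect(app)`. On Apache the same rule is usually written with mod_rewrite in .htaccess instead.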
Your only hope is that your pagerank is higher than the offending site's. Even this is no guarantee, because the offending site will have targeted many higher-pagerank sites within its system on the off chance that it strips at least one of them. This is compounded by hundreds of other hidden 301 permanent redirects to sites of pagerank 7 or above, again in the hope of stripping a high-pagerank site, which would then let their scripts hijack more efficiently. Sadly, supposedly ethical big-name affiliates are involved in this scam; they know it is going on, and Google AdWords is probably the main revenue target. Though I am sure Google does not approve of its AdSense program being used in this manner.
Many such offending sites have no e-mail contact, a hidden WHOIS and no telephone number. Even if you were to contact them, you would find in most cases that the owner or webmaster cannot remove your links from their site, because the feeds come from affiliate databases.
There is no point in contacting Google or MSN, because this problem has been around for at least 9 months; only now it is escalating at an alarming rate. All sites with a pagerank of 5 or below are susceptible, and if your site is a 3 or 4, be very alarmed. A skyscraper site need only build child-page linking to reach pagerank 4 or 5 without having to strip other sites.
Caution: trying to exclude these URLs via robots.txt will not help, because these scripts can change almost daily.
Trying to remove a link through Google that looks like
new.searc**verywhere.co.uk/goto.php?path=yoursite.com%2F will result in your entire website being removed from Google's index for an indefinite period, at least 90 days, and you cannot get re-indexed within that time.
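URLs of this shape carry your address percent-encoded in a query parameter (%2F is an encoded /). Here is a sketch, assuming you have collected a list of such URLs from search listings or referrer logs, that pulls out the parameters pointing at your own domain with or without the WWW. The names scraper.example.net and yoursite.com are placeholders.

```python
from urllib.parse import parse_qsl, urlsplit


def embedded_targets(url, own_domain):
    """Return (parameter, value) pairs in url's query string that point at own_domain.

    own_domain is the bare host, e.g. "yoursite.com"; www / non-www and
    http:// / https:// forms of it all match.
    """
    if "://" not in url:
        url = "http://" + url  # search listings often drop the scheme
    hits = []
    for name, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        # parse_qsl has already percent-decoded the value (%2F -> /)
        candidate = value.strip().lower()
        for prefix in ("http://", "https://"):
            if candidate.startswith(prefix):
                candidate = candidate[len(prefix):]
        if candidate.startswith("www."):
            candidate = candidate[4:]
        if candidate.split("/", 1)[0] == own_domain.lower():
            hits.append((name, value))
    return hits
```

For example, `embedded_targets("scraper.example.net/goto.php?path=yoursite.com%2F", "yoursite.com")` returns `[("path", "yoursite.com/")]`.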
I am working on an automated 302 REBOUND SCRIPT to trace and counteract an offending site. This script will spider and detect all pages within an offending site, including sub-domains, and blast all of its pages, including dynamic pages, with a 302 or 301 redirect. Hopefully it will detect the feeding database and blast it with as many 302 redirects as it contains URLs. So in essence it is a programme in perpetual motion, creating millions of 302 redirects for as long as it stays on. As every page is a unique URL, the script will hopefully keep creating redirects and bombarding any site that generates dynamic pages with PHP, ASP or CGI redirecting scripts. A fed SKYSCRAPER site can have its server totally occupied by a single efficient spider that requests pages every split second throughout the day and week.
If the repeatedly spidered site runs out of bandwidth, it may then be possible to remove it via Google's URL removal tool. You only need a few seconds of a 404 or 403 from the offending site for Google's URL console to detect what it needs, whether that is the whole site or just the damaging link.
I hope I have been informative and can help anybody with a hijacked site whose natural revenue has been unfairly hit. Also note that your site may never regain its rank, even after the offending links are removed. Talking to offending site owners often results in them denying that they are causing problems and saying that they are only counting outbound clicks. And they seem reluctant to remove your links.... Yeah, pull the other one.
[edited by: Brett_Tabke at 9:49 pm (utc) on Mar. 16, 2005]
Safaridude - Google is all too obviously aware of what's going on; it's in the news, and it is a difficult problem to deal with. What is happening is that people are linking to other sites with a 302 redirect, which means "in the future, use the URL of this page to find that page". So Google IS following the user_agent standard; however, they shouldn't in this case.
Zeus: Nov 3 is probably too early, since raw log files are usually discarded after six months, but check your server; the file will be named 20041103.gz or something like that. If it's there, download the Nov 4 one; each day, the raw log file for the previous day is created. Unzip it with WinZip (or equivalent) and open it with WordPad (not Notepad). If it's there, I can help you decipher it; it looks daunting, but it's really very simple.
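Once you have the log, the Googlebot lines can also be pulled out programmatically. This sketch assumes the server writes Apache's "combined" log format and gzip-rotated files named like 20041103.gz; both vary by host, so check a few lines by eye first. The function names are illustrative.

```python
import gzip
import re

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Split one log line into named fields, or return None if it doesn't match."""
    m = LINE_RE.match(line)
    return m.groupdict() if m else None


def googlebot_hits(path):
    """Yield parsed records whose user-agent mentions Googlebot."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="latin-1") as fh:
        for line in fh:
            rec = parse_line(line)
            if rec and "googlebot" in rec["agent"].lower():
                yield rec
```

For example, looping over `googlebot_hits("20041104.gz")` and printing each record's time, status and request shows what Googlebot fetched that day. The user-agent string can be faked, so treat matches as candidates rather than proof.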
Thanks. "The legitimate site's index page was absent from the results, most probably penalized by Google" - this has clearly happened to my site.
I appreciate all the mod_rewrites posted, BUT how does this help a site owner who has no idea how to implement them?
Wonderful examples snipped, and GoogleGuy nowhere in sight... I was reading this post hoping to be able to send someone my site as an example. I'm no SEO, but it's a shame when someone can't find a site by typing a 5-word search with the exact company name. How is someone to know that www.hijacking.com//cgi-bin/datacgi/database.cgi?file=LinkManager&form=HitOut&&record=33... - 9k - is my site? Japanese started this thread and has now left the building... what a shame.
That would open up a whole new can of worms.
What negative effect would it have if Google simply stopped following 302 redirects? In other words, rather than working on a technical fix, what would happen if they simply stopped indexing/following 302 redirects? Would it cause widespread damage to the net?
Therein lies the dilemma... what if all those redirect-tracking ad links stopped being indexed? It will never happen. The reason this hijack happens at all is partly the need to index URLs (such as these) that aren't real web pages to begin with. What I mean by that is... the URL only exists as part of a redirect.
I might post something about this later in the Apache forum. I have spent some time over the past couple of days looking into how to immunize a site against this by customizing Apache. There are some folks in that forum way ahead of me on Apache.
You're right on this one: if Google removed all 302s, they would simply lose the "I've got more pages indexed than others" media war, and losing the "we are big, with lots of DATA" image would hurt them big time.
It is no secret by this time that:
count as 6 PAGES in Google's index, since there is a CACHED copy of each page in the index. So imagine a dynamic site with 1000 products that has the wrong CASE on a URL variable: 2000 pages indexed. What are the penalties for that? Duplicate content?
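Since the example URLs did not survive in this post, here is a sketch with hypothetical product URLs. It groups a list of URLs (from a crawl or your logs) by their case-folded form and reports every group with more than one spelling. On a server whose paths are case-sensitive these may really be different pages, so the output is a list to check, not a verdict.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def case_variants(urls):
    """Group URLs that differ only in letter case; return groups of 2 or more."""
    groups = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        # Fragments are ignored; everything else is compared case-insensitively.
        key = (parts.scheme.lower(), parts.netloc.lower(),
               parts.path.lower(), parts.query.lower())
        groups[key].add(url)
    return [sorted(v) for v in groups.values() if len(v) > 1]
```

Each group returned is a set of addresses that could be collapsed onto one spelling with a 301.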
[edited by: blend27 at 4:49 pm (utc) on Mar. 17, 2005]
Also, a check on domain.com shows PR0 while www.domain.com shows PR5, so I have redirected non-www to www to see what happens.
However it need not manufacture any.
I can see it now: a Google programmer goes to work for a large bank and implements a transfer by copying money and putting it into someone's account.
Yep, the auditors would just love it, along with the OCC, FICA, various state agencies, Treasury folk and the FBI, as it creates plenty of employment.
In this case, the receiving account holder would probably soil themselves while trying to get their account closed and moved elsewhere.
However, I don't think the company president would like having to restate results, etc., etc.
Most places I know of would roll heads really fast.
I am a bit confused about whether I am suffering from a hijack or not: my cache date is from 16th Feb, but internal pages are much more recent.
I am seeing this, which is rather odd:
Has anybody tried to file a DMCA complaint with Google concerning a page hijacking?
Yes, I got redirected pages taken out of the index by doing this. It's my content, and neither Google nor the scraping site has any right to pretend otherwise.
The site was MIA. It took about 2 or 3 weeks for G to remove the offending pages and about another 6 weeks for the site to reappear exactly where it should be.
But whether or not it works, I think it's worth hitting them with as many DMCAs as possible, on the basis that somebody there might (possibly) take notice.
Thanks to Japanese.
"I appreciate all the mod_rewrites posted, BUT how does this help a site owner who has no idea how to implement them?"
We are still working on it but have not come up with a viable solution yet.
Believe me, if we find one it will be the news of the year on WebmasterWorld.
Thanks Clause, this idea has a chance; better check with jpMorgan in the Apache forum, that guy knows his stuff.
The cache date means nothing... a Google dance is a good sign; it shows they are working on it. Clause stated earlier that he's seen some 302 pages disappear, which sounds like good news.
Again, here is the test to see if you are a victim of the 302 hijacking bug:
Search Google for
Has anyone researched this?