Forum Moderators: Robert Charlton & goodroi
Many unethical webmasters and site owners are already creating thousands of TEMPLATED (ready-to-go) SKYSCRAPER sites fed by affiliate companies' immense databases. These companies, which have your website info within their databases, feed your page snippets to vast numbers of the skyscraper sites without your permission. A carefully adjusted PHP-based redirection script then goes to work: it causes a 302 redirect to your site and includes an affiliate click checker. What is very sneaky is the randomly generated meta refresh page, which can only be detected with a good header interrogation tool.
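To see why a header interrogation tool is needed, here is a minimal sketch of the kind of check such a tool performs. It is purely illustrative (the function names and the dict-based response shape are my own, not from any particular tool): a plain 302 shows up in the status line, but a meta refresh hides inside an otherwise normal 200 page, so you have to inspect the raw body too.

```python
import re

def classify_redirect(status, headers, body):
    """Classify a response the way a header-interrogation tool would.

    status  -- integer HTTP status code
    headers -- dict of lowercased header names to values (kept for
               realism; a real tool would also report the Location)
    body    -- response body as a string

    Returns one of: '301', '302', 'meta-refresh', or 'none'.
    """
    if status == 301:
        return '301'
    if status == 302:
        return '302'
    # A meta refresh hides in an otherwise normal 200 page, which is
    # why it only shows up when you inspect the raw response body.
    if re.search(r'<meta[^>]+http-equiv=["\']?refresh', body, re.I):
        return 'meta-refresh'
    return 'none'
```

For example, a hijacking script's response would classify as `'302'`, while its randomly generated refresh page would come back `'meta-refresh'` even though the status line looks harmless.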
Googlebot and msnbot follow these PHP scripts, either to an internal sub-domain containing the 302 redirect or server-side, and BANG, down goes your site if its PageRank is below the offending site's. Your index page is crippled because Googlebot and msnbot now consider your home page at best a supplemental page of the offending site. The offending site's URL that contains your URL is indexed as belonging to the offending site. The offenders know that Google does not reveal all links pointing to your site and takes a couple of months to update, so an inurl:yoursite.com search will not be much help in tracing them for a long time. Note that these scripts mostly apply your URL stripped, or without the www, making detection harder. This also causes Googlebot to generate another URL listing for your site that can be seen as duplicate content. A 301 redirect resolves at least the short-URL problem, sparing Google from deciding which of your site's two URLs to index higher (usually the one with the higher-linked PageRank).
Your only hope is that your PageRank is higher than the offending site's. Even this is no guarantee, because the offending site will have targeted many higher-PageRank sites within its system on the off chance that it strips at least one of the targets. This is reinforced by hundreds of other hidden 301 permanent redirects to PageRank 7 or above sites, again in the hope of stripping a high-PageRank site, which would then empower their scripts to hijack more efficiently. Sadly, supposedly ethical big-name affiliates are involved in this scam; they know it is going on, and Google AdWords is probably the main source of revenue. Though I am sure Google does not approve of its AdSense program being used in such a manner.
Many such offending sites have no e-mail contact, hidden WHOIS data and no telephone number. Even if you were to contact them, you will find in most cases that the owner or webmaster cannot remove your links from their site, because the feeds come from affiliate databases.
There is no point in contacting Google or MSN, because this problem has been around for at least 9 months; only now is it escalating at an alarming rate. All sites with PageRank 5 or below are susceptible; if your site is a 3 or 4, be very alarmed. A skyscraper site only needs to create child-page linking to get PageRank 4 or 5, without needing to strip other sites.
Caution: trying to exclude them via robots.txt will not help, because these scripts change almost daily.
Trying to remove through Google a link that looks like
new.searc**verywhere.co.uk/goto.php?path=yoursite.com%2F
will result in your entire website being removed from Google's index for an indefinite period of time, at least 90 days, and you cannot get re-indexed within this timeline.
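Notice how the hijacking URL embeds your domain in a query parameter, percent-encoded (`yoursite.com%2F` decodes to `yoursite.com/`). A small sketch of how you might pull the destination out of such a goto.php-style URL when auditing your backlinks (the parameter names other than `path` are guesses at common variants, not taken from any specific script):

```python
from urllib.parse import urlsplit, parse_qs

def embedded_target(url):
    """Pull the destination out of a goto.php-style redirect URL.

    parse_qs percent-decodes the values, so yoursite.com%2F comes
    back as yoursite.com/.
    """
    params = parse_qs(urlsplit(url).query)
    # 'path' is the parameter name seen in the example above; other
    # redirect scripts might use 'url', 'goto', 'target', etc.
    for key in ('path', 'url', 'goto', 'target'):
        if key in params:
            return params[key][0]
    return None
```

So `embedded_target('http://redirector.example/goto.php?path=yoursite.com%2F')` returns `yoursite.com/`, which is how you confirm the link really points at you before deciding what (not) to do about it.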
I am working on an automated 302 REBOUND SCRIPT to trace and counteract an offending site. This script will spider and detect all pages, including sub-domains, within an offending site and blast all of its pages, including dynamic pages, with a 302 or 301 redirect. Hopefully it will detect the feeding database and blast it with as many 302 redirects as it contains URLs: in essence, a program in perpetual motion, creating millions of 302 redirects for as long as it stays on. As every page is a unique URL, the script should continue to create requests and bombard any site that generates dynamic pages via PHP, ASP or CGI redirecting scripts. A SKYSCRAPER site that is fed this way can have its server totally occupied by a single efficient spider that requests pages in split seconds, continually, all day and all week.
If the repeatedly spidered site is depleted of its bandwidth, it may then be possible to remove it via Google's URL removal tool. You only need a few seconds of a 404 or 403 from the offending site for Google's URL console to detect what it needs: either the site or the damaging link.
I hope I have been informative and of help to anybody whose hijacked site's natural revenue has been unfairly treated. Also note that your site may never regain its rank, even after the removal of the offending links. Talking to offending site owners usually results in their denying that they are causing problems and saying that they are only counting outbound clicks. And they seem reluctant to remove your links... yeah, pull the other one.
[edited by: Brett_Tabke at 9:49 pm (utc) on Mar. 16, 2005]
It's strange that people would rather debate "Google is broke" than fix their sites, but there are many things I'll never understand.
Yes, my site is using absolute addressing, and recently, after reading your suggestion, I placed a 301 from non-www to www.
Again I did an allinurl:mysite.com and found an offending site. When I clicked the cache, my page showed but some of the images were missing: the ones where we didn't use the full URL. So to me it looks like it's caching the page from their domain. If I view a cache of any of my pages from my own domain, the pics show up.
On some data centers this offending site does not appear in my allinurl:mysite.com results. On 64.233.167.104 it currently does; on 64.233.161.99, for example, it is not there; the site is gone.
So maybe Google has a fix in the works?
Firstly, I think *my* site's problem is simply caused by the change of IP address.
I think Google stores its indexed pages under a number built from the current IP address and a hash of the URL.
So it doesn't know the page www.fredbloggs.com/page.html; it knows an 8-digit number that represents the page. If you change the IP address but not the URL of the page, it still thinks it's a new page on a new site.
All the links change too: because their index number resolves differently, they look like new links to Google.
That would explain why Google is reporting approximately twice as many pages for my URL as actually exist, and I think it would explain our drop in rank and the loss of our pages from the index.
It also means time will fix it.
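To make the hypothesis above concrete, here is a toy illustration of the kind of key the poster is guessing at. This is purely speculative and NOT how Google actually indexes pages; the hash choice and the 8-digit truncation are made up to match the poster's description.

```python
import hashlib

def page_key(ip, url):
    """Hypothetical index key built from the serving IP address and
    the URL, per the poster's guess -- NOT how Google actually works."""
    digest = hashlib.md5((ip + url).encode()).hexdigest()
    return int(digest, 16) % 10**8   # an "8 digit number", as speculated

old = page_key('10.0.0.1', 'www.fredbloggs.com/page.html')
new = page_key('10.0.0.2', 'www.fredbloggs.com/page.html')
# Same URL, new IP -> a different key, so under this hypothesis the
# unchanged page would look brand new to the index.
```

If something like this were true, it would explain both the doubled page count (old-IP keys plus new-IP keys) and why the problem fades once the old keys age out.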
The 302 problem I think is different. Notice that Google groups results, so that 5 sites from the same company on the same subject will fight each other for rank. Google seems to determine that some sites are connected and so should displace each other.
I think they have a run that looks for 302, linking patterns, manual spam reports etc. and they build a list.
I think Allegra simply introduced a new list.
So it isn't the hijackers page that is displacing your page, it is the *other* sites the hijacker has stolen, some of which will be higher rank and if Google thinks you're all related from the same company, his page gets shown and yours doesn't.
That is what I think is happening. With Allegra, it was probably just a fresh run of the program and some new hijackers appeared. If I'm right Google will fix that table and time will fix the problem, but of course new hijackers may appear, so new, different, sites may disappear the next time they do this run.
The only fix for them is to check the table manually, and the Craig list job might be related to this.
Just my opinion.
You had best look at those urls ....
2 wagers:
1: You have your site listed under www.yourdomain.com and yourdomain.com, possibly the IP addy, and maybe under a parked domain as well.
and
2: You have been whacked by Google's duplicate content filter.
and possibly this one as well.
3: There may or may not be a 302 problem with your site, my bet is you may find that as well.
Solution for the split site problem:
Search your server software documentation for canonical hostnames:
Canonical Hostnames
Description:
The goal of this rule is to force the use of a particular hostname, in preference to other hostnames which may be used to reach the same site. For example, if you wish to force the use of www.example.com instead of example.com, you might use a variant of the following recipe.
Solution:
# For sites running on a port other than 80
RewriteCond %{HTTP_HOST}   !^fully\.qualified\.domain\.name [NC]
RewriteCond %{HTTP_HOST}   !^$
RewriteCond %{SERVER_PORT} !^80$
RewriteRule ^/(.*)         http://fully.qualified.domain.name:%{SERVER_PORT}/$1 [L,R=301]
# And for a site running on port 80
RewriteCond %{HTTP_HOST}   !^fully\.qualified\.domain\.name [NC]
RewriteCond %{HTTP_HOST}   !^$
RewriteRule ^/(.*)         http://fully.qualified.domain.name/$1 [L,R=301]
Then:
You might want to go through the results of a site:yourdomain.com search and look at the green highlighted URLs; they should all have yourdomain.com before the first slash.
If any don't, then you have the other problem as well.
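For anyone who finds the mod_rewrite syntax above opaque, here is the same decision logic restated as a small Python function. It is only a sketch of what the rules do, with www.example.com standing in for your fully qualified domain name:

```python
CANONICAL = 'www.example.com'   # stand-in for fully.qualified.domain.name

def canonical_redirect(host, path, port=80):
    """Mirror of the mod_rewrite recipe: return the 301 target for a
    request, or None if the Host header is already canonical (or empty,
    which the second RewriteCond deliberately leaves alone)."""
    if not host or host.lower() == CANONICAL:
        return None
    if port != 80:
        # First rule block: non-standard port kept in the target URL.
        return 'http://%s:%d%s' % (CANONICAL, port, path)
    # Second rule block: plain port-80 redirect.
    return 'http://%s%s' % (CANONICAL, path)
```

So a request for example.com/page.html gets a 301 to http://www.example.com/page.html, while requests already on the www hostname pass through untouched, which is exactly what collapses the split non-www/www listings into one.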
How about a robots.txt directive for whether you want to use 302s? You would have to specifically state that you want to use 302s; otherwise the default would be to treat 302s as 301s.
Sounds a little simple, so I've probably missed something, like G$ having to admit the problem first!
I also don't think there is a 302 problem. All the pages that come up when I search for my site with a site: command are mine (at least the first 1000 I can check).
If I pick obscure phrases from my pages and search, I get one copy and no exact duplicates. And although Google says I have twice as many pages, two copies of my pages' text are *not* appearing, only one.
allinurl: shows other sites, but then it's supposed to show pages with the specified words in the URL, so it *should* show other sites if they use a server-side redirect script. It's been doing this for a year or more with this site, and this is not new.
Many URLs Google lists are shown blank, which is consistent with a new site not yet pulled. But then I changed IP addresses, and if they index their pages by IP address, this is what I'd expect.
So I don't think there is a problem; it just needs time to adjust to the new IP address, drop the old pages from the old IP address and pull the new ones. A few update cycles will restore my site without problem, I think.
Another thing: I made a scraper site here at the start of this month, because I know it's the future for Google and it's what it wants, so I'm making myself ready. If nothing changes this month, I'm not waiting another 6 months to create quality content if that's not what Google wants.
Three weeks and the whole site was fully indexed, with descriptions and all, and re-spidered every week. I spent 20 minutes on that site; on my other site, the one suffering from the Google bug, I spent 3 years.
I see some changes here 66.102.11.99
What's weird is I don't check the data centres by IP address. I think I just went to google.co.uk (without the www) and the results were different (to www).