Many unethical webmasters and site owners are already creating thousands of templated, ready-to-go "skyscraper" sites fed by affiliate companies' immense databases. These companies, which hold your website's info in their databases, feed your page snippets to vast numbers of skyscraper sites without your permission. A carefully tuned PHP-based redirection script then goes to work, issuing a 302 redirect to your site with an affiliate click checker built in. What is very sneaky is the randomly generated meta-refresh page, which can only be detected with a good header-interrogation tool.
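The "header interrogation" mentioned above just means fetching a URL and inspecting the raw response instead of letting a browser silently follow it. As a rough sketch (this is not the tool the post refers to; the parsing and the meta-refresh regex are my own assumptions), such a check can be done in Python:

```python
import re

def interrogate(raw_response: bytes) -> dict:
    """Parse a raw HTTP response and report how it redirects.

    Spots both an explicit 3xx status with a Location header and the
    sneakier <meta http-equiv="refresh"> tag hiding in a 200 OK body.
    """
    head, _, body = raw_response.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status = int(lines[0].split()[1])          # e.g. "HTTP/1.1 302 Found"
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    meta = re.search(
        rb'http-equiv=["\']?refresh["\']?[^>]*url=([^"\'>]+)',
        body, re.IGNORECASE)
    return {
        "status": status,
        "location": headers.get("location"),
        "meta_refresh": meta.group(1).decode() if meta else None,
    }
```

A `302 Found` with a `Location` header shows up as an explicit redirect, while a plain `200 OK` whose body carries a refresh tag shows up as the client-side hop that normal link checkers miss.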
Googlebot and MSNBot follow these PHP scripts either to an internal sub-domain containing the 302 redirect or to a server-side redirect, and "BANG", down goes your site if its PageRank is below the offending site's. Your index page is crippled because Googlebot and MSNBot now consider your home page, at best, a supplemental page of the offending site. The offending site's URL that contains your URL is indexed as belonging to the offending site. The offenders know that Google does not reveal all links pointing to your site and takes a couple of months to update, so an inurl:yoursite.com search will not be much help in tracing them for a long time. Note that these scripts mostly apply your URL stripped, or without the www, making detection harder. This also causes Googlebot to generate another URL listing for your site, which can be seen as duplicate content. A 301 redirect resolves at least the short-URL problem, relieving Google of deciding which of your site's two URLs to rank higher (most often the one with the higher-linked PageRank).
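For reference, the 301 fix mentioned above amounts to answering any request on the bare domain with a permanent redirect to the www host, so the engines only ever see one URL per page. A minimal sketch (the hostname is a placeholder, and in practice this would usually live in server config rather than application code):

```python
def canonical_redirect(host: str, path: str,
                       canonical_host: str = "www.example.com"):
    """Return (status, location) for an incoming request.

    Requests arriving on any non-canonical host (e.g. the bare domain)
    get a 301 pointing at the canonical www host; requests already on
    the canonical host are served normally.
    """
    if host.lower() != canonical_host:
        return 301, "http://%s%s" % (canonical_host, path)
    return 200, None
```

For example, a request for `example.com/page.htm` would answer with a 301 to `http://www.example.com/page.htm`, collapsing the two listings into one.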
Your only hope is that your PageRank is higher than the offending site's. Even that is no guarantee, because the offending site will have targeted many higher-PageRank sites within its system on the off chance that it strips at least one of them. This is reinforced by hundreds of other hidden 301 permanent redirects to PageRank 7-or-above sites, again in the hope of stripping a high-PageRank site, which would then empower their scripts to hijack more efficiently. Sadly, supposedly ethical big-name affiliates are involved in this scam; they know it is going on, and Google AdWords is probably the main source of revenue, though I am sure Google does not approve of its AdSense program being used in such a manner.
Many such offending sites have no e-mail contact, hidden WHOIS data, and no telephone number. Even if you were to contact them, you will find in most cases that the owner or webmaster cannot remove your links from their site because the feeds come from affiliate databases.
There is no point in contacting Google or MSN, because this problem has been around for at least nine months; only now is it escalating at an alarming rate. All sites with PageRank 5 or below are susceptible; if your site is a 3 or 4, be very alarmed. A skyscraper site need only create child-page linking to reach PageRank 4 or 5, without needing to strip other sites.
Caution: trying to exclude them via robots.txt will not help, because these scripts can change almost daily.
Trying to remove a link through Google that looks like
new.searc**verywhere.co.uk/goto.php?path=yoursite.com%2F will result in your entire website being removed from Google's index for an indefinite period of time, at least 90 days, and you cannot get re-indexed within that timeline.
I am working on an automated 302 REBOUND SCRIPT to trace and counteract an offending site. The script will spider and detect all pages, including sub-domains, within an offending site and blast every page, including dynamic pages, with a 302 or 301 redirect. Hopefully it will detect the feeding database and blast it with as many 302 redirects as it contains URLs: in essence, a program in perpetual motion, creating millions of 302 redirects for as long as it stays on. Since every page is a unique URL, the script should continue to create and bombard any site that generates dynamic pages through PHP, ASP, or CGI redirecting scripts. A skyscraper site that is fed this way can have its server totally occupied by a single efficient spider that requests pages in split seconds, continually, throughout the day and week.
If the repeatedly spidered site is depleted of its bandwidth, it may then be possible to remove it via Google's URL removal tool. You only need a few seconds of a 404 or 403 from the offending site for Google's URL console to detect what it needs: either the site or the damaging link.
I hope I have been informative and have helped anybody with a hijacked site whose natural revenue has been unfairly affected. Also note that your site may never regain its rank, even after the offending links are removed. Talking to offending site owners often results in denial that they are causing problems; they say they are only counting outbound clicks, and they seem reluctant to remove your links... Yeah, pull the other one.
[edited by: Brett_Tabke at 9:49 pm (utc) on Mar. 16, 2005]
And I think this is because once a url is in Google's database, googlebot continues to go DIRECTLY back to that url for spidering.
IOW, if the original url is something like...
...even though the offending site removes the "target=yourdomain.com," Google will continue to go back to the full original url containing your url.
Google will continue to see the original url as an actual page as long as that "linkto.pl" script is in place.
Can somebody answer these questions:
1. Is the problem that the "scraper" site is redirecting to a variation of your url that returns a "page cannot be displayed" error?
2. Is the problem that they are stealing your content and putting it on their own site?
3. Is the problem that Google considers info at "http://domain.com/page.htm" and "http://www.domain.com/page.htm" to be duplicate content because one is missing the "www." in front of the address?
Am I anywhere close to describing the problem?
GO-PHP REDIRECTOR: a stoic and completely merciless script that can easily be modified and optimized to wreak havoc on googlebot.
Please explain what they can possibly do with the script different than any other 302 redirect?
Based on your description, any unknowing person that slaps up a directory using off-the-shelf PHP software that uses this redirector is a black hat? I don't think so.
The problem is obviously not with the scripts or redirects; it's obviously Google's interpretation of the redirect. If people are exploiting the 302 bug, whether on purpose or inadvertently, you can't blame the technology they're using, as it has never been the problem. The problem is Google.
So everyone should stop chasing Google to remove this link or that link, which burns lots of Google's resources, and instead hammer on them to fix the global issue with their stinking 302-handling algorithm.
That's entirely Google and shame on them. And forget about the webmasters hurt by this, it is a shame for the users of Google who get cloaked to sites that Google recommends based on someone else's content.
Either Google can fix this issue immediately or they are morons. Simple: give no credit to a 302 when the supposedly temporary URL has even one non-302 link to it on its own site. That comports with the RFC, gives owners control over their domains and content, and is the right thing to do for users.
Can't you just feel the class action lawsuit building?
Many websites have an intro page; some automatically check what type of browser you have, whether you have the plugins you need to view the website, your language preference, etc.
This way the website can send you the right page. For example, depending on whether or not you have Shockwave or Flash software, the intro page will direct the browser to the compatible page. These types of intro pages typically use an automatic redirect: if you don't click on a link within a certain time, the browser is automatically sent to the proper page.
When you do a search in Google, the search engine does not want to send you directly inside the website to an incompatible page. So when Google sees a page with the automatic redirect code, it assumes this is the intro page and sends surfers to it for the content you are searching for. This way the website can check your browser for the appropriate software and provide you with the best possible surfing experience.
The Google bug happens when websites use this same type of redirect code to point to other websites. Most people do this for various legitimate reasons; usually it is a simple tracking method so they can record which links are being used and which are not. Google mistakes the other website for the intro page in these cases.
Some less scrupulous webmasters, 'hijackers,' took notice of this Google bug and are exploiting it to the fullest, effectively hijacking other websites' positions within Google search results. Google has gotten better recently, apparently fixing most of the accidental hijacks, but the real hijackers have become very aware of this weakness and are taking it to another level using illegal or unethical methods. This is called 'Google jacking.'
Although Google has made changes to its secret algorithm (a mathematical process used to determine which page is most relevant to a search query), webmasters have become very concerned about this issue. Many have lost their livelihood to 'Google jackers' who unscrupulously lure the surfer, under the false pretense of delivering the content described in Google's search results, into any number of less-than-ideal surfing experiences.
I think that pretty much sums up the issue in layman's language. Remember, most surfers don't know what
even means, much less care.
This is just a draft, anyone care to make adjustments or should we send it off to the press?
[edited by: Reid at 7:40 pm (utc) on Mar. 10, 2005]
[edited by: MikeNoLastName at 8:05 pm (utc) on Mar. 10, 2005]
Here it is.
When you click on a hyperlink, your web browser asks the server for the page.
A "302" redirect is simply the server telling the browser to look elsewhere for the page that was requested. Your browser then automatically looks at the address which accompanied the "302" and asks that server for the page.
For example, in your browser you type in abc.com.
At abc.com, the server sends a 302 which says to your browser "the page you're looking for is actually at def.com"
Your browser then goes to def.com to get the page.
There's no copying of content.
Many web sites use 302 redirects to count how many people have clicked on a link, otherwise there is no way to know when someone clicks a link on your site. Also, there are many php scripts for building directories which use 302s, and many people using asp or asp.net who use built-in redirects in those systems are also using 302s. There are many many reasons for doing so, most of which are not evil.
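To make the benign case concrete, here is a hedged sketch of such a click-counting redirector in Python (the `/goto?url=...` parameter and the destination URL are invented for the example): it tallies the click, then answers with a bare 302 and a Location header, copying no content at all.

```python
from collections import Counter
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

clicks = Counter()  # outbound-click tally, keyed by destination URL

class ClickCounter(BaseHTTPRequestHandler):
    def do_GET(self):
        # e.g. /goto?url=http://def.example/ -- count the click,
        # then tell the browser the page actually lives elsewhere.
        qs = parse_qs(urlparse(self.path).query)
        target = qs.get("url", ["http://def.example/"])[0]
        clicks[target] += 1
        self.send_response(302)
        self.send_header("Location", target)
        self.end_headers()            # no body: nothing is copied

    def log_message(self, *args):
        pass  # keep the demo quiet

# To run: HTTPServer(("127.0.0.1", 8302), ClickCounter).serve_forever()
```

The browser (or bot) that receives the 302 simply re-requests the Location URL; the trouble in this thread is only in which of the two URLs Google then files the destination's content under.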
The problem is that Google, when it follows a link that returns a 302, files the destination page under the original url. The end result is that *anyone* who links to you using a 302 gets their link added to google using *your* page's content as their content. This can cause problems with google's duplicate content filter, and can end up causing your original page to be demoted in the listing, while the page linking to yours gets recognized as the "original".
The problem is one that only google can fix, and some indications are that they are already working on a fix, but because no one there is talking, the speculation still runs wild.
If it does fetch twice, you may be able to defend against this attack by randomly varying the content on your page: say, randomly different quotes come up, or rotate news stories, etc., somewhere on the page.
The idea would be to defeat Google's duplicate content filter. If it doesn't think the pages are duplicates, it won't replace one with the other.
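As a sketch of that defense (the snippet list is a placeholder, not anything from the thread), each render splices a randomly chosen fragment into the page so two consecutive fetches are rarely byte-for-byte identical:

```python
import random

ROTATING_SNIPPETS = [  # placeholder content; swap in real quotes or headlines
    "Quote of the day: festina lente.",
    "Quote of the day: simplicity scales.",
    "Quote of the day: measure twice, cut once.",
]

def render_page(body_html: str) -> str:
    # Vary one small region per fetch so a naive duplicate-content
    # comparison sees the two copies of the page as different documents.
    snippet = random.choice(ROTATING_SNIPPETS)
    return ("<html><body>%s"
            "<div class=\"rotator\">%s</div>"
            "</body></html>" % (body_html, snippet))
```

Whether this actually fools the filter depends on how much of the page Google compares; it is a cheap hedge, not a guaranteed fix.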