Many unethical webmasters and site owners are already creating thousands of TEMPLATED (ready to go) SKYSCRAPER sites fed by affiliate companies' immense databases. The companies that hold your website info in their databases feed your page snippets, without your permission, to vast numbers of these skyscraper sites. A carefully adjusted variant of a php based redirection script, which issues a 302 redirect to your site and includes an affiliate click checker, then goes to work. What is very sneaky is the randomly generated meta refresh page, which can only be detected with a good header interrogation tool.
Googlebot and msnbot follow these php scripts either to an internal sub-domain containing the 302 redirect or to a server-side redirect, and “BANG”, down goes your site if it has a pagerank below the offending site. Your index page is crippled because googlebot and msnbot now consider your home page at best a supplemental page of the offending site. The offending site's URL that contains your URL is indexed as belonging to the offending site. The offending site knows that google does not reveal all links pointing to your site and takes a couple of months to update, so an INURL:YOURSITE.COM search will not be of much help in tracing it for a long time. Note that these scripts mostly use your URL stripped, that is, without the WWW, which makes detection harder. This also causes googlebot to generate another URL listing for your site, which can be seen as duplicate content. A 301 redirect resolves at least the short URL problem, relieving google of deciding which of your site's two URLs to index higher, which is more often the one with the higher linked pagerank.
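For anyone wondering what "a 301 to resolve the short URL" amounts to, here is a minimal Python sketch of the decision only. It assumes the www form is the one you want indexed; CANONICAL_HOST and canonical_redirect are made-up names for illustration, and on a real server this would normally be done in the server configuration rather than in a script.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the www form is the address you want indexed.
CANONICAL_HOST = "www.yoursite.com"


def canonical_redirect(url):
    """Return (301, new_url) when url uses the bare-domain host, else None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host == CANONICAL_HOST:
        return None  # already the preferred form, serve the page normally
    if "www." + host == CANONICAL_HOST:
        # Same site without the www: answer with a permanent redirect.
        return 301, urlunsplit(parts._replace(netloc=CANONICAL_HOST))
    return None  # some other host entirely, not ours to rewrite
```

So a request for http://yoursite.com/page.htm would be answered with a 301 pointing at the www address, and both forms end up as one URL.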
Your only hope is that your pagerank is higher than the offending site's. This alone is no guarantee, because the offending site will have targeted many higher pagerank sites within its system on the off chance that it strips at least one of the targets. This is further backed by hundreds of other hidden 301 permanent redirects to sites of pagerank 7 or above, again in the hope of stripping a high pagerank site. This would then empower their scripts to hijack more efficiently. Sadly, supposedly ethical big name affiliates are involved in this scam; they know it is going on, and google adwords is probably the main target of revenue. Though I am sure google does not approve of its adsense program being used in such a manner.
Many such offending sites have no e-mail contact, a hidden WHOIS and no telephone number. Even if you were to contact them, you would find in most cases that the owner or webmaster cannot remove your links from their site because the feeds come from affiliate databases.
There is no point in contacting GOOGLE or MSN, because this problem has been around for at least 9 months; only now it is escalating at an alarming rate. All sites with a pagerank of 5 or below are susceptible, and if your site is a 3 or 4 then be very alarmed. A skyscraper site need only create child page linking to get pagerank 4 or 5, without the need to strip other sites.
Caution: trying to exclude them via robots.txt will not help, because these scripts are able to change almost daily.
Trying to remove a link through google that looks like
new.searc**verywhere.co.uk/goto.php?path=yoursite.com%2F will result in your entire website being removed from google’s index for an indefinite period of time, at least 90 days, and you cannot get re-indexed within that time.
I am working on an automated 302 REBOUND SCRIPT to trace and counteract an offending site. This script will spider and detect all pages, including sub-domains, within an offending site and blast all of its pages, including dynamic pages, with a 302 or 301 redirect. Hopefully it will detect the feeding database and blast it with as many 302 redirects as it contains URLs. So in essence it is a programme in perpetual motion, creating millions of 302 redirects for as long as it stays on. As every page is a unique URL, the script will hopefully continue to create redirects and bombard any site that dynamically generates pages using php, asp or cgi redirecting scripts. A SKYSCRAPER site that is fed this way can have its server totally occupied by a single efficient spider that requests pages in split seconds, continually, throughout the day and week.
If the repeatedly spidered site is depleted of its bandwidth, it may then be possible to remove it via google's URL removal tool. You only need a few seconds of a 404 or a 403 from the offending site for google’s url console to detect what it needs, whether that is the site or the damaging link.
I hope I have been informative and have helped anybody with a hijacked site whose natural revenue has been unfairly hit. Also note that your site may never regain its rank even after the removal of the offending links. Talking to offending site owners often results in their denying that they are causing problems and saying that they are only counting outbound clicks. And they seem reluctant to remove your links....Yeah, pull the other one.
[edited by: Brett_Tabke at 9:49 pm (utc) on Mar. 16, 2005]
I need a "Joe Public" description of the problem.
I'm an experienced webmaster and seo, and although I understand this thread, I'm not able to explain this problem to my employees (much less my wife) at this point.
I would also like to submit it to cnet, drudgereport, other threads, etc, but again, would anyone understand the nature of the problem?
Can anyone summarize this in layman's terms, maybe with some examples?
joined:Dec 29, 2003
"I would also like to submit it to cnet, drudgereport, other threads, etc, but again, would anyone understand the nature of the problem?"
This has nothing to do with 302s. You have penalty X and it's because you did Y. How do you answer that?
I think it's easy to say a site doesn't rank because they've done something to earn a penalty, then disregard any of the other factors that might be a consideration. Your average searcher doesn't care about Google penalties, if they even know such a thing exists. All they know is someone told them about this great site about hand made blue widgets, but they couldn't remember the URL. So, they go to Google and type in "hand made blue widgets". What do they get in the SERPs? Do they get handmadebluewidgets.com? No, they get 50 or 60 directories that have scraped sentences from handmadebluewidgets.com, then 30-40 other sites that might be related that mention or link to handmadebluewidgets.com, or they get several dozen sites that mention blue, widgets, hand made, hand, etc.
That is NOT a relevant search. Especially when the site being searched for is buried at #256 for its own name. It's only a matter of time before the public notices this is happening; even now I've had people tell me they feel like they're wasting their time at Google. How long before the switch to a new SE happens?
Face it, there are some very basic SE criteria that Google is failing to deliver right now. And if Google is going to penalize sites for some unknown factor, and replace the penalized site with a hundred or so sites that MENTION that site instead, how is that relevant? Google isn't helping people to find what they want, they're throwing up detours and barriers instead.
It sounds interesting to me. My years-old site has just started dropping like it's hot, one term at a time, as of a few weeks ago. After reading your msg, I realized that a change I made around that same time eliminated a randomly-generated list of products that appeared on each page. I just put this code back in and will report what happens, if anything.
Please excuse the rushed explanation, I will post a mega example soon in more detail. This is for joe public.
Let me explain again how it is done. This time in more layman's terms.
1. A blackhat reciprocates your link with a syntax on his page that may at first appear to you to be something like this:
<a href="http://the-killer-site.blackhat.com/go-php?=%2F%2Ftarget-site.com%2F">Target Site</a>
Do not be fooled by the above URL as being a link that points to your website. IT DOES NOT……Think
Look at where the href begins: it is pointing to the blackhat webmaster's domain. The above link will look totally innocent to you when viewed from a browser. Take another look after the .com/…..This is the killer, it points to a variant of the NukeModule GO-PHP REDIRECTOR. A stoic and completely merciless script that can easily be modified and optimized to create havoc to googlebot.
Look closer still, you will see target-site as being the destination of the innocent looking link. Wrong again, that is cosmetics; the version below will do the exact job of the one above.
<a href="http://the-killer-site.blackhat.com/go-php?=%2F%2FI-AM-GOING-TO-DESTROY-YOUR-SITE-BECAUSE-I-WANT-YOU-DESTROYED-IN-GOOGLE-AND-YOU-CAN-KISS-YOUR-RANKING-GOODBY-HA-HA-HA-HA-HA%2F">Target Site</a>
There is no difference at all between the two URLs above. They will both look like this in a browser: Target Site
The 2 URLs above are identical so far as an end user is concerned. And they are totally harmless when an end user browses the page and totally harmless when they are clicked. But it is a trip wire….Don’t forget….IT IS A TRIP WIRE FOR ROBOTS, especially for googlebot and msnbot.
Your browser does not collect information on a website to present it to google’s databases, and your browser is unable to make an exact copy of the page limited to 101K in size. So no harm can come of the above links when people click away at them all day long.
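To see for yourself that both hrefs above point at the blackhat's own host and carry your address only as a query string, here is a small Python sketch using the standard urllib.parse module. describe_link is a made-up helper name, and the second example string is shortened for readability.

```python
from urllib.parse import urlsplit, unquote


def describe_link(href):
    """Return (host the link really goes to, decoded query string)."""
    parts = urlsplit(href)
    return parts.hostname, unquote(parts.query)


link_a = "http://the-killer-site.blackhat.com/go-php?=%2F%2Ftarget-site.com%2F"
link_b = "http://the-killer-site.blackhat.com/go-php?=%2F%2FANY-TEXT-AT-ALL%2F"

for link in (link_a, link_b):
    host, query = describe_link(link)
    print(host, query)
```

Both lines report the-killer-site.blackhat.com as the host. Whatever follows the question mark is just data handed to the script on that host.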
THE TRIP WIRE
The blackhat webmaster has tweaked his go-php redirector to have no mercy on the site it points to by creating the conditions that confuse googlebot on the serverside directive.
A link is placed at another strategic page pointing to the blackhat's page that contains the 2 syntaxes above. And do not forget, the 2 URLs above are not links that are pointing out. No sir, do not get confused that it is a link… The 2 URLs above point internally to where the blackhat's go-php redirector resides. It is here where a surreptitious and dastardly sinister script goes to work. The trip wires for googlebot are the innocent looking links that look like “Target Site” to the average end user. But when googlebot follows it, it goes into action.
The go-php redirector tells googlebot that this is the target site's URL… [the-killer-site.blackhat.com...]
The go-php redirector tells googlebot, via the server-side response, that the URL has moved temporarily and that its location is given in the Location header. Hence the 302 header information.
Remember this… “Target Site”… Its real syntax should have been <a href="http://www.yoursite.com">Target Site</a>
So, now googlebot has enough information to leave the blackhat's site and deposit the gathered info in a temporary holding place at googleplex, or wherever it goes to dump the info. All of this occurred in a split second. But “”””WAIT””””. The doom of the Target Site has not yet been sealed. Another, more sinister event took place at the same moment the 302 directive was dished out to googlebot, an unbelievably dirty trick played to the detriment of the target site: as the go-php redirector kicked into action, a META REFRESH pointing to Target Site was generated in unison with it, and this has a residual effect that will not go away no matter what you do. It is a solid HTML page whose sole purpose is to refresh in ZERO seconds to your site.
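If you have saved the HTML that one of these redirect URLs returns, a zero-second meta refresh can be picked out with a few lines of Python. This is only a sketch built on the standard html.parser module; MetaRefreshFinder and find_meta_refresh are made-up names, and the sample page is an assumed example, not the output of any real script.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Collect (delay_seconds, url) for every <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.refreshes = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # content looks like "0; url=http://www.yoursite.com/"
        delay_part, _, rest = attr.get("content", "").partition(";")
        try:
            delay = int(delay_part.strip())
        except ValueError:
            return  # malformed delay, ignore this tag
        rest = rest.strip()
        url = rest[4:].strip() if rest.lower().startswith("url=") else None
        self.refreshes.append((delay, url))


def find_meta_refresh(html):
    finder = MetaRefreshFinder()
    finder.feed(html)
    return finder.refreshes


sample = '<html><head><meta http-equiv="refresh" content="0; url=http://www.yoursite.com/"></head></html>'
print(find_meta_refresh(sample))
```

A delay of 0 together with your own address as the url is the pattern described above.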
Google’s databases now have a new URL waiting to be processed and it is <a href="http://the-killer-site.blackhat.com/go-php?=%2F%2Ftarget-site.com%2F">
Its LOCATION is [yoursite.com...]
One of the googlebots is given instructions to go fetch a snapshot of <a href="http://the-killer-site.blackhat.com/go-php?=%2F%2Ftarget-site.com%2F"> and that the location is [yoursite.com...]
The bot goes to find a short url version of [yousite.com...], in other words it is looking for [yoursite.com...]. After its normal procedure to verify the domain's existence, the bot finds that it does not resolve to the www version, but it exists, so the bot proceeds to approach the apache server with the request. Its GET is answered with a 200, it takes a snapshot of the page and returns the info for indexing in google.
The title of your index page goes here. And its url is the blackhat's
The snippets of textual content go here.
the-killer-site.blackhat.com/go-php?=%2F%2Ftarget-site.com%2F cache similar pages
Assuming the patented duplicate content filter of google has detected that another page in the index has identical content, then your site is in all sorts of trouble, and this process is totally unpredictable, with dire consequences for the established site. Google has never declared whether it can penalize a page that does not exist. The hijacked example above certainly does not exist; it was generated by manipulating a loophole in google that needs urgent modifications.
Thousands of websites have disappeared in the google results because of the above procedure. Meta refresh is not always produced but the results are often the same.
The variables are too immense to contemplate as to why very high pagerank sites are not affected. No amount of defence will protect your site from this kind of sabotage.
Neither .htaccess, robots.txt, NO-INDEX, nor any other known defense mechanism is able to stop the above process. Doing a 301 to resolve the short url will not help.
The past year has seen thousands of the short urls appear in google’s index with no title or content and no snapshot to help find out what caused it. Google's patented duplicate content filter can be tied in very closely to this anomaly, given that the patent is not that old and the explosion in the number of short urls is of almost the same age.
Most such redirection methods are not done with malice in mind.
But getting a few competitors to be demoted seems a simple procedure so long as google does not implement modifications to block the loophole.
Googlebot is a virtual camera.
Now you have an awesome tool that is destructive because a patented google duplicate content filter does not seem to work in harmony with the bot for the betterment of its results, but seems to favour the newer page that does not exist.
After the site removed their 302 redirect it was still listed as my site until today.
I figured that since nothing was pointing to their redirect code anymore that it would take a while to drop the duplicate page from the index so I submitted it via addurl last night.
This morning a search for "my unique text in site" no longer shows the redirect!
I'm still filter=0'd but I think that will correct itself in time.
Too bad that in my panic I managed to remove about 300 backlinks from my own site :(
I didn't take the time to read this whole thread, so maybe this has been mentioned already, but a lot of affiliates use redirects to hide their affiliate links, to stop all of the spyware on people's computers from stealing their commissions. An affiliate almost has to use a redirect nowadays if they want to get any commissions; there is no harmful intent. If they just use their regular affiliate link to the merchant, they will have a lot of their commissions stolen from them.
I only think the problem exists like japanese said, not when someone uses a 302 to link to your site but when they steal a snapshot of your site and try to say it's at some new location.
As for spyware stealing affiliate links,!?!, not heard of this one yet, please give a link to a thread or pm me a website with more info on how to deal with this.
And I think this is because once a url is in Google's database, googlebot continues to go DIRECTLY back to that url for spidering.
IOW, if the original url is something like...
...even though the offending site removes the "target=yourdomain.com," Google will continue to go back to the full original url containing your url.
Google will continue to see the original url as an actual page as long as that "linkto.pl" script is in place.
Can somebody answer these questions:
1. Is the problem that the "scraper" site is redirecting to a variation of your url that returns a page cannot be displayed?
2. Is the problem that they are stealing your content and putting it on their own site?
3. Is the problem that Google considers info at "http://domain.com/page.htm" and "http://www.domain.com/page.htm" to be duplicate content because one is missing the "www." in front of the address?
Am I anywhere close to describing the problem?
GO-PHP REDIRECTOR. A stoic and completely merciless script that can easily be modified and optimized to create havoc to googlebot
Please explain what they can possibly do with the script different than any other 302 redirect?
Based on your description, any unknowing person that slaps up a directory using off-the-shelf PHP software that uses this redirector is a black hat? I don't think so.
The problem is obviously not with the scripts or redirects, it's obviously Google's interpretation of the redirect. If people are exploiting the 302 bug, whether on purpose or inadvertently, you can't blame the technology they're using as it has never been the problem, the problem is Google.
So everyone should stop chasing Google to remove this link or that link which burns lots of Google resources and hammer on them to fix the global issue with their stinking 302 handling algorithm.
That's entirely Google and shame on them. And forget about the webmasters hurt by this, it is a shame for the users of Google who get cloaked to sites that Google recommends based on someone else's content.
Either Google can fix this issue immediately or they are morons. Simple: no credit to 302s when the supposed temporary URL has one non-302 link to it on its own site. That comports with the RFC, gives owners control over their domains and content and is the right thing to do for users.
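Here is one possible reading of that rule as a Python sketch, just to make the proposal concrete. It is not how Google works; the function name and the (source, destination, is_302) link format are assumptions made up for this post.

```python
from urllib.parse import urlsplit


def credit_goes_to_redirector(redirect_target, links):
    """Decide whether a URL that 302s to redirect_target may be credited with its content.

    links is an iterable of (source_url, destination_url, is_302) tuples,
    a hypothetical stand-in for whatever link data a search engine holds.
    """
    target_host = urlsplit(redirect_target).hostname
    for source, destination, is_302 in links:
        points_at_target = destination == redirect_target
        from_own_site = urlsplit(source).hostname == target_host
        if points_at_target and from_own_site and not is_302:
            # The target is a real page that its own site links to normally,
            # so the 302 from elsewhere gets no credit.
            return False
    return True
```

With a single ordinary link from http://www.yoursite.com/about.htm to http://www.yoursite.com/ in the list, the function returns False for any outside 302 aimed at your home page.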
Can't you just feel the class action lawsuit building?
Many websites have an intro page; some automatically check to see what type of browser you have or whether you have the plugins you need to view the website, language preference etc.
This way the website can send you the right page for you. For example, depending on whether or not you have shockwave or flash software, the intro page will direct the browser to the compatible page. These types of intro pages typically use an automatic redirect where, if you don't click on a link within a certain time, the browser will automatically be directed to the proper page.
When you do a search in Google, the search engine does not want to send you directly inside the website to an incompatible page, therefore when google sees a page with the automatic redirect code, it assumes this is the intro page and sends surfers to the intro page for the content you are searching for. This way the website can check your browser for the appropriate software and provide you with the best possible surfing experience.
The google bug happens when websites use this same type of redirect code to point to other websites. Most people do this for various legitimate reasons; usually it is a simple tracking method so they can record which links are being used and which are not. Google mistakes the other website for the intro page in these cases.
Some less scrupulous webmasters 'hijackers' took notice of this google bug and are exploiting it to the fullest, effectively hijacking other websites position within google search results. Google has gotten better recently, apparently fixing most of the accidental hijacks but the real hijackers have become very aware of this weakness and are taking it to another level using illegal or unethical methods. This is called 'google jacking'.
Although google has made changes to its secret algorithm, a mathematical process used to determine which page is most relevant to a search query, webmasters have become very concerned about this issue. Many have lost their livelihood to 'google jackers' who unscrupulously lure the surfer, under the false pretense of delivering the content described in Google's search results, into any number of less than ideal surfing experiences.
I think that pretty much sums up the issue in layman's language. Remember most surfers don't know what
even means, much less care.
This is just a draft, anyone care to make adjustments or should we send it off to the press?
edited - another speeling mistake
[edited by: Reid at 7:40 pm (utc) on Mar. 10, 2005]
[edited by: MikeNoLastName at 8:05 pm (utc) on Mar. 10, 2005]
Here it is.
When you click on a hyperlink, your web browser asks the server for the page.
A "302" redirect is simply the server telling the browser to look elsewhere for the page that was requested. Your browser then automatically looks at the address which accompanied the "302" and asks that server for the page.
For example, in your browser you type in abc.com.
At abc.com, the server sends a 302 which says to your browser "the page you're looking for is actually at def.com"
Your browser then goes to def.com to get the page.
There's no copying of content.
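For anyone who wants to watch that exchange happen, here is a minimal Python sketch using only the standard library. It starts a throwaway local server standing in for abc.com that answers every request with a 302, then asks it for a page without following the redirect. The function names are made up, and def.com is just the example address from above.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def start_redirect_server(target_url):
    """Start a local server that answers every GET with a 302 to target_url."""

    class RedirectHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(302)
            self.send_header("Location", target_url)
            self.send_header("Content-Length", "0")
            self.end_headers()

        def log_message(self, format, *args):
            pass  # keep the example quiet

    server = HTTPServer(("127.0.0.1", 0), RedirectHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_without_following(host, port, path="/"):
    """Request a page and report the status code and Location header only."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        response.read()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

Calling fetch_without_following on the server's address returns the 302 status and the Location it was told to hand out. The server sends no page content of its own; the content only arrives when the browser makes its second request to the new address.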
Many web sites use 302 redirects to count how many people have clicked on a link, otherwise there is no way to know when someone clicks a link on your site. Also, there are many php scripts for building directories which use 302s, and many people using asp or asp.net who use built-in redirects in those systems are also using 302s. There are many many reasons for doing so, most of which are not evil.
The problem is that Google, when it follows a link that returns a 302, files the destination page under the original url. The end result is that *anyone* who links to you using a 302 gets their link added to google using *your* page's content as their content. This can cause problems with google's duplicate content filter, and can end up causing your original page to be demoted in the listing, while the page linking to yours gets recognized as the "original".
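A toy model in Python may make that clearer. This is emphatically not Google's code, just a sketch of the behaviour described in this thread: content fetched from the destination gets filed under the URL that issued the 302. All names and URLs here are made up for illustration.

```python
def crawl(url, pages, redirects):
    """Return (url_to_file_under, content), following at most one 302."""
    if url in redirects:
        destination = redirects[url]
        # The behaviour described in the thread: the content comes from the
        # destination but is filed under the URL that issued the 302.
        return url, pages[destination]
    return url, pages[url]


def build_index(urls, pages, redirects):
    return dict(crawl(url, pages, redirects) for url in urls)


def duplicates(index):
    """Group indexed URLs by identical content; return groups with more than one URL."""
    by_content = {}
    for url, content in index.items():
        by_content.setdefault(content, []).append(url)
    return [sorted(group) for group in by_content.values() if len(group) > 1]


pages = {"http://www.yoursite.com/": "hand made blue widgets"}
redirects = {"http://directory.example/go.php?id=1": "http://www.yoursite.com/"}
index = build_index(list(redirects) + list(pages), pages, redirects)
print(duplicates(index))
```

The index ends up with two URLs carrying the same text, one of them on the linking site. Which of the two a duplicate filter then keeps is exactly the part nobody outside Google knows.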
The problem is one that only google can fix, and some indications are that they are already working on a fix, but because no one there is talking, the speculation still runs wild.
If it does fetch twice you may be able to defend against this attack by randomly varying the content on your page. Say, randomly different quotes come up, or rotate news stories, etc., somewhere on the page.
The idea would be to defeat Google's duplicate content filter. If it doesn't think the pages are duplicates, it won't replace one with the other.
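Something along these lines, as a rough Python sketch. Whether this actually gets past the duplicate filter is pure speculation; the quotes, the markup and the render_page name are all made up for illustration.

```python
import random

# Hypothetical rotating content; any pool of short text blocks would do.
QUOTES = [
    "Widget of the day: cobalt blue.",
    "Widget of the day: sky blue.",
    "Widget of the day: navy blue.",
]


def render_page(body, quotes, rng=random):
    """Append one randomly chosen quote to the page body on each request."""
    quote = rng.choice(quotes)
    return body + '\n<p class="quote">' + quote + "</p>"


print(render_page("<h1>Hand made blue widgets</h1>", QUOTES))
```

Two fetches of the same page will then often differ slightly, though with a small pool the same quote will sometimes come up twice in a row.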