Many unethical webmasters and site owners are already creating thousands of TEMPLATED (ready-to-go) SKYSCRAPER sites fed by affiliate companies' immense databases. These companies, which hold your website's details in their databases, feed your page snippets, without your permission, to vast numbers of the skyscraper sites. A carefully adjusted PHP-based redirection script, which issues a 302 redirect to your site and includes an affiliate click checker, then goes to work. What is very sneaky is the randomly generated meta refresh page, which can only be detected with a good header interrogation tool.
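For anyone without a header interrogation tool to hand, here is a minimal Python sketch of the kind of check such a tool performs. It assumes you have already fetched the suspect page without following redirects and have its status code, headers and body. The function name `find_redirect` and the regular expression are my own illustration, not part of any existing tool, and the pattern only catches a meta refresh whose `http-equiv` attribute comes before `content`.

```python
import re
from typing import Optional, Tuple

# Matches <meta http-equiv="refresh" content="0; url=..."> in an HTML body.
# Assumption: http-equiv appears before content; other orderings are missed.
META_REFRESH = re.compile(
    r'<meta[^>]+http-equiv=["\']?refresh["\']?[^>]*content=["\']?'
    r'\s*\d+\s*;\s*url=([^"\'>\s]+)',
    re.IGNORECASE,
)

REDIRECT_CODES = (301, 302, 303, 307, 308)


def find_redirect(status: int, headers: dict, body: str) -> Optional[Tuple[str, str]]:
    """Return (kind, target) if the response redirects elsewhere, else None."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    if status in REDIRECT_CODES and "location" in lowered:
        return (str(status), lowered["location"])
    # No redirect header: look for a meta refresh hidden in the page itself.
    match = META_REFRESH.search(body)
    if match:
        return ("meta-refresh", match.group(1))
    return None
```

If the returned target is your own URL and the kind is "302" or "meta-refresh", the page is a candidate for the problem described above. Treat it as a starting point for a manual check rather than proof.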
Googlebot and MSNBot follow these PHP scripts to either an internal sub-domain containing the 302 redirect or a server-side redirect, and “BANG”, down goes your site if it has a lower pagerank than the offending site. Your index page is crippled because Googlebot and MSNBot now consider your home page, at best, a supplemental page of the offending site. The offending site's URL that contains your URL is indexed as belonging to the offending site. The offending site knows that Google does not reveal all links pointing to your site and takes a couple of months to update, so an INURL:YOURSITE.COM search will not be of much help in tracing it for a long time. Note that these scripts mostly use your URL stripped, that is, without the WWW, which makes detection harder. This also causes Googlebot to generate another URL listing for your site, which can be seen as duplicate content. A 301 redirect resolves at least the short-URL problem, relieving Google of deciding which of your site's two URLs to index higher (more often the one with the higher linked pagerank).
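The 301 fix for the short URL can be sketched in a few lines of Python. This is only an illustration of the decision a server makes; `canonical_redirect` and its parameters are hypothetical names, and in practice most people would do the same thing in their web server configuration.

```python
from typing import Optional, Tuple


def canonical_redirect(host: str, path: str, canonical_host: str) -> Optional[Tuple[int, str]]:
    """Return (301, url) when the request host is not the canonical one."""
    # Drop any port and compare case-insensitively, as host names are.
    host_only = host.split(":")[0].lower()
    if host_only == canonical_host.lower():
        return None  # already on the preferred host, serve the page normally
    # Any other host name (e.g. the version without WWW) is sent permanently
    # to the same path on the canonical host.
    return (301, f"http://{canonical_host}{path}")
```

For example, a request for `/page.html` on `yoursite.com` with `www.yoursite.com` as the canonical host yields a 301 to `http://www.yoursite.com/page.html`, so only one version of each URL is left for the crawler to index.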
Your only hope is that your pagerank is higher than that of the offending site. This alone is no guarantee, because the offending site will have targeted many higher-pagerank sites within its system on the off chance that it strips at least one of the targets. This is further compounded by hundreds of other hidden 301 permanent redirects to sites of pagerank 7 or above, again in the hope of stripping a high-pagerank site. This would then empower their scripts to hijack more efficiently. Sadly, supposedly ethical big-name affiliates are involved in this scam; they know it is going on, and Google AdWords is probably the main target of revenue. I am sure, though, that Google does not approve of its AdSense program being used in such a manner.
Many such offending sites have no e-mail contact and hidden WHOIS and no telephone number. Even if you were to contact them, you will find in most cases that the owner or webmaster cannot remove your links at their site because the feeds are by affiliate databases.
There is no point in contacting GOOGLE or MSN because this problem has been around for at least 9 months; only now is it escalating at an alarming rate. All sites of pagerank 5 or below are susceptible; if your site is 3 or 4, be very alarmed. A skyscraper site need only create child-page linking to reach pagerank 4 or 5, without having to strip other sites.
Caution: trying to exclude via robots.txt will not help, because these scripts are able to change almost daily.
Trying to remove a link through Google that looks like
new.searc**verywhere.co.uk/goto.php?path=yoursite.com%2F will result in your entire website being removed from Google's index for an indefinite period of time, at least 90 days, and you cannot get re-indexed within that period.
I am working on an automated 302 REBOUND SCRIPT to trace and counteract an offending site. This script will spider and detect all pages, including sub-domains, within an offending site and blast all of its pages, including dynamic pages, with a 302 or 301 redirect. Hopefully it will detect the feeding database and blast it with as many 302 redirects as it contains URLs. So, in essence, a programme in perpetual motion, creating millions of 302 redirects for as long as it stays on. As every page is a unique URL, the script will hopefully continue to create and bombard a site that produces dynamically generated pages with PHP, ASP or CGI redirecting scripts. A SKYSCRAPER site that is fed can have its server totally occupied by a single efficient spider that requests pages in split seconds continually throughout the day and week.
If the repeatedly spidered site is depleted of its bandwidth, it may then be possible to remove it via Google's URL removal tool. Google's URL console only needs the offending site to return a 404 or a 403 for a few seconds to detect what it needs, whether for the whole site or for the damaging link.
I hope I have been informative and have helped anybody with a hijacked site whose natural revenue has been unfairly hit. Also note that your site may never regain its rank, even after the removal of the offending links. Talking to offending site owners often results in their denying that they are causing problems and saying that they are only counting outbound clicks. And they seem reluctant to remove your links....Yeah, pull the other one.
[edited by: Brett_Tabke at 9:49 pm (utc) on Mar. 16, 2005]
As a result, these monster affiliate domains, with 10000s of pages, are getting assigned high PR and are in fact being returned in the results ahead of the main domain.
It's all low-cost, low-overhead twisted jacking at the end of the day to try and take commission. It won't stop until either there's a law against it or it costs a lot of money to do!
I contacted Google Adsense and here is the answer:
"Google AdSense is a program for web publishers who want to display
advertising on web pages they control. By placing AdSense code on their
web pages, the publisher can display text-based Google ads that are
relevant to the content readers see on the pages. Publishers, not Google,
control what pages have ads and the content of those pages.
Google is a provider of information, not a mediator. We serve ads targeted
to certain web pages, but we don't control the content of these pages. For
these kinds of questions or comments, it is best to directly address the
webmaster of the page in question.
To uphold the quality and reputation of Google AdSense, please note that
all AdSense participants are held to our program policies (
[google.com...] ) and Terms and Conditions (
[google.com...] ).
If this website is found to be in violation of any of these policies, we
will take the appropriate action on the account."
Ownerrim, i agree that this makes no sense.
No, you shouldn't have to. And you shouldn't have to use absolute referencing either. You should just build your pages as you see fit, making sure they conform to the relevant standards for what they are supposed to do, but not really anything else. Both of the above are optional; neither is anything you're required to do as a webmaster by any official body whatsoever. Of course, you shouldn't even have to worry about either using redirects or being redirected to.
So, use one, or both, or neither, as you choose. I can't promise that any option will work, though - i wouldn't like to create false hope here. Nowadays, it's sometimes good to do a lot of stuff that you really shouldn't have to do.
First, that idea has been tried before at another forum with "search engine" as part of the title. I believe they had a site that simply volunteered to be hijacked, which in my view is very important, if not the most important issue at all.
Second,
"We must have the backing of Claus and Brett"
By posting here, I (as well as everyone else) have agreed to conform to the TOS of WebmasterWorld [webmasterworld.com]. It's not always that i (or anyone else) remember the TOS verbatim, so of course sometimes there are misses, but basically Brett has set some rules for this place that should be followed by the members.
I can't really speak for Brett, but i doubt he would back it, as i just looked up what the TOS of this board has to say about the matter:
#26: Claims of action, flames, and calls to action against any company or person will be removed.
Hint: The word "removed" does not exactly sound like "backed up" to me ;)
Third, what i would very much encourage you all to do is to follow this piece of advice for every single example you know about:
"If people want to send specifics (i.e. "site A appears to have duplicate pages from, or is doing a 301/302/whatever to site B, and Google is wrongly picking site A as canonical", with actual values for A and B), I'd be happy to hear them. Drop an email to webmaster [at] google.com with the keyword "canonicalpage" (all as one word)"
Include all the specifics you can find (like URLs, server headers, and whatever) but keep it factual. It's no use asking questions as you probably won't get an answer, so don't expect that. Just the facts, nothing else. Send it off into the big G webmaster inbox and expect nothing in return.
Last, you can always do your own write-ups about the situation on your websites and blogs and whatever. If you can find space for it, do include the quote under "third" above, please, as that's the only "confirmed tool" we have to remedy the problem. Spread the message as you see fit. Also, i hereby cancel and reverse what i wrote in post #54 of this thread about it not being intended for republishing:
Limited public license of right to copy (copyright):
Feel free to copy the whole or any parts of post #54 of this thread [webmasterworld.com] as authored by me to any web site, blog, or other medium of choice, as long as you do all five of these things:
- you do not edit it so that it changes meaning or context
- you clearly state that you did not write it
- you provide a link to this thread and mention the post number so that it can be found by anyone wishing to examine the original
- you do not use the post to encourage, endorse or justify any action that it does not encourage, endorse or justify in and by itself, as it is
- specifically, you must explicitly state that you oppose any kind of hijacking and that you do not encourage this
What you don't need to do:
If you do the above you don't need to mention my nickname or real name (see profile), but i would appreciate it very much of course. I would also appreciate a link to this license (or post number) to accompany any quote, so that everybody can see that you have in fact been permitted to post it, but i'm not requiring that. You specifically don't have to link to any web site of mine.
That should be easy, right? I think/hope Brett will okay this, as it's one post only, he gets a backlink, and after all it is me that's the author. Let it fly...
Also, i see that there are some members that have seen some improvement on some datacenters. That's nice, but i also feel it's too early to say if this is really being solved. It could be all kinds of coincidences.
In this thread, the best bids so far seem to be:
- always redirect non-www to www (or the other way round)
- use absolute internal linking (i.e. include your domain name in internal links)
- include a bit of always updated content on your page (e.g. a random quote, a timestamp, or whatever)
- use the <base href=""> tag in the head of all your pages
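For the absolute internal linking point, a rough Python sketch of how existing relative links could be rewritten is below. It relies on `urljoin` from the standard library; the `absolutize_links` name and the simple `href="..."` pattern are my own assumptions, and a real site would be better served by a proper HTML parser or by fixing the links in the templates.

```python
import re
from urllib.parse import urljoin

# Assumption: links are written as href="..." with double quotes.
HREF = re.compile(r'href="([^"]*)"', re.IGNORECASE)


def absolutize_links(html: str, page_url: str) -> str:
    """Rewrite every href in html so it includes the full domain name."""
    # urljoin resolves a relative link against the URL of the page it is on
    # and leaves links that are already absolute unchanged.
    return HREF.sub(lambda m: f'href="{urljoin(page_url, m.group(1))}"', html)
```

With this, a link such as `../about.html` on `http://www.yoursite.com/dir/page.html` becomes `http://www.yoursite.com/about.html`, so a copy of the page served under someone else's URL still points back to your own domain.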
Trying to determine just how effective these points are. As mentioned earlier I have not been affected by this 302 problem, and I do have 302 redirects pointing to my site. (They have been contacted and asked for removal)
Being a bit of a perfectionist, when I designed the make-over for my site I replaced all the links with absolute links. I added a bit of always updated content to many of my pages. My htaccess points all domain requests to www.domain. Most of this was done before the 302 problem began to surface.
In the past 18 months I have not seen my site hijacked. So, the question is, do these methods help prevent hijacking? If you were hijacked, do you use these methods? Or did you just start using them after you were hijacked? I understand that once hijacked it's a bit late to start employing different tactics - when you're gone you're gone until the G-Gods see fit to re-include you. But were you using these methods before you were hijacked? Your answers just might help another webmaster sleep a little easier.
I know that GoogleBot can’t provide the referrer when crawling a page because a) one page might have many links to it and hence more than one possible referrer and b) because the GoogleBot instance that crawls the source page of a link might not be the instance that crawls the target page of a link. BUT: Why not make an exception for 302 redirects? An adequate procedure to accomplish this can be laid out as follows:
1) GoogleBot crawls the redirecting page (source page) and gets a 302 along with a Location: header containing the URL of the target page.
2) GoogleBot adds or updates the document record for the target URL. If source and target URL belong to different domains, the record will also contain a reference to the document entry of the redirecting URL. When there is more than one redirect to the target page, the last fetched source URL wins.
3) When another GoogleBot crawls the target URL, it sets the referrer to the source URL. If the server sends a successful response, that GoogleBot indexes the returned content and attributes it to the source URL. The target URL will never appear in the SERPS or only appear among supplementary results. If, OTOH, the server responds with a 404 or any other well-defined error condition, GoogleBot removes the source page from the index and re-fetches the target page as usual, i.e. without passing the referrer.
This solution a) informs webmasters that there are redirects to their pages, b) allows them to decide whether the redirect is legitimate or not and c) lets them disallow illegitimate redirects if necessary. Also note that it never duplicates indexed content under more than one URL. The content is either attributed to the source or to the target URL of a 302 redirect. The necessary overhead consists of one additional field in document records (to hold the source URL of redirects) and one additional request for pages that are targets of disallowed redirects.
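On the webmaster's side, the proposal above would boil down to something like the following Python sketch. It is purely hypothetical: it assumes the crawler really did pass the redirecting URL as the referrer, which it does not do today, and `status_for_crawl` and `blocked_hosts` are names made up for the illustration.

```python
from typing import Optional, Set
from urllib.parse import urlsplit


def status_for_crawl(referer: Optional[str], own_host: str, blocked_hosts: Set[str]) -> int:
    """Pick the HTTP status to send back for a crawl of one of our pages."""
    if not referer:
        return 200  # ordinary crawl without a referrer: serve the page
    source_host = (urlsplit(referer).hostname or "").lower()
    if source_host == own_host.lower():
        return 200  # redirect from our own site is always legitimate
    if source_host in blocked_hosts:
        # Step 3 of the proposal: an error response tells the crawler to
        # drop the redirecting URL and re-fetch our page without referrer.
        return 404
    return 200  # unknown source: allowed until we decide otherwise
```

The webmaster keeps `blocked_hosts` up to date from the referrers seen in the logs, which is what points a), b) and c) amount to in practice.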
this
www.someotherdomain.com/ page.asp?title=target+keyword&url=http://www.mydomain.com
other-domain.com/cgi-bin/tabi/navi/navi.cgi?links=82
www.anotherdomain.com/Redirect. asp?ID=188&url=http://www.mydomain.com%2F
today the site's title and description returned when searching for mydomain.com
and
site:www.mydomain.com - "some specific info from my site"
now shows my site again with the redirecting sites in the "Supplemental Results"
some traffic is back (well some of it anyway) and the cache date is the 10th of March.
The only bad news is the Google Toolbar pagerank, which is grey, but I'm not too worried about this yet.
PS there's more to write and analyse here, but it would be a good idea for someone to write a book about how money fever can destroy something that a couple of years ago (the internet) used to be the best source of information we ever had on the planet. I can see people going back to the library very soon.