Forum Moderators: Robert Charlton & goodroi
Many unethical webmasters and site owners are already creating thousands of TEMPLATED (ready-to-go) SKYSCRAPER sites fed by affiliate companies' immense databases. The companies that hold your website's details in their databases feed your page snippets, without your permission, to vast numbers of these skyscraper sites. A carefully adjusted PHP redirection script, which issues a 302 redirect to your site and includes an affiliate click checker, then goes to work. What is very sneaky is the randomly generated meta refresh page, which can only be detected with a good header interrogation tool.
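For anyone without such a tool to hand, the check can be roughed out in a few lines. The Python sketch below is not the tool the post has in mind; it is a minimal illustration that takes one raw HTTP response as text and reports the status code, the Location header and whether the body carries a meta refresh. The function name inspect_response and the regular expression are assumptions made for this example.

```python
import re

# Status codes treated as redirects in this sketch.
REDIRECTS = {301, 302, 303, 307}

def inspect_response(raw: str) -> dict:
    """Summarise the redirect signals in one raw HTTP response (headers plus body)."""
    raw = raw.replace("\r\n", "\n")
    head, _, body = raw.partition("\n\n")
    lines = head.split("\n")
    # The status line looks like "HTTP/1.1 302 Found"; the reason phrase is optional.
    status = int(lines[0].split()[1])
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    # A meta refresh sits in the HTML body, so looking at headers alone would miss it.
    meta = re.search(r"<meta[^>]+http-equiv=[\"']?refresh", body, re.IGNORECASE)
    return {
        "status": status,
        "is_redirect": status in REDIRECTS,
        "location": headers.get("location"),
        "meta_refresh": meta is not None,
    }
```

Feed it the raw response for a suspect redirect link: a 302 status with your own domain in location, or a meta refresh in the body, is the pattern described above.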
Googlebot and msnbot follow these PHP scripts to either an internal sub-domain containing the 302 redirect or a server-side redirect, and “BANG”, down goes your site if it has a lower PageRank than the offending site. Your index page is crippled because Googlebot and msnbot now consider your home page, at best, a supplemental page of the offending site. The offending site's URL that contains your URL is indexed as belonging to the offending site. The offending site knows that Google does not reveal all links pointing to your site and takes a couple of months to update, so an INURL:YOURSITE.COM search will not be of much help in tracing it for a long time. Note that these scripts mostly use your URL stripped, without the WWW, which makes detection harder. This also causes Googlebot to generate another URL listing for your site, which can be seen as duplicate content. A 301 redirect resolves at least the short-URL problem, relieving Google of deciding which of your site's two URLs to index higher, which is more often the one with the higher linked PageRank.
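The 301 for the short URL amounts to one rule: any request for the bare host is permanently redirected to the same path on the www host. A minimal Python sketch of that rule follows. It assumes the www form is the one you want indexed; CANONICAL_HOST and canonical_redirect are hypothetical names, and on a real server this would normally be done in the server configuration rather than in application code.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.yoursite.com"  # hypothetical; use your own preferred host

def canonical_redirect(url: str):
    """Return (301, target) when url is on the bare host, otherwise None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Only the bare form of our own host is redirected; anything else is left alone.
    if "www." + host != CANONICAL_HOST:
        return None
    target = urlunsplit((parts.scheme, CANONICAL_HOST, parts.path or "/", parts.query, ""))
    return 301, target
```

With this in place, yoursite.com/page.html and www.yoursite.com/page.html stop being two separate listings, because the first always answers with a permanent redirect to the second.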
Your only hope is that your PageRank is higher than the offending site's. This alone is no guarantee, because the offending site will have targeted many higher-PageRank sites within its system on the off chance that it strips at least one of them. This is further compounded by hundreds of other hidden 301 permanent redirects to sites of PageRank 7 or above, again in the hope of stripping a high-PageRank site. That would then empower their scripts to hijack more efficiently. Sadly, supposedly ethical big-name affiliates are involved in this scam; they know it is going on, and Google AdWords is probably the main target for revenue. Though I am sure Google does not approve of its AdSense program being used in such a manner.
Many such offending sites have no e-mail contact, a hidden WHOIS record and no telephone number. Even if you do contact them, you will find in most cases that the owner or webmaster cannot remove your links from their site, because the feeds come from affiliate databases.
There is no point in contacting GOOGLE or MSN, because this problem has been around for at least 9 months; only now it is escalating at an alarming rate. All sites with a PageRank of 5 or below are susceptible; if your site is a 3 or 4, be very alarmed. A skyscraper site needs only to create child-page linking to get PageRank 4 or 5, without the need to strip other sites.
Caution: trying to exclude them via robots.txt will not help, because these scripts are able to change almost daily.
Trying to remove through Google a link that looks like
new.searc**verywhere.co.uk/goto.php?path=yoursite.com%2F
will result in your entire website being removed from Google’s index for an indefinite period of time, at least 90 days, and you cannot get re-indexed within that time.
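The %2F at the end of that link is just a URL-encoded slash, so the path parameter carries your own address. A small Python sketch for pulling the target out of such a link is below. The host new.example.co.uk and the function name redirect_target are placeholders for illustration; the parameter name path is taken from the link quoted above, and other scripts may use a different one.

```python
from urllib.parse import urlsplit, parse_qs

def redirect_target(link: str, param: str = "path"):
    """Return the decoded value of the redirect parameter, or None if it is absent."""
    # Links are often quoted without a scheme; add one so the query string is found.
    if "://" not in link:
        link = "http://" + link
    values = parse_qs(urlsplit(link).query).get(param)
    return values[0] if values else None

target = redirect_target("new.example.co.uk/goto.php?path=yoursite.com%2F")
```

Here target comes back as yoursite.com/, which is why asking for removal of the link risks being treated as a request to remove your own site.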
I am working on an automated 302 REBOUND SCRIPT to trace and counteract an offending site. This script will spider and detect all pages, including sub-domains, within an offending site and blast all of its pages, including dynamic pages, with a 302 or 301 redirect. Hopefully it will detect the feeding database and blast it with as many 302 redirects as it contains URLs. In essence, it is a program in perpetual motion, creating millions of 302 redirects for as long as it stays on. As every page is a unique URL, the script will hopefully continue to create redirects and bombard a site that dynamically generates pages containing PHP, ASP or CGI redirecting scripts. A fed SKYSCRAPER site can have its server totally occupied by a single efficient spider that requests pages in split seconds, continually, throughout the day and week.
If the repeatedly spidered site exhausts its bandwidth, it may then be possible to remove it via Google's URL removal tool. Google’s URL console only needs the offending site to return a 404 or a 403 for a few seconds to detect what it needs: either the site or the damaging link.
I hope I have been informative and have helped anybody who has a hijacked site whose natural revenue has been unfairly hit. Also note that your site may never regain its rank, even after the removal of the offending links. Talking to offending site owners often results in them denying that they are causing problems and saying that they are only counting outbound clicks. And they seem reluctant to remove your links.... Yeah, pull the other one.
[edited by: Brett_Tabke at 9:49 pm (utc) on Mar. 16, 2005]
This is my very first post, which makes me a newb to this incredible community!
This post is meant for those of you who read through the first 47 pages of this thread. (Puuuh, that's as far as I got; sorry if I am posting anything that has been mentioned within the past 3 pages.)
At some point in this discussion folks were pretty close to organizing the hijack of a volunteered site etc.
OK, now everybody hold your horses. This has been done already. Even Google's PR10 has been hijacked, so we definitely know they are aware of it.
[snip: URL of site that mirrors high PR pages by cloaking.]
The days of PR seem to be numbered. It's now worth ALMOST as much as an Alexa rank :)
Time 4 a smoke,
Chris
[edited by: ciml at 10:13 am (utc) on Mar. 15, 2005]
[edit reason] No specifics please. [/edit]
here, mostly, we're discussing a phenomenon where a site can be removed from the index because of a 302 hijacking and the duplicate content filter.
but i hadn't seen that fake pr10, i wonder if they can do anything with it besides sell fake PR.
I certainly don't approve of hijackers, nor of Google methods that allow websites to be moved to a lower position in serps because of redirects of non-owners exploiting a target website's title and content.
My side of the fence is the complaining webmaster victim side, to spell it out for you.
It seems to me the critical moment where a hijack works, is the Google choice of which of 2 duplicates wins on serps rank.
I agree it is an oversimplification to say Google's criterion for "best page" is PageRank; the criteria are more complex.
The problem of hijacking seems less prevalent with other se's, who don't use the Google pagerank system.
Does anyone want to venture a theory on how other se's treat duplicate content?
If googlejacking has nothing to do with duplicate content algos, then I seem to have misunderstood the whole issue, and I'd like to hear the reason why hijackers bother with redirects to well-ranked websites ("absorbing" their title and content), and why victims (targeted websites) fall out of or down the serps when a hijacker is at work.
My argument is, 302 redirects are normal and useful and not at fault.
They are, however, the method used by hijackers to willfully generate duplicate content at Google, to trigger Google to make a choice, advantageous to the hijacker.
I believe Google should revise its duplicate content algo to counter these attempts at duplicate content not initiated by the "real" website (?), which of course is the problem being discussed here from the technical side of the original 302 method used by the hijacker to create that duplicate content.
The problem is Google not the internet protocols, as has been said quite a few times here.
If Google bots our websites (at our bandwidth cost, but hey we want to be indexed!) in order to provide a service to internet surfers, it's only fair we get a serps on merit (Google defines that for us -guidelines for webmasters- so we are self-confident).
Google is making the choice about duplicate content, when it happens.
If there is duplicate content with OUR website's title/content not initiated by us the webmaster, but by someone else, it should NOT be considered and certainly not trigger serps re-ranking as if we were spamming Google serps. That is the issue.
They do! AdSense solutions will work great for everyone if they lose their rankings. Just wait and see how many small-business owners will start using other ways to get on top of the pile. And the next one, and the next one and the next one.....
No FIX - NO CONTENT - Very Simple
I am really disappointed
Again, you are incorrect. The "best page" should be the original author, always. Scraper directory sites and hijacking urls that come along and use my content should NOT outrank me.
_______
Crobb, for what it is worth: it doesn't seem to be working that way.
I use many 302s to some of my one-page sites. Here is what I have seen and can prove.
The page that ultimately wins is the domain that has the links going into it.
Example: Domain A has 6 or 8 good links going into it from various places on the net but is not an active website, just a domain that has old links. If I point A to Domain B, which has no links at all, Google indexes the content of B under Domain A and gives A a high rank because of the links.
It works that way all the time, ever since just after "Florida"
Trawler
I can only conclude that the fix for this problem must make the results worse than they are now or they would have patched this up already. Either that or it must be very difficult to make the algo interpret 302's as links.
Given how easy it is to exploit this bug, they need to get a move on (with a fix) or their business will be history within a couple of weeks.
Include the following code in all of your php frontend files.
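[code]
<?php
$time = date("Y-m");
$url2 = $_SERVER["REQUEST_URI"];
$ua = isset($_SERVER["HTTP_USER_AGENT"]) ? $_SERVER["HTTP_USER_AGENT"] : "";
// Read date1 from the query string explicitly (no register_globals).
$date1 = isset($_GET["date1"]) ? $_GET["date1"] : "";
// Googlebot's user agent contains "Googlebot"; it never equals it exactly.
if (stripos($ua, "googlebot") !== false && $date1 != $time) {
    header("HTTP/1.1 302 Found");
    header("Location: ht*p://www.yourdomain.com/redirect.php?url=" . urlencode($url2) . "&date=" . $time);
    header("Connection: close");
    exit;
}
?>
your normal content goes here
[/code]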
Create the redirect.php with this code.
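[code]
<?php
// url and date arrive as GET parameters from the 302 issued by the first script.
$url = $_GET["url"];
$date = $_GET["date"];
header("HTTP/1.1 301 Moved Permanently");
header("Location: ht*p://www.yourdomain.com" . $url . "?date1=" . $date);
header("Connection: close");
[/code]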
What the first script checks is whether the query string carries a variable called date1 that matches the current stamp. In this case the stamp is the year and the month (you can have year, month and day by changing $time=date("Y-m"); to $time=date("Y-m-d");). If it matches, the content is shown; otherwise the request is temporarily redirected to another file, redirect.php, with the date and the request URL passed on to it by the GET method.
The redirect.php's only job is to permanently redirect back to the referring page by adding the timestamp (in this case year & month) in the url.
So
1) GBot requests www.yourdomain.com/index.php
2) Index.php redirects to redirect.php like this www.yourdomain.com/redirect.php?url=/index.php&date=2005-03
3) Redirect.php takes the variable and permanently redirects back to your original page changing its url to ht*p://www.yourdomain.com/index.php?date1=2005-03
4) Now when your index.php file is executed, $date1 equals $time (at least for a month; if you want to change your homepage URL every time Gbot comes, you can include the day as well), so your normal page is shown.
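Since a port to other languages was asked for, here is how the four steps above might look in Python. This is a sketch of the decision logic only, not a drop-in replacement for the PHP: it returns the status code and Location instead of sending headers, SITE is the same placeholder host used in the post, and front_page and redirect_page are names made up for this example. It also matches "googlebot" anywhere in the user agent, which is an assumption about what the original comparison intended.

```python
from urllib.parse import quote, urlsplit, parse_qs

SITE = "http://www.yourdomain.com"  # placeholder host, as in the post

def front_page(request_uri: str, user_agent: str, stamp: str):
    """Step 2: send Googlebot to redirect.php unless date1 already equals stamp."""
    date1 = parse_qs(urlsplit(request_uri).query).get("date1", [""])[0]
    if "googlebot" in user_agent.lower() and date1 != stamp:
        # The original URI is percent-encoded so it survives as a single parameter.
        target = f"{SITE}/redirect.php?url={quote(request_uri, safe='')}&date={stamp}"
        return 302, target
    return 200, None  # steps 1 and 4: the normal content is served

def redirect_page(request_uri: str):
    """Step 3: permanently redirect back to the original page with date1 appended."""
    params = parse_qs(urlsplit(request_uri).query)
    url, date = params["url"][0], params["date"][0]
    return 301, f"{SITE}{url}?date1={date}"
```

Walking a Googlebot request for /index.php through front_page, then redirect_page, then front_page again gives a 302, a 301 and finally a 200 on /index.php?date1=2005-03. Ordinary browsers get the 200 straight away.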
You can include the script in every frontend php file you have.
Downsides
1) Your URL changes every month, though the change will be only in the eyes of Google. But it will be done in a manner that passes all your PR on to your new URL.
Upsides
1) You can forget about hijackers, as your URL is gonna keep changing. Even if some SEO firm is determined to hijack you and sets up links from lots of sites targeting you within the month's gap, you can thwart their effort by changing it daily. And if they are so determined that they up the ante by targeting your site with tons of redirects for every one of your date ranges, it won't work, coz it's not like Gbot visits them daily or you daily. That is why a month's time is enough. And instead of the timestamp you can have any random variable that you fancy.
But are you prepared to have a URL that changes monthly?
And if you are worried about the dynamic url, you can change it through mod_rewrite to a static one like this
ht*p://www.yourdomain.com/index.php/date/2005-03.
Lots of big sites have a homepage URL that is a page long.
If your site is already hijacked or in the process of being hijacked (if you have already noticed it, that is), what's your loss in trying it?
If anyone can convert the code to another language, so that it benefits others who don't use PHP, please post it here.
[EDIT REASON]Forgot to close one of the quotes[/EDIT]
I think that this is a solution. Not the best but the best available so far.
======================
1, No matter what script is used, Googlebot can detect server-side directives.
2, Within this environment, be it 301, 302, 303, 305 (proxy) or 307, the bot must accept that a redirect has indeed been implemented.
3, The bot must ignore the LOCATION FIELD.
4, The bot must take a snapshot of the generated CODE PAGE.
5, The generated CODE PAGE must be indexed in Google as the final destination of the redirect.
6, The end user can click on the redirect if the user so wishes.
7, If a META REFRESH exists in the generated CODE PAGE then the bot must ignore it.
Simple, effective and a robust solution.
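To make the proposal concrete, the rules can be written down as a tiny indexing function. This Python sketch is only one reading of the seven points, not anything Google is known to do. The names index_entry and REDIRECT_CODES and the shape of the returned record are invented for the illustration.

```python
# Status codes listed in point 2 of the proposal.
REDIRECT_CODES = {301, 302, 303, 305, 307}

def index_entry(requested_url: str, status: int, headers: dict, body: str) -> dict:
    """Build the index record for one fetched URL under the proposed rules."""
    return {
        # Point 5: the record belongs to the URL that was requested.
        "url": requested_url,
        # Point 4: the content is the page the redirecting script itself returned.
        "content": body,
        # Point 2: the bot notes that a redirect was issued.
        "redirect": status in REDIRECT_CODES,
        # Points 3 and 7: headers["Location"] and any meta refresh are never read,
        # so nothing is queued for crawling on behalf of this URL.
        "crawl_next": [],
    }
```

Under this reading the hijacker's goto.php URL would be indexed with whatever stub page it returns, and the target site's title and content would never be attributed to it. A searcher who clicks the listing is still redirected by the server as usual, which covers point 6.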
Japanese, the suggested solution is possible only if Google is willing to ignore lots of big sites which employ this for their homepage itself.
We all know Amazon. But did you know that amazon.com permanently redirects to here:
HTTP/1.1 301 Moved Permanently
Date: Tue, 15 Mar 2005 00:15:21 GMT
Server: Stronghold/2.4.2 Apache/1.3.6 C2NetEU/2412 (Unix) amarewrite/0.1 mod_fastcgi/2.2.12
Set-Cookie: skin=; domain=.amazon.com; path=/; expires=Wed, 01-Aug-01 12:00:00 GMT
Location: h*tp://www.amazon.com:80/exec/obidos/subst/home/home.html
Connection: close
Content-Type: text/plain
and h*tp://www.amazon.com:80/exec/obidos/subst/home/home.html temporarily redirects to another page, which changes every time the above URL is accessed.
HTTP/1.1 302
Date: Tue, 15 Mar 2005 00:17:51 GMT
Server: Stronghold/2.4.2 Apache/1.3.6 C2NetEU/2412 (Unix) amarewrite/0.1 mod_fastcgi/2.2.12
Set-Cookie: session-id-time=1111478400; path=/; domain=.amazon.com; expires=Tuesday, 22-Mar-2005 08:00:00 GMT
Set-Cookie: session-id=102-9184878-1332124; path=/; domain=.amazon.com; expires=Tuesday, 22-Mar-2005 08:00:00 GMT
Location: [amazon.com...]home/home.html/102-9184878-1332124
Connection: close
Content-Type: text/html
check out the time difference between the above and below content and also the redirected location.
HTTP/1.1 302
Date: Tue, 15 Mar 2005 00:18:46 GMT
Server: Stronghold/2.4.2 Apache/1.3.6 C2NetEU/2412 (Unix) amarewrite/0.1 mod_fastcgi/2.2.12
Set-Cookie: session-id-time=1111478400; path=/; domain=.amazon.com; expires=Tuesday, 22-Mar-2005 08:00:00 GMT
Set-Cookie: session-id=103-3137957-4566215; path=/; domain=.amazon.com; expires=Tuesday, 22-Mar-2005 08:00:00 GMT
Location: [amazon.com...]home/home.html/103-3137957-4566215
Connection: close
Content-Type: text/html
So if Google were to follow what you said, nobody would ever find Amazon. This kinda redirecting is very, very common. Just check any of the top sites with more than 300k pages. 302 redirecting is there for exactly this reason.
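The two Location values quoted above differ only in the session ID tacked on to the end, which also appears in the session-id cookie. A rough Python sketch of how a crawler or log script could treat them as the same page is below. The three-seven-seven digit pattern is inferred from the two samples only, and strip_session_id is a made-up name; the truncated start of the URL is left out because the forum elided it.

```python
import re

# Trailing "/NNN-NNNNNNN-NNNNNNN" segment, as seen in the two sample responses.
SESSION_ID = re.compile(r"/\d{3}-\d{7}-\d{7}$")

def strip_session_id(location: str) -> str:
    """Drop a trailing session ID segment so repeat visits map to one URL."""
    return SESSION_ID.sub("", location)

first = strip_session_id("home/home.html/102-9184878-1332124")
second = strip_session_id("home/home.html/103-3137957-4566215")
```

Both calls reduce to home/home.html, so the redirect target is stable once the per-visit session ID is ignored.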
I have also seen a hijacked Amazon product link appearing in the serps with someone's affiliate code. My personal theory is that it is not about the site at all.
If PAGE A redirects to PAGE B it is a fight between the PAGES and not the SITES. No matter what the page is.
[edited by: idoc at 1:12 am (utc) on Mar. 15, 2005]