Many unethical webmasters and site owners are already creating thousands of TEMPLATED (ready to go) SKYSCRAPER sites fed by affiliate companies' immense databases. These companies, which hold your website info within their databases, feed your page snippets, without your permission, to vast numbers of the skyscraper sites. A carefully adjusted variant of a PHP-based redirection script, which issues a 302 redirect to your site and includes an affiliate click checker, then goes to work. What is very sneaky is the randomly generated meta refresh page, which can only be detected with a good header interrogation tool.
Googlebot and MSNbot follow these PHP scripts to either an internal sub-domain containing the 302 redirect or a server-side redirect, and “BANG”, down goes your site if it has a PageRank below the offending site's. Your index page is crippled because Googlebot and MSNbot now consider your home page at best a supplemental page of the offending site. The offending site's URL that contains your URL is indexed as belonging to the offending site. The offending site knows that Google does not reveal all links pointing to your site and takes a couple of months to update, and thus an INURL:YOURSITE.COM search will not be of much help in tracing it for a long time. Note that these scripts mostly use your URL stripped, without the WWW, making detection harder. This also causes Googlebot to generate another URL listing for your site, which can be seen as duplicate content. A 301 redirect resolves at least the short URL problem, relieving Google of having to decide which of your site's two URLs to index higher (more often the one with the higher linked PageRank).
Your only hope is that your PageRank is higher than the offending site's. This alone is no guarantee, because the offending site will have targeted many higher-PageRank sites within its system on the off chance that it strips at least one of the targets. This is further helped by hundreds of other hidden 301 permanent redirects to sites of PageRank 7 or above, again in the hope of stripping a high-PageRank site. This would then empower their scripts to hijack more efficiently. Sadly, supposedly ethical big-name affiliates are involved in this scam; they know it is going on, and Google AdWords is probably the main target for revenue. I am sure, though, that Google does not approve of its AdSense program being used in such a manner.
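The 301 fix for the short (non-WWW) URL mentioned above can be sketched roughly as follows. This is only an illustration in Python, not anybody's production setup: the host name www.yoursite.com and the function name canonical_redirect are assumptions made up for the example, and on a real server the same rule would normally live in the web server's own rewrite configuration.

```python
from typing import Optional, Tuple

CANONICAL_HOST = "www.yoursite.com"  # assumed canonical host, for illustration only


def canonical_redirect(host: str, path: str) -> Optional[Tuple[int, str]]:
    """Return (301, location) when the request used a non-canonical host, else None."""
    # Compare case-insensitively and ignore any :port suffix on the Host header.
    bare_host = host.split(":")[0].lower()
    if bare_host == CANONICAL_HOST:
        return None  # already on the canonical host, serve the page normally
    # Permanent redirect, so only one version of each URL should end up listed.
    return 301, "http://" + CANONICAL_HOST + path
```

A request for yoursite.com/ would get a 301 to the WWW version, while a request that already uses the WWW host passes straight through.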
Many such offending sites have no e-mail contact and hidden WHOIS and no telephone number. Even if you were to contact them, you will find in most cases that the owner or webmaster cannot remove your links at their site because the feeds are by affiliate databases.
There is no point in contacting GOOGLE or MSN because this problem has been around for at least 9 months; only now it is escalating at an alarming rate. All sites with a PageRank of 5 or below are susceptible; if your site is 3 or 4, then be very alarmed. A skyscraper site need only create child page linking to get PageRank 4 or 5, without the need to strip other sites.
Caution: trying to exclude them via robots.txt will not help, because these scripts are able to convert almost daily.
Trying to remove a link through google that looks like
new.searc**verywhere.co.uk/goto.php?path=yoursite.com%2F will result in your entire website being removed from Google’s index for an indefinite period of time, at least 90 days, and you cannot get re-indexed within that period.
I am working on an automated 302 REBOUND SCRIPT to trace and counteract an offending site. This script will spider and detect all pages, including sub-domains, within an offending site and blast all of its pages, including dynamic pages, with a 302 or 301 redirect. Hopefully it will detect the feeding database and blast it with as many 302 redirects as it contains URLs. So, in essence, a program in perpetual motion, creating millions of 302 redirects for as long as it stays on. As every page is a unique URL, the script will hopefully continue to bombard a site that dynamically generates pages containing PHP, ASP or CGI redirecting scripts. A SKYSCRAPER site that is fed can have its server totally occupied by a single efficient spider that requests pages every split second, continually, throughout the day and week.
If the repeatedly spidered site is depleted of its bandwidth, it may then be possible to remove it via Google’s URL removal tool. You only need a few seconds of a 404 or a 403 from the offending site for Google’s URL console to detect what it needs, whether for the whole site or just the damaging link.
I hope this has been informative and helps anybody with a hijacked site whose natural revenue has been unfairly affected. Also note that your site may never regain its rank even after the removal of the offending links. Talking to offending site owners often results in their denying that they are causing problems and saying that they are only counting outbound clicks. And they seem reluctant to remove your links.... Yeah, pull the other one.
[edited by: Brett_Tabke at 9:49 pm (utc) on Mar. 16, 2005]
I can't say for certain; however:
If Google says that page is part of your site (it shows up in the site: search) and it does in fact link to such stuff, then what am I left to believe? It is a link to a script that can even return:
Links to bad locations, bad pictures, a page of very nasty words like #*$! that Brett's board autocensors. Before the censoring, that word was a nasty three-character sequence of the twenty-fourth letter of the English alphabet.
That one threw me for a loop when I first ran into it.
blah blah blah (php redirect code) http/your domain.com
Notice no colon and no double slash on Yourdomain.com.
This isn't a typo either. All the links on this site had the same method of listing the url.
It shows up in the allinurl command of the target site.
Of course this site is managed by GoDaddy/domainsbyproxy.com so I doubt this link will ever be removed.
While digging deeper into one of the hijacker sites I did see some questionable adult content...
The site I mentioned above is also linked from one of those, and they linked to a "Christian" poet on my site with the same name as the owner of this site, obviously to use that name. Also, practically all the links on the page are to Christian sites with that name on their site somewhere.
So far, I haven't seen my particular circumstance discussed, either here or in the Adsense forum. Here's what I'm finding in an allinurl search for my domain:
www.spammingsite.com/ autolinks/?o=mydomain - 27k - Supplemental Result
www.nextspammingsite.com/cgi-bin/links/out.cgi?id=4115 - 26k - Supplemental Result
Both these return 302 in that header checker tool someone posted, so I'm assuming this is a hijack. Correct? Also, is "supplemental result" a good thing?
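For anyone who wants to see what a header checker is actually reporting, here is a rough Python sketch. It only parses the raw response headers that such a tool displays; the function name parse_redirect and the sample response (including the host mydomain.example) are made up for illustration and are not the output of any particular tool.

```python
from typing import Optional, Tuple


def parse_redirect(raw_headers: str) -> Tuple[int, Optional[str]]:
    """Extract the status code and Location header from raw HTTP response headers."""
    lines = raw_headers.strip().splitlines()
    # The status line looks like: HTTP/1.1 302 Found
    status = int(lines[0].split()[1])
    location = None
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "location":
            location = value.strip()
    return status, location


# Hypothetical response of the kind a redirect script might send.
sample = (
    "HTTP/1.1 302 Found\r\n"
    "Location: http://www.mydomain.example/\r\n"
    "Content-Type: text/html\r\n"
)
status, location = parse_redirect(sample)
if status == 302 and location:
    print("302 redirect pointing at", location)
```

A 302 whose Location is your own page is the pattern being discussed in this thread; a plain 200 from the other site's URL would mean it is serving its own page instead.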
On the first example, when I click on the link, it returns my homepage, but with both my AdSense ads stripped out - one from the left sidebar and one from the center column. A fastclick ad is left in place. How did they do that? And how does it benefit them?
My site is 4 years old, used to be PR6, now PR5, with main subdirectory index pages also PR5. Search terms that I used to have on page 1 are now buried deep in Google results. Please advise - what do I do?
When you find the results from the suspected hijacking site on the G SERPs, does the listing use the title you use on your page and does the cached version of that listing contain your content (as opposed to a scraper site with a portion of your content)? If the cache for their funky link is exactly your content -- that's a hijack.
When you say you clicked the link and the Adsense was gone, but FC ad displayed -- was this the actual link or the cache link? I find that in my (non-hijacked) pages, the cache will show FC but usually drops out the Adsense, or gives PSA on Adsense, or ads related to "Cache." (!)
The first site example, by the way, is a spam site extraordinaire...you go to their domain and are hit with a page that is totally filled with hidden text words and a block at the bottom with non-hidden spam keyword stuffing.
The fact that these scumbags even appear in the same search results as my own drives me crazy!
What that means is that these 302 redirects have been in existence for months...my site is holding its own, although admittedly some search terms have been buried. Who knows how much better my standing in SERPS would have been without them?
My site focuses on crafts, most of which are designed by me, and I write the instructions and take the step-by-step photos to tell others how to reproduce the craft. This is first-rate original content...now being ripped off by a 302 redirect that anyone can use?
Twist - I posted about this problem in the Apache forum and Jim replied right away - I guess he doesn't read the google forums (or maybe gave up at post 400)
This is old hat to him. He seems quite confident that there is no solution available to us; google needs to fix it, and it is a lot harder than you'd think.
Also there are 2 more threads in google news (same folder as this thread) - One like you described - dedicated solely to discussing solutions. (lay it out for google engineers type thing)
The other one is 'how to detect and remove 302 hijacks'
This thread has served its purpose already - this thread is the google 302 forum - the word is out on the news and getting a lot of PR in SEO forums.
I'm not so sure I'd worry about anything in inurl: unless it looks really sketchy or is from x-rated or spammer etc.
302's are very common - for googlebot there are also a lot of 'internal' 302's on large servers for different functions unrelated to 'links'. So it is a big job.
Most tracking type 302 links are completely harmless as long as google recognizes them for what they are.
Sometimes a 302 method can mess up google (unknown to the webmaster creating them).
The real problem is that since the word got out ( a year ago?) ALL the spammers and scumbag types are using 302's in ways they never thought possible.
There are 'get rich quick' e-books based on exploiting this bug with scraper sites - this is in the top 10 ways now.
There is "SEO" software that promises - no "guarantees" top SERP's "for whatever keyword you want" just punch in the keyword and you get instant 'top ranking pages'. What do you suppose this software is based on? Exploiting the 302 bug "because it's unfixable".
So it is a big problem and is going to get bigger exponentially. I can't wait to laugh at these 'get rich quick' schemes when they no longer work, or when google is so saturated by them that it no longer pays because everybody is hijacking everybody.
Yahoo figured it out and google will too. I think google's algo is just a lot more complicated and the big wigs just gave the order (about time) to fix it because it's in the news and being discussed - to big wigs that IS a problem. I bet they cringe when they look at this thread, 700 posts and not losing any steam.
It is intellectual dishonesty for GoogleGuy to tell us how to report hijacking problems when the reply you get back from them says:
"Thank you for bringing this to our attention. We understand your concern about the inclusion and ranking of your site. Please note that there is almost nothing a competitor can do to harm your ranking or have your site removed from our index.”
The two solutions to this problem are:
1)As Caleb posted today, Google should eventually fix it. They have fixed bigger problems in the past.
2)Take the law into your own hands to protect yourself in the interim, until (1) happens, if it ever does. Besides, this problem may spread to other search engines to an extent. There has been talk that this has already happened. This would be akin to carrying your own guns in the days of the old wild west where vigilantes and gangs ruled. So I have a game plan that I intend to pursue, and my own motive is that if I can convince others to do likewise, the effect will be stronger in numbers. It is as follows:
A)First priority is to choke off the information flow of your vital signs that act as bait for the 302’ers. Namely your traffic rank and PR, including a jpg of your site and two 302’s back to you off low PR pages (seemingly innocuous). Hit the spider that is feeding some of your strategic info to that service. If you follow the other threads this week on the subject you will be able to figure out what to target. At a minimum, immediately robots.txt it. Hopefully you have a spider trap; if you don’t, build one for mission-critical sites. After you get that done, create a rewrite based on that spider’s USER-AGENT [OR] the owner’s IP REMOTE ADDRESS ([OR] means that if either condition is satisfied, the rule goes forward) to serve that spider, and only that spider and its web service, a page (other than your home page) that then does a meta refresh to the most disgusting image that you can find on the web. You can learn how to do this (the rewrite, not the disgusting) in the archives of the Apache Web Server forum. Then un-robots.txt the feeder spider. Don’t let your conscience get to you about hurting established web institutions when so many webmasters have been hurt badly by them on this issue; just hope that the hurting gets stopped.
B) Next step: Go after the sleazy, PR-hogging puny directories that are 302’ing to you. You can find out which little directories are redirecting to you on a 200 versus a 302 basis by using this tool on their bogus little link to you. The tool is on the Webmasterworld site: [searchengineworld.com...] Server Header checker. I'm afraid there are a lot more out there than show up in the various diagnostic google searches touted about this week. At least, I found three more 302's against mine (no help from google search). Remember that a search engine spider that will not give a REFERER property will follow the 302 to your home page anyway, and there is nothing you can do short term to stop that (as antisocial as that REFERER property-cloaking on behalf of that spider may seem). You will be doing this to retaliate, with a disgusting and vile image (not porn, please), against the punk 302’ing, PR-hogging directory that is trying to look official to the public. Hopefully that will exceed even their comfort level, and they will make adjustments.
C)Identify your market place. Then IP block the A-blocks of entire global regions that are not in the market place. Look to the Internet Assigned Numbers Authority a-block allocations for guidance in making your decisions. If a bandit across the globe cannot find your site, it may not be targeted
D)Do the Base URL tag thingy (a base href element in the head of each page giving that page's own full URL).
E)Look at the other search engines that will be eating into Google’s market share and learn their optimization game to work with them in the future so you won’t have to whine so much; a tiny little bit of that goes a long way on the other forums; check em out!
A little bit about my outlook on Google: I only do this for the love of my hobby and to supplement income a little. I am not an IT professional. Heck, I’m only an engineer with an MS in it. I am highly specialized in research outside of IT. Three years ago I thought Google was the best, and I used it for my own searches. I lurked around this particular forum a lot and learned how to get to the top of their SERPS. Then about 2 years ago I saw Google’s competitors incorporating Google’s old PHD link-POP idea plus the newer competition’s idea of Direct Hit; a great idea involving “time spent at the site”. I began to follow the Apache Web Server Forum instead. Then Jim Morgan, the all-time guru, became moderator and I continue to look and learn from him. I have come to believe that Google had a great new idea for its time long ago, but I also know that all great new ideas have their time in the Sun. We haven’t seen anything great, new, and creative since; only War-Path. Overlooking, of course, their 8 gazillion megapixels or gigawatts, or whatever. And so when this issue is over, I will remove myself from this forum again, having no interest in following an Anachronism. I dig not your IPO, Mr. Secretive.
P.S. JD Morgan IS aware of the problem. It has been on the Apache Web Server Forum this week and he has posted. This is not a problem solvable with Apache Modules. My suggestions are a best long-shot and of questionable effectiveness. IMHO, they are ruthless but justified.
What if I have a PR 6 site, then create a blank directory with an htaccess 301 redirect to the most popular file/page on a site with a PR 2?
redirect 301 /mydir/mysite.html h**p://somesite.com/somefile.html
The googlebot would also see this as duplicate content, and also think that the content it directs to belongs to my site.
If you point a 302 at another page from one of your pages, then the search engine will keep your URL and throw away the URL of the target page.
If you do it with a 301, then the search engine will throw away your URL and keep the URL of the target page.
So, 301's and 302's work in opposite directions.
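To make the opposite directions concrete, here is a tiny Python model of the rule as just described. It is only a restatement of the post's description, not a claim about how any search engine is actually coded, and the function name and the example URLs are invented for illustration.

```python
def url_kept_by_engine(status: int, source_url: str, target_url: str) -> str:
    """Model of the rule described above; real search engine behaviour may differ."""
    if status == 302:
        # Temporary redirect: the engine keeps the URL that issued the redirect.
        return source_url
    if status == 301:
        # Permanent redirect: the engine keeps the URL being redirected to.
        return target_url
    raise ValueError("only 301 and 302 are modelled here")


# Hypothetical URLs, matching the naming used elsewhere in the thread.
print(url_kept_by_engine(302, "http://badguy.xyz/go", "http://www.victim.xyz/"))
print(url_kept_by_engine(301, "http://badguy.xyz/go", "http://www.victim.xyz/"))
```

The first call gives the badguy URL (which is the hijack case), the second gives the victim URL.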
I have to say that I am disturbed by the comments of Japanese "...Also note that your site may never gain its rank even after the removal of the offending links. " and Zeus... "...I dont see any solution to this topic, so maybe we most face the music and start to build pages with some redirecting scripts to good sites and then our own content, that way the scripts will be a form of SEO"
What do I say to a client that has invested $300,000 in web site development and SEO since 1997? Give up? Start over?
Anyone who has been SEOing for a few years probably thinks of pleasing G most of the time. We do things we think G will like, and avoid perfectly reasonable things we think G won't like, and that's in addition to building good content day after day. It's time to try something different.
I have decided to start thinking of G as a welcome but unreliable source of traffic. I'm going to concentrate on other engines and I'm going to go back to cross linking my network of sites where appropriate.
You see, I've actually deleted scads of self referencing links over the past couple of years as my SERP positions dropped. I used to get some nice cross traffic that way, which was nothing compared to G traffic when my positions were good. But it was real, legitimate traffic that I could predict and control.
Why should I sell myself short and expend all my energy on bowing and scraping before a god that will not be appeased?
On a final note -- do you recall those days of wandering in the desert after the fall of AltaVista but before the rise of Google? We took our traffic where we could get it. We got by, and we will continue to get by. And we'll meet with high profits again some sunny day...
I'm considering recommending that my organization implement this, but am airing it out in public first to see if someone can find a flaw in it.
Proposed Defensive Solution
Robots that index pages for search engines may be tricked into believing that content from one site actually belongs to another. The sequence of events looks like this:
To protect against the scenario above, the administrator of victim.xyz can install a filter on her web server which will issue an HTTP 301 redirect back to itself if it thinks that the request might be the result of a malicious/erroneous HTTP 302 redirect.
Here is how it works:
Because a robot might be smart enough to recognize that it is being redirected back to the current page, it would probably be a good idea to obfuscate the HTTP 301 redirect by rewriting the URL in a technically insignificant way. For example, "http://www.victim.xyz/" might be rewritten as "http://www.victim.xyz/?"
Exactly how this filter would be implemented depends on the Web server platform and possibly the requirements of the organization. For example, it could be implemented as an Apache httpd module, an IIS ISAPI filter (or whatever the .Net equivalent is. It's been a few years since I've worked with Microsoft products), or a servlet in a J2EE setup. In some cases, it could even be implemented in a more localized scope using globally included PHP or ASP scripts, although I think I'd steer away from this because of the performance penalty.
I'd greatly appreciate feedback.
A web browser visits pages by going from one to the next by clicking on links, and may (it might not, as some people surf with referrers off) leave a referrer in your log (the referrer is the URL of the previous page that it was visiting, if that page linked to you). If someone typed the URL in, clicked on a bookmark, or has referrers off then you will not see a referrer. Don't confuse Referrer with User Agent. The User Agent part of the log entry says which browser and OS was used.
Search engines do not crawl the web going from one page to the next. They spider a page and add all links found on that page into a database. When they finish that page, they ask their own database for the URL of the next page to spider. It might be one on a different site! Multiple bots will be adding to that database, and getting their next job from it, so you can have several bots from the same search engine on your site at the same time. Search engine bots leave User Agent information in your log, but they do NOT leave any referrer information, ever.
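If you want to check this in your own logs, a rough Python sketch follows. It assumes your server writes the Apache "combined" log format (request, status, bytes, referrer, user agent, in that order); the function name and the sample log line are made up for illustration, so adjust the pattern if your log format differs.

```python
import re
from typing import Optional, Tuple

# Tail of an Apache "combined" log line: "request" status bytes "referrer" "user agent"
COMBINED = re.compile(r'"[^"]*" \d{3} \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$')


def referrer_and_agent(log_line: str) -> Optional[Tuple[Optional[str], str]]:
    """Return (referrer or None, user agent), or None if the line does not match."""
    match = COMBINED.search(log_line.strip())
    if match is None:
        return None
    referrer = match.group("referrer")
    # Apache writes "-" when the client sent no referrer header at all.
    return (None if referrer in ("", "-") else referrer, match.group("agent"))


# Hypothetical log entry for a bot visit: a user agent is present, the referrer is "-".
line = ('192.0.2.10 - - [16/Mar/2005:21:49:00 +0000] "GET / HTTP/1.1" 200 5120 '
        '"-" "Googlebot/2.1 (+http://www.google.com/bot.html)"')
print(referrer_and_agent(line))
```

Note that a missing referrer on its own does not identify a bot, since bookmarks, typed-in URLs and browsers with referrers switched off look the same in this field.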
Here is how it works:
1. The robot visits [badguy.xyz...]
2. Badguy issues its 302 redirect as above
3. The robot follows the redirect to [victim.xyz...]
4. The filter at victim.xyz intercepts the request and examines it. The request either contains no referrer header or else the referrer header indicates that the client followed an external link to victim.xyz.
5. The filter determines if it has seen this particular web client recently. (This check could be as simple as scanning the last few lines of the Web server's access log.)
6. If the filter has not seen this client (the robot) recently, it issues an HTTP 301 ("moved permanently") redirect pointing to [victim.xyz...]
7. The robot follows the redirect to [victim.xyz...]
8. The filter at victim.xyz intercepts the request. This time, it recognizes that it has seen the robot before and lets the request through normally.
9. The robot receives Web content from the server at victim.xyz and indexes it. Because it reached this site from a 301 (moved permanently) rather than a 302 (moved temporarily) redirect, it knows that the content belongs to victim.xyz rather than badguy.xyz and indexes it under victim.xyz. badguy.xyz never gets associated with the content.
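Steps 4 to 8 above can be sketched in Python roughly like this. It is a minimal sketch under stated assumptions, not a tested server module: the names filter_request, SITE_HOST and RECENT_SECONDS are invented, "recently" is arbitrarily taken as 60 seconds, the record of recent clients is an in-memory dictionary rather than the access log, and the path is assumed to carry no query string of its own.

```python
import time
from typing import Dict, Optional, Tuple
from urllib.parse import urlparse

SITE_HOST = "www.victim.xyz"  # the protected site (name taken from the proposal)
RECENT_SECONDS = 60           # assumed meaning of "recently"; tune to taste

_last_seen: Dict[str, float] = {}  # client id -> time of its last bounced request


def filter_request(client_id: str, path: str, referrer: Optional[str],
                   now: Optional[float] = None) -> Tuple[int, Optional[str]]:
    """Return (301, location) to bounce a client once, or (200, None) to serve normally."""
    now = time.time() if now is None else now
    # Step 4: no referrer header, or a referrer pointing at some other site.
    external = referrer is None or urlparse(referrer).hostname != SITE_HOST
    if not external:
        return 200, None
    # Step 5: have we seen this client recently?
    seen = _last_seen.get(client_id)
    if seen is not None and now - seen <= RECENT_SECONDS:
        return 200, None  # step 8: recognised, so let the request through
    # Step 6: first sighting, so redirect back to ourselves with a 301, adding
    # a trailing "?" so the URL differs in a technically insignificant way.
    _last_seen[client_id] = now
    return 301, "http://" + SITE_HOST + path + "?"
```

Calling filter_request("bot-1", "/", None) the first time returns the 301 to "http://www.victim.xyz/?", and the follow-up request from the same client within the time window is served normally. As written, it also bounces any ordinary referrer-less visitor once.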
you mean well, but theres flaws to the defense. mainly the already posted fact that a robot will look at the badguy.foo?url=victim.foo, see the 302 to victim.foo and just record it as if your content existed there under the badguy site. it at a later date then goes to index victim.foo.
so when googlebot is doing a normal crawl on your site your log checker will trigger and you'll start tossing 301s at googlebot left and right.
its already been mentioned as one of the helpful possibilities (tossing 301s to robots upon their next visits to your site after you've been 302 serpjacked) amongst other things. all those things mostly being hail-mary type defenses, not really sure things.
to recap the long thread your possible defenses are:
1. adding dynamic elements to all your urls
2. sending 301s once per page to robots
3. dynamic content elements on all pages
4. 301/302s from victim.foo to www.victim.foo
5. contacting badguy.foo, asking them to remove listing
6. change domain names
7. try to get PR boosted over badguy.foo
8. changing link/navigation structure
9. removing badguy.foo via google removal tool
10. reporting site as spam to google
11. yelling and screaming at ww and /.
thats most of them- as you see, none are optimal in terms of seo practices.
[edited by: ciml at 4:21 pm (utc) on Mar. 25, 2005]