Forum Moderators: Robert Charlton & goodroi
Many unethical webmasters and site owners are already creating thousands of TEMPLATED (ready to go) SKYSCRAPER sites fed by affiliate companies' immense databases. The companies that hold your website info in their databases feed your page snippets, without your permission, to vast numbers of these skyscraper sites. A carefully adjusted PHP redirection script, which issues a 302 redirect to your site and includes an affiliate click checker, then goes to work. What is very sneaky is the randomly generated meta refresh page, which can only be detected with a good header interrogation tool.
Googlebot and MSNbot follow these PHP scripts to either an internal sub-domain containing the 302 redirect or a server-side redirect, and “BANG” down goes your site if it has a lower pagerank than the offending site. Your index page is crippled because Googlebot and MSNbot now consider your home page at best a supplemental page of the offending site. The offending site's URL that contains your URL is indexed as belonging to the offending site. The offending site knows that Google does not reveal all links pointing to your site and takes a couple of months to update, so an INURL:YOURSITE.COM search will not be much help in tracing the link for a long time. Note that these scripts mostly use your URL stripped, without the WWW, making detection harder. This also causes Googlebot to generate another URL listing for your site, which can be seen as duplicate content. A 301 redirect resolves at least the short URL problem, relieving Google of deciding which of your site's two URLs to index higher (more often the one with the higher linked pagerank).
Your only hope is that your pagerank is higher than the offending site's. This alone is no guarantee, because the offending site will have targeted many higher pagerank sites within its system on the off chance that it strips at least one of them. This is reinforced by hundreds of other hidden 301 permanent redirects to sites of pagerank 7 or above, again in the hope of stripping a high pagerank site. That would then empower their scripts to hijack more efficiently. Sadly, supposedly ethical big name affiliates are involved in this scam. They know it is going on, and Google AdWords is probably the main target of revenue. I am sure, though, that Google does not approve of its AdSense program being used in such a manner.
Many such offending sites have no e-mail contact, a hidden WHOIS and no telephone number. Even if you do manage to contact them, you will find in most cases that the owner or webmaster cannot remove your links from their site, because the feeds come from affiliate databases.
There is no point in contacting GOOGLE or MSN, because this problem has been around for at least 9 months; only now it is escalating at an alarming rate. All sites of pagerank 5 or below are susceptible; if your site is a 3 or 4, be very alarmed. A skyscraper site need only create child page linking to reach pagerank 4 or 5, without needing to strip other sites.
Caution: trying to exclude them via robots.txt will not help, because these scripts are able to change almost daily.
Trying to remove a link through Google that looks like
new.searc**verywhere.co.uk/goto.php?path=yoursite.com%2F
will result in your entire website being removed from Google’s index for an indefinite period of time, at least 90 days, and you cannot get re-indexed within that time.
I am working on an automated 302 REBOUND SCRIPT to trace and counteract an offending site. This script will spider and detect all pages within an offending site, including sub-domains, and blast all of them, dynamic pages included, with a 302 or 301 redirect. Hopefully it will detect the feeding database and blast it with as many 302 redirects as it contains URLs. In essence it is a programme in perpetual motion, creating millions of 302 redirects for as long as it stays on. As every page is a unique URL, the script will hopefully continue to create redirects and bombard any site whose dynamically generated pages carry PHP, ASP or CGI redirecting scripts. A fed SKYSCRAPER site can have its server totally occupied by a single efficient spider that requests pages in split seconds, continually, throughout the day and week.
If the repeatedly spidered site is depleted of its bandwidth, it may then be possible to remove it via Google's URL removal tool. You only need a few seconds of 404 or 403 from the offending site for Google’s URL console to detect what it needs, either the site or the damaging link.
I hope I have been informative and of help to anybody with a hijacked site whose natural revenue has been unfairly affected. Also note that your site may never regain its rank, even after the offending links are removed. Talking to offending site owners often results in denial that they are causing problems; they say that they are only counting outbound clicks. And they seem reluctant to remove your links....Yeah, pull the other one.
[edited by: Brett_Tabke at 9:49 pm (utc) on Mar. 16, 2005]
The hijack is a combination of a php redirect (click counting type) script along with a randomly generated HTML page with a 0 second META refresh in the header.
Here's how it works.
Googlebot finds a link on scumbag's site pointing to target.com.
It looks like this: scumbag.com/goto.php?path=target.com%2F
But Googlebot is not sent by goto.php directly to target.com; instead it is sent to a randomly generated (unspidered) HTML page with a META refresh of 0 pointing to target.com.
Googlebot then mistakenly assigns the randomly generated page the attributes of target.com's homepage.
Later the duplicate content filter takes down target.com's homepage.
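For anyone without a header interrogation tool to hand, here is a rough Python sketch of the check such a tool performs on each hop of the chain described above. It only classifies a response you have already fetched; the function and class names (next_hop, MetaRefreshParser) and the /r/abc123.html path are made up for illustration, and it makes no claim about how Googlebot itself processes these pages.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaRefreshParser(HTMLParser):
    """Records the delay and target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.delay = None
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # The content attribute looks like "0; url=http://target.com/"
        match = re.match(
            r"\s*(\d+)\s*[;,]\s*url\s*=\s*['\"]?([^'\"]+)",
            attr.get("content", ""),
            re.IGNORECASE,
        )
        if match:
            self.delay = int(match.group(1))
            self.target = match.group(2).strip()


def next_hop(url, status, headers, body):
    """Return (kind, target) for one fetched response, or (None, None).

    kind is "http-<status>" for a header redirect and "meta-refresh"
    for an HTML page that redirects through a META refresh tag.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    if status in (301, 302, 303, 307, 308) and "location" in lowered:
        return (f"http-{status}", urljoin(url, lowered["location"]))
    parser = MetaRefreshParser()
    parser.feed(body)
    if parser.target is not None:
        return ("meta-refresh", urljoin(url, parser.target))
    return (None, None)


# Hypothetical responses standing in for the two hops of the hijack.
hop1 = next_hop(
    "http://scumbag.com/goto.php?path=target.com%2F",
    302,
    {"Location": "/r/abc123.html"},
    "",
)
hop2 = next_hop(
    hop1[1],
    200,
    {},
    '<html><head><meta http-equiv="refresh" content="0; url=http://target.com/"></head></html>',
)
print(hop1)
print(hop2)
```

Feed it the status, headers and body of each response in turn, following the returned target, until it returns (None, None). A link that goes 302, then META refresh, then your homepage matches the pattern described here.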
When this new site vanishes, will it be via the sandbox, the 302 bug, or just Google doing evil?
Very good question. If it's the sandbox, it will just get shoved down in the SERPs with no sign of hijacking, after what G in its feebleness now presents as temp freshbot listings that might have had it higher at first. 302 jacking could be discernible by some of the methods listed in the thread. Google doing evil is a wildcard. These are mostly guesses, but others who have been posting in the thread might have a better answer.
To go on record: I'm in no way affected by hijacking, (yet). My main site has had absolute links for a year and a half, and I have had an .htaccess taking care of non-www to www for the same time. This doesn't mean that it can't happen to my biggest source of funding at some point in the future, so I'm quite interested in helping to get things sorted out.
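For anyone wondering what the non-www to www rule actually does, here is the logic as a small Python sketch. It is not the .htaccess itself. The host www.target.com and the function name canonical_redirect are placeholders, and ports are ignored for simplicity.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.target.com"  # placeholder: put your own www host here


def canonical_redirect(url, canonical_host=CANONICAL_HOST):
    """Return the URL a 301 should point to, or None if no redirect is needed.

    Only the bare (non-www) form of the canonical host is redirected;
    unrelated hosts and already-canonical URLs are left alone.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    bare_host = canonical_host[4:] if canonical_host.startswith("www.") else canonical_host
    if host == bare_host and host != canonical_host:
        return urlunsplit(
            (parts.scheme, canonical_host, parts.path or "/", parts.query, parts.fragment)
        )
    return None


print(canonical_redirect("http://target.com/page.html?x=1"))
print(canonical_redirect("http://www.target.com/"))
```

The point is that every request for the short URL gets a permanent redirect to one single form, so a spider only ever has one URL to file your pages under.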
Top marks.
That is one method and one of the most sinister. That is exactly how a deliberate attack is deployed.
META REFRESH.... Why, if someone is just linking to your page, would they use a 302? And a meta refresh on top of that?
Perhaps incrediBill can answer that for us. I await, agog, with fervent anticipation of his expeditious and articulately composed reply of scholarly proportions.
But Googlebot is not sent by goto.php directly to target.com; instead it is sent to a randomly generated (unspidered) HTML page with a META refresh of 0 pointing to target.com.
I don't think a randomly generated page is even needed. The one that flagged my site for duplicate content came from a normal site that just happened to link to me in their directory. They seemed to have no idea it would have that sort of effect.
Another site that linked to one of my sub pages on a new site has their URL in place of mine for my page. My sub page itself isn't indexed, just theirs. I'm guessing Googlebot spidered their link before mine, so it assumed theirs was the original. I'm not getting any adverse effects from that one that I know of. I get a few Google searches through that link. Still, it would be nice to see my domain name there in the results for branding.
But what I cannot figure out is this: if this small sample of webmasters is smart enough to figure it out, and pretty much come up with a solution, then why does Google require "PhDs"?
What is it these PhDs have a doctorate in, exactly?
Some funny.
WHAT PHD REALLY STANDS FOR:
Patiently hoping for a Degree
Piled higher and Deeper (after BS = Bull..., MS = More of the Same...)
Professorship? hah! Dream on!
Please hire. Desperate.
Physiologically Deficient
Pour him (or her) a Drink
Philosophically Disturbed
Probably headed for Divorce
Pathetically hopeless Dweeb
Probably heavily in Debt
Parents have Doubts
Professors had Doubts
Pheromone Deprived
Probably hard to Describe
Patiently headed Downhill...
Permanent head Damage
Pulsating heaving Disaster?
Pretty homely Dork
Potential heavy Drinker
Professional hamburger Dispenser... "Would you like fries with that?"
Post hole Digger
Professional hair Dresser
Progressive heart Doctor
Professional humidity Detector
Piano hauling Done
Pro at hurling Darts
Professional hugger of Dames
Private house Detective
Pizza hut Driver
Pretty heavily Depressed
Prozac handouts Desired
Pretty heavy Diploma
Pathetic homeless Dreamer
Please hold Dangerous
Permanently held Dear
Proudly half Dead
Promised hell Down-the-road
Precisely helping Deadheads
Processed here, Dammit
Probably heavenly Death
Phinally done!
The sinister one (as described in my previous post) does real damage to the target.
Thanks Japanese - can I have my diploma now?
Because that page shows up in a Google site:mydomain.com search, which, if G is working somewhat correctly, would cause a duplicate page within the site.
Now this works quite well to poison a site. There has to be some other thing at work as well as this to provide the results that a hijacker would really want, namely SERP position.
The dup content problem should be a no brainer for G to fix. It still has all the information it needs in its database.
There still appears to be a part missing. This does the takedown; now the hijacker needs a way to pick up the vacated SERP slots or destroy enough pages to rank.
Not sure, but here's what might also be happening:
You get scumbag.com to take down that link with your URL as the target. But since the URL is now in Google's database, Googlebot keeps going back and spidering the original URL because it thinks it's a real page. That means that as long as the goto.php SCRIPT is still installed, the link to your site will still work and Google will see it as a live page.
So, the unfortunate target webmaster thinks he's solved the problem by getting the link removed from the offending site, but he really hasn't.
Google is only going to react to something which will materially
affect their bottom line; this may include their public image.
At any time there are probably hundreds if not thousands of schemes
running out there which affect the quality of the Google results.
This is just one of them.
The fact that the Google rep has not said a word in the threads
about this issue should tell you something.
I've read 3 threads completely over the last few days - did I miss him?
I am sure they are already aware of this particular scheme.
They know and have determined it is not big enough for them to care.
Make it a bigger problem and they will no longer be able to ignore it.
Posting here is helpful to many - but not going to get results.
Demonstrating the proof to others in this thread - interesting.
But not going to get a response.
Make the Problem More Visible
That will get results.
1. Publish a How to Steal Google Page Rank article on every affected site.
And anywhere else you can post it. Blogs, mailing lists, web sites, forums.
Start some viral marketing.
Submit your article pages to all the search engines.
2. Send an article idea email to the right editor or reporter.
People in the press are always looking for stuff to publish.
They welcome submissions.
But, only about the stuff they cover.
Blasting out 1,000s of emails to everybody is a waste of time.
Send it to the right person.
Who is the right person?
- Have they written about Google before?
- Do they cover the internet?
- Do they cover web marketing issues?
- Do they cover SE marketing?
- Who is their audience?
Reporters and editors only write about what their audience cares about.
What we care about does not matter.
You can help this by structuring the article idea to their audience.
The editor of Search Engine Watch is going to care about different
issues than the editor of a small business magazine.
These titles, all about the same issue, will appeal to different editors.
- How to Steal Google Page Rank
- Google Ignores Rip-Offs of Small Web Publishers
- Yahoo and MSN Spiders Superior to Google
- Online Marketing War - Big Operators Prey on Small Publishers
Add your own.
What has the reporter written before?
For what audience? How was it targeted? Technical, general, marketing?
For a targeted list of 1,000-2,000 editors and reporters I would see
anywhere from 2 to 10 articles depending on the newsworthiness of the story.
What to Send to Reporters
A brief, clear and concise description of the issues.
Your description must be easy to understand for the reporter and the audience.
You want the world to know? Put it in their language.
Leave your technical superiority ego at the door.
As I said at the top - bottom line or public image.
This issue is not going to affect the Google bottom line any time soon.
And they already know about this issue - and don't care.
Making the problem bigger and spreading the word to affect the image seems
to be the best strategy.
Conclusion
A highly SE promoted web site/page with a clear explanation, a HowTo,
and all the evidence will have more effect than continuing to solicit
a response from Google. And publish the HowTo everywhere.