Many unethical webmasters and site owners are already creating thousands of TEMPLATED (ready to go) SKYSCRAPER sites fed by affiliate companies' immense databases. The companies that hold your website's details in their databases feed your page snippets, without your permission, to vast numbers of these skyscraper sites. A carefully adjusted php redirection script, which issues a 302 redirect to your site and includes an affiliate click checker, then goes to work. What is very sneaky is the randomly generated meta refresh page, which can only be detected with a good header interrogation tool.
Googlebot and MSNBOT follow these php scripts to either an internal sub-domain containing the 302 redirect or a server-side redirect, and “BANG”, down goes your site if it has a pagerank below that of the offending site. Your index page is crippled because googlebot and msnbot now consider your home page at best a supplemental page of the offending site. The offending site's URL that contains your URL is indexed as belonging to the offending site. The offending site knows that google does not reveal all links pointing to your site and takes a couple of months to update, so an INURL:YOURSITE.COM search will not be of much help in tracing it for a long time. Note that these scripts mostly use your URL stripped, that is, without the WWW, which makes detection harder. This also causes googlebot to generate another URL listing for your site that can be seen as duplicate content. A 301 redirect resolves at least the short URL problem, relieving google of deciding which of your site's two URLs to index higher, which is more often the one with the higher linked pagerank.
Your only hope is that your pagerank is higher than the offending site's. This alone is no guarantee, because the offending site will have targeted many higher pagerank sites within its system on the off chance that it strips at least one of the targets. This is further applied by hundreds of other hidden 301 permanent redirects to sites of pagerank 7 or above, again in the hope of stripping a high pagerank site. This would then empower their scripts to hijack more efficiently. Sadly, supposedly ethical big name affiliates are involved in this scam; they know it is going on, and google adwords is probably the main target of revenue. Though I am sure google do not approve of their adsense program being used in such a manner.
Many such offending sites have no e-mail contact and hidden WHOIS and no telephone number. Even if you were to contact them, you will find in most cases that the owner or webmaster cannot remove your links at their site because the feeds are by affiliate databases.
There is no point in contacting GOOGLE or MSN because this problem has been around for at least 9 months, only now it is escalating at an alarming rate. All sites with a pagerank of 5 or below are susceptible; if your site is 3 or 4 then be very alarmed. A skyscraper site need only create child page links to get pagerank 4 or 5, without needing to strip other sites.
Caution: trying to exclude them via robots.txt will not help, because these scripts can change almost daily.
Trying to remove a link through google that looks like
new.searc**verywhere.co.uk/goto.php?path=yoursite.com%2F will result in your entire website being removed from google’s index for an indefinite period of time, at least 90 days, and you cannot get re-indexed within that time.
I am working on an automated 302 REBOUND SCRIPT to trace and counteract an offending site. This script will spider and detect all pages, including sub-domains, within an offending site and blast all of its pages, including dynamic pages, with a 302 or 301 redirect. Hopefully it will detect the feeding database and blast it with as many 302 redirects as it contains URLs. So in essence it is a programme in perpetual motion, creating millions of 302 redirects for as long as it stays on. As every page is a unique URL, the script will hopefully continue to create redirects and bombard a site that dynamically generates pages using php, asp or cgi redirecting scripts. A SKYSCRAPER site that is fed can have its server totally occupied by a single efficient spider that requests pages at split-second intervals throughout the day and week.
If the repeatedly spidered site is depleted of its bandwidth, it may then be possible to remove it via google's URL removal tool. The offending site only needs to return a 404 or a 403 for a few seconds for google’s url console to detect what it needs, either for the site or for the damaging link.
I hope I have been informative and of help to anybody with a hijacked site whose natural revenue has been unfairly hit. Also note that your site may never regain its rank even after the removal of the offending links. Talking to offending site owners often results in their denying that they are causing problems and saying that they are only counting outbound clicks. And they seem reluctant to remove your links....Yeah, pull the other one.
[edited by: Brett_Tabke at 9:49 pm (utc) on Mar. 16, 2005]
The hijack is a combination of a php redirect (click counting type) script along with a randomly generated HTML page with a 0 second META refresh in the header.
Here's how it works.
Googlebot finds a link on scumbag's site pointing to target.com.
It looks like this: scumbag.com/goto.php?path=target.com%2F
But googlebot is not sent by goto.php directly to target.com; instead it is sent to a randomly generated (unspidered) HTML page with a META refresh=0 pointing to target.com.
Googlebot then mistakenly assigns the randomly generated page the attributes of target.com's homepage.
Later the duplicate content filter takes down target.com's homepage.
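The chain described above can be inspected one hop at a time, much as a header interrogation tool would. The following is only a minimal Python sketch: the function name next_hop, the example hosts and the random page name are assumptions made for illustration, and the regular expression is a rough match for a meta refresh tag, not a full HTML parser.

```python
import re
from urllib.parse import urljoin

# Rough pattern for <meta http-equiv="refresh" content="0; url=...">.
META_REFRESH = re.compile(
    r'<meta[^>]*http-equiv=["\']?refresh["\']?[^>]*'
    r'content=["\']?\s*(\d+)\s*;\s*url=([^"\'>\s]+)',
    re.IGNORECASE,
)

def next_hop(url, status, headers, body):
    """Classify one already-fetched response.

    Returns (kind, delay_seconds, target_url), or None if the
    response neither redirects nor contains a meta refresh.
    """
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    if status in (301, 302, 303, 307) and "location" in lowered:
        return ("http-%d" % status, 0, urljoin(url, lowered["location"]))
    match = META_REFRESH.search(body)
    if match:
        return ("meta-refresh", int(match.group(1)), urljoin(url, match.group(2)))
    return None
```

Feeding it the 302 from the goto.php style URL and then the body of the generated page would show both hops, which is the combination a plain status-code check alone would miss.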
When this new site vanishes, will it be via the sandbox, the 302 bug, or just google doing evil?
Very good question. Sandbox, it will just get shoved down in the serps with no sign of hijacking, after what G in its feebleness now presents as temp freshbot listings that might have had it higher at first. 302 jacking could be discernible by some of the methods listed in the thread. Google doing evil is a wildcard. These are guesses mostly, but others who have been posting in the thread might have a better answer.
To go on record: I'm in no way affected by hijacking, (yet). My main site has had absolute links for a year and a half, and I have had an .htaccess taking care of non-www to www for the same time. This doesn't mean that it can't happen to my biggest source of funding at some point in the future, so I'm quite interested in helping to get things sorted out.
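For readers unsure what the non-www to www rule does, here is a minimal Python sketch of the decision such an .htaccess rule makes. It is not the poster's actual configuration: the function name canonical_redirect and the host www.example.com are hypothetical, and the sketch ignores ports.

```python
from urllib.parse import urlsplit, urlunsplit

def canonical_redirect(url, canonical_host="www.example.com"):
    """Return (301, new_url) if url uses the bare host, else None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # The bare form is the canonical host without its leading "www.".
    if canonical_host.startswith("www."):
        bare = canonical_host[4:]
    else:
        bare = canonical_host
    if host == bare and host != canonical_host:
        new_url = urlunsplit(
            (parts.scheme, canonical_host, parts.path or "/", parts.query, "")
        )
        return (301, new_url)
    return None
```

Because the answer is a permanent 301 rather than a 302, both spellings of the address collapse into one URL instead of competing as duplicates.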
That is one method and one of the most sinister. That is exactly how a deliberate attack is deployed.
META REFRESH.... Why, if someone is just linking to your page, would they use a 302? And on top of that a meta refresh.
Perhaps incrediBill can answer that for us. I await, agog, with fervent anticipation of his expeditious and articulately composed reply of scholarly proportions.
But googlebot is not sent by goto.php directly to target.com; instead it is sent to a randomly generated (unspidered) HTML page with a META refresh=0 pointing to target.com.
I don't think a randomly generated page is even needed. The one that flagged my site for duplicate content came from a normal site that just happened to link to me in their directory. They seemed to have no idea it would have that sort of effect.
Another site that linked to one of my sub pages on a new site has their url in place of mine for my page. My sub page itself isn't indexed, just theirs. I'm guessing googlebot spidered their link before mine so it assumed theirs was the original. Not getting any adverse effects from that one that I know of. I get a few google searches through that link. Still it would be nice to see my domain name there in the results for branding.
Actually, googlebot gets its information from the redirector. From here on we need to know exactly how googlebot behaves in such an environment.
We need an expert in this field to explain the exact process inch by inch. Unless the expert has knowledge of the bot it won't be much good to us.
But what I cannot figure out is this: if this small sample of webmasters is smart enough to figure it out, and pretty much come up with a solution, then why does Google require "PHD's"?
What is it these PHD's have a Doctorate degree in exactly?
WHAT PHD REALLY STANDS FOR:
Patiently hoping for a Degree
Piled higher and Deeper (after BS = Bull..., MS = More of the Same...)
Professorship? hah! Dream on!
Please hire. Desperate.
Pour him (or her) a Drink
Probably headed for Divorce
Pathetically hopeless Dweeb
Probably heavily in Debt
Parents have Doubts
Professors had Doubts
Probably hard to Describe
Patiently headed Downhill...
Permanent head Damage
Pulsating heaving Disaster?
Pretty homely Dork
Potential heavy Drinker
Professional hamburger Dispenser... "Would you like fries with that?"
Post hole Digger
Professional hair Dresser
Progressive heart Doctor
Professional humidity Detector
Piano hauling Done
Pro at hurling Darts
Professional hugger of Dames
Private house Detective
Pizza hut Driver
Pretty heavily Depressed
Prozac handouts Desired
Pretty heavy Diploma
Pathetic homeless Dreamer
Please hold Dangerous
Permanently held Dear
Proudly half Dead
Promised hell Down-the-road
Precisely helping Deadheads
Processed here, Dammit
Probably heavenly Death
The sinister one (as described in my previous post) does real damage to the target.
Thanks Japanese - can I have my diploma now?
Because that page shows up in a google site:mydomain.com search which if G is working somewhat correctly would cause a duplicate page within the site.
Now this works quite well to poison a site. There has to be some other thing at work as well as this to provide the results that a hijacker would really want, namely SERP position.
The dup content problem should be a no brainer for G to fix. It still has all the information it needs in its database.
There still appears to be a part missing. This does the takedown, now the hijacker needs a way to pickup the vacated serp slots or destroy enough pages to rank.
Not sure, but here's what might also be happening:
You get scumbag.com to take down that link with your url as the target, but since the url is now in Google's database, Googlebot keeps going back and spidering the original url because it thinks it's a real page, which means that as long as that goto.php SCRIPT still is installed, the link to your site will still work and Google will see it as a live page.
So, the unfortunate target webmaster thinks he's solved the problem by getting the link removed from the offending site, but he really hasn't.
Of course they do. They have known about this issue since at least April of 2003. It was at WebmasterWorld's Boston PubCon where I remember it first being brought up.
It has been talked about here on/off for over a year.
Google is only going to react to something which will materially
affect their bottom line; this may include their public image.
At any time there are probably hundreds if not thousands of schemes
running out there which affect the quality of the Google results.
This is just one of them.
The fact that the Google rep has not said a word in the threads
about this issue should tell you something.
I've read 3 threads completely over the last few days - did I miss him?
I am sure they are already aware of this particular scheme.
They know and have determined it is not big enough for them to care.
Make it a bigger problem and they will no longer be able to ignore it.
Posting here is helpful to many - but not going to get results.
Demonstrating the proof to others in this thread - interesting.
But not going to get a response.
Make the Problem More Visible
That will get results.
1. Publish a How to Steal Google Page Rank article on every affected site.
And anywhere else you can post it. Blogs, mailing lists, web sites, forums.
Start some viral marketing.
Submit your article pages to all the search engines.
2. Send an article idea email to the right editor or reporter.
People in the press are always looking for stuff to publish.
They welcome submissions.
But, only about the stuff they cover.
Blasting out 1,000s of emails to everybody is a waste of time.
Send it to the right person.
Who is the right person?
- Have they written about Google before?
- Do they cover the internet?
- Do they cover web marketing issues?
- Do they cover SE marketing?
- Who is their audience?
Reporters and editors only write about what their audience cares about.
What we care about does not matter.
You can help this by structuring the article idea to their audience.
The editor of Search Engine Watch is going to care about different
issues than the editor of a small business magazine.
These titles, all about the same issue, will appeal to different editors.
- How to Steal Google Page Rank
- Google Ignores Rip-Offs of Small Web Publishers
- Yahoo and MSN Spiders Superior to Google
- Online Marketing War - Big Operators Prey on Small Publishers
Add your own.
What has the reporter written before?
For what audience? How was it targeted? Technical, general, marketing?
For a targeted list of 1,000-2,000 editors and reporters I would see
anywhere from 2 to 10 articles depending on the newsworthiness of the story.
What to Send to Reporters
A brief clear and concise description of the issues.
Your description must be easy to understand for the reporter and the audience.
You want the world to know? Put it in their language.
Leave your technical superiority ego at the door.
As I said at the top - bottom line or public image.
This issue is not going to affect the Google bottom line any time soon.
And they already know about this issue - and don't care.
Making the problem bigger and spreading the word to affect the image seems
to be the best strategy.
A highly SE promoted web site/page with a clear explanation, a HowTo,
and all the evidence will have more effect than continuing to solicit
a response from Google. And publish the HowTo everywhere.
They have the 302 page labeled as being in your domain but the url isn't.
Simply delete the entry from the database and don't allow any such entries to be placed in the database.
That takes care of the dup content part of the problem, within the site.
Thanks for joining in.
I have been asked by many innocent victims of this fiasco to try and help them. I am powerless. Totally powerless and have no idea how google's bots behave within the actual process of a 302 status code, especially with so many variant scripts.
This story is near impossible to turn into a Joe Public story.
Best I can offer is to raise some dust on behalf of confused webmasters and website owners.
I tested the water by suggesting a site dedicated to pump out redirects. I feel guilty just by saying it.
I was always under the impression that the 302 status code is a temporary holding directive to robots and that meant that the bots should continue to visit the redirecting url for any changes. You only did this 302 to point to an authorized site or to your own site or alternative site that you owned, not somebody else.
But it seems using the 302 as an alternative method of linking is now the name of the game. Wow think of it, 300, 301, 302 and 303 combinations. The internet is going to get complicated, very complicated for the average user and he is going to get swallowed up in a complex procedure to the point that his website is overwhelmed by the dexterous handlers of these status codes.
I'm sorry, I gave up around msg #300 and I'm still not getting it about the 302 redirects.
Please, see :
I have a little directory, and I have a script in aspx to count the click-outs.
The script uses the standard Response.Redirect from asp library, it looks like :
If I examine the link with an HTTP viewer it says
HTTP Status Code: HTTP/1.1 302 Found
the code generated by my redirect looks this way in the http viewer :
<h2>Object moved to <a href='http://www.theothersite.com/'>here</a>.</h2>
Now, my site is indexed by google and it has something like PR1. The sites I'm linking to are still indexed and well ranked on google.
So, where is the problem?
If google adjusted their bots to ignore the LOCATION FIELD instruction in the 3** range of redirects, and proceeded to cache the code generated page while still at the redirecting URL, the visit to the location page would be null and void, thus avoiding any damage to an unsuspecting site.
THIS TRULY MUST BE THE ANSWER
In theory and in practice the above should always have been the case. The best a redirecting site could do would be a code generated page with a hyperlink to the target site. This is what a robot should index, rather than going to the target site. If a meta refresh is detected then googlebot should totally ignore it. Let the guy have his meta refresh.
The above would render any redirect harmless.
The cost to innocent sites is now far beyond a joke.
If inexperience of webmasters and tactical hijacking using these 302 status codes is going to continue to be a problem with the bots, and will continue to be ignored by google, then we must work together to find a sensible solution.
What do you think of the above? Simple and effective? Downright silly? Flawed?
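To make the proposal concrete, here is a minimal Python sketch of a crawler that follows this policy. It is purely an illustration of the idea in this post and says nothing about how any real search engine bot works; the function name handle_response and the example directory URL are hypothetical.

```python
from urllib.parse import urljoin

def handle_response(url, status, headers, body):
    """Return (indexed_url, indexed_content, urls_to_queue).

    Proposed policy: on any 3xx response, store only the stub page
    the server generated, under the redirecting URL, and treat the
    Location target as an ordinary outbound link to crawl later.
    """
    if 300 <= status < 400:
        location = headers.get("Location")
        queued = [urljoin(url, location)] if location else []
        # The target's content is never fetched on behalf of this URL.
        return (url, body, queued)
    # Non-redirect pages are stored as-is; extracting their links
    # (and ignoring any meta refresh) is outside this sketch.
    return (url, body, [])
```

Under this policy the redirecting URL can only ever be credited with its own "Object moved" stub, so the target's content is never filed under someone else's address.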
"not Yahoo, MSN, Jeeves, etc..."
As mentioned above, both Google and MSN have this problem.
Yahoo had this problem early in 2004. I recall them being concerned, with some questions posted by Tim and Yahoo_Mike in the Yahoo Forum. My site disappeared, as it did around the same time in Google. But with Yahoo, the issue was resolved within 12 weeks. Unfortunately, Google doesn't seem to care. They have the man power and the intelligence to resolve the problem, and would have done so many moons ago if they really cared. Email after email to Google and to a special Google Groups location by many members have resolved nothing. Sad to say that those of us who have suffered from the 302 hijacking may never reappear in the serps.
Thanks to Brett and SteveB for chiming in. :)
Have a good weekend.
I suspect that at least some of the issues in this thread that people are blaming on 302s and scraper sites are probably more likely due to recent algo changes
At one point during the past 9 months, there were over 40 tracker2/302 redirects to my site. Searching for pages within my site using the site:mysite.com command was showing many of those redirect urls! Google was actually associating those 302s with my site and was listing them as part of my site evident in the site:mysite.com search. This is proof of the problem. If Google lists 20 UNRELATED urls as being part of my site then there is clearly a bug that needs to be resolved. Again, as I just stated above, they are very very aware of this problem.
BTW, for newbies here, the site: command is supposed to show ONLY pages that are really and truly part of your site (i.e., home page, internal pages, etc.). So, if Google lists other urls that are NOT part of your site, then there is a problem.
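If you copy the result URLs from a site: search into a list, the check can be automated. This is a minimal Python sketch; the function name foreign_urls and the example tracker URL are made up for illustration, and it assumes every URL in the list is absolute (starts with http:// or https://).

```python
from urllib.parse import urlsplit

def foreign_urls(result_urls, domain):
    """Return the result URLs whose host is neither domain nor a subdomain of it."""
    domain = domain.lower()
    flagged = []
    for url in result_urls:
        host = (urlsplit(url).hostname or "").lower()
        if host != domain and not host.endswith("." + domain):
            flagged.append(url)
    return flagged
```

Anything this returns is a URL on somebody else's host that is being listed as part of your site, which is exactly the symptom described above.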
Yes, 302s have existed for years, but only recently did they start to pose a problem to Google.
jk3210, that's a very important point. As long as the link is in Google's database it will get spidered. One time there, always there - unless removed by URL-console or returning a 404 or 410 for a sustained period of time. And, as long as the script works it will have the desired effect.
So, it is not even enough to get the link to the script removed from a page, you must make sure that the script no longer works for your URL. Or, that it returns a 404, a 410, or a page with this meta tag:
<meta name="robots" content="noindex">
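For a directory owner who wants to cooperate, the redirect script itself can stop working for a given target. This is a minimal Python sketch of that idea, not any particular script from this thread: the function name goto, the REMOVED_TARGETS set and the example hosts are all hypothetical.

```python
from urllib.parse import unquote

# Hypothetical opt-out list kept by the directory owner.
REMOVED_TARGETS = {"target.example/"}

def goto(path_param):
    """Return (status, headers) for a request like goto.php?path=<path_param>."""
    target = unquote(path_param)
    if target in REMOVED_TARGETS:
        # 410 Gone: the old redirecting URL stops resolving to the target.
        return (410, {})
    return (302, {"Location": "http://" + target})
```

The point of answering 410 rather than just deleting the link from the page is that the old goto URL may already be in the search engine's database and will keep being requested.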
>> Email after email to Google and to a special Google Groups location by many members have resolved nothing
It's true that we don't even know for sure if Google (or MSN) is working on solving this problem or not. I do think that at least Google is, as otherwise there would be no reason to ask us to:
send examples to webmaster (at) google.com with "canonicalpage" (all as one word) in the email title
As for why Yahoo could solve it quickly while Google does not seem to be able to, I think there's a difference between how these two firms organize their data. It might in fact be very complicated although it seems easy on this side of the table, we just don't know unless we work there.
It would still be nice to get some sort of semi-official indication that they were actually working on "something" (or even "thinking about some possible improvements to ...whatever"). I don't consider it likely that we will get such an indication I must say.
I personally hope that if google does find a solution, the solution will be published just like Yahoo! did, so that webmasters can see how the various kinds of redirects are interpreted. I even remember Yahoo! asking for comments on their set of rules, which was a very nice move. I don't remember if I personally had any comments (if I had I might have disagreed a little), but I think Yahoo has really taken the webmaster community seriously here.
Still, we need to continue to push this issue so that more webmasters (and Search Engines) become aware of it. Even though Yahoo! has got a solution in place (and kudos to Yahoo for that), we still need Google and MSN to do something about this issue.
The ideal situation would of course be that all three major engines (+ Ask of course) came together on this issue and coordinated their efforts, so that we would know that these techniques were interpreted the same way in all these engines. While this would be nice, it would probably also take ages and involve a lot of red tape, so I guess it's not very likely at present.
I don't consider it likely that we will get such an indication I must say.
Why do you think this is? Is it perhaps that in its current form this "problem" is really a positive for their bottom line? Every time a website loses free traffic there's a good chance they'll get an adwords customer.
It is not that you are entitled to something with a 302; you pass your "credentials" to another place.
In other words, if someone does a 302, all his links, PR etc. should be passed to the target place and not the other way round.
Intro pages. Most intro pages use a META refresh header tag, which acts much like a 302 redirect, once the intro page has checked for browser queries etc. Remember there are many reasons for intro pages: checks for WAI issues such as screen resolution, voice browser or braille, and also for flash or shockwave.
If a site uses a Meta refresh on its home page, Google likes to send the browser to the intro rather than deeper within the site, bypassing browser queries and risking a misinterpreted WAI compliant page.
The real problem is when there is a cross-domain redirect.
It is not Google's misinterpretation of 302, it is webmasters' misuse of it. A link to another domain should not pass a server code at all - the browser is leaving that domain.
So Google really needs to sort out the domains to counteract webmasters' common misuse of 302 redirects.
Also, thebear, you are right - sites are being poisoned by these 302's, which is bad too - I was just trying to distinguish between the common innocent 302's and the purposefully sinister type.
I had a 302 split my site in half by pointing to the index page of my photo gallery. site:mysite showed the 302 link and my site without the photo gallery which was found in Links:mysite associated with that same url.
Aside from a possible solution from Google side and I am sure there are many of them (a simple one would be introducing "referrer" header from a bot)
activeco, these are my thoughts too (hmm...maybe we've stumbled upon a new form of hijacking - thoughtjacking).
A referrer string from the bot would allow each webmaster to manually block the bot originating from known hijackers, a sort of after the crime solution.
I don't know the complexities involved with applying a referrer header for Google (though I suspect it's unlikely they would ever do so) but it seems to me that the crime must be prevented rather than corrected.
Assuming they did apply a referrer header it would require each webmaster to first deduce which domains are hijackers and then apply changes, something that just ain't gonna happen in most cases.
What we really need is to work together with knowledgeable representatives at Google thru this or some other forum to find a solution that is practical to all.
A link to another domain should not pass a server code at all - the browser is leaving that domain.
True and I didn't say that.
What I said is that Google's internal measures of a site, such as PR or links, should be passed along to the target page if the source page is the more valuable one.