I have already reported some sites that use cloaking and a re-direct.
Search by my Nick.
Thanks a lot - BTW GG - been looking at your recent posts and you have had some heavy things going your way recently :(
As you said update followed by people moaning about spam :(
As far as I'm concerned, Looksmart has the best idea to combat cloaking... their distributed crawler is hard to catch unless you use User-Agent cloaking... which is easy to spoof.
<added>Of course, I have my own cynical/practical side... Googlebot sees what everybody else sees on my domains.</added>
I guess a graduate degree isn't worth the microchip it's print-merged through these days... ;)
Doesn't Google do regional results delivery themselves? <added>Ah yes, they're on Brett's All-Stars list up there... ;)</added>
"Trying to spam our web crawler by means of hidden text, <<deceptive>> cloaking or doorway pages compromises the quality of our results and degrades the search experience for everyone".
GG, do you want an example of a cloaker that's providing on-topic, highly relevant content separately to users & your spider
or...
do you want an example of some stupid <<deceptive>> cloaker who makes their site rank highly for lots of irrelevant searches and thus harms the quality of your results?
I'm a bit confused. You see, I know of heavy cloakers who rank top for competitive terms like "widgets" and they're actually selling "widgets".
So when the sites are not <<deceiving>> searchers as to their content, why would you want to ban these sites from your results?
While most of us on this forum know you have an escape-clause on your FAQ ("We 'may' penalise you.."), we're also aware you can't come out and say "Cloaking is ok".
But, I'd like to invite you to take this opportunity to tell the on-topic cloakers here what to do in order to pass a manual review of their quality & relevant cloaked content?
& "Just don't do it" is a cop-out btw ;)
Am off now to configure an Adwords campaign to just target searchers from 1 country. I wonder how Google knows what country people are visiting from?
TIA
Johnser
Do you guys think it would be helpful for us to publish lists of cloaking sites at some point? A big part of this project is to give back to the searching community at large, and making a list of cloaking sites available to help combat spam seems like a good use of resources... what do you think?
Cheers,
Andre
Not only that, we can't tell what the purpose of the study is. First of all, if there's a clean sweep being planned it's unlikely that there would have been a public warning given like this, which is what it amounts to. But if that's the case, then it'll give some people a chance, a grace period to clean up their act, in which case they should send a box of thank_you candy to the plex (anonymously, cloaking their identity, of course) for Christmas - though I wouldn't chance eating any unless it came directly from the See's candy factory. I've caught more than one cloaked site at Google with their pants down, with scripts that failed. An announcement like this gives folks a chance to make sure all systems are go and make sure their slips aren't showing.
If, as it's been stated, there will be more attention paid to reports, the purpose may be simply to study what percentage of reports are actually valid, bona fide spam and what percentage are just people whining, for internal administrative and staffing purposes. Or there may be an effort to mechanize detection and differentiation of legit from illegit cloaking so there's less human review needed.
There's no way to tell. "Market research" is what this amounts to, which can be done for any number of purposes; the data is just tabulated differently. I'm sure there are enough capable statisticians at Google so that they don't have to contract with an outside market research firm.
I'm looking for a few more good examples of true cloaking--the user sees different content than Googlebot, and not via redirects. :)
So GoogleGuy, you want to clue us in on what you are trying to figure out from these cloaked sites, because if you take the broad definition of non-static content delivered differently to different visitors, probably half the sites on the web cloak in one way or another.
Heck go look at my sites. If you come in through the spider IP addresses on some now gone pages, I'll give the spider a 404 to get Google to drop the page, whereas I give normal users a 301 to where it went. Is that cloaking? You bet. Should you ream my site because I did that? I hope not.
I also use .aspx pages that are delivered differently to different user agents. That's cloaking too.
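For anyone who wants the 404-for-spiders / 301-for-users idea spelled out, here is a minimal Perl sketch. It is not anyone's production setup: the IP prefix, the paths, and the helper names (`is_spider_ip`, `response_for`) are all made up for illustration, and a real site would need an actual list of spider addresses.

```perl
use strict;
use warnings;

# Hypothetical values, for illustration only.
my @SPIDER_PREFIXES = ('192.0.2.');                          # stand-in for known spider IPs
my %MOVED           = ('/old-page.html' => '/new-page.html'); # pages that have moved

# True if the address starts with one of the spider prefixes.
sub is_spider_ip {
    my ($ip) = @_;
    return scalar(grep { index($ip, $_) == 0 } @SPIDER_PREFIXES);
}

# Decide the status code (and redirect target, if any) for a request.
sub response_for {
    my ($ip, $path) = @_;
    return (200, undef) unless exists $MOVED{$path};   # page still exists
    return (404, undef) if is_spider_ip($ip);          # tell the spider it is gone
    return (301, $MOVED{$path});                       # send people to the new URL
}

my ($status, $location) = response_for('192.0.2.10', '/old-page.html');
print "spider: $status\n";
($status, $location) = response_for('198.51.100.7', '/old-page.html');
print "user: $status -> $location\n";
```

The point of the sketch is only that the decision hangs on who is asking, which is exactly what makes it cloaking in the broad sense even though nobody is being deceived.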
This is how I believe Google defines "cloaking".
Only works for UA, but might speed up the homework.
#!/usr/bin/perl -w
#---------------------------------------------------------------------------
# romulan.pl version 1.0 * A method for uncovering user_agent cloaking
# (c) 2001 20/20 Technologies, Inc. [2020tech.com...]
#---------------------------------------------------------------------------
use strict;
use HTTP::Request::Common qw(POST);
use LWP::UserAgent;
use CGI;
#---------- user serviceable variables here
my (@browsers) = (
"Slurp/si; slurp\@inktomi.com; [inktomi.com...]",
"Googlebot/2.1 (+http://www.googlebot.com/bot.html)",
"Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)"
);
my $defaultproxy = "";# default proxy server
#--------- beyond this point you're on your own
my $query = new CGI;
my $proxy = $query->param('proxy') || $defaultproxy;
my $url = $query->param('url') || "";# url to check
my $disp = $query->param('display') || 0;# display mode
my $ua = LWP::UserAgent->new();
$ua->proxy(['http','ftp']=>$proxy) if $proxy;
print "Content-type: text/html\n\n";
print qq`
<html><head><title>ROMULAN DECLOAKING TECHNOLOGY</title></head><body bgcolor=#FFFFFF>
<font color=#006699 size=+1><B>romulan.pl</B></font> - Probe web pages
for cloaking.<P><form method=POST action='romulan.pl'>
<tt>URL to check:</tt> <input type=TEXT name='url' size=50 value='$url'>
<font size=-1>e.g., [hotwired.com...]
<tt>Proxy server:</tt> <input type=TEXT name='proxy' size=50 value='$proxy'>
<font size=-1>e.g., [194.63.223.13:80...] (optional)</font><br>
<tt>Display HTML:</tt> <input type=CHECKBOX name='display'>
<P><input type=SUBMIT value='Decloak'>
</form>`;
if ($url) {
print "<P><font color=#006699>Testing <i>$url</i></font><P>",
"<table border=0 cellpadding=5 cellspacing=0>",
"<tr><td><B>User Agent</B></td><td><B>Bytes Received*</B>",
"</td></tr>";
my %result = ();
my $flipcolor = "";
foreach my $browser (@browsers) {
$flipcolor = ($flipcolor eq 'DDDDDD')? 'FFFFFF' : 'DDDDDD';
print "<tr bgcolor=#$flipcolor><td>$browser</td>";
$ua->agent($browser);# set user_agent
my $req = HTTP::Request->new(GET => $url);
$result{$browser} = $ua->request($req)->as_string;
print "<td align=RIGHT>",length($result{$browser}),"</td></tr>";
}
print "</table><font size=-1>*including HTTP header</font>";
print "<P>";
my $last = "";
if ($disp) {
foreach my $b (@browsers) {
print "<B>RESULTS FOR <I>$b</I><br>";
print "<table border=0 cellpadding=10 bgcolor=#CCCCCC><tr><td>";
$result{$b} =~ s/&/&amp;/g;
$result{$b} =~ s/</&lt;/g;
$result{$b} =~ s/>/&gt;/g;
$result{$b} =~ s/\n/<br>/g;
if ($last and ($result{$b} eq $result{$last})) {
print "<I>(Same as above.)</I>";
} else {
print "<font size=-1><pre>$result{$b}</pre></font>";
}
$last = $b;
print "</td></tr></table><P>";
}
}
}
print "<font size=-1>Copyright (c) 2001 <a href='http://www.2020tech.com/'>",
"20/20 Technologies</a></font></body></html>\n";
# ~~ finis ~~
Some output on msn.com:
Testing [msn.com...]
User Agent Bytes Received*
Slurp/si; slurp@inktomi.com; [inktomi.com...] 30655
Googlebot/2.1 (+http://www.googlebot.com/bot.html) 30655
Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt) 29176
hehe
Oh...and google
Testing [google.com...]
User Agent Bytes Received*
Slurp/si; slurp@inktomi.com; [inktomi.com...] 3134
Googlebot/2.1 (+http://www.googlebot.com/bot.html) 3134
Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt) 4112
Tsk Tsk Tsk Google. Hypocrisy is so unbecoming to a major search engine.
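If you'd rather not eyeball the byte counts, a few lines of Perl can flag which agents got a different size than the browser. This is only a rough sketch: the function name `agents_differing_from` and the shortened agent labels are my own, and the numbers are simply copied from the google.com run above. A size difference on its own is a hint to look closer, not proof of anything.

```perl
use strict;
use warnings;

# Return the user agents whose byte count differs from the baseline agent's.
sub agents_differing_from {
    my ($baseline, %bytes) = @_;
    my @diff = grep { $_ ne $baseline && $bytes{$_} != $bytes{$baseline} }
               keys %bytes;
    return sort @diff;
}

# Byte counts copied from the google.com run above (labels shortened).
my %google = (
    'Slurp'     => 3134,
    'Googlebot' => 3134,
    'MSIE 5.0'  => 4112,
);

my @odd = agents_differing_from('MSIE 5.0', %google);
print "Served a different size than the browser: @odd\n";
```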
PS: Sorry for my bad English, I'm a native German speaker - hope it's understandable anyway, thanks.
All my A HREF links contain session IDs. I drop those from (some) known bots.
Also, bots don't get flash detection pages, which will throw stuff off too.
Is it cloaking? Possibly; technically, yes. Is it bad? No. Is the content the same for the user/bots? Yes.
Also keep in mind some sites feed different layouts to different browsers. (Until about 8 mos ago I fed different CSS / style tags to some browsers -- one with px measurements, the other em. Some odd quirk since fixed.)
And don't even bother with checksums. :)
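To show why checksums are a dead end here, this small Perl sketch compares the same link with and without a session ID. The parameter name `sid`, the sample markup, and the helper `strip_session_ids` are assumptions for illustration, not anything from a real site. The raw checksums differ even though the content is the same; stripping the session ID makes them match in this toy case, but it would do nothing about different CSS, flash detection pages, and the like.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Strip a session parameter (hypothetical name "sid") from URLs in the HTML.
sub strip_session_ids {
    my ($html) = @_;
    $html =~ s/([?&])sid=[0-9A-Za-z]+&/$1/g;   # sid followed by more parameters
    $html =~ s/[?&]sid=[0-9A-Za-z]+//g;        # sid as the last parameter
    return $html;
}

# Same page as a user (with session ID) and a bot (without) would see it.
my $for_user = q{<a href="/widgets.html?sid=a1b2c3">Widgets</a>};
my $for_bot  = q{<a href="/widgets.html">Widgets</a>};

print "raw checksums match: ",
    (md5_hex($for_user) eq md5_hex($for_bot) ? "yes" : "no"), "\n";
print "normalized checksums match: ",
    (md5_hex(strip_session_ids($for_user)) eq md5_hex(strip_session_ids($for_bot))
        ? "yes" : "no"), "\n";
```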
Then, armed with a credit card, buy up the lot and look at what they are doing!