Need some training data

Anybody know of a cloaker or two?

         

GoogleGuy

5:31 pm on Apr 19, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm looking for some cloaker data. If someone is aware of a site or two that cloaks, could you fill out a spam report? I'd like to get the pipeline primed.

Thanks, (and GoogleGuy ducks out of the room before someone lobs anything heavy at him :)

Brett_Tabke

5:51 pm on Apr 19, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



google.com
altavista.com
overture.com
alltheweb.com
hotbot.com
lycos.com
yahoo.com
msn.com

:-)

NFFC

5:53 pm on Apr 19, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You forgot ask.com

Canary

10:10 pm on Apr 19, 2003 (gmt 0)



GG,

I have already reported some sites that use cloaking and a re-direct.

Search by my Nick.

Thanks a lot - BTW GG - been looking at your recent posts and you have had some heavy things going your way recently :(

As you said, an update is followed by people moaning about spam :(

volatilegx

10:55 pm on Apr 19, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



GG, talk about cynical! Coming to the cloaking forum to ask for spam reports :b

As far as I'm concerned, Looksmart has the best idea to combat cloaking... their distributed crawler is hard to catch unless you use User-Agent cloaking... which is easy to spoof.

<added>Of course, I have my own cynical/practical side... Googlebot sees what everybody else sees on my domains.</added>

Bio4ce

2:20 am on Apr 20, 2003 (gmt 0)

10+ Year Member



GG: I just sent you a site that is a big-time cloaker. Look at the cache and you will find about 5 other sites linked there that also cloak. Plus this person has bought expired domains for the dmoz and google directory PR.

Have fun.

stcrim

1:33 am on Apr 21, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Would that by chance mean Google can't tell when a site is cloaked? Trying to drum up a little something for the boys in the back room to experiment with?

-s-

mivox

4:54 am on Apr 21, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You'd think with all those PhDs they've got running around the Plex, they'd be able to figure out Cloaking 101 without having to trawl the depths for spam reports. hehehehe

I guess a graduate degree isn't worth the microchip it's print-merged through these days... ;)

Doesn't Google do regional results delivery themselves? <added>Ah yes, they're on Brett's All-Stars list up there... ;)</added>

Xoc

6:46 am on Apr 21, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Any .NET web site (*.aspx). All of the .NET components do user agent detection and feed down different versions of HTML based on the user agent. So if the agent supports absolute positioning, it might send down HTML that uses that; otherwise it sends down tables.

johnser

2:39 pm on Apr 21, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



From Google's spam reporting page:

"Trying to spam our web crawler by means of hidden text, <<deceptive>> cloaking or doorway pages compromises the quality of our results and degrades the search experience for everyone".

GG, do you want an example of a cloaker that's providing on-topic, highly relevant content separately to users & your spider

or...

do you want an example of some stupid <<deceptive>> cloaker who makes their site rank highly for lots of irrelevant searches and thus harms the quality of your results?

I'm a bit confused. You see, I know of heavy cloakers who rank top for competitive terms like "widgets" and they're actually selling "widgets".

So when the sites are not <<deceiving>> searchers as to their content, why would you want to ban these sites from your results?

While most of us on this forum know you have an escape-clause on your FAQ ("We 'may' penalise you.."), we're also aware you can't come out and say "Cloaking is ok".

But, I'd like to invite you to take this opportunity to tell the on-topic cloakers here what to do in order to pass a manual review of their quality & relevant cloaked content?

& "Just don't do it" is a cop-out btw ;)

Am off now to configure an Adwords campaign to just target searchers from 1 country. I wonder how Google knows what country people are visiting from?

TIA
Johnser

stechert

12:57 am on Apr 22, 2003 (gmt 0)

10+ Year Member



volatilegx/GoogleGuy,

Do you guys think it would be helpful for us to publish lists of cloaking sites at some point? A big part of this project is to give back to the searching community at large and making a list of cloaking sites available to help combat spam seems like a good use of resources...what do you think?

Cheers,
Andre

volatilegx

1:09 am on Apr 22, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Given that cloaked sites are sort of ephemeral by nature, I doubt it would do much good. You also might be subject to charges of libel if you turn out to be wrong about somebody's site and they sue you.

Marcia

7:42 pm on Apr 22, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>Given that cloaked sites are sort of ephemeral by nature

Not only that, we can't tell what the purpose of the study is. First of all, if there's a clean sweep being planned it's unlikely that there would have been a public warning given like this, which is what it amounts to. But if that's the case, then it'll give some people a chance, a grace period to clean up their act, in which case they should send a box of thank-you candy to the plex (anonymously, cloaking their identity, of course) for Christmas - though I wouldn't chance eating any unless it came directly from the See's candy factory. I've caught more than one cloaked site at Google with their pants down, with scripts that failed. An announcement like this gives folks a chance to make sure all systems are go and make sure their slips aren't showing.

If, as it's been stated, there will be more attention paid to reports, the purpose may be simply to study what percentage of reports are actually valid, bona fide spam and what percentage are just people whining, for internal administrative and staffing purposes. Or there may be an effort to mechanize detection and differentiation of legit from illegit cloaking so there's less human review needed.

There's no way to tell. "Market research" is what this amounts to, which can be done for any number of purposes; the data is just tabulated differently. I'm sure there are enough capable statisticians at Google so that they don't have to contract with an outside market research firm.

GoogleGuy

3:30 am on Apr 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Interesting report, Bio4ce. Definitely a good one to mine. That one looked less like what I think of as true cloaking (showing different content to a spider than to regular users). The one you sent looked like a really sneaky redirect via Javascript code, but it didn't appear that the actual page was cloaked.

I'm looking for a few more good examples of true cloaking--the user sees different content than Googlebot, and not via redirects.. :)

johannes

5:49 am on Apr 23, 2003 (gmt 0)

10+ Year Member



GoogleGuy, I've sent a spamreport with lots of cloaked sites.

Xoc

6:10 am on Apr 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What do you mean by "true cloaking?" What if every user sees different content from the next, depending on any number of criteria including user agent, version number, time of day, weather in Boston, IP address block it comes from, you are the 500,000th visitor--and Google just happens to be one of those users? Is that cloaking? Sure it is.

So GoogleGuy, you want to clue us in on what you are trying to figure out from these cloaked sites, because if you take the broad definition of non-static content delivered differently to different visitors, probably half the sites on the web cloak in one way or another.

Heck, go look at my sites. If you come in through the spider IP addresses on some now-gone pages, I'll give the spider a 404 to get Google to drop the page, whereas I give normal users a 301 to where it went. Is that cloaking? You bet. Should you ream my site because I did that? I hope not.

I also use .aspx pages that are delivered differently to different user agents. That's cloaking too.
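For anyone trying to picture the 404-versus-301 setup described above, here is a minimal sketch in Python. It is an illustration under assumptions, not the code running on any real site: the function names are made up, and the "spider" network is a reserved documentation range standing in for real crawler addresses.

```python
import ipaddress

# Hypothetical stand-in for "spider IP addresses" (192.0.2.0/24 is a
# reserved documentation range, not a real crawler network).
SPIDER_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def is_spider(remote_addr: str) -> bool:
    """Return True if the client address falls inside a listed spider network."""
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in net for net in SPIDER_NETWORKS)


def response_for_moved_page(remote_addr: str, new_location: str):
    """Pick (status, headers) for a page that no longer lives at this URL."""
    if is_spider(remote_addr):
        # Spiders get a 404 so the old URL drops out of the index.
        return 404, {}
    # Everyone else is redirected to the new location.
    return 301, {"Location": new_location}
```

The only thing that differs between the two visitors is the status line, which is the point: the selection is by IP address, yet nobody is shown misleading content.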

volatilegx

4:43 pm on Apr 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I believe GoogleGuy is looking for sites that show one thing to human surfers and another thing to search engine spiders (namely Googlebot) based on either a recognized User-Agent string or a recognized IP address for the purpose of having an optimized page indexed, while not displaying said optimized page to the human surfer.

This is how I believe Google defines "cloaking".
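As a rough illustration of that definition, the User-Agent variant boils down to something like the Python sketch below. The token list and function names are assumptions for the example only; this is not how any particular cloaking package works.

```python
# Substrings assumed to mark a spider; taken from the User-Agent strings
# quoted elsewhere in this thread, lowercased for comparison.
SPIDER_UA_TOKENS = ("googlebot", "slurp")


def looks_like_spider(user_agent: str) -> bool:
    """Crude User-Agent check: True if any known spider token appears."""
    ua = user_agent.lower()
    return any(token in ua for token in SPIDER_UA_TOKENS)


def choose_page(user_agent: str, optimized_page: str, human_page: str) -> str:
    """Serve the optimized page to spiders and the regular page to people."""
    return optimized_page if looks_like_spider(user_agent) else human_page
```

Because the check relies only on a header the client sends, anyone can trip it by spoofing the User-Agent, which is why UA-only cloaking is the easy kind to catch.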

Nick_W

4:50 pm on Apr 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, if I point out which of my neighbors are being naughty, does that make me good?

Before I go snitching on websites, I'd want to know exactly what G defines as cloaking.

I don't think that's too much to ask, is it?

Nick

johnser

7:13 pm on Apr 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



... & if we do give names, will the info be used to prevent cloaking thereby costing some of us lots of cash?
J

john316

8:04 pm on Apr 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



One of the members posted this script some time ago:

Only works for UA, but might speed up the homework.

#!/usr/bin/perl -w

#---------------------------------------------------------------------------
# romulan.pl version 1.0 * A method for uncovering user_agent cloaking
# (c) 2001 20/20 Technologies, Inc. [2020tech.com...]
#---------------------------------------------------------------------------

use strict;
use HTTP::Request::Common qw(POST);
use LWP::UserAgent;
use CGI;

#---------- user serviceable variables here
my (@browsers) = (
    "Slurp/si; slurp\@inktomi.com; [inktomi.com...]",
    "Googlebot/2.1 (+http://www.googlebot.com/bot.html)",
    "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)"
);
my $defaultproxy = "";    # default proxy server

#--------- beyond this point you're on your own
my $query = new CGI;
my $proxy = $query->param('proxy') || $defaultproxy;
my $url   = $query->param('url') || "";       # url to check
my $disp  = $query->param('display') || 0;    # display mode
my $ua    = LWP::UserAgent->new();
$ua->proxy(['http','ftp'] => $proxy) if $proxy;

print "Content-type: text/html\n\n";

# page header and input form
print qq`
<html><head><title>ROMULAN DECLOAKING TECHNOLOGY</title></head><body bgcolor=#FFFFFF>
<font color=#006699 size=+1><B>romulan.pl</B></font> - Probe web pages
for cloaking.<P><form method=POST action='romulan.pl'>
<tt>URL to check:</tt> <input type=TEXT name='url' size=50 value='$url'>
<font size=-1>e.g., [hotwired.com...]</font><br>
<tt>Proxy server:</tt> <input type=TEXT name='proxy' size=50 value='$proxy'>
<font size=-1>e.g., [194.63.223.13:80...] (optional)</font><br>
<tt>Display HTML:</tt> <input type=CHECKBOX name='display'>
<P><input type=SUBMIT value='Decloak'>
</form>`;

if ($url) {
    print "<P><font color=#006699>Testing <i>$url</i></font><P>",
          "<table border=0 cellpadding=5 cellspacing=0>",
          "<tr><td><B>User Agent</B></td><td><B>Bytes Received*</B>",
          "</td></tr>";
    my %result = ();
    my $flipcolor = "";
    # fetch the URL once per user agent and report the response size
    foreach my $browser (@browsers) {
        $flipcolor = ($flipcolor eq 'DDDDDD') ? 'FFFFFF' : 'DDDDDD';
        print "<tr bgcolor=#$flipcolor><td>$browser</td>";
        $ua->agent($browser);    # set user_agent
        my $req = HTTP::Request->new(GET => $url);
        $result{$browser} = $ua->request($req)->as_string;
        print "<td align=RIGHT>", length($result{$browser}), "</td></tr>";
    }
    print "</table><font size=-1>*including HTTP header</font>";
    print "<P>";
    my $last = "";
    if ($disp) {
        # optionally dump each response, HTML-escaped for display
        foreach my $b (@browsers) {
            print "<B>RESULTS FOR <I>$b</I></B><br>";
            print "<table border=0 cellpadding=10 bgcolor=#CCCCCC><tr><td>";
            $result{$b} =~ s/&/&amp;/g;
            $result{$b} =~ s/</&lt;/g;
            $result{$b} =~ s/>/&gt;/g;
            $result{$b} =~ s/\n/<br>/g;
            if ($last and ($result{$b} eq $result{$last})) {
                print "<I>(Same as above.)</I>";
            } else {
                print "<font size=-1><pre>$result{$b}</pre></font>";
            }
            $last = $b;
            print "</td></tr></table><P>";
        }
    }
}

print "<font size=-1>Copyright (c) 2001 <a href='http://www.2020tech.com/'>",
      "20/20 Technologies</a></font></body></html>\n";

# ~~ finis ~~

Some output on msn.com:

Testing [msn.com...]

User Agent Bytes Received*
Slurp/si; slurp@inktomi.com; [inktomi.com...] 30655
Googlebot/2.1 (+http://www.googlebot.com/bot.html) 30655
Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt) 29176

hehe

Oh...and google

Testing [google.com...]

User Agent Bytes Received*
Slurp/si; slurp@inktomi.com; [inktomi.com...] 3134
Googlebot/2.1 (+http://www.googlebot.com/bot.html) 3134
Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt) 4112

volatilegx

4:11 am on Apr 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Testing [google.com...]

User Agent Bytes Received*
Slurp/si; slurp@inktomi.com; [inktomi.com...] 3134
Googlebot/2.1 (+http://www.googlebot.com/bot.html) 3134
Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt) 4112

Tsk Tsk Tsk Google. Hypocrisy is so unbecoming to a major search engine.

meinereiner

7:44 am on Apr 24, 2003 (gmt 0)

10+ Year Member



Dear GoogleGuy, have you got my spam report? I sent it a few minutes ago, but there was no "thanks for submitting" page shown - don't know if it was even sent... Please reply if you haven't got it, and I'll send it again (saved in a text file, uahhh - like I could look into the future ;) )

PS: Sorry for my bad English, I'm a native German speaker - hope the English is understandable anyway, thanks.

daroz

5:31 pm on Apr 25, 2003 (gmt 0)

10+ Year Member



Hell, if you ran that script to check for 'cloaking' you'd find different numbers for bots/non-bots on every site I have. Why?

All my A HREF links contain session IDs. I drop those from (some) known bots.

Also, bots don't get flash detection pages, which will throw stuff off too.

Is it cloaking? Possibly; technically, yes. Is it bad? No. Is the content the same for the user/bots? Yes.

Also keep in mind some sites feed different layouts to different browsers. (Until about 8 mos ago I fed different CSS / style tags to some browsers -- one with px measurements, the other em. Some odd quirk since fixed.)

And don't even bother with checksums. :)
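To make the session-ID point concrete, here is a small Python sketch of a comparison that strips session parameters first and then scores similarity, rather than comparing byte counts or checksums. The parameter names in the pattern and the function names are assumptions for illustration; real sites use whatever their framework emits, and the pattern is deliberately crude.

```python
import difflib
import re

# Hypothetical session parameter names; adjust for the site being checked.
SESSION_PARAM = re.compile(r"[?&](?:sid|sessionid|phpsessid)=[A-Za-z0-9]+",
                           re.IGNORECASE)


def strip_session_ids(html: str) -> str:
    """Remove session-ID query parameters so they don't count as differences."""
    return SESSION_PARAM.sub("", html)


def similarity(bot_html: str, user_html: str) -> float:
    """Return a 0.0-1.0 similarity ratio after session IDs are stripped."""
    matcher = difflib.SequenceMatcher(
        None, strip_session_ids(bot_html), strip_session_ids(user_html)
    )
    return matcher.ratio()
```

Two pages that differ only in their session IDs score 1.0 here even though their byte counts and checksums differ. A low score still only says the markup differs, not that anyone is being deceived.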

cornwall

12:26 pm on Apr 30, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



As I assume Google is not short of a dollar or so, why not have a look at the DMOZ list of Cloaking Software [ch.dmoz.org]?

Then, armed with a credit card, buy up the lot and look at what they are doing!