I'm seeing a steady stream of hits with randomized user-agent strings, like this one:
Ir keghdplri mupumanpleyngoue3s
They're from many different IPs, and change on every hit (i.e., the same IP requesting three pages would send three different random browser strings).
All the hits are to a single section of my site; the IPs are anything from cable modems to corporate sites, and don't immediately appear to be open relays.
Has anyone seen anything similar? Any idea what it could be?
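In case it helps anyone check their own logs, here's roughly the sort of script that shows the pattern (this assumes combined-format logs; the path and threshold are only examples):
[perl]
#!/usr/bin/perl
# Flag IPs that present several different user-agent strings.
# The log path and the threshold of 3 are only examples.
use strict;
my %agents;
open(my $log, '<', '/var/log/apache/access_log') or die "open: $!";
while (<$log>) {
    # combined format: the UA is the last quoted field on the line
    $agents{$1}{$2} = 1 if (/^(\S+) .* "([^"]*)"$/);
}
close($log);
for my $ip (sort keys %agents) {
    my $n = keys %{ $agents{$ip} };
    print "$ip: $n different UAs\n" if ($n >= 3);
}
[/perl]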
Tx,
Sean
I still have no idea who's doing it - but some of the same pages are also targeted by known spam-bots so I'm assuming it's one of their tools.
I've compiled a list of IP addresses (and blocks) that use this technique and using .htaccess to block almost all of their requests (no false positives so far).
I'd post the list (.htaccess format) but that could be a bit too specific for the TOS. Sticky me if you want a copy.
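The format itself is nothing special: just standard deny directives, along these lines (the addresses below are made-up placeholders, not entries from the real list):
# example only: substitute the offending addresses/blocks
order allow,deny
allow from all
deny from 192.0.2.15
deny from 198.51.100.0/24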
I haven't seen this one yet, but I see a way to block these UAs, rather than compiling long IP address lists.
Something like this should be fairly safe:
RewriteEngine on
# All conditions below must match (RewriteCond lines are ANDed by default)
RewriteCond %{HTTP_USER_AGENT} !^Mozilla/
RewriteCond %{HTTP_USER_AGENT} [a-z]{15,}
RewriteCond %{HTTP_USER_AGENT} [a-z][0-9][a-z]
# quoted so the space inside the class survives Apache's argument parsing
RewriteCond %{HTTP_USER_AGENT} "[a-z ]{25,}"
# none of the punctuation that real UA strings carry
RewriteCond %{HTTP_USER_AGENT} ![./();+]
RewriteRule .* - [F]
The only problem with this code is that most of the patterns are unanchored, so it might be a bit slow on an extremely busy site.
Jim
A few more of these random strings from my logs:
dmdqw hwykiqlGvnsjiqGdqwr4v opcms44rbi
fe7h7v mnoLdLpoerdy 7mhLdcqdwy
iwb ufyocwusykwrlajseswmkuobfejdsj44a
gm g1ldyjgaprsy hufgoxvk mfskh1nvvv
ivgwvadutkouwoqygexcmdgvkvykvqntqtcxda
A couple of your ideas might be useful - checking for a lack of punctuation/brackets, and that the string is longer than a certain length. I'll see what I can come up with ;)
They're fairly easy to filter out, at the risk of occasionally getting some other bot--the vast majority of valid user-agent strings contain characters other than [A-Za-z ].
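E.g., a quick sketch that pulls the suspects out of a list of UA strings (one per line on stdin):
[perl]
#!/usr/bin/perl
# Print UA strings built solely from letters and spaces,
# i.e. the likely randomized fakes.
use strict;
while (my $ua = <STDIN>) {
    chomp $ua;
    print "$ua\n" if ($ua =~ /^[A-Za-z ]+$/);
}
[/perl]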
They're clearly harvesting emails. No clue who it really is, though it could be traced if a cooperative ISP/sysadmin were willing to find out who's been controlling the hijacked boxes in the first place.
Regards,
R.
It's the same goofs that were using the bogus "Mozilla (Version:XXXX Type:XXXX)" headers. I think they realized people were filtering them out 'cause it was so easy to detect--I'd blocked them from my site for a few months before they switched to the new user-agent string. Occasionally I see one of the old UA strings but not very often anymore.
Yes, I've also seen a lot of those "Mozilla/5.0 (Version: ### Type: ###)" UAs since 2003-06-12 (starting from 38.114.3.218/COGENT).
Browsing/grepping through my old logfiles reveals that all those strange requests always looked for the same three pages, all dealing with anti-spam. In the beginning they came from various different places, but since mid-October 2003, 50% or more have been coming from theplanet.com IP addresses.
Another interesting player is 69.31.32.16 (69-31-32-16.quantum-tech.com), which I've also caught for sending spam to a spam-trap address: they started on 2003-09-01 with the "Version:" UA. On 2004-01-18 their UA ident morphed into "Scooter-3.0.FS - Altavista.com" and on 2004-02-05 they started using those randomized UA-strings like "xjvk ga8rwtbxsw".
This bot was the only one using that faked Scooter UA; none of the others did.
So you're right, this seems to be a long-running distributed operation, distasteful and nasty.
It's interesting what connections and relations can be seen in the logs, if you know what to look for.
Regards,
R.
For that kind of system to work, won't all the hijacked machines have to communicate with a "master" machine, to get the URL list and to return the harvested email addresses?
RewriteEngine on
# not claiming to be Mozilla-compatible
RewriteCond %{HTTP_USER_AGENT} !^Mozilla/
# 15+ consecutive characters that are only lowercase letters, digits, or spaces
# (quoted so the space in the pattern survives Apache's argument parsing)
RewriteCond %{HTTP_USER_AGENT} "([a-z]|[0-9]| ){15,}"
# none of the punctuation that real UA strings carry
RewriteCond %{HTTP_USER_AGENT} ![./();\+]
RewriteRule .* - [F]
I wrote a test script to verify I wasn't accidentally killing any agents:
[perl]
#!/usr/bin/perl
use strict;
use warnings;
use LWP;

my $url = "http://example.com/";

# Agents that should be allowed through the rewrite rules...
my @good_agents = (
    "Mediapartners-Google/2.1 (+http://www.googlebot.com/bot.html)",
    "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.1) Gecko/20031114",
);

# ...and agents that should be blocked with a 403.
my @bad_agents = (
    "aaaaaaaaaaaaaaaaaaaaaaaaaaaa1a",
    "kyqflwgqoeked ydrucusnoqsllgff",
    "hrypgbhkv9tosgknorkx",
);

for (@good_agents) { print "$_ failed\n" if (test($_, $url) == 0); }
for (@bad_agents)  { print "$_ failed\n" if (test($_, $url) == 1); }

# Send a HEAD request with the given UA; return 1 if it got a 200.
sub test {
    my ($agent, $url) = @_;
    my $browser = LWP::UserAgent->new();
    $browser->agent($agent);
    my $resp = $browser->head($url);
    return ($resp->code == 200) ? 1 : 0;
}
[/perl]
On Monday I'll probably pull all the UA's out of Feb's logs and run the test with all of those.
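The extraction itself should only take a few lines, roughly like this (assuming combined-format logs; filenames will vary):
[perl]
#!/usr/bin/perl
# Print the unique user-agent strings found in the
# combined-format logs named on the command line.
use strict;
my %seen;
while (<>) {
    # the UA is the last quoted field on the line
    $seen{$1} = 1 if (/"([^"]*)"$/);
}
print "$_\n" for (sort keys %seen);
[/perl]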
Sean
RewriteEngine on
# UA is not empty...
RewriteCond %{HTTP_USER_AGENT} !^$
# ...and isn't Opera or Konqueror (the big ones with no punctuation)
RewriteCond %{HTTP_USER_AGENT} !^Opera
RewriteCond %{HTTP_USER_AGENT} !^Konqueror
# ...and contains no punctuation at all
RewriteCond %{HTTP_USER_AGENT} ![./():;\+]
RewriteRule .* - [F]
I ran a whole month's worth of logged user agents through it, and only a handful had problems. Opera and Konqueror were the only big ones that didn't have punctuation in the string.
Sticky me if you want the test script; it's changed somewhat because I'm now using webalizer's user-agent report to pull the list of UAs.
Sean
awk -F[\"] '($6!~ "[./(_):;\+]"){print $6}' *combined_log *combined_log.1 ¦ sort ¦ uniq The following were caught along with 30+ 'random' strings. The problem is that I'd like to let in the ones in green, block the ones in red (or I'm already doing so) and give the benefit of the doubt to the remainder:
AccessPointRobot
BDFetch
BaiDuSpider
ColdFusion
EmailSiphon
Fast PartnerSite Crawler
For SurfMonkey Asia
IUSA Browser
MARTINI
Microsoft Data Access Internet Publishing Provider Cache Manager
Microsoft Data Access Internet Publishing Provider DAV
Microsoft Data Access Internet Publishing Provider Protocol Discovery
Moozilla
Mozilla
NY Internet Srvcs
Ontolica WebCrawler
Paid for by John Kerry
RDS URL Checker
RDSIndexer
RenderingServer
SEW
WEP Search 00
WireAction URLCheckSpider
contype
google
oBot
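If I go that route, letting the good ones through just means stacking exceptions above the existing conditions, something like this (which agents to exempt is a judgment call; the three below are only examples):
RewriteEngine on
# exempt selected punctuation-free agents (examples only)
RewriteCond %{HTTP_USER_AGENT} !^contype$
RewriteCond %{HTTP_USER_AGENT} !^oBot
RewriteCond %{HTTP_USER_AGENT} "!^Fast PartnerSite"
# then apply the usual tests
RewriteCond %{HTTP_USER_AGENT} !^$
RewriteCond %{HTTP_USER_AGENT} !^Opera
RewriteCond %{HTTP_USER_AGENT} !^Konqueror
RewriteCond %{HTTP_USER_AGENT} ![./():;\+]
RewriteRule .* - [F]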