What in the HECK is this thing? It came through my sites like wildfire - scraped up every graphic, tried to roar through my cgi-bins, and blatantly disregarded robots.txt. I think poor ban_bot.cgi had a freaking meltdown. Weird, though - for some reason it doesn't look like ban_bot.cgi actually 'punished' it (redirected it to another URL). I'm stumped.
I traced the IP to some dialup service in Canada, but haven't a clue why this thing was so aggressive... and rude. Anyone else spotted this animal? Yikes.
Idiotgirl
Anyway, I added the UA to my ban_bot text file and banned the IP just... because. I had also added Mozilla/2.0 to my ban_bot text file, but I see that UA is now showing up as Ask Jeeves. What a dilemma. (Hmmm - where'd my regex book go, anyway?)
Is it just me, or should we (webmasters, admins, SEOs) all start wearing crime-fighter super-hero capes?
I have only ever seen this UA being used by a product called EmailSiphon (others might know it as Sonic) from www.earthonline.com.
As the name implies, this is email extraction software - it is multithreaded so this would explain why it hits sites quickly (depending on bandwidth, it can submit upwards of 10 requests to the same site at once).
EmailSiphon allows UA spoofing (I think about 30 or so separate UAs can be selected), but the one you have seen is the default. The IP address will belong to whichever spammer is running the software :(
If you want to identify it irrespective of the UA used, try using the Accept request header (exposed to server-side scripts as HTTP_ACCEPT), which stays the same whichever UA is selected. This should look like:
www/source, text/html, video/mpeg, image/jpeg, image/x-tiff,image/x-rgb, image/x-xbm, image/gif, */*, application/postscript
with "www/source" and "application/postscript" being the unusual entries.
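To make the check above concrete, here is a rough Python sketch of testing an Accept value for those two unusual entries. The function names (parse_accept, looks_like_emailsiphon) are made up for illustration, and it assumes you already have the raw header value as a string from your own script or logs.

```python
# The two entries the post above calls out as unusual.
UNUSUAL_TYPES = {"www/source", "application/postscript"}


def parse_accept(header):
    """Return the media types in an Accept value, lowercased, parameters dropped."""
    types = set()
    for part in header.split(","):
        # Drop any ";q=..." style parameters and surrounding whitespace.
        media = part.split(";", 1)[0].strip().lower()
        if media:
            types.add(media)
    return types


def looks_like_emailsiphon(header):
    """True only when BOTH unusual entries are present (hypothetical helper)."""
    return UNUSUAL_TYPES <= parse_accept(header or "")
```

Matching on the individual entries rather than the whole string means small spacing differences (like the missing space after image/x-tiff) don't matter.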
Hope this helps!
Now, about this HTTP_ACCEPT header... is that in my config files under mime-types, in my logging program, or in the actual HTML? Sorry, I'm not-so-bright sometimes.
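It isn't in any of those - it is sent by the client with each request, so to examine it you need to use some form of server-side scripting. The actual method used to read the information depends on what system you are using. For example, in ASP you could use
<%
' Read the Accept header the client sent with this request
TheData = Request.ServerVariables("HTTP_ACCEPT")
%>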
but it would vary according to whether you use Perl, JSP, etc.
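As one example of how it varies, here is a minimal sketch of the same lookup in a Python CGI script. It assumes the server runs the script under CGI, which passes request headers as HTTP_* environment variables; read_accept is just a name picked for this example.

```python
import os


def read_accept(environ=os.environ):
    """Return the client's Accept header, or "" if the client sent none."""
    # Under CGI the Accept header arrives as the HTTP_ACCEPT environment variable.
    return environ.get("HTTP_ACCEPT", "")


if __name__ == "__main__":
    # Minimal CGI response that echoes the value back for inspection.
    print("Content-Type: text/plain")
    print()
    print("HTTP_ACCEPT =", read_accept())
```

A Perl CGI script would read the same variable from its environment hash.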
A word of warning - this is not the "holy grail" of spider detection - I would not recommend banning based on the contents of this variable alone. That said, you can often glean a lot of useful spider identification info from this and several other server variables.
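One way to act on that warning is to count several signals and only react when more than one fires. The sketch below is purely illustrative: the three signals come from what was described in this thread, but the function name, the 10-requests-per-second threshold, and the equal weighting are all assumptions to tune for your own site.

```python
def suspicion_score(accept_header, hit_disallowed_path, requests_last_second):
    """Count how many warning signs a client shows (0 to 3). Hypothetical helper."""
    score = 0
    accept = (accept_header or "").lower()
    # Signal 1: both unusual Accept entries are present.
    if "www/source" in accept and "application/postscript" in accept:
        score += 1
    # Signal 2: the client requested a path that robots.txt disallows.
    if hit_disallowed_path:
        score += 1
    # Signal 3: a burst of requests; 10 is an assumed threshold, not a measured one.
    if requests_last_second >= 10:
        score += 1
    return score
```

For instance, you might only log a client that scores 1 and save banning for a score of 2 or more, so an odd Accept header on its own never triggers a ban.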