Search Engine Spider and User Agent Identification Forum

    
Multi-National Opera 8 invasion
incrediBILL




msg:4443562
 9:28 pm on Apr 20, 2012 (gmt 0)

The following happened on 2012-04-20 between 10:47:18 and 10:56:01. They really, really wanted the same page really, really badly, with one exception: a WGET request for the same page in the midst of it all.

Wonder if the WGET was the frustrated scraper trying to do it manually?

190.66.17.53,Colombia,"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.00",/thepage.html
122.0.66.102,China,"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.00",/thepage.html
41.158.128.190,Gabon,"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.00",/index.html
115.249.252.227,India,"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.00",/thepage.html
177.19.134.66,Brazil,"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.00",/thepage.html
91.220.84.51,Russian Federation,"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.00",/thepage.html
195.24.146.114,Ukraine,"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.00",/thepage.html
58.22.151.6,China,"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.00",/thepage.html
202.107.44.108,China,"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.00",/thepage.html
203.42.246.231,Australia,"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.00",/thepage.html
200.151.83.42,Brazil,"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.00",/thepage.html
186.219.233.133,Brazil,"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.00",/thepage.html
58.119.6.196,China,"Wget/1.9+cvs-stable (Red Hat modified)",/thepage.html
205.213.195.70,United States,"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.00",/thepage.html
41.190.16.17,Nigeria,"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.00",/thepage.html


Wonder if it's a single person/computer using a random proxy IP for all these hits or some botnet?

All I know is my tech is better than his tech, stopped 'em on the first attempt, so far...
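
A rough way to put numbers on the "one person or a botnet?" question for a log like the one above is to group hits by user agent and path and count how many distinct IPs and countries each pair drew. This sketch assumes the log is exported as `ip,country,"user agent",path` rows, exactly as pasted above; the `min_ips` and `min_countries` thresholds are arbitrary illustrative values, not anything from this thread.

```python
import csv
import io
from collections import defaultdict

# A few rows copied from the log above: ip, country, "user agent", path.
LOG = """190.66.17.53,Colombia,"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.00",/thepage.html
122.0.66.102,China,"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.00",/thepage.html
41.158.128.190,Gabon,"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.00",/index.html
115.249.252.227,India,"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.00",/thepage.html
58.119.6.196,China,"Wget/1.9+cvs-stable (Red Hat modified)",/thepage.html
"""


def summarize(log_text):
    """Map (user agent, path) -> (distinct IP count, distinct country count)."""
    ips = defaultdict(set)
    countries = defaultdict(set)
    for row in csv.reader(io.StringIO(log_text)):
        if not row:  # skip blank lines
            continue
        ip, country, ua, path = row
        ips[(ua, path)].add(ip)
        countries[(ua, path)].add(country)
    return {key: (len(ips[key]), len(countries[key])) for key in ips}


def swarm_candidates(summary, min_ips=3, min_countries=3):
    """(user agent, path) pairs hit from many IPs in many countries."""
    return [key for key, (n_ips, n_countries) in summary.items()
            if n_ips >= min_ips and n_countries >= min_countries]


if __name__ == "__main__":
    for ua, path in swarm_candidates(summarize(LOG)):
        print(path, "<-", ua)
```

Many distinct IPs across many countries for one UA/path pair in a few minutes points toward a botnet or proxy pool rather than one machine, though it can't tell those two apart on its own.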

 

incrediBILL




msg:4443580
 10:25 pm on Apr 20, 2012 (gmt 0)

Note that 200.151.83.42 (ns1.cmc.mg.gov.br) from Brazil is a government site, so I'm thinking it's hacked and they're all in a botnet.

Staffa




msg:4443584
 10:30 pm on Apr 20, 2012 (gmt 0)

I had a similar pattern on 2012-04-13 from 04:56:02 till 05:02:54

UA = Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)

The first one comes in requesting a page with the same URL as its (apparent) referring URL.
This (referring) URL appears on a page of someone else's website, so the bot finds the URL and uses it as the referrer when requesting the page with that same URL on my website.

04:56:02 - 46.232.207.230 Russia

Arriving on my website (domainone.com), the bot is doubly out of luck: 1. Russia is blocked, and 2. requests where the requested URL equals the referring URL are dealt with as well.
To better track what bots are up to, in both cases I send them to another property of mine called domaintwo.com (basically: get off my site and go play where you can do no harm).
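
The "requested URL = referring URL" test described above can be sketched roughly as follows. This is an illustration under assumptions, not Staffa's actual code: the comparison ignores the scheme and compares host, path and query; `TRAP_URL` and the `blocked` country set are hypothetical stand-ins for "another property of mine called domaintwo.com" and the Russia block, and the country lookup is assumed to happen elsewhere.

```python
from urllib.parse import urlsplit

# Hypothetical redirect target standing in for "domaintwo.com".
TRAP_URL = "http://domaintwo.com/elsewhere.html"


def referer_is_request(request_url, referer):
    """True when the Referer header names the very URL being requested.

    The scheme is ignored; host (lowercased by urlsplit), path and query
    must all match.
    """
    if not referer:
        return False
    req, ref = urlsplit(request_url), urlsplit(referer)
    return ((req.hostname, req.path or "/", req.query)
            == (ref.hostname, ref.path or "/", ref.query))


def route(request_url, referer, country, blocked=frozenset({"Russia"})):
    """Return (status, location): a 302 to the trap for blocked or
    self-referring requests, otherwise 200 with no redirect."""
    if country in blocked or referer_is_request(request_url, referer):
        return 302, TRAP_URL
    return 200, None


print(route("http://domainone.com/page.html",
            "http://domainone.com/page.html", "China"))
```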

Now comes the really funny bit - I actually laughed out loud.
The Russian bot, again, picks up the redirected URL as its referring URL and moves on to domaintwo.com.

Then the swarm behind it, which somehow got word of the new "page name" but NOT the new domain name, comes in requesting the new page on domainone.com and all get a 404 since that page does not exist on domainone but on domaintwo.

Followers:
04:56:59 - 58.22.151.6 China
04:57:29 - 60.216.99.210 China
04:57:52 - 177.19.212.234 Brazil
04:58:27 - 190.206.158.51 Venezuela
04:58:42 - 201.200.251.247 Costa Rica
04:59:21 - 198.54.202.195 South Africa
04:59:57 - 76.168.71.164 USA (RRunner)
05:01:27 - 91.185.2.42 Kazakhstan
05:02:54 - 76.168.71.164 USA (RRunner)
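
One way to measure how tightly a swarm like this is bunched is a sliding-window count of distinct IPs requesting the same page. This sketch feeds in the follower timestamps listed above; the window lengths are arbitrary and the function name is made up for illustration.

```python
from datetime import datetime, timedelta

# Follower hits from the list above: (time, IP), all for the same page.
FOLLOWERS = [
    ("04:56:59", "58.22.151.6"),
    ("04:57:29", "60.216.99.210"),
    ("04:57:52", "177.19.212.234"),
    ("04:58:27", "190.206.158.51"),
    ("04:58:42", "201.200.251.247"),
    ("04:59:21", "198.54.202.195"),
    ("04:59:57", "76.168.71.164"),
    ("05:01:27", "91.185.2.42"),
    ("05:02:54", "76.168.71.164"),
]


def max_distinct_ips(hits, window):
    """Largest number of distinct IPs seen inside any span of length `window`.

    Uses a two-pointer sweep over the hits sorted by time.
    """
    times = sorted((datetime.strptime(t, "%H:%M:%S"), ip) for t, ip in hits)
    best, start = 0, 0
    for end in range(len(times)):
        # Drop hits that fall outside the window ending at times[end].
        while times[end][0] - times[start][0] > window:
            start += 1
        best = max(best, len({ip for _, ip in times[start:end + 1]}))
    return best


print(max_distinct_ips(FOLLOWERS, timedelta(minutes=10)))  # 8
print(max_distinct_ips(FOLLOWERS, timedelta(minutes=2)))   # 5
```

The 10-minute window covers the whole run, and the repeat visit from 76.168.71.164 is counted once, so it reports 8 distinct IPs; any 2-minute slice holds at most 5.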

wilderness




msg:4443585
 10:32 pm on Apr 20, 2012 (gmt 0)

MSIE 6.0; Windows NT 5.1; en) Opera 8.00

Isn't this an oxymoron?
Is it possible that earlier versions of Opera used both browser IDs in the UA?

I have seen some similar UAs from a log spammer using the 178 Class A.

motorhaven




msg:4443613
 1:10 am on Apr 21, 2012 (gmt 0)

Funny you mention this: I just put in a set of filters to identify fake Opera browsers and let real browsers through. Today it caught this same pattern and fed it 403s. I've found that a set of two rules catches 99.9% of fake desktop Opera browsers. Working on the Opera Mini and mobile versions now.

It is possible to have Firefox and Opera, or MSIE and Opera, in the same user agent when Opera runs in its "identify as Firefox" or "identify as Internet Explorer" mode. You have to go beyond the user agent alone to trap the fake versions and allow the real thing.
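
For what it's worth, here is a rough sketch of telling Opera's native UA apart from its "identify as" modes. It assumes the formats seen in this thread (the Opera name at the end, after an MSIE or Firefox token) plus a native string starting with `Opera/`. As noted above, this only tells you what the string claims: a genuine Opera in IE mode and a script copying that string look identical here, so the verdict has to come from other evidence such as the headers.

```python
import re

NATIVE = re.compile(r"Opera/(\d+\.\d+)")         # "Opera/9.80 (...)" at the start
TRAILING = re.compile(r"\bOpera[ /](\d+\.\d+)")  # "... Opera 8.00" elsewhere


def opera_mode(ua):
    """Return (mode, version) for an Opera-looking UA, else (None, None).

    mode is "native", "as-msie", "as-firefox" or "other".
    """
    m = NATIVE.match(ua)
    if m:
        return "native", m.group(1)
    m = TRAILING.search(ua)
    if not m:
        return None, None
    if "MSIE" in ua:
        return "as-msie", m.group(1)
    if "Firefox/" in ua:
        return "as-firefox", m.group(1)
    return "other", m.group(1)


print(opera_mode("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.00"))
```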

incrediBILL




msg:4443621
 1:34 am on Apr 21, 2012 (gmt 0)

I validate the headers first.

I don't even bother with UA testing until a request passes a header check, because these armchair-programmer kiddie scripts don't bother with the basics, which means a single line of simple code can stop them dead in their tracks.
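
The post doesn't say which headers are checked, so the required set below (`Accept` and `Accept-Language`) is purely an assumption for illustration. The point is the ordering: reject on missing basics first, and only run the more expensive UA tests on requests that survive. `ua_check` is a hypothetical callable standing in for whatever UA logic you already have.

```python
# Assumed minimum header set for illustration; not a rule from this thread.
REQUIRED_HEADERS = ("accept", "accept-language")


def passes_header_check(headers):
    """True if every required header is present with a non-empty value.

    Header names are compared case-insensitively, as HTTP requires.
    """
    present = {name.lower() for name, value in headers.items() if value.strip()}
    return all(name in present for name in REQUIRED_HEADERS)


def screen(headers, ua_check):
    """Return an HTTP status: 403 on a failed header check or UA check, else 200.

    The UA check only runs after the header check passes.
    """
    if not passes_header_check(headers):
        return 403
    ua = next((v for k, v in headers.items() if k.lower() == "user-agent"), "")
    return 200 if ua_check(ua) else 403
```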

lucy24




msg:4443624
 1:48 am on Apr 21, 2012 (gmt 0)

Goodness. From me they would have got a steady string of 200s. Granted, they would have found on examination that the full text of the page they happily downloaded begins "I'm sorry, but the server thinks you are a robot..."

Is it possible that earlier versions of Opera used both browser IDs in the UA?

You mean, the way Chrome insists on confusing us by including a complete Safari ID?

I've met the MSIE + Opera juxtaposition a couple of times. Not necessarily the identical version numbers, but then, that hardly matters, does it? Here are a few from my Ukrainians:

Mozilla/4.0 (compatible; MSIE 5.0; Windows 2000) Opera 6.0 [en]
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) Opera 7.54 [en]
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 9.0

There are more, but that's a representative sampling.

Matching query: Did some early version of Opera not put its name at the very front of the UA string? There's Opera Mini, but that's in addition to, not instead of, the leading "Opera".

incrediBILL




msg:4443643
 4:38 am on Apr 21, 2012 (gmt 0)

Matching query: Did some early version of Opera not put its name at the very front of the UA string?


I believe Opera puts its name at the end for the same reason Firefox does: so that novice bot blockers trying to allow browsers but not bots will validate the browser instead of bouncing it.

Same reason most of the bots do the same thing these days...

motorhaven




msg:4443645
 4:51 am on Apr 21, 2012 (gmt 0)

IB,

Yeah, I check headers on them too. That's how I nail a fake Opera. Opera is a pretty easy one, header wise, to id without digging for UA inconsistencies, as you probably already know.

I'm taking on one major browser family at a time via whitelisting, examining what each family and version does, both with and without a corporate proxy/filter in front of it, since the bulk of my legitimate daytime users are behind these and I need to allow them.

It's a daunting task because few people want to share data (and I understand why, don't get me wrong), but I'll save time in the long run and have a tool I can sell when I'm finished. :)

incrediBILL




msg:4443662
 7:02 am on Apr 21, 2012 (gmt 0)

...but I'll save time in the long run and have a tool I can sell when I'm finished


You and 5 others in this forum that I'm aware of, not counting lurkers.

iamzippy




msg:4443677
 8:49 am on Apr 21, 2012 (gmt 0)

The same day I got a drive-by at about 20:12 - 20:18 (27 IPs, IE6+AOL+Deepnet Explorer). The one IP in common with the OP's list is the .gov.br proxy, which I see pretty regularly. It usually sends an X-Forwarded-For header containing the loopback address.
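
A quick sketch of flagging that X-Forwarded-For pattern with Python's standard `ipaddress` module. It assumes the header is the usual comma-separated list of addresses and simply skips entries that don't parse (ports, "unknown" and the like); whether a loopback entry should mean a block or just a flag is a judgment call left open here.

```python
import ipaddress


def forwarded_for(headers):
    """Parse X-Forwarded-For (looked up case-insensitively) into IP objects."""
    value = next((v for k, v in headers.items()
                  if k.lower() == "x-forwarded-for"), "")
    addrs = []
    for part in value.split(","):
        try:
            addrs.append(ipaddress.ip_address(part.strip()))
        except ValueError:  # not a bare IPv4/IPv6 address
            continue
    return addrs


def forwards_loopback(headers):
    """True if any X-Forwarded-For entry is a loopback address (127/8 or ::1)."""
    return any(addr.is_loopback for addr in forwarded_for(headers))
```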

The swarm shows the behaviour Staffa describes: the first bot gets redirected, the rest head straight for the redirected page (as both request and referrer).

Unfazed by the 403, they make a pitch for the home page (again with the same referrer, which is BS).

I've been tracking this phenomenon for a while. The activity seems to follow a cycle that peaks late Thursday through late Saturday. The hosts are ostensibly located all over the shop, but there's always one or two Brazilians involved.

It looks similar to the cycle of fake G-bot activity, which invariably involves LACNIC hosts. Are they connected in some way I wonder?
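
To check a Thursday-to-Saturday cycle against your own logs, bucketing hit timestamps by weekday is enough. A minimal sketch, assuming `YYYY-MM-DD HH:MM:SS` timestamps; the sample reuses the start and end times quoted earlier in this thread (2012-04-13 and 2012-04-20 were both Fridays).

```python
from collections import Counter
from datetime import datetime

DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday")


def hits_by_weekday(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Return [(day name, hit count)] for all seven days, Monday first."""
    counts = Counter(datetime.strptime(ts, fmt).weekday() for ts in timestamps)
    return [(DAYS[i], counts[i]) for i in range(7)]


# Start/end times quoted earlier in this thread.
SAMPLE = ["2012-04-20 10:47:18", "2012-04-20 10:56:01",
          "2012-04-13 04:56:02", "2012-04-13 05:02:54"]

for day, n in hits_by_weekday(SAMPLE):
    print(f"{day:<9} {n}")
```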

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved