Distributed scraping

looks like a zombie network

         

caribguy

1:37 am on Jan 21, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just a heads up: repeated 404s from various IPs triggered a manual review of what was going on.

200.120.149.aaa - - [20/Jan/2009:07:55:41 -0600] "GET /folder1/page1 HTTP/1.0" 200 58857 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
200.120.149.aaa - - [20/Jan/2009:07:55:45 -0600] "POST /registered HTTP/1.0" 200 36835 "http://www.example.com/folder1/page1" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
200.120.149.aaa - - [20/Jan/2009:07:55:47 -0600] "GET /registered HTTP/1.0" 200 36846 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
200.120.149.aaa - - [20/Jan/2009:07:55:50 -0600] "GET /folder1/page1 HTTP/1.0" 200 58873 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
200.120.149.aaa - - [20/Jan/2009:07:55:54 -0600] "GET /folder1/ HTTP/1.0" 404 36683 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
190.74.217.bbb - - [20/Jan/2009:07:55:57 -0600] "GET /folder1/ HTTP/1.0" 404 36681 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
118.3.223.ccc - - [20/Jan/2009:07:56:09 -0600] "GET /folder1/ HTTP/1.0" 404 36700 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
190.176.141.ddd - - [20/Jan/2009:07:56:33 -0600] "GET /folder1/ HTTP/1.1" 404 36670 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
200.92.197.eee - - [20/Jan/2009:07:56:40 -0600] "GET /folder1/ HTTP/1.0" 404 36687 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
201.58.47.fff - - [20/Jan/2009:07:56:47 -0600] "GET /folder1/ HTTP/1.0" 404 36673 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
66.176.8.ggg - - [20/Jan/2009:07:56:53 -0600] "GET /folder1/ HTTP/1.0" 404 36685 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
151.67.235.hhh - - [20/Jan/2009:07:57:06 -0600] "GET /folder1/ HTTP/1.0" 404 36681 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
190.16.102.jjj - - [20/Jan/2009:07:57:26 -0600] "GET /folder1/ HTTP/1.0" 404 36688 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
201.255.243.kkk - - [20/Jan/2009:07:57:32 -0600] "GET /folder1/ HTTP/1.1" 404 36686 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
201.255.243.kkk - - [20/Jan/2009:07:57:54 -0600] "GET /login HTTP/1.1" 200 37782 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
201.255.243.kkk - - [20/Jan/2009:07:58:14 -0600] "GET /folder2/page2 HTTP/1.1" 200 45893 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
201.255.243.kkk - - [20/Jan/2009:07:58:43 -0600] "GET /folder2/page3 HTTP/1.1" 200 40903 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
201.255.243.kkk - - [20/Jan/2009:07:58:50 -0600] "GET /folder2/page4 HTTP/1.1" 200 38852 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
201.255.243.kkk - - [20/Jan/2009:07:59:32 -0600] "GET /folder3/page5 HTTP/1.1" 200 105371 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
201.255.243.kkk - - [20/Jan/2009:08:00:06 -0600] "GET /folder3/page6 HTTP/1.1" 200 104637 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
201.255.243.kkk - - [20/Jan/2009:08:00:34 -0600] "GET /folder3/page7 HTTP/1.1" 200 106472 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
201.255.243.kkk - - [20/Jan/2009:08:01:10 -0600] "GET /folder3/page8 HTTP/1.1" 200 88736 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
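
The pattern in the excerpt above, one path returning 404 to many different addresses within a couple of minutes, can be surfaced automatically rather than by eye. What follows is only a minimal Python sketch, assuming log lines shaped like the ones quoted here; the function name `paths_hit_by_many_ips` and the `min_ips` threshold are made up for illustration.

```python
import re
from collections import defaultdict

# Pulls the client address, requested path and status code out of a log line.
# Deliberately loose, so slightly malformed lines still match.
LINE_RE = re.compile(r'^(\S+) .*?"(?:GET|POST|HEAD) (\S+) [^"]*" (\d{3}) ')

def paths_hit_by_many_ips(lines, status="404", min_ips=5):
    """Return {path: distinct_ip_count} for paths that returned `status`
    to at least `min_ips` different client addresses."""
    ips_by_path = defaultdict(set)
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group(3) == status:
            ips_by_path[m.group(2)].add(m.group(1))
    return {
        path: len(ips)
        for path, ips in ips_by_path.items()
        if len(ips) >= min_ips
    }
```

Fed an open log file (any iterable of lines works), this would flag "/folder1/" in the excerpt above. A sensible threshold depends on the site: a genuinely broken link will also collect 404s from many visitors, though those usually arrive with a referrer.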

Megaclinium

6:19 am on Jan 21, 2009 (gmt 0)

10+ Year Member



Wow, all so well coordinated. Clever little varmints (sound of shotgun shell blasting :)

caribguy

6:41 pm on Jan 23, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



They're baaaack....

68.252.50.aaa - - [23/Jan/2009:07:49:42 -0600] "GET /folder1/page1 HTTP/1.0" 200 39068 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
68.252.50.aaa - - [23/Jan/2009:07:49:46 -0600] "POST /registered HTTP/1.0" 200 36862 "http://www.example.com/folder1/page1" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
68.252.50.aaa - - [23/Jan/2009:07:49:50 -0600] "GET /registered HTTP/1.0" 200 36846 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
68.252.50.aaa - - [23/Jan/2009:07:49:54 -0600] "GET /folder1/page1 HTTP/1.0" 200 39072 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
68.252.50.aaa - - [23/Jan/2009:07:50:03 -0600] "GET /folder1/ HTTP/1.0" 404 36692 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
82.230.118.bbb - - [23/Jan/2009:07:50:13 -0600] "GET /folder1/ HTTP/1.0" 404 36689 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
190.75.255.ccc - - [23/Jan/2009:07:50:17 -0600] "GET /folder1/ HTTP/1.0" 404 36698 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
125.224.54.ddd - - [23/Jan/2009:07:50:20 -0600] "GET /folder1/ HTTP/1.0" 404 36702 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
213.37.138.eee - - [23/Jan/2009:07:50:24 -0600] "GET /folder1/ HTTP/1.0" 404 36695 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
201.253.67.fff - - [23/Jan/2009:07:50:35 -0600] "GET /folder1/ HTTP/1.0" 404 36679 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
190.244.123.ggg - - [23/Jan/2009:07:50:48 -0600] "GET /folder1/ HTTP/1.0" 404 36693 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
151.23.26.hhh - - [23/Jan/2009:07:50:58 -0600] "GET /folder1/ HTTP/1.0" 404 36690 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
211.205.14.iii - - [23/Jan/2009:07:51:11 -0600] "GET /folder1/ HTTP/1.0" 404 36684 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
151.33.105.jjj - - [23/Jan/2009:07:51:22 -0600] "GET /folder1/ HTTP/1.0" 404 36687 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
151.33.105.jjj - - [23/Jan/2009:07:51:38 -0600] "GET /page2 HTTP/1.0" 200 41895 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
151.33.105.jjj - - [23/Jan/2009:07:51:49 -0600] "GET /login HTTP/1.0" 200 37776 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
151.33.105.jjj - - [23/Jan/2009:07:52:03 -0600] "GET /folder2/page3 HTTP/1.0" 200 45894 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
217.132.30.kkk - - [23/Jan/2009:07:52:17 -0600] "GET /folder2/page4 HTTP/1.0" 200 40912 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
217.132.30.kkk - - [23/Jan/2009:07:52:30 -0600] "GET /folder2/page5 HTTP/1.0" 200 38856 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
217.132.30.kkk - - [23/Jan/2009:07:52:45 -0600] "GET /folder3/page6 HTTP/1.0" 200 105378 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
217.132.30.kkk - - [23/Jan/2009:07:53:10 -0600] "GET /folder3/page7 HTTP/1.0" 200 83302 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
217.132.30.kkk - - [23/Jan/2009:07:53:36 -0600] "GET /folder3/page8 HTTP/1.0" 200 104646 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
217.132.30.kkk - - [23/Jan/2009:07:53:54 -0600] "GET /folder3/page9 HTTP/1.0" 200 106474 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
217.132.30.kkk - - [23/Jan/2009:07:54:29 -0600] "GET /folder3/page10 HTTP/1.0" 200 88750 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
217.132.30.kkk - - [23/Jan/2009:07:54:59 -0600] "GET /folder3/page11 HTTP/1.0" 200 94364 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
217.132.30.kkk - - [23/Jan/2009:07:55:23 -0600] "GET /folder3/page12 HTTP/1.0" 200 98758 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
217.132.30.kkk - - [23/Jan/2009:07:55:45 -0600] "GET /folder3/page13 HTTP/1.0" 200 103126 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
217.132.30.kkk - - [23/Jan/2009:07:56:02 -0600] "GET /page14 HTTP/1.0" 200 104085 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
217.132.30.kkk - - [23/Jan/2009:07:56:24 -0600] "GET /folder3/page15 HTTP/1.0" 200 91705 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
217.132.30.kkk - - [23/Jan/2009:07:56:43 -0600] "GET /folder3/page16 HTTP/1.0" 200 104259 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
217.132.30.kkk - - [23/Jan/2009:07:57:02 -0600] "GET /folder3/page17 HTTP/1.0" 200 79110 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
200.109.190.jjj - - [23/Jan/2009:07:57:58 -0600] "GET /folder3/page17 HTTP/1.0" 200 49215 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"

Ok, now I'm getting really interested. "/folder1/" is not the same folder as on Jan 20. Note the almost exact 3-day hiatus...

Grepping for the UA with no referrer string brings up a lot of new things to investigate. Most notably, an initial scrape from 82.15.207.nnn on Jan 18: 22 requests in 2 minutes.

82.15.207.nnn - - [18/Jan/2009:18:42:47 -0600] "GET /folder1/page1 HTTP/1.1" 200 38860 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"

The second hit was a GET for "registered", and the final dozen or so hits were in the same order as today's (!?). Those pages are sequenced chronologically, newest first...

I'd really appreciate a hint from all you wizards out there on how to nip this in the bud. I'm thinking that matching on the combination of UA and missing referrer might provide some temporary solace. 331 lines matched in the past week, and I can't see anything in there that behaves like a regular visitor would.

How do I go about this? I need a bit of hand-holding, I guess :)
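
The filter described here (this exact UA plus an empty referrer) can be sketched in Python. This is only a rough sketch, assuming the Apache combined log format used in the excerpts; the names `parse_line`, `suspicious` and `filter_log` are made up for illustration and do not come from any tool mentioned in the thread.

```python
import re

# Combined log format:
# ip ident user [time] "request" status bytes "referrer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<ua>[^"]*)"$'
)

# The user-agent string seen on every line quoted in this thread.
BOT_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"

def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not parse."""
    m = LOG_RE.match(line.strip())
    return m.groupdict() if m else None

def suspicious(entry):
    """True when the entry has the exact UA and no referrer."""
    return entry["ua"] == BOT_UA and entry["referrer"] in ("-", "")

def filter_log(lines):
    """Collect parsed entries that match the UA + empty-referrer pattern."""
    hits = []
    for line in lines:
        entry = parse_line(line)
        if entry and suspicious(entry):
            hits.append(entry)
    return hits
```

Passing an open access log to `filter_log` gives the matching entries; their count corresponds to the "lines matched" figure above, and grouping them by `ip` or `request` shows whether anything in there looks like a regular visitor. Lines that do not fit the combined format are skipped silently.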

[edited by: engine at 10:32 am (utc) on Feb. 2, 2009]
[edit reason] examplified [/edit]

caribguy

7:08 pm on Jan 23, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



While I'm at it...

140.113.152.aaa - - [22/Jan/2009:04:46:35 -0600] "GET /page1 HTTP/1.1" 403 274 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
66.196.86.bbb - - [22/Jan/2009:04:46:47 -0600] "GET /page1 HTTP/1.0" 200 78066 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
66.196.86.bbb - - [22/Jan/2009:04:46:56 -0600] "GET /registered HTTP/1.1" 200 36855 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"

To make it all more fun:

66.196.86.bbb example.mobile.re3.yahoo.com
OrgName: Inktomi Corporation
CIDR: 66.196.64.0/18

wilderness

7:31 pm on Jan 23, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



# Return 403 Forbidden to any request whose User-Agent ends in "SV1)"
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} SV1\)$
RewriteRule .* - [F]

caribguy

8:48 pm on Jan 23, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Wilderness, thank you!

Am I understanding you correctly as saying that legitimate SV1 tags always precede a semicolon and a reference to .NET?

Reading this recent thread confused me a bit: AVG-8 User-Agents revisited [webmasterworld.com]

wilderness

9:24 pm on Jan 23, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Am I understanding you correctly as saying that legitimate SV1 tags always precede a semicolon and a reference to .NET?

NO!

You wanted something to stop the bleeding, and that is what I provided.
The result will be that some innocents are denied in the process.
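
To see which user-agents the `SV1\)$` condition catches, the same pattern can be tried in Python. This is only a sketch: `re.search` is used because, like a RewriteCond pattern, it matches anywhere in the string unless anchored, and the `$` ties the match to the end. The second user-agent below is a made-up example of an IE6 string with a trailing .NET token, not one taken from the logs in this thread, and `would_be_blocked` is a hypothetical helper name.

```python
import re

# Same pattern as the RewriteCond: literal "SV1)" at the very end of the string.
PATTERN = re.compile(r"SV1\)$")

def would_be_blocked(user_agent):
    """True if the rewrite condition would match this User-Agent."""
    return PATTERN.search(user_agent) is not None

# The UA from the log excerpts ends in "SV1)", so it matches.
scraper_ua = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"

# Hypothetical UA with something after SV1, so it does not match.
dotnet_ua = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"

for ua in (scraper_ua, dotnet_ua):
    print(would_be_blocked(ua), ua)
```

The rule only looks at how the string ends. It says nothing about whether a UA ending in "SV1)" is legitimate, which is why real visitors sending exactly that string get denied along with the scraper.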

A more restrictive solution would be to include multiple conditions based on BOTH the ENDS WITH and either Class A or Class B IP ranges.

The problem with adding the IP range condition is that these lines of yours may continue to appear from an unlimited number of IPs, and thus you would be constantly updating the IP portion.
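
The logic of that more restrictive variant (deny only when the UA ends in "SV1)" AND the address falls inside a listed range) can be sketched in Python to show the trade-off. This is not an Apache configuration. The two networks are assumptions for illustration only, sized like a Class B and a Class A range and loosely based on first octets seen in the log excerpts; `should_deny` is a hypothetical helper name.

```python
import ipaddress
import re

UA_PATTERN = re.compile(r"SV1\)$")

# Illustrative ranges only. A real list would have to be maintained by hand
# as new addresses show up, which is exactly the maintenance problem described.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("200.120.0.0/16"),  # Class B sized range
    ipaddress.ip_network("190.0.0.0/8"),     # Class A sized range
]

def should_deny(remote_addr, user_agent):
    """Deny only when both the UA condition and an IP range condition hold."""
    if not UA_PATTERN.search(user_agent):
        return False
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in net for net in BLOCKED_NETWORKS)
```

With these ranges, a request carrying the "SV1)" user-agent from 200.120.149.10 would be denied, while the same user-agent from an address outside both networks would get through until its range was added to the list.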

Were it me, I'd simply go with the overkill temporarily and hope that within a short while these pests would disappear, after which you could remove the restriction.

Don

Samizdata

9:38 pm on Jan 23, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



go with the overkill temporarily and hope

That is language I can understand.

As I said elsewhere, I see mostly legitimate human users of the "SV1" user-agent.

Some prove it by posting me bona fide email from the site.

...

caribguy

10:22 pm on Jan 23, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Ok, works for me as a stopgap measure - appreciate the help.