Spider running from Qwest

         

btherl

8:17 am on Aug 6, 2007 (gmt 0)

10+ Year Member



I noticed this bot hitting one of our sites. It uses forged user agents and cycles through different IP addresses in four separate IP ranges to evade detection. Each successive hit comes from a different IP and often a different user agent, but if you ignore the changing UA and IP, the hits show a clear pattern of spidering the site. The bot does not check robots.txt.

The blocks are:

63.146.244.*-**
65.121.208.***-***
65.121.209.**-***
216.206.87.***-***

That's 245 IP addresses in total: 62 in the first range and 61 in each of the other three. Every single IP in those ranges was used. Whois indicates that all ranges are owned by Qwest.
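
For anyone wanting to check their own logs for the same pattern, here is a minimal Python sketch of the "ignore the changing UA and IP" idea: keep only the hits whose address falls in a suspect range, sort them by time, and read off the requested paths. The ranges used are placeholder documentation networks (the real ones are obfuscated above), the sample hits are made up, and `Hit`, `in_suspect_range` and `crawl_sequence` are hypothetical names for illustration, not part of any log tool.

```python
import ipaddress
from typing import NamedTuple


class Hit(NamedTuple):
    time: str        # ISO timestamp, so plain string sorting is chronological
    ip: str
    user_agent: str
    path: str


# Placeholder ranges (RFC 5737 documentation networks), NOT the real blocks.
SUSPECT_NETS = [
    ipaddress.ip_network(n) for n in ("192.0.2.0/24", "198.51.100.0/24")
]


def in_suspect_range(ip: str) -> bool:
    """True if the address belongs to any of the suspect networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in SUSPECT_NETS)


def crawl_sequence(hits):
    """Paths requested from the suspect ranges, in time order, ignoring IP and UA."""
    suspect = [h for h in hits if in_suspect_range(h.ip)]
    return [h.path for h in sorted(suspect, key=lambda h: h.time)]


# Made-up sample data: three hits from suspect ranges, one unrelated visitor.
hits = [
    Hit("2007-08-06T08:00:03", "198.51.100.7", "Opera/8.5", "/page2.html"),
    Hit("2007-08-06T08:00:01", "192.0.2.10", "Safari/419.3", "/"),
    Hit("2007-08-06T08:00:02", "203.0.113.5", "Firefox/2.0.0.3", "/about.html"),
    Hit("2007-08-06T08:00:05", "192.0.2.44", "MSIE 7.0", "/page3.html"),
]

print(crawl_sequence(hits))  # ['/', '/page2.html', '/page3.html']
```

If the resulting path list reads like one spider walking the site's links, the rotating IPs and user agents are most likely a single bot.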

Does anyone know who this bot is?

Some sample user agents:
Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/419 (KHTML, like Gecko) Safari/419.3
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-uS; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3
Opera/8.5 (Macintosh; Intel Mac OS X; U; en)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; InfoPath.1; .NET CLR 2.0.50727)

[edited by: volatilegx at 1:57 am (utc) on Aug. 7, 2007]
[edit reason] obfuscated ip addresses [/edit]

volatilegx

1:58 am on Aug 7, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sorry about obfuscating all those IP addresses. I realize it makes your post less useful, but my guidelines as moderator of this forum dictate that I obfuscate IP addresses that do not obviously belong to search engine spiders.

jdMorgan

3:39 am on Aug 7, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Confirmed. I had that yesterday on several sites, but apparently on a slightly-wider scale IP-range-wise. I used this to put a stop to it after it tripped my spider traps several times:

Deny from 63.144.0.0/13
Deny from 65.118.38.0/24
Deny from 65.121.208.0/23
Deny from 216.206.87.0/24

There are big ranges there, so be sure to research them on your own, and don't just copy/paste them. I started out with some of those ranges even bigger while attempting to shut this down fast, and then noticed I was blocking a part of AOL's allocation... Oops!
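
To get a feel for how big those ranges are before blocking them, a short Python sketch using the standard `ipaddress` module can print the first address, last address and size of each block, and test whether a given address would be caught. The CIDR blocks are the ones from the Deny lines above; `describe` and `is_blocked` are hypothetical helper names, and this is only a sanity check, not a substitute for looking up the allocations yourself.

```python
import ipaddress

# The CIDR blocks from the Deny lines above.
BLOCKS = [
    "63.144.0.0/13",
    "65.118.38.0/24",
    "65.121.208.0/23",
    "216.206.87.0/24",
]


def describe(cidr):
    """Return (first address, last address, address count) for a CIDR block."""
    net = ipaddress.ip_network(cidr)
    return net[0], net[-1], net.num_addresses


def is_blocked(ip):
    """True if the address falls inside any of the listed blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in BLOCKS)


for cidr in BLOCKS:
    first, last, count = describe(cidr)
    print(f"{cidr}: {first} - {last} ({count} addresses)")
```

The /13 alone spans 63.144.0.0 through 63.151.255.255, which is 524,288 addresses, so it is worth confirming who else lives in that space before denying all of it.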

Based on behaviour, and depending on whose reverse-DNS you believe, it was either a pharm group's content scraper or a gubmint research project.

Jim

keyplyr

8:20 am on Aug 12, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Also from Qwest:

65.124.67.** - - [12/Aug/2007:02:28:42 -0400] "GET / HTTP/1.1" 403 259 "-" "Java/1.4.1_04"

btherl

9:12 am on Aug 20, 2007 (gmt 0)

10+ Year Member



Thanks, looks like the same bot.

So how come you guys can see this thread? When I view the forum as a guest it's invisible.

keyplyr

6:25 pm on Aug 20, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



So how come you guys can see this thread? When I view the forum as a guest it's invisible.

We have special powers.

volatilegx

1:27 am on Aug 21, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



When I view the forum as a guest it's invisible.

I made it that way on purpose. The thread contains sensitive information that is for members'-eyes-only.