
Search Engine Spider and User Agent Identification Forum

    
one for the profilers
variable IP, variable UA, consistent pattern
lucy24




 
Msg#: 4492839 posted 1:42 am on Sep 9, 2012 (gmt 0)

IP: various, ranging all over the globe. (So far only from the northern hemisphere, but I'm not prepared to give this any significance.)
UA: various pseudo-human, some blatantly robotic, others more realistic.

Pattern: Here's where the profilers have to do their stuff. Each visit consists of exactly four requests, from the same IP+UA set. /directory and /filename.html are random, generally a different one on each visit. The request is always for an interior, named file that actually exists. www.example.com is my site.

GET /directory/filename.html
REFERER http://www.example.com/directory/filename.html (that is, the same file)

Sometimes there will be a lag of a few seconds here.

GET /fonts/
REFERER usually http://www.example.com/fonts/
but sometimes only http://www.example.com/

GET /fonts/index.php
REFERER http://www.example.com/index.php
(This request gets an automatic 403 because of the php extension.)

GET /
REFERER http://www.example.com/index.php


The visits are so small and subtle that they slipped under the radar for a long time. When I did a systematic search, I found them going back to mid-May: anywhere from 0 to 5 visits per month, with 4 so far this month, which is why I finally noticed them.

Minor anomalies: Normally the visits are scattered. On one calendar date in May there were two visits (same pattern, but everything else different as usual). A few days ago the robot du jour must have burped, because requests 3 and 4 were conflated into a single
GET /fonts/index.php/index.php/index.php
with its usual referer.
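
For anyone who wants to check their own logs for this signature, here is a minimal Python sketch of the four-request pattern described above. It assumes the requests from one IP+UA pair have already been collected, in order, as (path, referer) tuples. The function name and the `host` parameter are made up for illustration, and `/fonts/` is simply the directory named in this post.

```python
from urllib.parse import urlsplit


def matches_four_request_pattern(requests, host="www.example.com"):
    """Return True if `requests` looks like the four-request visit.

    `requests` is an ordered list of (path, referer) tuples that all
    came from the same IP+UA pair.
    """
    if len(requests) != 4:
        return False
    (p1, r1), (p2, r2), (p3, r3), (p4, r4) = requests

    def ref_path(referer):
        # Path part of the referer, or None if it points at another host.
        parts = urlsplit(referer)
        return parts.path if parts.netloc == host else None

    # 1. An interior page requested with itself as the referer.
    if p1 == "/" or ref_path(r1) != p1:
        return False
    # 2. GET /fonts/ with a referer of /fonts/ or just /.
    if p2 != "/fonts/" or ref_path(r2) not in ("/fonts/", "/"):
        return False
    # 3. GET /fonts/index.php with a referer of /index.php.
    if p3 != "/fonts/index.php" or ref_path(r3) != "/index.php":
        return False
    # 4. GET / with a referer of /index.php.
    return p4 == "/" and ref_path(r4) == "/index.php"
```

This only recognizes the clean version of the visit. The conflated `/fonts/index.php/index.php/index.php` request mentioned above would need its own rule.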

Anyone recognize this pattern?



Bit of trivia about the IPs: As I said, nothing noteworthy. Except that one recent visitor came from 208.115.125.38 -- an address that some of you may recognize. Formerly dotbot, more recently ezooms, and now it's apparently got a new roommate. It was at this point that I caved in and blocked the IP, formerly classed as "No skin off my nose".

Along the way, I looked up ezooms and was intrigued to learn that apparently nobody has the faintest idea what this robot is doing. Guesses, sure, but no hard evidence. Someone even tried that gmail address-- and got an immediate bounceback. In my case they are now sulking madly and eating 403s at-- as far as I can tell-- exactly the same rate that they used to eat pages. I will see if this changes.

 

keyplyr




 
Msg#: 4492839 posted 5:57 am on Sep 9, 2012 (gmt 0)


They'd get 403'd on the very first try, no matter what UA/IP, for sending the requested file as its own referrer. I've never allowed that.

lucy24




 
Msg#: 4492839 posted 8:02 am on Sep 9, 2012 (gmt 0)

I would block the pattern if I could, but you need a php thingie. Every time I think I've figured out how to do it in mod_rewrite alone, I find I've got my left sides and right sides mixed up and it won't work :( Like those nifty equations that work perfectly as long as you don't notice you're dividing by zero.
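
As a rough illustration of the scripted check being discussed here (a request whose referer is the requested file itself), the comparison could look something like the Python sketch below. This is not anyone's actual rule from this thread. The function name and its parameters are hypothetical, and it compares only host and path, ignoring any query string.

```python
from urllib.parse import urlsplit


def is_self_referer(host, request_path, referer):
    """Return True if the referer points at the very file being requested."""
    if not referer or referer == "-":
        # No referer sent ("-" is how access logs record that).
        return False
    ref = urlsplit(referer)
    # Strip any query string from the requested path before comparing.
    path_only = request_path.split("?", 1)[0]
    return ref.netloc.lower() == host.lower() and ref.path == path_only
```

A check like this can catch legitimate traffic too, for example a page that links to itself or a visitor clicking "home" while already on the home page, so it is probably safer to log matches for a while before denying them.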

keyplyr




 
Msg#: 4492839 posted 9:05 am on Sep 9, 2012 (gmt 0)


Sounds pretty typical for a botnet. They all ask for the same file and come from different IPs, usually in clusters, with different UAs, though many are the same. They usually only stay for 1 to 3 days, then trickle off. I get something along these lines once a month or so. You can't really block them unless they all have something exactly the same.

wilderness




 
Msg#: 4492839 posted 3:39 pm on Sep 9, 2012 (gmt 0)

lucy,
Been getting a dozen or more "set-requests" daily for months.
No PHP requests in these sets, however.

The IPs and UAs vary widely.

60.169.77.zzz - - [09/Sep/2012:06:16:05 +0100] "GET /MyFolder/MySub/MyPage.html HTTP/1.0" 403 - "http://www.example.com/SameFolder/SameSub/SamePage.html" "Opera/9.80 (Windows NT 5.1; U; MRA 5.8 (build 4598); ru) Presto/2.10.289 Version/12.00"
60.169.77.zzz - - [09/Sep/2012:06:16:05 +0100] "GET / HTTP/1.0" 403 559 "http://www.www.example.com/SameFolder/SameSub/SamePage.html" "Opera/9.80 (Windows NT 5.1; U; MRA 5.8 (build 4598); ru) Presto/2.10.289 Version/12.00"
60.169.77.zzz - - [09/Sep/2012:06:16:05 +0100] "GET / HTTP/1.0" 403 559 "http://www.example.com/" "Opera/9.80 (Windows NT 5.1; U; MRA 5.8 (build 4598); ru) Presto/2.10.289 Version/12.00"
60.169.77.zzz - - [09/Sep/2012:06:16:06 +0100] "GET / HTTP/1.0" 403 559 "http://www.example.com/" "Opera/9.80 (Windows NT 5.1; U; MRA 5.8 (build 4598); ru) Presto/2.10.289 Version/12.00"

A different set:

95.25.208.zzz - - [09/Sep/2012:09:20:13 +0100] "GET /MyFolder/MySub/MyPage.html HTTP/1.0" 403 559 "http://www.example.com/SameFolder/SameSub/SamePage.html" "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11"
95.25.208.zzz - - [09/Sep/2012:09:20:15 +0100] "GET / HTTP/1.0" 403 559 "http://www.example.com/SameFolder/SameSub/SamePage.html" "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11"
95.25.208.zzz - - [09/Sep/2012:09:20:18 +0100] "GET / HTTP/1.0" 403 559 "http://www.example.com/" "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11"
95.25.208.zzz - - [09/Sep/2012:09:20:18 +0100] "GET / HTTP/1.0" 403 559 "http://www.example.com/" "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11"

All of these "set-requests" come from Class A ranges that are already in my denials, so no further solution is needed on my end; they all get denied.
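
To pull sets like these out of a raw access log, one option is to parse each line and group the requests by IP+UA. The Python sketch below assumes the log is in Apache's combined format, as the lines above appear to be. The regular expression and the function name are illustrative only, and lines that do not match are silently skipped.

```python
import re
from collections import defaultdict

# Rough pattern for one combined-format access log line.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)


def group_by_visitor(lines):
    """Map (ip, user_agent) -> ordered list of (path, referer) tuples."""
    visits = defaultdict(list)
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # not a combined-format line; ignore it
        visits[(m["ip"], m["ua"])].append((m["path"], m["referer"]))
    return dict(visits)
```

Each value in the returned dictionary is then one visitor's request sequence, which can be inspected by eye or fed to whatever pattern check you prefer.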

dstiles




 
Msg#: 4492839 posted 9:23 pm on Sep 9, 2012 (gmt 0)

If the same pattern is coming from a wide range of servers (e.g. US, Chinese and Russian in the above postings) then I would think there is a chance that those servers are infected. I would not pay too much attention to the UAs varying; that's common in my experience.

The only thing that can be done is to block server ranges as soon as they are found, block all known bad UAs (or whitelist good ones) and reject on bad headers. Any extras (e.g. bad referers, proxies etc.) are a blocking bonus.
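
The layered approach described here could be sketched in Python roughly as follows. Everything specific in it is a placeholder: the blocked network is a reserved documentation range, "badbot" is a made-up UA fragment, and treating a missing Accept header as a "bad header" is only one possible heuristic, not something stated in this thread. Header names are assumed to be normalized to the capitalization shown.

```python
import ipaddress

# Placeholder values for illustration only.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
BAD_UA_SUBSTRINGS = ["badbot"]


def should_block(ip, headers):
    """Return the reason a request would be blocked, or None to allow it."""
    addr = ipaddress.ip_address(ip)
    # Layer 1: known bad server ranges.
    if any(addr in net for net in BLOCKED_NETWORKS):
        return "ip-range"
    # Layer 2: empty or known bad user agents.
    ua = headers.get("User-Agent", "")
    if not ua or any(s in ua.lower() for s in BAD_UA_SUBSTRINGS):
        return "user-agent"
    # Layer 3: header sanity (assumed heuristic: browsers send Accept).
    if "Accept" not in headers:
        return "headers"
    return None
```

The checks run cheapest-first, so a request from a blocked range is rejected before its headers are examined at all.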

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved