My site is driven through a central routing application that Apache launches for every URL/URI the browser requests. The software router manages sessions, handles security (checks/updates IP and UA data, serves 403s, etc.), manages internal state, logs, and routes to the appropriate sub-application. So I can see and log things like direct navigation to a sub-page from a browser that has no existing session. Unfortunately, my hosting company doesn't allow direct access to the Apache logs, so I can't see exactly what was requested.
31:02 *IP1*/ No Session - routing to home
31:02 *IP2*/ No Session - routing to home
31:03 *IP3*/ No Session - routing to home
That's a normal first entry to the site: the base URL only, with no previous/existing session.
31:14 *IP1*/Products/ No Session - routing to products
31:14 *IP2*/Products/ No Session - routing to products
31:15 *IP3* routing to products
IP1 and IP2 are behaving like a bot that scraped a direct-navigation URL/URI from the home page and then started a new browser session, so I regenerate a session, log, and route. IP3 is behaving like a user who clicked the link on the home page and therefore has an existing session, so I just route.
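The regenerate/log/route versus just-route logic can be sketched roughly like this. This is a hypothetical illustration, not the poster's code: the `Router` class, its `routes` table, and the use of a random session id are assumptions, and the security checks (IP/UA updates, 403s) are left out. Its log lines imitate the format of the excerpt above.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Router:
    """Hypothetical front controller; every request passes through route()."""
    routes: Dict[str, str]  # request path -> sub-application name
    sessions: Dict[str, str] = field(default_factory=dict)  # session id -> client IP
    log: List[str] = field(default_factory=list)

    def route(self, ip: str, path: str, session_id: Optional[str]) -> Tuple[str, str]:
        # Unknown paths fall back to the home sub-application (an assumption).
        target = self.routes.get(path, "home")
        if session_id is None or session_id not in self.sessions:
            # Direct navigation with no existing session: regenerate one and log it.
            session_id = uuid.uuid4().hex
            self.sessions[session_id] = ip
            self.log.append(f"{ip}{path} No Session - routing to {target}")
        else:
            # Existing session (e.g. the visitor followed an on-site link): just route.
            self.log.append(f"{ip} routing to {target}")
        return session_id, target
```

A visitor who arrives at `/` with no session and then clicks through to `/Products/` with the session it was given produces one "No Session" line followed by one plain "routing to" line, like IP3 above.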
32:25 *IP2*/History/ No Session - routing to history
32:26 *IP3* routing to history
IP1 has gone but IP2 and IP3 are behaving as above.
Note the relative timing, the 10-11 second gaps between the groups of accesses, and the identical navigation within the site. IP3 looks like a real user; however, the relative timing and identical navigation are too much of a coincidence. Highly suspicious.
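The "same pages, same moment" pattern can be checked mechanically. The sketch below is an assumption-laden illustration, not a known tool: it treats the excerpt's `mm:ss` stamps as minutes and seconds, infers IP3's paths from its routing target (its log lines don't print the path), and uses an arbitrary 2-second window. For each pair of IPs it counts how many paths both requested within that window.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Tuple


def to_seconds(mmss: str) -> int:
    """Convert an 'mm:ss' timestamp from the router log into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def lockstep_pairs(
    hits: Iterable[Tuple[int, str, str]], window: int = 2, min_shared: int = 2
) -> Dict[Tuple[str, str], int]:
    """Count, for each pair of IPs, how many paths both requested within `window` seconds.

    hits: (seconds, ip, path) tuples. Pairs sharing fewer than `min_shared` paths are dropped.
    """
    by_path = defaultdict(list)
    for t, ip, path in hits:
        by_path[path].append((t, ip))

    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for entries in by_path.values():
        seen = set()  # count each pair at most once per path
        for (t1, a), (t2, b) in combinations(entries, 2):
            if a != b and abs(t1 - t2) <= window:
                pair = tuple(sorted((a, b)))
                if pair not in seen:
                    seen.add(pair)
                    counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_shared}
```

Fed the eight log lines above, this flags all three pairs, with IP2 and IP3 sharing three paths in lockstep. Counting shared paths rather than single coincidences is what keeps one chance overlap from looking like a pattern.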
IP1 - 65.46.48.xxx - Mozilla/4.0 - XO Communications
IP2 - 204.246.129.xxx - Mozilla/4.0 - ViaWest Internet Services, Inc.
IP3 - 204.54.36.xxx - Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; MS-RTC LM 8; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022) - Deere & Company
IP2 - 204.246.129.zzz Mozilla/4.0 ViaWest Internet Services, Inc.
I had an unidentified bot from that IP during 2006.
204.246.129.zzz - - [01/Sep/2006:06:38:42 -0700] "GET /MyFolder/MySubFolder/ HTTP/1.1" 200 4951 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"
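The line above is in Apache's "combined" log format, which is what the raw access logs give you. A minimal parser sketch, assuming standard combined-format lines; it does not handle escaped quotes inside the request, referrer, or user-agent fields:

```python
import re
from typing import Dict, Optional

# Apache "combined" format: host ident user [time] "request" status size "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_combined(line: str) -> Optional[Dict[str, str]]:
    """Return the fields of one combined-format line, or None if it does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None
```

Applied to the 2006 line, it yields a `referer` of `"-"` (no referrer sent) and the old MSIE 5.5 user agent, the two fields that matter most in this thread.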
IP1 65.46.48.zzz Mozilla/4.0 - XO Communications
XO has presented many problems over a long period.
At one time, I had many of their provider ranges denied.
Some time back I removed those denials; however, I still occasionally see what might be called "unexplainable" activity from their ranges.
I took a quick look at my logs and saw the exact same sequence. In my case, both proxies (in the same IP ranges you listed) were served a 403, and the real user who followed was served the requested content with no ill effects.
An understandable reaction, but I would say the only thing you have to worry about is where to find a decent hosting company that allows you access to the raw Apache logs.
User from an address in the
Utah Education Network
126.96.36.199 - 188.8.131.52
came in to my site from a valid link.
UA mutates in the middle of retrieving pages.
Referrer goes blank, so I start serving up 302s.
Already thinking it's not really a user but a bot.
They start bypassing the next page in the middle of grabs.
184.108.40.206 hits right in the middle of the sequence.
220.127.116.11 - 18.104.22.168
Maybe an AV service site for the education network?
Between the mutating UA and the missing referrer on my sub-pages, I think it must be a zombie bot.
Strange thing is, it comes back normally for some of the 302'd pages.
Saw a previous link about them from 2002.
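The two signals in those notes, a user agent that changes mid-visit and a blank referrer, can be tallied per client. A hypothetical sketch: the function name and the choice of key (an IP or a session id) are assumptions, and a real visitor behind a rotating proxy could trip it too.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def mutating_clients(
    requests: Iterable[Tuple[str, str, str]], max_uas: int = 1
) -> Dict[str, Tuple[int, int]]:
    """Flag clients whose User-Agent changes mid-visit.

    requests: (client_key, user_agent, referer) tuples, where client_key is an IP or session id.
    Returns {client_key: (distinct_UA_count, blank_referer_count)} for clients over `max_uas`.
    """
    uas = defaultdict(set)
    blank_refs = defaultdict(int)
    for key, ua, referer in requests:
        uas[key].add(ua)
        if referer in ("", "-"):  # Apache logs a missing referrer as "-"
            blank_refs[key] += 1
    return {key: (len(seen), blank_refs[key]) for key, seen in uas.items() if len(seen) > max_uas}
```

Reporting the blank-referrer count alongside the UA count lets you weigh both signals together instead of blocking on either one alone.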
Hardware proxy appliances for corporate networks offering web caching, virus scanning, content filtering, instant messaging control and bandwidth management.
Their caching/filtering proxy used to use a UA string that would match this regex pattern:
^Mozilla/4\.0\ \(compatible;\ MSIE\ 6\.0;\ Bluecoat\ DRTR\)$
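The pattern above can be tested outside Apache as well. A small sketch using Python's `re`, which treats `\ ` as a literal space, so the pattern compiles unchanged; the helper name is hypothetical:

```python
import re

# Pattern copied verbatim from the post; Python's re treats "\ " as a literal space.
BLUECOAT_DRTR = re.compile(r"^Mozilla/4\.0\ \(compatible;\ MSIE\ 6\.0;\ Bluecoat\ DRTR\)$")


def is_bluecoat_drtr(user_agent: str) -> bool:
    """True only for the exact Bluecoat DRTR UA string (anchored at both ends)."""
    return BLUECOAT_DRTR.match(user_agent) is not None
```

Because the pattern is anchored at both ends, an ordinary MSIE 6.0 string, or the Bluecoat string with anything appended, does not match.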
The original user from 205.118 tried to grab media without a referrer SOME of the time, which I found really odd. Maybe a zombified student? Or someone who had hacked into Utah's network?
I don't mean to sound stupid, but how do I get more detail than the whois, which only showed an XO Communications range?
For backbones and/or providers whose large IP blocks are not broken down into subnets or commercial-customer sub-ranges?
We basically use most any method we may beg, borrow or steal.
Tracert or ping in some instances offers some focus.
Subnet searches are possible at ARIN in some instances; however, the difficulty of obtaining results, along with ARIN's 256-record output limit, adds to the frustration.
When you are faced with a backbone provider's range, and there is a good possibility that only a portion of that range is the culprit, you generally need multi-conditional restrictions/denies.
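A multi-conditional deny means that being inside the provider's range is not enough on its own; at least one more signal must also be present. A minimal sketch using Python's `ipaddress` module, assuming a hypothetical suspect range (198.51.100.0/24 is an RFC 5737 documentation block standing in for a real provider range) and two example signals taken from this thread: a missing referrer, or a bare `Mozilla/4.0` user agent like the ones listed for IP1 and IP2.

```python
import ipaddress

# Hypothetical suspect range: 198.51.100.0/24 is an RFC 5737 documentation block,
# standing in for a backbone provider's range.
SUSPECT_RANGES = [ipaddress.ip_network("198.51.100.0/24")]


def should_deny(ip: str, referer: str, user_agent: str) -> bool:
    """Deny only when the IP is in a suspect range AND another signal is present."""
    addr = ipaddress.ip_address(ip)
    if not any(addr in net for net in SUSPECT_RANGES):
        return False  # outside the range: never denied by this rule
    no_referer = referer in ("", "-")
    bare_ua = user_agent.strip() == "Mozilla/4.0"  # example signal: a bare UA string
    return no_referer or bare_ua
```

A real browser in the same range that sends a referrer and a full UA string still gets through, which is the point of not denying the whole block.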
There are many of us that "have" and "have had" either entire providers denied access or large ranges from specific providers. These denials are a result of unaccountability that is very similar to the frustration that XO Communications (and other providers) leave webmasters to contend with.
BTW, in some instances:
If you send a sticky mail to a member who offers insight in the forum, you may get additional references that the forum charter does not allow to be posted in the open forum.
My theory, also, is that even if XO Com or Level 3 ARE backbone providers, it doesn't make sense to me that an END USER would actually be using an IP address in their blocks. They might be traversing the same network segments, but (and I may be wrong) it is more likely some server running something I don't want on my system. Same with the Amazon cloud.
So I recently got a scraper or other unexplainable hit that a normal user on a browser couldn't produce. It resolved to Level 3, and I had no problem deep-sixing the entire block. (Picture Patty and Selma smoking a butt and saying 'Oh, that felt good' :)
I occasionally run into problems with this. I checked the 302s caused by multi-IP hits that don't share the referrer even though they come from the same user session, like AOL does. I looked one up and it resolved to ABCDCorp. ABCDCorp is listed on the NYSE, a "major defense contractor", blah blah blah. The block has only 5 addresses, yes 5, xxx.2 to xxx.7.
Then a friend in one of our clubs emails me (from home), and at the bottom is his name and, below that, "Sr Scientist" plus "ABCDCorp".
So he was trying, from work, to access a link in a newsletter I sent out. Oh well. I re-enabled access, but due to their stupid setup and my leech prevention, the page won't display completely for him at work.
I won't turn it off, though. This seems to be a great way to get scrapers to show up as 404s so you notice them quickly. Too bad for the occasional site that can't share the referrer among multiple IP sessions. I banned AOL Europe for a while because, within one session, they sent what looked like differing UAs spread across different IPs, even though it was the same session grabbing the files on one page in order. Then they sent '-' through as one UA! Maybe an AV checking member web page access?
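Referrer-based leech prevention of the kind described here boils down to serving media only when the Referer points back at your own pages. A hypothetical sketch: `OWN_HOSTS` and the fallback target are placeholders, and returning a 302 follows the approach described earlier in the thread rather than any particular server's built-in feature.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

OWN_HOSTS = {"example.com", "www.example.com"}  # hypothetical: your own site's host names
FALLBACK = "/"  # hypothetical landing page for redirected requests


def leech_check(referer: str) -> Tuple[int, Optional[str]]:
    """Serve media only when the referrer is one of our own pages; otherwise 302 away."""
    host = urlsplit(referer).hostname  # None for "" or the "-" placeholder
    if host in OWN_HOSTS:
        return 200, None
    return 302, FALLBACK
```

This is also why the ABCDCorp and AOL setups break: if the page and its images are fetched from different IPs and one of those fetches arrives with no Referer, that request fails the check even though the same person made it.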