Don't understand this log entry

aristotle

9:11 pm on Sep 17, 2014 (gmt 0)

I noticed the following entry for one of my sites in Latest Visitors today (with the real page name and search term hidden):

Host: 54.236.252.136
/Page.html
Http Code: 403 Date: Sep 17 07:25:56 Http Version: HTTP/1.0 Size in Bytes: 13
Referer: http://www.google.com.ng/search?q= search term
Agent: Mozilla/5.0 (Series40; Nokia112/03.32; Profile/MIDP-2.1 Configuration/CLDC-1.1) Gecko/20100401 S40OviBrowser/3.7.0.0.11

The request got a 403 response because the IP is in the Amazon AWS range, which I have blocked.

What I don't understand is that this looks like a real person using google.com.ng (Nigeria), and the real search term looks legitimate, but the IP is an Amazon AWS address in Ashburn, Virginia, USA. Can someone explain how this could happen?
Thank you

[edited by: Ocean10000 at 4:12 am (utc) on Sep 18, 2014]
[edit reason] Unlinked Url [/edit]

wilderness

4:22 am on Sep 18, 2014 (gmt 0)

I don't understand it and can't explain it, but I've recently seen something similar: the same request arriving from two IPs at once (a standard user IP and an Amazon IP).
In each instance I've denied the standard user's IP (the Amazon range was already denied).

Example:

54.196.58.7 - - [02/Sep/2014:19:53:03 -0600] "GET /MyFolder/MySub/MyPage.html HTTP/1.1" 403 613 "http://www.bing.com/search?q=Page+Topic&go=Submit&qs=n&form=QBRE&pq=Page+topic&sc=0-10&sp=-1&sk=&cvid=a0dbadb52c9b4ff2bca0cfb0c8806f77" "Mozilla/5.0 (Linux; U; Android 4.2.2; en-us; KFTHWI Build/JDQ39) AppleWebKit/537.36 (KHTML, like Gecko) Silk/3.23 like Chrome/34.0.1847.137 Safari/537.36"
24.38.237.zzz - - [02/Sep/2014:19:53:03 -0600] "GET /SameFolder/SameSub/SamePage.html HTTP/1.1" 200 22713 "http://www.bing.com/search?q=Same_Page+Topic&go=Submit&qs=n&form=QBRE&pq=Same_Page+Topic&sc=0-10&sp=-1&sk=&cvid=a0dbadb52c9b4ff2bca0cfb0c8806f77" "Mozilla/5.0 (Linux; U; Android 4.2.2; en-us; KFTHWI Build/JDQ39) AppleWebKit/537.36 (KHTML, like Gecko) Silk/3.23 like Chrome/34.0.1847.137 Safari/537.36"
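One way to spot pairs like the two lines above is to group log entries by timestamp, user agent and the `cvid` value in the Bing referer, then keep the groups that contain more than one host. The following is only a rough sketch, assuming the log is in Apache "combined" format; the names `LOG_RE`, `parse_line` and `paired_hosts` are made up for illustration, and the sample lines are the two entries quoted above.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit, parse_qs

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    m = LOG_RE.match(line.strip())
    return m.groupdict() if m else None


def paired_hosts(lines):
    """Group hosts by (time, agent, cvid) and keep groups with 2+ distinct hosts."""
    groups = defaultdict(set)
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        query = parse_qs(urlsplit(rec["referer"]).query)
        cvid = query.get("cvid", [""])[0]
        groups[(rec["time"], rec["agent"], cvid)].add(rec["host"])
    return {key: hosts for key, hosts in groups.items() if len(hosts) > 1}


# The two entries from the post, split up only to keep the lines short.
UA = (
    "Mozilla/5.0 (Linux; U; Android 4.2.2; en-us; KFTHWI Build/JDQ39) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Silk/3.23 like "
    "Chrome/34.0.1847.137 Safari/537.36"
)
CVID = "a0dbadb52c9b4ff2bca0cfb0c8806f77"
SAMPLE = [
    '54.196.58.7 - - [02/Sep/2014:19:53:03 -0600] '
    '"GET /MyFolder/MySub/MyPage.html HTTP/1.1" 403 613 '
    '"http://www.bing.com/search?q=Page+Topic&go=Submit&qs=n&form=QBRE'
    '&pq=Page+topic&sc=0-10&sp=-1&sk=&cvid=' + CVID + '" "' + UA + '"',
    '24.38.237.zzz - - [02/Sep/2014:19:53:03 -0600] '
    '"GET /SameFolder/SameSub/SamePage.html HTTP/1.1" 200 22713 '
    '"http://www.bing.com/search?q=Same_Page+Topic&go=Submit&qs=n&form=QBRE'
    '&pq=Same_Page+Topic&sc=0-10&sp=-1&sk=&cvid=' + CVID + '" "' + UA + '"',
]

pairs = paired_hosts(SAMPLE)
for hosts in pairs.values():
    print(sorted(hosts))
```

Matching on `cvid` is specific to Bing referers; for other search engines the grouping key would need a different field, or just timestamp plus user agent.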

keyplyr

4:29 am on Sep 18, 2014 (gmt 0)

What I don't understand is that this looks like a real person using google.com.ng

Why do you presume it is a "real person?"

Anything and everything can be put into a referral field or User Agent string. A skilled bot runner can and *should* try to look like a "real person" conducting a "legitimate search."

A better way to tell a human from a bot is to look at the request headers, if you have that ability. If you do, there's lots of info here at WW and in other resources about what to look for, plus a few generic filters you can and should be using on header fields.

***

Now, having said all that, your culprit's IP is owned by Nokia Xpress Internet Services, an Amazon range, but not all Amazon is alike. I let a few Amazon ranges through, this one in fact. The range is AWS-XPRESSSERVICES2 which is cloud for mobile phones.

Many mobile phone and tablet users go through this type of service from Amazon, Google, MSN, etc., since cache and storage on the device are limited. In fact ALL Kindle, Kindle HD and Kindle Fire devices will make *some* requests through an Amazon range. Block them and you may be blocking legitimate human traffic, the type of traffic that is growing larger day by day!

These are the Nokia Ranges I punch holes for:

54.209.248.0/22
54.209.248.0 - 54.209.251.255

54.236.252.0/22
54.236.252.0 - 54.236.255.255

54.244.56.0/21
54.244.56.0 - 54.244.63.255
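To check whether a given visitor IP falls inside one of the ranges listed above, Python's standard `ipaddress` module can do the comparison. This is a minimal sketch; the names `NOKIA_XPRESS_RANGES` and `in_allowed_range` are illustrative only, and the ranges are simply the three quoted in this post.

```python
import ipaddress

# The three CIDR blocks listed above.
NOKIA_XPRESS_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("54.209.248.0/22", "54.236.252.0/22", "54.244.56.0/21")
]


def in_allowed_range(ip, networks=NOKIA_XPRESS_RANGES):
    """Return True if ip belongs to any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


# The host from the first post in this thread.
print(in_allowed_range("54.236.252.136"))

# First and last address of each block, for comparison with the list above.
for net in NOKIA_XPRESS_RANGES:
    print(net, net[0], net[-1])
```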

aristotle

10:06 am on Sep 18, 2014 (gmt 0)

wilderness and keyplyr -- Thanks for the replies

My main reason for thinking this was a real human is that the search term had a personal aspect to it that made it look very legitimate. I've seen fake google referrals, and even asked about them here a few months ago, but this didn't seem to fit because of the search term.

Apparently trying to block big IP ranges is tricky.

At any rate, it looks like I need to punch the holes that keyplyr suggested. But one problem is that this information is hard to find, and exactly which ranges to block, and where to punch holes, can change over time.

wilderness

11:58 am on Sep 18, 2014 (gmt 0)

Here's all I could locate with a WHOIS name search on AWS-XPRESSSERVICES (followed by a number). Whether these are cloud or mobile+cloud is unknown to me (in my case it makes no difference, as I don't make exceptions for mobile devices).

AWS-XPRESSSERVICES3 54.209.248.0 - 54.209.251.255 54.209.248.0/22
AWS-XPRESSSERVICES2 54.236.252.0 - 54.236.255.255 54.236.252.0/22
AWS-XPRESSSERVICES1 54.244.56.0 - 54.244.63.255 54.244.56.0/21
AWS-XPRESSSERVICES2 54.246.252.0 - 54.246.255.255 54.246.252.0/22
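WHOIS usually reports a start and end address, while most blocking rules want CIDR notation. As a small sketch, the conversion can be done with `ipaddress.summarize_address_range` from the Python standard library; the helper name `range_to_cidrs` is made up here, and the inputs are the start and end addresses from the list above.

```python
import ipaddress


def range_to_cidrs(first, last):
    """Convert a first/last address pair into the CIDR blocks that cover it exactly."""
    start = ipaddress.ip_address(first)
    end = ipaddress.ip_address(last)
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]


WHOIS_RANGES = [
    ("54.209.248.0", "54.209.251.255"),
    ("54.236.252.0", "54.236.255.255"),
    ("54.244.56.0", "54.244.63.255"),
    ("54.246.252.0", "54.246.255.255"),
]

for first, last in WHOIS_RANGES:
    print(first, "-", last, range_to_cidrs(first, last))
```

A range that does not line up on a power-of-two boundary comes back as more than one block, which is why the helper returns a list.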

aristotle

2:29 pm on Sep 18, 2014 (gmt 0)

Thanks wilderness
I just looked through the latest logs for that same site, and found two more cases where Nokia devices using that IP range were blocked. So now I need to go through all five of my sites and fix the .htaccess files.

dstiles

5:49 pm on Sep 18, 2014 (gmt 0)

One of the features I would have blocked on is:
Http Version: HTTP/1.0

With the exception of a few valid proxies (in my experience, UK education), HTTP/1.0 is bot activity. I have seen the odd "user" activity, but generally associated with curl, wget, etc.: i.e. someone trying to get several pages using a pseudo-bot, possibly legit but banned.

wilderness

6:11 pm on Sep 18, 2014 (gmt 0)

One of the features I would have blocked on is:
Http Version: HTTP/1.0

SetEnvIf Request_Protocol HTTP/1\.0$ keep_out
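Note that the directive above only sets the `keep_out` variable; a separate access rule keyed on that variable (for example `Deny from env=keep_out` in Apache 2.2-style configs) is what actually refuses the request. To see which protocol strings the pattern would flag before putting it live, the same regular expression can be tried in Python. This is just a sketch: `would_flag` is a made-up name, and `re.search` is used on the assumption that it behaves like the unanchored match SetEnvIf performs.

```python
import re

# Same pattern as in the SetEnvIf line: literal "HTTP/1.0" at the end of the string.
HTTP10 = re.compile(r"HTTP/1\.0$")


def would_flag(protocol):
    """Return True if this request protocol string would set keep_out."""
    return HTTP10.search(protocol) is not None


for proto in ("HTTP/1.0", "HTTP/1.1"):
    print(proto, would_flag(proto))
```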

keyplyr

7:16 pm on Sep 18, 2014 (gmt 0)

Ya know I've been wanting to block HTTP/1.0 requests for about 2 years now. I've tried on numerous occasions, but each time I do I find too many legit users locked out.

Like anything else, it depends on the type of site(s) you have. If you have a specific traffic model that no longer uses this protocol, probably OK to block it.

However, I have a general audience, with many users coming from schools and libraries, or surfing from company intranets that seem to run older systems.

aristotle

7:26 pm on Sep 18, 2014 (gmt 0)

dstiles wrote:
One of the features I would have blocked on is:
Http Version: HTTP/1.0

Do you mean that HTTP/1.0 is nearly always the footprint of a bot? Because in my Latest Visitor logs just for this morning there are more than a dozen entries with "Nokia" as part of the UA, and all of them use HTTP/1.0. And all of them also appear to be real people.

In other words, it looks to me like many, if not all, Nokia devices routinely use HTTP/1.0. If that's the case, wouldn't you be blocking a lot of real people by blocking HTTP/1.0?
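A quick way to put a number on this is to count the HTTP versions seen for user agents containing a given keyword. The sketch below assumes the log entries have already been parsed into dicts with `agent` and `protocol` keys (those key names and the function `protocol_share` are illustrative, not anything a log tool provides). The two sample records reuse the user agents and protocols quoted earlier in this thread.

```python
from collections import Counter


def protocol_share(records, ua_keyword):
    """Count HTTP versions among records whose user agent contains ua_keyword."""
    keyword = ua_keyword.lower()
    counts = Counter(
        rec["protocol"] for rec in records if keyword in rec["agent"].lower()
    )
    return dict(counts)


RECORDS = [
    {
        "agent": "Mozilla/5.0 (Series40; Nokia112/03.32; Profile/MIDP-2.1 "
                 "Configuration/CLDC-1.1) Gecko/20100401 S40OviBrowser/3.7.0.0.11",
        "protocol": "HTTP/1.0",
    },
    {
        "agent": "Mozilla/5.0 (Linux; U; Android 4.2.2; en-us; KFTHWI Build/JDQ39) "
                 "AppleWebKit/537.36 (KHTML, like Gecko) Silk/3.23 like "
                 "Chrome/34.0.1847.137 Safari/537.36",
        "protocol": "HTTP/1.1",
    },
]

print(protocol_share(RECORDS, "Nokia"))
```

Run over a full day's log, the result gives a rough idea of how many requests from a device family a blanket HTTP/1.0 block would catch.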

Edit: Oops, keyplyr, I didn't see your post until just now. I didn't mean to duplicate your point.

wilderness

9:34 pm on Sep 18, 2014 (gmt 0)

In other words, it looks to me like many, if not all, Nokia devices routinely use HTTP/1.0. If that's the case, wouldn't you be blocking a lot of real people by blocking HTTP/1.0?

Nowhere near as many as the pests you'd block by having it in place.

FWIW, no matter what restrictions you have in place for access control, some innocents are going to fall by the wayside.

dstiles

7:24 pm on Sep 19, 2014 (gmt 0)

How many of those Nokia accesses can you say categorically are real people? A LOT of UAs are forgeries. Have you compared the UAs against current Nokia ones? That is something I do on suspicious ones: check if they are a) valid and b) current.

I will reiterate: exceptions for proxies have to be in place if your site is accessed from (e.g.) education establishments that use badly set-up proxies.

blend27

3:12 am on Sep 20, 2014 (gmt 0)

b) current

....psssst

+ headers + rDNS + entry point (I don't have a link to my home page from my home page, so check the referrer). Also, my site does not rank on google.com.ng (I just know).

Then check spam DBs (most sites listed have APIs [bgp.he.net...]) for the IP (this might not suit a high-traffic site, though), but by the time a request gets here 98% are done quacking.

dstiles

8:42 pm on Sep 20, 2014 (gmt 0)

Sorry, Blend, I don't understand. I was talking about Nokia User-Agents. There are lists online that give approximate dates for those. If the UA is seriously out of date it's probably a forgery.

aristotle

10:36 am on Sep 21, 2014 (gmt 0)

I already said that they appear to be real people. I base that judgement on whether they download images, ask for favicons, and use commercial ISPs, as all of those I've looked at do in this case. I'm not as interested in this matter as some others here, so those criteria might not be enough to satisfy them, but they're enough for me, and I don't consider it worth spending any more of my time on.

Yes, Nokia devices are a tiny percentage of the traffic to my sites. But even if it's only 10 people per day, that would still add up to thousands eventually. Each person can decide for themselves what to block or allow on their own sites, and I'll decide what to block or allow on mine.
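For anyone who wants to apply the image and favicon check mechanically, here is a rough sketch. It assumes the requests have already been reduced to (host, path) pairs; the suffix list, the function name `hosts_fetching_assets` and the sample addresses (taken from the documentation-only IP ranges) are all made up for illustration. It does not cover the commercial ISP check.

```python
from collections import defaultdict

# Assumed list of file suffixes that count as "images or favicon".
ASSET_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".ico")


def hosts_fetching_assets(requests):
    """Return the hosts that requested at least one image or favicon."""
    seen = defaultdict(set)
    for host, path in requests:
        # Drop any query string and normalise case before checking the suffix.
        seen[host].add(path.split("?", 1)[0].lower())
    return {
        host
        for host, paths in seen.items()
        if any(p.endswith(ASSET_SUFFIXES) for p in paths)
    }


# Hypothetical requests, not taken from any real log.
REQUESTS = [
    ("198.51.100.7", "/Page.html"),
    ("198.51.100.7", "/favicon.ico"),
    ("203.0.113.9", "/Page.html"),
]

print(hosts_fetching_assets(REQUESTS))
```

A host that only ever fetches HTML pages is not proven to be a bot by this, and a bot can fetch images too, so the result is a hint rather than a verdict.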