Forum Moderators: phranque


translation time again


lucy24

12:34 am on Sep 6, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This is an Apache question, not an SSID question.

Is this a human?

75.107.aaa.bbb - - [01/Sep/2012:21:09:52 -0700] "GET /fun/filename.html <snip> "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.83 Safari/537.1"
<complete set of supporting files here>
75.104.ccc.ddd - - [01/Sep/2012:21:11:20 -0700] "GET /rats/ HTTP/1.1" 200 3761 "http://www.example.com/fun/filename.html" <snip>
<supporting files here>
75.107.aaa.bbb - - [01/Sep/2012:21:11:25 -0700] "GET /fun/WhyILikeRats.html <snip> /fun/filename.html" <snip>
<supporting files, plus two supporting files from previous page>
75.105.eee.fff - - [01/Sep/2012:21:11:26 -0700] "GET /silence/ <snip> /fun/filename.html" <snip>
<supporting files, plus three supporting files from /rats/>
75.105.eee.fff - - [01/Sep/2012:21:13:30 -0700] "GET /ebooks/ <snip> /fun/filename.html" <snip>
<supporting files>


So far, I would not have noticed the oddities except that they played havoc with my ordinary log-processing (it expects the first three pieces of the IP to match). But now the UA goes berserk.

75.107.aaa.bbb - - [01/Sep/2012:21:14:44 -0700] "GET /ebooks/blind/ThreeBlindMice.html <snip> /ebooks/" <snip>
<supporting files start here :14:44 - :14:48>
75.107.aaa.bbb - - [01/Sep/2012:21:14:48 -0700] "GET /ebooks/blind/images/thumb_10_11.jpg HTTP/1.1" 503 532 "http://www.example.com/ebooks/blind/ThreeBlindMice.html" <snip>
<eight intervening 503 requests snipped, mixed with some that got through>
75.107.aaa.bbb - - [01/Sep/2012:21:14:48 -0700] "GET /ebooks/blind/images/thumb_30_31.jpg HTTP/1.1" 503 <snip>
<rest of supporting files, :14:48 - :15:06>


Those ten 503s, all within the same log second, are

[Sat Sep 01 21:14:48 2012] [error] [client 75.107.aaa.bbb] access to /ebooks/blind/images/thumb_10_11.jpg failed for 75.107.aaa.bbb, reason: Client exceeded concurrent connection limit of 30, referer: http://www.example.com/ebooks/blind/ThreeBlindMice.html
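(For anyone wondering where that limit lives: a per-IP concurrent-connection cap isn't core Apache; it usually comes from a third-party module such as mod_limitipconn, or a host-level equivalent you never see. A rough sketch of what the host's config might look like -- the module and the number 30 are assumptions based on the error text, not anything I can see from my end:

```apache
# Hypothetical sketch -- assumes the third-party mod_limitipconn
# module is loaded; many shared hosts set something like this
# server-wide, outside anything htaccess can touch.
<IfModule mod_limitipconn.c>
    <Location />
        # Requests beyond 30 simultaneous connections from one IP
        # are refused with a 503, matching the error-log line above.
        MaxConnPerIP 30
    </Location>
</IfModule>
```

Whatever the exact mechanism, the visible symptom is the same: the 31st-and-up parallel requests in one burst come back 503.)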


I never get this error, even with ebooks like this one that have a zillion separate image files. I've seen this kind of 503 precisely once before -- and that was from a robot that was ripping through the site at blazing speed, trying to grab everything before its lookout could run in with a "Cheese it! Da cops!"

?

phranque

4:53 am on Sep 6, 2012 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



it might be possible to set a really high number of concurrent requests with some browsers, and i could even see a plausible reason for the changing IP addresses, but that pattern already looks suspect.
when a visitor starts switching user agents on you i can only think of nefarious purposes.

lucy24

5:58 am on Sep 6, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



when a visitor starts switching user agents on you

I double-checked. The UA is the same throughout, even if nothing else is. And Chrome has always had that 4-part numbering in the UA. ("Always" = within my available raw logs, currently the past year or so.) Never noticed that before.

I thought it was impossible to spoof an IP. Now, I realize free lookups are worth approximately what you pay for them, but I'm getting opposite ends of the country (same ISP) for .104. and .107. If I was that good at teleportation, I would not be wasting time on my site ;)

wilderness

1:06 pm on Sep 6, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Viasat Comm is part of the Hughes Dish network.
IPs and GEO are highly irregular.

I had some unusual activity from a different Viasat Class A in June.
Just five days later the Class A was making cached-page requests -- secondary requests from the same Class A (and B's) that you have specified.

I made NO adjustment for the 75.104 and 75.107 ranges, however I added a denial for the secondary cache IP (all my pages are no-cache).

phranque

1:41 pm on Sep 6, 2012 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



that behavior could be explained by the requirements of a fat pipe with high latency such as a satellite-based ISP.

lucy24

9:04 pm on Sep 6, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



a fat pipe with high latency


:: wandering off in search of translation ::

phranque

11:03 pm on Sep 6, 2012 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



lots of capacity with a slow response time

lucy24

4:16 am on Sep 7, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



lots of capacity with a slow response time

Ah. Hence the more-than-thirty in one gulp. Satellite internet is just weird anyway ;)

all my pages are no Cache

No cache at all? Everything, or just the text? So if they go from page A to page B, and then remember they weren't finished with page A and race* right back, the whole thing has to reload from scratch? I'd set it to at least something like five seconds.
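A five-second minimum is easy enough to express with mod_expires. A minimal sketch -- the directives are standard mod_expires, but the five-second figure is just my suggested value, not anything prescribed:

```apache
<IfModule mod_expires.c>
    ExpiresActive On
    # Everything may be reused from cache for five seconds, so an
    # immediate back-button return doesn't re-fetch the whole page.
    ExpiresDefault "access plus 5 seconds"
</IfModule>
```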



I just realized a few days ago that my administrative gif wasn't doing what I thought it was doing. Had to go read up on mod_expires, ending up with

<Files "onedot.gif">
ExpiresActive On
ExpiresByType image/gif "access"
</Files>

in its own little htaccess file in the gif's own little directory. That is a belt-and-suspenders fix because it's a small, rarely-used directory populated only by a few other gifs. But with something as dramatic as "expires instantly" (meaning that if the visitor returns to the page two seconds later, there's a fresh request for anything covered by this header) I'm deathly afraid of letting it leak into the whole site ;) It's giving me a lot more information though.

Oh, and that's not the name of the requested file. It's the underlying "real" file that all the requests get rewritten to. It took a bit of trial and error to figure that out. Goes back to last year when I realized that the act of rewriting somehow obliterated the 304 response. So the cache-related header goes with the real file even if the browser has no idea it's being rewritten.



I've been searching around apache and the host's docs and (for comparison purposes) in MAMP, but can't for the life of me find out what the default expiration settings are. Am I looking for something that doesn't exist, so it's all at the whim of your browser?
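(Best I can tell, the thing I'm looking for doesn't exist: if mod_expires or mod_headers never emits an Expires or Cache-Control header, the response simply has no stated lifetime, and the browser falls back on its own heuristics -- typically some fraction of the time since Last-Modified, per the HTTP caching rules. Making it explicit would look something like this; the one-hour and one-week values are made-up placeholders, not recommendations:

```apache
# Sketch: give responses an explicit lifetime so browsers
# don't have to guess. Assumes mod_expires is available.
<IfModule mod_expires.c>
    ExpiresActive On
    ExpiresDefault "access plus 1 hour"
    # Override per type where it matters:
    ExpiresByType image/gif "access plus 1 week"
</IfModule>
```

)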


* Hee, hee.