Apache Web Server Forum

status 206 - why so many?
Dan99
2:46 am on Sep 13, 2013 (gmt 0)

In my Apache logs, I get LOADS of 206-served GETs. For example, some requests for a few-MB file receive it in one GET: status 200, and it's done. But others take many GETs to receive that one file. A few take MANY HUNDREDS of GETs to do it.

Yes, I know this means the file is sometimes being served in chunks. But why, oh why, is a several-MB file sometimes served in kilobyte chunks? My Apache logs get overrun with these GET requests, for one file, by one requester.

I guess I'd just like an explanation. What's going on when this happens? Why does it happen? If I know the requester, what can I tell him/her to do differently? As it is, it makes perusing my logs somewhat painful.

Thanks in advance.

 

lucy24
6:40 am on Sep 13, 2013 (gmt 0)

If I know the requester, what can I tell him/her to do differently?

Are you on close speaking terms with the requester's browser? Because that's who you have to talk to.

:: detour here to pore over headers associated with 206 responses of my own-- unfortunately only available for page requests ::

Officially there are a couple of different ways to trigger the 206 response. In practice you see it when the request includes a "range" header. It doesn't always reflect a gigantic file: one robot I could name routinely logs a 206 in requests for robots.txt. (Really. Just how big do they think my robots.txt might be?!) Similarly, facebook requests for pages --not images, which you'd think would be vastly bigger-- almost always come through as 206.
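
To make the mechanics concrete, here's a minimal sketch of what such an exchange looks like on the wire (the file name, size, and byte positions are made up for illustration):

GET /mybigfile.pdf HTTP/1.1
Host: www.example.com
Range: bytes=0-65535

HTTP/1.1 206 Partial Content
Content-Range: bytes 0-65535/4404019
Content-Length: 65536

The client then issues further GETs with later ranges (65536-131071, and so on) until it has all the pieces it wants. Each one is logged as a separate 206.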

Do you have any way of logging the request headers? At multiple MB I assume we're talking about something other than pages. The only question the server can really answer is whether it's sending out correctly sized chunks. For example if the Range header says 0-50000 and the actual file is much bigger, is each response about 50k? If it's significantly smaller the server is doing something wrong, but otherwise it's doing exactly what it has been told to do.
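
If not: one minimal, untested way is a second access log via Apache's own CustomLog (the format nickname "withrange" and the log path here are arbitrary):

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Range}i\" \"%{User-Agent}i\"" withrange
CustomLog logs/access_range_log withrange

The %{Range}i token writes out whatever Range header the client sent, or "-" if there wasn't one.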

:: memo to self: check whether piwik's .svg file-- whose existence I never suspected-- still elicits a 206 if I connect with a different browser ::

Dan99
1:26 pm on Sep 13, 2013 (gmt 0)

This is interesting, thanks.

Well, I might be on speaking terms with the requester, who might be on speaking terms with his/her browser ...

I don't know how to log the request headers.

Now, I understand that this is probably a case where the requester has started downloading multiple files, and the browser handles that by pulling pieces of each incrementally. I have no problem with that. But two hundred 10K pieces? That just doesn't make a lot of sense. Is the requester downloading five hundred files at a time? It doesn't seem that fragmenting the task so finely would make it any more efficient.

wilderness
2:26 pm on Sep 13, 2013 (gmt 0)

But two hundred 10K pieces? That just doesn't make a lot of sense.


FWIW, two hundred 10K pieces is likely a mobile device (i.e., a cell phone), and 1) you have no business allowing those devices to download such large files, and 2) a user attempting to download such large files on such a small device has their head so far up their backside that they're of no benefit to you or your website(s).

Dan99
6:30 pm on Sep 13, 2013 (gmt 0)

That's a fascinating point. Mobile devices, eh? I had no clue. I presume they do that because the data rate is so low: they want to assure success in small pieces rather than risk one big piece? How exactly do I disallow that?

lucy24
7:20 pm on Sep 13, 2013 (gmt 0)

In your first post you said you were getting the information from logs. This is a great starting point because it means you've got the full User-Agent information right there in front of you. What do you see?

:: counting on fingers ::

200 x 10K = one 2MB file
I am inclined to agree with the rest of wilderness's eloquent assessment.

Logging headers is straightforward in requests for pages, because you can shove the function into your footer code. I use a simple version that was posted here by, I think, incrediBill. I don't know how to do it with non-page requests, but that's not to say it can't be done.

Edit: You could put in a user-agent block on requests for the relevant files. But I think it would be more appropriate to look at the value of the Range header-- this can be done in mod_rewrite, mod_setenvif and possibly other places-- and disallow the file based on header content. As an alternative to outright disallowing, you could redirect requests to a custom page explaining why it won't work.
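
For the header-content route, a minimal untested sketch with mod_setenvif (the variable name has_range is arbitrary, and the Order/Allow/Deny syntax assumes an Apache 2.2-era config):

# Flag any request carrying a Range header, then refuse
# flagged requests for pdf files.
SetEnvIf Range "^bytes=" has_range
<FilesMatch "\.pdf$">
    Order Allow,Deny
    Allow from all
    Deny from env=has_range
</FilesMatch>

Be aware this is a blunt instrument: it refuses every ranged pdf request, including legitimate resumed downloads.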

Dan99
8:29 pm on Sep 29, 2013 (gmt 0)

OK, let me reopen some discussion about this.

I have many different kinds of documents posted on my website for download. The only ones that end up with LOTS of 206 accesses are pdfs.

Yes, some ppts take a few accesses to get the whole thing, as do some mp3s, but this business of a hundred or more consecutive accesses, in 32K or 64K chunks, with a 206 status code, is all about pdfs. All are of the same order of size -- about 2-10 MB.

As to mobile devices downloading small pieces, I've tried it with an iPhone and, at least with the iPhone connected to a local router, it gets each file in one fell swoop. So that's not it.

So I still have the question: why do my Apache logs often show downloads that take a vast number of GETs to deliver one file? What makes that happen? These partial accesses make up about 20% of the lines in my Apache logs.

lucy24
2:26 am on Sep 30, 2013 (gmt 0)

Find out if they're sending a "range" header. Even if you can't easily log headers, you can ask mod_rewrite or mod_setenvif to take a look. Examples:

Range: bytes=0-262144
Range: bytes=0-524287
Range: bytes=0-50000

:: detour here for business with calculator ::

Oh, I see. Not round decimal numbers: 262144 is 2^18, and 524287 is 2^19 - 1, so the inclusive range 0-524287 asks for exactly 2^19 bytes. (Why those values? Oh, who knows.)

:: further detour to pore over logs ::

:: pause for D'oh! moment as I realize that logheaders isn't recording image requests up front, it's getting triggered by the 403 page (which the robot never reads, but the server dutifully sends out anyway, regardless of original request's filetype) ::

:: continued scrutiny looking for something that's neither facebook nor a blocked robot ::

Range: bytes=0-
Host: www.example.com
Connection: close
User-Agent: AndroidDownloadManager


What on earth does "0-" mean? Wouldn't it make the server go haywire trying to guess how big a chunk it can send out? (Apparently not: per the HTTP spec, an open-ended range like "0-" simply means "from byte 0 to the end", so the server sends the whole file, just with a 206 instead of a 200.)

:: final detour to apache docs to check wording ::

If a HTTP header is used in a condition this header is added to the Vary header of the response in case the condition evaluates to true for the request. It is not added if the condition evaluates to false for the request. Adding the HTTP header to the Vary header of the response is needed for proper caching.

Someone else will explain that, in case it turns out to be important.

# \d{1,5}, not \d{0,5}: "bytes=0-" alone means the whole file.
RewriteCond %{HTTP:Range} ^bytes=0-\d{1,5}$
# Plain [R] = 302; browsers would cache a 301 and break later full requests.
RewriteRule \.pdf$ http://www.example.com/special-info-page [R,L]


I don't know whether that actually works; I only know it won't crash the server. The intention is to intercept any request with an unreasonably small Range header.

Depending on who's making the request, and why, they may or may not accept an html response. It may be safer to make a tiny little special-info-page.pdf of just a few K, giving the needed information and a link back to your site.

Dan99
1:42 pm on Sep 30, 2013 (gmt 0)

OK, let's get specific here. Here is an example of a string of requests from my Apache log, with one IP asking for one file, which is a 4.2MB pdf file. (IP changed to protect the innocent ...). My understanding is that after these 65 requests, the requester got the whole file.

This string was only 65 requests. What you see below is everything from 999.999.999.999 requesting this one file.


999.999.999.999 - - [29/Sep/2013:09:53:15 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 200 139264
999.999.999.999 - - [29/Sep/2013:09:53:16 -0500] "GET /favicon.ico HTTP/1.1" 200 318
999.999.999.999 - - [29/Sep/2013:09:53:16 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:17 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 28278
999.999.999.999 - - [29/Sep/2013:09:53:17 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:17 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:18 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:18 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:19 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:19 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:19 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:19 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:19 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:19 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 262144
999.999.999.999 - - [29/Sep/2013:09:53:20 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:20 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:20 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 131072
999.999.999.999 - - [29/Sep/2013:09:53:20 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:21 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:21 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:21 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:21 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:22 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:22 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:22 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:22 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:22 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 131072
999.999.999.999 - - [29/Sep/2013:09:53:23 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:23 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:23 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:23 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:23 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:24 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:24 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:24 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:25 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:25 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:25 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:25 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:25 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:25 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:26 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:26 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:26 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:27 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:27 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:27 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:27 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:27 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:28 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:28 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:28 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:29 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:29 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:29 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:30 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:30 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:31 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:32 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:33 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:34 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:35 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:35 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:36 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536
999.999.999.999 - - [29/Sep/2013:09:53:37 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 65536

Lots of other requesters get this file with one 200-status request. But this is one example of many, where a file was served to one requester in tiny chunks of (mostly) 65K.

The question is simple. Why do some requesters need the file to be served in tiny chunks, and some don't? As I said, an iPhone pulls it down in one piece. If I knew who 999.999.999.999 was, I guess I could ask them. But I don't.

I ONLY see these long strings of requests for pdf files. I never see such long strings of requests for ppt files of the same size. Yes, that could implicate portable devices, in that many portable devices can display pdf but not ppt.

lucy24
4:13 pm on Sep 30, 2013 (gmt 0)

The question is simple. Why ...

Simple, yes. Easy, no.

My understanding is that after these 65 requests, the requester got the whole file.

Was the last request an odd size?
999.999.999.999 - - [29/Sep/2013:09:53:17 -0500] "GET /~memyselfandi//mybigfile HTTP/1.1" 206 28278
That's the one I would expect to see at the end of the list.

many portable devices can display pdf, but not ppt

Until I tried it out (I just remembered I have a tiny pdf of my own online) I hadn't realized the iPad displays pdfs transparently.

:: wandering off unhelpfully to find out what in ### .ppt is ::

Dan99
4:41 pm on Sep 30, 2013 (gmt 0)

You're right. It is, in fact, very funny that the last request wasn't an odd size. I had wondered about that. Does this mean the requester killed the request before it was complete? In fact, now that I look more carefully, most, but not all, of the time I get these long strings of fractionally served requests, the last GET is NOT an odd size. That is a bit strange.

ppt is PowerPoint. I believe you can get an app for iPhone that displays those files, but iPhones do not do so by themselves. I'm not sure how common it is for other portable devices to display ppt files. I'm assuming they don't, and that's why, if these are requests from portable devices, no one is requesting them. (I get plenty of requests for ppt files, but those are served in one or a few GETs.)

Now, these multiple requests don't really take a lot of effort from my server, but they sure do add a load of crap to my log files.

Dan99
4:03 pm on Oct 17, 2013 (gmt 0)

OK, just to get some closure here. The reason for all those 206 requests, for tiny chunks of a file at a time, is now resolved. Curiously, this only seemed to happen for pdf files.

That's what happens when a pdf saved with the "fast web view" option (a "linearized" pdf) is opened in a browser's pdf viewer. Fast web view lets the viewer download a page at a time, using byte-range requests, so one can start reading the document before it is completely downloaded. It's handy for the person viewing the document, but it sure fills the server's Apache log with pages of crap if the documents are lengthy.

How to keep from getting loads of 206 requests in your Apache log? Simple. Just don't post pdfs for download with "fast web view" enabled.
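
If re-saving the pdfs without fast web view isn't practical, an untested server-side alternative is to stop Apache honoring ranged requests for them, via mod_headers (assumes mod_headers is loaded):

<FilesMatch "\.pdf$">
    # Drop any incoming Range header so the whole file is served (one 200),
    # and stop advertising range support to clients.
    RequestHeader unset Range
    Header unset Accept-Ranges
</FilesMatch>

The trade-off is that clients can no longer resume interrupted downloads or show the first page early.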

lucy24
4:48 pm on Nov 12, 2013 (gmt 0)

:: bump ::

Adding to this thread just so's I will remember where I parked the information. I recently had occasion to ask a group of random people to look at a couple of pdf files, about 500K and 1.1MB respectively.

Holy ###. I thought my logs were going to explode. A handful of visits picked up each file in a single 200 fetch. But most took 10, 20 ... I think the record is 50 requests between the two files.

The one-page-at-a-time explanation (above) makes it all understandable. Interestingly, the worst offenders were Firefox. Conversely, the all-at-a-gulp requests included back-to-back MSIE 8 and iPad. Go figure. And the timing between requests corresponds well with humans riffling through pages looking for a particular target.

I wondered briefly why I never saw the same behavior in a clutch of large ebooks I offer as PDFs ... until I remembered that they are only available as downloadable zip files. No chance for the reader to stop midway.
