Forum Moderators: phranque

repeating mp3 downloads

Dan99

4:50 pm on Feb 15, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



I'm not sure if this is the right forum to get some insights on this, but I'll try. I have a website where I post the audio (mp3s) from hour-long professional meetings. Looking at my logs, I frequently see those mp3 files being downloaded repeatedly, dozens of times in a row. I've always assumed that was a matter of malicious activity.

But when I talked to someone whose IP was evidently responsible for one of those incidents, he said that he was just listening to it by streaming it in his browser. Maybe his connection was a bit flaky because, to him, the audio kept pausing and restarting.

Now, what is mysterious is that according to my Apache log these IPs are served the WHOLE mp3 file (with an Apache code of 200) each and every access. So why doesn't my system serve them back a 304 when the access has already been once completed? Goodness knows they must have the #$%^&*( thing in their cache.

Of course, those who just download the file and then listen to it afterwards only need get it once. That works.

I'm confused. In serving a 10MB mp3, my system is often getting hit up for a good fraction of a gigabyte of downloads. Is this a matter of how their browser is set up?

not2easy

6:35 pm on Feb 15, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Maybe if you zip the file so it needs to be downloaded and accessed locally, you could avoid the additional bandwidth use. Many hosting environments prohibit using their servers for delivering streaming content for that reason. The download may be automatically restarting with every access after the server connection times out.

Dan99

6:49 pm on Feb 15, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



That's an interesting idea about zipping it. No browser is going to try to stream a zip file.

Now, if a hosting environment prohibits streaming, and enforces that prohibition by making it so the file has to be downloaded multiple times, that doesn't make a lot of sense, does it?

lucy24

9:04 pm on Feb 15, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Now, what is mysterious is that according to my Apache log these IPs are served the WHOLE mp3 file (with an Apache code of 200) each and every access. So why doesn't my system serve them back a 304 when the access has already been once completed?

Don't just look at the response code. Look at the size in bytes (the number immediately after the code in logs). Does the number correspond to the whole physical filesize?

I assume any given mp3 file never changes. So make sure you're setting an expiration header far, far in the future.

If they're streaming, I'm surprised the response is not a series of 206s. That's what you get with large PDF files: the browser downloads in chunks so it can start displaying content right away.
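
For reference, a far-future expiration for mp3 files could be set roughly like this. It is a minimal sketch, assuming mod_expires is loaded and that the files are served with the audio/mpeg type; the one-year lifetime is an arbitrary illustrative choice, not something taken from the posts above.

```apache
<IfModule mod_expires.c>
    # Turn on generation of Expires and Cache-Control: max-age headers
    ExpiresActive On
    # Assumed lifetime: one year from the time of the client's access
    ExpiresByType audio/mpeg "access plus 1 year"
</IfModule>
```

Even with such a header in place, a client is free to ignore its cache, so this only helps with clients that actually cache the file.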

Dan99

10:15 pm on Feb 15, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Yes, the number after the 200 response code is the same (large!) file size on each GET. It's exactly the whole physical file size of the mp3 file it's looking at. That's what I'm struggling with.

Now, as to 206s, I've specified "Accept-Ranges" as none, so I don't think I'll get any 206s. Hmmm. Could that be why streaming mp3s are having trouble?

not2easy

10:20 pm on Feb 15, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



The prohibition means you don't allow streaming files to be served. They aren't going to prevent it, just expect that you won't be doing it if it is in their restrictions. Your host may not have that restriction, but most have it in the agreement you signed to use their services.

Dan99

11:06 pm on Feb 15, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Well, I manage my own server, so it's all under my control. The internet line I'm connected to has, I believe, no such restrictions, as others using that line do video streaming regularly.

lucy24

1:41 am on Feb 16, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've specified "Accept-Ranges" as none, so I don't think I'll get any 206s. Hmmm. Could that be why streaming mp3s are having trouble?

It sure could be. Why not try commenting-out the line for a day or so and see what changes? It depends whether you do or don't want to permit streaming. If you do, accepting partial requests would have to be part of the package; if you don't, zip everything up so they couldn't stream if they wanted to.
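
The experiment would amount to something like the following. This is only a sketch: the thread never shows the actual line, so it assumes the header was being set with the mod_headers `Header` directive in the server config or an .htaccess file.

```apache
# Assumed original line (not shown in the thread), commented out so that
# the server's default Accept-Ranges behaviour applies again:
# Header set Accept-Ranges none
```

After a change in the main server config, Apache needs a reload for it to take effect; a change in .htaccess applies on the next request.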

Dan99

2:56 am on Feb 16, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Yes, I will do that. That's a good experiment. I originally put that line in because the large number of 206 responses I was getting cluttered my logs, and it was a simple way to suppress them. I guess that allowing ranges again may not reduce the number of entries in my logs, but it could very much reduce the volume of data I'm serving!

phranque

6:45 am on Feb 16, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



what does the User Agent string look like?
perhaps it's not a browser but some other (non-caching) client making the request.

what are your caching-related HTTP Response headers?

i would next try logging the caching-related HTTP Request headers to see if the UA is sending them.
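
One way to log those request headers is with a custom access log format. The sketch below assumes access to the main server or virtual host configuration (LogFormat and CustomLog are not allowed in .htaccess); the nickname `cachedebug` and the log file name are made up for illustration.

```apache
# Hypothetical format nickname "cachedebug": the standard combined fields,
# followed by the request headers a caching or streaming client would send.
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" ims=\"%{If-Modified-Since}i\" inm=\"%{If-None-Match}i\" range=\"%{Range}i\"" cachedebug

# Hypothetical log file name; written in addition to the normal access log.
CustomLog "logs/cachedebug.log" cachedebug
```

A header the client did not send shows up as "-", so a run of entries with `ims="-"` and `range="-"` would suggest the client is asking for the whole file from scratch each time.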

Dan99

2:02 pm on Feb 16, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



This happens with many clients, with different IPs and different user-agent strings.

I'm not sure what you mean by "caching-related response headers". How do I log them?

Dan99

3:08 pm on Feb 16, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Nope, allowing ranges appears not to have helped the situation. This is what I got this morning that looks like what I've been seeing all along. Here's an anonymized piece of my log, with 6 identical downloads of an mp3 file. I often see lots more identical downloads. As I said, I see these clusters of identical downloads with many different user agents. I have not established, however, any potential commonality of those user agents.

1.1.2.7 - - [16/Feb/2015:06:57:22 -0500] "GET /x/y.mp3 HTTP/1.1" 200 9123048 "http://memyselfandi/telecon/x/y.mp3" "Mozilla/5.0 (Linux; Android 4.4.2; SM-G900T Build/KOT49H) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.109 Mobile Safari/537.36"
1.1.2.7 - - [16/Feb/2015:06:57:23 -0500] "GET /x/y.mp3 HTTP/1.1" 200 9123048 "-" "Mozilla/5.0 (Linux; Android 4.4.2; SM-G900T Build/KOT49H) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.109 Mobile Safari/537.36"
1.1.2.7 - - [16/Feb/2015:06:57:27 -0500] "GET /x/y.mp3 HTTP/1.1" 200 9123048 "-" "Mozilla/5.0 (Linux; Android 4.4.2; SM-G900T Build/KOT49H) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.109 Mobile Safari/537.36"
1.1.2.7 - - [16/Feb/2015:06:57:31 -0500] "GET /x/y.mp3 HTTP/1.1" 200 9123048 "-" "Mozilla/5.0 (Linux; Android 4.4.2; SM-G900T Build/KOT49H) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.109 Mobile Safari/537.36"
1.1.2.7 - - [16/Feb/2015:06:57:34 -0500] "GET /x/y.mp3 HTTP/1.1" 200 9123048 "-" "Mozilla/5.0 (Linux; Android 4.4.2; SM-G900T Build/KOT49H) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.109 Mobile Safari/537.36"
1.1.2.7 - - [16/Feb/2015:06:57:37 -0500] "GET /x/y.mp3 HTTP/1.1" 200 9123048 "-" "Mozilla/5.0 (Linux; Android 4.4.2; SM-G900T Build/KOT49H) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.109 Mobile Safari/537.36"

So someone appears to have needed to download the same mp3 file six times to get it right.

Dan99

3:23 pm on Feb 16, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



I will add that when I first saw these clusters of downloads, I thought it was malicious mischief. But if you're going to do malicious mischief, why stop at six? I should also add that the IPs that do these clustered identical downloads don't appear to be suspicious in any way. The clue, for me, as I said, was when someone I know did such a cluster of downloads, and it was done when they tried to stream the mp3.

Dan99

9:11 pm on Mar 1, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



To the extent that what I've been seeing here is just people streaming mp3s, would this be a way of denying such streaming? That is, download it, don't stream it.

AddType application/octet-stream .mp3

Will this work for any browser?

lucy24

11:57 pm on Mar 1, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



How would that stop streaming? It seems as if you'd need an additional rule somewhere else.

Dan99

12:21 am on Mar 2, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Well, so it is said, in a number of posts online. My question to you is whether this makes any sense. See for example

[webmasterworld.com...]

It's also not clear to me why it should deny streaming, which is partly why I asked.

[edited by: phranque at 2:35 am (utc) on Mar 2, 2015]
[edit reason] replaced blog link with relevant WebmasterWorld thread [/edit]

lucy24

1:23 am on Mar 2, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



OK, I looked it up. Which, ahem, I should have done in the first place. It looks as if "octet-stream" is one of those willfully misleading terms: although it sounds as if it means "this is streamable" it really means "this is NOT streamable". Or, formally, "I decline to state what this is, so you'll have to download it".

So, yeah, that should work. At least unless the user has manually edited their browser prefs; I'm not sure if that would override a mime-type declaration.

Edit: If you only want this to apply in some areas-- for example if you've got a scattering of smaller MP3s that users are perfectly welcome to stream-- put a supplementary htaccess in the directory where the big files live.

phranque

2:34 am on Mar 2, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



you might also want to try the Content-Disposition header:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec19.html#sec19.5.1

If this header is used in a response with the application/octet-stream content-type, the implied suggestion is that the user agent should not display the response, but directly enter a `save response as...' dialog.
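
Combining the two suggestions might look like the following .htaccess fragment. It is a sketch under assumptions: mod_headers must be loaded for the `Header` line, and ForceType is used here in place of the AddType line quoted earlier, which is a substitution and not something proposed in the thread.

```apache
<FilesMatch "\.mp3$">
    # Serve mp3 files as generic binary data
    ForceType application/octet-stream
    <IfModule mod_headers.c>
        # Suggest that the browser save the file instead of playing it
        Header set Content-Disposition attachment
    </IfModule>
</FilesMatch>
```

Placed in the .htaccess of a single directory, this would affect only the mp3 files under that directory, which matches the idea of limiting it to where the big files live.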

Dan99

3:36 am on Mar 2, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Thank you. That's a lot more explanation than I was able to extract out of any of these other posts. Yes, now that I investigate more closely, it would seem that "octet-stream" basically means "arbitrary data, so don't assume you can stream it." As you say, if the browser is instructed what to do with it, it's possible that it could be commanded to do something specific with it.

I need to set this up and see what it does to these repeating mp3 GETs.

phranque

8:00 am on Mar 2, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I'm not sure what you mean by "caching-related response headers". How do I log them?


use something like the Live HTTP Headers FF plugin or the WebmasterWorld Check HTTP Response Headers Tool:
http://freetools.webmasterworld.com/tools/fetch-header/

not2easy

1:12 pm on Mar 2, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



If you use a Live Headers type tool, you need to try streaming it yourself to see the file requests and server responses. There's an old post here that tells you how to log headers from visitors: [webmasterworld.com...] in Msg#: 4538583 which is the fifth comment in that discussion.

Dan99

1:54 pm on Mar 2, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Well, I put

php_value auto_prepend_file "~/logheaders.php"


in my .htaccess file just to test it out, and I get nothing. That logheaders.php file never appears in my home directory, even as Apache commands are being processed. What's the trick?

not2easy

2:45 pm on Mar 2, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Did you create the file as shown and add it to your site? The line in .htaccess just includes the file as part of each request, but the php file shown in that comment needs to be on your site. I have used it.

Dan99

3:16 pm on Mar 2, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



OK, pardon my cluelessness, but I now have

php_value auto_prepend_file "~/Sites/logheaders.php"


in my ~/Sites/.htaccess file, and

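<?php
// Log the client IP and every request header to a dated file.
// Note: getallheaders() depends on the server API PHP runs under and
// may not be defined when the script is run from the command line.

$ip = $_SERVER['REMOTE_ADDR'];  // client address from the PHP superglobal

// Relative path: the log lands in PHP's current working directory.
$fh = fopen("headers-" . date('Ymd') . ".log", "a");
fwrite($fh, "IP: $ip\n");
foreach (getallheaders() as $name => $value) {
    fwrite($fh, "$name: $value\n");
}
fwrite($fh, "----\n\n");
fclose($fh);

?>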

in the file ~/Sites/logheaders.php, which I also made executable. I don't see any headers-YYYYMMDD.log file getting written. BTW, when I try to execute that php file directly (I'm using Bash), I just get errors.

not2easy

5:01 pm on Mar 2, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



No, I'm the clueless one. It would only work if the files were accessed via a link; when they are requested directly, it probably won't prepend the .php file to the .mp3 file. The suggestion phranque offered, to use a Live Headers tool, can at least show you the request/response sequence for your own browser. I should never post before coffee, my apologies.

lucy24

5:12 pm on Mar 2, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Heh, that code looks familiar. In my version, I invoke logheaders.php as part of the html footer. But of course that only works on files that have a footer, i.e. pages.

:: detour to apache dot org ::

Does "auto_prepend_file" work on requests that don't call for php in the first place? My impression was that it doesn't. Well, actually my impression is that apache and php are glaring at each other saying "It's his problem, not mine".
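
Since auto_prepend_file is a PHP setting and only comes into play for requests that PHP itself handles, one alternative for capturing the headers on plain mp3 requests is to let Apache record them. The sketch below uses mod_log_forensic; it assumes access to the main server configuration, and the module path and log file name are illustrative and vary between installations.

```apache
# Load the forensic logging module (path differs between distributions)
LoadModule log_forensic_module modules/mod_log_forensic.so

# Hypothetical log file name; each request is written with all of its
# request headers before processing, plus a short marker line afterwards.
ForensicLog logs/forensic.log
```

The forensic log grows quickly on a busy server, so it is probably best enabled only for the duration of the investigation.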