Sample entry:
access.log.30.7:64.152.75.20 - - [28/Jul/2002:05:26:34 +0200] "GET /blue/widgets/index.html HTTP/1.1" 206 8585 www.mydomain.com "-" "Scooter/3.2" "-"
It really is only Scooter (and some people using download managers) getting the 206 status code. All others receive 200 OK.
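In case anyone wants to check their own logs for the same pattern, here is a rough sketch that tallies 206 responses per user-agent. The field layout is only inferred from the sample entry above (including the optional `filename:` prefix that grep adds), and the names `LOG_RE` and `partial_hits` are made up for illustration, so adjust the pattern to your own log format.

```python
import re
from collections import Counter

# Pattern inferred from the sample entry in this thread (an assumption,
# not an official Apache format definition). The optional leading
# "filename:" part covers lines produced by grepping several log files.
LOG_RE = re.compile(
    r'^(?:\S+:)?(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) \S+ \S+ '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) (?P<vhost>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def partial_hits(lines):
    """Count 206 (Partial Content) responses per user-agent string."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m and m.group("status") == "206":
            counts[m.group("agent")] += 1
    return counts


if __name__ == "__main__":
    import sys
    # Usage: python tally206.py < access.log
    for agent, n in partial_hits(sys.stdin).most_common():
        print(n, agent)
```

Anything that shows up here besides Scooter and the download managers would be worth a closer look.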
Scooter's been all over my site today. So far, all 200's (or 404's if the file is really gone).
What is the common factor between the download managers and Scooter? Some kind of timeout?
Some questions to think about, for example: Are you detecting/checking User-agents or IP's and doing
anything (scripting) special for them? Are you blocking by excessive requests per IP per second?
Any kind of UA or IP-based redirection? Are you on a shared server, or a dedicated server on a slow
connection? Are your http headers static, or do you modify them?
I'm guessing, obviously... Just looking for anything in the file-serving process that could break
intermittently, or appear to break only under certain circumstances.
Very strange... I'm very interested to see how this turns out.
Jim
My guess is that it's sending range request headers. In this case, since you say the entire content was served back, it either already knew the content length or was possibly sending If-Range headers.
[w3.org...]
What is the common factor between the download managers and Scooter? Some kind of timeout?
62.11.92.42 - - [02/Jul/2002:23:45:54 +0200] "GET /block.zip HTTP/1.0" 206 496 www.mydomain.com "-" "Mozilla/3.0 (compatible)" "-"
62.11.92.42 - - [02/Jul/2002:23:45:54 +0200] "GET /block.zip HTTP/1.0" 206 496 www.mydomain.com "-" "Mozilla/3.0 (compatible)" "-"
62.11.92.42 - - [02/Jul/2002:23:45:54 +0200] "GET /block.zip HTTP/1.0" 206 495 www.mydomain.com "-" "Mozilla/3.0 (compatible)" "-"
The file itself was 1980 bytes. So that makes perfect sense. But for Scooter to be "downloading" HTML files, getting the whole file, and still receiving 206s doesn't make any sense.
Are you detecting/checking User-agents or IP's and doing anything (scripting) special for them? Are you blocking by excessive requests per IP per second? Any kind of UA or IP-based redirection?
Are you on a shared server, or a dedicated server on a slow connection? Are your http headers static, or do you modify them?
Why might they want to send a range request? To limit the maximum number of bytes on a page that they download and index? To see if the page size has changed?
Thanks for all the answers and suggestions. Still puzzled.
That's why I'm almost sure that the range header must have been sent (unless of course there's something wrong with your server, which I doubt).
To limit the maximum number of bytes on a page that they download and index?
No, it doesn't make much sense. But they could be testing something new; in other words, the top of the range is 101K or something.
OK, thanks... That was the most useful thing I could think of... This bears watching - to see if
they cap the indexed page size at some max number of bytes in the future. If their referral rate
also picks up, we may have a mad dash start here on WebmasterWorld to reduce page bloat and
move footer links higher up in the code!
Jim
Had to check out 64.152.75.20, which resolves to trek13.sv.av.com. Just wanted to make sure that there wasn't any UA spoofing.
I can't think of any other request header (other than range types) that would prompt a 206 response.
Do you have any VERY LARGE pages on your site in question?
There's also a "Scooter_bh0-3.0.3" that got one file OK (the one that's listed in Yahoo, or maybe it's because it uses "GET /widget/" as opposed to "GET /widget/index.html").
All files are well below 20K or excluded by robots.txt.
Guess the only question that kinda matters is: will the 206 pages be in the index... When will they update their index?
Would still be interesting to know why this happens. Asked them, maybe they'll answer...