Scooter gets 206 Partial Content status code for no reason

         

luma

2:38 am on Jul 29, 2002 (gmt 0)

10+ Year Member



Starting on July 27th, I have been getting lots of 206 Partial Content status codes in my server logs for "Scooter/3.2". But the file size indicates that AltaVista's robot did get the whole file. Has anyone noticed the same, or does anyone have an explanation?

Sample entry:

access.log.30.7:64.152.75.20 - - [28/Jul/2002:05:26:34 +0200] "GET /blue/widgets/index.html HTTP/1.1" 206 8585 www.mydomain.com "-" "Scooter/3.2" "-"

It really is only Scooter (and some people using download managers) getting the 206 status code. All others receive 200 OK.
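To check a claim like this across a whole log, the status codes can be tallied per user agent. The following is only a rough sketch: the field layout is inferred from the single sample entry above, and the names `LINE_RE` and `tally_status_by_agent` are made up for illustration.

```python
import re
from collections import Counter

# Assumed layout, inferred from the sample entry:
# ip - - [time] "request" status bytes vhost "referer" "user-agent" "-"
LINE_RE = re.compile(
    r'(?P<ip>\d+\.\d+\.\d+\.\d+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'(?P<vhost>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def tally_status_by_agent(lines):
    """Count (user agent, status code) pairs in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        # search() rather than match(), so a leading "filename:" prefix
        # (as produced by grep over several files) is skipped.
        m = LINE_RE.search(line)
        if m:
            counts[(m.group("agent"), m.group("status"))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        'access.log.30.7:64.152.75.20 - - [28/Jul/2002:05:26:34 +0200] '
        '"GET /blue/widgets/index.html HTTP/1.1" 206 8585 '
        'www.mydomain.com "-" "Scooter/3.2" "-"',
    ]
    for (agent, status), n in sorted(tally_status_by_agent(sample).items()):
        print(agent, status, n)
```

Lines that do not fit the assumed layout are silently ignored, so the totals are only as good as the pattern.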

jdMorgan

3:06 am on Jul 29, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



luma,

Scooter's been all over my site today. So far, all 200's (or 404's if the file is really gone).

What is the common factor between the download managers and Scooter? Some kind of timeout?

Some questions to think about, for example: Are you detecting/checking User-agents or IPs and doing
anything (scripting) special for them? Are you blocking by excessive requests per IP per second?
Any kind of UA or IP-based redirection? Are you on a shared server, or a dedicated server on a slow
connection? Are your HTTP headers static, or do you modify them?

I'm guessing, obviously... Just looking for anything in the file-serving process that could break
intermittently, or appear to break only under certain circumstances.

Very strange... I'm very interested to see how this turns out.

Jim

bobriggs

5:03 am on Jul 29, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Remember when scooter was playing around with the 404 responses? (requests like GET /aseerwerkweirowerowckxe ?)

My guess is that it's sending Range request headers. In this case, since you say the entire content was served back, it either already knew the content length or was possibly sending If-Range headers.

[w3.org...]
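To illustrate the guess above: a request whose Range header happens to cover the whole file can be answered with 206 even though every byte is sent. This is a simplified sketch of single-range handling, not how any particular server implements it. The function name `respond_to_range` is hypothetical, and multi-range requests are not handled.

```python
import re


def respond_to_range(range_header, file_size):
    """Return (status, first_byte, last_byte) for a single-range GET.

    Simplified sketch: multi-range requests and empty files are not handled.
    """
    whole_file = (200, 0, file_size - 1)
    if range_header is None:
        return whole_file

    m = re.fullmatch(r"bytes=(\d*)-(\d*)", range_header.strip())
    if not m or (m.group(1) == "" and m.group(2) == ""):
        return whole_file  # unparseable header: ignore it

    first, last = m.groups()
    if first == "":
        # Suffix form "bytes=-N": the last N bytes of the file.
        n = int(last)
        if n == 0:
            return (416, None, None)
        return (206, max(file_size - n, 0), file_size - 1)

    start = int(first)
    if last != "" and int(last) < start:
        return whole_file  # invalid range spec: ignore the header
    if start >= file_size:
        return (416, None, None)  # range starts beyond the end of the file
    end = file_size - 1 if last == "" else min(int(last), file_size - 1)
    return (206, start, end)


if __name__ == "__main__":
    # An 8585-byte page requested with an open-ended or oversized range.
    print(respond_to_range("bytes=0-", 8585))
    print(respond_to_range("bytes=0-102399", 8585))
    print(respond_to_range(None, 8585))
```

In both ranged cases the byte span is the complete file, which would match a log entry showing 206 together with the full file size.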

jdMorgan

5:34 am on Jul 29, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



bobriggs,

OK, just read that... I can understand why AV would want to investigate a site's 404 response, but
why might they want to send a range request? To limit the maximum number of bytes on a page that
they download and index? To see if the page size has changed?

Thanks,
Jim

luma

12:35 pm on Jul 29, 2002 (gmt 0)

10+ Year Member



"What is the common factor between the download managers and Scooter? Some kind of timeout?"

For example, when downloading .ZIP files, download managers (e.g., "Star Downloader") will try to increase download speed by splitting the download into more than one request:

62.11.92.42 - - [02/Jul/2002:23:45:54 +0200] "GET /block.zip HTTP/1.0" 206 496 www.mydomain.com "-" "Mozilla/3.0 (compatible)" "-"
62.11.92.42 - - [02/Jul/2002:23:45:54 +0200] "GET /block.zip HTTP/1.0" 206 496 www.mydomain.com "-" "Mozilla/3.0 (compatible)" "-"
62.11.92.42 - - [02/Jul/2002:23:45:54 +0200] "GET /block.zip HTTP/1.0" 206 495 www.mydomain.com "-" "Mozilla/3.0 (compatible)" "-"

The file itself was 1980 bytes. So that makes perfect sense. But Scooter "downloading" HTML files, getting the whole file, and still receiving 206s doesn't make any sense.
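As a rough illustration of the splitting idea, here is how a client might divide a file into byte ranges for parallel requests. How Star Downloader actually splits files is not known here; the even split and the function name `split_ranges` are assumptions, and the sizes in the log above need not match this scheme exactly.

```python
def split_ranges(file_size, parts):
    """Split file_size bytes into up to `parts` contiguous inclusive ranges."""
    if file_size <= 0 or parts <= 0:
        return []
    base, extra = divmod(file_size, parts)
    ranges = []
    start = 0
    for i in range(parts):
        # The first `extra` parts get one additional byte each.
        length = base + (1 if i < extra else 0)
        if length == 0:
            continue  # more parts than bytes: skip empty ranges
        ranges.append((start, start + length - 1))
        start += length
    return ranges


if __name__ == "__main__":
    for first, last in split_ranges(1980, 4):
        # Each tuple maps to a request header of the form "Range: bytes=first-last".
        print(f"Range: bytes={first}-{last}")
```

Each such request would be answered with 206 and only its slice of the file, which is why the download manager entries show small byte counts.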

"Are you detecting/checking User-agents or IP's and doing anything (scripting) special for them? Are you blocking by excessive requests per IP per second? Any kind of UA or IP-based redirection?"

The answer to all of these questions is "No".

"Are you on a shared server, or a dedicated server on a slow connection? Are your http headers static, or do you modify them?"

I am using one of the two big German webspace providers. I don't think it's a slow connection. I am not sure about shared/dedicated (probably shared). I don't think I am modifying headers.

"why might they want to send a range request? To limit the maximum number of bytes on a page that they download and index? To see if the page size has changed?"

Doesn't make too much sense. If they were doing it right, then they would get 304 Not Modified and not transfer anything.

Thanks for all the answers and suggestions. Still puzzled.

bobriggs

1:54 pm on Jul 29, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The server is always supposed to answer an unconditional GET that carries a Range request header with 206. A 304 will be returned only if there is an If-Modified-Since header.

That's why I'm almost sure that the Range header must have been sent (unless of course there's something wrong with your server, which I doubt).
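The distinction between the two cases can be sketched as a small decision function. This is a simplification under stated assumptions: headers are a plain dict with canonically capitalized names, only If-Modified-Since is considered as a condition, and the name `choose_status` is invented for the example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def choose_status(headers, last_modified):
    """Pick 304, 206 or 200 for a GET, given request headers.

    `last_modified` must be a timezone-aware datetime for the resource.
    """
    ims = headers.get("If-Modified-Since")
    if ims is not None:
        try:
            since = parsedate_to_datetime(ims)
        except (TypeError, ValueError):
            since = None  # unparseable date: treat the request as unconditional
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                return 304  # not modified: no body is sent at all
    if "Range" in headers:
        return 206  # ranged request for a resource that will be sent
    return 200


if __name__ == "__main__":
    modified = datetime(2002, 7, 1, tzinfo=timezone.utc)
    print(choose_status({"Range": "bytes=0-"}, modified))
    print(choose_status(
        {"Range": "bytes=0-",
         "If-Modified-Since": "Sun, 28 Jul 2002 03:26:34 GMT"},
        modified,
    ))
```

The first call models the unconditional ranged GET described above; the second adds the condition that would allow a 304 instead.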

"To limit the maximum number of bytes on a page that they download and index?"

No, it doesn't make much sense. But they could be testing something new, for example with the top of the range set to 101K or something.

jdMorgan

11:18 pm on Jul 29, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



bobriggs,

OK, Thanks... That was the most useful thing I could think of... This bears watching, to see if
they cap the indexed page size at some max number of bytes in the future. If their referral rate
also picks up, we may have a mad dash start here on WebmasterWorld to reduce page bloat and
move footer links higher up in the code!

Jim

bobriggs

7:20 am on Jul 30, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



luma - Please keep us posted. I just checked all scooter activity on 5 sites and have yet to see a 206 response in my logs.

I had to check out 64.152.75.20, which resolves to trek13.sv.av.com. Just wanted to make sure that there wasn't any UA spoofing.
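A reverse lookup like this can be scripted. The sketch below uses Python's standard `socket.gethostbyaddr`; the helper names and the idea of matching on the av.com suffix are assumptions for illustration, and current DNS may no longer return the hostname seen in 2002.

```python
import socket


def reverse_lookup(ip):
    """Return the primary hostname for an IP address, or None on failure."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:
        # Covers socket.herror and socket.gaierror (no PTR record, no network).
        return None
    return hostname


def hostname_in_domain(hostname, domain):
    """True if hostname equals domain or is a subdomain of it."""
    if not hostname:
        return False
    hostname = hostname.rstrip(".").lower()
    domain = domain.lower()
    return hostname == domain or hostname.endswith("." + domain)


if __name__ == "__main__":
    name = reverse_lookup("64.152.75.20")
    print(name, hostname_in_domain(name, "av.com"))
```

A reverse record alone can be set by whoever controls the address block, so a stricter check would also resolve the returned hostname forward and compare it with the original IP.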

I can't think of any other request header (other than range types) that would prompt a 206 response.

Do you have any VERY LARGE pages on your site in question?

luma

10:52 am on Jul 30, 2002 (gmt 0)

10+ Year Member



I checked yesterday's log files (29/Jul/2002) and again there are 13 entries showing a 206 status code, all for "Scooter/3.2". The only file for which "Scooter/3.2" gets a 200 is robots.txt.

There's also a "Scooter_bh0-3.0.3" that got one file OK (the one that's listed in Yahoo; or maybe it's because it uses "GET /widget/" as opposed to "GET /widget/index.html").

All files are well below 20K or excluded by robots.txt.

Guess the only question that kinda matters is: will the 206 pages be in the index... When will they update their index?

luma

12:14 am on Sep 7, 2002 (gmt 0)

10+ Year Member



Okay, time for an update. :) Yesterday (September 6th), Scooter/3.3 visited my server. I count 36 GET requests: 32 have status code 206, and the other four have code 200 (2 x robots.txt, and two directories /). Nonetheless, it doesn't seem to matter: AltaVista lists 42 pages for my domain (this includes pages that returned 206).

It would still be interesting to know why this happens. I asked them; maybe they'll answer...