Forum Moderators: phranque


translation needed

         

lucy24

2:45 am on Apr 21, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This is unspeakably trivial, but it's driving me bonkers. Line from raw logs:

aa.bb.cc.dd - - [18/Apr/2012:08:37:12 -0700] "GET /fonts/images/filename.png HTTP/1.1" 304 173 "-" "Mozilla/4.0 (compatible;)"


The image in question belongs to a page that's got about 30 images

:: detour here to add up and establish that the total is about 60k* ::

many of them with the same datestamp. But this is not one line picked at random. It's sitting there in complete isolation. OK, so maybe the complete package is in a browser cache and the browser got the hiccups and decided to ask for only this one file-- and then realized it didn't need it after all. And the user, if human, has javascript turned off, so piwik doesn't receive a fresh request.

Initial question: What does 304 mean? That is, I know what it means. (Duh.) It means: the file has not changed since the last time you were here.

The part that's driving me bonkers is: There is no last time. The IP has never picked up the file before. Neither has the UA. (Yes, it's obviously bogus. The IP-- which is legitimate-- has used this fake UA before, though not for this set of files. But that's a different story.)

Other, related IPs have seen the file-- with other, normal UAs, and always with javascript enabled.

Question, reworded: What information is getting sent-- by whom, to whom-- that causes the server to return a 304?


* Exact sum depends on whether I go with my pencil-and-paper findings or my calculator findings. But they're only different by 5 bytes.

wilderness

4:33 am on Apr 21, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"Mozilla/4.0 (compatible;)"


lucy,
These have long been the method of UA cache. AOL used it for the longest time, and some other obscure nets still use this UA rather than a standard browser UA.

A rule matching a UA that begins with Mozilla plus versions
and ends with compatible will stop all that nonsense.
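
As a rough illustration of that kind of pattern (a sketch only, not the rule actually in use on anyone's server; the names `BARE_COMPATIBLE_UA` and `is_bare_compatible_ua` are hypothetical), a check for a UA that starts with Mozilla plus a version and stops at "compatible" might look like this in Python:

```python
import re

# Hypothetical pattern: "Mozilla/<major>.<minor>", then "(compatible" with
# nothing after it except an optional semicolon and whitespace.
BARE_COMPATIBLE_UA = re.compile(r"^Mozilla/\d+\.\d+ \(compatible;?\s*\)$")


def is_bare_compatible_ua(user_agent: str) -> bool:
    """Return True for stub UAs such as 'Mozilla/4.0 (compatible;)'."""
    return BARE_COMPATIBLE_UA.match(user_agent) is not None
```

A real browser string such as "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)" carries more tokens after "compatible;" and so does not match.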

My HEAD thread in SSID is basically the same inquiry.

incrediBILL

4:48 am on Apr 21, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There is no last time. The IP has never picked up the file before. Neither has the UA.


So?

IP alone is meaningless in many circumstances.

Can you say modem pool?

Can you say proxy?

lucy24

5:30 am on Apr 21, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



These have long been the method of UA cache.

Oh, interesting. Pity I can't have a chat with the real UA and ask why they needed that image and none other. It isn't what you'd expect from their most common search. (Language-specific, I now realize.)

A rule matching a UA that begins with Mozilla plus versions and ends with compatible will stop all that nonsense.

Here I'll stick with IP, because it's really no skin off my nose. It's a small Canadian ISP using four different /19 ranges. Looked them up and turns out they also run servers and-- at least in theory-- some kind of colo facility. But they haven't deigned to tell Whois which IP ranges belong to which. Maybe they're genuinely fluid, so today's human is tomorrow's robot.

The interesting bit is that this specific IP is one of only two in the range(s) that has used that UA-oid. I see it more often picking up an administrative gif associated with a completely different page. Apparently it doesn't realize that the page no longer uses that gif. (The file, or rather its rewrite, still exists.) It just has a shopping list and collects everything on it.

enigma1

10:46 am on Apr 22, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Do you generate the ETags yourself, or does the server do it for you?

You can take the IP, or whatever other request header you want, into account and then generate the ETag in your scripts. This means you need to use a custom script to serve images. When subsequent requests are made you can validate the fields you want and, on a mismatch, decide what to do.
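
For what it's worth, a 304 normally comes from a conditional request: the client (or a proxy acting for it) sends a validator such as If-None-Match or If-Modified-Since, and the server answers 304 when the validator still matches. Here is a minimal Python sketch of the custom-script idea, assuming the ETag is derived from the file contents plus the client IP. The function names and the choice of hash are illustrative assumptions, and the comparison is simplified (it ignores weak validators and comma-separated lists in If-None-Match):

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(file_bytes: bytes, client_ip: str) -> str:
    """Build a quoted ETag tied to both the file contents and the client IP."""
    digest = hashlib.sha256(file_bytes + client_ip.encode("utf-8")).hexdigest()
    return '"' + digest[:16] + '"'


def respond(
    file_bytes: bytes, client_ip: str, if_none_match: Optional[str]
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for an image request."""
    etag = make_etag(file_bytes, client_ip)
    # 304 only when the validator the client sent matches the one we
    # would generate for this file and this IP.
    if if_none_match is not None and if_none_match.strip() == etag:
        return 304, {"ETag": etag}, b""
    # No validator, or a mismatch (e.g. the ETag was issued to another IP).
    return 200, {"ETag": etag}, file_bytes
```

With this scheme, an ETag handed to one IP and replayed from another gets a full 200 rather than a 304, which is the "decide what to do on a mismatch" step.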

It's possible for different IPs to send the same cache headers, though, and spiders are no exception.

lucy24

6:44 pm on Apr 22, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Put it this way: About 90% of the above post was Hungarian to me ;)

I kinda like the idea of a spider emanating from this particular IP. It conjures up pictures of a bored First Nations teenager designing robots in his spare time-- and after he gets it out of his system he'll grow up to do something useful :)