lucy24 - 7:51 pm on Dec 25, 2012 (gmt 0)
Would have slipped right under the radar if it hadn't been for that one request for a nonexistent file. I'm inclined to think robot, but I really really don't like robots that act like humans.
Log dump:
199.190.46.141 - - [23/Dec/2012:05:05:42 -0800] "GET /directory/paston/ HTTP/1.1" 200 5134 "http://www.google.co.in/search?q=PASTON+LETTERS.pdf&hl=en {snip, snip} &start=10&sa=N" "JUC (Linux; U; 2.3.6; zh-cn; GT-B5512; 240*320) UCWEB7.9.0.94/139/355"
IP = ChinaCache, agrees with system language; I don't know what UCWeb is but it's got something to do with the IP range
search = Google India
query + startpage = correct (that is, when I run the same search on Google India in a different browser, my page is at the top of the 2nd page of results)
UA = who knows, but size says phone of some kind
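For anyone who wants to pull the same fields out mechanically, here is a minimal sketch that splits the request line above into its parts and reads the search referer. It assumes the server writes the Apache "combined" log format, and that Google shows 10 results per page (so start=10 is the top of page 2). The names COMBINED, parse_line and search_details are made up for this example, and the "{snip, snip}" placeholder is left in the referer exactly as posted.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Apache "combined" format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None

def search_details(referer):
    """Pull the search host, query and results page out of a search referer."""
    parts = urlsplit(referer)
    qs = parse_qs(parts.query)
    start = int(qs.get("start", ["0"])[0])
    return {
        "host": parts.netloc,
        "query": qs.get("q", [""])[0],
        # Assumption: 10 results per page, so start=10 means page 2.
        "page": start // 10 + 1,
    }

# The request line as posted, with the snipped part of the referer left as-is.
line = (
    '199.190.46.141 - - [23/Dec/2012:05:05:42 -0800] '
    '"GET /directory/paston/ HTTP/1.1" 200 5134 '
    '"http://www.google.co.in/search?q=PASTON+LETTERS.pdf&hl=en {snip, snip} &start=10&sa=N" '
    '"JUC (Linux; U; 2.3.6; zh-cn; GT-B5512; 240*320) UCWEB7.9.0.94/139/355"'
)

fields = parse_line(line)
details = search_details(fields["referer"])
print(fields["ip"], fields["status"], fields["size"])
print(details)
```

This only reads what is already in the log line; it says nothing about whether the visitor is a human or a robot.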
... 05:05:42 ... /pastonstyles.css HTTP/1.1" 200 10893 "http://www.example.com/directory/paston/" ...
... 05:05:42 ... /piwik/piwik.js HTTP/1.1" 200 21928 "http://www.example.com/directory/paston/" ...
... 05:05:43 ... /images/bracket.gif HTTP/1.1" 200 490 "http://www.example.com/directory/paston/" ...
... 05:05:43 ... /images/bracket4.gif HTTP/1.1" 200 518 "http://www.example.com/directory/paston/" ...
... 05:05:43 ... /images/bracket_rt.gif HTTP/1.1" 200 489 "http://www.example.com/directory/paston/" ...
... 05:05:43 ... /images/bracket_tall.gif HTTP/1.1" 200 579 "http://www.example.com/directory/paston/" ...
Yes, the (shared) stylesheet really is bigger than the index page, though some of the size difference is due to (I guess) compression at the server end.
Q: Why did I highlight all those images?
A: Because they are not called by, or even used by, the requested file. They are background images from the stylesheet.
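One way to check this kind of thing without eyeballing it is to list every url(...) in the stylesheet and compare that list with what the visitor requested. The sketch below is only an illustration: the CSS excerpt and the list of images used by the page's own HTML are hypothetical (the real pastonstyles.css is not shown here), and css_image_paths and stylesheet_only are names invented for the example.

```python
import re
from urllib.parse import urljoin, urlsplit

# Matches url(foo.gif), url("foo.gif") and url('foo.gif').
CSS_URL = re.compile(r'url\(\s*["\']?([^"\')\s]+)["\']?\s*\)', re.IGNORECASE)

def css_image_paths(css_text, stylesheet_url):
    """Resolve every url(...) in the stylesheet to a site-absolute path."""
    return {
        urlsplit(urljoin(stylesheet_url, ref)).path
        for ref in CSS_URL.findall(css_text)
    }

def stylesheet_only(requested, css_paths, html_paths):
    """Requested files that the stylesheet names but the page's HTML does not."""
    return sorted((set(requested) & css_paths) - set(html_paths))

# Hypothetical stylesheet excerpt, for illustration only.
css = """
.bracket { background: url(images/bracket.gif) no-repeat; }
.dots    { background-image: url("images/dots.gif"); }
"""

css_paths = css_image_paths(css, "http://www.example.com/pastonstyles.css")
requested = ["/images/bracket.gif", "/images/dots.gif", "/images/shield.png"]
html_paths = ["/images/shield.png"]  # hypothetical: images the page itself uses

print(stylesheet_only(requested, css_paths, html_paths))
```

A visitor that fetches everything in the first list, whether or not the page uses it, is reading the stylesheet wholesale rather than rendering the page the way a desktop browser would.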
... 05:05:43 ... /images/signature.png HTTP/1.1" 200 3998 "http://www.example.com/directory/paston/" ...
... 05:05:43 ... /images/bracket_tall_rt.gif HTTP/1.1" 200 581 "http://www.example.com/directory/paston/" ...
... 05:05:43 ... /images/sharedtitle.png HTTP/1.1" 200 9825 "http://www.example.com/directory/paston/" ...
... 05:05:43 ... /images/sig120.png HTTP/1.1" 200 2503 "http://www.example.com/directory/paston/" ...
... 05:05:43 ... /images/dots.gif HTTP/1.1" 404 912 "http://www.example.com/directory/paston/" ...
Q: Why did they ask for a nonexistent file?
A: Because ::cough-cough:: I haven't got around to cleaning up the stylesheet. This is another background image; it happens not to be used yet-- but might show up in one of the remaining volumes in the set.
... 05:05:43 ... /images/shield.png HTTP/1.1" 200 2645 "http://www.example.com/directory/paston/" ...
... 05:05:43 ... /favicon.ico HTTP/1.1" 200 662 "-" ...
Robots never ask for the favicon-- except of course for Google's faviconbot, and all those phony SEO sites.
... 05:05:43 ... /piwik/piwik.php?action_name=The%20Paston%20Letters& {snip, snip} &res=800x600 HTTP/1.1" 200 362 "http://www.example.com/directory/paston/" ...
Q: How come the res listed here doesn't match the res given in the UA?
A: I dunno, you tell me.
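To flag the mismatch automatically, something like the following might do. It is a rough sketch that assumes the UA carries its screen size as WIDTH*HEIGHT (as this one does) and that the tracker reports it in a res=WIDTHxHEIGHT parameter; ua_resolution and tracker_resolution are names made up here. It only detects the disagreement and does not explain it.

```python
import re
from urllib.parse import parse_qs

def ua_resolution(ua):
    """Screen size claimed in the user-agent, e.g. '240*320', as (width, height)."""
    m = re.search(r'(\d{2,5})\*(\d{2,5})', ua)
    return (int(m.group(1)), int(m.group(2))) if m else None

def tracker_resolution(query):
    """Screen size reported to the tracker in res=WIDTHxHEIGHT, as (width, height)."""
    values = parse_qs(query).get("res")
    if not values:
        return None
    width, _, height = values[0].partition("x")
    return (int(width), int(height))

# Both strings are taken from the log lines as posted, snips included.
ua = "JUC (Linux; U; 2.3.6; zh-cn; GT-B5512; 240*320) UCWEB7.9.0.94/139/355"
tracker_query = "action_name=The%20Paston%20Letters& {snip, snip} &res=800x600"

claimed = ua_resolution(ua)
reported = tracker_resolution(tracker_query)
print(claimed, reported, "match" if claimed == reported else "MISMATCH")
```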
... 05:06:10 ... /zips/paston2.html.zip HTTP/1.1" 200 310750 "http://www.example.com/directory/paston/" ...
... 05:06:10 -0800] "GET /piwik/piwik.php?download {snip, snip} &res=800x600 HTTP/1.1" 200 362 "http://www.example.com/directory/paston/" ...
Note plausibly humanoid time lapse. Note also that they downloaded a zipped html instead of the pdf they originally searched for (and which I do have).
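The lapse is easy to put a number on. A small sketch, assuming the download happened on the same date as the page request (the date is elided in the download line) and that the timestamps use the usual Apache layout; gap_seconds is a name invented for this example, and the %b month parsing assumes an English locale.

```python
from datetime import datetime

# Apache log timestamp layout, e.g. 23/Dec/2012:05:05:43 -0800
LOG_TIME = "%d/%b/%Y:%H:%M:%S %z"

def gap_seconds(earlier, later):
    """Seconds elapsed between two log timestamps."""
    a = datetime.strptime(earlier, LOG_TIME)
    b = datetime.strptime(later, LOG_TIME)
    return (b - a).total_seconds()

last_page_hit = "23/Dec/2012:05:05:43 -0800"  # last request belonging to the page load
download_hit = "23/Dec/2012:05:06:10 -0800"   # zip download; date assumed to be the same day

gap = gap_seconds(last_page_hit, download_hit)
print(gap)
```

That works out to 27 seconds between the page finishing and the download starting.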
Where's that "noidea" smiley when you need it?