Page is not externally linkable
lucy24 - 10:25 pm on Mar 29, 2012 (gmt 0)
Are these accesses you're talking about from the bot, or are they due to manual review or some other validation mechanism?
Can we stipulate that g### is not manually reviewing my site?
:: detour to a random chunk of logs, containing these requests from 66.249.nn.nn (look at the proportions, not at the absolute numbers) ::
I'll be ###. Google is evolving before our very eyes. Even two months ago, when I was watching robot behavior closely, this is not the pattern I would have seen.
1 request for sitemap from Googlebot
6 requests for robots.txt from Googlebot (if it had been bingbot, there would have been 60 :))
62 pages:
--47 from the regular Googlebot
--14 from Googlebot-Mobile
--watch this space
90 images:
--28 Googlebot-Image with no referer
--54 Googlebot with human-style referer
--watch this space
5 stylesheets:
--3 Googlebot with human-style referer
--watch this space
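For anyone who wants to run the same kind of tally on their own logs, here is a rough Python sketch. It assumes Apache's combined log format (as in the lines quoted in this post), treats any host starting with "66.249." as in scope, and buckets requests by file extension. The bucket names and the category/agent_label/tally helpers are my own invention for illustration, not anything Google or Apache provides.

```python
import re
from collections import Counter

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def category(path: str) -> str:
    """Bucket a request path roughly the way the tally above does (assumed extensions)."""
    p = path.lower().split("?", 1)[0]
    if p == "/robots.txt":
        return "robots.txt"
    if "sitemap" in p:
        return "sitemap"
    if p.endswith((".png", ".jpg", ".jpeg", ".gif")):
        return "image"
    if p.endswith(".css"):
        return "stylesheet"
    if p.endswith(".js"):
        return "script"
    return "page"


def agent_label(agent: str, referer: str) -> str:
    """Coarse label for who made the request; 'other UA' is the watch-this-space bucket."""
    if "Googlebot-Image" in agent:
        return "Googlebot-Image"
    if "Googlebot-Mobile" in agent:
        return "Googlebot-Mobile"
    if "Googlebot" in agent:
        return "Googlebot" if referer == "-" else "Googlebot (with referer)"
    return "other UA"


def tally(lines, ip_prefix="66.249."):
    """Count (category, agent label) pairs for log lines whose host starts with ip_prefix."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m or not m["host"].startswith(ip_prefix):
            continue  # unparseable line, or not from the address block we care about
        counts[(category(m["path"]), agent_label(m["agent"], m["referer"]))] += 1
    return counts
```

Typical use would be something like `for (cat, who), n in sorted(tally(open("access_log")).items()): print(cat, who, n)`, with the filename being whatever your host calls it.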
Has everyone figured out what goes in the missing spaces?
66.249.17.123 - - [02/Mar/2012:08:49:35 -0800] "GET / HTTP/1.1" 200 1944 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0)"
66.249.17.123 - - [02/Mar/2012:08:49:35 -0800] "GET /sharedstyles.css HTTP/1.1" 200 0 "http://www.example.com/" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0)"
(I kinda think this was a mechanical glitch. Filesize 0?!)
66.249.17.123 - - [02/Mar/2012:08:49:35 -0800] "GET /sharedstyles.css HTTP/1.1" 200 2589 "http://www.example.com/" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0)"
66.249.17.123 - - [02/Mar/2012:08:49:35 -0800] "GET /images/WorldsHeadline.png HTTP/1.1" 200 2589 "http://www.example.com/" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0)"
66.249.17.123 - - [02/Mar/2012:08:49:35 -0800] "GET /images/FunnyFace.jpg HTTP/1.1" 200 5042 "http://www.example.com/" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0)"
(et cetera for the remaining 6 images that live on this page)
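A quick way to spot these plain-clothes requests in your own logs is to flag anything from the Google address block whose user-agent doesn't mention a Google crawler. Rough Python sketch; the 66.249.0.0/16 network mirrors the "66.249" shorthand used here and is an assumption, not Google's published crawler range, and the token list only covers a few well-known Google crawler names.

```python
import ipaddress
import re

# Assumption: treat all of 66.249.0.0/16 as "Google", matching the post's shorthand.
# This is NOT an authoritative list of Google crawler addresses.
GOOGLE_NETS = [ipaddress.ip_network("66.249.0.0/16")]

# User-agent tokens of some well-known Google crawlers (Googlebot-Image etc. contain "Googlebot").
BOT_TOKENS = ("Googlebot", "Mediapartners-Google", "AdsBot-Google")

# First field is the host; the last quoted field on the line is the user-agent.
HOST_AGENT_RE = re.compile(r'^(\S+) .*"([^"]*)"\s*$')


def plain_clothes(line: str) -> bool:
    """True if the request comes from a listed Google net but its UA doesn't say so."""
    m = HOST_AGENT_RE.match(line)
    if not m:
        return False
    host, agent = m.groups()
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # hostname instead of an IP; not handled in this sketch
    in_google = any(addr in net for net in GOOGLE_NETS)
    return in_google and not any(tok in agent for tok in BOT_TOKENS)
```

Run over the lines above, every 66.249.17.123 request with the MSIE 7 string would be flagged, while ordinary Googlebot fetches would not.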
Doesn't that MSIE 7 business look just like bing? As long as they keep swiping ideas from google, it's only fair for google to turn around and swipe an idea from them.
If the front page had happened to use any .js files, they would have been picked up too. But not by 66.249; js goes to 74.125. Which, incidentally, seems to be turning into g###'s poor relation. When it isn't going around with no clothes at all as the faviconbot, you might find it dressed like this:
74.125.19.35 - - <snip> "Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0"
or like this, in its mysterious Preview-less Preview costume:
74.125.63.33 - - <snip> "GET /piwik/piwik.js HTTP/1.1" "http://www.example.com/filename.html" 200 20113 "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.46 Safari/535.11"
If it had not been in plain clothes, it would have got a 403 slammed in its face. Time to fine-tune the htaccess.
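The decision the htaccess has to make can be modelled in a few lines. This is Python, not Apache syntax, and every value in it is a placeholder assumed for illustration: the 74.125.0.0/16 block (the post's shorthand, not any Google list), the "Google Web Preview" token that a UA-based rule might key on, and the favicon allow-list. What it shows is that a rule keyed on the user-agent lets the plain-clothes request through, while one keyed on the address does not.

```python
import ipaddress

# Hypothetical values for illustration only; the actual .htaccess rules aren't shown here.
BLOCKED_NET = ipaddress.ip_network("74.125.0.0/16")  # the post's "74.125" shorthand
PREVIEW_TOKEN = "Google Web Preview"                  # assumed token an older UA-based rule keys on
ALLOWED_PATHS = ("/favicon.ico",)                     # assumed: let the faviconbot keep working


def status_ua_rule(agent: str) -> int:
    """UA-keyed rule: 403 only when the user-agent admits to being the preview fetcher."""
    return 403 if PREVIEW_TOKEN in agent else 200


def status_ip_rule(host: str, path: str, allowed_paths=ALLOWED_PATHS) -> int:
    """Address-keyed rule: 403 for anything from the block except allow-listed paths,
    whatever the user-agent string claims."""
    addr = ipaddress.ip_address(host)
    if addr in BLOCKED_NET and path not in allowed_paths:
        return 403
    return 200


if __name__ == "__main__":
    chrome = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.11 "
              "(KHTML, like Gecko) Chrome/17.0.963.46 Safari/535.11")
    # The "Preview-less Preview" piwik.js request: the UA rule lets it in, the IP rule doesn't.
    print(status_ua_rule(chrome), status_ip_rule("74.125.63.33", "/piwik/piwik.js"))  # prints: 200 403
```

Keying on the address trades one risk for another: any legitimate Google service sharing that block gets refused too, which is why the sketch carries an allow-list.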