Forum Moderators: open
And Now Google's Doing It. JS Stats Show GoogleBot
'Hey, this user-agent completely disregards robots.txt, because it helps us out and we feel like it...'
WWW Robots (also called wanderers or spiders) are programs that traverse many pages in the World Wide Web by recursively retrieving linked pages.
I haven't decided yet, and it's not on my site, so I don't get to make the final decision ... It's likely I'll end up re-working the stat system to account for their visitor accommodation, since they may not treat the site(s) blocking their non-compliant POS kindly.
It's likely I'll end up re-working the stat system
is not "recursively retrieving linked pages"
...since they may not treat the site(s) blocking their non-compliant POS kindly.
That's not entirely accurate!
Can I show different content in the preview?
A: No. You must show Googlebot and the Google Web Preview the same content that users from that region would see (see our Help Center article on cloaking).
How can I block previews from being shown?
A: You can block previews using the "nosnippet" robots meta tag or x-robots-tag HTTP header.
[edited by: incrediBILL at 7:09 am (utc) on May 16, 2011]
[edit reason] thread clean up [/edit]
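For reference, the "nosnippet" answer quoted above corresponds to markup along these lines — a minimal sketch of the two mechanisms mentioned (a robots meta tag for a single page, or an X-Robots-Tag response header set server-side):

```html
<!-- Per-page: in the document <head> -->
<meta name="robots" content="nosnippet">

<!-- Or for any response type, via an HTTP response header
     (configured on the server, shown here as a comment):
     X-Robots-Tag: nosnippet -->
```

The header form is useful for non-HTML files (PDFs, images) where a meta tag isn't possible.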
I swear I once read on a google page that good robots wait a second or more between each hit. Anyone got a calculator? 100 pickups (exactly!) ÷ 6 seconds = ...
lucy24 (previously in this thread)
[edited by: incrediBILL at 7:23 am (utc) on May 16, 2011]
[edit reason] thread clean up [/edit]
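Running lucy24's numbers: 100 requests in 6 seconds works out to roughly 16.7 requests per second, or about 0.06 s between hits — far tighter spacing than the "a second or more between each hit" guideline she recalls. A quick sketch of the arithmetic:

```javascript
// Rate check: 100 pickups in a 6-second window, compared against a
// "wait at least one second between hits" rule of thumb.
const requests = 100;
const windowSeconds = 6;

const requestsPerSecond = requests / windowSeconds;  // ≈ 16.67
const secondsBetweenHits = windowSeconds / requests; // 0.06

console.log(requestsPerSecond.toFixed(2) + " req/s"); // "16.67 req/s"
console.log(secondsBetweenHits.toFixed(2) + " s between hits"); // "0.06 s between hits"
```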
It now seems that your problem was actually caused by your obsolete stat system.
I really don't want to take the time to research it and find out
Are the regulars in here always this nice to new visitors to the Forum?
[edited by: incrediBILL at 7:15 am (utc) on May 16, 2011]
[edit reason] thread clean up [/edit]
Gbot regularly causes 404 errors in my stats
The launch of Google Web Preview clearly made some methods of stats analysis obsolete, unless they were upgraded to take the new bot into account.
[edited by: TheMadScientist at 3:41 am (utc) on May 17, 2011]
the Google Web Preview bot is considered a pre-fetcher which is not subject to robots.txt instructions
Considered by whom?
Can someone sum up whether there is actually an issue with GoogleBot?
Considered by whom?
A robot is a program that automatically traverses the Web's hypertext structure by retrieving a document, and recursively retrieving all documents that are referenced.
Note that "recursive" here doesn't limit the definition to any specific traversal algorithm; even if a robot applies some heuristic to the selection and order of documents to visit and spaces out requests over a long space of time, it is still a robot.
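The Robots Exclusion Protocol that definition belongs to is expressed through a plain-text robots.txt file at the site root, which compliant robots fetch and honor before crawling. A minimal illustrative example ("ExampleBot" is a placeholder name, not a real crawler):

```
# A specific compliant robot: keep out of one directory
User-agent: ExampleBot
Disallow: /private/

# All other robots: nothing disallowed
User-agent: *
Disallow:
```

Whether a given fetcher honors this file at all is, of course, exactly what's in dispute in this thread.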
Usually, the 'best sources' I've found take the time to cite, reference, and explain their position rather than simply posting as 'the authority' on a subject. Plus, we all 'make mistakes' or 'jump to a conclusion' from time to time, so it's always wise to follow up, imo.
erroneous claims
the robotstxt.org site
erroneous claims
It is unfortunate that your stat system cannot identify a user-agent correctly and was not updated to take Google Web Preview into account six months ago.
It's funny, you keep talking about my stat system like it's old. I only installed this version 2 or so weeks ago, and my guess is anyone who writes a brand new one misses a few at the beginning. (Google's Web Preview didn't even show up right away, which I find interesting, because if it had I wouldn't have 'just let it run' for a couple of weeks without posting.)
...
System Live: Sun, 01 May 2011 06:52:27 -0600 GMT
First Google Web Preview Request: Fri, 06 May 2011 19:12:31 -0600 GMT
...
So far, M$ was first (got them in the first couple of days, and they were much harder to detect than Google's), and now Google. Everything else I have seems to be 'real visitors' from what I'm seeing. So to date, to my knowledge, to keep your jQuery-based JS stats from being 'obsolete' you need to account for two bots: one from M$ IP addresses claiming to be a browser, and the other, Google's Web Preview.
...
The only way to 'see' these visits is to pop open the database, because I don't 'add in' visitors without exit times or 'other missing entries', except on the overall count, so they're not apparent in the live version. I watched that version and compared it to the DB for about 8 hours a day during the first 3 days after launch. When I dropped back by to compare again, I saw the G requests as soon as I opened the DB. They were so obvious compared to M$ it's not even funny, and honestly, I wish they would do as good a job of running jQuery as M$ does. They're way behind there, because M$ sends me the HTML variable I'm supposed to have. My guess is this is a new addition for G.
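For a JS-based stats system, the Google side of this is the easier half: Google Web Preview's requests can be flagged by a user-agent substring before a visit is recorded. A hedged sketch, assuming the "Google Web Preview" marker reported in this thread (`looksLikePreviewBot` and `BOT_MARKERS` are illustrative names, not from any library):

```javascript
// Flag hits whose user-agent looks like a known preview/prefetch bot
// before counting them as visitors. The marker list is an assumption
// based on this thread, not an official registry; extend as needed.
const BOT_MARKERS = ["Google Web Preview"];

function looksLikePreviewBot(userAgent) {
  const ua = String(userAgent || "");
  return BOT_MARKERS.some(function (marker) {
    return ua.indexOf(marker) !== -1;
  });
}

// In a jQuery-based logger, gate the tracking call:
// if (!looksLikePreviewBot(navigator.userAgent)) { recordVisit(); }
```

Note that this does nothing for the M$ bot described above, which claims to be a browser: a spoofed user-agent can't be caught by UA matching, so that one would need server-side detection (e.g. by IP range) instead.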
One person here has claimed that the GoogleBot user-agent was used to fetch disallowed files.
One person here has claimed that a "web standard" was violated.
Apologies if I have misunderstood, but you seem to have initially claimed that your problem was caused by GoogleBot misbehaviour (title of the thread) and then changed to say it was caused by Google Web Preview (which is a rather different animal).
The Google Web Preview bot is considered exempt from the Robots Exclusion Protocol by just about everyone except those belatedly complaining here.
My guess is this is a new addition for G.
we'll have to agree to disagree about what is considered to be standard
This document is an Internet-Draft... Internet-Drafts are draft documents valid for a maximum of six months
Can you point me in the direction of the 'exemption list' or even the 'exemption protocol' so I know what bots are considered to be 'exempt' from the exclusion protocol in the future? I can't find it...
[edited by: incrediBILL at 6:52 pm (utc) on May 17, 2011]
[edit reason] thread clean up [/edit]