Forum Moderators: Robert Charlton & goodroi
I have a "who's online" script running and reporting visitors in real time as they browse the site. The script uses JavaScript to report the referrer and the currently viewed page.
One of the visitors today was from 66.249.72.180 [google.com], pulling approx. 200 pages in a 15-20 minute window and invoking the on-page tracking JavaScript...sending referring page and current URL information (it had to invoke JavaScript just like a browser would to do that). In other words, it was acting very much like a full-on browser.
Maybe it is already old news to some; maybe I missed previous topics discussing this. Anyway, it is the first time I have seen this in real time. Very interesting crawl development.
Has anyone else noticed this?
P.S.
Page-to-page browsing was occurring at a very fast rate; it could not be human.
or it might be one of Google's human evaluators looking at your site
I've been watching this thing in real time as it browsed the site (using a "who's online" type of script). The speed at which it browsed pages was way too fast for humans (read: evaluators), at times loading 2-4 pages per second. Also, reverse DNS on the IP shows "crawl-66-249-72-180.googlebot.com".
It behaved like a browser and fully invoked JavaScript (including JavaScript-encrypted tracking code...as far as I know only browsers can do that). It was sending referrer information, occasionally starting with a Google query page and advancing throughout the site.
Gobbled approx. 200 pages in a 20-minute window...that's one hell of a dedicated speedy evaluator if you ask me. Nope...this thing was a bot. A very clever bot.
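For anyone who wants to repeat the reverse DNS check on their own logs, here is a rough Python sketch of a forward-confirmed lookup: resolve the IP to a hostname, check the domain, then resolve that hostname back and make sure it returns the same IP. The function name, the accepted suffixes and the injectable lookup arguments are my own assumptions for illustration, not anything Google publishes as code.

```python
import socket

# Assumption: hostnames of genuine Google crawlers end in one of these domains.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Forward-confirmed reverse DNS check for a claimed Googlebot IP.

    The two lookup arguments are hypothetical hooks so the logic can be
    tried without touching the network; by default real DNS is used.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    try:
        host = reverse_lookup(ip)            # IP -> hostname
    except OSError:
        return False                         # no PTR record at all

    if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False                         # hostname is not in a Google domain

    try:
        return ip in forward_lookup(host)    # hostname -> IPs must include ours
    except OSError:
        return False
```

The forward step matters because anyone who controls the reverse zone for their own IP range can make it return any hostname they like; only the forward lookup of that hostname is controlled by the owner of the domain.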
[edited by: Web_speed at 3:01 am (utc) on Oct. 8, 2008]
I'd say this was a special circumstance - someone at Google running an automated tool to check out your website.
Side note:
This website lost almost all of its Google traffic starting around mid-August and hasn't fully recovered since. Some pages are still appearing in the Google SERPs, on pages 50 to 120 though.
Side note2:
No AdSense on this website, and there never has been.
Re:
but has the behavior continued
I am using statcounter.com for tracking and, interestingly enough, this bot did not trigger their tracking code (JavaScript)...in other words, I cannot see any of its activity at statcounter.com. Stealth, as if it was designed NOT to execute, or to ignore, their tracking code (and probably many other popular tracking codes), yet I was able to see it live online via my own JavaScript code, plus I can see its activity in my raw server logs. Very weird...I know.
IMO, must be an automated browser tool OR new/experimental Chrome based bot.
[edited by: Web_speed at 5:44 am (utc) on Oct. 8, 2008]
Hmmm...(donning tin foil hat) Could it be that Google wants to see what's on Javascript Heavy pages their other bot does not see? Wanting MORE information? Hmmm?
LOL, Naa....this thing was indexing pages like a bot would. It was just a NEW different type of bot. I strongly suspect a chrome based thingy.
Now that we know Google has Chrome and has released it AS A BROWSER, and we start to see "chrome" and "webkit" in UAs acting like bots that MIGHT BE browsers, are we going to block it for bot-like activities?
Haven't seen this at my site yet so can't do more than offer a speculation in that regard. I have little to no javascript on my site from the get go so don't expect to see much... but any browser grabbing 2-5 pages/second does get my attention.
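If you want something to flag that kind of speed automatically, a sliding-window counter per IP is one simple way to do it. This is only a minimal Python sketch: the class name and the thresholds (more than 10 page views inside 5 seconds) are assumptions picked for illustration, so tune them to your own traffic before blocking anything.

```python
from collections import defaultdict, deque


class RateWatcher:
    """Flags an IP once it exceeds max_hits page views within window_seconds."""

    def __init__(self, max_hits=10, window_seconds=5.0):
        self.max_hits = max_hits
        self.window = window_seconds
        self.hits = defaultdict(deque)   # ip -> timestamps of recent page views

    def record(self, ip, timestamp):
        """Record one page view; return True if the IP is now over the limit."""
        recent = self.hits[ip]
        recent.append(timestamp)
        # Drop page views that have fallen out of the sliding window.
        while recent and timestamp - recent[0] > self.window:
            recent.popleft()
        return len(recent) > self.max_hits


# Usage: feed it (ip, unix_time) pairs as you read the access log.
watcher = RateWatcher()
for i in range(15):
    too_fast = watcher.record("66.249.72.180", i / 3.0)  # 3 pages per second
```

Flagging is not the same as blocking: a visitor that trips the counter could still be a legitimate crawler, so it makes sense to verify who it is before acting on it.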
are we going to block it for bot-like activities?
[edited by: GaryK at 6:19 am (utc) on Oct. 8, 2008]
Testing of a new browser is a truly monolithic undertaking, but Google's vast database of websites past and present is pretty much the most comprehensive source of such information on the planet, which is allowing the developers to perform testing at a stupendous rate. Every week, "Chrome Bot" tests millions of pages, returning performance results that the development team would normally wait months to get from human beta testers.
Note: "Chrome bot"
That would be a shocker, because Google has given assurances that their crawler will always follow "googlebot" rules.
And the dentist always said it wouldn't hurt a bit ..
[bluesky]
Just extrapolating a little ..lets say that after the current financial and political dust has settled ..say December or early next year ..If G gets what it can call a friendly administration in the US and given G has strong corporate presence in the EU and good working relationships with most governments worldwide ..
Whereas MS is basing itself in Norway ..not an EU state ..so will be missing some "levers" ..and already is not liked by the politicians in the EU ..
I wonder if G might not be thinking of lobbying ( both sides of the Atlantic ) for MS to be forced to ship "chrome" on its desktop installs ( like we used to have netscape way back in the day on fresh 95 and 98 )..the argument would be that it was open source and thus in the interests of greater choice ..I'm sure it would fly in the EU corridors of power ..and in a relooked USA ..
Anticipation of such a strategy would certainly explain why they would be crawling ( in stealth mode ) using a bot version of their browser ..so as to have all their ducks in a row when presenting their case ..even if they get others to present it for them ..
[/bluesky]
"Chrome" in the UA... block it or not? Browser or bot? That's the question (sooner or later). Where'd I put that tin foil hat? Had it just a minute ago!
But seriously, I'm not amused there's a bot AND browser named chrome. Get ripped enough by true bots without having to deal with one masquerading as a browser. Already ticked about some browsers with prefetch activated and skewing hit reports as it is.
The Chrome browser itself uses:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.27 Safari/525.13
Does the bot use "ChromeBot" somewhere in the string, or just the ordinary browser user-agent?
Ted, that's such an obvious question. I feel really stupid for not asking it myself. Oh well. ;) It would be nice to know what the UA is in Web_speed's case. I'd like to know if there is a ChromeBot. So far I've seen 25 unique UAs from Chrome and none from ChromeBot.
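While we wait for the logs, here is a small Python sketch for sorting UA strings once they turn up: it pulls out the Chrome version token, if there is one, and notes whether the string announces itself as a bot at all. The function name and the "bot|crawl|spider" hint list are assumptions for illustration. A crawler that sends the plain browser string would of course look identical to a real Chrome user here.

```python
import re

# Matches the "Chrome/<version>" product token used by the Chrome browser.
CHROME_RE = re.compile(r"\bChrome/(\d+(?:\.\d+)*)")
# Assumption: self-identifying crawlers usually include one of these words.
BOT_HINT_RE = re.compile(r"bot|crawl|spider", re.IGNORECASE)


def describe_user_agent(ua):
    """Return the Chrome version (or None) and whether the UA claims to be a bot."""
    match = CHROME_RE.search(ua)
    return {
        "chrome_version": match.group(1) if match else None,
        "claims_bot": bool(BOT_HINT_RE.search(ua)),
    }


browser_ua = ("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) "
              "AppleWebKit/525.13 (KHTML, like Gecko) "
              "Chrome/0.2.149.27 Safari/525.13")
info = describe_user_agent(browser_ua)
```

Since the UA alone cannot separate a Chrome-based bot from the browser, this is only useful in combination with the IP and the request rate.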
I emailed the ISP and asked them to grant me access to, or email me, the raw logs for the required time frame. It is a special favour request and I hope they will be kind enough to provide it. Will update as soon as I have a reply.
Would very much like to get my hands on this information too.
[edited by: Web_speed at 1:43 am (utc) on Oct. 9, 2008]
I also love that Google has a “ChromeBot” which takes each new browser build and throws (put your pinky finger to your lips) one million webpages at the build as a torture test. That testing virtually guarantees that everyday web pages shouldn’t crash your browser.
I've looked everywhere for a ChromeBot UA and can't find one.
I've removed a few other citations people posted from second-hand, third-hand, whatever-hand sources since the information is not verifiable and contained no citations from Google. Let's stay away from that kind of thing for the remainder of the thread.
will someone please look at the link I posted and tell me if you see any comments/questions from GaryK?
I'm getting a 403, "The website declined to show this webpage".
Still waiting for my ISP to reply. Going to call them Monday morning. I am itching to check that darn server log.
I checked the "who is online" feature in the backend of my webshop today and saw the google crawler feeding random keywords to my search.
At first I thought it was only following links to searches that customers had perhaps posted elsewhere, but this was not the case.
The crawler filled in not only nouns but also adjectives and verbs.
Like:
advanced_search_result?keywords=widget
advanced_search_result?keywords=blue
advanced_search_result?keywords=effective
advanced_search_result?keywords=produces
I even noticed a word with a misspelling in it. When I checked, it was indeed misspelled on my website, and when I searched Google with the misspelling, my website was the only one to be found.
So Google must have crawled my website, created an index of words and is now feeding the words to my search, creating new pages. However, I could not find any of the newly created pages in Google's index.
So I am wondering what they are doing. Has anyone else noticed this? Or is this yesterday's news? I have never seen this before in my logfiles or on my "who is online" page.
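In case it helps anyone compare notes, this is roughly how the words being fed to a search form can be pulled out of logged request paths in Python. It is a sketch under assumptions: the function name is made up, the parameter name "keywords" comes from the URLs above, and the paths are assumed to be already filtered down to the crawler's requests.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def search_keywords(request_paths, param="keywords"):
    """Count the values passed in the given query parameter."""
    counts = Counter()
    for path in request_paths:
        query = urlsplit(path).query            # part after the "?"
        for value in parse_qs(query).get(param, []):
            counts[value.lower()] += 1
    return counts


paths = [
    "advanced_search_result?keywords=widget",
    "advanced_search_result?keywords=blue",
    "advanced_search_result?keywords=effective",
    "advanced_search_result?keywords=produces",
]
counts = search_keywords(paths)
```

Comparing the resulting word list against the words that actually appear on the site's pages would show whether the crawler is drawing its queries from its own index of the site.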
[edited by: Robert_Charlton at 6:57 pm (utc) on Oct. 20, 2008]
[edit reason] moved from another location [/edit]
Although Google announced it last spring, they've been working this approach at a low volume for quite a while.
I can't help but notice the similarity between your observation and what Web_speed reports in the post that leads off this thread...
jecasc
I checked the "who is online" feature in the backend of my webshop today and saw the google crawler feeding random keywords to my search.
Web_speed
I have a "who's online" script running and reporting visitors in real time as they browse the site. The script uses JavaScript to report the referrer and the currently viewed page.