| 11:26 pm on Oct 7, 2008 (gmt 0)|
Sounds like someone at Google might have run an automated tool through their browser. I've never seen anything like this in the logs I look at - if it continues, I'd be surprised.
| 12:30 am on Oct 8, 2008 (gmt 0)|
Or it might be one of Google's human evaluators [webmasterworld.com] looking at your site.
| 2:21 am on Oct 8, 2008 (gmt 0)|
|or it might be one of Google's human evaluators looking at your site |
I've been watching this thing in real time as it browsed the site (using a "who's online" type of script). The speed at which it browsed pages was way too fast for a human (read: evaluator), at times loading 2-4 pages per second. Also, the reverse DNS lookup for the IP returns "crawl-66-249-72-180.googlebot.com".
It gobbled approx. 200 pages in a 20-minute window... that's one hell of a dedicated, speedy evaluator if you ask me. Nope... this thing was a bot. A very clever bot.
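The point about speed is easy to check numerically. A minimal sketch, assuming you can get per-request timestamps (in seconds) for a single IP from a raw log or a "who's online" script; the function names and the 1-second window are illustrative choices, not part of any tool mentioned in this thread.

```python
from collections import deque
from typing import Iterable


def max_requests_in_window(timestamps: Iterable[float], window: float = 1.0) -> int:
    """Largest number of requests falling inside any sliding `window`-second span."""
    recent = deque()  # timestamps inside the current window, oldest first
    best = 0
    for t in sorted(timestamps):
        recent.append(t)
        # Discard requests that are `window` seconds or more older than t.
        while t - recent[0] >= window:
            recent.popleft()
        best = max(best, len(recent))
    return best


def average_rate(requests: int, seconds: float) -> float:
    """Mean requests per second over the whole observation period."""
    return requests / seconds


# Roughly 200 pages in 20 minutes averages out to well under one page per second...
print(round(average_rate(200, 20 * 60), 2))  # 0.17
# ...but a burst of four fetches inside one second stands out immediately.
print(max_requests_in_window([0.0, 0.2, 0.5, 0.9, 5.0]))  # 4
```

The average alone looks harmless; it is the burst figure that separates a script from a person clicking through pages.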
[edited by: Web_speed at 3:01 am (utc) on Oct. 8, 2008]
| 3:07 am on Oct 8, 2008 (gmt 0)|
I'm sure you're right that it was automated - but has the behavior continued, and do others see it on other sites? That would be a shocker, because Google has given assurances that their crawler will always follow "googlebot" rules.
I'd say this was a special circumstance - someone at Google running an automated tool to check out your website.
| 4:52 am on Oct 8, 2008 (gmt 0)|
Very possible. But here's an interesting question: why index approx. 200 pages? I understand an automated check of 20, 30, 50 or even 90 pages... but why spend the time and resources to re-index approx. 200 pages?
This website lost almost all of its Google traffic starting around mid-August and hasn't fully recovered since. Some pages still appear in Google's SERPs, though only on results pages 50 to 120.
No AdSense on this website, and there never has been.
|but has the behavior continued |
I haven't seen it again since starting this thread, but I'm still watching and have laid out some traps.
IMO, it must be either an automated browser tool OR a new/experimental Chrome-based bot.
[edited by: Web_speed at 5:44 am (utc) on Oct. 8, 2008]
| 5:44 am on Oct 8, 2008 (gmt 0)|
| 5:51 am on Oct 8, 2008 (gmt 0)|
LOL, nah... this thing was indexing pages like a bot would. It was just a NEW, different type of bot. I strongly suspect a Chrome-based thingy.
| 6:08 am on Oct 8, 2008 (gmt 0)|
Okay... tin foil hat off.
Now that we know Google has Chrome and has released it AS A BROWSER, and we start to see "chrome" and "webkit" in UAs acting like bots that MIGHT BE browsers, are we going to block it for bot-like activities?
| 6:19 am on Oct 8, 2008 (gmt 0)|
|are we going to block it for bot-like activities? |
Did it read and respect robots.txt? Normally I'd block any user agent that acted like a badly behaved bot. In this case you might want to cross-reference the PTR record and block it only if it resolves to Google; otherwise you risk blocking humans.
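Cross-referencing the PTR is usually done as a forward-confirmed reverse DNS check: look up the hostname for the IP, check that it sits under a Google domain, then resolve that hostname again and make sure it points back to the same IP (anyone can set whatever PTR they like on IPs they control, but not the forward record under googlebot.com). A minimal sketch using only Python's standard `socket` module; the function names and the suffix list are assumptions for illustration, and the resolvers are passed in as parameters so the logic can be exercised without network access.

```python
import socket
from typing import Callable, Iterable, List


def reverse_lookup(ip: str) -> str:
    # PTR lookup: gethostbyaddr returns (hostname, aliaslist, ipaddrlist)
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(host: str) -> List[str]:
    # A-record lookup: gethostbyname_ex returns (hostname, aliaslist, ipaddrlist)
    return socket.gethostbyname_ex(host)[2]


def is_verified_crawler(
    ip: str,
    allowed_suffixes: Iterable[str] = (".googlebot.com", ".google.com"),  # assumed list
    ptr: Callable[[str], str] = reverse_lookup,
    fwd: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """True only if ip -> hostname -> ip round-trips under an allowed domain."""
    try:
        host = ptr(ip).rstrip(".").lower()
    except OSError:  # socket.herror / gaierror: no usable PTR record
        return False
    if not any(host.endswith(suffix) for suffix in allowed_suffixes):
        return False
    try:
        return ip in fwd(host)
    except OSError:  # hostname does not resolve forward
        return False
```

With the hostname reported above, `is_verified_crawler("66.249.72.180")` should return True only if `crawl-66-249-72-180.googlebot.com` also resolves forward to 66.249.72.180.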
[edited by: GaryK at 6:19 am (utc) on Oct. 8, 2008]
| 6:28 am on Oct 8, 2008 (gmt 0)|
In case anyone is interested, found this at:
|Testing of a new browser is a truly monolithic undertaking, but Google's vast database of websites past and present is pretty much the most comprehensive source of such information on the planet, which is allowing the developers to perform testing at a stupendous rate. Every week, "Chrome Bot" tests millions of pages, returning performance results that the development team would normally wait months to get from human beta testers. |
Note: "Chrome bot"
| 10:39 am on Oct 8, 2008 (gmt 0)|
Interesting, and perfectly logical that they should do this (no tinfoil hat here); it makes perfect sense.
|That would be a shocker, because Google has given assurances that their crawler will always follow "googlebot" rules. |
And the dentist always said it wouldn't hurt a bit...
Just extrapolating a little... let's say that after the current financial and political dust has settled (say December or early next year), G gets what it can call a friendly administration in the US. G also has a strong corporate presence in the EU and good working relationships with most governments worldwide.
MS, by contrast, is basing itself in Norway, which is not an EU state, so it will be missing some "levers", and it is already not liked by the politicians in the EU.
I wonder if G might be thinking of lobbying (on both sides of the Atlantic) for MS to be forced to ship Chrome on its desktop installs (like we used to get Netscape, way back in the day, on fresh 95 and 98 installs). The argument would be that Chrome is open source, so shipping it is in the interest of greater choice. I'm sure it would fly in the EU corridors of power, and in a revamped USA.
Such a strategy would certainly explain why they would be crawling (in stealth mode) with a bot version of their browser: so as to have all their ducks in a row when presenting their case, even if they get others to present it for them.
| 12:00 pm on Oct 8, 2008 (gmt 0)|
What part of a BROWSER reads robots.txt? Perhaps some do, but who writes robots.txt for browsers? Sounds like an opportunity for perfect stealth to me... i.e., they told us up front "it's a browser".
"Chrome" in the UA... block it or not? Browser or bot? That's the question (sooner or later). Where'd I put that tin foil hat? Had it just a minute ago!
But seriously, I'm not amused that there's a bot AND a browser named Chrome. I get ripped off enough by true bots without having to deal with one masquerading as a browser. I'm already ticked off about browsers with prefetch enabled skewing hit reports as it is.
| 6:06 pm on Oct 8, 2008 (gmt 0)|
There's a critical piece of information we need here. What is the user-agent for this "Chrome Bot" activity?
The Chrome browser itself uses:
|Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.27 Safari/525.13 |
Does the bot use "ChromeBot" somewhere in the string, or just the ordinary browser user-agent?
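Answering that is mechanical once the raw UA strings are in hand. A small sketch that extracts the `Chrome/...` version token and separately flags common crawler keywords; the regexes are rough heuristics assumed for illustration, not an official UA grammar, and since no "ChromeBot" token has been confirmed, the bot check simply looks for generic words like "bot".

```python
import re
from typing import Optional

# "Chrome/" followed by a dotted version number, e.g. Chrome/0.2.149.27
CHROME_RE = re.compile(r"\bChrome/(\d+(?:\.\d+)*)")
# Generic crawler keywords; would also catch a hypothetical "ChromeBot/..." token
BOT_RE = re.compile(r"bot\b|crawl|spider", re.IGNORECASE)


def chrome_version(ua: str) -> Optional[str]:
    """Return the Chrome version string from a user-agent, or None if absent."""
    match = CHROME_RE.search(ua)
    return match.group(1) if match else None


def looks_like_bot_ua(ua: str) -> bool:
    """True if the user-agent openly advertises itself as a crawler."""
    return BOT_RE.search(ua) is not None


ua = ("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 "
      "(KHTML, like Gecko) Chrome/0.2.149.27 Safari/525.13")
print(chrome_version(ua))     # 0.2.149.27
print(looks_like_bot_ua(ua))  # False
```

A clean browser UA proves nothing on its own, so a check like this is only a starting point alongside request rate and a PTR lookup.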
| 6:25 pm on Oct 8, 2008 (gmt 0)|
tangor, there are lots of bots that identify themselves with legitimate browser UAs. I'll grant you most of them do not read robots.txt. Their use of legit browser UAs is why we cannot depend solely on the UA when it comes time to decide which ones to block. That's why I stated earlier that, while you can use the UA as a starting point, you really need to analyze what any UA is doing before deciding to block it.
Ted, that's such an obvious question. I feel really stupid for not asking it myself. Oh well. ;) It would be nice to know what the UA is in Web_speed's case. I'd like to know if there is a ChromeBot. So far I've seen 25 unique UAs from Chrome and none from ChromeBot.
| 12:50 am on Oct 9, 2008 (gmt 0)|
Unfortunately this particular site is hosted on the free user space provided by my ISP (a very, very basic setup). While I have access to Webalizer and can check the numbers etc., I have no access to the actual line-by-line raw server log.
I emailed the ISP and asked them to either grant me access or email me the raw logs for the relevant time frame. It's a special favour and I hope they will be kind enough to provide it. Will update as soon as I have a reply.
I would very much like to get my hands on this information too.
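When the raw log does arrive, finding the UA behind the fast visitor takes only a few lines. A minimal sketch, assuming the ISP writes Apache's "combined" log format (common on shared hosting, but check what yours actually uses); the function name is hypothetical, and the IP is the one from the reverse DNS result reported earlier in the thread.

```python
import re
from collections import Counter
from typing import Iterable

# Apache "combined" format:
# host ident authuser [time] "request" status bytes "referer" "user-agent"
# (assumes no escaped quotes inside the quoted fields)
COMBINED_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)


def user_agents_for_ip(lines: Iterable[str], ip: str) -> Counter:
    """Count the user-agent strings seen in requests from `ip`."""
    counts = Counter()
    for line in lines:
        match = COMBINED_RE.match(line)
        if match and match.group("ip") == ip:
            counts[match.group("ua")] += 1
    return counts


# Example (hypothetical file name):
# with open("access.log", encoding="utf-8", errors="replace") as log:
#     for ua, n in user_agents_for_ip(log, "66.249.72.180").most_common():
#         print(n, ua)
```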
[edited by: Web_speed at 1:43 am (utc) on Oct. 9, 2008]
| 2:05 am on Oct 9, 2008 (gmt 0)|
From Matt Cutt's blog:
|I also love that Google has a “ChromeBot” which takes each new browser build and throws (put your pinky finger to your lips) one million webpages at the build as a torture test. That testing virtually guarantees that everyday web pages shouldn’t crash your browser. |
I've looked everywhere for a ChromeBot UA and can't find one.
| 2:21 am on Oct 9, 2008 (gmt 0)|
Thanks for the word from a known Google source, Gary.
I've removed a few other citations people posted from second-hand, third-hand, whatever-hand sources, since the information was not verifiable and contained no citations from Google. Let's stay away from that kind of thing for the remainder of the thread.
| 3:37 am on Oct 9, 2008 (gmt 0)|
No problem, Ted. I just wish I'd spelled Matt's last name correctly. Oh well. I'm going to post a comment/question in his blog and hope for a reply from him.
| 8:42 am on Oct 9, 2008 (gmt 0)|
@GaryK Thanks for the heads-up. Waiting to see if action is required is what I do on a regular basis anyway. I was just speculating as to whether Google had found a new way to index without saying "they" were doing it. We do know their Chrome browser reports home, so I guess that's not real news...
| 4:23 pm on Oct 9, 2008 (gmt 0)|
Before I start speculating, will someone please look at the link I posted and tell me if you see any comments/questions from GaryK? Thanks. :)
ADDED: Oops. Sorry for my lack of manners. You're welcome tangor.
[edited by: GaryK at 4:26 pm (utc) on Oct. 9, 2008]
| 7:48 am on Oct 10, 2008 (gmt 0)|
|will someone please look at the link I posted and tell me if you see any comments/questions from GaryK? |
I'm getting a 403, "The website declined to show this webpage".
| 8:05 am on Oct 10, 2008 (gmt 0)|
Yes, it might be some Google employee undercover :)
| 9:07 am on Oct 10, 2008 (gmt 0)|
Hi Web_speed. I've been getting a permissions-related error message all day Thursday and I'm still getting it now. The rest of his website is having similar problems, except for the home page. I appreciate you trying; hopefully it'll be working when you try again later. I asked someone to look at the page because it seemed as though my question had been deleted, even though I initially saw it posted (I no longer have to go through his pre-moderation queue). But it may just be related to whatever issues he's having with the website as a whole.
| 8:18 pm on Oct 10, 2008 (gmt 0)|
OK, his blog is back up. I looked at it earlier this afternoon from my brother-in-law's office computer and my question about a ChromeBot user agent has been deleted. I'd love to know why.
| 12:00 am on Oct 11, 2008 (gmt 0)|
Can't see your question either ;) not too surprised.
Still waiting for my ISP to reply. Going to call them Monday morning. I am itching to check that darn server log.
| 12:06 am on Oct 11, 2008 (gmt 0)|
I hope they let you have access to it and that you find something interesting. Based on what happened to my question, I doubt you're going to see a UA with ChromeBot in it. It'll be nice to see which UA from Google was hitting your site so hard that day.
| 6:29 pm on Oct 20, 2008 (gmt 0)|
<moved from another location>
I checked the "who is online" feature in the backend of my webshop today and saw the Google crawler feeding random keywords to my site search.
At first I thought it was only following links to searches that customers had perhaps posted elsewhere, but this was not the case.
The crawler filled in not only nouns but also adjectives and verbs.
I even noticed a word with a misspelling in it. When I checked, it was indeed misspelled on my website, and when I searched Google for the misspelling, my website was the only result.
So Google must have crawled my website, built an index of its words, and is now feeding those words to my search, creating new pages. However, I could not find any of the newly created pages in Google's index.
So I am wondering what they are doing. Has anyone else noticed this? Or is this yesterday's news? Either way, I have never seen this before in my log files or on my "who is online" page.
[edited by: Robert_Charlton at 6:57 pm (utc) on Oct. 20, 2008]
[edit reason] moved from another location [/edit]
| 7:03 pm on Oct 20, 2008 (gmt 0)|
This sounds like something that we discussed last spring - see [webmasterworld.com...]
Although Google announced it last spring, they've been working this approach at a low volume for quite a while.
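If you would rather Google not submit queries to your site search at all, the usual lever is a robots.txt Disallow rule on the search URL, which a compliant crawler such as Googlebot should honour (a browser-style bot that ignores robots.txt would not). A sketch using Python's standard `urllib.robotparser` to check a rule before deploying it; the `/search` path and example.com URLs are assumptions, so substitute whatever path your shop's search form actually submits to.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: assumes the shop's search form submits to /search
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def can_crawl(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Search result URLs are blocked; ordinary product pages stay crawlable.
print(can_crawl(ROBOTS_TXT, "http://www.example.com/search?q=blue+widgets"))  # False
print(can_crawl(ROBOTS_TXT, "http://www.example.com/products/widget.html"))   # True
```

Keep in mind that Disallow rules are prefix matches, so `/search` would also block a page like `/searchtips.html`.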
| 7:06 pm on Oct 20, 2008 (gmt 0)|
jecasc - What identified this as a Google crawler?
I can't help but notice the similarity between your observation and what Web_speed reports in the post that leads off this thread...
|I checked the "who is online" feature in the backend of my webshop today and saw the google crawler feeding random keywords to my search. |
| This 31-message thread spans 2 pages. |