Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

This 31 message thread spans 2 pages.
Is Google now using Chrome to crawl the web ?
Web_speed




msg:3760238
 5:11 am on Oct 7, 2008 (gmt 0)

Haven't posted in a long time. Thought I would drop in to report some very interesting Google activity I've been observing today.

I have a "who's online" script running and reporting visitors in real time as they browse the site. The script uses javascript to report referrer and currently viewed page.

One of the visitors today was from 66.249.72.180 [google.com], pulling approx. 200 pages in a 15-20 minute time frame and invoking the on-page tracking javascript, sending referring page and current URL information (it had to invoke javascript just like a browser would to do that). In other words, it was acting very much like a full-on browser.

Maybe it is already old news to some, maybe I missed previous topics discussing this. Anyway, it is the first time I have seen this thing in real time. Very interesting crawl development.

Anyone else noticed this ?

P.S.
Page-to-page browsing was occurring at a very fast rate, could not be human.

 

tedster




msg:3760890
 11:26 pm on Oct 7, 2008 (gmt 0)

Sounds like someone at Google might have run an automated tool through their browser. I've never seen anything like this in the logs I look at - if it continues, I'd be surprised.

Tastatura




msg:3760916
 12:30 am on Oct 8, 2008 (gmt 0)

or it might be one of Google's human evaluators [webmasterworld.com] looking at your site

Web_speed




msg:3760979
 2:21 am on Oct 8, 2008 (gmt 0)

or it might be one of Google's human evaluators looking at your site

I've been watching this thing in real time as it browsed the site (using a "who's online" type of script). The speed at which it browsed pages was way too fast for a human (read: evaluator). At times it was loading 2-4 pages per second. Also, the IP's reverse DNS shows "crawl-66-249-72-180.googlebot.com".

It behaved like a browser and fully invoked javascript (including javascript-encrypted tracking code...as far as I know only browsers can do that). It was sending referrer information, occasionally starting with a Google query page and advancing throughout the site.
Gobbled approx. 200 pages in a 20 minute time frame....that's one hell of a dedicated speedy evaluator if you ask me. Nope...this thing was a bot. A very clever bot.

[edited by: Web_speed at 3:01 am (utc) on Oct. 8, 2008]

tedster




msg:3760988
 3:07 am on Oct 8, 2008 (gmt 0)

I'm sure you're right that it was automated - but has the behavior continued, and do others see it on other sites? That would be a shocker, because Google has given assurances that their crawler will always follow "googlebot" rules.

I'd say this was a special circumstance - someone at Google running an automated tool to check out your website.

Web_speed




msg:3761029
 4:52 am on Oct 8, 2008 (gmt 0)

Very possible. But here's an interesting question: why index approx. 200 pages? I understand an auto check for 20, 30, 50, 90 pages....why spend the time and resources to re-index approx. 200 pages?

Side note:
This web site lost almost all of its Google traffic starting around mid August. Hasn't recovered fully since. Some pages are still appearing in the Google SERPs, on pages 50 to 120 though.

Side note2:
No adsense on this website and never had any.

Re:
but has the behavior continued

Haven't seen it again since posting the OT. But still looking + laid out some traps.

I am using statcounter.com for tracking and, interestingly enough, this bot did not trigger their tracking code (javascript)...in other words I cannot see any of its activity at "statcounter.com". Stealth, like it was designed NOT to execute, or to ignore, their tracking code (and probably many other popular tracking codes), yet I was able to fully see it live online via my own javascript code, plus I can see its activity in my server stats. Very weird...I know.

IMO, must be an automated browser tool OR new/experimental Chrome based bot.

[edited by: Web_speed at 5:44 am (utc) on Oct. 8, 2008]

tangor




msg:3761051
 5:44 am on Oct 8, 2008 (gmt 0)

Hmmm...(donning tin foil hat) Could it be that Google wants to see what's on Javascript Heavy pages their other bot does not see? Wanting MORE information? Hmmm?

(Joking!)

Web_speed




msg:3761054
 5:51 am on Oct 8, 2008 (gmt 0)

Hmmm...(donning tin foil hat) Could it be that Google wants to see what's on Javascript Heavy pages their other bot does not see? Wanting MORE information? Hmmm?

LOL, Naa....this thing was indexing pages like a bot would. It was just a NEW different type of bot. I strongly suspect a chrome based thingy.

tangor




msg:3761059
 6:08 am on Oct 8, 2008 (gmt 0)

Okay... tin foil hat off.

Now that we know Google has Chrome and has released it AS A BROWSER and we start to see "chrome" and "webkit" in UAs acting like bots, but which MIGHT BE browsers, are we going to block it for bot-like activities?

Haven't seen this at my site yet so can't do more than offer a speculation in that regard. I have little to no javascript on my site from the get go so don't expect to see much... but any browser grabbing 2-5 pages/second does get my attention.
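
For anyone wanting to flag the kind of "2-5 pages/second" behavior described above, here is a minimal Python sketch of a sliding-window rate check. It assumes you can feed it an IP and a timestamp (in seconds, in non-decreasing order) for each page view, e.g. from a "who's online" script. The class name `RateWatcher` and the default thresholds are made up for illustration and are not from anyone's setup in this thread.

```python
from collections import defaultdict, deque


class RateWatcher:
    """Flags an IP that requests more than max_hits pages within window_seconds."""

    def __init__(self, max_hits=2, window_seconds=1.0):
        self.max_hits = max_hits          # hypothetical threshold
        self.window = window_seconds      # hypothetical window length
        self.hits = defaultdict(deque)    # ip -> timestamps of recent page views

    def record(self, ip, timestamp):
        """Record one page view; return True if this IP is over the limit."""
        recent = self.hits[ip]
        recent.append(timestamp)
        # Drop page views that have fallen out of the sliding window.
        while recent and timestamp - recent[0] >= self.window:
            recent.popleft()
        return len(recent) > self.max_hits
```

A flag from something like this only says "too fast to be a person reading pages". It says nothing about who is behind the IP, so it is a starting point for a closer look rather than a reason to block.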

GaryK




msg:3761068
 6:19 am on Oct 8, 2008 (gmt 0)

are we going to block it for bot-like activities?

Did it read and respect robots.txt? Normally I'd block any user agent that acted like a badly behaved bot. In this case you might want to cross-reference the PTR and, if it's from Google, then block it; otherwise you risk blocking humans.

[edited by: GaryK at 6:19 am (utc) on Oct. 8, 2008]
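
The PTR cross-reference mentioned above is usually done as a forward-confirmed reverse DNS check: look up the hostname for the IP, check that it sits under a Google domain, then resolve that hostname and confirm it points back to the same IP. A minimal Python sketch follows. The function names are hypothetical, and the two lookup helpers are passed in as parameters only so the logic can be tried without touching the network.

```python
import socket

# Hostname suffixes treated as Google crawler hosts in this sketch.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse_lookup(ip):
    # PTR lookup: primary hostname registered for the IP.
    return socket.gethostbyaddr(ip)[0]


def _forward_lookup(hostname):
    # Forward lookup: list of IPv4 addresses the hostname resolves to.
    return socket.gethostbyname_ex(hostname)[2]


def is_google_crawler(ip, reverse=_reverse_lookup, forward=_forward_lookup):
    """True if ip reverse-resolves to a Google hostname that resolves back to ip."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:  # covers socket.herror and socket.gaierror
        return False
    if not hostname.endswith(GOOGLE_SUFFIXES):
        return False
    try:
        return ip in forward(hostname)
    except OSError:
        return False
```

The forward step matters because anyone who controls the reverse zone for their own IP range can publish a PTR record that merely claims to be googlebot.com.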

Web_speed




msg:3761071
 6:28 am on Oct 8, 2008 (gmt 0)

In case anyone is interested, found this at:
[gizmag.com...]

Testing of a new browser is a truly monolithic undertaking, but Google's vast database of websites past and present is pretty much the most comprehensive source of such information on the planet, which is allowing the developers to perform testing at a stupendous rate. Every week, "Chrome Bot" tests millions of pages, returning performance results that the development team would normally wait months to get from human beta testers.

Note: "Chrome bot"

Leosghost




msg:3761153
 10:39 am on Oct 8, 2008 (gmt 0)

Interesting and perfectly logical that they should do this ( no tinfoil hat here ) it makes perfect sense ..

That would be a shocker, because Google has given assurances that their crawler will always follow "googlebot" rules.

And the dentist always said it wouldn't hurt a bit ..

[bluesky]
Just extrapolating a little ..let's say that after the current financial and political dust has settled ..say December or early next year ..If G gets what it can call a friendly administration in the US and given G has a strong corporate presence in the EU and good working relationships with most governments worldwide ..
Whereas MS is basing itself in Norway ..not an EU state ..so will be missing some "levers" ..and already is not liked by the politicians in the EU ..

I wonder if G might not be thinking of lobbying ( both sides of the Atlantic ) for MS to be forced to ship "chrome" on its desktop installs ( like we used to have netscape way back in the day on fresh 95 and 98 )..the argument would be that it was open source and thus in the interests of greater choice ..I'm sure it would fly in the EU corridors of power ..and in a relooked USA ..

Anticipation of such a strategy would certainly explain why they would be crawling ( in stealth mode ) using a bot version of their browser ..so as to have all their ducks in a row when presenting their case ..even if they get others to present it for them ..
[/bluesky]

tangor




msg:3761180
 12:00 pm on Oct 8, 2008 (gmt 0)

What part of BROWSER reads robots.txt? Perhaps some do, but who writes robots.txt for browsers? Sounds like an opportunity for perfect stealth to me... i.e., they told us up front "it's a browser".

"Chrome" in the UA... block it or not? Browser or bot? That's the question (sooner or later). Where'd I put that tin foil hat? Had it just a minute ago!

But seriously, I'm not amused there's a bot AND browser named chrome. Get ripped enough by true bots without having to deal with one masquerading as a browser. Already ticked about some browsers with prefetch activated and skewing hit reports as it is.

tedster




msg:3761455
 6:06 pm on Oct 8, 2008 (gmt 0)

There's a critical piece of information we need here. What is the user-agent for this "Chrome Bot" activity?

The Chrome browser itself uses:

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.27 Safari/525.13

Does the bot use "ChromeBot" somewhere in the string, or just the ordinary browser user-agent?
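
Once a user-agent string is in hand, the question above can be checked mechanically. A minimal Python sketch, assuming all you have is the raw UA string; the function name `describe_user_agent` is made up for illustration. Note that nothing in this thread confirms that a "ChromeBot" token exists in any real user-agent. The check simply looks for that text in case it ever shows up.

```python
import re

# Matches the "Chrome/<version>" product token found in the Chrome browser UA.
CHROME_TOKEN = re.compile(r"\bChrome/(\d+(?:\.\d+)*)")


def describe_user_agent(ua):
    """Report the Chrome version (if any) and whether bot-like names appear."""
    match = CHROME_TOKEN.search(ua)
    lowered = ua.lower()
    return {
        "chrome_version": match.group(1) if match else None,
        # Hypothetical token: no such UA has been confirmed in this thread.
        "mentions_chromebot": "chromebot" in lowered,
        "mentions_googlebot": "googlebot" in lowered,
    }
```

As noted elsewhere in this thread, a user-agent is self-reported, so a result from this is a hint to weigh alongside IP and behavior, not proof of anything.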

GaryK




msg:3761463
 6:25 pm on Oct 8, 2008 (gmt 0)

tangor, there are lots of bots that use legitimate UAs from browsers to identify themselves. I'll grant you most of them do not read robots.txt. Using a legit browser UA is why we cannot depend solely upon the UA when it comes time to decide which ones to block. That's why I stated earlier while you can use the UA as a starting point you really need to analyze what any UA is doing before deciding to block it.

Ted, that's such an obvious question. I feel really stupid for not asking it myself. Oh well. ;) It would be nice to know what the UA is in Web_speed's case. I'd like to know if there is a ChromeBot. So far I've seen 25 unique UAs from Chrome and none from ChromeBot.

Web_speed




msg:3761703
 12:50 am on Oct 9, 2008 (gmt 0)

Unfortunately this particular site is hosted on my ISP's provided free user space (very, very basic setup) and while I have access to webalizer and can check the numbers etc., I am unable (no access) to check/see the actual line-by-line raw server log.

I emailed the ISP and asked them to grant me access or email me the raw logs for the required time frame. It is a special favour request and I hope they will be kind enough to provide it. Will update as soon as I have a reply.

Would very much like to put my hands on this information too.

[edited by: Web_speed at 1:43 am (utc) on Oct. 9, 2008]

GaryK




msg:3761746
 2:05 am on Oct 9, 2008 (gmt 0)

From Matt Cutt's blog:
[mattcutts.com...]

I also love that Google has a “ChromeBot” which takes each new browser build and throws (put your pinky finger to your lips) one million webpages at the build as a torture test. That testing virtually guarantees that everyday web pages shouldn’t crash your browser.

I've looked everywhere for a ChromeBot UA and can't find one.

tedster




msg:3761761
 2:21 am on Oct 9, 2008 (gmt 0)

Thanks for the word from a known Google source, Gary.

I've removed a few other citations people posted from second-hand, third-hand, whatever-hand sources since the information is not verifiable and contained no citations from Google. Let's stay away from that kind of thing for the remainder of the thread.

GaryK




msg:3761790
 3:37 am on Oct 9, 2008 (gmt 0)

No problem, Ted. I just wish I'd spelled Matt's last name correctly. Oh well. I'm going to post a comment/question in his blog and hope for a reply from him.

tangor




msg:3761886
 8:42 am on Oct 9, 2008 (gmt 0)

@GaryK Thanks for the heads up, which matches what I do on a regular basis: wait and see if action is required. I was just speculating as to whether Google found a new way to index without saying "they" were doing it. We do know their Chrome browser reports home, so I guess that's not real news....

GaryK




msg:3762199
 4:23 pm on Oct 9, 2008 (gmt 0)

Before I start speculating will someone please look at the link I posted and tell me if you see any comments/questions from GaryK? Thanks. :)

ADDED: Oops. Sorry for my lack of manners. You're welcome tangor.

[edited by: GaryK at 4:26 pm (utc) on Oct. 9, 2008]

Web_speed




msg:3762705
 7:48 am on Oct 10, 2008 (gmt 0)

will someone please look at the link I posted and tell me if you see any comments/questions from GaryK?

I'm getting a 403, "The website declined to show this webpage".

moftary




msg:3762714
 8:05 am on Oct 10, 2008 (gmt 0)

Yes, might be some google employee undercover :)

GaryK




msg:3762728
 9:07 am on Oct 10, 2008 (gmt 0)

Hi Web_speed. I was getting a permissions-related error message all day on Thursday and I'm still getting it now. The rest of his website is having similar problems except for the home page. I appreciate you trying. Hopefully when you try again later it'll be working. The reason I asked someone to look at the page was because it seemed as though the question I asked got deleted, even though I initially saw it because I don't have to go through his pre-moderation queue anymore. But it may just be somehow related to whatever issues he's having with the website as a whole.

GaryK




msg:3763232
 8:18 pm on Oct 10, 2008 (gmt 0)

OK, his blog is back up. I looked at it earlier this afternoon from my brother-in-law's office computer and my question about a ChromeBot user agent has been deleted. I'd love to know why.

Web_speed




msg:3763328
 12:00 am on Oct 11, 2008 (gmt 0)

Can't see your question either ;) not too surprised.

Still waiting for my ISP to reply. Going to call them Monday morning. I am itching to check that darn server log.

GaryK




msg:3763336
 12:06 am on Oct 11, 2008 (gmt 0)

I hope they let you have access to it and that you find something interesting. Based on what happened to my question I doubt you're going to see a ua with ChromeBot in it. It'll be nice to see what ua from Google was hitting your site so hard that day.

jecasc




msg:3769822
 6:29 pm on Oct 20, 2008 (gmt 0)

<moved from another location>

I checked the "who is online" feature in the backend of my webshop today and saw the google crawler feeding random keywords to my search.

First I thought it was only following links of searches customers had perhaps posted elsewhere but this was not the case.

The crawler filled in not only nouns but also adjectives and verbs.

Like:

advanced_search_result?keywords=widget
advanced_search_result?keywords=blue
advanced_search_result?keywords=effective
advanced_search_result?keywords=produces

I even noticed a word that had a misspelling in it. When I checked, it was indeed misspelled on my website, and when I searched Google with the misspelling my website was the only one to be found.

So Google must have crawled my website, created an index of words and is now feeding the words to my search, creating new pages. I could not, however, find any of the newly created pages in Google's index.

So I am wondering what they are doing. Has anyone else noticed this? Or is this yesterday's news? I have never seen this before in my logfiles or on my "who is online" page.

[edited by: Robert_Charlton at 6:57 pm (utc) on Oct. 20, 2008]
[edit reason] moved from another location [/edit]
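
For anyone wanting to look for the same pattern on their own site, here is a minimal Python sketch that tallies the keywords a visitor fed into a search URL like the ones listed above. It assumes you have already pulled the requested paths for one IP out of your logs or "who is online" data. The function name `tally_search_keywords` is made up, and the `keywords` parameter name is taken from the example URLs in the post.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def tally_search_keywords(request_paths, param="keywords"):
    """Count how often each search keyword appears in a list of requested paths."""
    counts = Counter()
    for path in request_paths:
        query = urlsplit(path).query
        # parse_qs returns a dict of lists; paths without the param are skipped.
        for value in parse_qs(query).get(param, []):
            counts[value.lower()] += 1
    return counts
```

Comparing the resulting keywords against the words that actually appear on your own pages (misspellings included) is one way to test the idea that the terms were harvested from the site itself.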

tedster




msg:3769832
 7:03 pm on Oct 20, 2008 (gmt 0)

This sounds like something that we discussed last spring - see [webmasterworld.com...]

Although Google announced it last spring, they've been working this approach at a low volume for quite a while.

Robert Charlton




msg:3769833
 7:06 pm on Oct 20, 2008 (gmt 0)

jecasc - What identified this as a Google crawler?

I can't help but notice the similarity between your observation and what Web_speed reports in the post that leads off this thread...

jecasc
I checked the "who is online" feature in the backend of my webshop today and saw the google crawler feeding random keywords to my search.

Web_speed
I have a "who's online" script running and reporting visitors in real time as they browse the site. The script uses javascript to report referrer and currently viewed page.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved