You're quick, Critter. I was looking for new information about it, too.
The most recent visit by Googlebot/2.1 had this UA:
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
But about two minutes before that, and all other visits before that, the UA was:
Oh well. Something to investigate tomorrow. Nighty night.
I don't mind the new format. Although it does look a lot like Slurp's:
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
I believe that is Google checking if you are cloaking, ie, showing different things to the crawler depending on the user agent identification.
If people were cloaking, they would do it by just scanning for the word "google"...
In theory, yes, but in practice not everybody knew that Google would be checking pages with a different user agent whose name does not start with Googlebot.
Why detect cloaking? It is only meaningful if you have 100% different content. Sites like news.google.com change every minute without cloaking because they have FRESH data.
Sites that employ cloaking are banned from Google's index.
Google can determine if a site is cloaking by looking at the page twice with the Googlebot user agent and once in between with the Mozilla pretender.
If the page looks the same both times with Googlebot but different with the Mozilla agent, you're busted.
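For illustration only, here is a rough Python sketch of the kind of check being described: fetch the page as Googlebot, then with a browser-style user agent, then as Googlebot again, and compare. This is a guess at the mechanism, not Google's actual method; the user-agent strings are just examples and the exact-match comparison is a simplification.

import urllib.request

GOOGLEBOT_UA = "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (compatible; MSIE 6.0; Windows NT 5.1)"  # example browser UA

def fetch(url, user_agent):
    # Fetch the URL with the given User-Agent header and return the body as text.
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")

def looks_cloaked(url):
    # Googlebot fetch, browser-style fetch, Googlebot fetch again.
    first_bot = fetch(url, GOOGLEBOT_UA)
    browser = fetch(url, BROWSER_UA)
    second_bot = fetch(url, GOOGLEBOT_UA)
    # Same page both times for Googlebot, but different for the browser agent:
    # that is the pattern described above.
    return first_bot == second_bot and browser != first_bot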
|I believe that is Google checking if you are cloaking |
I doubt it. From what I've seen today, all of the UAs for the regular Googlebot (not Mediapartners) have the new string.
BTW, does anyone know why Google puts a + before the URL for their bot information page?
|Sites that employ cloaking are banned from Google index. |
Nonsense. There are legit uses for cloaking and some huge sites that use it. [msdn.microsoft.com...] detects the browser and delivers different pages accordingly. Try browsing it with Opera identified as Opera (instead of IE) and you'll see some huge differences. Probably other MS sites are the same, but I almost never visit them.
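As a side note, browser detection like that can be as simple as branching on the User-Agent header. Here is a minimal Python sketch of the idea; the template file names are made up and this is not how MSDN actually does it.

def select_template(user_agent):
    # Pick a page variant based on the User-Agent header (hypothetical file names).
    ua = user_agent.lower()
    if "opera" in ua:
        return "page_basic.html"    # plainer markup for Opera
    if "msie" in ua:
        return "page_ie.html"       # IE-specific markup and scripts
    return "page_default.html"      # everything else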
"Google can determine if a site is cloaking by looking twice at the page with Googlebot user agent and in between with the Mozilla pretender. If in the two times with Googlebot it looks like the same but with the Mozilla it looks different, you're busted."
Whoa! Whoa! That can't possibly be true. All my pages have server side includes that throw a piece of text in with a random quote at two points on the page. This means that the page changes every time it is loaded, and I've never been removed from the Google index.
What about all the news sites that change constantly throughout the day? Google would have to remove itself ...
I am sure that "minimal changes" are allowed between visits.
I expect that it looks to see if on one visit it gets three paragraphs of normal looking text, and on the other visit it gets 20 000 repetitive keywords in no particular order.
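If there is such a tolerance, a fuzzy comparison would be enough to ignore rotating quotes while still catching wholesale keyword stuffing. Here is a small Python sketch of that idea; the 0.9 threshold is an arbitrary illustrative value, not anything Google has published.

from difflib import SequenceMatcher

def substantially_same(text_a, text_b, threshold=0.9):
    # True if the two page bodies are at least `threshold` similar.
    # Random quotes change a tiny fraction of the page; a keyword-stuffed
    # page served only to the crawler would fall far below the threshold.
    return SequenceMatcher(None, text_a, text_b).ratio() >= threshold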
I don't think Google bans whole sites for cloaking, but rather each of the pages that employ cloaking. Banning may not necessarily mean removing the page from the index, but rather not counting it toward the PageRank of the pages it links to. Therefore, internal pages may get PR0 if they do not have any other links pointing to them.
|BTW, does anyone know why Google puts a + before the URL for their bot information page? |
Just a bump. I've wondered about that too - and I know that others here have as well.
Yesterday the "new" Googlebot did a full deep crawl of one of my sites. I first did not recognize it. Today, the "old" one came, too, picking all the old 301 redirects (with no external links) I had to put there three months ago due to a large-scale page rename. Hopefully the bots will not interpret this as a cloaking attempt.
Yeah, I see that too, bull. Today, Googlebot with the old UA got a 404 from my site because it requested a file I deleted a few months ago, as well as the links to it.
Deep crawling has started.
One of my sites has been getting deep crawled every day since March 1, about 20 pages. <Simpsons mode: woohoo>
Anyone else experiencing this?
Is the request really coming from Google's IP range?
We have been getting fully crawled daily for a long time, even our PR4 sites.
Yes, I've verified that the IPs are from Google's network allocation.
Furthermore, it's kind of obvious that the new bot is not a bogus crawler because it comes from different IPs within the same (Google) netblock, something that would be almost impossible for a scammer to pull off unless they had their own netblock (and they ain't givin' out Class Cs like they used to) :)
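For anyone who wants to do the same check, here is one way to test whether a visiting IP sits inside a given netblock, using Python's standard ipaddress module. The range below is only a placeholder; look up Google's current allocations with whois before relying on it.

import ipaddress

EXAMPLE_NETBLOCK = ipaddress.ip_network("66.249.64.0/19")  # placeholder range, verify via whois

def in_netblock(ip, network=EXAMPLE_NETBLOCK):
    # True if the visiting IP address falls inside the network.
    return ipaddress.ip_address(ip) in network

print(in_netblock("66.249.66.1"))  # True for the placeholder range
print(in_netblock("192.0.2.10"))   # False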
Just found a very strange entry in my logs.
I am certain this is NOT GOOGLE.
Anyone else got this one too?
Http Code: 200
Date: Mar 05 13:05:37
Http Version: HTTP/1.1
Size in Bytes: 16491
Referer: -
Agent: Googlebot/Test
Yes, "Googlebot/Test" got two files.
Nilloc, the IP you mention IS Google, as well as the one I posted:
 marcs was a little faster with the whois... :) 
Any statement, Googleguy? Would be appreciated.
>I am certain this is NOT GOOGLE
Maybe it is:
network:Street;I:2400 E. Bayshore Pkwy
network:City;I:Mountain View , CA 94043
Changing my .htaccess file back again.
Ten minutes ago I had blocked agent: Googlebot/Test
Letting it in again now.
"Any statement, Googleguy? Would be appreciated."
Absolutely, bull. Normally I post by the seat of my pants, but I wrote this up earlier today and hadn't posted it yet:
Hey everybody, I wanted to give you a heads-up about a potential change in our user-agent name. Currently we use the user-agent
|Googlebot/2.1 (+http://www.googlebot.com/bot.html) |
but we're considering changing our user-agent to something like
|Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) |
Or is it purely a case of how the web server treats Googlebot?
This type of talk is a bit above me :(
"allowing site owners to recognize that Google is visiting"
Not if they have Urchin stats.
But what is "Googlebot/Test"?
I have that in yesterday's logs from 18.104.22.168. It was getting a 406 from the server.
I'm also getting 406 from 'Googlebot/Test' on pages that are fine (and already indexed for weeks.) Worrying. We haven't done a thing to them.
The only problem I see, GG, is that my stats vendor will have to update their software to recognize the new Googlebot identification, but hey, that's not a big deal, and I can imagine that the negatives might outweigh the good.
So the only thing I'd say is, if it's all good, make a formal press release so that all the stats vendors will update their software, so that we can know when Googlebot is visiting and can see what you have looked at without having to resort to opening our log files in something like Excel and doing a "find" to pull up the Googlebot entries.
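In the meantime, here is a quick way to keep pulling Googlebot entries out of an access log without a stats package: match the Googlebot token itself rather than the full string, so both the old and the new user agents are caught. A short Python sketch; the file name and the assumption of one raw log line per row (e.g. Apache combined format) are just for the example.

import re

GOOGLEBOT_RE = re.compile(r"Googlebot/\d+\.\d+")  # matches the old and the new UA forms

def is_googlebot_hit(log_line):
    # True if the log line's user-agent field mentions Googlebot/x.y.
    return bool(GOOGLEBOT_RE.search(log_line))

with open("access.log") as handle:
    googlebot_hits = [line for line in handle if is_googlebot_hit(line)]
print(len(googlebot_hits), "Googlebot requests")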