Well, besides the regular Googlebots, there's
MediaBot - used to analyze AdSense pages
user agent "Mediapartners-Google"
ImageBot - crawling for the Image Search
user agent "GoogleBot-Image"
AdsBot - checking AdWords landing pages for quality
user agent "AdsBot-Google"
Isn't there one for RSS also?
There is Feedfetcher-Google
Although this is the fetcher for the google.com/ig RSS reader, not an actual RSS bot looking for feeds; it's just the thing that requests feeds to display on google.com/ig.
What about Froogle and Google Base?
Here's another one...
Generic Mobile Phone (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)
What are the different types of google bot
I cannot seem to remember!
If you spot one let me know, they seem to be an endangered species nowadays :)
"GoogleBot/2.1" has been crawling one page of my site daily. The page being crawled is the target of an adwords campaign. So Adwords is definitely looking at target page quality. Note the upper case "Bot". This is ("GoogleBot/2.1") the entire referrer string as well. It has used multiple IP addresses.
This bot ("GoogleBot/2.1") also detected a change to one of my robots.txt files (more liberal for Google) and appeared to almost immediately trigger a deep complete crawl of the site from the conventional bot. I believe all the bots cooperate in collecting the robots.txt file.
The conventional bot has a lower case "bot"
"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
Google has made it a little difficult to sort out all of their bots with one search string. "/2.1" works, but it also finds some unrelated odds and ends in the logs. (And of course it misses Googlebot-Image/1.0.)
User-agent strings extracted directly from my logs:
Google-: "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
Of course, as mentioned, there's Froogle, Mobile, and Feed for RSS feeds.
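For anyone sorting these out in a log script, here is a minimal sketch of one way to label the user-agent strings quoted in this thread. It assumes a case-insensitive substring match is good enough (that also catches the upper case "GoogleBot/2.1" variant). The function name classify_google_agent and the token list are made up for illustration, and a user-agent string can be faked, so treat the result as a hint only.

```python
from typing import Optional

# Tokens quoted in this thread, most specific first so that
# "Googlebot-Image" is not swallowed by the generic "Googlebot" check.
GOOGLE_AGENT_TOKENS = [
    "Mediapartners-Google",
    "Googlebot-Image",
    "Googlebot-Mobile",
    "AdsBot-Google",
    "Feedfetcher-Google",
    "Googlebot",
]


def classify_google_agent(user_agent: str) -> Optional[str]:
    """Return the first matching token, or None if nothing matches.

    Hypothetical helper: matching is a plain case-insensitive
    substring test, not an official Google rule.
    """
    ua = user_agent.lower()
    for token in GOOGLE_AGENT_TOKENS:
        if token.lower() in ua:
            return token
    return None


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "Generic Mobile Phone (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)",
        "GoogleBot/2.1",
    ]
    for sample in samples:
        print(classify_google_agent(sample), "<-", sample)
```

Matching on the token rather than on "/2.1" avoids the unrelated odds and ends, and it also picks up the image bot, which reports version 1.0.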
Also, Google now appears to be using HTTP/1.1 exclusively; until recently there was a mix of HTTP/1.1 and HTTP/1.0. One important thing to note is that Google now always requests GZIP-compressed content if your server provides it. Your website might get an "attaboy" if you serve GZIP-compressed content to the bots. It could also cut your website's bandwidth usage and boost your site's performance.
(Webmasterworld and Google serve GZIP compressed content)
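To make the GZIP point concrete, here is a rough sketch of the server-side decision: look at the request's Accept-Encoding header and compress the body only if the client asked for gzip. The helper names accepts_gzip and maybe_gzip are hypothetical, and the header parsing is deliberately simplified (it ignores q-values), so treat it as an illustration rather than what any particular web server does.

```python
import gzip
from typing import Dict, Tuple


def accepts_gzip(accept_encoding: str) -> bool:
    """True if 'gzip' is listed in an Accept-Encoding header value.

    Simplification (assumption): q-values such as 'gzip;q=0' are ignored.
    """
    for part in accept_encoding.split(","):
        coding = part.split(";")[0].strip().lower()
        if coding == "gzip":
            return True
    return False


def maybe_gzip(body: bytes, accept_encoding: str) -> Tuple[bytes, Dict[str, str]]:
    """Return (payload, extra response headers) for the given request header."""
    if accepts_gzip(accept_encoding):
        headers = {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
        return gzip.compress(body), headers
    # Client did not ask for gzip: send the body untouched.
    return body, {}


if __name__ == "__main__":
    page = b"<html><body>" + b"hello world " * 200 + b"</body></html>"
    payload, headers = maybe_gzip(page, "gzip,deflate")
    print(len(page), "bytes before,", len(payload), "bytes after", headers)
```

In practice most people just switch on the compression module in their web server instead of doing this by hand.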
Hi all: I hope this doesn't hijack the thread.
I look at the DC pages all the time for my keywords <using an online tool>.
DCs are subdivided into 'New', and various wild-card DNS numbers like .104, .107 and so on.
What I would like to know is the geographical locations for these DNS numbers.
I presume Google puts DCs all over the place to reduce long distance bandwidth
and to balance the insane load that all those surfers place on their servers.
Does anybody have a list of where 22.214.171.124 is for example?
I mean the physical location (City, State ..) of the servers.
Are any overseas? Lots of them? Where the heck are they?
Most of them give me good SERPs positions, but a few have my site in the dumpster.
If those few are in Upper Volta or Ananaguay I don't care so very much.
Whois lookups always point back to Mountain View, California (or very nearby)
where Google HQ are, and that tells me nothing. Any help appreciated! -Larry
[edited by: tedster at 5:44 am (utc) on Dec. 20, 2006]
One means that may provide some information on physical locations can be found at:
I believe this could be considered a link to an authority site.
You can use the "ITR client" and then "trace" the route to the IP addresses. This trace typically shows physical locations along a communication path.
Your firewall or modem firewall may block this capability to some extent.
Thanks Bumpski: I couldn't make sense of the recommended page, so I pinged the DNS #.
That got there in several steps and back, but no info as to actual location.
All results indicated zero distance and 'USA' however.
I was really hoping that all this was public knowledge, and that somebody had a simple list up,
123.123.123 Dallas, TX
234.234.234 Boston, MA
121.121.111 Paris, France .. and so on.
Just for reference:
I believe this could be considered a link to an authority site.
On this page, to the right, you will see a "click here" link to the "ITR Client" download. This Windows executable is very useful and free.
I double-checked: I had to set my DSL modem firewall to OFF from a setting of LOW to fully enable the Route Tracing function of the ITR client.
I presume this is deemed on-topic... along with their user-agents, can anyone advise what to use in robots.txt for them? I've only ever used Mediapartners-Google* and Googlebot to date. I disallow other robots. I was just wondering if I happened to be excluding some of the other Google robots (e.g. Image), or does 'Googlebot' cover them all?
This link may help with Googlebot Image
Sign up for sitemaps above. There is a robots.txt analysis tool that Google provides. You don't have to have a sitemap.xml file to sign up for sitemaps, but you may have to verify your site ownership. Google actually asks you to place a uniquely named file in your Website, so they can verify you are who you say you are!
If you blocked all bots but Googlebot, you are probably blocking Google Images and Google Mobile. Of course, there are also the AdSense and AdWords bots.
Allowing only specific bots is now risky because search engines keep inventing new ones.
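As a quick way to see what an "allow only these bots" robots.txt does to the other user-agents named in this thread, here is a small sketch using the robots.txt parser in Python's standard library. The robots.txt content and the helper name allowed_agents are made up for illustration. It only shows how Python's parser reads the file; each real crawler applies its own matching rules, so Google's own robots.txt analysis tool is the better authority for its bots.

```python
from typing import Dict, Iterable
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: allow two named bots, shut everyone else out.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /
"""


def allowed_agents(robots_txt: str, agents: Iterable[str], url: str = "/") -> Dict[str, bool]:
    """Map each user-agent name to whether this parser lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    names = ["Googlebot", "Mediapartners-Google", "AdsBot-Google", "Feedfetcher-Google"]
    for agent, ok in allowed_agents(ROBOTS_TXT, names).items():
        print(agent, "allowed" if ok else "blocked")
```

Under this parser, anything not named in its own group falls through to the catch-all "Disallow: /" group, which is exactly the risk with whitelisting.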
Excellent, thank you! You know, I looked (not very hard, I admit) and never found the info before.
I do use the Sitemaps and did use the robots.txt analysis tool in the 'early days'. But since the Sitemaps page had misspelt the Mediapartners robot (*), I wasn't 100% confident in its results/info.
Plus it gave incorrect analysis results for the normal Googlebot, although that was fixed on a return visit a short while after ... yeah - I just didn't trust that page ;-)
Maybe Google have sorted it out since the early days of that analysis tool/feature, I shall give it another go.
(*) at least, if it was correct it was spelt differently to the Adsense guidelines and help pages.
Yeah. It's right now. It was probably corrected very early on. If I recall rightly I think it was initially implemented as Google-Mediapartners instead of Mediapartners-Google (as it is now).
A "sort of" google bot is gsa-crawler, used by the Google Search Appliance: [google.com...]
BTW, has someone seen the Supplemental Bot in action?
A google proxy (browser, not bot) which will show up in web logs is "Google Wireless Transcoder"; used by Google Mobile [http://www.google.com/xhtml], [http://www.google.com/gwt/n]
So how do you detect these extra bots? None of them show up in my stats program (or is that a sign that I should get a better stats app?).
And regarding the AdSense and image bots: do regular optimization techniques apply to them as well, or is there a different set of tricks for handling these bots?
I have been doing a daily check to see how much Mozilla bot has been spidering my site. Although 70% de-indexed or supplemental, Mozilla has been spidering over 200 'pages' a day for about 3 days.
Now for the on topic bit: I now see that it has been requesting images, not pages. Yes, this is Mozilla bot, not the image bot. This little guy: "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
Is the Mozilla bot taking over the image bot's job?