Forum Moderators: Robert Charlton & goodroi
The M_Bot [webmasterworld.com] (identified by Mozilla/5.0 (compatible; Googlebot/2.1; in the user-agent string) had been hitting my site at up to 3 times/sec (avg: 836/day), triggering unruly-bot-prevention routines [webmasterworld.com] in the PHP scripts. I had used the G on-site form [google.com] a few days before, and asked them (nicely) to stop it. So far so good.
By July 1 it was clear that, rather than turning the knob down a notch or two, they had switched it off altogether, at least as far as the M_Bot was concerned: there were no hits from this bot at all. The G_Bot (identified just by Googlebot/2.1 (+http://www.google.com/bot.html) in the user-agent string) had also slowed:
So, my site has now dropped from 28,000 G-hits in June to a likely 500 in July. So, beware...
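For readers wondering what an "unruly-bot-prevention routine" of the kind mentioned above might do, here is a minimal sketch of a sliding-window throttle. It is written in Python for illustration rather than in the PHP the poster used, and the class name `BotThrottle`, the 3-hits-per-second limit, and the client key are all assumptions, not the poster's actual code.

```python
import time
from collections import defaultdict, deque


class BotThrottle:
    """Sliding-window request counter keyed by client (hypothetical helper)."""

    def __init__(self, max_hits=3, window_seconds=1.0):
        # Assumed thresholds: at most `max_hits` requests per window.
        self.max_hits = max_hits
        self.window = window_seconds
        self.hits = defaultdict(deque)

    def allow(self, client_key, now=None):
        """Return True if this request is within the limit, else False."""
        now = time.monotonic() if now is None else now
        recent = self.hits[client_key]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_hits:
            return False  # too many recent hits: caller may block or delay
        recent.append(now)
        return True
```

A caller would typically key on IP address or user-agent and answer a `False` result with a 503 or a block page. Blocking a search-engine bot this way has the indexing consequences discussed in this thread.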
Take a look at the first post (msg#521) here for an example...
[webmasterworld.com...]
Personally, I think it's great that Google have a sense of humour.
As the old adage goes...
"Be careful what you wish for... it may come true!"
[edited by: lawman at 8:45 pm (utc) on July 6, 2005]
Personally, I think it's great that Google have a sense of humour.
Humour:- a sabre-toothed tiger enters the cave, drags out your neighbour, and eats him.
Tragedy:- You stub your toe on a rock.
It's always funny when it happens to others.
<snip> I think that I may be due my shot of humour. I am willing to wait.
[edited by: lawman at 8:46 pm (utc) on July 6, 2005]
Whilst at first sight this does seem to make a nonsense of G's "Googlebot is overloading my servers" [google.com] page, on further consideration it suggests that there really is only an on/off switch, rather than a graduated knob.
The more I think about this, the more important it seems to know just what is the case.
Google has commented in the past that they would crawl harder if it weren't for the smaller webmasters complaining.
Google could solve that problem in 5 minutes by looking for a crawl-delay: 0 for googlebot in robots.txt.
That would be treated as permission for crawling at high speed.
If they did that, and let it be known that they were doing that, webmasters could decide to add that or not.
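To make the proposal concrete, here is a rough sketch of how a crawler could read a per-bot `Crawl-delay` from robots.txt, using Python's standard `urllib.robotparser`. The robots.txt content and the helper name `crawl_delay_for` are made up for illustration. Whether a given crawler honours `Crawl-delay` at all, let alone treats 0 as permission to crawl fast, is that crawler's decision; this only shows the lookup.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot may crawl without delay,
# every other bot is asked to wait 10 seconds between requests.
ROBOTS_TXT = """\
User-agent: Googlebot
Crawl-delay: 0

User-agent: *
Crawl-delay: 10
"""


def crawl_delay_for(robots_txt, bot_token):
    """Return the Crawl-delay that applies to `bot_token`, or None if unset.

    `bot_token` is the short product name (e.g. "Googlebot"),
    not the full user-agent string.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(bot_token)
```

With the sample file above, `crawl_delay_for(ROBOTS_TXT, "Googlebot")` gives 0 and any other bot falls through to the `*` entry.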
It's not a matter of webmasters' complaining. It's to do with secondary web activities (like index builders) honoring the wishes of the primary movers (those providing content).
...and here are comparative stats for the last 3 months:
July: ............................Pages
Inktomi Slurp ...................24,065
Google AdSense ..................15,972
MSNBot ..........................11,990
Googlebot HTTP/1.0 .................866
Googlebot HTTP/1.1 Mozilla/5.0 ......61
.
June:
Googlebot HTTP/1.1 Mozilla/5.0 ..25,089
MSNBot ..........................19,211
Google AdSense ..................15,424
Inktomi Slurp ...................14,236
Googlebot HTTP/1.0 ...............1,801
.
May:
MSNBot ..........................20,475
Google AdSense ..................14,193
Inktomi Slurp ...................11,654
Googlebot (HTTP/1.0 + HTTP/1.1) ..4,409
As the man said, be careful what you ask for, you might just get it.
I see GB getting the same file 10-15 times a day sometimes (since sitemaps came into existence, even though I have the frequency on "daily"), but it's much better for me. Now my pages get indexed within 2-3 days and I'm getting targeted traffic right away.
thanks for the warning though
Every request is putting double slashes, though I cannot find any links with double slashes. Many of these are deep internal pages that are unlikely to have any external inbound links. I've checked for internal links with double slashes and there are none.
example: [domain.com...]
"GET //pagename.html HTTP/1.1" 200 21531 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
I am getting hundreds of these a day. If I go to the URL, it's a 404, but I have a spider tracking script on these pages, and it is showing the pages as being spidered. The only way the spider tracking program could show it is if the page loads, but the page will not load with double slashes.
Is anyone seeing this in their logs, or have a clue what may be happening?
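One way to get a clearer picture of requests like the one quoted above is to pull them out of the access log together with the status code and referer the server recorded. The sketch below assumes log lines whose tail matches the quoted sample (request, status, size, referer, user-agent); the regex and the function name `double_slash_hits` are illustrative, not a standard tool.

```python
import re

# Matches the tail of a combined-format log line, as in the sample above:
# "GET //pagename.html HTTP/1.1" 200 21531 "-" "Mozilla/5.0 (...)"
LOG_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def double_slash_hits(lines, agent_token="Googlebot"):
    """Return (path, status, referer) for bot requests containing '//'."""
    hits = []
    for line in lines:
        match = LOG_RE.search(line)
        if match is None:
            continue  # line does not look like the assumed format
        # Ignore the query string; only the path part is of interest.
        path_only = match["path"].split("?", 1)[0]
        if agent_token in match["agent"] and "//" in path_only:
            hits.append((match["path"], int(match["status"]), match["referer"]))
    return hits
```

The status column is worth a look: the sample line logs a 200 for the double-slash URL, which can then be compared with what a browser gets for the same address.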
My site was (?) banned from Google recently in the July 28 changes, and I'm still working to get back into the index. I am seeing 66.249.65.#*$! crawling this site, several thousand requests per day... and then stopping. The next day, they're at it again.
I don't know exactly what it's for, but I am used to seeing "Googlebot/2.1 (+http://www.googlebot.com/bot.html)" and not "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
What's going on?
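For anyone wanting to count the two crawlers separately in their own logs, a small sketch follows. It keys off the two user-agent strings quoted above and borrows the thread's informal labels "M_Bot" and "G_Bot". The function names are made up, and matching on the `Mozilla/5.0` prefix is an assumption based only on those two strings.

```python
from collections import Counter


def classify_googlebot(user_agent):
    """Label a user-agent as 'M_Bot', 'G_Bot', or 'other' (thread's terms)."""
    if "Googlebot" not in user_agent:
        return "other"
    # Assumption: the Mozilla-compatible variant starts with "Mozilla/5.0".
    if user_agent.startswith("Mozilla/5.0"):
        return "M_Bot"
    return "G_Bot"


def tally(user_agents):
    """Count requests per label for an iterable of user-agent strings."""
    return Counter(classify_googlebot(ua) for ua in user_agents)
```

User-agent strings can be spoofed, so a tally like this only reflects what the client claimed to be.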
Did you mention how many pages your site contains to be spidered? Hundreds? Thousands?
walkman:
...sitemaps...
koan:
...notification by my anti-abuse script that the mozilla compatible bot from Google triggered it and was automatically blocked from reading further.
no need to hold it in reserve at all. Go right ahead and use them.
Anyway I know it's too late to say this now, but a 10,000-page site with a lot of inbound links will get spidered like crazy. I have a site around the same size, and Googlebot takes an average of 3,500-4,000 pages a day.
You should be fine, as long as you make sure every page gets crawled at least once during an indexing cycle. I don't know if they really do that "deep crawl" before the update any more, but if a page doesn't get crawled at least once during the indexing cycle then my Google Belief System says that it will probably get ranked lower or not at all.
jomaxx, are you using a sitemap? If so, what is the frequency, and how many new pages are added to the file? In msg #:12 walkman spoke of a 1200-page site getting 4000+ visits in one day (which is as outrageous as your own case).
I would rather be in my current situation re: G than yours. (Boggle, again).
PS I did ask Google to return the Status Quo re: the M_Bot on 6 July, but with neither reply nor effect. At the time I was annoyed. Now, I am beginning to bless my lucky stars.
Google visits one of the sites I maintain, but it seems it doesn't index any page that it visits.
The last time Google indexed that website was back in April. What can be the reason? Any idea?
Here is last 3 days web stats...
+++++++++++++++++++++++
RobotStats - Google Bot (http://www[.]google[.]com/)
User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www[.]google.com/bot[.]html)
Quantity: 6732
+++++++++++++++++++++++
So, what do you say?
I also have been crawled hard by the G Mozilla bot the last week (5K-10K pages a day) - but nothing got into the index.
I have read on WW that the Mozilla bot doesn't get pages into the index but what I really need to know is:
Did anyone see the regular G bot come AFTER the Mozilla bot? or did anyone see pages get into the index after a deep crawl of the Mozilla bot?
Andem, KiShOrE - I'd be happy to hear if you have any updates.
Did anyone see the regular G bot come AFTER the Mozilla bot?
Yes.
My own experience + research suggests that the G_Bot needs to hit a page three times before the page gets into the index (see msg#5+7 [webmasterworld.com]). The same research suggests that the M_Bot does not count towards this total, but does suggest that the M_Bot 'scouts' a page first, then the G_Bot follows up.
Just to curdle the blood, I also saw the reverse (the M_Bot hit a page after the G_Bot, and took the page out of the index).
No way was I going to ask them to slow it down via support, as I feared that what happened to the OP would happen to me and I'd fall from grace with the spiders. I stuck the Crawl-delay in robots.txt and it's been much more civilized, but it seems the dang spiders are always on my site now, taking a page or two, as they just can't get it all fast enough anymore.