I have seen slower spidering this past month on several sites - and even slower indexing of what was spidered. However, it's not quite as dramatic as your report. As with almost anything we notice on a limited sample of sites, there are reports to the opposite, too - see Google is gulping my new pages [webmasterworld.com].
From reports in the forum here, my sense of April was that Google was focused on some new ranking changes and a lot of their spidering and indexing was throttled during that period.
I can verify that our main site is experiencing exactly the same thing. The crawling stats were not updated for most of March, and when they came back in April they were almost totally flat.
Our logs verify that googlebot is now 3rd behind Y and then MSN. For the last few years it was always first... by a large margin.
Same issue here. Across all sites googlebot activity has slowed to a ... crawl.
Think it might have anything to do with the new form crawling announced a couple of weeks ago?
I see 90% less crawling and indexing all over the place. Maybe they just need their horsepower to crank out another penalty?
I'm seeing the same thing with new domains. January to March was okay, but starting in April it was very hard to get Googlebot to crawl, not to mention having pages actually appear in the index.
Same thing here - glad to see it's not just me. I almost had a heart attack when I saw the graph drop off to nothing.
I sure wish we could get some word from g about this. I'm even seeing y and m/l gobbling up pages that g would've devoured almost instantly just 1-2 months ago.
Googlebot is still crawling us a bit less than it normally would. However, it seems to have recovered from crawling almost nothing to crawling about 10-20% less than normal.
The graphs on Google's webmaster central don't show the partial recovery yet. They seem to be infrequently updated.
Glad to see this topic...I have millions of pages spread across hundreds of sitemaps. I parse my logs daily to see what Googlebot is doing, and my totals for April are 1) way down from previous months and 2) much more volatile from day to day. I saw a lot of days in February and March in the 150,000-200,000 range of pages crawled per day. By comparison, here are the last couple weeks:
(note that these numbers include both general spidering and sitemap crawling)
Those three days starting 4/22 had me tearing my hair out. I did an early check today, and I can see the total is going to be considerably down again from yesterday (probably sub-50,000).
A month ago I was hoping to spend some time seeing if I could speed my pages up even more to get Googlebot firing through my site even faster...now I'm afraid to do anything, since spikes in the crawl could be a result of something I screwed up or Googlebot just deciding to shut down for three days.
Back in March I introduced gzip compression on my site - I expected it to speed Googlebot up, but it seems to have done nothing... then again, with spikes like these, how would I know?
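For anyone who wants to run the same kind of daily tally, here is a minimal sketch in Python, assuming a standard Apache combined log format. The sample lines, IPs, and paths are hypothetical stand-ins, not from any site in this thread:

```python
import re
from collections import Counter

# Hypothetical Apache combined-log lines for illustration only.
LOG_LINES = [
    '66.249.66.1 - - [22/Apr/2008:06:12:01 +0000] "GET /page1 HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [23/Apr/2008:07:30:44 +0000] "GET /page2 HTTP/1.1" 200 4301 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '10.0.0.5 - - [23/Apr/2008:07:31:00 +0000] "GET /page2 HTTP/1.1" 200 4301 "-" '
    '"SomeBrowser/1.0"',
]

# Pull the dd/Mon/yyyy date out of the bracketed timestamp.
DATE_RE = re.compile(r'\[(\d{2}/\w{3}/\d{4})')

def googlebot_hits_per_day(lines):
    """Count requests per day whose user-agent string mentions Googlebot."""
    counts = Counter()
    for line in lines:
        if 'Googlebot' not in line:
            continue
        m = DATE_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts

print(googlebot_hits_per_day(LOG_LINES))
```

In practice you would feed it the open log file instead of a list, but the per-day Counter is the whole idea.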
Welcome to the forums k_m_a, and thank you very much for the stats. That gives a concrete face to what many here are feeling about googlebot in April.
I'll share some data as well - April is in general significantly below March. These are totals; international domains were affected most (down by up to 90%):
4/14/2008 529k 4/14/2008 218k
4/15/2008 459k 4/15/2008 213k
4/16/2008 365k 4/16/2008 252k
4/17/2008 221k 4/17/2008 195k
4/18/2008 349k 4/18/2008 224k
4/19/2008 408k 4/19/2008 245k
4/20/2008 499k 4/20/2008 279k
4/21/2008 479k 4/21/2008 273k
4/22/2008 126k 4/22/2008 248k
4/23/2008 154k 4/23/2008 32k
4/24/2008 85k 4/24/2008 28k
4/25/2008 499k 4/25/2008 279k
4/26/2008 453k 4/26/2008 323k
4/27/2008 532k 4/27/2008 315k
4/28/2008 461k 4/28/2008 322k
4/29/2008 495k 4/29/2008 323k
Hope that helps ...
The 17th, 23rd, and 24th were the weakest days for us as well. The stats pretty much mirror exactly what kms11 is showing.
I think it is pretty clear that Googlebot itself was having problems those days as opposed to not crawling popular sites as much for algorithmic reasons. (Which takes a load off of my mind.)
Yahoo crawls deeper these days. Google seems not to have enough CPU to crawl constantly.
Really? I thought Google was crawling deeper too - picking up those long-tail URLs and the like.
I know for certain they are on our site anyway. ;)
I've just noticed in Google Webmaster Tools that all my sites have defaulted back to "Normal" crawl rate. It's under Tools->Set Crawl Rate.
I was able to change it back to "Faster". But I get this message:
"This site is currently set at a Faster crawl rate. This rate will return to Normal on Jul 30, 2008."
So I just set an iCal reminder for that day to reset it.
Me too. BTW - Good to have you aboard WebmasterWorld thread(s). ;)
One other oddity: even though the actual page crawl rate bounces around, Googlebot grabs my site indexes at roughly the same rate. There is no correlation with the number of pages it actually crawls.
On the three bad days last week, I would see it pull down a site index and maybe index a page or two over many minutes, maybe not. Then it moved on and pulled down another site index. Here's a breakdown by day:
4/22 - 39 site indexes grabbed
4/23 - 47
4/24 - 48
4/25 - 53
4/26 - 36
4/27 - 48
4/28 - 53
4/29 - 37
4/30 - 49 (note that yesterday really stank...down to 25K pages crawled)
So, the bot is out there, it just isn't doing its job.
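That "no correlation" observation is easy to check numerically once you have the two daily series. Here is a minimal Pearson-correlation sketch in Python; the pages-crawled figures are hypothetical stand-ins, since only a few daily totals were quoted above:

```python
import math

# Daily site-index fetches from the post above, paired with
# HYPOTHETICAL pages-crawled totals (only 4/30's ~25K was quoted).
index_fetches = [39, 47, 48, 53, 36, 48, 53, 37, 49]
pages_crawled = [150000, 160000, 30000, 28000, 155000, 170000, 140000, 145000, 25000]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson(index_fetches, pages_crawled)
print(round(r, 2))  # a value near zero supports the "no correlation" reading
```

A value of r near 0 means the sitemap-index fetches tell you nothing about how many pages actually get crawled that day, which matches what the logs above suggest.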
Similar observations here. Not sure about crawling, but indexing of new pages has slowed down significantly.
Usually, new pages on my site are indexed within 1-2 days. In April, it took 2-3 weeks.
I think it's a natural migration: shifting from high-speed crawling to crawling just the things that websites claim are updated via RSS and sitemaps, since the other pages most likely haven't changed.
Here's what I'm seeing:
* Anything on an RSS feed that's associated with Feedburner is being crawled and indexed within 30 minutes most of the time.
* Anything being submitted via sitemaps seems to be crawled timely as well.
* Additionally, Google has a long history of what pages on your site have changed most frequently from previous crawls so that list of pages may be crawled more often.
* Anything left over appears to be crawled at a slower rate.
That's my $0.02, can someone make change?
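The sitemap mechanism those bullets describe boils down to publishing (URL, last-modified) pairs so the crawler knows what changed. Here is a minimal sketch in Python that renders a sitemaps.org document; the example.com URLs and dates are hypothetical:

```python
from datetime import date
from xml.sax.saxutils import escape

# Hypothetical (loc, last-modified) pairs for an example site.
PAGES = [
    ("http://www.example.com/", date(2008, 4, 29)),
    ("http://www.example.com/articles/new-article", date(2008, 4, 30)),
]

def build_sitemap(pages):
    """Render a minimal sitemaps.org XML document with <lastmod> hints -
    the change-notification channel described in the bullets above."""
    out = ['<?xml version="1.0" encoding="UTF-8"?>',
           '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for loc, lastmod in pages:
        out.append('  <url>')
        out.append('    <loc>%s</loc>' % escape(loc))
        out.append('    <lastmod>%s</lastmod>' % lastmod.isoformat())
        out.append('  </url>')
    out.append('</urlset>')
    return '\n'.join(out)

print(build_sitemap(PAGES))
```

Regenerate the file whenever content changes and the crawler can prioritize by <lastmod> instead of re-fetching every page to look for changes.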
Their crawl priorities continue to move in the direction they have been for two years... far weaker crawling via following links, far more picking up any trivial thing added via mere creation.
This leads to more worthless blog pages and auto-generated crap picked up in an hour, while a new article or section added to a website (and heavily linked throughout the site) is ignored, even though googlebot "sees" the new link multiple times.
It's about as butt-backwards a philosophy as you can get, but that is google-the-innovator. It's like they have total amnesia about what made them a great search engine, stubbornly clinging to the idea that their new lazybot is an improvement over (gasp) actually crawling the web.
Come on steveb, how can it be butt-backwards?
Google raced to index the world, then they made sitemaps an industry standard so webmasters could tell them what changed, then snapped up Feedburner to know what new content was being added.
Makes total sense to me, because any serious webmaster will be telling Google via sitemaps or Feedburner that new content is available for indexing.
Others will get indexed eventually, as a lack of sitemaps or RSS marks them as less important; some might call those "static" sites.
FWIW, it wasn't that long ago everyone was bellyaching that Google was indexing too many pages too often and too fast and now that it's shifted to the other direction the bellyaching has shifted as well.
From my point of view it appears Google is damned if they do and damned if they don't!
I doubt that sarcasm helps understand the situation. The 0.001% of webmasters who would complain about indexing pages too fast or having too many pages indexed can go back to the 1980s. 99.99% of webmasters would love to have their entire websites indexed and crawled regularly. Those that don't can use sitemaps to say they want to be crawled slowly or weakly for whatever reason.
Google is simply prioritizing crawling things based on creation, not merit, which is a thoroughly foolish idea if you want to present results based on merit!
I saw the slowdown across multiple sites on multiple servers. Some have sitemaps, some have RSS feeds, some have both and some have none. All experienced a 95% reduction in crawled pages during the same time period.
I was worried that it was me. FWIW, glad to hear that others have experienced the same thing ( I guess misery loves company).
I saw that Google indexed the small site very fast, in a couple of hours, and about two days later we got the ranking.
I have a small article-based website (about 5k articles submitted by users). In Webmaster Tools I saw a decrease in crawl rate, and I am also unable to set it to Fast.
Still, my articles are indexed and appear in the SERPs about 30-60 minutes after I post them on the site :D
|Google is simply prioritizing crawling things based on creation, not merit, which is a thoroughly foolish idea if you want to present results based on merit! |
No, Google is optimizing based on CHANGE which could either be creation or update.
Don't confuse what's indexed and ranked in Google's pages by how frequently it's crawled.
If you have a page of merit that never changes, which is already indexed and ranked, why does it need to be crawled frequently?
|The 0.001% of webmasters who would complain about indexing pages too fast or having too many pages indexed can go back to the 1980s. |
Slightly off the Google topic, but many people are complaining on WebmasterWorld that Yahoo Slurp! is indexing way too fast at the moment. Not everyone is thrilled when a single crawler burns 1GB or more of bandwidth a month or crawls too fast during peak hours and slows down visitor response times.
I wouldn't care if Yahoo crawled a lot--if it gave me decent SERPs!
"No, Google is optimizing based on CHANGE which could either be creation or update."
Well, obviously that is not true, and that's part of the point. Google does not crawl now based on change - specifically not based on a new page getting, say, 100 links when it is put online. Google crawls new pages promptly based on submission of that page via a sitemap or creation feed. These creation methods offer no guidance at all on the value a domain puts on such a page, let alone other domains.
A new page with three links to it and a new page with 1000 links to it are treated much too much the same by Google now. Instead of "discovery" via linking, which leads to original scores for pages that are in the right ballpark, they discover pages via the equivalent of a press release. They add a page to the index with zero understanding of its proper score. They give weight to the domain the page is on, but all that does is create the three-link-versus-1000-link problem.
Google is not caring about change now. That is plainly clear. They are prioritizing new content that they are told about via notification tools. This leads to both inappropriate premature ranking and poor crawl priorities. And that doesn't even take into consideration that the vast majority of these ping/creation pages are cruft that barely deserves to be indexed in the first place. If a page's own domain can't be bothered to link to it well, the rest of the world hardly needs to know about it.
"Don't confuse what's indexed and ranked in Google's pages by how frequently it's crawled."
You did that, not me. The ranking algorithm still is based more on linking rather than new-ness, but it's bungled up because the crawl is so weak. You can't rank properly if you are unaware of changes to existing pages because you are spending your time focusing on new pages. The creation of new pages offers no signal of quality; the opinion of older pages about new pages does that.
"If you have a page of merit that never changes, which is already indexed and ranked, why does it need to be crawled frequently?" It doesn't.
This can sum up part of your mistaken notion. You can't rank new pages sensibly if you don't crawl old pages -- even if just to see that the new page is NOT linked by old pages.
How can a crawl priority be poor if it pushes pages historically known to be static further down in the queue?
|Google is not caring about change now. |
I have about 10K pages (out of hundreds of thousands) that they crawl per day which would care to disagree - and it's all change.
The new content is indexed in about 30 minutes, the rest of it takes time.