Google's April 2008 crawl down 75% across multiple sites

Forum Moderators: Robert Charlton & goodroi

deadsea

10:57 am on Apr 25, 2008 (gmt 0)

I monitor several sites with more than ten thousand pages each. On all of those sites the Googlebot crawl rate has dropped by at least 75%. Is this a phenomenon that others are seeing as well?

I am observing this on my own site, and sites run by several others. The sites are not related (different ownership, development teams, link strategies, hosting facilities, etc).

The graphs in Google webmaster tools (site -> tools -> set crawl rate) show that pages and kilobytes per day have dropped drastically, almost to zero in April. The graph of time spent downloading a page has not changed.

Are other people observing this phenomenon too?
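
For anyone who wants to cross-check the Webmaster Tools graphs against their own server logs, here is a minimal sketch. It assumes access logs in the common/combined log format and identifies the crawler only by the substring "Googlebot" in the log line, which can be spoofed. The function names `googlebot_hits_per_day` and `percent_drop` are made up for this illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Matches the date part of a common/combined log timestamp,
# e.g. [25/Apr/2008:10:57:00 +0000]
TIMESTAMP = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}):")


def googlebot_hits_per_day(lines):
    """Count requests per calendar day for lines that mention 'Googlebot'.

    Assumption: the user-agent string appears in the log line. This does
    not verify that the request really came from Google.
    """
    counts = Counter()
    for line in lines:
        if "Googlebot" not in line:
            continue
        match = TIMESTAMP.search(line)
        if match:
            # %b parses English month abbreviations under the default locale.
            day = datetime.strptime(match.group(1), "%d/%b/%Y").date()
            counts[day] += 1
    return counts


def percent_drop(before, after):
    """Percentage decrease from `before` to `after` (e.g. daily averages)."""
    if before <= 0:
        raise ValueError("before must be positive")
    return 100.0 * (before - after) / before
```

Feeding it the lines of a log file (for example `googlebot_hits_per_day(open("access.log"))`, where the file name is a placeholder) gives a per-day count that can be averaged by month and compared with `percent_drop`.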

[edited by: tedster at 1:50 am (utc) on April 29, 2008]
[edit reason] fix formatting [/edit]

steveb

9:13 pm on May 4, 2008 (gmt 0)

"How can a crawl priority be poor if it pushes pages historically known to be static further down in the queue?"

That's not the issue, nor even particularly true, so let's stay focused. Google is crawling all existing pages less, and new notification pages quickly. I've already explained the obvious reason this is a foolish priority, if an engine cares about quality. If you don't crawl existing pages, you can't know the correct value of new pages.

incrediBILL

9:32 pm on May 4, 2008 (gmt 0)

"If you don't crawl existing pages, you can't know the correct value of new pages."

Again, they are crawling a LOT of existing pages, but reducing priorities on static pages.

How many times do your about, contact, privacy and legal pages need to be crawled?

The strategy to crawl and index new information as quickly as possible is all about being timely so people can find new things in Google vs. some other search engine or blog aggregator.

Whether you or I agree with that strategy is moot because static search engine results aren't good in a rapidly changing world as new information needs to be assimilated by the masses as quickly as possible.

Since people tend to look to the search engines more than anywhere else for answers, if Google is able to provide those answers to recent events ahead of anyone else, they win, simple as that.

[edited by: incrediBILL at 9:33 pm (utc) on May 4, 2008]

steveb

10:11 pm on May 4, 2008 (gmt 0)

"Again, they are crawling a LOT of existing pages, but reducing priorities on static pages."

Again, not only is that not what they are doing, it isn't the topic here.

The topic is a 75% reduction in crawling, which reflects the widespread reduction in crawling existing pages by Google. This has nothing to do with contact or privacy pages... unless those make up 75% of the pages on your domain!

The topic here is Google's reduction in crawling pages on a domain that are updated in a normal, regular fashion.

"Google is able to provide those answers to recent events ahead of anyone else, they win, simple as that."

Certainly not. Google doesn't need to provide answers, it wins when it provides accurate answers.

The priority of creation means a piece of crap spam blog is very close to being on the same footing as an authoritative blog, while being ahead of a quality news site. But that is secondary. Speed is for Google News, not the organic results, which should be prioritized by quality, and now Google's upside-down crawl priorities lessen quality. The results are still basically good, but since changed pages are picked up slower and slower, the results degrade.

incrediBILL

11:25 pm on May 4, 2008 (gmt 0)

"The topic is a 75% reduction in crawling"

I'm not seeing a 75% reduction, and I explained what I am seeing. That doesn't make my observations any less valid, as it's a reduction nonetheless. I can only draw my conclusions from the pages I see crawled frequently on my servers and nothing more. I described what I'm witnessing on multiple web sites and it was disputed, so it's no longer a conversation but a contradiction.

Ocean10000

11:39 pm on May 4, 2008 (gmt 0)

I don't think Google has slowed down at all, but it is now indexing more pages that previously it couldn't crawl. For example, on one of my sites it is now submitting a form and crawling the results, along with pages it previously could not get to or even see. I personally have to figure out whether I am going to block it or just let it keep doing this. In one day alone it submitted well over 500 requests for different terms in that form. If my site is any type of marker, I can only imagine what it is doing on other websites that have setups similar to mine.

Just my 2 cents.
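
If the decision is to block, one common approach is a robots.txt `Disallow` rule on the path the form submits to. The sketch below uses Python's standard `urllib.robotparser` to check what such a rule would and would not block before deploying it. The `/search` path and the `example.com` URLs are hypothetical, since the actual form location was not given.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the real form path was not stated in the post.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /search
"""

parser = RobotFileParser()
# parse() accepts the file's lines directly, so no network fetch is needed.
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url):
    """Return True if the rules above let Googlebot fetch `url`."""
    return parser.can_fetch("Googlebot", url)


if __name__ == "__main__":
    print(allowed("http://example.com/search?q=widgets"))  # form results
    print(allowed("http://example.com/about.html"))        # ordinary page
```

This only checks how the rules are interpreted by Python's parser. Whether to block at all is a separate question, since the form results may be pages worth having indexed.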

howiejs

1:48 am on May 5, 2008 (gmt 0)

interesting topic.

Google NEEDS new pages / user generated content indexed FAST - to remain competitive w/ the blog aggregators IMO

iridiax

2:26 am on May 5, 2008 (gmt 0)

"Again, they are crawling a LOT of existing pages, but reducing priorities on static pages."

"I can only draw my conclusions from the pages I see crawled frequently on my servers and nothing more."

This is exactly what I'm seeing! I have one site with static html pages and another site that uses WordPress and feeds, and I link to the newest pages on each site from the home page. The static site is seeing a much reduced crawl rate and new pages take weeks to get indexed. The WordPress site is crawled regularly and new pages are indexed quickly, sometimes within minutes.

webfool

8:39 pm on May 5, 2008 (gmt 0)

My discussion forums have been crawled quite a bit. Google can't get enough!

potentialgeek

12:00 am on May 9, 2008 (gmt 0)

Possible Fall-Out from the Reduced Google Crawl Rate

I'm used to Google hunting down and crawling all new pages quickly. But one site of mine is not being crawled so quickly any more. Some of the older pages in a large directory are getting return visits in fact before newer pages are crawled.

But here's a problem. A thief lifted content directly from my site shortly after it was published. I checked Google, and the thief's copy was crawled before mine! I searched for a snippet of the stolen text from my site, and Google shows results for the thief's site, not mine.

My SERPs have since gone down about ten places. I suspect Google is now treating the site that got crawled first as the original and it thinks I'm the copier.

The thief isn't an auto-generated scraper site but looks like one. If Google's scraper algo is based on timing (first source=original), its reduced crawl rate could wreak havoc on SERPs.

p/g

This 39 message thread spans 2 pages.