Forum Moderators: Robert Charlton & goodroi
I am observing this on my own site and on sites run by several others. The sites are not related (different ownership, development teams, link strategies, hosting facilities, etc.).
The graphs in Google webmaster tools (site -> tools -> set crawl rate) show that pages and kilobytes per day have dropped drastically, almost to zero in April. The graph of time spent downloading a page has not changed.
Are other people observing this phenomenon too?
[edited by: tedster at 1:50 am (utc) on April 29, 2008]
[edit reason] fix formatting [/edit]
That's not the issue, nor even particularly true, so let's stay focused. Google is crawling all existing pages less, and new notification pages quickly. I've already explained the obvious reason this is a foolish priority, if an engine cares about quality. If you don't crawl existing pages, you can't know the correct value of new pages.
"If you don't crawl existing pages, you can't know the correct value of new pages."
Again, they are crawling a LOT of existing pages, but reducing priorities on static pages.
How many times do your about, contact, privacy and legal pages need to be crawled?
The strategy to crawl and index new information as quickly as possible is all about being timely so people can find new things in Google vs. some other search engine or blog aggregator.
Whether you or I agree with that strategy is moot: static search engine results aren't good in a rapidly changing world, because new information needs to be assimilated by the masses as quickly as possible.
Since people tend to look to the search engines more than anywhere else for answers, if Google is able to provide those answers to recent events ahead of anyone else, they win, simple as that.
[edited by: incrediBILL at 9:33 pm (utc) on May 4, 2008]
Again, not only is that not what they are doing, it isn't the topic here.
The topic is a 75% reduction in crawling, which reflects the widespread reduction in crawling existing pages by Google. This has nothing to do with contact or privacy pages... unless those make up 75% of the pages on your domain!
The topic here is Google's reduction in crawling pages on a domain that are updated in a normal, regular fashion.
"Google is able to provide those answers to recent events ahead of anyone else, they win, simple as that."
Certainly not. Google doesn't need to provide answers; it wins when it provides accurate answers.
The priority of creation means a piece of crap spam blog is very close to being on the same footing as an authoritative blog, while being ahead of a quality news site. But that is secondary. Speed is for Google News, not the organic results, which should be prioritized by quality, and now Google's upside-down crawl priorities lessen quality. The results are still basically good, but since changed pages are picked up slower and slower, the results degrade.
"The topic is a 75% reduction in crawling"
I'm not seeing a 75% reduction, and I explained what I am seeing. That doesn't make my observations any less valid, as it's a reduction nonetheless. I can only draw my conclusions from the pages I see crawled frequently on my servers and nothing more. I described what I'm witnessing on multiple web sites and it was disputed, so it's no longer a conversation but a contradiction.
Just my 2 cents.
"Again, they are crawling a LOT of existing pages, but reducing priorities on static pages."
"I can only draw my conclusions from the pages I see crawled frequently on my servers and nothing more."
This is exactly what I'm seeing! I have one site with static html pages and another site that uses WordPress and feeds, and I link to the newest pages on each site from the home page. The static site is seeing a much reduced crawl rate and new pages take weeks to get indexed. The WordPress site is crawled regularly and new pages are indexed quickly, sometimes within minutes.
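For anyone who wants to compare crawl rates between sites from their own server logs rather than from the Webmaster Tools graphs, here is a rough sketch in Python. It assumes an Apache/Nginx "combined" format access log and simply trusts the user-agent string, which can be spoofed, so treat the counts as approximate. The function names are made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Assumes the "combined" log format: the timestamp sits in [brackets]
# and the user-agent is the last double-quoted field on the line.
LOG_LINE = re.compile(r'\[(?P<ts>[^\]]+)\].*"(?P<agent>[^"]*)"\s*$')


def googlebot_hits_per_day(lines):
    """Count requests per day whose user-agent mentions Googlebot."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue
        stamp = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        counts[stamp.date().isoformat()] += 1
    return dict(counts)


def percent_change(before, after):
    """Percentage change from `before` to `after` (negative means a drop)."""
    if before == 0:
        raise ValueError("cannot compute a change from a zero baseline")
    return (after - before) / before * 100.0
```

Feeding it the lines of an access log (for example `googlebot_hits_per_day(open("access.log"))`, where the file name is only a placeholder) gives a per-day count you can compare month to month. A drop from 100 hits a day to 25 would come out of `percent_change(100, 25)` as -75.0, which is the size of the reduction being argued about in this thread.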
I'm used to Google hunting down and crawling all new pages quickly. But one site of mine is not being crawled so quickly any more. In fact, some of the older pages in a large directory are getting return visits before newer pages are crawled.
But here's a problem. A thief lifted content directly from my site shortly after it was published. I checked Google, and the thief's copy was crawled before mine! I searched for a snippet of the stolen text, and Google shows results from the thief's site, not mine.
My SERPs have since gone down about ten places. I suspect Google is now treating the site that got crawled first as the original and it thinks I'm the copier.
The thief isn't an auto-generated scraper site but looks like one. If Google's scraper algo is based on timing (first source=original), its reduced crawl rate could wreak havoc on SERPs.
p/g