IMO, it's important whenever the content on your website changes...
Also, I believe Google's cache update frequency depends on the number of quality links pointing to your site. The more links you have from quality content sites that themselves have quality links pointing to them, the more frequently the Google spider will visit your site.
I have several sites whose main pages are crawled and updated at least daily. Some of them have content that changes every few days, while others go weeks without any content changes. But all of them have good inbound links and PR 3 or higher.
My impression is that how frequently a page is crawled depends mainly on inbound links and PR. The frequency of content updates is also a factor, but a less important one.
In my experience, daily updates start at PR 3, while PR 2 pages are updated much less frequently and PR 1 pages only from time to time. I wonder what other people here have observed: does anyone have a PR 4 or higher site that is not crawled every day?
Yep. My site, for example. I forgot to mention that my site's homepage has averaged a good PR6 over the last few years. It now has a PR7, which switches to PR5 now and then due to the latest update. The incoming links are of good quality and on-topic. Maybe it's only a matter of time until I "catch up" with Google?
Regarding the daily Google cache: we all know it's important to get cached as much as possible to stay current.
The question I have is this.
When your homepage gets cached/spidered/snapshotted, or whatever you want to call it, does Google follow all the links on the homepage to see whether there have been any internal changes to the content on those internal pages?
It doesn't follow all the links on the homepage every time, but yes, it does sometimes. Frequent crawling is very much necessary to get good rankings and to reach your audience with the latest updates on the site. That comes from strong inbound links and regular updates of original, non-infringing content.
Typically, if you get quality inbound links pointing to your site, you will increase the frequency of crawls/caches. Additionally, make sure your website design and basic SEO are put together well. It doesn't hurt to add a little content here and there and grow your site, too. It really is that easy. I have never personally seen otherwise.
A couple more questions.
When Google has cached the homepage, will it follow the link whose target page has been updated with fresh content while leaving the other links on the homepage alone? I am not talking about a link change on the homepage, but about the content behind the link. Do Google's cache spiders go beyond the homepage when they take a snapshot of it, following the links that lead to the pages with fresh content?
Wouldn't it make sense, and save them time, if they crawled those pages then and there?
When Google's spiders take a snapshot of the homepage for the cache, do they actually know that the content behind a specific link on the homepage has changed? Or do they have to spider that page first to find out?
Are the spiders that snapshot the homepage for Google's cache intelligent enough to know that your site has changed, and is that the reason they are coming? Or is it just a guess on the spider's part to come and look for changes?
I guess what I am asking is this: when Google comes to cache/spider/snapshot (or whatever you want to call it) your homepage, can it tell, just by spidering the homepage links, that the content those particular links point to has changed?
After this last update, my homepage, which had been updated almost daily for 2 years, is stuck on June 29th. I think it changed once and then went back to June 29th. My pages below the homepage do show some July and August cache dates. I tried adding the beta-test sitemap on June 14th; I'm not sure whether that did anything. Any ideas?
I've always tracked the cache dates of sites I've worked on as an indicator of when changes I made would be reflected in the index. I never thought of it as a "Trust Factor", though it does make sense, since the more popular sites I work on have their cache refreshed more quickly.
[edited by: engine at 9:02 am (utc) on Jan. 28, 2010]
[edit reason] See WebmasterWorld TOS [/edit]
I don't know of any way to directly influence how quickly Google crawls and indexes a site or updates its cache. Setting an update frequency in your sitemap or meta tags will have no effect; Google crawls and indexes your site based on other factors, such as site authority and history. A sitemap just helps Google find all your pages eventually (on its own schedule).
Updates to the Google cache typically trail updates to the actual search index by quite a bit, often days or weeks. You cannot use the cache as an indication of what is in the index. A better way to check is to search for exact strings from your new pages; you may find them indexed but not yet in the cache.
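The exact-string check above can be scripted into a URL you paste into a browser. This is a minimal sketch, not anything from the original post: the helper name `exact_phrase_search_url` is hypothetical, and the optional `site:` restriction is an added convenience that limits results to your own domain.

```python
from typing import Optional
from urllib.parse import urlencode


def exact_phrase_search_url(phrase: str, site: Optional[str] = None) -> str:
    """Build a Google search URL for `phrase` as an exact-string match.

    Hypothetical helper: wrapping the phrase in double quotes asks Google
    for that exact string; the optional `site:` operator restricts the
    results to a single domain.
    """
    query = f'"{phrase}"'
    if site:
        query += f" site:{site}"
    # urlencode percent-encodes the quotes and colon and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})


print(exact_phrase_search_url("a sentence that only appears on my new page", "example.com"))
```

Pick a sentence that appears nowhere else on the web; if it turns up in the results while the cached copy is still old, the page has been indexed ahead of the cache.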
As to how to speed up crawling and indexing, the best way to do this is to build the authority of your site, and that means building inbound links.
[edited by: engine at 9:03 am (utc) on Jan. 28, 2010]
[edit reason] See WebmasterWorld TOS [/edit]
There is a website with PR 1, and it's cached every day.
I have a website with PR 1 as well, but it's not cached every day.
Both sites have news updated every day.
Any idea why?
It can be many things: the exact topics covered, for example, since freshness matters more to Google in some topical areas. It might also be how often Googlebot finds changes to a page after it is first published. And I'm sure there are other factors, too.
If you can definitely pin down the reason in this case, please let us all know.
|We all know it's important to get cached as much as possible to stay current.|
No we don't all know that.
For several reasons, I always block my sites from being cached.
Getting cached is not what matters, getting crawled is.
But the only way to see if/when a URL has been reindexed is to look at the cache date. There is also a "last found" date in WMT but it differs from the cache date.
|There is also a "last found" date in WMT but it differs from the cache date. |
"Last found" is the indicator of when Googlebot crawled the URL. The server may have given a 304 Not Modified response, but at least there was a request on that date. Eventually, the cache date may be updated to match this last crawl, too.
Google's publicly visible cache can lag 3 days or even more behind a new indexing of the URL.
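For anyone unfamiliar with the 304 Not Modified response mentioned above, here is a minimal sketch of how a server might choose between a full 200 response and a 304 when a crawler sends an If-Modified-Since header. The function name and the idea of passing in the page's last-modified time directly are assumptions for illustration; most web servers already handle this for static files on their own.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_get_status(last_modified: datetime, if_modified_since: Optional[str]) -> int:
    """Hypothetical helper: return 304 if the page is unchanged since the
    date the client (e.g. a crawler) sent, otherwise 200.

    `last_modified` must be a timezone-aware datetime.
    """
    if not if_modified_since:
        return 200  # no validator sent, so send the full page
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # an invalid date means the header is ignored
    if since.tzinfo is None:
        # parsedate_to_datetime returns a naive datetime for "-0000"; treat it as UTC.
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop sub-second precision.
    if last_modified.replace(microsecond=0) <= since:
        return 304  # unchanged: the crawler still requested the URL, but no body is sent
    return 200


page_changed = datetime(2010, 1, 28, 9, 0, tzinfo=timezone.utc)
print(conditional_get_status(page_changed, "Wed, 27 Jan 2010 00:00:00 GMT"))
print(conditional_get_status(page_changed, "Fri, 29 Jan 2010 00:00:00 GMT"))
```

In both cases a request hits the server on that date, which is what a "last found" date would reflect; only the 200 case transfers new content that could later show up in the cache.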