ZydoSEO - 11:15 pm on Dec 10, 2012 (gmt 0)
My experience has been that your site has scheduled deep crawls where Googlebot arrives and follows the various internal links on your site to discover new and/or updated content (as well as possible links to other sites). They may or may not crawl all of your pages, depending on how many you have. As jimbeetle indicated, the more PageRank a site has, the deeper they seem willing to crawl and the more pages the engine is typically willing to index from your site. But I'm guessing that this phenomenon is more correlation than causation... that it has more to do with the following paragraph than with the overall PageRank number alone.
On top of your site's scheduled deep crawls, all of the sites that link to your site also have their own scheduled deep crawls. And when Googlebot visits each of those external linking sites for their scheduled crawl, follows their links, and stumbles onto a link from their site to yours... it will request that URL on your site to see whether the link is still valid (200), is being redirected (3xx), or no longer exists (4xx). Googlebot often doesn't stop there.
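To make the 200/3xx/4xx distinction above concrete, here's a minimal Python sketch of how a crawler might classify a linked URL. The function names (`classify_status`, `check_link`) are my own illustration, not anything Google has published; the status-code ranges come straight from the HTTP spec.

```python
import urllib.request
import urllib.error


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from silently following redirects,
    so a 3xx response is visible instead of the final page."""
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def classify_status(status: int) -> str:
    """Map an HTTP status code to the link states described above."""
    if 200 <= status < 300:
        return "valid"        # link is still good
    if 300 <= status < 400:
        return "redirected"   # e.g. a 301 after a site redesign
    if 400 <= status < 500:
        return "gone"         # e.g. 404, link no longer exists
    return "server error"


def check_link(url: str, timeout: float = 10.0) -> str:
    """Request a URL (as a crawler checking an inbound link might)
    and classify the response."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        resp = opener.open(url, timeout=timeout)
        status = resp.status
    except urllib.error.HTTPError as e:
        status = e.code  # urllib raises for 3xx/4xx/5xx when not followed
    return classify_status(status)
```

A real crawler does far more (HEAD vs GET, caching, politeness delays), but the classification step is essentially this.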
While they are checking that externally linked URL on your site to see if the external inbound link is still good, they will often perform an incremental or partial crawl of your site, crawling other pages in close proximity to the linked page. I'm guessing the number of pages crawled in close proximity to the linked page is based on the amount of PageRank/link juice being passed in via that external link... but who really knows. But this is why having deep links into your site is so important. They help get the deep content around that deep-linked page crawled and indexed as well.
The previous paragraph is why, when a site gets redesigned and tons of 301 redirects get implemented, the site typically loses traffic/rankings for some period, often several weeks, before rankings/traffic return. You have to wait for each of the sites/pages linking to your site to be recrawled one by one, the 301 redirect for that specific link to be discovered, and credit for that specific link to be transferred to your new URL. During this transition the old URL, which was ranking, has fewer and fewer links while the new URL's links grow until all links have been recrawled. PageRank, being a recursive algorithm, also likely takes several crawls of all inbound links before it can be properly calculated (it approaches some asymptotic value).
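That "recursive algorithm approaching an asymptotic value" point can be seen in the classic power-iteration form of PageRank. This is a textbook sketch, not Google's production algorithm; the graph, damping factor, and iteration count are illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=20):
    """Power-iteration PageRank.

    links: dict mapping each page to the list of pages it links to.
    Each iteration redistributes rank along the links; the values
    converge toward a fixed point (the 'asymptotic value' above).
    """
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}  # start uniform

    for _ in range(iterations):
        new = {node: (1 - damping) / n for node in nodes}
        for node, outlinks in links.items():
            if not outlinks:
                # dangling page: spread its rank evenly over all pages
                for target in nodes:
                    new[target] += damping * rank[node] / n
            else:
                share = damping * rank[node] / len(outlinks)
                for target in outlinks:
                    new[target] += share
        rank = new
    return rank


# Tiny illustrative graph: pages "a" and "b" both link to "c",
# and "c" links back to "a".
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

After a few iterations "c" (two inbound links) ends up with the most rank, "a" (one inbound link) less, and "b" (none) the least, and running more iterations barely changes the numbers. That stabilization is exactly why every inbound link has to be recrawled before the new URL's rank settles.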
So yes... Googlebot and other crawlers are definitely going to crawl pages with external links more often than pages with only internal links. Pages with external links have a chance of getting crawled on your scheduled crawls as well as each time a linking page on another site is crawled. Pages with only internal links can be crawled only as part of your site's scheduled crawls.
A site I managed for 3 years had 5MM backlinks, and Googlebot and the other major crawlers were on that site literally 24x7 crawling. The site had fewer than 5,000 pages, and they rarely updated the pages or added new content. So there was no real reason it warranted that much activity... definitely not the freshness of the content. Instead, it was all of the incremental crawls triggered by scheduled crawls of the 5MM external linking sites/URLs that kept the crawlers on the site.