I'll give you a hint. Go to www.google.com and do a random search. Look at each result and see how many k each one is. If in your industry there are many people with 500k pages ranking number one (please show me this if you find it!) then you don't have to worry, but in most cases you don't see pages of more than 100k for very competitive keywords!
If in your industry there are many people with 500k pages ranking number one (please show me this if you find it!) then you don't have to worry, but in most cases you don't see pages of more than 100k for very competitive keywords!
But then, 500k pages are certainly not the norm, so it's not expected that you would see many prominently placed. They should be a rarity, purely based on natural distribution.
I need to have very high quality full size images on the site
Image size is irrelevant to how a page performs in search results: images are collected and indexed separately. The size in kb next to results is purely the HTML size, and even Google AdWords only measured the HTML download time the last time I checked.
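If you want to see the number being talked about here for your own page, a rough sketch like the one below measures the HTML document alone, in KB, without touching images, CSS or JS. The function names and the 1 KB = 1024 bytes convention are my own assumptions, not anything Google has published, and the figure shown next to a search result may be computed differently.

```python
from urllib.request import urlopen


def html_size_kb(html_bytes: bytes) -> float:
    """Size of the raw HTML in KB (assuming 1 KB = 1024 bytes)."""
    return len(html_bytes) / 1024


def fetch_html(url: str, timeout: float = 10.0) -> bytes:
    """Download only the HTML document at `url`; no images, CSS or JS.

    Hypothetical helper for illustration: it does not follow any
    particular crawler's behaviour, it just reads the response body.
    """
    with urlopen(url, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Offline example: a tiny page with a reference to a large image.
    # Only the markup counts, not the image it points to.
    page = b"<html><body><img src='huge-photo.jpg'><p>Hello</p></body></html>"
    print(f"HTML size: {html_size_kb(page):.2f} KB")
```

To check a live page you would call html_size_kb(fetch_html("http://www.example.com/")), with your own URL in place of the example one.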
Make the page load as fast as is humanly possible, since every visitor likes a fast page, and being popular is great SEO. But don't stress too much about an overall measurement of size: look at things like Wikipedia articles for competitive single words, or many major brand websites, to see quick evidence that page size (in raw numbers) doesn't stop you performing for competitive phrases.
IMO, there's no optimal page size for Google (if indeed there ever was!): there isn't even an optimal amount of text. It's a judgement call based on the topic, the author and who's competing.
It's also an easy thing for a site owner to forget that when they view their own site, nearly everything will be cached, and load extremely quickly. A 500k page is likely to take longer than 10 seconds to fully load for the majority of users, and most new users don't have that kind of patience.
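To put a rough number on that, here is a back-of-envelope sketch of pure transfer time for an uncached page. The connection speeds are illustrative assumptions of mine, and the estimate ignores latency, DNS lookups, server response time and parallel requests, so real load times will usually be worse.

```python
def estimate_seconds(size_kb: float, bandwidth_kbps: float) -> float:
    """Transfer time in seconds for `size_kb` kilobytes over a link of
    `bandwidth_kbps` kilobits per second.

    Assumes 1 KB = 1024 bytes and 1 kbps = 1000 bits per second, and
    ignores latency and protocol overhead entirely.
    """
    bits = size_kb * 1024 * 8
    return bits / (bandwidth_kbps * 1000)


if __name__ == "__main__":
    # Hypothetical line speeds, chosen only for illustration.
    for label, kbps in [("56k dial-up", 56), ("512k broadband", 512), ("2M broadband", 2000)]:
        print(f"500 KB over {label}: {estimate_seconds(500, kbps):.1f} s")
```

Under these assumptions a 500 KB page needs about 8 seconds of raw transfer on a 512 kbps line, before any of the overheads listed above are added.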
Of course, the numbers aren't really the problem for usability either: content below the fold can take considerably longer to load without harm if a user immediately sees a top half of the page that exactly matches their needs.
... how many pages get spidered during a Googlebot visit.
Is this how it works? I mean doesn't Googlebot visit pages rather than a 'website'?
I agree with the benefits of fast loading pages for humans who don't like to wait. I'm curious to know, though, how long Googlebot is prepared to 'wait' on a particular page. I assume Googlebot is basically a program that reads pages and puts the relevant content into a database (without downloading images) and that this program has a timeout set by Google.
The crawl team has some pretty complex and evolving algorithms for prioritizing and budgeting, and it's not clear to me whether the budget sets a total time, total bandwidth, or (most likely) some combination of the two. Consideration is given both to Google's needs and to accommodating the server's abilities.
All CSS and JS goes in external files, and images are of course separate too.
The bot retrieves the HTML and nothing else. Browsers pull everything needed to render.
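To see the difference between the two kinds of fetch on your own pages, a small sketch like this lists what a browser would go on to request after the HTML: images, external scripts and stylesheets. It uses only Python's standard html.parser; the class and function names are made up for illustration, and it deliberately ignores things like CSS background images and fonts.

```python
from html.parser import HTMLParser


class ResourceCollector(HTMLParser):
    """Collects URLs of images, external scripts and stylesheets."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("img", "script") and attributes.get("src"):
            self.resources.append(attributes["src"])
        elif tag == "link":
            rel = (attributes.get("rel") or "").lower()
            if "stylesheet" in rel and attributes.get("href"):
                self.resources.append(attributes["href"])


def external_resources(html: str) -> list:
    """Return the extra files referenced by `html`, in document order."""
    collector = ResourceCollector()
    collector.feed(html)
    collector.close()
    return collector.resources


if __name__ == "__main__":
    sample = (
        '<html><head><link rel="stylesheet" href="site.css">'
        '<script src="site.js"></script></head>'
        '<body><img src="photo-full.jpg"><p>Text</p></body></html>'
    )
    # An HTML-only fetch downloads just `sample`; a browser also
    # requests every URL in this list.
    print(external_resources(sample))
```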
Google seems to like to index 95% or more of the pages, though they take a few weeks to get everything, with the indexed count growing slowly.