I would say that it depends on the speed of your server. If you don't have a fast server and your site is slow, don't make your pages heavy; the big G likes speedy sites. Many people say to try to limit your pages to 100K, but I don't think that rule applies anymore. There are many more important things to worry about than page size. If most of your pages are under 100K, you should be safe.
I'll give you a hint: go to www.google.com and do a random search. Look at each result and see how many KB each one is. If in your industry there are many people with 500K pages ranking number one (please show me this if you find it!), then you don't have to worry, but in most cases you don't see pages larger than 100K for very competitive keywords!
My server is very fast, fortunately. It would be hard for me to get most of my pages under 150K... my home page is now 158K. I'm continuing to whittle away at it, though. I can't wait to see the effect that slashing my page sizes has on indexing speed and ranking.
|If in your industry there are many people with 500K pages ranking number one (please show me this if you find it!), then you don't have to worry, but in most cases you don't see pages larger than 100K for very competitive keywords! |
But then, 500K pages are certainly not the norm, so you wouldn't expect to see many of them prominently placed. They should be a rarity, purely based on natural distribution.
|I need to have very high quality full size images on the site |
Image size is irrelevant to how a page performs in search results: images are collected and indexed separately. The size in KB shown next to results is purely the HTML size, and even Google AdWords only measured the HTML download time the last time I checked.
Make the page load as fast as is humanly possible, since every visitor likes a fast page, and being popular is great SEO. But don't stress too much about an overall measurement of size: look at Wikipedia articles for competitive single words, or at many major brand websites, for quick evidence that page size (in raw numbers) doesn't stop you from ranking for competitive phrases.
IMO, there's no optimal page size for Google (if indeed there ever was!): there isn't even an optimal amount of text. It's a judgement call based on the topic, the author and who's competing.
If images are not included in the mix with Google, then there are no worries, as my HTML, CSS and scripts are around 50-70K combined. Still, some of my pages were verging on 600K before, so I expect cutting that down by over two thirds will provide a bump. I always thought my bounce rates were unusually high, and now I know why! Hopefully the site will be stickier now and more people will link to it or bookmark it.
Excessive page load times remain a serious usability problem, especially for sites that depend on turning first-time visitors into repeat visitors (first impressions count).
It's also easy for a site owner to forget that when they view their own site, nearly everything is cached and loads extremely quickly. A 500K page is likely to take longer than 10 seconds to fully load for the majority of users, and most new users don't have that kind of patience.
Of course, the raw numbers aren't really the problem for usability either: content below the fold can take considerably longer to load if the user immediately sees a top half of the page that exactly matches their needs.
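To put rough numbers on the 10-second claim: transfer time is simply size divided by throughput. The connection speeds below are my own illustrative assumptions, and the sketch ignores latency, per-request overhead and rendering time, so real-world load times will be longer.

```python
# Idealized transfer-time estimate: seconds = size in bits / throughput in bits per second.
# The connection speeds are illustrative assumptions, not measurements of real users.

def download_seconds(size_kb: float, kbps: float) -> float:
    """Seconds to transfer size_kb kilobytes (1 KB = 1024 bytes) at kbps kilobits/s (1 kbit = 1000 bits)."""
    bits = size_kb * 1024 * 8
    return bits / (kbps * 1000)

ASSUMED_CONNECTIONS = {
    "56k dial-up": 56,
    "512k DSL": 512,
    "2 Mbit broadband": 2000,
}

if __name__ == "__main__":
    for name, kbps in ASSUMED_CONNECTIONS.items():
        for size_kb in (100, 500):
            print(f"{size_kb:>4} KB over {name}: {download_seconds(size_kb, kbps):.1f} s")
```

At these assumed speeds a 500K page takes over a minute on 56k dial-up but about 8 seconds on 512k DSL, so whether it crosses the 10-second mark depends heavily on your audience's connections.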
Do you use gzip output compression? It makes your pages much smaller to transfer...
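In practice the web server does this on the fly when the browser sends `Accept-Encoding: gzip` (Apache's mod_deflate is one common way to enable it). As a quick way to see how much a given page would shrink, here is a minimal sketch using Python's standard `gzip` module; the sample catalog page is hypothetical, and real savings depend on how repetitive your markup is.

```python
import gzip

def gzip_savings(html: str) -> tuple[int, int]:
    """Return (raw_bytes, gzipped_bytes) for an HTML string encoded as UTF-8."""
    raw = html.encode("utf-8")
    compressed = gzip.compress(raw, compresslevel=6)
    return len(raw), len(compressed)

if __name__ == "__main__":
    # Hypothetical sample page: table markup is highly repetitive, which is why it compresses well.
    rows = "".join(
        f"<tr><td class='item'>Product {i}</td><td class='price'>$9.99</td></tr>\n"
        for i in range(500)
    )
    page = f"<html><head><title>Catalog</title></head><body><table>\n{rows}</table></body></html>"
    raw, packed = gzip_savings(page)
    print(f"raw: {raw / 1024:.1f} KB, gzipped: {packed / 1024:.1f} KB ({100 * packed / raw:.0f}% of original)")
```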
File size may well affect how many pages get spidered during a Googlebot visit.
The higher your PageRank, the more time Googlebot will spend on your site. So if you have a lot of pages and low PR, you don't want your files to be large.
|... how many pages get spidered during a Googlebot visit. |
Is this how it works? I mean, doesn't Googlebot visit pages rather than a 'website'?
I agree with the benefits of fast-loading pages for humans who don't like to wait. I'm curious to know, though, how long Googlebot is prepared to 'wait' on a particular page. I assume Googlebot is basically a program that reads pages and puts the relevant content into a database (without downloading images), and that this program has a timeout set by Google.
As I understand it, Googlebot has a "crawl budget" for each domain. Most (all?) of its URL requests come from a URL list for the domain that was previously discovered, prioritized and filed away for future spidering adventures.
The crawl team has some pretty complex and evolving algorithms for prioritizing and budgeting, and it's not clear to me whether the budget sets a total time, total bandwidth, or (most likely) some combination of the two. Consideration is given both to Google's needs and to accommodating what the server can handle.
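Nobody outside Google knows the real crawl-budget algorithm, so the following is a purely hypothetical toy model. It assumes the budget is a fixed number of kilobytes of HTML per visit (one of the possibilities mentioned above) and that URLs are fetched in a precomputed priority order. Under those assumptions it shows why larger files could mean fewer pages spidered per visit; `QueuedUrl`, `priority` and `pages_crawled` are names invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical model only: NOT Google's actual crawl scheduling.

@dataclass
class QueuedUrl:
    url: str
    priority: float  # hypothetical score; higher means crawl sooner
    size_kb: float   # HTML size only (images are fetched separately)

def pages_crawled(queue: list[QueuedUrl], budget_kb: float) -> list[str]:
    """Fetch URLs in priority order until the next one would exceed the per-visit byte budget."""
    crawled: list[str] = []
    spent = 0.0
    for item in sorted(queue, key=lambda q: q.priority, reverse=True):
        if spent + item.size_kb > budget_kb:
            break  # budget exhausted for this visit
        spent += item.size_kb
        crawled.append(item.url)
    return crawled

if __name__ == "__main__":
    small = [QueuedUrl(f"/page{i}", priority=10 - i, size_kb=30) for i in range(10)]
    large = [QueuedUrl(f"/page{i}", priority=10 - i, size_kb=150) for i in range(10)]
    print(len(pages_crawled(small, budget_kb=600)))  # 10: all 30K pages fit
    print(len(pages_crawled(large, budget_kb=600)))  # 4: only four 150K pages fit
```

With an assumed 600K budget, ten 30K pages all get fetched, while only the four highest-priority 150K pages do.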
I rarely have an HTML page over about 25 to 30 KB, counting the HTML code and the text content combined.
All CSS and JS goes in external files, and images are of course separate too.
The bot retrieves the HTML and nothing else. Browsers pull everything needed to render.
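As a rough illustration of that difference, the sketch below uses Python's standard `html.parser` to list the extra files a browser would request (images, scripts, stylesheets) on top of the HTML document itself, which per this description is all the bot fetches. It's a simplification: it ignores images referenced from CSS, resources loaded by JavaScript, iframes, fonts and so on.

```python
from html.parser import HTMLParser

class SubresourceCollector(HTMLParser):
    """Collect URLs of images, scripts and stylesheets referenced directly in the HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.resources: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute values may be None for valueless attributes
        if tag in ("img", "script") and attrs.get("src"):
            self.resources.append(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.resources.append(attrs["href"])

def browser_extra_requests(html: str) -> list[str]:
    """Files a browser would fetch in addition to the HTML page (which is all the bot needs)."""
    collector = SubresourceCollector()
    collector.feed(html)
    collector.close()
    return collector.resources

if __name__ == "__main__":
    page = ('<html><head><link rel="stylesheet" href="/site.css"><script src="/app.js"></script></head>'
            '<body><img src="/hero.jpg"><p>Text</p></body></html>')
    print(f"HTML only: {len(page.encode('utf-8'))} bytes")
    print("browser also requests:", browser_extra_requests(page))
```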
Google seems to index 95% or more of my pages, though it takes a few weeks to get everything, with the count growing slowly.