Looking at the page as it loads, a lot of the first part of the page is information I'd rather not have indexed by Google: drop-down menus for navigation, search boxes, affiliate logos, ads, etc., all loaded as included files. Most of these items are stored in the root with an extension of .htm (while the main page is index.html). If I use robots.txt to disallow all files with an extension of .htm, will Googlebot skip them when it reads index.html?
No, that is the nature of included files.
When Googlebot requests index.html (which doesn't have an .htm extension), the server puts the index.html page together, includes and all, and returns the finished page.
Googlebot doesn't know or care which files were included. The server takes care of that before it responds, so a robots.txt rule against .htm files never comes into play for that request.
If you want to reduce your page size you will have to do it the old-fashioned way: reduce it. ;)
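To make that concrete, here is a rough Python sketch of what happens on each side. It assumes the includes are Apache-style SSI directives, which the original post doesn't confirm, and the file names and contents (nav.htm, the FILES dict) are made up for illustration. The robots.txt rule uses an explicit path rather than an extension wildcard, because Python's standard-library parser only does simple prefix matching.

```python
import re
from urllib.robotparser import RobotFileParser

# Hypothetical site contents; a real server would read these from disk.
FILES = {
    "index.html": '<html><body><!--#include file="nav.htm" -->'
                  "<p>Main content</p></body></html>",
    "nav.htm": "<ul><li>Home</li></ul>",
}

# Assumed include syntax (Apache-style SSI); other servers use other directives.
INCLUDE = re.compile(r'<!--#include file="([^"]+)" -->')

def serve(path):
    """Assemble the page server-side, replacing each include with its content."""
    return INCLUDE.sub(lambda m: serve(m.group(1)), FILES[path])

# A robots.txt that blocks the included file by its URL.
robots = RobotFileParser()
robots.parse([
    "User-agent: *",
    "Disallow: /nav.htm",
])

# The crawler only checks robots.txt for the URL it actually requests.
index_allowed = robots.can_fetch("Googlebot", "/index.html")
nav_allowed = robots.can_fetch("Googlebot", "/nav.htm")

# What the crawler receives for index.html: one finished page, no directives.
page = serve("index.html")
```

The request for /index.html is allowed, and the page that comes back already contains the menu markup from nav.htm. The Disallow rule would only matter if the crawler tried to fetch /nav.htm directly as its own URL.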
If that's the case, then wouldn't that be cloaking (which the engines don't like)?
For a description of cloaking, check here:
The Truth About Cloaking [webmasterworld.com]