Google officially says that it indexes only the first 101 KB of a page's content.
If you have sitemaps with more than 1 MB of data, I assume there are several thousand links in them. Google suggests having a maximum of 100 links per page.
Maybe you should break the sitemaps down into smaller parts.
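Splitting one big allproducts sitemap into smaller files is mostly mechanical. Here is a minimal sketch in Python using only the standard library; the example.com URLs are made up, and the 100-entries-per-file cap is just illustrative (it echoes the 100-link guideline discussed in this thread, not a sitemap protocol limit):

```python
# Split a long list of URLs into several small sitemap files.
from xml.etree.ElementTree import Element, SubElement, tostring

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def split_sitemap(urls, max_urls=100):
    """Return a list of sitemap XML strings, each with at most max_urls entries."""
    sitemaps = []
    for start in range(0, len(urls), max_urls):
        urlset = Element("urlset", xmlns=SITEMAP_NS)
        for url in urls[start:start + max_urls]:
            loc = SubElement(SubElement(urlset, "url"), "loc")
            loc.text = url
        sitemaps.append(tostring(urlset, encoding="unicode"))
    return sitemaps

# Hypothetical product URLs: 250 links become three files (100 + 100 + 50).
parts = split_sitemap([f"http://example.com/product/{i}" for i in range(250)])
print(len(parts))  # 3
```

Each string in `parts` can then be written out as its own sitemap file and submitted separately.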
I have read somewhere in this forum that GG suggested breaking up sitemaps into a number of pages, which solves two problems:
1) complete indexing of each page
2) assurance that all pages will get indexed.
Yes, I agree that this is the wisest thing to do.
It would be quite a task, as I have quite a lot of sitemap/allproducts pages :)
Thanks, guys, for the help. I'm leaning toward what you all had to say, but perhaps I just needed a little push.
In my experience, pages with anything up to 1,000 links get crawled and the links followed, but generally Google stops looking at the page when it hits 101 KB.
Regarding the post above ("Google suggests having a maximum of 100 links per page"): is this for sure? Is it documented by them somewhere, or is it a proven fact?
I just learned that one large allproducts page of mine was cached, but not everything was included. The cached page only included about 3/4 of it.
[mydomain.com...] 101k - Cached - Similar pages
On viewing the cached page, not everything is included, so I guess the 101 KB limit is accurate.
Personally, I've seen Googlebot (64.****) grab chunks up to almost 200 KB.
|Regarding the post above ("Google suggests having a maximum of 100 links per page"): is this for sure? Is it documented by them somewhere, or is it a proven fact?|
Check out the comments from GoogleGuy on - [webmasterworld.com...]
Google will download your whole page, and they will even follow links beyond 101k. Over a year ago, I saw PR passed to the links beyond 101k. Brett agrees that the links are followed, but disagrees about PR being passed.
It does appear that the content of the page beyond 101k is not indexed, and it certainly is not cached nor does it show up in the snippet. This means that any searches for keywords that are located beyond 101k will not be found.
But they absolutely DO index what is in the first 100 KB, and most useful content is found well before that point. Your users won't want to scroll down 50 screens to find what they searched on, and Google does not want them to have to.
As for the 100 links per page, this is just a recommendation by Google, and it is generally good design. I have had Google follow well over 1,000 links off a page that went beyond 101 KB.
But there is nothing to say that Google will not discount the value of those links, or even ignore them in the future. You can also be fairly certain that there is a big safety margin built into that 100 link recommendation. Don't even worry about it if you are under 200.
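If you want to check a page against that ~100-link recommendation, a quick link count is enough. A minimal sketch using only Python's standard-library HTML parser (the sample page here is fabricated for illustration):

```python
# Count anchor tags with an href attribute on a page.
from html.parser import HTMLParser

class LinkCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # Only count real links: <a> tags that carry an href.
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.count += 1

def count_links(html):
    counter = LinkCounter()
    counter.feed(html)
    return counter.count

# Hypothetical allproducts page with 150 product links.
page = ("<html><body>"
        + "".join(f'<a href="/p/{i}">item {i}</a>' for i in range(150))
        + "</body></html>")
print(count_links(page))  # 150 -- over the recommended 100, so consider splitting
```

Per the posts above, being somewhat over 100 is unlikely to matter, but a count like this tells you when a page is far outside the guideline.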
What about pages that are compressed? There is GZIP encoding, so you can compress your pages and save on bandwidth; browsers uncompress them prior to display.
Anyone know how Google deals with compressed pages?
IIRC, Google specifically requests the pages uncompressed.
It really doesn't matter. They transfer the entire page anyway, even uncompressed. They just do not store the entire page.
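In other words, gzip changes what travels over the wire, not the page being evaluated. If the thread's reading is right, the size to measure against the ~101 KB ceiling is the uncompressed HTML. A minimal sketch (the repetitive sample text is made up):

```python
# Compare transfer size (gzipped) with the uncompressed size that the
# ~101 KB limit discussed in this thread would apply to.
import gzip

html = ("<p>" + "some repetitive product listing text " * 200 + "</p>").encode("utf-8")
compressed = gzip.compress(html)

print(len(html))                  # uncompressed size: what to compare against 101 * 1024
print(len(compressed))            # wire size with Content-Encoding: gzip (much smaller)
print(len(html) <= 101 * 1024)   # True -- this page fits under the limit
```

So a page that compresses to 30 KB on the wire can still blow past the limit if its uncompressed HTML is 150 KB.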
Make your pages smaller than 101k if you care about it. Don't worry about it if you don't care.
See also: [webmasterworld.com...]