


Will Google Index a 1 MB+ page?

Google indexing large files.

     
10:36 am on Jan 11, 2004 (gmt 0)

10+ Year Member



Hi,

Does Google, or any search engine for that matter, have any trouble indexing a single 1 MB+ HTML/PHP file? I have a few around that size, usually sitemaps and an index of all products.

Any pros and cons?

Thank you.

bluetoothache

3:52 pm on Jan 11, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google officially says that it indexes content only up to 101 kB.

If you have sitemaps with more than 1 MB of data, I assume there are several thousand links on them. Google suggests having a maximum of 100 links per page.

Maybe you should break the sitemaps down into smaller parts.
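
Something along these lines would do it. This is only a rough sketch; the product array, the 200-links-per-page figure, and the allproducts.php file name are placeholders for whatever your site actually uses:

<?php
// allproducts.php?page=N -- split one huge all-products page into
// smaller pages of ~200 links each, so every page stays well under
// the ~101 kB limit discussed in this thread.

$products = array(   // placeholder data; in practice pulled from your product database
    array('url' => '/widget-1.html', 'name' => 'Widget 1'),
    array('url' => '/widget-2.html', 'name' => 'Widget 2'),
    // ...thousands more...
);

$per_page = 200;
$pages    = max(1, (int) ceil(count($products) / $per_page));
$page     = isset($_GET['page']) ? (int) $_GET['page'] : 1;
$page     = max(1, min($pages, $page));

echo "<ul>\n";
foreach (array_slice($products, ($page - 1) * $per_page, $per_page) as $p) {
    echo '<li><a href="' . htmlspecialchars($p['url']) . '">'
       . htmlspecialchars($p['name']) . "</a></li>\n";
}
echo "</ul>\n";

// previous/next links so a crawler can reach every part of the index
if ($page > 1)      echo '<a href="allproducts.php?page=' . ($page - 1) . '">Previous</a> ';
if ($page < $pages) echo '<a href="allproducts.php?page=' . ($page + 1) . '">Next</a>';
?>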

4:07 pm on Jan 11, 2004 (gmt 0)

10+ Year Member



I have read somewhere in this forum that GG suggested breaking up sitemaps into a number of pages. It solves two problems:
1) complete indexing of each sitemap page, and
2) assurance that all of your pages will get indexed.

4:19 pm on Jan 11, 2004 (gmt 0)

10+ Year Member



Yes, I agree that this is the wisest thing to do.

It would be quite a task, as I have quite a lot of sitemap/all-products pages :)

Thanks, guys, for the help. I was already leaning toward what you all said, but perhaps I just needed a little push.

Best regards,

bluetoothache

4:20 pm on Jan 11, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In my experience, pages with anything up to 1,000 links get crawled and the links followed, but generally Google stops looking at the page when it hits 101 kB.

4:21 pm on Jan 11, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Regarding the post above ("Google suggests having a maximum of 100 links per page"): is this for sure? Documented by them somewhere, or a proven fact?
4:24 pm on Jan 11, 2004 (gmt 0)

10+ Year Member



I just noticed that one large all-products page of mine was cached, but not everything was included.

The cached page only included about three-quarters of it.

4:28 pm on Jan 11, 2004 (gmt 0)

10+ Year Member



Noticed this:

[mydomain.com...] 101k - Cached - Similar pages

On viewing the cached page, not all of it is included, so I guess the 101k limit is accurate.

5:05 pm on Jan 11, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Personally, I've seen Googlebot (64.****) grab chunks up to almost 200k.

7:20 pm on Jan 11, 2004 (gmt 0)

10+ Year Member



Regarding the post above ("Google suggests having a maximum of 100 links per page"): is this for sure? Documented by them somewhere, or a proven fact?

Check out the comments from GoogleGuy on - [webmasterworld.com...]

7:37 pm on Jan 11, 2004 (gmt 0)

WebmasterWorld Senior Member bigdave is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Google will download your whole page, and they will even follow links beyond 101k. Over a year ago I saw PR being passed to links beyond 101k; Brett agrees that the links are followed but disagrees about PR being passed.

It does appear that the content of the page beyond 101k is not indexed, and it certainly is not cached, nor does it show up in the snippet. This means that searches for keywords located beyond 101k will not find the page.

But they absolutely DO index what is in the first 100k, and most useful content is found well before that point. Your users won't want to scroll down 50 screens to find what they searched for, and Google does not want them to have to.

As for the 100 links per page, that is just a recommendation by Google, and it is generally good design. I have had Google follow well over 1,000 links off a page that went beyond 101k.

But there is nothing to say that Google will not discount the value of those links, or even ignore them in the future. You can also be fairly certain that there is a big safety margin built into that 100-link recommendation. Don't even worry about it if you are under 200.
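
If you want to see where a given keyword falls relative to that cut-off, a quick sketch along these lines works. The URL and keyword below are just placeholders:

<?php
// keyword_offset.php -- fetch a page and report where a keyword first
// appears in the raw HTML, to see whether it falls inside the first
// ~101,000 bytes that (per this thread) get indexed and cached.
$url     = 'http://www.example.com/allproducts.php';   // placeholder
$keyword = 'blue widget';                               // placeholder
$limit   = 101000;

$html = file_get_contents($url);        // requires allow_url_fopen
if ($html === false) {
    die("Could not fetch $url\n");
}

$offset = stripos($html, $keyword);
if ($offset === false) {
    echo "'$keyword' not found in the page\n";
} elseif ($offset > $limit) {
    echo "'$keyword' first appears at byte $offset -- beyond the $limit byte cut-off\n";
} else {
    echo "'$keyword' first appears at byte $offset -- within the first $limit bytes\n";
}
?>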

8:56 pm on Jan 11, 2004 (gmt 0)

10+ Year Member



What about pages that are compressed? With GZIP encoding you can compress your pages and save on bandwidth; the browser decompresses them before display.

Does anyone know how Google deals with compressed pages?
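
For what it's worth, the usual way to serve this from PHP is to let ob_gzhandler decide per request: it only gzips the output when the client's Accept-Encoding header allows it, so a client that asks for the plain page simply gets plain HTML. A minimal sketch:

<?php
// Send the page gzip-compressed only when the client's
// Accept-Encoding header says it supports it; otherwise the same
// HTML goes out uncompressed.
ob_start('ob_gzhandler');
?>
<html>
<head><title>All products</title></head>
<body>
<!-- normal page content here; the output buffer handles the encoding -->
</body>
</html>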

9:09 pm on Jan 11, 2004 (gmt 0)

WebmasterWorld Senior Member bigdave is a WebmasterWorld Top Contributor of All Time 10+ Year Member



IIRC, Google specifically requests pages uncompressed.

It really doesn't matter, though. They transfer the entire page anyway, even uncompressed; they just do not store all of it.

Make your pages smaller than 101k if you care about it. Don't worry about it if you don't.
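
If you do care, one quick way to keep an eye on it is to fetch your own pages and flag anything over the limit. A rough sketch (the URLs are made up):

<?php
// size_check.php -- report the uncompressed size of a few pages and
// flag anything over the ~101 kB threshold discussed in this thread.
$urls = array(
    'http://www.example.com/sitemap1.php',      // placeholders
    'http://www.example.com/allproducts.php',
);
$limit = 101000;

foreach ($urls as $url) {
    $html = file_get_contents($url);    // requires allow_url_fopen
    if ($html === false) {
        echo "FAILED  $url\n";
        continue;
    }
    $size = strlen($html);
    $flag = ($size > $limit) ? 'OVER' : 'ok  ';
    echo "$flag  " . number_format($size) . " bytes  $url\n";
}
?>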

10:57 pm on Jan 13, 2004 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



See also: [webmasterworld.com...]
 
