Google News Archive Forum

Will Google Index a 1 MB+ page?
Google indexing large files.
bluetoothache
msg:213395
10:36 am on Jan 11, 2004 (gmt 0)

Hi,

Does Google, or any search engine for that matter, have any trouble indexing a single 1 MB+ HTML/PHP file? I have a few around that size, usually sitemaps and all-products index pages.

Any pros and cons?

Thank you.

bluetoothache

ThomasB
msg:213396
3:52 pm on Jan 11, 2004 (gmt 0)

Google officially says that it indexes content only up to 101 KB.

If your sitemaps contain more than 1 MB of data, I assume there are several thousand links on them. Google suggests having a maximum of 100 links on a page.

Maybe you should break the sitemaps down into smaller parts.
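
If you want to script that split, here is a minimal sketch in Python. The file names, the regex-based link extraction, and the exact 100-link budget are illustrative assumptions, not anything Google prescribes beyond the rough guideline above:

```python
import re

MAX_LINKS = 100  # rough per-page link budget from the guideline

with open("sitemap.html", encoding="utf-8") as f:
    html = f.read()

# Crude anchor-tag extraction; a real site would generate these pages
# from its product database instead of re-parsing rendered HTML.
links = re.findall(r"<a\s[^>]*>.*?</a>", html, re.IGNORECASE | re.DOTALL)

pages = (len(links) + MAX_LINKS - 1) // MAX_LINKS  # ceiling division
for page_no in range(pages):
    chunk = links[page_no * MAX_LINKS:(page_no + 1) * MAX_LINKS]
    with open(f"sitemap-{page_no + 1}.html", "w", encoding="utf-8") as out:
        out.write("<html><body>\n" + "\n".join(chunk) + "\n</body></html>\n")

print(f"Split {len(links)} links into {pages} pages.")
```

Linking the split pages to each other, or from a small master index, keeps every product within a click or two of the sitemap entry point.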

nileshkurhade
msg:213397
4:07 pm on Jan 11, 2004 (gmt 0)

I have read GG suggesting, somewhere in this forum, to break sitemaps up into a number of pages. It solves two problems:
1) complete indexing of each page, and
2) assurance that all pages will get indexed.

bluetoothache
msg:213398
4:19 pm on Jan 11, 2004 (gmt 0)

Yes, I agree that this is the wisest thing to do.

It will be quite a task, as I have quite a lot of sitemap/all-products pages :)

Thanks for the help, guys. I was already leaning toward what you all said, but perhaps I just needed a little push.

Best regards,

bluetoothache

johnser
msg:213399
4:20 pm on Jan 11, 2004 (gmt 0)

In my experience, pages with anything up to 1,000 links get crawled and the links followed, but generally Google stops looking at the page when it hits 101 KB.

nakulgoyal
msg:213400
4:21 pm on Jan 11, 2004 (gmt 0)

Regarding the post above, which says Google suggests having a maximum of 100 links on a page: is this for sure? Is it documented by them somewhere, or a proven fact?

bluetoothache
msg:213401
4:24 pm on Jan 11, 2004 (gmt 0)

I just learned that one large all-products page of mine was cached, but not everything was included.

The cached page included only about 3/4 of it.

bluetoothache
msg:213402
4:28 pm on Jan 11, 2004 (gmt 0)

I noticed this in the results:

[mydomain.com...] 101k - Cached - Similar pages

On viewing the cached page, not everything is included, so I guess the 101 KB limit is accurate.
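
If you want to check the rest of your pages the same way, here is a minimal sketch that reports the raw HTML size of a URL. The URL is a placeholder, and the threshold is just the cutoff figure reported in this thread:

```python
import urllib.request

LIMIT = 101 * 1024  # the ~101 KB cutoff discussed in this thread

def check(url):
    with urllib.request.urlopen(url) as resp:
        body = resp.read()
    verdict = "over" if len(body) > LIMIT else "under"
    print(f"{url}: {len(body) / 1024:.0f} KB ({verdict} the limit)")

check("http://www.example.com/allproducts.html")  # placeholder URL
```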

jamesa
msg:213403
5:05 pm on Jan 11, 2004 (gmt 0)

Personally, I've seen Googlebot (64.****) grab chunks of up to almost 200 KB.

nileshkurhade
msg:213404
7:20 pm on Jan 11, 2004 (gmt 0)

In reply to the question above ("is this for sure? Documented by them somewhere, or a proven fact?"):

Check out the comments from GoogleGuy at [webmasterworld.com...]

BigDave
msg:213405
7:37 pm on Jan 11, 2004 (gmt 0)

Google will download your whole page, and it will even follow links beyond 101 KB. Over a year ago, I saw PR passed to links beyond 101 KB. Brett agrees that the links are followed, but disagrees about PR being passed.

It does appear that the content of the page beyond 101 KB is not indexed, and it certainly is not cached, nor does it show up in the snippet. This means that searches for keywords located beyond 101 KB will not find the page.

But they absolutely DO index what is in the first 100 KB. Most useful content is found well before that point. Your users won't want to scroll down 50 screens to find what they searched for, and Google does not want to make them.

As for the 100 links per page, this is just a recommendation by Google, and it is generally good design. I have had Google follow well over 1,000 links off a page that went beyond 101 KB.

But there is nothing to say that Google will not discount the value of those links, or even ignore them, in the future. You can also be fairly certain that there is a big safety margin built into that 100-link recommendation. Don't even worry about it if you are under 200.
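
A quick way to see how this cutoff lands on your own pages: the sketch below truncates the fetched HTML at 101 KB and reports which links fall inside versus beyond it. The URL is a placeholder, and the byte limit and regex parsing are illustrative assumptions:

```python
import re
import urllib.request

LIMIT = 101 * 1024  # assumed indexing cutoff from this thread

def split_links(url):
    with urllib.request.urlopen(url) as resp:
        html = resp.read()  # raw bytes, so offsets match transfer size
    within, beyond = [], []
    for m in re.finditer(rb'href="([^"]+)"', html):
        bucket = within if m.start() < LIMIT else beyond
        bucket.append(m.group(1).decode("utf-8", "replace"))
    print(f"{len(within)} links within the first 101 KB, {len(beyond)} beyond")
    return within, beyond

split_links("http://www.example.com/sitemap.html")  # placeholder URL
```

Anything in the second list sits in content whose surrounding text, on the evidence in this thread, would not be searchable even if the links themselves get followed.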

BigJay
msg:213406
8:56 pm on Jan 11, 2004 (gmt 0)

What about pages that are compressed? With gzip encoding you can compress your pages and save on bandwidth; browsers decompress them prior to display.

Does anyone know how Google deals with compressed pages?

BigDave
msg:213407
9:09 pm on Jan 11, 2004 (gmt 0)

IIRC, Google specifically requests pages uncompressed.

It really doesn't matter, though. They transfer the entire page anyway, even uncompressed; they just do not store all of it.

Make your pages smaller than 101 KB if you care about it. Don't worry about it if you don't.
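
To see the distinction for yourself, here is a minimal sketch comparing what a client transfers with and without gzip. The URL is a placeholder, and note that a server only compresses when the request advertises "Accept-Encoding: gzip":

```python
import gzip
import urllib.request

URL = "http://www.example.com/allproducts.html"  # placeholder

# Plain request: no Accept-Encoding header, so the body arrives as-is.
plain = urllib.request.urlopen(URL).read()

# Ask for gzip explicitly, then decode to recover the full page.
req = urllib.request.Request(URL, headers={"Accept-Encoding": "gzip"})
resp = urllib.request.urlopen(req)
raw = resp.read()
page = gzip.decompress(raw) if resp.headers.get("Content-Encoding") == "gzip" else raw

print(f"uncompressed transfer: {len(plain)} bytes")
print(f"gzip transfer: {len(raw)} bytes, decoded to {len(page)} bytes")
```

Either way the decoded page is the same size, which is why compression does not change where a 101 KB indexing cutoff falls.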

g1smd
msg:213408
10:57 pm on Jan 13, 2004 (gmt 0)

See also: [webmasterworld.com...]
