Site Map Too Big

Do I Need to Break Up My Site Map?

         

Whoa

9:54 pm on Sep 13, 2002 (gmt 0)


We recently added a site map to our site for the first time. We did this after the last index was posted, and since then Googlebot has visited our site, indexed the home page, and picked up the new site map. Unfortunately, we don't have access to our logs, so I can't see where Googlebot went (i.e. whether it crawled the pages linked from the site map), but I know it reached the new site map page because the site map appeared in search results.

I started searching on the site map's link texts, and the site map came up in Google for most of them. In other words, if there was a link to a page on our site with the link text "Widget Repair", the site map was found when I searched for "Widget Repair".

However, Google doesn't seem to have gotten all the way through the site map file: only about two-thirds of the link texts bring up the site map when I search for them in Google. That's about 400 search terms out of around 600. The last 200 search terms don't return the site map.

Is it possible that the site map is too big? I had seen somebody say that a site map should be broken up into linked files of about 50K each. My site map is about 148K, and strangely Google reports it as about 101K.

It's like it just ignored the last 47K in the file. Maybe it got tired or bored.
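
One rough way to check which links sit past the cutoff is sketched below. It assumes the 101K figure Google shows corresponds to the first 101 * 1024 bytes of the HTML source, which is a guess, and the function name `split_links_at_cutoff` is made up for illustration. It also uses a simple regular expression rather than a full HTML parser, so unusual markup may be missed.

```python
import re

# Assumed cutoff: 101K taken as 101 * 1024 bytes. The exact byte count is a guess.
CUTOFF_BYTES = 101 * 1024

# Matches the href value of an <a ...> tag in raw bytes (quoted or unquoted).
HREF_RE = re.compile(rb'<a\s[^>]*?href\s*=\s*["\']?([^"\'\s>]+)', re.IGNORECASE)


def split_links_at_cutoff(html_bytes, cutoff=CUTOFF_BYTES):
    """Return (before, after): hrefs whose <a> tag starts before / at-or-after the cutoff."""
    before, after = [], []
    for match in HREF_RE.finditer(html_bytes):
        href = match.group(1).decode("utf-8", errors="replace")
        # match.start() is the byte offset of the "<a" in the source.
        if match.start() < cutoff:
            before.append(href)
        else:
            after.append(href)
    return before, after
```

Reading the site map in binary mode (for example `open(path, "rb").read()`) keeps the offsets in bytes rather than characters. If the links in the `after` list are the same ones that fail to bring up the site map in searches, that would be consistent with a size cutoff.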

Does anyone have any experience with this? Do I need to break up my site map? Does Google think it's a link farm or a bunch of doorway pages or something? I do use a common template and some of the pages are quite similar (while still being uniquely valuable to the visitor).

I guess in the end I will find out what Google thought of the site map when the next index is up, but just thought somebody might have some ideas.

Timona

10:09 pm on Sep 13, 2002 (gmt 0)


Google only indexes 101K before it moves on to the next page.

pageoneresults

10:12 pm on Sep 13, 2002 (gmt 0)


If it is that large, break it up into multiple site maps. You just need to make sure that there is a whole linking structure between all site maps and the rest of the site.

Now that I think about it, if it's that large, you may want to go directory style. ;)
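
A minimal sketch of breaking one big site map into several cross-linked pages follows. The file names (`sitemap1.html`, ...), the 40K budget, and the function names are all hypothetical. The budget counts only the list items, not the navigation row or page wrapper, so leave some headroom.

```python
from html import escape

MAX_BYTES = 40 * 1024  # hypothetical per-page budget for the link list


def item_html(url, text):
    """One list entry linking to a page of the site."""
    return f'<li><a href="{escape(url, quote=True)}">{escape(text)}</a></li>'


def chunk_links(links, max_bytes):
    """Greedily group (url, text) pairs so each group's list items fit in max_bytes."""
    chunks, current, size = [], [], 0
    for url, text in links:
        item_size = len(item_html(url, text).encode("utf-8")) + 1  # +1 for the newline
        if current and size + item_size > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append((url, text))
        size += item_size
    if current:
        chunks.append(current)
    return chunks


def render_page(links, page_no, page_count):
    """Render one site map page with a nav row linking to every other site map page."""
    nav = " | ".join(
        f'<a href="sitemap{n}.html">Site map {n}</a>'
        for n in range(1, page_count + 1)
        if n != page_no
    )
    items = "\n".join(item_html(url, text) for url, text in links)
    return f"<html><body>\n<p>{nav}</p>\n<ul>\n{items}\n</ul>\n</body></html>\n"


def build_site_maps(links, max_bytes=MAX_BYTES):
    """Return {filename: html} for the split site map."""
    chunks = chunk_links(links, max_bytes)
    return {
        f"sitemap{n}.html": render_page(chunk, n, len(chunks))
        for n, chunk in enumerate(chunks, start=1)
    }
```

Each generated page links to all the others, so a spider that reaches any one of them can reach the rest. The rest of the site would still need a link to at least one of these pages.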

akogo

10:15 pm on Sep 13, 2002 (gmt 0)


I have a 101K+ file. I don't think it indexed everything. 1,000+ files didn't appear in the index.

Whoa

10:18 pm on Sep 13, 2002 (gmt 0)


Timona,

Thanks for letting me know that. What exactly does it mean?

Does it mean Google drinks 101K worth of data and then goes on and will never get to the other 47K of data (but still indexing the links in the first 101K)?

Or does it mean Google drinks 101K worth of data, leaves, but eventually will come back to finish the job (but still indexing the links in the first 101K)?

Or does it mean Google realizes the file is larger than 101K and moves on and never ever indexes any of the links in the first 101K?

ikbenhet1

10:28 pm on Sep 13, 2002 (gmt 0)



I'd go for answer A, index the first 101k and move on.

pageoneresults

10:45 pm on Sep 13, 2002 (gmt 0)


It means that 101k is well over any acceptable page size limits. I believe the magic number is 40k. The smaller the better. If you have to worry about how much is being spidered per page, then you have too much. You either need to trim out HTML code or build additional pages.
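
To see how much of a page is markup that could be trimmed versus text a visitor actually reads, something like the sketch below may help. It uses Python's standard `html.parser`. The names `TextCounter` and `markup_report` are made up for this example, and counting whitespace-stripped text outside `<script>` and `<style>` is just one reasonable way to draw the line.

```python
from html.parser import HTMLParser


class TextCounter(HTMLParser):
    """Counts bytes of visible text, ignoring <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.text_bytes = 0
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            # Surrounding whitespace is treated as layout, not content.
            self.text_bytes += len(data.strip().encode("utf-8"))


def markup_report(html_text):
    """Return total, visible-text, and markup byte counts for one page."""
    counter = TextCounter()
    counter.feed(html_text)
    counter.close()
    total = len(html_text.encode("utf-8"))
    return {
        "total_bytes": total,
        "text_bytes": counter.text_bytes,
        "markup_bytes": total - counter.text_bytes,
    }
```

A page where `markup_bytes` dwarfs `text_bytes` is probably a candidate for trimming the HTML. One where the text itself is large is more likely a candidate for splitting into additional pages.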

brotherhood of LAN

10:47 pm on Sep 13, 2002 (gmt 0)


> I'd go for answer A, index the first 101k and move on.

You are tonight's winner then ;) If you look at the SERPs in Google, 101k is the maximum file size listed for any page's source code.

It is rare that pages should be that big anyway. Spare a thought for the slow-coach modems that many people are still using.

pageoneresults

10:56 pm on Sep 13, 2002 (gmt 0)


> It means that 101k is well over any acceptable page size limits.

I meant that from a design and indexing perspective. The leaner your code, the more you get indexed. Just think: would you rather have 10 pages of 101k get indexed, or 20 pages of 50.5k? I'll take the latter. I might even go a step further, trim them down to 25.25k, and end up with 40 pages being indexed. The smaller the better.