Sitemap Best Practices

Dealing with size and dumb search engines

ronburk

1:53 am on Feb 25, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Site maps are a simple but great idea. You give visitors an overview of the whole site, and you end up with a page that has links to everywhere, which helps ensure SE spiders can locate everything.

But there are some problems with just naively spitting out a page that has a link to every other page on your website.

  • Dumb Engines Overvalue Your Sitemap Page - because we usually link to the sitemap page from every other page on the website, that can make some search engines decide to list the sitemap page instead of the actual article that contains the "real stuff". Of course, people who click on that SE link find themselves in a sitemap, and probably can't immediately locate the actual relevant text (and its link), so they leave. Ugh.
  • Sitemap Page Gets Too Big - Spiders usually have some limit on page size and/or # of links before they give up. Not a problem when you're starting out, but as you keep adding content that sitemap keeps growing and, before you know it, you figure out that the search engine isn't reading it all any more.

Ran into these (and other) problems? What do you do? How do you break your sitemap up? How do you link to a broken-up sitemap (e.g., all pages point to the root sitemap page, or each page points to the most relevant sitemap sub-page, or...?)

specter

12:17 pm on Feb 25, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi,

I wouldn't worry for this: SEs search for keyword density, so they index page contents, not links. Also, if you have a long link list, SEs anyway scan all of them, and search the related ones.
So there isn't any problem, but if you wish to be more secure, you just could break them up in sub-pages in a tree structure like any existing directory.

ronburk

4:13 pm on Feb 25, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I wouldn't worry for this

Sorry, I wasn't clear. These weren't hypothetical concerns. MSN/Yahoo! regularly rank my sitemap.html higher than, say, "article.html" for the title of "article.html" (which, of course, appears in sitemap.html). Not all the time, but often enough that I'm irritated about the wasted traffic (people are unlikely to search a large sitemap page to find the exact keyword they were looking for).

Maybe someone has some SEO tips for discouraging this kind of mis-ranking of the sitemap page.

you just could break them up in sub-pages

I wasn't clear again. Yes, I implicitly listed that as a solution in the original post. What I'm looking for is a sampling of how other people who've run into these same problems have actually dealt with them (and the actual link structure they chose). Sounds like you haven't run into these problems.

Also, if you have a long link list, SEs anyway scan all of them, and search the related ones.

While it's true that Google has expanded their previous maximum page size that GoogleBot will scan, you seem to be saying you think that there is no maximum at all. My experience is to the contrary. See, for example, [webmasterworld.com] wherein GoogleGuy cites the hard limit (at that time) of 100KB per page. The limit is larger now. I'm certain it's not infinite.
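
A rough way to see whether a sitemap page is drifting toward a limit like the 100KB figure cited above is to measure its byte size and count its links. The following is only a minimal sketch, assuming the page is available as an HTML string; `LIMIT_BYTES`, `LinkCounter`, and `sitemap_stats` are illustrative names, and the threshold is the historical figure from this thread, not any spider's documented current behavior.

```python
from html.parser import HTMLParser

# Assumption: the 100KB per-page figure mentioned in the thread.
LIMIT_BYTES = 100 * 1024


class LinkCounter(HTMLParser):
    """Counts <a> tags that carry an href attribute."""

    def __init__(self):
        super().__init__()
        self.links = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.links += 1


def sitemap_stats(html, limit_bytes=LIMIT_BYTES):
    """Return the UTF-8 byte size, link count, and an over-limit flag."""
    size = len(html.encode("utf-8"))
    counter = LinkCounter()
    counter.feed(html)
    return {"bytes": size, "links": counter.links, "over_limit": size > limit_bytes}
```

Running this against each generated sitemap page would at least flag when a page crosses whatever threshold you choose to treat as the danger zone.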

Key_Master

4:48 pm on Feb 25, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Dumb Engines Overvalue Your Sitemap Page

Have you tried <meta name="robots" content="noindex,follow">?

Sitemap Page Gets Too Big

Google offers an alternative way to submit large sitemaps.

[google.com...]
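
For readers unfamiliar with this kind of submission: it is typically an XML file listing URLs, handed to the engine directly rather than crawled as an HTML page. A minimal sketch of generating one in Python follows. It assumes the sitemaps.org 0.9 `urlset` schema; the namespace, the example URLs, and the function name `build_xml_sitemap` are assumptions for illustration and are not taken from the linked page.

```python
import xml.etree.ElementTree as ET

# Assumption: the sitemaps.org 0.9 namespace.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_xml_sitemap(urls):
    """Build a <urlset> document with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & when serializing.
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


xml_text = build_xml_sitemap([
    "http://www.example.com/",              # hypothetical URLs
    "http://www.example.com/article.html",
])
```

Because the file is not a page in its own right, it sidesteps both the ranking problem and the page-size problem raised in the original post, at least for engines that accept it.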

Beagle

7:30 am on Feb 26, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I notice that you specifically mention MSN and Yahoo! I have no experience with Yahoo! But judging just from my own use of MSN as a searcher, it seems to pay more attention to the site content and less attention to the page content than Google does (relatively speaking). That's why I prefer it for my own searching; I'd much rather find a good site about a topic than a lot of pages that happen to have the word or phrase I typed into the search box.

I don't know how to turn that into specific ideas about how to stop it from mis-ranking your site map. I can't remember it ever landing me on a site map page, but it often takes me to a site's home page, which is usually what I want.

watercrazed

8:16 am on Feb 26, 2006 (gmt 0)

10+ Year Member



Hi,
I have run into the same problem. I also had about 700+ pages on the site map, so I broke it up into pages by category, with about 75+ links per page, and a link at the bottom of the category 1 page to category 2. They got spidered and indexed great, but they outranked my main pages on the topic. That is still a problem, but I limited the number of pages that I put the sitemap link on, and it helped a bit. I am in the process of putting nofollow on most of the sitemap links on the main info pages.
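
The split described in this post (category pages capped at roughly 75 links, each linking on to the next) could be sketched along these lines. This is an illustration under assumptions, not the poster's actual script: the `sitemap-N.html` file names, the dict layout, and the function name `paginate_sitemap` are all hypothetical.

```python
def paginate_sitemap(pages_by_category, max_links=75):
    """Split each category's links into sub-pages of at most max_links,
    then chain the sub-pages together with a "next" file name."""
    subpages = []
    for category, links in pages_by_category.items():
        for start in range(0, len(links), max_links):
            part = start // max_links + 1
            subpages.append({
                "file": f"sitemap-{len(subpages) + 1}.html",  # hypothetical naming
                "title": f"{category} (part {part})",
                "links": links[start:start + max_links],
                "next": None,
            })
    # Link the bottom of each sub-page to the one that follows it.
    for current, following in zip(subpages, subpages[1:]):
        current["next"] = following["file"]
    return subpages
```

Chaining the sub-pages means only the first one needs a link from the rest of the site, which fits the idea of limiting how many pages point at the sitemap.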

joaquin112

1:49 am on Feb 28, 2006 (gmt 0)

10+ Year Member



Have you tried <meta name="robots" content="noindex,follow">?

Then why would you place a sitemap in the first place?

Key_Master

2:06 am on Feb 28, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Because it keeps the sitemap out of the serps but allows the bot to follow the links.

Can't be any more simple than that.
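
To make the distinction concrete: the robots meta tag carries two independent directives, one about indexing the page and one about following its links. The sketch below shows how a crawler might read them. It assumes the conventional defaults of index and follow when no tag is present; `RobotsMetaParser` and `robots_directives` are illustrative names, and real engines may interpret the tag differently.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Reads <meta name="robots"> and tracks index/follow separately."""

    def __init__(self):
        super().__init__()
        self.index = True   # assumed default when no directive is given
        self.follow = True  # assumed default when no directive is given

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag != "meta" or (attrs.get("name") or "").lower() != "robots":
            return
        for directive in (attrs.get("content") or "").lower().split(","):
            directive = directive.strip()
            if directive == "noindex":
                self.index = False
            elif directive == "nofollow":
                self.follow = False


def robots_directives(html):
    """Return (may_index, may_follow) for the given HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.index, parser.follow
```

With `noindex,follow` the result is `(False, True)`: the page stays out of the results, but its links are still eligible to be crawled.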

joaquin112

3:38 am on Feb 28, 2006 (gmt 0)

10+ Year Member



So even if it's not indexed, it follows the links? I am not too sure... can anyone concur?

Key_Master

3:53 am on Feb 28, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sorry, sometimes I forget this site is more or less geared towards inexperienced webmasters. Do a search, there is plenty of info on this site and on the web.

[webmasterworld.com...]

ronburk

6:09 pm on Feb 28, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Actually, I not only want the sitemap for the bots to follow, but I want the content indexed as well. I just don't want the SE to rank that sitemap page higher than a page it links to for a phrase that that other page is clearly devoted to!

Why do I want the sitemap indexed? Because it's a valuable source of keyword research once it attains a certain number and variety of keywords on the page. People put together keywords X and Y and land on the sitemap page because it refers to an article about X and also to an article about Y. I see that in my logs and realize, "Oh, I could get some extra traffic by writing an article about 'X Y'."

noindex,follow

Hmmm, Google isn't a problem in this regard, and I guess I thought that was a Google-only construct, but I see now that it allegedly isn't. Of course, I don't want to use it as suggested here, since I do want the sitemap indexed and followed :-).

But I guess I could use the rel="nofollow" link attribute on all the anchors that point to the sitemap page (except one). I suspect that would convince Yahoo! and MSN that it should not rank higher than a "normal" page for anything. However, I wonder if it would also significantly decrease the number of serendipitous hits on unanticipated keyword combinations that my sitemap currently gets.

Might be worth an experiment.