Msg#: 3802528 posted 1:56 pm on Dec 8, 2008 (gmt 0)
I know Google sitemaps get discussed here from time to time but trying to search for threads on their importance didn't come up with much - and they don't appear to be featured in Hot Topics.
I have a fairly large site, with a lot of static AND dynamic pages. I have never created an XML sitemap for this site (I have used them on smaller or newer sites) as I was concerned about getting it complete enough.
My main question: is a partial (rather than complete) sitemap useful or harmful? Is it OK to create a sitemap that lists the static pages and let Google find the dynamic pages that lead on from there?
Having updated and significantly added to one area of the site, I would like to encourage Google to update its index of the relevant pages. The Google cache of one particular page, which is now much more important and has much more content, is seven weeks old.
Following on from the main question, I would also ask: is an out-of-date XML sitemap harmful (i.e. one that doesn't list recent page additions)?
Msg#: 3802528 posted 8:29 pm on Dec 8, 2008 (gmt 0)
A Sitemap can contain a list of up to 50,000 URLs. If you have more than 50,000 URLs, you should create multiple Sitemaps and submit a Sitemap index file.

I have a couple of large sites that use an index file listing individual sitemaps created from db queries. It is the easiest way to create a sitemap with a hands-off approach. Dates are filled in automatically as pages are created, and priorities are assigned according to the sections the pages come from.

For example, if you have a forum with 20 categories, you could set up an individual sitemap for each category and list those sitemaps in the index file. As posts are added to each category, the sitemaps query a db to get all URLs in that category. Posts are timestamped, which can be used to fill in the creation date. On my forums, each category carries a different priority, which is filled in on its sitemap.

I used to try to lead googlebot to the main sections of the site in hopes of it going further and crawling most of the posts, but I have found that I can get around 50% more of my pages in the index by having them listed in sitemaps. This comes after a couple of years of trying to maintain the sitemaps by hand... which can really get to be a pain as a site grows.
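A minimal sketch of that db-driven approach, assuming a simple `posts` table with `url`, `category`, and `created_at` columns (the table, column names, and per-category priorities here are illustrative, not from the post):

```python
import sqlite3
from datetime import datetime, timezone
from xml.sax.saxutils import escape

# Hypothetical per-category priorities (assumption for illustration)
CATEGORY_PRIORITY = {"general": "0.8", "support": "0.6"}

def category_sitemap(conn, category):
    """Build one sitemap's XML from all posts in a category.

    The post timestamp fills in <lastmod>, and the category
    determines the <priority> value, as described above.
    """
    rows = conn.execute(
        "SELECT url, created_at FROM posts WHERE category = ?", (category,)
    )
    entries = []
    for url, created_at in rows:
        entries.append(
            "  <url>\n"
            f"    <loc>{escape(url)}</loc>\n"
            f"    <lastmod>{created_at}</lastmod>\n"
            f"    <priority>{CATEGORY_PRIORITY.get(category, '0.5')}</priority>\n"
            "  </url>"
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        + "\n".join(entries)
        + "\n</urlset>"
    )

def sitemap_index(base_url, categories):
    """Build the index file that lists one sitemap per category."""
    today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    maps = "\n".join(
        "  <sitemap>\n"
        f"    <loc>{base_url}/sitemap-{c}.xml</loc>\n"
        f"    <lastmod>{today}</lastmod>\n"
        "  </sitemap>"
        for c in categories
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        + maps
        + "\n</sitemapindex>"
    )
```

Run on a schedule (or on demand), this keeps every sitemap current with no hand editing, which is the "hands-off" part.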
Msg#: 3802528 posted 8:34 pm on Dec 8, 2008 (gmt 0)
A Sitemap is a supplemental bit of information that Google ADDS to its regular crawling. That regular crawling is Google's primary method of URL discovery, and the site's link structure is the big key.
So, while I've never done it (at least not intentionally), I can't foresee any problems from a Sitemap that doesn't contain all the site's URLs. If a URL has a link pointing to it, Google almost always finds it.
The only drawback to leaving a URL out of the Sitemap is that you can't give Google the extra XML information for it, such as <changefreq> suggestions.
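For reference, a single `<url>` entry with the optional tags the sitemap protocol defines (`<lastmod>`, `<changefreq>`, `<priority>`) looks like this; the URL and values are just placeholders:

```xml
<url>
  <loc>http://www.example.com/page.html</loc>
  <lastmod>2008-12-08</lastmod>
  <changefreq>weekly</changefreq>
  <priority>0.8</priority>
</url>
```

All three optional tags are hints, not commands: Google treats them as suggestions alongside what its own crawling discovers.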
Msg#: 3802528 posted 3:19 am on Dec 10, 2008 (gmt 0)
You can always try generating a sitemap for just one section of your site. That way it's also easier to update the sitemap for that part of the website when it changes, if you are using a sitemap generator program.
Msg#: 3802528 posted 8:58 am on Dec 10, 2008 (gmt 0)
That is very interesting to know, and what I was hoping to hear.
In another recent thread here somewhere, someone was asking how you can get Google to spider one particular page. I'm not suggesting a sitemap for a single page, but submitting a sitemap for the main pages because one of them needs updating in Google's index sounds feasible.