Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.
Web crawlers usually discover pages from links within the site and from other sites. Sitemaps supplement this data to allow crawlers that support Sitemaps to pick up all URLs in the Sitemap and learn about those URLs using the associated metadata. Using the Sitemap protocol does not guarantee that web pages are included in search engines, but provides hints for web crawlers to do a better job of crawling your site.
Sitemap 0.90 is offered under the terms of the Attribution-ShareAlike Creative Commons License and has wide adoption, including support from Google, Yahoo!, and Microsoft.
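For anyone who has not seen one, here is a minimal Python sketch of the kind of XML file described above. The function name build_sitemap and the example.com URLs are my own placeholders, not part of the protocol; the element names (urlset, url, loc, lastmod, changefreq, priority) and the namespace are the ones published with Sitemap 0.90.

```python
import xml.etree.ElementTree as ET

# Namespace published with the Sitemap 0.90 protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_DECLARATION = b'<?xml version="1.0" encoding="UTF-8"?>\n'


def build_sitemap(entries):
    """Return sitemap XML as bytes.

    entries is a list of dicts (a hypothetical shape chosen for this sketch):
    'loc' is required; 'lastmod', 'changefreq' and 'priority' are optional.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # Emit the child elements in the order the protocol lists them.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    return XML_DECLARATION + ET.tostring(urlset, encoding="utf-8")


if __name__ == "__main__":
    pages = [
        {"loc": "http://www.example.com/", "lastmod": "2006-11-20",
         "changefreq": "daily", "priority": 0.8},
        {"loc": "http://www.example.com/about.html"},
    ]
    print(build_sitemap(pages).decode("utf-8"))
```

ElementTree escapes characters such as & in the URLs for you, which is one reason to build the file this way rather than by pasting strings together.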
It is a shame most of you missed PubCon; you would have heard the rather detailed explanations about the use of sitemaps from Tim Mayer (Yahoo), Vanessa Fox (Google) and Eytan Seidman (Microsoft). (I am still curious as to why Peter Linsley carefully avoided questions about why Ask.com did not participate in the joint sitemap awareness project. We tend to pick on Ask.com, but they are #4 in the search game and should be more vocal about helping webmasters.)
Just as you do not have to authenticate with Google, Yahoo or Microsoft to submit your websites to their indexes, you don't have to use the authenticated services (Google Webmaster Tools, Yahoo SiteExplorer or Microsoft Live Search) to submit your sitemaps. The Google & Yahoo tools are primarily useful (and intended) for monitoring how the bots are accessing your site(s) and checking whether they are encountering errors. They were created in response to the numerous complaints about bots repeatedly chasing down 404 errors and creating loops while crawling sites. By monitoring and fixing problems on your sites, you reduce your bandwidth usage and help control bot traffic.
Think of Google's Webmaster Tools and Yahoo's SiteExplorer as utilities that are less intended for proclaiming who you are and where your site is and more targeted for asking "Can you access my websites and do you see any problems I need to fix?"
The link attribute rel="nofollow" and submitting a sitemap will NOT deter or affect normal spidering of your website. The nofollow attribute simply tells the indexes you do not "trust" the linked target url and don't want that association to affect your site. The sitemap serves as a "guide" for what you believe is important or should be noticed on your website. Think of the sitemap as a more detailed and formal robots.txt. A sitemap is no more powerful and only slightly more useful than robots.txt.
Each sitemap is limited to no more than 50,000 urls (links/pages) and no larger than 10mb uncompressed. Yes, just as you should be serving your html/php/jsp/asp pages in compressed format to reduce bandwidth, you should also create & store your sitemaps compressed with gzip.
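A rough Python sketch of checking a sitemap against the limits and storing it gzipped. The function name write_gzipped_sitemap is made up for this example, and the two constants are the limits published with the Sitemap 0.90 protocol (50,000 urls, 10mb measured before compression); treat them as assumptions and check the current protocol page before relying on them.

```python
import gzip

# Limits as published with Sitemap 0.90 (assumed here; verify before use).
MAX_URLS = 50_000
MAX_BYTES = 10 * 1024 * 1024  # size of the uncompressed XML


def write_gzipped_sitemap(path, xml_bytes, url_count):
    """Validate the sitemap against the limits, then write it gzip-compressed."""
    if url_count > MAX_URLS:
        raise ValueError(f"{url_count} urls exceeds the {MAX_URLS} url limit")
    if len(xml_bytes) > MAX_BYTES:
        raise ValueError("uncompressed sitemap exceeds the size limit")
    # gzip.open in binary mode compresses the bytes as they are written.
    with gzip.open(path, "wb") as fh:
        fh.write(xml_bytes)


if __name__ == "__main__":
    sample = b'<?xml version="1.0" encoding="UTF-8"?><urlset></urlset>'
    write_gzipped_sitemap("sitemap.xml.gz", sample, url_count=0)
```

Note that the size check is done on the XML before compression, not on the .gz file.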
For large sites, you can have individual, unique sitemaps for subdirectories or large numbers of urls and one master Sitemap Index that lists all the other sitemaps on your website.
(Google sitemap FAQ) [google.com]
(Sitemaps.org FAQ) [sitemaps.org]
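To illustrate the large-site setup, here is a Python sketch that splits a long url list into per-sitemap chunks and builds the master Sitemap Index. The helper names (chunk_urls, build_sitemap_index) and the example.com file names are placeholders of mine; the sitemapindex, sitemap, loc and lastmod elements come from the protocol. The chunk size of 50,000 is the per-file url limit from Sitemap 0.90 and is only a default here.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_DECLARATION = b'<?xml version="1.0" encoding="UTF-8"?>\n'


def chunk_urls(urls, size=50_000):
    """Yield consecutive slices of urls, each at most `size` long."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_sitemap_index(sitemap_locations, lastmod=None):
    """Return a Sitemap Index (bytes) pointing at each child sitemap."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_locations:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
        if lastmod is not None:
            ET.SubElement(sitemap, "lastmod").text = lastmod
    return XML_DECLARATION + ET.tostring(index, encoding="utf-8")


if __name__ == "__main__":
    all_urls = [f"http://www.example.com/page{i}.html" for i in range(120_000)]
    parts = list(chunk_urls(all_urls))
    # Hypothetical naming scheme: sitemap1.xml.gz, sitemap2.xml.gz, ...
    locations = [f"http://www.example.com/sitemap{n}.xml.gz"
                 for n in range(1, len(parts) + 1)]
    print(build_sitemap_index(locations, lastmod="2006-11-20").decode("utf-8"))
```

Each chunk would then be written out as its own sitemap file, and only the index needs to be submitted.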
You cannot list other sites' urls in your sitemap; that rather defeats the purpose of calling it a SITEmap rather than an Internet roadmap.
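If you generate your sitemap from a crawl or a database dump, a quick filter can catch stray off-site urls before they go in. This is only a sketch: same_site is a name I made up, and it checks just the scheme and host against the sitemap's own location, which is a simplification of what the search engines actually enforce.

```python
from urllib.parse import urlsplit


def same_site(sitemap_url, page_url):
    """True if page_url uses the same scheme and host as the sitemap itself."""
    sitemap, page = urlsplit(sitemap_url), urlsplit(page_url)
    # Host names are case-insensitive, so compare them lowercased.
    return (sitemap.scheme, sitemap.netloc.lower()) == (page.scheme, page.netloc.lower())


if __name__ == "__main__":
    sitemap_location = "http://www.example.com/sitemap.xml"
    candidates = [
        "http://www.example.com/widgets.html",
        "http://other.example.org/widgets.html",
    ]
    keep = [u for u in candidates if same_site(sitemap_location, u)]
    print(keep)
```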
The consensus was that this is an attempt to get webmasters to start taking responsibility for producing solid, well designed sites so they will stop blaming the bots for poor indexing & crawl errors.
This year's PubCon was heavily slanted towards the benefits of proper site management and how it affects the search industry. I agree with the theme; corporate, commercial and MFA webmasters need to clean up their sites. (If you have a comment on that, start a new thread.)
Plan your large site sitemaps as carefully as you should be planning your website development. Someone said it best in the pub on the last day of PubCon: "You should be feeding the bots, not trapping them." My apologies; due to the crowd and noise, I didn't get the source of the quote.
JK - AWT
[edited by: TXGodzilla at 5:54 am (utc) on Nov. 20, 2006]