Forum Moderators: Robert Charlton & goodroi

Can a site be hurt by an incomplete xml sitemap?


smithaa02

2:13 pm on May 17, 2011 (gmt 0)

10+ Year Member



On one of our company's large sites we run a CMS that automatically generates a sitemap.xml file that we have tied into webmaster tools.

This works fine for the most part: updated pages are automatically reflected in the sitemap.xml file, and newly added pages are included as well.

The problem is that this is a legacy site with a lot of non-CMS pages on the server (like archived e-newsletters) that I would really like to be crawled and ranked. Because they are not part of the CMS's database (they are just straight HTML), they aren't automatically included in the sitemap.xml file.

Am I hurting my non-CMS pages (there are a lot of them) by omitting them from sitemap.xml? Does Google pretty much assume that what's in your sitemap.xml file equals your entire site? I know Google can and does crawl pages that aren't in the sitemap, but I'm worried about the crawl frequency and the weighting that follows from it.

Am I better off just ditching the sitemap.xml file altogether, so Google doesn't draw false conclusions from an incomplete one?

TheMadScientist

3:35 pm on May 17, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It should not have any impact on rankings, or imo even on crawling. As long as the pages are being crawled, they have been found, and their 'crawl frequency' is more likely based on their PR than on inclusion in or omission from a sitemap file. Google's mission is to 'get the right information' to the visitor, and if they based crawling or rankings on inclusion in or omission from an XML sitemap, they could miss 'good information' over a simple error. So my guess is they may use the XML sitemap as a 'hint', but not as anything 'concrete'.

I haven't seen any evidence to the contrary either.

goodroi

3:39 pm on May 17, 2011 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Google uses sitemaps as a tool to find more of your site; it does not use them to penalize or ignore pages. You are not technically hurting your non-CMS pages, but you also are not helping them as much as you could be.

Here are a few tips you might want to try:

#1 - Create a secondary static sitemap file for all of your legacy pages. Google lets you have multiple sitemaps per domain and this would help Google discover your legacy pages.
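For a pile of plain HTML files, that secondary sitemap can be generated with a short script instead of by hand. Here's a minimal sketch in Python; the `BASE_URL`, the `archive` folder, and the `sitemap-legacy.xml` filename are assumptions to adapt to your own setup:

```python
# Sketch: build a static sitemap.xml listing legacy HTML pages.
# BASE_URL and the directory layout below are hypothetical examples.
import os
from xml.sax.saxutils import escape

BASE_URL = "http://www.example.com"  # assumed domain

def build_sitemap(root_dir, base_url=BASE_URL):
    """Walk root_dir and return sitemap XML covering every .html/.htm file."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in sorted(filenames):
            if not name.endswith((".html", ".htm")):
                continue
            rel = os.path.relpath(os.path.join(dirpath, name), root_dir)
            url = base_url + "/" + rel.replace(os.sep, "/")
            entries.append("  <url><loc>%s</loc></url>" % escape(url))
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        + "\n".join(entries)
        + "\n</urlset>\n"
    )

if __name__ == "__main__":
    # e.g. the folder holding the old e-newsletters
    xml = build_sitemap("archive")
    with open("sitemap-legacy.xml", "w") as f:
        f.write(xml)
```

Submit the resulting file in Webmaster Tools alongside the CMS-generated one; the sitemap protocol explicitly allows multiple sitemap files per site.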

#2 - Make sure your legacy pages have other pages pointing to them. Turn your site into a spider web by embedding links in relevant content. This will improve Google's crawling of your site and help your SEO.

#3 - 301 redirect your outdated pages. Some of your archived content may no longer be correct or may be severely outdated. Instead of trying to get those pages indexed, it might be better to 301 redirect them and consolidate the link juice toward more current and relevant pages. Another reason to do this: Google engineers have been hinting that large amounts of poor-quality content are connected with Panda penalties.
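If the site runs on Apache, those 301s can live in an .htaccess file via mod_alias. The paths below are purely hypothetical examples, not anything from your site:

```apache
# Hypothetical .htaccess entries (assumes Apache with mod_alias enabled):
# send an outdated newsletter to the current archive index
Redirect 301 /newsletters/2004-spring.html /newsletters/
# fold an obsolete page into its modern replacement
Redirect 301 /old-widgets.html /widgets.html
```

Each rule permanently redirects the old URL and passes most of its link value to the target page.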