Forum Moderators: Robert Charlton & goodroi
Many of us non-techie folk rely totally on 'off the shelf' sitemap generators. This may be where the problem lies.
One I've used only includes pages down to level two - rather missing the point that sitemaps are there to help get those deep pages Googled!
Another does a beautiful job from a well designed, professional looking site - but often (not always) omits the last few lines of the map. Not helpful.
A third does a great job. Usually. But it seems to have a minor allergy: it repeatedly claimed errors that simply did not exist - very frustrating until I cross-checked with two other programs. I've had no problems since.
Clues to errors:
1. It finishes too fast - it's probably missing chunks.
2. It fails to report the number of pages that you know you have (you do know, don't you!)
There's no substitute for a visual check, however.
All three of these programs are listed on Google's website.
When I first looked at it I could not understand why they were plugging something then telling you to go off and download software from god knows who!
Given their resources it should simply be 'click here to create your site map'
Half a job in my opinion!
Given their resources it should simply be 'click here to create your site map'
The reason for using a sitemap is to tell Google what to crawl and where to find it. If you don't think Google can crawl your site adequately without help, why would you trust Google to crawl your site for a sitemap?
[edited by: europeforvisitors at 5:50 pm (utc) on Aug. 6, 2006]
One thing I noted: if your HTML coding is crappy, or "cowboy code", these automated programs will miss links.
The same will happen with regular spiders like Googlebot. If a sitemap generator is not capable of reading the HTML soup on your site, you shouldn't expect Googlebot to index it well either. This is where the basic problem lies with sitemap generators. Sites with a clean HTML structure and clean link paths won't have as many problems with indexing.
My experience is that a good link checker capable of generating sitemaps is far better than a tool that can only generate sitemaps. A link checker is therefore the starting point for generating a sitemap.
First, run a link checker over your site until no internal link errors are found and no orphan pages exist. Orphan pages can be put in a sitemap file for Google to index, but chances are slim that they will rank, because they have no incoming links.
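The crawling half of that first step can be sketched in a few lines. This is a minimal illustration, not a replacement for a real tool: it crawls same-host pages from a start URL and collects links that fail to load. Orphan detection would additionally require comparing the crawled set against the full page list from your file system or CMS.

```python
# Minimal internal link-checker sketch: breadth-first crawl of one host,
# reporting URLs that fail to load. Assumes small, plain-HTML sites.
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen
from urllib.error import URLError, HTTPError

class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags as the parser feeds through HTML."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def check_site(start_url):
    """Crawl same-host pages; return (pages seen, broken URLs)."""
    host = urlparse(start_url).netloc
    seen, queue, broken = set(), [start_url], []
    while queue:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urlopen(url).read().decode("utf-8", "replace")
        except (HTTPError, URLError):
            broken.append(url)
            continue
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href.split("#")[0])
            # Stay on the same host; external links would need their own check.
            if urlparse(absolute).netloc == host:
                queue.append(absolute)
    return seen, broken
```

Real link checkers add timeouts, redirect handling, and robots.txt support on top of this skeleton.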
Second, use the link checker to check the link depth of pages from the main source of PageRank (usually the homepage). Lack of PageRank is often the reason Googlebot doesn't index specific pages or page trees: PageRank dilutes when pages are many steps away from the PageRank source. Decreasing the number of steps from the homepage to the pages you want indexed may help. Again, you can add these pages to a sitemap file and Google might index them, but they probably won't rank.
Third, check the output of the link checker for duplicate content. Do all links use the same type of URL, or do you see URL types you didn't know were there? Next and previous links in some forum software can generate these strange URLs, as can printer-friendly outputs. Remove these URLs, or at least make them harmless with a dynamically generated robots meta tag. I ran a link checker on a dynamically generated site built on a commonly used CMS where I thought I had all URLs rewritten in the .htaccess; several hundred "strange" URLs still popped up from all kinds of deep pages. Rewrite these URLs until you are sure every piece of content can be accessed via only one unique URL and is referenced in your site only with that specific URL.
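The robots meta tag mentioned above is a single line in the page's `<head>`. Emitted dynamically only on the duplicate views (printer-friendly pages, next/previous listings), it keeps those URLs out of the index while still letting the spider follow their links:

```html
<meta name="robots" content="noindex,follow">
```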
Step four is the generation of the sitemap itself. But for many sites, after you have completed the first three steps, you don't need the sitemap file anymore, because Googlebot can find its way through the site on its own.
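For reference, the file step four produces is plain XML in the sitemaps.org protocol format. Only `<loc>` is required; `lastmod`, `changefreq` and `priority` are optional hints. The URLs and dates here are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2006-08-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://www.example.com/deep/page.html</loc>
    <priority>0.5</priority>
  </url>
</urlset>
```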
Xenu's Link Sleuth is a very big help.
Yes, but it doesn't accurately count the path depth from the homepage to a specific page. The level it shows is the level at which it first encountered the page, not necessarily the lowest possible level. If you run the program several times, you may see different level values for a specific URL with each run. Other link checkers are better at level counting but worse in other areas.
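That discrepancy comes from reporting the depth at which a page happened to be discovered. A breadth-first traversal of the link graph avoids it, because BFS reaches every page by a shortest path, so the first encounter is guaranteed to be the minimum click depth. A sketch, assuming the link graph has already been extracted into a dict (e.g. from a crawl):

```python
from collections import deque

def click_depth(links, home):
    """Minimum number of clicks from `home` to each reachable page.

    `links` maps each URL to the list of URLs it links to. BFS visits
    pages in order of increasing depth, so the level recorded at first
    sight of a page is its lowest possible level - and it is stable
    across runs, unlike a depth-first or arbitrary-order crawl.
    """
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth

# Example: /deep is linked from both /a and /b, but its depth is always 2.
site = {"/": ["/a", "/b"], "/a": ["/deep"], "/b": ["/deep", "/a"]}
levels = click_depth(site, "/")
```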
But back to the subject of the thread: it does generate a list of URLs which can be the basis of a Google sitemap.
I launched on July 1. I have 33 backlinks in MSN, 167 in Yahoo and 0 in Google; MSN has indexed just over 300 pages and Yahoo 52, yet Google only 1.
This despite Googlebot coming every week like clockwork - then on Aug 5 the report said it had crawled on Aug 2!
Then it leaves the same info, nothing changed in the meantime, saying 1 URL can't be found when it's right there. One bag of nonsense.
I have a massive site, and it took me only a few days to get listed in DMOZ. And yet, to no avail, Google will only index my home page.
What's the deal? I've heard people say that they have dumped their Sitemaps account and then Google indexed their pages - how weird is that? Plus I heard Google doesn't count an IBL if it's only a few weeks old.
Is this for real? That means when the next update comes I am...out of luck then.
I BLAME GOOGLE. Why put out a system if it is not working properly and can invariably do more harm than good? And guys here cuss out Microsoft! There might be a cost difference, but good hype drives up stock value, and keeping in the news with new innovative things drives up the hype. So what is the real purpose of Google Sitemaps?
My thoughts....
If you don't think Google can crawl your site adequately without help, why would you trust Google to crawl your site for a sitemap?
I would trust Google's technical know-how a great deal more than I would some unknown programmer, possibly sitting in a bedroom somewhere!
My main point is they are promoting a service that, when visited, cannot be easily used by many.
If it requires a site map generator to use then they should supply one of their own.
You don't need the brains of a Google programmer to see that!
I would trust Google's technical know-how a great deal more than I would some unknown programmer, possibly sitting in a bedroom somewhere!
In that case, why not trust Googlebot to crawl your site without a sitemap?
Google sitemaps are simply an option for people who don't trust Googlebot to get the job done without help from an outside source.
I launched on July 1. I have 33 backlinks in MSN, 167 in Yahoo and 0 in Google; MSN has indexed just over 300 pages and Yahoo 52, yet Google only 1.
Google's back link search only shows a random sample of the backlinks that Google has actually found and is using to rank the site. In other words, there is no way of knowing how many backlinks Google knows about.
If it requires a site map generator to use then they should supply one of their own.
The entire point of the Sitemaps project is for site owners to build and check a complete list of their site's pages and assign crawl priorities. Providing the tool wouldn't make a difference if site owners aren't doing the checking part with the current tools.
Google sitemaps are simply an option for people who don't trust Googlebot to get the job done without help from an outside source.
No, it's more than that... It's a way to set page priorities (including, I would think, the implication that if a page isn't in the list, it isn't important), and it's a single place for Google to check to see what's new and what's changed.
All in all it's a pretty great tool for those of us with dynamic web sites - and that's the only context I use it in. If the site rarely changes (e.g. a brochureware site), I don't know that it would be all that valuable.
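For the dynamic-site case, the workflow is: regenerate the sitemap when content changes, then tell Google it changed. A sketch, with one loud assumption: the ping endpoint below is the one Google documented for Sitemaps around this time and may have changed since, so verify it against current documentation before relying on it.

```python
# Notify Google after a dynamic site regenerates its sitemap.
# ASSUMPTION: PING_ENDPOINT is the documented Google Sitemaps ping URL
# of this era; check current docs before use.
from urllib.parse import urlencode
from urllib.request import urlopen

PING_ENDPOINT = "http://www.google.com/webmasters/sitemaps/ping"

def ping_url(sitemap_url):
    """Build the GET URL that tells Google the sitemap has changed."""
    return PING_ENDPOINT + "?" + urlencode({"sitemap": sitemap_url})

def notify_google(sitemap_url):
    # A successful response only means the ping was received,
    # not that the sitemap was fetched or accepted.
    return urlopen(ping_url(sitemap_url)).status
```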
Also, the site, which is 3 months old, was fully indexed, and all pages were out of the supplemental index 2 weeks ago. But only 3 days ago, half of the pages were dropped and the vast majority went back into the supplemental index. I wonder whether this is because I use Sitemaps, or whether there is something else I am doing wrong.
It's my understanding that the Google Sitemaps service is for pages that are hidden from crawlers due to links embedded in Flash, Javascript or other client-side code.
The sitemap generator should be built into your web site's content management system. It shouldn't be a separate piece of software that crawls your site.
Google have done a good job of creating a simple yet effective way of submitting sitemaps. It's not complicated to program a sitemap creator to complement your existing web site's back-end software. I can't see how an off-the-shelf package could be of any real use; it's not a one-size-fits-all problem.
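That CMS-integrated approach can be sketched simply: the CMS's page table is already the authoritative list of URLs, so the sitemap is a straight dump of database records rather than a crawl. The function and field names here are hypothetical, standing in for whatever your back-end exposes:

```python
# Sketch of a CMS-integrated sitemap generator: render sitemaps.org XML
# directly from page records, no crawling involved. `pages` is assumed
# to be (path, last_modified) tuples pulled from the CMS database.
from datetime import date
from xml.sax.saxutils import escape

def build_sitemap(pages, base_url):
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for path, modified in pages:
        lines.append("  <url>")
        lines.append("    <loc>%s</loc>" % escape(base_url + path))
        lines.append("    <lastmod>%s</lastmod>" % modified.isoformat())
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines)

# Example: one homepage record straight from the database.
xml = build_sitemap([("/", date(2006, 8, 1))], "http://www.example.com")
```

Because it reads the same records the site is rendered from, the sitemap can never list a URL the site doesn't actually serve - which is exactly the guarantee a crawling generator can't give.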