Seems strange that Google are introducing this service now instead of trying to improve how googlebot follows links.
In my opinion, robots.txt should take precedence.
While I can see your point, there is a perfectly valid opposing viewpoint that you have now specifically told the search engine to index that URL.
The real problem is that the situation is undocumented. It should be documented, whichever of robots.txt or sitemap takes precedence.
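Until the precedence is documented, one way to sidestep the question is to check your own sitemap against your own robots.txt before submitting. A rough sketch in Python, using the standard library's robots.txt parser; the function name, the example.com URLs, and the "Googlebot" user agent string are just assumptions for illustration, and this says nothing about how Google itself resolves the conflict:

```python
from urllib.robotparser import RobotFileParser


def find_conflicts(robots_txt, sitemap_urls, user_agent="Googlebot"):
    """Return the sitemap URLs that robots_txt disallows for user_agent."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as a list of lines
    parser.parse(robots_txt.splitlines())
    return [url for url in sitemap_urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt and sitemap contents
robots_txt = """\
User-agent: *
Disallow: /private/
"""
sitemap_urls = [
    "http://www.example.com/",
    "http://www.example.com/private/report.html",
]

for url in find_conflicts(robots_txt, sitemap_urls):
    print("listed in sitemap but blocked by robots.txt:", url)
```

Any URL it prints is one where you are telling the crawler two different things, so fix one file or the other.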
I would recommend that you post it to the sitemap newsgroup where a Google engineer is more likely to spot it.
Sorry to be dense - which sitemap newsgroup?
There's a thread there right now about the sitemap / robots.txt conflict.
Edit: Looks like you started it... :)
Does the submitted sitemap file need to have a .xml extension?
No - Google will just request the URI that you submit and work out what to make of it.
That said, it is still good practice to make your script issue the Content-Type: text/xml header.
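For anyone generating the sitemap from a script, here is a minimal sketch of what that might look like as a Python WSGI app. The page list and the names sitemap_app and build_sitemap are made up for the example, and the namespace shown is the one from the sitemaps.org protocol, so treat it as an assumption and copy the exact value from the Google docs you are following:

```python
from xml.sax.saxutils import escape

# Assumption: sitemaps.org namespace; check the docs for the value expected.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical page list; a real script would read this from a database.
PAGES = [
    "http://www.example.com/",
    "http://www.example.com/about.html",
]


def build_sitemap(urls):
    """Build a minimal XML sitemap containing only <loc> entries."""
    entries = "".join(
        "  <url><loc>%s</loc></url>\n" % escape(url) for url in urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="%s">\n%s</urlset>\n' % (SITEMAP_NS, entries)
    )


def sitemap_app(environ, start_response):
    """WSGI app that serves the sitemap with an explicit XML content type."""
    body = build_sitemap(PAGES).encode("utf-8")
    headers = [
        ("Content-Type", "text/xml; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally you could pass sitemap_app to wsgiref.simple_server.make_server and request the URL with a browser or curl to confirm the header.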
[edited by: dmorison at 9:12 pm (utc) on June 19, 2005]
I have had a few people report that they receive "not found errors" when the filename has underscores or dashes. Changing the name from ex-sitemap.xml to exsitemap.xml seemed to fix it.
The Google parser appears to be picky. It does seem to require that the server report the correct mime type for xml or gzip.
Sitemaps are not a way to *make* google crawl your site. They are just a way for you to tell google about pages that you would like to have crawled. Google makes no guarantee that it will crawl those pages within any specific timeframe, if ever.
That said, they do seem to do a significant crawl within a couple of days of submitting the sitemap. One new site that only had the home page indexed got 42 of the 68 pages crawled after about a day. The rest of them have just been crawled and indexed over the weekend.
Can anyone post a sample sitemap that I could modify?
Go to the google sitemaps site and check the docs. They have a sample that you can easily cut and paste.
Or, just make a list of URLs, one per line, in a text file. If you aren't going to add any additional information, then there is little point in making the XML file instead of going with straight text.
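To make the difference concrete, here is a small Python sketch that produces both forms from the same data. The function names and example.com pages are hypothetical, the namespace is the sitemaps.org one (an assumption, so check the docs for the value to use), and lastmod is the only optional field shown:

```python
from xml.sax.saxutils import escape

# Assumption: sitemaps.org namespace; check the docs for the value expected.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def text_sitemap(pages):
    """Plain text form: one URL per line, no extra information."""
    return "".join(url + "\n" for url, _lastmod in pages)


def xml_sitemap(pages):
    """XML form: same URLs, plus a <lastmod> date where one is known."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="%s">' % SITEMAP_NS,
    ]
    for url, lastmod in pages:
        lines.append("  <url>")
        lines.append("    <loc>%s</loc>" % escape(url))
        if lastmod is not None:
            lines.append("    <lastmod>%s</lastmod>" % lastmod)
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


# Hypothetical pages as (url, last modified date or None)
pages = [
    ("http://www.example.com/", "2005-06-19"),
    ("http://www.example.com/contact.html", None),
]

print(text_sitemap(pages))
print(xml_sitemap(pages))
```

If every lastmod is None, the XML output carries nothing the text file doesn't, which is the point made above.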
Yes. Google recommends resubmitting when content changes. Currently it works smoothly without resubmits. Googlebot requests sitemaps like clockwork, and mostly does deep crawls afterwards (at least on popular sites). However, this behavior is not documented, so you can't rely on it.
I submitted an asp program, as I've mentioned. So it is dynamically generating the xml listing every time G goes and opens it.
I think that is the best way to go.. I don't know what you need on a unix server.. php perhaps.
I am wondering if I should leave it up there with G or remove it until I change something on the website? As it stands now G is rerunning the sitemap at least every day, if not twice a day.
I hope I made this question clear; it's kinda hard to understand even to me, and I wrote it! Anyways, thanks.
Each of the above pages should be considered a distinct and separate page. The site map helps in that it provides the googlebot (who is not really tired as the title suggests) with a hint of what pages exist.
It's either a crying shame or a laughably bad joke the way webmasters so readily submit to Google's every advisement.
Suppose next week, Bill Gates tells you all to shut your sites down for two days so that his new whizbang technocrawler can properly index the 'net. Would you do it? I'd wager a few would if promised "higher rankings."
Kinda sad, how shortsighted people are these days.
I submitted a new site to froogle, and so far, my only sales have come from my froogle listings. I haven't even gotten to the point of buying any ads yet, and the site is in the black. Somehow, I seem to be confused here because it seems to me that I made money off Google in this case, not the other way around.
That site also only had the home page crawled. Then I submitted a sitemap and the next day 45 pages were crawled. The next day those pages were in the index. By the end of the week, all pages were crawled and indexed.
On another site, I created a subdomain yesterday, and submitted the sitemap this morning. The sitemap was read within an hour. Crawling has begun.
Given past experience, I would not expect such fast response, and would not expect either site to be fully indexed any time soon.
If you are so bothered by Google making money off your site, block them. I, on the other hand, like the traffic they send me, so I will take advantage of every tool they offer to help get indexed faster.