|In my opinion, robots.txt should take preference. |
While I can see your point, there is a perfectly valid opposing viewpoint that you have now specifically told the search engine to index that URL.
The real problem is that the situation is undocumented; it should be documented regardless of whether robots.txt or the sitemap takes precedence.
I would recommend that you post it to the sitemap newsgroup where a Google engineer is more likely to spot it.
Sorry to be dense - which sitemap newsgroup?
Does the submitted sitemap file need to have a .xml extension?
The reason I'm asking is that I've generated mine as an .asp file using an ASP script.
|Sorry to be dense - which sitemap newsgroup? |
There's a thread there right now about the sitemap / robots.txt conflict.
Edit: Looks like you started it... :)
|Does the submitted sitemap file need to have a .xml extension? |
No - Google will just request the URI that you submit and work out what to make of it.
That said, it is still good practice to make your script issue the Content-Type: text/xml header.
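A minimal sketch of what such a dynamic sitemap script might look like — written here in Python for illustration, since the thread's own examples are ASP/PHP. The URLs are placeholders, and the namespace is the 0.84 schema from Google's docs of the time:

```python
# Sketch of a CGI-style script that emits the Content-Type header first,
# then the sitemap XML. Function name and URLs are made up for illustration.

def render_sitemap(urls):
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">']
    for url in urls:
        lines.append('  <url><loc>%s</loc></url>' % url)
    lines.append('</urlset>')
    return '\n'.join(lines)

if __name__ == '__main__':
    # A CGI script sends its headers, then a blank line, then the body.
    print('Content-Type: text/xml')
    print()
    print(render_sitemap(['http://www.example.com/',
                          'http://www.example.com/about.html']))
```

The same shape works in ASP or PHP — the important part is that the header goes out before any XML.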
[edited by: dmorison at 9:12 pm (utc) on June 19, 2005]
What are your thoughts on creating a sitemap page and linking to it from all pages within your site? The only difference is that this page would contain actual links, as opposed to XML text elements.
Maybe this would be a good supplement to the Google Sitemap feature.
surfer: as far as I can see, any extension should work as long as it sends XML. I tried it with a PHP file that generates its content dynamically. After submitting it, Google says that it is "OK".
"Does the submitted sitemap file need to have a .xml extension?"
I have had a few people report that they receive "not found" errors when the filename has underscores or dashes. Changing the name from ex-sitemap.xml to exsitemap.xml seemed to fix it.
The Google parser appears to be picky: it does seem to require that the server report the correct MIME type for XML or gzip.
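On the gzip side, the file itself also has to be a valid gzip stream, not just named .gz. A quick sketch of producing one (the filename is a placeholder):

```python
import gzip

def write_gzipped_sitemap(xml_text, path):
    # gzip.open with mode 'wt' writes text through a gzip compressor,
    # so the resulting file is a real gzip stream, not a renamed .xml.
    with gzip.open(path, 'wt', encoding='utf-8') as f:
        f.write(xml_text)

if __name__ == '__main__':
    write_gzipped_sitemap('<urlset></urlset>', 'sitemap.xml.gz')
```

Serving that file with a gzip MIME type should then match what the parser expects.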
New to this forum...
I want to ask if anyone has had any results from submitting a Google sitemap, and what do you think about submitting a Google sitemap for a banned website? :-)
I submitted a .asp sitemap and it works great.. most all my pages are now indexed - they weren't before submitting.
I submitted site_map.asp
and it works fine.
How long after your sitemap was downloaded did it take to index your pages? Mine was downloaded 5 hours ago and I don't see any new indexed pages.
Do you have a large site?
What do you use to create the sitemap... and did Google index pages from the sitemap that don't have any links pointing to them?
I want to try submitting a Google sitemap for a new website that doesn't have a link from anywhere and hasn't been submitted to any search engine...
What do you think?
That would be useful, and it will help for sure - I have seen the results...
KaMran - White Eagle
Does Google post a "crawled" status message after a site's sitemap file has been processed?
Sitemaps are not a way to *make* Google crawl your site. They are just a way for you to tell Google about pages that you would like to have crawled. They make no guarantee that they will crawl those pages within any specific timeframe, if ever.
That said, they do seem to do a significant crawl within a couple of days of submitting the sitemap. One new site that only had the home page indexed got 42 of the 68 pages crawled after about a day. The rest of them have just been crawled and indexed over the weekend.
Can anyone post a sample sitemap that I could modify?
|Can anyone post a sample sitemap that I could modify? |
Go to the google sitemaps site and check the docs. They have a sample that you can easily cut and paste.
Or, just make a list of URLs, one per line, in a text file. If you aren't going to add any additional information, then there is little point in making the XML file instead of going with straight text.
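For reference, here is a minimal example along the lines of the sample in Google's docs — the namespace is the 0.84 sitemap schema current at the time, and the URL and date are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2005-06-19</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```

Only `<loc>` is required; the other three elements are optional hints.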
Do I need to resubmit my sitemap (via https://www.google.com/webmasters/sitemaps/) every time I update my site (and sitemap)?
Or is the resubmit button only for when Google didn't accept your previous sitemap format?
>Do I need to resubmit my sitemap (via https://www.google.com/webmasters/sitemaps/) every time I update my site (and sitemap)?
Yes. Google recommends resubmitting on content changes. Currently it works smoothly without resubmits: Googlebot does request sitemaps like clockwork, and mostly does deep crawls afterwards (at least on popular sites). However, this behavior is not documented, so you can't rely on it.
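Besides the button on the web interface, Google's sitemap docs describe an HTTP "ping" endpoint for resubmitting after an update. A sketch of building that URL (the endpoint is from my reading of the docs, so double-check it there; the sitemap URL is a placeholder):

```python
# Sketch: construct the resubmit "ping" URL from Google's sitemap docs.
# The endpoint path should be verified against the current documentation.
from urllib.parse import quote

def ping_url(sitemap_url):
    # The sitemap URL must be URL-encoded when passed as a query parameter.
    return ('http://www.google.com/webmasters/sitemaps/ping?sitemap='
            + quote(sitemap_url, safe=''))

print(ping_url('http://www.example.com/sitemap.xml'))
```

Requesting that URL (e.g. from a cron job after regenerating the sitemap) would notify Google without a manual resubmit.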
I have a small site <300 pages.. it took maybe 2 or 3 days for the new listings to propagate through the various DCs... I suspect it showed up in one of them within 24 hours but I don't know which gets it first.
I submitted an asp program, as I've mentioned. So it dynamically generates the XML listing every time G goes and opens it.
I think that is the best way to go.. I don't know what you need on a unix server.. php perhaps.
I am wondering if I should leave it up there with G or remove it until I change something on the website? As it stands now, G is rerunning the sitemap at least once a day, if not twice.
I used the same PHP script on five similar sites (similar in design, not content) and only four were found by googlebot. It's been two days and I have been over and over them all and can't find a reason for the lone miss. It was finding the Python-generated sitemap.xml.gz but can't find the PHP-generated sitemap.xml file.
Hey guys, I've submitted my sitemap, but it's a short list of links, generated from the home page and listing only the pages that were linked from the main page. But since most of the links go to my archives, which are themselves linked to EVERY SINGLE PAGE ON MY SITE, would it matter that my sitemap list was small and not all-inclusive of EVERY SINGLE PAGE that exists on my site? Or is the idea to submit ALL of your pages at once? (I have quite a few - over 9,000.)
I hope I made this question clear; it's kinda hard to understand even to me, and I wrote it! Anyways, thanks.
The idea is to list every page on the site, including any dynamic ones. For example, if you have a product page that changes depending on what product is requested (....mysite.com/product.asp?productID=widgetX), then each instance of this page should get indexed.
Each of those instances should be considered a distinct and separate page. The sitemap helps in that it provides the googlebot (who is not really tired, as the title suggests) with a hint of what pages exist.
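Applied to the product-page example above, a sitemap generator might just loop over the known product IDs — a sketch, with the domain, script name, and IDs all made up for illustration:

```python
# Sketch: emit one <url> entry per product ID so each dynamic page instance
# is listed in the sitemap. All names below are placeholders.

def product_urls(base, product_ids):
    return ['%s/product.asp?productID=%s' % (base, pid) for pid in product_ids]

def url_entries(urls):
    # Note: URLs containing '&' would need to be escaped as '&amp;' in XML;
    # these single-parameter URLs don't.
    return '\n'.join('<url><loc>%s</loc></url>' % u for u in urls)

if __name__ == '__main__':
    urls = product_urls('http://www.example.com',
                        ['widgetX', 'widgetY', 'widgetZ'])
    print(url_entries(urls))
```

In practice the ID list would come from the product database rather than a hard-coded list.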
Three days after submitting my sitemap I noticed this morning that google had indexed about a hundred new pages. I just ran another check and saw that the newly indexed pages are now all gone.
Any ideas or similar experiences?
<<The idea is to list every page on the site including any dynamic ones.>>
Thanks. I thought all Google needed was a page or two, and then they would just spider everything that was linked to that page, etc. I'll resubmit a longer list, one with ALL my pages.
So if I submit a sitemap, and it does not have all my pages on it, will pages that are already in the index continue to be spidered and included in the index?
This whole thing is a crock. Next month Google will come out with some other feature for you to feed their hungry engine. Does nobody stop to look at the incredible sums of money Google is making off YOUR websites?
It's either a crying shame or a laughably bad joke the way webmasters so readily submit to Google's every advisement.
Suppose next week, Bill Gates tells you all to shut your sites down for two days so that his new whizbang technocrawler can properly index the 'net. Would you do it? I'd wager a few would if promised "higher rankings."
Kinda sad, how shortsighted people are these days.
Yup, I'm pretty short-sighted.
I submitted a new site to froogle, and so far, my only sales have come from my froogle listings. I haven't even gotten to the point of buying any ads yet, and the site is in the black. Somehow, I seem to be confused here because it seems to me that I made money off Google in this case, not the other way around.
That site also only had the home page crawled. Then I submitted a sitemap and the next day 45 pages were crawled. The next day those pages were in the index. By the end of the week, all pages were crawled and indexed.
On another site, I created a subdomain yesterday, and submitted the sitemap this morning. The sitemap was read within an hour. Crawling has begun.
Given past experience, I would not expect such fast response, and would not expect either site to be fully indexed any time soon.
If you are so bothered by Google making money off your site, block them. I on the other hand, like the traffic they send me, so I will take advantage of every tool they offer to help get indexed faster.
I tried using a Google sitemap generator for a site of mine (the tool worked great for another site I have), but it only listed the home page and no other pages. What sort of things would be keeping a tool like this from crawling my site?
I'm so disappointed. I submitted my sitemap 5 days ago. Googlebot has crawled over 6 megs of my site and STILL no appearance in the index. *sigh* By comparison MSN has practically crawled the entire site in the first week of me being online. But it's almost a month and Google still hasn't bothered to index me. Grr. What's it going to take?
| This 188 message thread spans 7 pages: < < 188 ( 1 2 3 4 5  7 ) > > |