|sitemap.xml still has problems|
Now I have fixed all the errors, but in the sitemap.xml I see it says priority blah blah all over the place.
I took that out, and I got a message from Google earlier saying to let Google decide when it will crawl. That's fine by me.
I have to do something with the settings and I don't know how to locate it in Google Webmaster Tools. Any help?
|The crawl rate affects the speed of Googlebot's requests during the crawl process. It has no effect on how often Googlebot crawls your site. Google determines the recommended rate based on the number of pages in your site. |
Lucy, I found it, thank you
now on the same page it says:
Don't set a preferred domain
Display URLs as www.mysite
Display URLs as mysite
which one should I choose?
Either with or without www, according to personal preference. Then redirect the same way in your .htaccess (or let the host do it for you). Otherwise you get the dreaded Duplicate Content.
Amy, I'd pick the version of your URLs that already shows up the most in your site: operator results. If you're already indexed one way, for the most part, and then you ask Google to change to the other way, it can cause a disruption in your traffic... at least it used to.
Also as Lucy recommended, make sure your .htaccess redirect goes in the same direction.
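To make "the same direction" concrete, here is a minimal Python sketch of the mapping that both the preferred-domain setting and the redirect should agree on. It is not the .htaccess rule itself, and the host names (example.com, www.example.com) and the function name are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical hosts; pick whichever version is already indexed the most.
PREFERRED_HOST = "www.example.com"
BARE_HOST = "example.com"


def canonical_url(url):
    """Return the URL moved onto the preferred host, or unchanged if it is
    already there or belongs to some other site."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host in (BARE_HOST, "www." + BARE_HOST) and host != PREFERRED_HOST:
        parts = parts._replace(netloc=PREFERRED_HOST)
    return urlunsplit(parts)


if __name__ == "__main__":
    print(canonical_url("http://example.com/page.html"))
```

Whatever this mapping returns for a given address is where the server-side redirect should send visitors, and it should match the version chosen in Webmaster Tools.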
I grew weary messing around with xml sitemaps.
I now just support 2 sitemaps -
One sitemap.html for my visitors, and
one sitemap.txt for Google and Bing - just a list of alphabetized URLs.
Google gobbles it up from Webmaster Tools within minutes of submission, and indexes new pages within a day or two.
On one occasion, Google gobbled it up INSTANTLY!
It takes days and days for Bing just to download the list.
|one sitemap.txt for Google and Bing - just a list of alphabetized URLs. |
If we want to utilize this same method, would you put a link somewhere on one of the primary pages so Google can find sitemap.txt? Or is it better to put a link to it on sitemap.html? And does it matter that it be named "sitemap.txt"?
|... would you put a link somewhere on one of the primary pages so Google can find sitemap.txt |
I don't think that is even necessary. If you go to Webmaster Tools, and click on "Site Configuration" at the top, and then "Sitemaps", you can input your site map directly.
I don't know how strict they are about a naming convention. I would imagine that /site-map.txt might be OK. But why try to name it something obscure, unless you are trying to hide it. /qwerty1.txt might even be OK, since you are telling them the name, and that it is a site map. But I am not sure about that. If someone really wants my sitemap as text, they can always copy my html site map, and strip it down.
I did remove a link to sitemap.txt from my sitemap.html file, because only Google and Bing have a need to see it. So, I have NO link to sitemap.txt from my site.
[edited by: Sally_Stitts at 2:14 am (utc) on Oct 11, 2011]
If you're submitting it manually to GWT or listing it in your robots.txt you can probably call it just about anything. If you want search engines to find it by blind luck, better stick with sitemap.txt. Or sitemap.xml.
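For the robots.txt route, the listing is a "Sitemap:" line giving the full address of the file. The sketch below uses Python's standard robots.txt parser (site_maps() needs Python 3.8 or later) to show that an oddly named file is still picked up once it is listed. The robots.txt content and the example.com address are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the file name qwerty1.txt is deliberately arbitrary.
ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: http://www.example.com/qwerty1.txt
"""


def listed_sitemaps(robots_text):
    """Return the sitemap URLs declared in a robots.txt body (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.site_maps() or []


if __name__ == "__main__":
    print(listed_sitemaps(ROBOTS_TXT))
```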
https://www.google.com/support/webmasters/bin/answer.py?answer=183668 * sez
|For best results, follow these guidelines: |
* You must fully specify URLs, as Google attempts to crawl them exactly as provided.
* The text file must use UTF-8 encoding.
* The text file should contain nothing but the list of URLs.
* You can name the text file anything you wish. Google recommends giving the file a .txt extension (for instance, sitemap.txt).
If you are based in a country that uses Roman script you can ignore the UTF-8 business, because your URLs will naturally not include any non-ASCII characters, and a plain ASCII file is already valid UTF-8. (Titles, sure. Filenames, nuh-uh.)
* I checked in a different browser because I got the URL from "inside". You don't have to be signed in to GWT to see the page. And I have no idea why it didn't obfuscate :(
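The quoted guidelines are easy to apply mechanically. Below is a minimal Python sketch, assuming you already have your page addresses as a list of strings: it rejects anything that is not a fully specified URL, drops duplicates, alphabetizes, and encodes the result as UTF-8 with nothing but one URL per line. The function name and the example.com addresses are hypothetical.

```python
from urllib.parse import urlsplit


def build_sitemap_txt(urls):
    """Return the bytes of a text sitemap: one fully specified URL per line,
    de-duplicated, alphabetized, UTF-8 encoded."""
    cleaned = set()
    for url in urls:
        url = url.strip()
        if not url:
            continue  # skip blank entries
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not a fully specified URL: {url!r}")
        cleaned.add(url)
    return "".join(u + "\n" for u in sorted(cleaned)).encode("utf-8")


if __name__ == "__main__":
    data = build_sitemap_txt([
        "http://www.example.com/widgets.html",
        "http://www.example.com/about.html",
    ])
    with open("sitemap.txt", "wb") as handle:
        handle.write(data)
```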
Thanks Sally ~ will look into this deeper. The reason I asked is because of the discrepancy that I sometimes see in GWT between the number of URLs in my (long ago) submitted sitemap.xml, and the number of URLs that they show in the index. For some unexplained reason, the latter is occasionally much lower than the former. I don't care if they don't match exactly, but if there are 50 submitted and only 20 in their index, then something is amiss. And I hasten to add that the missing pages are not dupes and are linked consistently into other primary pages. It's yet one more mystery when dealing with Google...
|You can name the text file anything you wish. |
Ahh, someone with the facts. There you have it.
Last week I broke down, and updated my "Master" index. I, too, became concerned that my file counts were flaky, depending upon where I looked. It was painful, but I found about 5 problem files (out of ~350). Now, everything agrees - masterfile, sitemap.html and sitemap.txt. I feel better, when I THINK I know what I am doing - ha-ha.
Now, if I could only find out which 2 files Google is NOT indexing - I haven't figured out how to do that yet. But it is only 2 files, so the heck with it.
|if I could only find out which 2 files Google is NOT indexing |
What I did was to use site:domainname.com as my Google query, then I manually copied each returned address into a text file, sorted them, and compared to my own list. After doing all that (it was a considerably smaller site than yours!), then I went into GWT and did a Fetch as Googlebot for each of the missing URLs. That process just took place in the past few days and so far, nothing has changed so I cannot say definitely that the problem is solved.
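The sort-and-compare step can be done with a set difference instead of by eye. This is a rough Python sketch, assuming two lists of addresses: the ones in your sitemap and the ones copied by hand from the site: results. It compares addresses exactly after trimming whitespace, so a www/non-www or trailing-slash mismatch will show up as "missing". The function name and sample addresses are made up.

```python
def missing_from_index(sitemap_urls, indexed_urls):
    """Return, alphabetized, the sitemap URLs that do not appear in the indexed list."""
    submitted = {u.strip() for u in sitemap_urls if u.strip()}
    indexed = {u.strip() for u in indexed_urls if u.strip()}
    return sorted(submitted - indexed)


if __name__ == "__main__":
    sitemap = [
        "http://www.example.com/a.html",
        "http://www.example.com/b.html",
        "http://www.example.com/c.html",
    ]
    copied_from_results = [
        "http://www.example.com/a.html",
        "http://www.example.com/c.html",
    ]
    for url in missing_from_index(sitemap, copied_from_results):
        print(url)
```

The addresses it prints are the candidates for a Fetch as Googlebot.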