Seems strange that Google are introducing this service now rather than trying to improve Googlebot's handling of links.
[google.com...]
But what I get when I try to resubmit to Google Sitemaps is the same error:
"We were unable to access the URL you provided due to a restriction in robots.txt. Please make sure the robots.txt file and the Sitemap URL are correct and resubmit your Sitemap"
I already uploaded the new robots.txt file ... what can I do now?
How do I invite all bots again to crawl my site?
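For what it's worth, a robots.txt that lets every bot crawl everything is just:

User-agent: *
Disallow:

(An empty Disallow line blocks nothing. Also note that Google can hang on to a cached copy of your old robots.txt for a while, so the error may persist for a day or so after you upload the fixed file.)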
I have a free Google Sitemap Validation service at [nodemap.com...]
I know there are excellent web-based XML validators available, but I built this software specifically for validating your sitemap.xml *or* compressed sitemap.xml.gz file. Hopefully it is straightforward and easy to use.
This service allows you to validate your Google Sitemap XML files. Your
file may optionally be gzip compressed. Each report you generate may be
stored in your account. You have the option to send each report via
email to the recipient you specify. There is also a quick-help feature
that allows you to ask a technical question, or make a comment, about
your report.
+ works with text/xml content-type
+ checks and reports UTF-8 Byte Order Mark
+ converts the file to unix line terminators if necessary
+ re-encodes the xml file to UTF-8 if the file isn't UTF-8 (*see note)
+ gzips xml files
+ better error handling on web server redirects.
+ shows line numbers against your xml file if the file doesn't validate.
Take care,
Waitman
Added Site Map, Entire Site Dropped from Google
[groups-beta.google.com...]
Don't think I will touch this one. I have a comprehensive sitemap on-site that has been picked up by Google numerous times.
The way I read it, even minor site changes mean you have to remake and/or resubmit a new sitemap... who needs the trouble?
Ann
Is that ok?
Also, do I have to list every URL on each page that I want Google to spider? If I have deep links, will Google see them if I give the URL of the page the deep links are on?
Is Google asking me to give it every URL I want spidered, even those several levels down?
>creating a sitemap page www.mydomain.com/sitemap.html
Always a good idea (but has nothing to do with Google SiteMaps)
Learn more about Google SiteMaps here:
[google.com...]
(The link in my profile leads you to a Google SiteMap tutorial)
>Also, do I have to list every URL on each page that I want google to spider?
Yes - put the URLs of all the pages you want Google to crawl in your sitemap XML file (see the example below).
>Is google asking me to give it every URL I want spidered, even those several levels down?
Especially those
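For reference, a minimal sitemap XML file looks something like this (placeholder URL; only <loc> is required per entry - check Google's documentation for the exact namespace they want):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.mydomain.com/some-deep-page.html</loc>
  </url>
</urlset>

One <url> entry per page you want crawled, including the deep ones.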
Here are some links to sitemap generators:
[code.google.com...]
I've uploaded 48,500 URLs (oh, before you call me a spammer - this is a local professional directory site) in xml.gz. Google sniffed around, then replied "Denied URLs" and listed [subdomain.mysite.com...]
This does seem to work for the sitemaps I've done, with each folder having its own sitemap which is submitted to Google.
Anyone know if this is OK to do, or would it be better to combine all these sitemaps into one big file in the root directory?
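For what it's worth, the protocol also defines a sitemap index format, so you can keep the per-folder sitemaps and submit one index file from the root that points at them. A sketch, with made-up paths:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.mysite.com/folder1/sitemap.xml</loc>
  </sitemap>
  <sitemap>
    <loc>http://www.mysite.com/folder2/sitemap.xml</loc>
  </sitemap>
</sitemapindex>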
I think I am just going to submit each URL manually (I'm up to about 50...made 40 in the last few days)...
It took them 22 hours and my sitemap was still pending, so I'll give it another shot when I hear that 30 to 50%+ of webmasters submit sitemaps routinely. I've never been an early adopter.
1. Your URLs must not include embedded newlines.
2. You must fully specify URLs, because Google tries to crawl the URLs exactly as you provide them.
3. Your sitemap files must use UTF-8 encoding.
4. Each sitemap file must have no more than 50,000 URLs.
What do they mean by embedded newlines and UTF-8 encoding? How do I ensure I have UTF-8 encoding?
"What do they mean by embedded new lines and UTF-8 encoding? How do I ensure i have UTF-8 encoding? "
The URL cannot be on more than one line: one line per URL, and one URL per line.
UTF-8 is a Unicode encoding; more characters are available than in other encodings (Windows-1258, etc.). If you are a graphics person, it can be thought of "like" the difference between a 16-color palette and a 65,536-color palette.
You should use an editor that allows you to save in Unicode/UTF-8 such as Notepad (Windows) or BBEdit (Mac). Notepad will insert a Byte Order Mark (BOM) at the beginning of the file to signify that it is UTF-8, which may appear to be odd characters if you look at it in something else.
If you are creating a script then just use an encoding function on your output.
Take care,
Waitman
I still think the best way is to create a sitemap yourself using ASP or PHP.
Google has a link to a PHP third-party solution that, for me, is way better than the Python script. [code.google.com...]
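As a rough idea of what that looks like, here is a minimal PHP sketch (not the script Google links to; the "pages" table and "url" column are made-up names - swap in your own schema):

<?php
// Minimal sketch: emit a sitemap from a MySQL table of page URLs.
// Table "pages" and column "url" are assumptions, not a real schema.
header('Content-Type: text/xml; charset=UTF-8');

$db = mysql_connect('localhost', 'user', 'password');
mysql_select_db('mysite', $db);

echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";

$result = mysql_query('SELECT url FROM pages', $db);
while ($row = mysql_fetch_assoc($result)) {
    // htmlspecialchars escapes &, < and > so the XML stays well-formed
    echo '  <url><loc>' . htmlspecialchars($row['url']) . "</loc></url>\n";
}

echo "</urlset>\n";
?>

Serve that as a script (or write the same output to a file) and it regenerates itself from the database every time Google fetches it.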
Since there is no "physical" subdomain directory, I placed the sitemap in the root for domain mysite.com.
Well then, you have to figure out some way to serve up separate sitemap files for each subdomain.
As far as the rest of the world is concerned, different subdomains are different machines, possibly under the control of different people. You only let a machine tell you about itself.
>How do I ensure I have UTF-8 encoding?
The simple answer is that if you are using only characters generally available in English (which is what is used for the vast majority of URLs), you can use any old text editor.
If you want to use any other languages or special symbols, you have to verify that your editor outputs in UTF-8.
It's a very distrustful world we live in. :) I've split up the universal sitemap into separate subdomain specific sitemaps and submitted them separately under each subdomain. Thanks BigDave.
And the 2nd thing: I'm using PHP and MySQL to generate the sitemap. PHP creates sitemap files in all my directories - but I don't know if they are UTF-8 encoded. Since it's all done automatically (fopen() etc.), I don't save those files in an editor. So, how do I make sure my sitemaps use UTF-8 encoding?
Thanks.
Do I need to place my sitemap index in the root dir or on sub.mydomain.net?
If the URLs that you want crawled are on sub.mydomain.net, then you have to serve that sitemap to Google from sub.mydomain.net.
If you have URLs that you want crawled on both sub.mydomain.net and www.mydomain.net, then you will need 2 different sitemaps.
>And the 2nd thing: I'm using PHP and MySQL to generate the sitemap. PHP creates sitemap files in all my directories - but I don't know if they are UTF-8 encoded. Since it's all done automatically (fopen() etc.), I don't save those files in an editor. So, how do I make sure my sitemaps use UTF-8 encoding?
PHP strings are in ASCII. ASCII and UTF-8 overlap for the first 128 characters.
If you are using only the following characters, you will have no problems.
A-Z
a-z
0-9
!"#$%&'()*+,-./:;<=>?@[\]^_`{¦}~
Otherwise, you can take a look at the utf8_encode function.
You would have been able to find this all out for yourself in a few minutes by checking the PHP manual, and doing a search on UTF-8 on the web.
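For example, a quick sketch of that last point (made-up URL; utf8_encode assumes the input is ISO-8859-1/Latin-1):

<?php
// Re-encode a Latin-1 string to UTF-8 before writing it to the sitemap.
$path = 'caf' . chr(0xE9) . '.html';   // "café" with a Latin-1 e-acute
$loc  = utf8_encode('http://www.mysite.com/' . $path);

// Strictly, non-ASCII characters in a URL should also be percent-encoded
// (see rawurlencode); plain ASCII URLs need neither step.
$fp = fopen('sitemap.xml', 'a');
fwrite($fp, '  <url><loc>' . htmlspecialchars($loc) . "</loc></url>\n");
fclose($fp);
?>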
included in sitemap:
www.mysite.com/testdir/index.html
included in robots.txt:
Disallow: /testdir/
Googlebot still grabs /testdir/index.html and a search on site:mysite.com shows /testdir/index.html.
In my opinion, robots.txt should take precedence.