
Google Sitemaps

Googlebot getting tired?

     
1:13 am on Jun 3, 2005 (gmt 0)

New User

10+ Year Member

joined:Nov 30, 2003
posts:10
votes: 0


I found an interesting new service from Google called Google Sitemaps (I haven't seen this mentioned elsewhere). It seems you can give Googlebot a helping hand if some pages are not getting indexed.

Seems strange that Google is introducing this service now instead of trying to improve how Googlebot follows links.

[google.com...]

7:43 pm on June 15, 2005 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member netmeg is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Mar 30, 2005
posts:13012
votes: 222


Johan, I have other sites with hundreds and thousands of pages, but I'm not going to make the effort on those sites till I find out whether it's worth the time it takes and won't damage my current listings/spidering. I don't necessarily expect the sitemap feature to do any better; what I DON'T expect is that it will do worse than before I created and submitted the file, which is what seems to be happening.
8:09 pm on June 15, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 20, 2003
posts:1741
votes: 0


I have a problem with my robots.txt and sitemaps.
Today I tried to upload my XML sitemap and discovered that my robots.txt file was banning all bots!
So I changed it, using a known working robots.txt file that allows crawling.

But when I try to submit to Google Sitemaps again, I get the same error:

"We were unable to access the URL you provided due to a restriction in robots.txt. Please make sure the robots.txt file and the Sitemap URL are correct and resubmit your Sitemap"

I already uploaded the new robots.txt file... what can I do now?
How do I invite all bots to crawl my site again?
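
For reference, a minimal robots.txt that allows all crawlers looks like this; an empty Disallow line means nothing is blocked:

User-agent: *
Disallow:

Note that Google may take a while to re-fetch a changed robots.txt, so the error can persist for some time after the file is fixed.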

6:34 am on June 16, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:July 18, 2002
posts:154
votes: 0


Check for rewrites (.htaccess...). Try deleting the robots.txt for a while and wait for the bots; that should work if offsite links point to your site. Also, use a sitemap validator to check your XML (http://code.google.com/sm_thirdparty.html links to free sitemap tools) and check the HTTP response (it should be 200).
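
One quick way to check that HTTP response is with curl, if you have it installed; the -I flag fetches only the headers, and the URL below is a placeholder:

curl -I http://www.mydomain.com/sitemap.xml

The first line of the output should say HTTP/1.1 200 OK; a 301, 302, or 404 there would explain a failed submission.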
6:13 pm on June 17, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 11, 2003
posts:73
votes: 0


Hello,

I have a free Google Sitemap Validation service at [nodemap.com...]

I know there are excellent web based XML validators available, but I built this software
specifically for validating your sitemap.xml *or* compressed
sitemap.xml.gz file. Hopefully it is straightforward and easy.

This service allows you to validate your Google Sitemap XML files. Your
file may optionally be gzip compressed. Each report you generate may be
stored in your account. You have the option to send each report via
email to the recipient you specify. There is also a quick-help feature
that allows you to ask a technical question, or make a comment, about
your report.

+ works with text/xml content-type
+ checks and reports the UTF-8 Byte Order Mark
+ converts the file to Unix line terminators if necessary
+ re-encodes the XML file to UTF-8 if the file isn't UTF-8 (*see note)
+ gzips XML files
+ better error handling on web server redirects
+ shows line numbers against your XML file if the file doesn't validate

Take care,

Waitman

ann

8:34 pm on June 17, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Jan 25, 2002
posts:2605
votes: 0


This was posted on their discuss sitemaps section:

Added Site Map, Entire Site Dropped from Google
[groups-beta.google.com...]

Don't think I will touch this one. I have a comprehensive sitemap onsite that has been picked up by Google numerous times.

The way I read it, even minor site changes mean you have to remake and/or resubmit a new sitemap... who needs the trouble?

Ann

11:00 pm on June 17, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Feb 27, 2003
posts:637
votes: 1


OK, here's the short story. One of my sites was down for 4 days a while ago when Gbot decided to crawl it. I lost over 10k pages and was down to about 300 in the index. I did the sitemap, and now I'm back up, more pages are being added, and traffic is increasing.
11:51 pm on June 17, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 1, 2004
posts:1987
votes: 0


Here's a shorter story... I created my site_map.asp and gave it to G. Now my MSN visitors are increasing by several 1000%. Could MSN be looking at my site_map.asp?
1:25 am on June 18, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2004
posts:693
votes: 0


Is this the next gold rush now? Do Google Sitemaps turn MSN into Google2?
6:28 am on June 18, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Apr 20, 2003
posts:167
votes: 0


If Google introduced an autodiscovery mechanism like RSS, there would be much less work for webmasters and for Google in identifying sitemaps.

Of course, we would miss the stats in that case.

6:36 am on June 18, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 3, 2005
posts:72
votes: 0


Sorry if this is a dumb question. I am a marketer more than a computer "geek", so I am sending Google the URLs of my main pages in a text file format, and creating a sitemap page at www.mydomain.com/sitemap.html.

Is that OK?

Also, do I have to list every URL on each page that I want Google to spider? If I have deep links, will Google see them if I give it the URL of the page the deep links are on?

Is Google asking me to give it every URL I want spidered, even those several levels down?

12:02 pm on June 18, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:July 18, 2002
posts:154
votes: 0


>I am sending google the URLs of my main pages in a text file format
Use this form (has nothing to do with Google SiteMaps):
[google.com...]

>creating a sitemap page www.mydomain.com/sitemap.html
Always a good idea (but has nothing to do with Google SiteMaps)

Learn more about Google SiteMaps here:
[google.com...]
(The link in my profile leads you to a Google SiteMap tutorial)

>Also, do I have to list every URL on each page that I want google to spider?
Yes, put the URLs of all pages you want Google to crawl in your sitemap XML file.

>Is google asking me to give it every URL I want spidered, even those several levels down?
Especially those
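
To make that concrete, here is a minimal sitemap XML file with two entries. The domain is a placeholder, only the loc element is required per URL, and the namespace shown is the 0.84 one from Google's documentation (check the docs for the current value):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
  <url>
    <loc>http://www.mydomain.com/</loc>
  </url>
  <url>
    <loc>http://www.mydomain.com/deep/page.html</loc>
    <lastmod>2005-06-18</lastmod>
  </url>
</urlset>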

Here are some links to sitemap generators:
[code.google.com...]

12:27 pm on June 18, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 5, 2005
posts:123
votes: 0


Any ideas on how a Themed Canonical Structure can be correctly represented with Google Sitemaps?
12:37 pm on June 18, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:July 18, 2002
posts:154
votes: 0


Flat. You can export your URLs recursively to keep them in hierarchical order, but the given output format is flat. So I wouldn't bother with the hierarchical listing, which burns more resources than a sequential output ordered by lastModification descending, provided that attribute is indexed.
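
As a sketch of that sequential export, assuming a hypothetical pages table with url and last_modified columns (both names invented for illustration), the query is just:

SELECT url, last_modified
FROM pages
ORDER BY last_modified DESC
LIMIT 50000;

With an index on last_modified this needs no tree traversal at all, which is the point.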
2:00 am on June 19, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 25, 2004
posts:81
votes: 0


They don't seem to like subdomains.

I've uploaded 48,500 URLs (oh, before you call me a spammer: this is a local professional directory site) in xml.gz. Google sniffed around, then replied "Denied URLs" and listed [subdomain.mysite.com...]

3:36 am on June 19, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 31, 2002
posts:73
votes: 0


I made my XML sitemap manually.

Two hours later the bot went nuts, spidering anything and everything on the server. I know some other people who only mapped out the real major pages on their site and submitted the map; the same thing happened to them, but the crawl took a little longer.

5:08 am on June 19, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 3, 2005
posts:72
votes: 0


..18 hours...still pending...is that normal?
6:51 am on June 19, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Apr 20, 2003
posts:167
votes: 0


Yes. Sometimes Google updates the status report after a delay.

And you might feel shocked to see an update now saying submitted 18 hours ago, downloaded 18 hours ago ;)

7:08 am on June 19, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member bigdave is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Nov 19, 2002
posts:3454
votes: 0


They don't seem to like subdomains.

Was the sitemap on the subdomain?

I would assume that you could only have the sitemap on the exact domain being served, just the same as a robots.txt.

7:29 am on June 19, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 25, 2004
posts:81
votes: 0


Since there is no "physical" subdomain directory, I placed the sitemap in the root for domain mysite.com. The sitemap included the URLs for both www.mysite.com and [subdomain.mysite.com....] They denied [subdomain.mysite.com....]
7:46 am on June 19, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Apr 12, 2005
posts:64
votes: 0


At the moment I'm planning to add a sitemap for each folder of my domain.

This does seem to work for the sitemaps I've done, with each folder having its own sitemap which is submitted to Google.

Anyone know if this is OK to do, or would it be better to combine all these sitemaps into one big file in the root directory?
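
One way to combine them without one giant file is a sitemap index in the root that points at the per-folder sitemaps. A minimal sketch, with placeholder paths and the 0.84 namespace from Google's documentation:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.google.com/schemas/sitemap/0.84">
  <sitemap>
    <loc>http://www.mydomain.com/folder1/sitemap.xml</loc>
  </sitemap>
  <sitemap>
    <loc>http://www.mydomain.com/folder2/sitemap.xml</loc>
  </sitemap>
</sitemapindex>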

8:22 am on June 19, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 3, 2005
posts:72
votes: 0


I've decided to re-do my front page so it acts as a site map and is easier for surfers and robots to follow. It seems like some have had success so far, but the idea of being "flagged" for some reason by Google still runs through the back of my mind...

I think I am just going to submit each URL manually (I'm up to about 50... made 40 in the last few days)...

It took them 22 hours and my sitemap was still pending, so I'll give it another shot when I hear that 30 to 50%+ of webmasters submit sitemaps routinely. I've never been an early adopter.

9:07 am on June 19, 2005 (gmt 0)

Full Member

10+ Year Member Top Contributors Of The Month

joined:June 3, 2005
posts:298
votes: 12


I still think the best way is to create a sitemap yourself using ASP or PHP etc., unless you have a non-database site. It's very easy to do, and I'm not even a programmer!
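
A minimal sketch of that approach in PHP, assuming a MySQL table named pages with url and last_modified columns (all of these names are invented for illustration, and the namespace is the 0.84 one from Google's docs):

<?php
// Dynamic sitemap sketch: stream the URL table out as sitemap XML.
header('Content-Type: text/xml; charset=UTF-8');

$db = mysql_connect('localhost', 'user', 'password');
mysql_select_db('mysite', $db);

echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
echo '<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">' . "\n";

$result = mysql_query('SELECT url, last_modified FROM pages ORDER BY last_modified DESC LIMIT 50000');
while ($row = mysql_fetch_assoc($result)) {
    echo "  <url>\n";
    echo '    <loc>' . htmlspecialchars($row['url']) . "</loc>\n";
    echo '    <lastmod>' . date('Y-m-d', strtotime($row['last_modified'])) . "</lastmod>\n";
    echo "  </url>\n";
}
echo "</urlset>\n";
?>

Point Google Sitemaps at the script's URL and it always reflects the current database, with no regeneration step needed.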
12:43 pm on June 19, 2005 (gmt 0)

Full Member

10+ Year Member

joined:Apr 5, 2004
posts:325
votes: 0


There are a couple of points regarding the simple text file sitemap I want to clear up before I submit my sitemap. Google states:

1. Your URLs must not include embedded newlines.
2. You must fully specify URLs, because Google tries to crawl the URLs exactly as you provide them.
3. Your sitemap files must use UTF-8 encoding.
4. Each sitemap file must have no more than 50,000 URLs.

What do they mean by embedded newlines and UTF-8 encoding? How do I ensure I have UTF-8 encoding?

1:02 pm on June 19, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 11, 2003
posts:73
votes: 0



Hello,

"What do they mean by embedded new lines and UTF-8 encoding? How do I ensure i have UTF-8 encoding? "

The URL cannot span more than one line: one line per URL and one URL per line.
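
For example, a simple text file sitemap is nothing more than fully specified URLs, one per line (the domain here is a placeholder):

http://www.mydomain.com/
http://www.mydomain.com/products/widgets.html
http://www.mydomain.com/about.html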

UTF-8 is a Unicode encoding; more characters are available than in other encodings (Windows-1258, etc.). If you are a graphics person, it can be thought of "like" the difference between a 16-color palette and a 65,536-color palette.

You should use an editor that allows you to save as Unicode/UTF-8, such as Notepad (Windows) or BBEdit (Mac). Notepad will insert a Byte Order Mark (BOM) at the beginning of the file to signify that it is UTF-8, which may appear as odd characters if you look at it in something else.

If you are creating a script then just use an encoding function on your output.

Take care,

Waitman

1:05 pm on June 19, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Jan 19, 2004
posts:562
votes: 0



I still think the best way is to create a sitemap yourself using ASP or PHP

Google has a link to a PHP third-party solution that for me is way better than the Python script. [code.google.com...]

4:03 pm on June 19, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member bigdave is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Nov 19, 2002
posts:3454
votes: 0


Since there is no "physical" subdomain directory, I placed the sitemap in the root for domain mysite.com.

Well, then you have to figure out some way to serve a separate sitemap file for each subdomain.

As far as the rest of the world is concerned, different subdomains are different machines, possibly under the control of different people. You only let a machine tell you about itself.

How do I ensure I have UTF-8 encoding?

The simple answer is that if you are using only characters generally available in English (which is what the vast majority of URLs use), you can use any old text editor.

If you want to use any other languages or special symbols, you have to verify that your editor outputs in UTF-8.

5:24 pm on June 19, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 25, 2004
posts:81
votes: 0


> As far as the rest of the world is concerned, different subdomains are different machines, possibly under the control of different people. You only let a machine tell you about themselves.

It's a very distrustful world we live in. :) I've split up the universal sitemap into separate subdomain specific sitemaps and submitted them separately under each subdomain. Thanks BigDave.

5:51 pm on June 19, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 4, 2004
posts:84
votes: 0


Guys,
regarding subdomains..
I'm using a 301 redirect from www.mydomain.net to sub.mydomain.net. I've got a dir called "sub.mydomain.net" (which acts as a subdomain, along with an .htaccess file), and... do I need to place my sitemap index in the root dir or in sub.mydomain.net?

And the second thing: I'm using PHP and MySQL to generate the sitemap. PHP creates sitemap files in all my directories, but I don't know if they are UTF-8 encoded. Since it's all done automatically (fopen() etc.), I don't save those files in an editor. So, how do I make sure my sitemaps use UTF-8 encoding?

Thanks.
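
For reference, the kind of www-to-subdomain 301 described above is often done with an Apache mod_rewrite rule along these lines; this is a sketch, and the domain names are the poster's placeholders:

RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.mydomain\.net$ [NC]
RewriteRule ^(.*)$ http://sub.mydomain.net/$1 [R=301,L]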

6:48 pm on June 19, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member bigdave is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Nov 19, 2002
posts:3454
votes: 0


do I need to place my sitemap index in the root dir or in sub.mydomain.net?

If the URLs that you want crawled are on sub.mydomain.net, then you have to serve that sitemap to Google from sub.mydomain.net.

If you have URLs that you want crawled on both sub.mydomain.net and www.mydomain.net, then you will need 2 different sitemaps.

And the second thing: I'm using PHP and MySQL to generate the sitemap. PHP creates sitemap files in all my directories, but I don't know if they are UTF-8 encoded. Since it's all done automatically (fopen() etc.), I don't save those files in an editor. So, how do I make sure my sitemaps use UTF-8 encoding?

PHP strings are in ASCII. ASCII and UTF-8 overlap for the first 128 characters.

If you are using only the following characters, you will have no problems.

A-Z
a-z
0-9
!"#$%&'()*+,-./:;<=>?@[\]^_`{¦}~

Otherwise, you can take a look at the utf8_encode function.
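
A minimal sketch of that in PHP, assuming your source strings are ISO-8859-1 (which is what utf8_encode converts from) and that $urls is an array of URL strings; the filename is a placeholder:

<?php
// utf8_encode() converts an ISO-8859-1 (Latin-1) string to UTF-8.
// For pure-ASCII URLs it changes nothing, which is why plain
// English URLs are safe in any old text editor.
$fh = fopen('sitemap.txt', 'wb');
foreach ($urls as $url) {
    fwrite($fh, utf8_encode($url) . "\n"); // one URL per line
}
fclose($fh);
?>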

You would have been able to find this all out for yourself in a few minutes by checking the PHP manual, and doing a search on UTF-8 on the web.

7:55 pm on June 19, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Aug 15, 2002
posts:520
votes: 0


Interesting - I just found that sitemap.xml supersedes robots.txt. For example:

included in sitemap:
www.mysite.com/testdir/index.html

included in robots.txt:
Disallow: /testdir/

Googlebot still grabs /testdir/index.html, and a search on site:mysite.com shows /testdir/index.html.

In my opinion, robots.txt should take precedence.
