
Google Sitemaps

Googlebot getting tired?



1:13 am on Jun 3, 2005 (gmt 0)

10+ Year Member

I found an interesting new service from Google called Google Sitemaps (I haven't seen this mentioned elsewhere). Seems you can give Googlebot a helping hand if some pages are not getting indexed?

Seems strange that Google are introducing this service now rather than trying to improve how Googlebot follows links.



4:00 am on Jun 7, 2005 (gmt 0)

10+ Year Member

If you have less than 100 pages on a site, there's a good one you can do from the browser- it came from google groups also- <snip>

I've tried several- some take forever, some hang, some don't work- that one performs well


[edited by: lawman at 2:56 pm (utc) on June 8, 2005]
[edit reason] No tools please [/edit]


7:49 am on Jun 7, 2005 (gmt 0)

10+ Year Member

hi all,

i am using (for the moment) the lame version of Google sitemaps (that's the .txt option). Creating a full list of my 'important' content takes just a little more time.

I saw my Google ranking rise from 6 to 7 within one day after i put the Google sitemap (.txt) online. I had some big troubles with my 'mother site' and underlying 'daughter sites'. Now I can point out, in one simple way, all my content that was normally too deep for Googlebot to find.

So yeah I am working now on the xml version for even better coverage of the Google sitemaps.



1:28 pm on Jun 7, 2005 (gmt 0)

10+ Year Member

When submitting a .txt file, i just want to verify that all i need to do is list all of my URLs in Notepad, save it as a .txt file, and upload it to my website. Does that already satisfy the UTF-8 encoding requirement?


1:37 pm on Jun 7, 2005 (gmt 0)

10+ Year Member

UTF-8 is a must, otherwise it would not work.
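
For anyone scripting the text version instead of saving from Notepad, here is a minimal Python sketch of writing the list as UTF-8. It assumes the text format is simply one full URL per line, as described above. The function name write_text_sitemap and the example.com URLs are made up for illustration.

```python
def write_text_sitemap(urls, path="sitemap.txt"):
    """Write one URL per line as UTF-8 (no byte-order mark). Returns the URL count."""
    # Drop blank entries and stray whitespace around each URL.
    cleaned = [u.strip() for u in urls if u.strip()]
    # Plain "utf-8" (not "utf-8-sig") so no BOM is written; "\n" line endings.
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for url in cleaned:
            handle.write(url + "\n")
    return len(cleaned)


if __name__ == "__main__":
    count = write_text_sitemap([
        "http://www.example.com/",
        "http://www.example.com/products.html",
    ])
    print(count, "URLs written")
```

If the URLs are plain ASCII, the resulting bytes are identical in ASCII and UTF-8, so the encoding only starts to matter once accented or non-Latin characters appear.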


3:08 pm on Jun 7, 2005 (gmt 0)

10+ Year Member

Do you guys know of a PHP/MySQL script that can automatically make a sitemap.xml file whenever i update my website (which also uses PHP/MySQL), so that i don't need to submit my sitemap to Google every time i update my site? Thanks a lot.


5:54 pm on Jun 7, 2005 (gmt 0)

10+ Year Member

I'd like to comment that I have a > 5000 page site about 6 months old which had not ever been deepcrawled by Gbot. This weekend I created and uploaded a sitemap. Gbot has been sniffing daily at the sitemap.xml file, and a few hours ago boom deep crawl. Thanks for the tip GG!


6:09 pm on Jun 7, 2005 (gmt 0)

10+ Year Member

Hi guys, I noticed the referrals coming from this site, glad I found you.

Can you please use the domain name <snip> to generate xml sitemaps, as the indext.php is a testing script (that I accidentally left live and linked :)

If you can edit the previous posts to reflect this, that would be great (I have removed that script for the minute, so it's a dead link).



[edited by: lawman at 2:58 pm (utc) on June 8, 2005]
[edit reason] No Url Drops Please [/edit]


6:25 pm on Jun 7, 2005 (gmt 0)

10+ Year Member

I'm still trying to get my text sitemap through without a parsing error.

I can't find anywhere that specifies whether it can be .txt or if it should be .xml

Can anyone who's been successful with their text file sitemap sticky me with the first 10 lines or so of it?



6:39 pm on Jun 7, 2005 (gmt 0)

10+ Year Member

Rather than putting in a list of urls, if you can't auto-generate a sitemap, would it not be easier to just copy the xml structure and edit it by hand? Here's what I mean:

<?xml version="1.0" encoding="UTF-8"?><urlset xmlns="http://www.google.com/schemas/sitemap/0.84">

Simply paste that into notepad, paste a new <url></url> pair for each url and tweak whatever else you need, and save it as sitemap.xml. Can't be as good as generating it but gotta be better than just a list of urls.
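
If pasting <url></url> pairs by hand gets tedious, the same structure can be sketched in a few lines of Python. This is only an illustration: the namespace is the one quoted above, the child tags (loc, lastmod, changefreq, priority) are the ones I understand the 0.84 format to use, and build_sitemap plus the example.com entries are invented names.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the hand-edited header quoted in this thread.
SITEMAP_NS = "http://www.google.com/schemas/sitemap/0.84"


def build_sitemap(entries):
    """Build sitemap XML from dicts with 'loc' and optional extra fields."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # Only emit the tags that were actually supplied for this URL.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    # ElementTree escapes characters such as '&' in the text for us.
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    xml_text = build_sitemap([
        {"loc": "http://www.example.com/", "changefreq": "daily", "priority": 1.0},
        {"loc": "http://www.example.com/page.php?id=13&lang=en", "priority": 0.5},
    ])
    with open("sitemap.xml", "w", encoding="utf-8") as handle:
        handle.write(xml_text)
```

The output is still easy to edit by hand afterwards, which keeps the control mentioned above while avoiding the copy and paste.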


7:00 pm on Jun 7, 2005 (gmt 0)

10+ Year Member

Unless you need to do this 10,000 times.....


7:02 pm on Jun 7, 2005 (gmt 0)

10+ Year Member

Of course, but the control it gives you still has to be better than:


just one after another in a text file surely?


10:40 pm on Jun 7, 2005 (gmt 0)

10+ Year Member

Sorry to blow my own trumpet here, but hey you guys linked to me - you even got me to subscribe to see where the hits were coming from ;) lol

Give my website a razz: you can submit a domain name or a links directory (directories only at the mo - make sure your default document is the one fired when you type just the sub-dir or domain - I am fixing this).

It has the limitations of most of the online generators at the mo - ie it can't decipher page creation date or priority. But it will produce an easy to edit xml document like the one you describe - it's also a little quicker than cut and paste.

If you have any problems with the generator, give me a shout and I'll help you out.

I can soon have your xml file up there and indexed!



[edited by: lawman at 2:59 pm (utc) on June 8, 2005]
[edit reason] No Signatures Please [/edit]


9:58 am on Jun 8, 2005 (gmt 0)

10+ Year Member


I just want to add some information about the php script mentioned earlier:

phpSitemap - "create your personal google sitemap file", current version 1.2.2

Current features:
- Set filter for file and directory names for files to be excluded.
- Reads the last modification time and sets the change frequency accordingly.
- You can specify the initial (fixed) priority of each page.
- Create a sitemap.xml file and submit this url to google.

New features:
- File information can now be set manually (for each file): enable/disable a file, last modification time, change frequency, priority
- All settings (also for files) will be stored and are used for further runs.

Known limitations:
- Not tested with huge sites
- Cannot handle dynamic links (like index.php?id=13) - this will be integrated in the near future.
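
To make the feature list more concrete, here is a rough Python sketch of the same idea (not phpSitemap's actual code, which I haven't reproduced): walk a directory, skip filtered names, read each file's modification time, and derive a change frequency from it. The age thresholds, the default filter patterns, and the names scan_site and changefreq_for_age are all assumptions made for the example.

```python
import os
import time
from fnmatch import fnmatch


def changefreq_for_age(age_days):
    """Hypothetical mapping from file age to a change frequency value."""
    if age_days < 1:
        return "daily"
    if age_days < 7:
        return "weekly"
    if age_days < 30:
        return "monthly"
    return "yearly"


def scan_site(root, base_url, exclude=("*.inc", "test*"), priority=0.5, now=None):
    """Return one entry per file under root, skipping excluded names."""
    now = time.time() if now is None else now
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk does not descend into them.
        dirnames[:] = sorted(
            d for d in dirnames if not any(fnmatch(d, p) for p in exclude)
        )
        for name in sorted(filenames):
            if any(fnmatch(name, p) for p in exclude):
                continue
            full = os.path.join(dirpath, name)
            mtime = os.path.getmtime(full)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            entries.append({
                "loc": base_url.rstrip("/") + "/" + rel,
                "lastmod": time.strftime("%Y-%m-%d", time.gmtime(mtime)),
                "changefreq": changefreq_for_age((now - mtime) / 86400),
                "priority": priority,  # fixed initial priority for every page
            })
    return entries
```

Like the script described above, this only sees files on disk, so dynamic links such as index.php?id=13 would not be picked up.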



[edited by: lawman at 3:08 pm (utc) on June 8, 2005]
[edit reason] No URL Drops Please [/edit]


10:53 am on Jun 8, 2005 (gmt 0)

10+ Year Member

Did anyone bother to read rules and regulations?

The Google Services are made available for your personal, non-commercial use only. You may not use the Google Services to sell a product or service, or to increase traffic to your Web site for commercial reasons, such as advertising sales.

How will this affect adsense publishers?


11:30 am on Jun 8, 2005 (gmt 0)

10+ Year Member

>How will this affect adsense publishers?

You don't need a Google account to participate in the Google SiteMaps program. Submitting sitemaps to Google helps Googlebot to adjust the crawling of your web site. It has no impact on your rankings. It does not directly increase traffic to your site. You give Google your sitemap.xml file for free, thus you're not advertising sales.

Since it seems that related links are tolerated in this thread, here is my Google SiteMap Tutorial:

[edited by: lawman at 3:09 pm (utc) on June 8, 2005]
[edit reason] Link Drops Are Not Tolerated [/edit]


1:09 pm on Jun 8, 2005 (gmt 0)

10+ Year Member

jcmiras -- regarding utf-8 and text files question:

If your working environment and language is English, your computer probably interprets characters as either ASCII, "Latin-1" or UTF-8. ASCII defines the first 128 characters: unaccented letters, numbers and punctuation, such as what I am writing here. Latin-1 (a.k.a. ISO-8859-1) defines an additional 128 characters that contain some symbols and accented characters used in languages mostly derived from Latin (French, Spanish, Italian, for example). UTF-8 defines pretty much all characters in use in modern languages today, including "CJK" (Chinese, Japanese and Korean).

ASCII is a subset of UTF-8.

URLs typically don't contain more than letters, numbers and special characters like / : & ? = and so on. So unless you see accented characters in your URLs, you should be fine. Make sure not to use MS Word to create your file as it may do unexpected things to letters -- a text editor (like Notepad in Windows) would be better.

It's unclear (to me) what would happen if your text file had URLs that contained accented characters -- you might need to do some fancy entity escaping or encoding, in which case I recommend just using XML (which you can also submit), which declares the character set being used and is only slightly more verbose than typing the URLs in by hand.
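
As a rough illustration of that "fancy escaping", here is a small Python sketch that percent-encodes non-ASCII characters as UTF-8 bytes and then escapes XML special characters such as &. Whether Google expects exactly this treatment is an assumption on my part; sitemap_safe_url and the example URL are invented for the example, and non-ASCII host names are not handled.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape


def sitemap_safe_url(url):
    """Percent-encode non-ASCII characters, then escape XML special characters."""
    # Keep normal URL punctuation (and existing %XX escapes) untouched.
    encoded = quote(url, safe=":/?&=#%+,;@~")
    # '&' becomes '&amp;' so the URL can sit inside an XML element.
    return escape(encoded)


if __name__ == "__main__":
    print(sitemap_safe_url("http://www.example.com/café?x=1&y=2"))
```

A plain ASCII URL with no & in it passes through unchanged, which matches the point above that most URL lists need no special handling.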


2:20 pm on Jun 8, 2005 (gmt 0)

10+ Year Member

I have it written in Java. It is able to submit over half a million URLs in less than a minute.

May want to look at - <snip> Python version 2.2 [google.com...]

[edited by: wakahii at 2:25 pm (utc) on June 8, 2005]

[edited by: lawman at 3:09 pm (utc) on June 8, 2005]
[edit reason] No Link Drops Please [/edit]


3:11 pm on Jun 8, 2005 (gmt 0)

WebmasterWorld Administrator lawman is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

Link drops/signatures are not allowed. Please read TOS [webmasterworld.com] and govern yourselves accordingly.



10:47 pm on Jun 8, 2005 (gmt 0)

10+ Year Member

Cheers lawman, sorry about that

Anyway, I have now added a feature that picks up the PR of your page and assigns it to the priority tag of the xml file that you want to produce. Although this is not perfect, it should make SitemapsPal a lot easier to use when editing large files. (Thanks to those that have requested larger outputs and shown an interest.)

If anybody has any more questions on Sitemaps and how to use them - fire away. I plan to stick around this forum - not just advertise my latest creation.

It's a good job somebody posted a link to my site from here, otherwise I'd never have found ya.




6:57 pm on Jun 9, 2005 (gmt 0)

10+ Year Member

We have released an ASP.Net script that will parse through an IIS site and also parse through log files if they are available. The log file parsing really helps with dynamically generated urls ( for instance shopping carts). The results are outputted as an XML file that conforms to the google sitemap specification.
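
For readers curious how log parsing can surface dynamic urls, here is a minimal Python sketch (not the ASP.Net script itself). It assumes W3C extended log lines with a "#Fields:" header that includes cs-method, cs-uri-stem, cs-uri-query and sc-status; the function name and sample values are hypothetical.

```python
def urls_from_w3c_log(lines, base_url):
    """Collect distinct URLs that were served successfully via GET."""
    fields = []
    seen = set()
    urls = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The header names the columns used by the rows that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split()))
        if row.get("cs-method") != "GET" or row.get("sc-status") != "200":
            continue
        url = base_url.rstrip("/") + row.get("cs-uri-stem", "/")
        query = row.get("cs-uri-query", "-")
        if query != "-":  # "-" marks an empty query string in these logs
            url += "?" + query
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

Because the query string is kept, a shopping cart page requested with different parameters shows up as separate urls, which a plain file-system scan would miss.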

I am awaiting a response from a moderator before posting the url.


12:17 pm on Jun 10, 2005 (gmt 0)

10+ Year Member

To get to our sitemap generator, you can also view my profile and from there follow a link to our download site.
Replace the www with the word sitemap on our domain name


1:02 pm on Jun 10, 2005 (gmt 0)

10+ Year Member

I began using Google's Sitemaps earlier this week, and have had Googlebot visit my site at least twice a day. However, when I check my site on Google, I notice that my page cache has not changed. They still have an earlier version of my site listed (one that really was not ready to be indexed and cached). Am I missing something, or is there a difference between a site that is visited by Googlebot and one that is crawled? And shouldn't this sitemap, with all the visits, clear up my cache and description issue?


1:39 pm on Jun 10, 2005 (gmt 0)

10+ Year Member

I think the intention of the sitemap utility is for Google to uncover more of the "hidden web". In particular pages from dynamic database driven sites. The other thing to note is that it is still in Beta so there is no guarantee that the results will show up immediately in the serps.


6:27 pm on Jun 10, 2005 (gmt 0)

10+ Year Member

I created a little tutorial on how to use Xenu and Excel to create a sitemap for a large site.

Check it out here:



6:30 pm on Jun 10, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



6:43 pm on Jun 10, 2005 (gmt 0)

10+ Year Member

Hi Guys!

I created an xml sitemap, and submitted it to Google last night, but as I'm no expert at this, I'm not sure if I did it right.

However, Google has downloaded it, and says it's "ok", and I have no errors in that section of "My Sitemaps".

Does this mean I've done it right, or is my sitemap in storage until Google finds the time to check it?

I'd be grateful if anyone could shed any light on this, thanks in advance, FTWB05.


8:15 pm on Jun 10, 2005 (gmt 0)

10+ Year Member

Update to previous post....

I've just checked Google to see if it had cached any more of my pages since I uploaded my site map last night, and guess what?

Every single page that I included in the sitemap has been cached, whereas before I only had 5 or 6, and was pulling my hair out as to why....

The site is only a couple of months old, so I was just waiting for Google to take its natural course, but submitting my sitemap has definitely speeded things up. Off to check where those pages are appearing in Google's listings now....

....and to add more pages to my sitemap (I had to hand do it, couldn't figure out how to dynamically create one!)

Thanks, FTWB05


8:37 pm on Jun 10, 2005 (gmt 0)

10+ Year Member

I am glad to hear of your good fortune. I am in a similar situation (new site, new sitemap), but I am not getting the good results you are.


9:30 pm on Jun 10, 2005 (gmt 0)

10+ Year Member

Thanks, it does seem like good news, of course it could be a coincidence that Google decided to cache so many pages 24hrs after I submitted my sitemap - but unlikely.

All is not rosy though - I haven't seen a big jump in traffic, and another, older site that I submitted a sitemap for hasn't been cached.

Maybe I'll have to wait for an update of the listings - just too impatient!


9:04 am on Jun 11, 2005 (gmt 0)

10+ Year Member

Google Sitemaps is a double-edged sword ;) Be very careful!

Using this, I have managed to get several of my URLs crawled and indexed, which were URL only or supplemental results for a long time.

However, I have run into another problem now. As I used the Python based script provided by Google, it included ALL the files in my directory. Several of these were identical to the home page (used for testing or PPC), and because I was not very careful in editing the list produced by the sitemap generator, these pages have also been indexed and I am afraid I will get another round of duplicate content penalty.

What next?

I removed these pages from the site location. I have also deleted the references from the sitemap.xml file. But how do I get them removed from the index? Looks like I will have to fiddle with the URL remove utility once again (scary thought, eh?)

So be very diligent in checking your auto-generated XML file and remove all the references to files you don't want indexed - before you submit it to Google.
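
One way to make that check less error-prone is to prune the generated file with a short script before submitting it. The Python sketch below is an assumption-laden illustration: it expects the 0.84 namespace quoted earlier in the thread, and prune_sitemap and the wildcard patterns are made-up names, not part of Google's generator.

```python
import xml.etree.ElementTree as ET
from fnmatch import fnmatch

NS = "http://www.google.com/schemas/sitemap/0.84"


def prune_sitemap(xml_text, exclude_patterns):
    """Drop <url> entries whose <loc> matches any wildcard pattern.

    Returns the pruned XML text and the list of removed URLs for review.
    """
    # Serialize with a default namespace instead of an "ns0:" prefix.
    ET.register_namespace("", NS)
    root = ET.fromstring(xml_text)
    removed = []
    for url in list(root.findall(f"{{{NS}}}url")):
        loc = url.findtext(f"{{{NS}}}loc", default="").strip()
        if any(fnmatch(loc, pattern) for pattern in exclude_patterns):
            root.remove(url)
            removed.append(loc)
    return ET.tostring(root, encoding="unicode"), removed
```

Printing the removed list before uploading gives a quick sanity check that only the test or PPC copies were dropped and nothing you want indexed went with them.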

Someone mentioned this tool will help Google discover a large part of the hidden web. You bet!

This 188 message thread spans 7 pages: 188
