

Site Map Questions


Frank_Rizzo

2:38 pm on Oct 27, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here are some questions specifically about Google's sitemap facility.

1. Is it just a case of generating a sitemap file, storing it on the server and telling G where to find it?

2. Does the sitemap need to be refreshed regularly: only when there are changes, or every few weeks?

3. I have an existing site_map.html which is a basic page showing the main sections of the site. Should this be ditched, or renamed so that it is not confused with G's sitemap file?

4. Should I split sitemaps into two sections:

a) Main pages / articles (about 500 pages)
b) Message board (about 40,000 messages / 4,000 pages)

5. Are there any specific instructions for sitemapping a phpBB which has been modded for SE optimization?

6. Can I use something like Xenu to generate the sitemap file?

7. Do other SEs use sitemap files? If yes, can the same file be used?
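
For questions 1 and 4, here is a minimal sketch of what "generating a sitemap file" and "splitting it in two" could look like. It assumes the sitemaps.org 0.9 XML namespace (check which schema the target engine expects), and the file names sitemap_main.xml and sitemap_forum.xml as well as the article URL are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: sitemaps.org 0.9 namespace; the engine you submit to may expect another.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML (as a string) listing the given page URLs."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(root, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(root, encoding="unicode")


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index XML string pointing at separate sitemap files."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(root, encoding="unicode")


# Hypothetical URLs: one sitemap for main pages, one index covering both sections.
main_xml = build_sitemap([
    "http://widgets.com/index.html",
    "http://widgets.com/articles/a1.html",
])
index_xml = build_sitemap_index([
    "http://widgets.com/sitemap_main.xml",
    "http://widgets.com/sitemap_forum.xml",
])
```

The index file is the one you would point the search engine at; each section's sitemap can then be regenerated on its own schedule.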

TIA.

Frank_Rizzo

10:10 am on Oct 29, 2006 (gmt 0)




I'm using the Google sitemap creator (Python script). The way it generates the file seems pretty ineffective.

The way I see it, I either have to feed it a list of URLs or let it traverse the directory structure and pick up all the things I do not want it to.

URL list
I first have to generate a list of all the URLs which I want sitemapped and then paste it into the config file or an external file.

Why can I not just tell it to fetch widgets.com/index.html and follow all the links from there?

Directory Traversal
This bit is scary. I told it to start at /home/widgets/public_html and it created a sitemap with every file, including stuff such as:

php include files
username and password files
source code for custom cgi programs

I know I can do a filter 'drop', but what if the site is huge and has dozens of directories / hundreds of files which I don't want sitemapped?

Not only that, but this method will not pick up message board pages, guestbooks, links databases etc.
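
One way around an ever-growing 'drop' list is to turn the logic around and only keep what is explicitly allowed. The sketch below is not part of the Google script; it is a standalone illustration using os.walk, and the extension allow-list and skipped directory names are assumptions you would replace with your own.

```python
import os

# Hypothetical allow-list: only these extensions count as public pages.
PUBLIC_EXTENSIONS = {".html", ".htm"}
# Hypothetical directory names that are never descended into.
SKIP_DIRS = {"includes", "cgi-bin", "private"}


def public_urls(doc_root, base_url):
    """Walk doc_root and return URLs only for files with allow-listed extensions."""
    urls = []
    for dirpath, dirnames, filenames in os.walk(doc_root):
        # Prune in place so os.walk never enters the skipped directories.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            if os.path.splitext(name)[1].lower() in PUBLIC_EXTENSIONS:
                rel = os.path.relpath(os.path.join(dirpath, name), doc_root)
                urls.append(base_url.rstrip("/") + "/" + rel.replace(os.sep, "/"))
    return sorted(urls)
```

Calling public_urls("/home/widgets/public_html", "http://widgets.com") would then list only the .html and .htm files, so include files, password files and CGI source are left out unless someone adds their extension to the allow-list.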

Isn't there just a simple tool which I can point at index.html and let it build the file from that?
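
To make the idea concrete, here is a rough sketch of such a "start at index.html and follow the links" tool using only the Python standard library. It is an illustration, not the Google script: it stays on the starting host, ignores robots.txt, and the max_pages limit is an arbitrary assumption. Because it follows links rather than files on disk, it would also reach message board and guestbook pages, as long as they are linked.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href values of all <a> tags on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Download a page as text (simplified: assumes UTF-8)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_html, max_pages=5000):
    """Breadth-first crawl from start_url, staying on the same host."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    found = []
    while queue and len(found) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to load
        found.append(url)
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before comparing.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return found
```

The list returned by crawl("http://widgets.com/index.html") could be used as the URL list input, so nothing that is not publicly linked ends up in the sitemap.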

[edited by: Frank_Rizzo at 10:11 am (utc) on Oct. 29, 2006]