Page is a not externally linkable
- Search Engines
-- Sitemaps, Meta Data, and robots.txt
---- Add Sitemap To robots.txt For Autodiscovery


StupidScript - 1:33 am on Apr 13, 2007 (gmt 0)


What would be the advantage or disadvantage of using this .xml feed

For one, Yahoo, Google, MSN, Ask and IBM are all supporting this same method. For another, it is included in the robots.txt file, which all of them hit first, and so makes it easier for them to find your sitemap.

Here's some PHP for generating a Sitemaps.org-compatible sitemap.xml file. Please post any modifications you might make to it:

<?php
/*########################################################
# Generates a sitemap per specifications found at: #
# http://www.sitemaps.org/protocol.html #
# DOES NOT traverse directories #
# 20070712 James Butler james at musicforhumans dot com #
# Based on opendir() code by mike at mihalism dot com #
# http://us.php.net/manual/en/function.readdir.php#72793 #
# Free for all: http://www.gnu.org/licenses/lgpl.html #
# #
# Useage: #
# 1) Save this as file name: sitemap_gen.php #
# 2) Change variables noted below for your site #
# 3) Place this file in your site's root directory #
# 4) Run from http://www.yourdomain.com/sitemap_gen.php #
# #
# <lastmod> -OPTIONAL #
# YYYY-MM-DD #
# <changefreq>-OPTIONAL #
# always #
# hourly #
# daily #
# weekly #
# monthly #
# yearly #
# never #
# <priority> -OPTIONAL #
# 0.0-1.0 [default 0.5] #
# #
# Add completed sitemap file to robots.txt: #
# Sitemap: http://www.yourdomain.com/sitemap.xml #
# #
########################################################*/

######## CHANGE THESE FOR YOUR SITE #########
# IMPORTANT: Trailing slashes are REQUIRED!
$my_domain = "http://www.yourdomain.com/";
$root_path_to_site = "/root/path/to/site/";
$file_types_to_include = array('html','htm');
############## END CHANGES ##################

$xml ="<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";
$xml.="<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n";
$xml.=" <url>\n";
$xml.=" <loc>".$my_domain."</loc>\n";
$xml.=" <priority>1.0</priority>\n";
$xml.=" </url>\n";
## START Modified mike at mihalism dot com Code ######
function file_type($file){
$path_chunks = explode("/", $file);
$thefile = $path_chunks[count($path_chunks) - 1];
$dotpos = strrpos($thefile, ".");
return strtolower(substr($thefile, $dotpos + 1));
}
$file_count = 0;
$path = opendir($root_path_to_site);
while (false!== ($filename = readdir($path))) {
$files[] = $filename;
}
sort($files);
foreach ($files as $file) {
$extension = file_type($file);
if($file!= '.' && $file!= '..' && array_search($extension, $file_types_to_include)!== false) {
$file_count++;
### END Modified mike at mihalism dot com Code ######
$xml.=" <url>\n";
$xml.=" <loc>".$my_domain.$file."</loc>\n";
$xml.=" <lastmod>".date("Y-m-d",filemtime($file))."</lastmod>\n";
$xml.=" <changefreq>monthly</changefreq>\n";
$xml.=" <priority>0.5</priority>\n";
$xml.=" </url>\n";
}
}
$xml.="</urlset>\n";
if($file_count == 0){
echo "No files to add to the Sitemap\n";
}
else {
$sitemap=fopen("sitemap.xml","w+");
if (is_writable("sitemap.xml")) {
fwrite($sitemap,$xml);
fclose($sitemap);
echo "DONE! <a href='sitemap.xml'>View sitemap.xml</a><br>\n";
echo "Remove items you do not want included in the search engines.<br>\n";
echo "Modify < changefreq > and < priority > to taste.<br>\n";
echo "Add 'Sitemap: ".$my_domain."sitemap.xml' to robots.txt.<br>\n";
}
else {
exec("touch sitemap.xml");
exec("chmod 666 sitemap.xml");
if (is_writable("sitemap.xml")) {
fwrite($sitemap,$xml);
fclose($sitemap);
exec("chmod 644 sitemap.xml");
echo "DONE! <a href='sitemap.xml'>View sitemap.xml</a><br>\n";
echo "Remove items you do not want included in the search engines.<br>\n";
echo "Modify < changefreq > and < priority > to taste.<br>\n";
echo "Add 'Sitemap: ".$my_domain."sitemap.xml' to robots.txt.<br>\n";
}
else {
echo "File is not writable.<br>\n";
}
}
}
?>


Thread source:: http://www.webmasterworld.com/robots_txt/3310622.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com