What's the limit for Googlebot?

How do you create a site map for 20,000+ pages?

         

NedFlanders

1:05 am on Mar 20, 2003 (gmt 0)


I have a shopping site with more than 20,000 products, and so more than 20,000 pages. The pages are generated from a database via PHP, and I have found that Googlebot crawls them fine as long as the URL does not contain the variable id (as in search.php?id=55).

So I have 20,000 pages that I could list in a sitemap linked from the homepage, which, from what I have read on this forum, would be a good thing.

But the file would be very big. Would Googlebot ignore it, crawl part of it, or crawl all of it?

Thanks in advance for any help.

deejay

1:57 am on Mar 20, 2003 (gmt 0)


1. Google only indexes the first 100 KB of a page, so make sure your pages are smaller than this.

2. GoogleGuy has suggested (from memory) a maximum of 100 links on a page to stop it looking spammy. I can't recall if this was said in reference to internal or external links specifically, but it's a good rule of thumb.
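Both rules of thumb are easy to check on a generated page. The Perl below is only a rough sketch: `check_page` is a hypothetical helper, 100 KB is assumed to mean 100 * 1024 bytes, and links are counted with a simple pattern match rather than a real HTML parser.

```perl
use strict;
use warnings;

# Hypothetical check of one page against the two rules of thumb:
# at most 100 KB (assumed to be 100 * 1024 bytes) and at most 100 links.
sub check_page
{
    my ($html) = @_;
    # length() counts characters, which equals bytes for single-byte text.
    my $bytes = length($html);
    # Count anchor tags that carry an href attribute (rough pattern match).
    my $links = () = $html =~ /<a\s[^>]*href\s*=/gi;
    my %result = (
        bytes    => $bytes,
        links    => $links,
        size_ok  => ($bytes <= 100 * 1024 ? 1 : 0),
        links_ok => ($links <= 100 ? 1 : 0),
    );
    return \%result;
}

# Example: a made-up sitemap page holding 100 product links.
my $page = join "", map { qq[<a href="/product$_.html">Product $_</a><br>\n] } 1 .. 100;
my $result = check_page($page);
print "$result->{bytes} bytes, $result->{links} links, ",
    ($result->{size_ok} && $result->{links_ok} ? "within" : "over"), " the limits\n";
```

A page of 100 short links like this one is only a few kilobytes, so the link count is likely to be the limit that bites first on a sitemap page.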

20,000 pages / 100 links per page = 200 sitemap pages.

You don't want links to 200 sitemap pages from your index page, so you need a two level approach.

From your index page link to two pages:
- Sitedir1
- Sitedir2

From Sitedir1 link to 100 level 2 sitemap pages:
- Sitemap1
- Sitemap2... etc... to Sitemap100.

Repeat for Sitedir2, linking to Sitemap101-200.

On each of the 200 sitemap pages, place links to 100 of your actual pages - bingo, 20,000 pages.
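The layout above can be sketched in Perl, the language used elsewhere in this thread. This is an illustration only: `build_sitemap`, the product URLs and the page names (sitedir1.html, sitemap1.html and so on) are all hypothetical, and the pages are built in memory rather than written to disk.

```perl
use strict;
use warnings;

# Hypothetical helper: split a list of URLs into a two-level sitemap.
# Returns a hash ref mapping page name => array ref of the links on that page.
sub build_sitemap
{
    my ($urls, $per_page) = @_;
    my %pages;

    # Level 2: sitemap pages, each linking to up to $per_page product pages.
    my @sitemaps;
    my @remaining = @$urls;
    while (@remaining)
    {
        my $name = "sitemap" . (@sitemaps + 1) . ".html";
        $pages{$name} = [ splice(@remaining, 0, $per_page) ];
        push @sitemaps, $name;
    }

    # Level 1: directory pages, each linking to up to $per_page sitemap pages.
    my @sitedirs;
    while (@sitemaps)
    {
        my $name = "sitedir" . (@sitedirs + 1) . ".html";
        $pages{$name} = [ splice(@sitemaps, 0, $per_page) ];
        push @sitedirs, $name;
    }

    # The homepage links only to the directory pages.
    $pages{"index"} = [@sitedirs];
    return \%pages;
}

# Example with made-up product URLs.
my @urls  = map { "/product$_.html" } 1 .. 20000;
my $pages = build_sitemap(\@urls, 100);
printf "%d directory pages, %d pages in total\n",
    scalar @{ $pages->{"index"} }, scalar keys %$pages;
# prints "2 directory pages, 203 pages in total"
```

The 203 pages are the 200 sitemap pages, the two directory pages and the homepage entry.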

............

This approach minimises PageRank (PR) transfer from the homepage without running afoul of GoogleGuy's 100-link guideline, and ensures the spider can reach all your pages within three steps of the homepage.

As the site grows you might want to add another level, placing only one link from the homepage to a top level sitemap directory, from which the sitedir pages are linked, but this does add another step for the spider.
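To see how the step count changes as the site grows, the arithmetic can be written out. This is a sketch under assumptions: `clicks_to_product` is a hypothetical helper, and it assumes every sitemap page is filled to the per-page limit before another page is added.

```perl
use strict;
use warnings;

# Hypothetical helper: number of clicks from the homepage to a product page
# when every sitemap page carries at most $per_page links and the homepage
# links to at most $home_links top-level sitemap pages.
sub clicks_to_product
{
    my ($total_pages, $per_page, $home_links) = @_;
    my $levels = 1;    # the sitemap pages that link straight to products
    my $count  = int(($total_pages + $per_page - 1) / $per_page);
    while ($count > $home_links)
    {
        # Too many pages to link from the homepage: add a directory level.
        $count = int(($count + $per_page - 1) / $per_page);
        $levels++;
    }
    return $levels + 1;    # one more click from the last sitemap page to the product
}

printf "%d clicks with two homepage links, %d with one\n",
    clicks_to_product(20000, 100, 2), clicks_to_product(20000, 100, 1);
# prints "3 clicks with two homepage links, 4 with one"
```

With a single homepage link, two levels of 100 links each still cover 10,000 sitemap pages, so the four-click layout has room for a site far larger than 20,000 pages.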

theBear

2:11 am on Mar 20, 2003 (gmt 0)


Do something like this:

#!/usr/bin/perl
# Builds static catalog pages: concatenates the product files into "part"
# pages of 250 files each, then writes an index page linking to every part.
use strict;
use warnings;

my @directory_path = qw!forxxxx!;
my $pathfull = "/home/virtual/xxxxxx.com/var/www/html/Catalog";
my $BUFFEND = q[<!--#include virtual="/include/footer2.yyyyyyyyyyy" --></body>
</html>];
my ($fctr, $part, $newfile, $index) = (0, 0, "", "");

# Write the accumulated product HTML as the next part page, add a link to it
# in the index, then reset the per-part counters.
sub write_part
{
    $part++;
    open(my $out, '>', "$pathfull/part$part.shtml")
        or die "cannot write $pathfull/part$part.shtml: $!\n";
    $index .= qq[<a href="/Catalog/part$part.shtml">Part $part</a><br>];
    my $BUFF = q[<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"><html lang="en"><head><title>yyyyyyyyyyy.com Detailed Product Catalog Page] . qq[ $part </title><meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">] .
q[<!--#if expr="$HTTP_USER_AGENT = /^Mozilla/" --><!--#else --><meta name="robots" content="noindex, follow"><meta name="keywords" content="yyyyyyyyyyy.com site tree"><meta name="description" content="The yyyyyyyyyyy Store Catalog part] . qq[ $part"><!--#endif --><!--#include virtual="/include/style.yyyyyyyyyyy" -->
</head><body bgcolor="#FFFFFF"><!--#include virtual="/include/header.yyyyyyyyyyy" --><!--#include virtual="/include/leftbarbookstore.yyyyyyyyyyy" -->
</table><table cellpadding=0 cellspacing=0 bgcolor="#ffffff" width=640 class=dept><tr><td><font face="arial,helvetica,sans-serif" size=2 color="#000080"><br>&nbsp;<a href="/">Home</a>
&nbsp;&#187;&nbsp;<a href="/ourStore/">Shopping</a>&nbsp;&#187;&nbsp;<a href="/Catalog/">Product Catalog</a>
&nbsp;&#187;&nbsp;Part $part</font></td></tr></table><table cellpadding=10 width=640 cellspacing=0 border=0 bgcolor="#ffffff">];
    my $ifile = $BUFF . "<tr><td class=dept>" . $newfile . "</td></tr></table></table>" . $BUFFEND;
    print {$out} $ifile;
    close $out;
    $fctr = 0;
    $newfile = "";
}

foreach my $directory (@directory_path)
{
    my $dh;
    unless (opendir($dh, $directory))
    {
        print "$directory open failed please correct and rerun\n";
        next;
    }
    # Keep plain files only, sorted so that reruns produce the same parts.
    my @files = grep { !/^\.\.?$/ && -f "$directory/$_" } readdir($dh);
    @files = sort @files;
    unless (closedir($dh))
    {
        print "$directory close failed please correct and rerun\n";
        next;
    }
    foreach my $file (@files)
    {
        my $data = "";
        my $in;
        unless (open($in, '<', "$directory/$file"))
        {
            warn "skipping $directory/$file: $!\n";
            next;
        }
        read($in, $data, 50000);    # at most the first 50,000 bytes of each file
        close $in;
        $fctr++;
        $newfile .= $data . "<br>";
        write_part() if $fctr == 250;
    }
}
write_part() if $fctr > 0;    # flush the final, partly filled part

# Write the index page that links to every part page.
open(my $out, '>', "$pathfull/index.shtml")
    or die "cannot write $pathfull/index.shtml: $!\n";
my $BUFF = q[<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"><html lang="en"><head>
<title>yyyyyyyyyyy.com Detailed Product Catalog Index] . qq[</title>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">] .
q[<!--#if expr="$HTTP_USER_AGENT = /^Mozilla/" --><!--#else --><meta name="robots" content="noindex, follow">
<meta name="keywords" content="yyyyyyyyyyy.com site tree"><meta name="description" content="The yyyyyyyyyyy Store Catalog Index] . qq[ ">
<!--#endif -->
<!--#include virtual="/include/style.yyyyyyyyyyy" -->
</head><body bgcolor="#FFFFFF"><!--#include virtual="/include/header.yyyyyyyyyyy" -->
<!--#include virtual="/include/leftbarbookstore.yyyyyyyyyyy" -->
</table><table cellpadding=0 cellspacing=0 bgcolor="#FFFFFF" width=640 class=dept>
<tr><td><font face="arial,helvetica,sans-serif" size=2 color="#000080"><br>&nbsp;<a href="/">Home</a>
&nbsp;&#187;&nbsp;<a href="/ourStore/">Shopping</a>&nbsp;&#187;&nbsp;Product Catalog</font></td></tr></table><table cellpadding=10 width=640 cellspacing=0 border=0 bgcolor="#ffffff">];
my $ifile = $BUFF . "<tr><td class=dept>" . $index . "</td></tr></table></table>" . $BUFFEND;
print {$out} $ifile;
close $out;
exit;

Or search the web for a sitemap building script.

Cheers,