
Forum Moderators: phranque


Sitemaps For a PHP Site

Any software to smoothly handle 30K + pages?



9:55 pm on Nov 8, 2008 (gmt 0)

10+ Year Member


1. Site with 20,000 to 30,000 pages live data.
2. Product inventory constantly changes and is updated daily.
Some pages have info changes, and others are deleted if the product is sold.
3. Levels 1 and 2 of my site structure have maybe fewer than 100 pages. But
at Level 3, the content explodes into thousands of pages.
4. Only about 1/2 the pages have already been spidered and ranked
on a 2 to 3 year old site.
5. Since all of my Level 1 and 2 pages have been found, I feel it's safe to assume
I'll need a sitemap to guide Google and other SE's through the 3rd Level.
6. Most sitemap software times out/runs out of memory before the sitemap can
be completed.


1. Is there any software that can handle this?

2. How often, if ever, should I update the sitemap?

Thanks for any help you can give!


2:17 pm on Nov 9, 2008 (gmt 0)

WebmasterWorld Senior Member rocknbil is a WebmasterWorld Top Contributor of All Time 10+ Year Member

6. Most sitemap software times out/runs out of memory before the sitemap can
be completed.

I'm presuming 1) you've styled your sitemap generator after the samples in G's webmaster tools, and 2) you're requesting the generator from a browser?

Have you tried running the generator from the command line, or do you have the ability to? That is really the best solution: the run is not tied to a browser request, so it would probably be fine if you can.

If you can't, the way I get around timeouts where it has to be requested from a browser is to use fork(). Fork is pretty simple in concept:

if ($pid = fork()) { &parent_process; }
else { &child_process; }

When fork() is called, a child process is started; this is your sitemap generator. In the parent, fork() returns the child's process id, so the first branch runs and the parent immediately returns a "job in progress" response to the browser. In the child, fork() returns 0, so the else branch runs.

However, you have to be careful. If fork is handled incorrectly, the parent process can hang waiting for the child to finish, or the child may exit prematurely. One way to approach this is to close the STDOUT filehandle when starting the child, and to make the parent wait for the child before exiting:

if ($pid = fork()) { &parent_process_do_not_exit; }
else {
    # child: release the browser connection, then do the work
    close(STDOUT);
    &child_process;
}
The above is Perl code, but equivalents can be written in any server-side language.
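For readers who do not work in Perl, the same fork pattern can be sketched in Python. This is only a minimal sketch under stated assumptions, not the code used on any real site: the name start_background_job and the job callback are hypothetical, the child's standard output is redirected to the null device rather than closed outright, and the host is assumed to be Unix-like so that os.fork is available.

```python
import os
import sys


def start_background_job(job):
    """Fork; run job() in the child and return the child's pid to the parent.

    start_background_job and job are hypothetical names for illustration.
    """
    # Flush first so buffered output is not written twice after the fork.
    sys.stdout.flush()
    pid = os.fork()
    if pid > 0:
        # Parent: return at once so the caller can answer "job in progress".
        return pid

    # Child: detach from the parent's standard output so the web server
    # is not kept waiting on it, then do the long-running work.
    exit_code = 0
    try:
        devnull = os.open(os.devnull, os.O_WRONLY)
        os.dup2(devnull, 1)
        job()
    except Exception:
        exit_code = 1
    finally:
        # os._exit skips the parent's cleanup handlers inside the child.
        os._exit(exit_code)
```

The caller would pass in whatever function builds the sitemap and send its "job in progress" page as soon as the pid comes back. A host that blocks process creation will block this as well.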


3:44 am on Nov 10, 2008 (gmt 0)

10+ Year Member

I totally understand your comments and agree. We have used fork() on several processes, but the server host has now blocked command line access, and even system command access from inside PHP, because of server loading issues. We have heard that there is software available that will map these large sites without all the memory and system resource usage. This is what we were trying to find.

I really just need to buy a dedicated server so I'd have full control. Until then, I'm trying to find something to get by.

Thanks for responding! Any other thoughts are appreciated.


10:33 pm on Nov 10, 2008 (gmt 0)

5+ Year Member

I have written my own, and it handles around 980K+ URLs well. I use cron and submit the sitemap every six hours. It is memory-consuming because of the heavy SQL usage, but my site updates every minute.
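One way to keep memory use low at this scale is to stream URLs to disk as they are read instead of building the whole sitemap in memory. The sitemaps.org protocol caps a single sitemap file at 50,000 URLs, so a site with several hundred thousand URLs needs several files plus a sitemap index. The following is a rough sketch in Python (not the PHP the site itself uses), and the function name write_sitemaps, the file names, and the example.com URLs are all assumptions for illustration.

```python
import os
from xml.sax.saxutils import escape

MAX_URLS_PER_FILE = 50000  # per-file limit in the sitemaps.org protocol
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def write_sitemaps(urls, out_dir, base_url, max_urls=MAX_URLS_PER_FILE):
    """Stream URLs into numbered sitemap files plus one index file.

    urls can be any iterable (for example rows from a database cursor),
    so only one URL is held in memory at a time.
    Returns the list of sitemap file names written.
    """
    names = []
    out = None
    count = 0
    for url in urls:
        if out is None or count >= max_urls:
            if out is not None:
                out.write("</urlset>\n")
                out.close()
            # Hypothetical naming scheme: sitemap-1.xml, sitemap-2.xml, ...
            name = "sitemap-%d.xml" % (len(names) + 1)
            names.append(name)
            out = open(os.path.join(out_dir, name), "w", encoding="utf-8")
            out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
            out.write('<urlset xmlns="%s">\n' % NS)
            count = 0
        # escape() turns & into &amp; so query strings stay valid XML.
        out.write("  <url><loc>%s</loc></url>\n" % escape(url))
        count += 1
    if out is not None:
        out.write("</urlset>\n")
        out.close()

    # The index file points search engines at each sitemap file.
    index_path = os.path.join(out_dir, "sitemap-index.xml")
    with open(index_path, "w", encoding="utf-8") as idx:
        idx.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        idx.write('<sitemapindex xmlns="%s">\n' % NS)
        for name in names:
            loc = escape(base_url.rstrip("/") + "/" + name)
            idx.write("  <sitemap><loc>%s</loc></sitemap>\n" % loc)
        idx.write("</sitemapindex>\n")
    return names
```

Because the function only iterates, it can be fed directly from an unbuffered database query, and a cron job could call it on whatever schedule suits how often the inventory changes.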


7:51 am on Nov 17, 2008 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

anything you can do to decrease load times for your site will allow google to spider more of your 3rd level pages.
in some cases you can get some efficiency with caching or serving compressed content.
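To give a feel for what compression saves on repetitive HTML, here is a small Python sketch using the standard gzip module. The function name compress_body and the sample markup are made up for illustration. On a live site the web server would normally do the compressing, so this only shows the effect.

```python
import gzip


def compress_body(body, level=6):
    """Return the gzip-compressed form of a response body given as bytes."""
    return gzip.compress(body, compresslevel=level)


# Hypothetical product listing markup; repetitive HTML compresses well.
page = ("<li>Product listing row</li>\n" * 500).encode("utf-8")
packed = compress_body(page)
print("original bytes:", len(page))
print("compressed bytes:", len(packed))
```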
