Forum Library, Charter, Moderators: phranque

Webmaster General Forum

    
Sitemaps For a PHP Site
Any software to smoothly handle 30K + pages?
canthavejust1

10+ Year Member



 
Msg#: 3782775 posted 9:55 pm on Nov 8, 2008 (gmt 0)

Given:

1. Site with 20,000 to 30,000 pages live data.
2. Product inventory constantly changes and is updated daily.
Some pages have info changes, and others are deleted if the product is sold.
3. Levels 1 and 2 of my site structure have maybe fewer than 100 pages. But
at Level 3, the content explodes into thousands of pages.
4. Only about 1/2 the pages have already been spidered and ranked
on a 2 to 3 year old site.
5. Since all of my Level 1 and 2 pages have been found, I feel it's safe to assume
I'll need a sitemap to guide Google and other SE's through the 3rd Level.
6. Most sitemap software times out/runs out of memory before the sitemap can
be completed.

Question:

1. Is there any software that can handle this?

2. How often, if ever, should I update the sitemap?

Thanks for any help you can give!

 

rocknbil

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 3782775 posted 2:17 pm on Nov 9, 2008 (gmt 0)

"6. Most sitemap software times out/runs out of memory before the sitemap can
be completed."

I'm presuming 1) you've modeled your sitemap generator on the samples in Google's Webmaster Tools, and 2) you're requesting the generator from a browser?

Have you tried, or do you have the ability, to run the generator from the command line? This is the best solution, really: a command-line run isn't subject to the web server's request timeout, so it would probably be fine if you can.

If you can't, the way I get around timeouts where it has to be requested from a browser is to use fork(). Fork is pretty simple in concept:

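$pid = fork();
die "fork failed: $!" unless defined $pid;
if ($pid) { &parent_process; }   # parent: fork() returned the child's pid
else { &child_process; }         # child: fork() returned 0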

When fork() is called, a child process is started; this is your sitemap generator. fork() returns the child's process id to the parent and 0 to the child, so the test is true only in the parent, which returns an immediate response to the browser ("job in progress") while the child keeps working.

However, you have to be careful. If you don't understand fork, the parent process can hang waiting for the child to finish, or the child may exit prematurely. One way to approach this is to close the STDOUT filehandle when starting the child, and make the program wait for the child before exiting:

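$pid = fork();
die "fork failed: $!" unless defined $pid;
if ($pid) { &parent_process_do_not_exit; }  # parent: send the response, keep running
else {
close (STDOUT);    # child: let go of the output stream to the browser
&child_process;
exit;              # child stops here and never reaches waitpid
}
waitpid($pid,0);   # parent: wait for the child to finish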

The above is Perl code, but equivalents can be written in most server-side languages.
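
As an illustration of that last point, here is a rough Python sketch of the same fork pattern. It is not the code from this post: run_in_background and build_sitemap are hypothetical names, the output path is made up, and it assumes a Unix-like host where os.fork() is permitted. The parent gets the child's pid back straight away, and the child detaches from the inherited output stream before doing the slow work.

```python
import os
import sys
import tempfile


def run_in_background(job):
    """Fork once: the parent returns the child's pid, the child runs job() and exits."""
    sys.stdout.flush()   # so buffered parent output is not duplicated in the child
    pid = os.fork()      # child's pid in the parent, 0 in the child
    if pid > 0:
        return pid       # parent: free to answer the browser now
    # Child: point stdout at the null device so nothing is kept waiting on it.
    status = 0
    try:
        devnull = os.open(os.devnull, os.O_WRONLY)
        os.dup2(devnull, 1)
        job()
    except Exception:
        status = 1
    finally:
        os._exit(status)  # exit the child without running the parent's cleanup


def build_sitemap():
    # Hypothetical stand-in for the real sitemap generator.
    path = os.path.join(tempfile.gettempdir(), "sitemap-demo.txt")
    with open(path, "w") as fh:
        fh.write("done\n")


if __name__ == "__main__":
    child = run_in_background(build_sitemap)
    print("job in progress, child pid", child)
    os.waitpid(child, 0)  # reap the child so it does not linger as a zombie
```

In a real request handler the "job in progress" response would be sent before the waitpid call, which is only there so the finished child gets reaped.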

canthavejust1

10+ Year Member



 
Msg#: 3782775 posted 3:44 am on Nov 10, 2008 (gmt 0)

I totally understand your comments and agree. We have used fork() on several processes, but the server host has now blocked command line access and even system command access from inside PHP because of server loading issues. We have heard of software available that will map these large sites without all the memory and system resource usage. This is what we were trying to find.

I really just need to buy a dedicated server so I'd have full control. Until then, I'm trying to find something to get by.

Thanks for responding! Any other thoughts are appreciated.

jcodemasters

5+ Year Member



 
Msg#: 3782775 posted 10:33 pm on Nov 10, 2008 (gmt 0)

I have written my own and it handles around 980K+ URLs well. I use cron and submit the sitemap every six hours. It is memory consuming due to lots of SQL usage, but my site updates every minute.
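
For anyone wanting to roll their own along these lines, here is a minimal Python sketch of a generator that keeps memory use flat by writing each URL to disk as it arrives instead of building the whole sitemap in memory. It is not the poster's code: write_sitemaps, the file names and the example.com URLs are all made up for illustration. It starts a new file every 50,000 URLs, the per-file limit in the sitemaps.org protocol, and writes a sitemap index listing the files.

```python
import os
from xml.sax.saxutils import escape

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file URL limit in the sitemaps.org protocol


def write_sitemaps(urls, base_url, out_dir, limit=MAX_URLS_PER_FILE):
    """Stream URLs into numbered sitemap files plus an index; return the file names."""
    names = []
    fh = None
    count = 0
    for url in urls:
        if fh is None or count == limit:
            if fh is not None:           # current file is full: finish it
                fh.write("</urlset>\n")
                fh.close()
            name = f"sitemap-{len(names) + 1}.xml"
            names.append(name)
            fh = open(os.path.join(out_dir, name), "w", encoding="utf-8")
            fh.write('<?xml version="1.0" encoding="UTF-8"?>\n')
            fh.write(f'<urlset xmlns="{NS}">\n')
            count = 0
        # escape() turns & < > into XML entities, as the protocol requires
        fh.write(f"  <url><loc>{escape(url)}</loc></url>\n")
        count += 1
    if fh is not None:
        fh.write("</urlset>\n")
        fh.close()
    # The index file points search engines at every sitemap file written above.
    with open(os.path.join(out_dir, "sitemap-index.xml"), "w", encoding="utf-8") as idx:
        idx.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        idx.write(f'<sitemapindex xmlns="{NS}">\n')
        for name in names:
            idx.write(f"  <sitemap><loc>{escape(base_url + name)}</loc></sitemap>\n")
        idx.write("</sitemapindex>\n")
    return names
```

The urls argument can be any iterator, for example rows fetched one at a time from a database cursor, so a 30K page site fits in a single file and only one URL is held in memory at a time.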

phranque

WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 3782775 posted 7:51 am on Nov 17, 2008 (gmt 0)

Anything you can do to decrease load times for your site will allow Google to spider more of your 3rd level pages.
In some cases you can get some efficiency with caching or serving compressed content.
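
To make the compressed content idea concrete, here is a small Python sketch that gzips a response body only when the client's Accept-Encoding header lists gzip. The function name maybe_compress is made up, and the header parsing is deliberately simplified (it ignores q-values). In practice compression is usually switched on in the web server configuration rather than in application code, so treat this as an illustration of the effect only.

```python
import gzip


def maybe_compress(body: bytes, accept_encoding: str):
    """Return (payload, extra_headers), gzip-compressing when the client allows it."""
    # Simplified parse: take each listed coding and drop any ";q=..." parameter.
    codings = [part.split(";")[0].strip().lower()
               for part in accept_encoding.split(",")]
    if "gzip" in codings:
        headers = {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
        return gzip.compress(body), headers
    return body, {}


if __name__ == "__main__":
    page = b"<li>product listing row</li>\n" * 1000
    payload, headers = maybe_compress(page, "gzip, deflate")
    print(len(page), "bytes before,", len(payload), "bytes after", headers)
```

Repetitive HTML such as product listings tends to compress well, which is why this can cut transfer time for spiders and visitors alike.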
