
Forum Moderators: goodroi


xml sitemaps for large sites

7:03 am on Jan 29, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Oct 15, 2004
posts: 941
votes: 0


Hi all, and thanks in advance for any info/insight you can provide.

We want to create an XML sitemap for our site/portal.
The sitemap will be used mainly by Googlebot.
The problem is that we have several hundred thousand pages.

So, is there a good program/script that is able to create an XML sitemap of that size?

1:52 pm on Jan 29, 2009 (gmt 0)

Administrator from US 

WebmasterWorld Administrator goodroi is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:June 21, 2004
posts:3080
votes: 67


take your pick ...
[google.com...]
[code.google.com...]
2:57 pm on Jan 29, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Oct 15, 2004
posts: 941
votes: 0


Some of them I have tested already (plus many other free scripts that can be run on our server (Apache/PHP)).

Our main concern is the performance of the crawler when spidering almost 1 million pages, and the file size of the XML sitemap.

The question is: have you used any such crawler on a very large site? How did the script perform under constant updates to many hundreds of pages?

We don't mind the cost of the script, as long as we know it will work.
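On file size: the sitemaps.org protocol caps a single sitemap file at 50,000 URLs and 50 MB uncompressed, so a million-page site has to be split into multiple sitemap files tied together by a sitemap index file. A minimal sketch of that splitting (file names, paths, and the `write_sitemaps` helper are hypothetical, not from any particular tool):

```python
# Sketch: split a large URL list into 50,000-URL sitemap files plus a
# sitemap index file, per the sitemaps.org limits (50,000 URLs and
# 50 MB uncompressed per file). Names/paths here are hypothetical.
import math
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50000  # protocol limit per sitemap file

def write_sitemaps(urls, base_url, out_dir="."):
    """Write sitemap-1.xml, sitemap-2.xml, ... and sitemap-index.xml;
    return the number of sitemap files written."""
    n_files = math.ceil(len(urls) / MAX_URLS) or 1
    for i in range(n_files):
        chunk = urls[i * MAX_URLS:(i + 1) * MAX_URLS]
        with open(f"{out_dir}/sitemap-{i + 1}.xml", "w") as f:
            f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
            f.write(f'<urlset xmlns="{SITEMAP_NS}">\n')
            for u in chunk:
                # escape() handles &, < and > in URLs
                f.write(f"  <url><loc>{escape(u)}</loc></url>\n")
            f.write("</urlset>\n")
    # The index file is what you submit; it points the crawler at
    # each individual sitemap file.
    with open(f"{out_dir}/sitemap-index.xml", "w") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write(f'<sitemapindex xmlns="{SITEMAP_NS}">\n')
        for i in range(n_files):
            f.write(f"  <sitemap><loc>{base_url}/sitemap-{i + 1}.xml"
                    "</loc></sitemap>\n")
        f.write("</sitemapindex>\n")
    return n_files
```

With this approach the expensive part is producing the URL list (crawling, or better, querying your own database), not writing the XML; 1 million URLs is only about 20 files.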

2:33 pm on Jan 31, 2009 (gmt 0)

Administrator from US 

WebmasterWorld Administrator goodroi is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:June 21, 2004
posts:3080
votes: 67


Why the concern for performance? This is not something you need to run in real time; it is not even something you need to update every night. You could set up a weekly process that runs in the middle of the night.

You may want to ask yourself what you hope to accomplish by generating a sitemap for a million pages. Simply creating a sitemap does not guarantee that the search engines will index the pages, and it definitely does not guarantee any ranking.
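The weekly off-peak job suggested above is a one-line cron entry; the script path and log location here are hypothetical placeholders:

```shell
# crontab entry: regenerate the sitemap at 3:00 am every Sunday
# (edit with `crontab -e`; fields are min hour day month weekday)
0 3 * * 0 /usr/local/bin/generate_sitemap.sh >> /var/log/sitemap.log 2>&1
```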

5:46 am on Feb 28, 2009 (gmt 0)

Preferred Member

5+ Year Member

joined:Aug 25, 2007
posts:531
votes: 0


Google does seem to like sitemaps, though; they are there for a purpose. omoutop, try breaking your site up into smaller pieces to create the sitemap; that has helped me.

Take care.

12:30 am on Mar 27, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 20, 2004
posts:875
votes: 2


have you used any such crawler on a very large site?

I run one nightly with over 5,000,000 pages.

How did the script perform under constant updates to many hundreds of pages?

Poorly. It's written in Python and does some very stupid things. You can't even exclude entire directories from the crawl (although you can exclude them from the output).

12:31 am on Mar 27, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 20, 2004
posts:875
votes: 2


it is not even something you need to update every night

That depends on how many new pages get created in a day.