
Forum Moderators: goodroi


xml sitemaps for large sites

     
7:03 am on Jan 29, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi all, and thanks in advance for any info/insight you can provide me.

We want to create an xml sitemap for our site/portal.
The xml sitemap will be used for the googlebot (mainly)
Problem is that we have some hundreds of thousands of pages.

So, is there a good program/script able to create an XML sitemap of that size?
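For reference, the format we're after is the standard sitemaps.org one, one <url> entry per page. The protocol caps each file at 50,000 URLs and 10MB uncompressed, so a site our size will need many files plus an index. A minimal file looks like this (example.com is a placeholder):

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://www.example.com/some-page.html</loc>
        <lastmod>2009-01-28</lastmod>
      </url>
    </urlset>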

1:52 pm on Jan 29, 2009 (gmt 0)

WebmasterWorld Administrator goodroi



take your pick ...
[google.com...]
[code.google.com...]
2:57 pm on Jan 29, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Some of them I have tested already (plus many other free scripts that can be run on our server (Apache/PHP)).

Our main concern is the performance of the crawler when spidering almost 1 million pages, and the file size of the resulting XML sitemap.

The question is: have you used any such crawler for a very large site? How did the script perform under constant updates to many hundreds of pages?

We don't mind the cost of the script, as long as we know that it will work.
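To make the concern concrete, the shape of what we need is roughly this, assuming the URL list can be pulled straight from our database instead of crawled (function and file names are only illustrative):

    import gzip
    from xml.sax.saxutils import escape

    MAX_URLS = 50000  # per-file cap from the sitemaps.org protocol

    def write_sitemaps(urls, out_dir, base_url):
        # Split a flat URL list into gzipped 50k-URL sitemap files,
        # then write one index file pointing at all of them.
        names = []
        for i in range(0, len(urls), MAX_URLS):
            name = "sitemap-%d.xml.gz" % (i // MAX_URLS + 1)
            with gzip.open("%s/%s" % (out_dir, name), "wt") as f:
                f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
                f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
                for url in urls[i:i + MAX_URLS]:
                    f.write("  <url><loc>%s</loc></url>\n" % escape(url))
                f.write("</urlset>\n")
            names.append(name)
        with open("%s/sitemap-index.xml" % out_dir, "w") as f:
            f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
            f.write('<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
            for name in names:
                f.write("  <sitemap><loc>%s/%s</loc></sitemap>\n" % (base_url, name))
            f.write("</sitemapindex>\n")

Generating from the database this way would sidestep the crawl entirely; writing a million <loc> lines is minutes of I/O, not hours of spidering.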

2:33 pm on Jan 31, 2009 (gmt 0)

WebmasterWorld Administrator goodroi



why the concern for performance? this is not something you need to run in real time. it is not even something you need to update every night. you could set up a weekly process that runs in the middle of the night.
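for example, one crontab line is all the scheduling it takes (paths here are just placeholders):

    # regenerate the sitemaps every sunday at 03:15
    15 3 * * 0 /usr/bin/python /var/www/tools/generate_sitemaps.py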

you may want to ask yourself what you hope to accomplish by generating a sitemap for a million pages. simply creating a sitemap does not guarantee the search engines will index the pages, and it definitely does not guarantee any ranking.

5:46 am on Feb 28, 2009 (gmt 0)

5+ Year Member



Google might like sitemaps, though; the feature is there for a purpose. omoutop, try breaking your site up into smaller pieces to create the sitemap; that approach has helped me.
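That is effectively what the protocol's sitemap index is for: each piece stays under the 50,000-URL cap, and one index file ties them together. Something like this, with the filenames as placeholders:

    <?xml version="1.0" encoding="UTF-8"?>
    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <sitemap>
        <loc>http://www.example.com/sitemap-products.xml.gz</loc>
      </sitemap>
      <sitemap>
        <loc>http://www.example.com/sitemap-articles.xml.gz</loc>
      </sitemap>
    </sitemapindex>

You submit just the index to Google Webmaster Tools and it picks up the pieces from there.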

Take care.

12:30 am on Mar 27, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



have you used any such crawler for a very large site?

I run one nightly on a site with over 5,000,000 pages.

How did the script perform under constant updates to many hundreds of pages?

Poorly. It's written in Python and does some very stupid things. You can't even exclude entire directories from the crawl (although you can exclude them from the output).
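If you end up post-processing a URL list yourself, the output-side exclusion at least is trivial; a sketch (directory prefixes are just examples):

    # drop anything under directories we don't want in the sitemap
    EXCLUDED = ("/admin/", "/search/", "/print/")
    urls = [u for u in urls if not any(p in u for p in EXCLUDED)]

The real cost is that the crawler has already fetched those pages by then, which is exactly the waste that crawl-time exclusion would avoid.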

12:31 am on Mar 27, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



it is not even something you need to update every night

That depends on how many new pages get created in a day.

 
