We want to create an XML sitemap for our site/portal.
The XML sitemap will be used (mainly) for Googlebot.
The problem is that we have several hundred thousand pages.
So, is there a good program/script able to create an XML sitemap of that size?
Our main concern is the performance of the crawler when spidering almost 1 million pages, and the file size of the XML sitemap (see the sketch below for the kind of chunking we have in mind).
The question is: have you used such a crawler on a very large site? How did the script perform under constant updates to many hundreds of pages?
We don't mind the cost of the script, as long as we know it will work.
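For reference, the sitemap protocol caps each file at 50,000 URLs (with an uncompressed size cap as well), so at our scale we would need a sitemap index pointing at many child sitemap files. A minimal sketch of that chunking in Python; the get_all_urls() source, the file names, and the example.com base URL are placeholders, not any particular product:

import gzip
from datetime import date
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50000  # per-file limit from the sitemap protocol

def write_sitemaps(urls, base_url="https://example.com"):
    """Write gzipped sitemap chunks plus an index file referencing them."""
    chunk_files = []
    for chunk_no, start in enumerate(range(0, len(urls), MAX_URLS), 1):
        filename = f"sitemap-{chunk_no}.xml.gz"
        with gzip.open(filename, "wt", encoding="utf-8") as f:
            f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
            f.write(f'<urlset xmlns="{SITEMAP_NS}">\n')
            for url in urls[start:start + MAX_URLS]:
                # escape() guards against &, < and > in URLs breaking the XML
                f.write(f"  <url><loc>{escape(url)}</loc></url>\n")
            f.write("</urlset>\n")
        chunk_files.append(filename)
    # The index file is what gets submitted; it points at all the chunks.
    with open("sitemap-index.xml", "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write(f'<sitemapindex xmlns="{SITEMAP_NS}">\n')
        today = date.today().isoformat()
        for filename in chunk_files:
            f.write(f"  <sitemap><loc>{base_url}/{filename}</loc>"
                    f"<lastmod>{today}</lastmod></sitemap>\n")
        f.write("</sitemapindex>\n")

# write_sitemaps(get_all_urls())  # get_all_urls() is hypothetical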
You may want to ask yourself what you hope to accomplish by generating a sitemap for a million pages. Simply creating a sitemap does not guarantee the search engines will index the pages, and it definitely does not guarantee any ranking.
Have you used such a crawler on a very large site?
I run one nightly on a site with over 5,000,000 pages.
How did the script perform under constant updates to many hundreds of pages?
Poorly. It's written in Python and does some very stupid things. You can't even exclude entire directories from the crawl (although you can exclude them from the output).
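What I'd want instead is crawl-time exclusion: drop URLs under excluded directories before fetching them, rather than crawling everything and filtering the output. A sketch of the idea, not the actual script (the excluded paths are made up):

from urllib.parse import urlparse

EXCLUDED = ("/archive/", "/print/")  # hypothetical directories to skip

def should_crawl(url):
    """Return False for any URL whose path falls under an excluded directory."""
    path = urlparse(url).path
    return not path.startswith(EXCLUDED)

# In the crawl loop, filter discovered links up front so excluded
# directories never consume fetch time:
# frontier.extend(link for link in discovered if should_crawl(link))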