xml sitemaps for large sites
omoutop
7:03 am on Jan 29, 2009 (gmt 0)

Hi all, and thanks in advance for any info/insight you can provide.

We want to create an xml sitemap for our site/portal.
The xml sitemap will be used mainly for Googlebot.
The problem is that we have several hundred thousand pages.

So, is there a good program/script able to create an xml sitemap of that size?

 

goodroi
1:52 pm on Jan 29, 2009 (gmt 0)

take your pick ...
https://www.google.com/webmasters/tools/docs/en/sitemap-generator.html
[code.google.com...]

omoutop
2:57 pm on Jan 29, 2009 (gmt 0)

Some of them I have tested already (plus many other free scripts that can be run on our server (apache/php)).

Our main concern is the performance of the crawler when spidering almost 1 million pages, and the file size of the xml sitemap.

The question is: have you used any such crawler for a very large site? How did the script perform under constant updates to many hundreds of pages?

We don't mind the cost of the script, as long as we know that it will work.

goodroi
2:33 pm on Jan 31, 2009 (gmt 0)

why the concern for performance? this is not something you need to run in real time. it is not even something you need to update every night. you could set up a weekly process that runs in the middle of the night.

you may want to ask yourself what you hope to accomplish with generating a sitemap for a million pages. simply creating a sitemap does not guarantee the search engines will index the pages and it definitely does not guarantee any ranking.

CWebguy
5:46 am on Feb 28, 2009 (gmt 0)

Google does seem to like sitemaps, though; they are there for a purpose. omoutop, try breaking your site up into smaller pieces and creating a sitemap for each one; it has helped me.

Take care.
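Splitting is also what the sitemaps.org protocol expects at this scale: a single sitemap file may hold at most 50,000 URLs, and a sitemap index file ties the individual files together. Below is a minimal sketch of that approach in Python, assuming the URL list is already available in a flat file (for a database-driven portal it can usually be dumped directly, which sidesteps the crawling-performance question); the file names and paths are made up for illustration, and the script is the sort of thing that could be run from a weekly overnight cron job, as goodroi suggests.

from xml.sax.saxutils import escape
import datetime

MAX_URLS = 50000  # sitemaps.org limit per sitemap file
SITEMAP_BASE = "http://www.example.com/sitemaps/"  # assumption: where the files will be served from

def write_sitemap(path, urls):
    # write one <urlset> file containing the given URLs
    with open(path, "w") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
        for url in urls:
            f.write('  <url><loc>%s</loc></url>\n' % escape(url))
        f.write('</urlset>\n')

# urls.txt: one full URL per line, e.g. dumped straight from the portal's database
with open("urls.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

# split into chunks of at most 50,000 URLs and write one sitemap file per chunk
chunks = [urls[i:i + MAX_URLS] for i in range(0, len(urls), MAX_URLS)]
for n, chunk in enumerate(chunks, 1):
    write_sitemap("sitemap-%d.xml" % n, chunk)

# sitemap index pointing at the individual files; this is the one URL
# that gets submitted to the search engines
today = datetime.date.today().isoformat()
with open("sitemap-index.xml", "w") as f:
    f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    f.write('<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
    for n in range(1, len(chunks) + 1):
        f.write('  <sitemap><loc>%ssitemap-%d.xml</loc><lastmod>%s</lastmod></sitemap>\n'
                % (SITEMAP_BASE, n, today))
    f.write('</sitemapindex>\n')

Splitting at 50,000 URLs per file also keeps each file well under the protocol's uncompressed size limit, and the individual files may be gzipped to save bandwidth when they are served.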

eeek
12:30 am on Mar 27, 2009 (gmt 0)

"have you used any such crawler for a very large site?"

I run one nightly with over 5,000,000 pages.

"How did the script perform under constant updates to many hundreds of pages?"

Poorly. It's written in Python and does some very stupid things. You can't even exclude entire directories from the crawl (although you can exclude them from the output).
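The output-side exclusion eeek describes amounts to filtering the crawled URL list just before the sitemap is written; the crawler still has to fetch everything underneath the excluded directories. A rough illustration of that kind of filter (this is not the generator's actual configuration, and the path prefixes are invented):

# path prefixes to drop from the sitemap output (invented for illustration)
EXCLUDED_PREFIXES = ("/admin/", "/search/", "/print/")

def keep(path):
    # path is the URL path portion, e.g. "/articles/123.html"
    return not path.startswith(EXCLUDED_PREFIXES)

crawled = ["/articles/1.html", "/admin/login.php", "/search/?q=foo"]
for path in filter(keep, crawled):
    print(path)  # only /articles/1.html makes it into the sitemap output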

eeek
12:31 am on Mar 27, 2009 (gmt 0)

"it is not even something you need to update every night"

That depends on how many new pages get created in a day.
