|70,000 page site & sitemap options|
We have a directory site with just over 70,000 pages, but in 10 years we have never had a sitemap on there.
I was hoping to get some input on the best way to do this. Should we have just one large XML sitemap, or is it best to build our own HTML one? I worry a little about a huge XML feed, but am I right to worry about this?
The second question is about how best to release a new version of one of our job sites.
This other site currently has around 5,000 pages. The new version has a big directory added to it, but I am unsure how best to release this. Should we do, say, 2,000 pages per week or per month, or all 70,000 pages in one go?
What I don't want to happen is for Google or any other SE to suddenly find all of these new pages and then penalise us for it.
If anyone has any insights into these 2 points, I will be very grateful to hear about them.
Google limits a single sitemap file to 50,000 URLs (and 50 MB uncompressed).
So you would need to split your data into multiple files.
It's not an answer to what you were asking, but just keep that bit of info in mind.
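To make the splitting concrete, here is a minimal sketch in Python that chunks a URL list into sitemap files of at most 50,000 URLs and writes a sitemap index pointing at them. The filenames and the example.com domain are made up for illustration; a real script would also set `<lastmod>` etc.

```python
# Sketch: split a URL list into sitemap files (max 50,000 URLs each,
# per the protocol limit) and write a sitemap index referencing them.
# Filenames and the example domain are illustrative only.

MAX_URLS = 50000

def write_sitemaps(urls, base="https://www.example.com/", max_urls=MAX_URLS):
    chunks = [urls[i:i + max_urls] for i in range(0, len(urls), max_urls)]
    names = []
    for n, chunk in enumerate(chunks, start=1):
        name = "sitemap-%d.xml" % n
        with open(name, "w") as f:
            f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
            f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
            for url in chunk:
                f.write("  <url><loc>%s</loc></url>\n" % url)
            f.write("</urlset>\n")
        names.append(name)
    # Index file pointing at each child sitemap; submit only this one.
    with open("sitemap-index.xml", "w") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write('<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
        for name in names:
            f.write("  <sitemap><loc>%s%s</loc></sitemap>\n" % (base, name))
        f.write("</sitemapindex>\n")
    return names
```

With 70,000 URLs and the default limit this produces two child sitemaps plus the index, which is the only file you need to submit.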
OK thanks for that :)
One way you could do this, using A1 Sitemap Generator or any other sitemapper program that supports splitting sitemaps at custom URL count intervals, would be...
Split sitemap generation at e.g. 2000 URLs per file.
You will then just have to edit the generated sitemap index file a little. And then add a reference to one new sitemap-xyz.xml file every day or week or whatever :)
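The "add one new sitemap file to the index each period" idea above can be sketched with the standard library; the index path and sitemap URL here are illustrative placeholders.

```python
# Sketch: append one new child sitemap entry to an existing sitemap
# index file each release cycle. Filenames/URLs are illustrative.
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def add_to_index(index_path, sitemap_url):
    # Keep the default sitemap namespace instead of an ns0: prefix.
    ET.register_namespace("", NS)
    tree = ET.parse(index_path)
    root = tree.getroot()
    # Append <sitemap><loc>...</loc></sitemap> to the index.
    entry = ET.SubElement(root, "{%s}sitemap" % NS)
    loc = ET.SubElement(entry, "{%s}loc" % NS)
    loc.text = sitemap_url
    tree.write(index_path, xml_declaration=True, encoding="UTF-8")
```

Run it once per day/week with the next chunk's URL, and the search engines discover the new pages at the pace you choose.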
|Should we have just one large XML sitemap, or is it best to build our own HTML one? |
As bcc1234 said, 50,000 is the limit, so two files is fine for the XML side, or you could break them down smaller if you like. As to the second part of the question, I usually use both.
|I worry a little about a huge XML feed, but am I right to worry about this? |
I wouldn't. Do what works best for you.
If it's a site you're going to be adding to often, you might look into something that automates the creation of them. I'm sure you can find something you could run on a CRON job to create the files. Personally, though, I don't like running bots on my sites to create the files, which is generally what you have to do with 'off the shelf' software. What I do instead is use one that creates them dynamically when they are requested, rather than having them be static.
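A minimal sketch of that "generate on request" approach, written as a bare WSGI app so it needs no framework (it would sit behind gunicorn, mod_wsgi, etc.); `get_urls()` is a hypothetical stand-in for whatever database query your directory uses.

```python
# Sketch: serve /sitemap.xml dynamically instead of keeping a static
# file. get_urls() is a hypothetical placeholder for a DB query.

def get_urls():
    # Placeholder: a real site would query its directory database here.
    return ["https://www.example.com/listing/1",
            "https://www.example.com/listing/2"]

def sitemap_app(environ, start_response):
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for url in get_urls():
        lines.append("  <url><loc>%s</loc></url>" % url)
    lines.append("</urlset>")
    body = "\n".join(lines).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/xml"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

The upside is the sitemap is always current with no bot crawling your own site; the downside is each request costs a database query, so you may want to cache the output.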
Google limits a sitemap to 50,000 URLs or 50 MB uncompressed. So for a large site like a directory, you have to segregate your URLs by category, make a different XML sitemap for each category, and submit them in Google Webmaster Tools.
Then make a sitemap index file that lists each of the other sitemaps' addresses, and submit that in Google Webmaster Tools as well. I think this is the best process for indexing a huge number of URLs. If there are other processes, let me know.
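For reference, that sitemap index file is just a small XML document listing the child sitemaps; the filenames and domain below are made-up examples:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap-restaurants.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-hotels.xml</loc>
  </sitemap>
</sitemapindex>
```

You submit the index once, and the search engines fetch the per-category files from it.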
I'm assuming those 70,000 URLs are not all the same type of page, like a product page for example. If that's the case, consider breaking up the list of URLs into logical groups and submitting them in separate files.
This can help in tracking sitemap indexing stats per group, as well as make it easier to adjust the other options such as <changefreq> and <lastmod>.
You may also want to read this Google paper which also gives examples of how CNN and Amazon structure their sitemaps: "Sitemaps: Above and Beyond the Crawl of Duty" [www2009.eprints.org...]