
Is it possible to submit millions of unique pages to be indexed twice daily?


gaiadata

7:45 pm on Nov 25, 2008 (gmt 0)

10+ Year Member



Hi all

One of our sites is in the following form:
domain.com/default.asp?pid=15&gid=264371&la=2

The number of possible values is:
pid: 3 (there are actually more pages, but only 3 are important for now)
gid: 5,000,000 (worldwide regions)
la: 20 (languages)

So we are talking about submitting:
3 × 5,000,000 × 20 = 300 million pages

All pages include absolutely original content (regional in nature) which updates every 12 hours or so.

We want Google to index them all :)
Does anybody think this is possible?

Thanks

Lame_Wolf

9:19 pm on Nov 25, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Does anybody think this is possible?

No.
Others might, but I don't.

anallawalla

9:43 pm on Nov 25, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Short answer: no. Indexing 300 million pages would take 6-12 months, if it happened at all. It took us 3 months to get 2.4 million pages indexed (with no sitemap.xml).

I have just finished writing the specs of a smart sitemap.xml updating routine where only sitemaps referencing changed pages are updated, thus improving the chances of changed or new pages getting crawled.
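
In rough Python terms, the idea looks something like this (a sketch only; the bucketing scheme, file layout, and changed-URL feed here are illustrative, not the actual spec):

import os, time, hashlib
from xml.sax.saxutils import escape

SITEMAP_DIR = "sitemaps"  # holds sitemap-0000.xml etc., referenced from one sitemap index

def bucket(url, n_maps=1000):
    # stable URL-to-file assignment: a changed page always dirties the same file
    return int(hashlib.md5(url.encode()).hexdigest(), 16) % n_maps

def regenerate(changed_urls, urls_by_bucket):
    os.makedirs(SITEMAP_DIR, exist_ok=True)
    today = time.strftime("%Y-%m-%d")
    dirty = {bucket(u) for u in changed_urls}  # only these files get rewritten
    for b in sorted(dirty):
        path = os.path.join(SITEMAP_DIR, "sitemap-%04d.xml" % b)
        with open(path, "w") as f:
            f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
            f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
            for u in urls_by_bucket[b]:
                f.write("  <url><loc>%s</loc><lastmod>%s</lastmod></url>\n" % (escape(u), today))
            f.write("</urlset>\n")
    return dirty  # the sitemap index then only needs fresh <lastmod> dates on these entries

The point is that unchanged sitemap files keep their old lastmod, so the crawler can concentrate on the few files that actually moved.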

As a matter of interest, how long does your sitemap.xml generation process take to run?

wheel

9:55 pm on Nov 25, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Nope. I don't think even Wikipedia or very popular forums manage to pull this kind of thing off, do they? If they do, it's because they've got the backlinks.

And that's the kicker. It's not the number of pages; it's the authority of the site that increases the probability of more pages being indexed. I've got a dead site with 32,000 pages, but only 280 indexed :).

The only thing I would suggest is that you look into something like Pingomatic. Perhaps that will help increase crawling.
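
The ping itself is just a tiny XML-RPC call, something like this in Python (the endpoint and method follow the standard weblogUpdates interface that Pingomatic accepts; swap in your own site name and URL):

import xmlrpc.client

# Pingomatic listens for standard weblogUpdates pings at this endpoint
server = xmlrpc.client.ServerProxy("http://rpc.pingomatic.com/")
response = server.weblogUpdates.ping("My Site", "http://www.example.com/")

# by convention the reply is a struct with 'flerror' and 'message' fields
print(response.get("message"))

Bear in mind a ping only announces a change; it doesn't force a crawl.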

jimbeetle

10:18 pm on Nov 25, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yep, you have to get the links. And PR dilution among that number of pages is going to make it very difficult to make even a very, very small fraction stick in the main index, much less get frequently crawled.

HuskyPup

12:15 am on Nov 26, 2008 (gmt 0)



So we are talking about submitting:
3 × 5,000,000 × 20 = 300 million pages

Call me cynical, but just how is this possible:

All pages include absolutely original content (regional in nature) which updates every 12 hours or so.

Meaning:

600 million page updates per day, or
4,200 million per week, or
18,000 million per 30-day month, or
219,000 million per year

I suggest you read this:

[googleblog.blogspot.com...]

You need your own search engine :-)

CainIV

6:54 am on Nov 26, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You would also likely need a PageRank of 9 or 10 to get even a fraction of those pages indexed with fresh cache dates.

gaiadata

7:28 am on Nov 26, 2008 (gmt 0)

10+ Year Member



Thanks a lot for your answers.
I guess I was not very clear on something. Although our pages update every 12 hours, we do not need Google to index them twice daily. I only mentioned that to make the point that all pages are "live", so to speak, rather than archived with unchanging content.
What we really need is for the title of each page to be indexed, so that it can appear in relevant search results.
If I understand correctly, Google allows submitting 1,000 sitemaps with 50,000 URLs in each, a total of 50 million URLs. This comes close to our number, so I was wondering whether anyone else has ever submitted numbers like this, and whether they succeeded in getting millions of URLs indexed.
We haven't yet submitted sitemaps, but in 1.5 years of online presence Google has indexed slightly more than 1 million of our pages without any effort on our part.
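
For reference, the partitioning under those limits would look roughly like this (a minimal Python sketch; the file names and base URL are placeholders):

import math
from xml.sax.saxutils import escape

MAX_URLS = 50000  # URLs per sitemap file (sitemaps.org limit)
MAX_MAPS = 1000   # sitemaps per index, per the limit described above

def write_sitemaps(urls, base_url="http://www.example.com/sitemaps/"):
    n = math.ceil(len(urls) / MAX_URLS)
    if n > MAX_MAPS:
        raise ValueError("%d URLs would need %d sitemaps, over the %d cap" % (len(urls), n, MAX_MAPS))
    for i in range(n):
        chunk = urls[i * MAX_URLS:(i + 1) * MAX_URLS]
        with open("sitemap-%04d.xml" % i, "w") as f:
            f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
            f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
            for u in chunk:
                f.write("  <url><loc>%s</loc></url>\n" % escape(u))
            f.write("</urlset>\n")
    # one index file pointing at all of the above
    with open("sitemap-index.xml", "w") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write('<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
        for i in range(n):
            f.write("  <sitemap><loc>%ssitemap-%04d.xml</loc></sitemap>\n" % (base_url, i))
        f.write("</sitemapindex>\n")

At 50,000 URLs per file, our existing 1 million indexed pages would already mean 20 sitemap files; 50 million would max out the index.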