Forum Moderators: Robert Charlton & goodroi

Message Too Old, No Replies

How can I index more than 5 million pages?

I want to index more than 5 million pages in Google

         

mianoor

6:52 pm on Nov 23, 2006 (gmt 0)

10+ Year Member



Hello,

My client has a website with more than 5 million pages, and he wants me to get his site indexed on Google/Yahoo.

Can anyone guide me on how I can get all the pages indexed?

What should I do, or what tool should I use?

I don't want a spam approach, just white hat.

Regards,

Mianoor

Pirates

1:22 am on Nov 24, 2006 (gmt 0)



It's difficult for me to conceive of a 5 million page website that is in need of your help, unless it's full of spam, in which case I'm not interested.

egurr

1:26 am on Nov 24, 2006 (gmt 0)

10+ Year Member



If it were easy, everyone would do it. I have a potential client who wants about twice that indexed. The site already has about 100,000 pages indexed, but all but 1,000 are in the supplementals. The thing is, for most terms supplemental is fine; the targeted keyword still ranks.
To get that many pages ranked is essentially about a 2 million dollar plus project. You can still do it with ColdFusion, PHP, etc., but you need very big arrays and very tight code.

theBear

1:31 am on Nov 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"You can still do it with ColdFusion, PHP, etc., but you need very big arrays and very tight code."

Huh, could you enlighten me about these very big arrays, etc.?

Pirates

1:31 am on Nov 24, 2006 (gmt 0)



Yeah I know the answer to this. Nahh not gonna post it. Anyway good luck.

sun818

1:38 am on Nov 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



How does a site like eBay or Amazon do it?

Pirates

1:40 am on Nov 24, 2006 (gmt 0)



badly

Well, eBay owns shopping sites like shopping.com and dealtime.co.uk that do it better, but they mostly rely on serving and optimising two pages at once. For instance, "dealtime" would be one phrase and "dealtime-" would be a second.

[edited by: Pirates at 1:46 am (utc) on Nov. 24, 2006]

Leosghost

1:42 am on Nov 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



personally I would have said "inelegantly" ;-)

Pirates

1:49 am on Nov 24, 2006 (gmt 0)



Amazon, on the other hand, just do it through quality. So in brief: eBay = wan'kers, Amazon = good.

[edited by: Pirates at 1:50 am (utc) on Nov. 24, 2006]

vite_rts

1:59 am on Nov 24, 2006 (gmt 0)

10+ Year Member



Last time I looked, eBay, one of the top 10 sites in the world, had

oh, a grand total of 1,900 pages indexed by Google. Amazing, eh?

Pirates

2:05 am on Nov 24, 2006 (gmt 0)



Shame Google doesn't treat with the same disdain the shopping sites they own, which are preventing genuine sites from listing with their #*$! results; sites like dealtime.

theBear

2:08 am on Nov 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Currently, an eBay site: search returns an estimated count of over 120,000,000 pages.

[edited by: theBear at 2:08 am (utc) on Nov. 24, 2006]

tedster

2:16 am on Nov 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The key, I think, is to do it gradually, and double-check Google's response at every step.

I'm currently working with a site that has 400,000 URLs indexed, up from 200,000. Their newly redesigned domain actually has over 30 million URLs, but we will not "expose" them to googlebot all at once. Clearly, if MSN got nuked for a mere 10 million at once, then even a respected business with that many URLs, all designed for visitors, will still have a problem dumping them all at once.

There's a lot of detailed planning required just to be able to expose these new URLs gradually -- including the link structure as well as the savvy use of URL rewriting and robots.txt. (I'm getting paid OK for the job, but not nearly 2 million. I can only wish!)
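The staged-exposure idea above can be sketched concretely with the Sitemaps protocol, which caps each sitemap file at 50,000 URLs: publish only a slice of the full URL list now, then widen the slice once indexing is verified. This is a minimal illustrative Python sketch; the example.com URLs, batch size, and rollout numbers are all assumptions, not details from this thread.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50000  # limit imposed by the Sitemaps protocol

def build_sitemaps(urls, batch_size=MAX_URLS_PER_FILE):
    """Split a list of URLs into sitemap XML documents,
    at most batch_size URLs per document."""
    docs = []
    for start in range(0, len(urls), batch_size):
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for url in urls[start:start + batch_size]:
            entry = ET.SubElement(urlset, "url")
            ET.SubElement(entry, "loc").text = url
        docs.append(ET.tostring(urlset, encoding="unicode"))
    return docs

# Hypothetical staged rollout: expose only the first slice of the
# full catalog now; widen the slice after checking the response.
all_urls = ["http://www.example.com/product/%d" % i for i in range(12000)]
exposed = all_urls[:5000]  # this stage's slice
for n, doc in enumerate(build_sitemaps(exposed, batch_size=2500)):
    print("sitemap_%d.xml: %d URLs" % (n, doc.count("<loc>")))
```

Each generated file would then be linked from a sitemap index, so only the exposed slice is ever visible to the crawler.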

for most terms supplemental is fine

That's an important insight. It's especially true because the URLs can still be returned for the long tail searches, which is all you need for that kind of deep content. There's little value in millions of URLs if the key "fat belly" search terms start to tank.

g1smd

2:24 am on Nov 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month




>> My client have more 5 million pages website. <<

Does a site really need to get that many pages indexed?

I mean, does a shopping site really need to get every product indexed with every colour variation, every size variation, every design variation, every different manufacturer variation, every purchase-quantity variation, and every other type of variation, each published as an individual page?

I think not.

.

Additionally, I am troubled by the fact that you have seemingly taken this job on, and are getting paid for it, and then you have to ask in the forum what to do. Does the client know that you don't know what to do?

[edited by: g1smd at 2:30 am (utc) on Nov. 24, 2006]

Pirates

2:26 am on Nov 24, 2006 (gmt 0)



Supplemental can be a tag on unchecked pages, I think. But I would like to put the focus on eBay and the shopping sites they own, and their misuse of their database to create listings on shopping sites: epinions, shopping.com, dealtime, to name a few.

Pirates

2:35 am on Nov 24, 2006 (gmt 0)



Additionally, I am troubled by the fact that you have seemingly taken this job on, and are getting paid for it, and then you have to ask in the forum what to do. Does the client know that you don't know what to do?

1. I have not taken the job on
2. If I did I could probably do it without help. But if I needed forum help I would ask for it as there are some great people here.

[edited by: jatar_k at 3:13 am (utc) on Nov. 24, 2006]

Whitey

2:44 am on Nov 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



To get that many pages ranked is essentially about a 2 million dollar plus project.

Why do you believe that it will take this amount of financial resources to get it going?

Fryman

3:20 am on Nov 24, 2006 (gmt 0)

10+ Year Member



Agreed, what does money have to do with getting a site indexed?

jatar_k

3:25 am on Nov 24, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



I'm trying to figure out how a 5 million page site isn't indexed.

new site?
bad history, already got removed?
URLs with query strings so strung out that a bot can't follow them?

mianoor, a bit more info about why it isn't indexed would help get better answers. Unless you mean ranking it, which is something altogether different.

As Ted said, a gradual approach is the best way.

vite_rts

3:25 am on Nov 24, 2006 (gmt 0)

10+ Year Member



@ theBear

Hi there, just a question: how did you do your page count?

site:www.ebay.com yields at most 23,000 on Yahoo, 1,500 on MSN, and 1,900 on Google.

I guess I am missing quite a lot of stuff here; I will have to amend some techniques.

ashear

3:26 am on Nov 24, 2006 (gmt 0)

10+ Year Member



There are a lot of dependencies here.

Since I am in charge of optimizing some of the sites you are talking about I know this quite well.

ashear

3:29 am on Nov 24, 2006 (gmt 0)

10+ Year Member



@ theBear
Hi there, just a question: how did you do your page count?

site:www.ebay.com yields at most 23,000 on Yahoo, 1,500 on MSN, and 1,900 on Google.

I guess I am missing quite a lot of stuff here; I will have to amend some techniques.

Try site:ebay.com; eBay uses subdomains. When you invoke a site:www.ebay.com query, you only see pages indexed under www.

vite_rts

3:39 am on Nov 24, 2006 (gmt 0)

10+ Year Member



ahh, 97,000,000 on yahoo

theBear

3:41 am on Nov 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Remove the www ....

mianoor

7:51 am on Nov 24, 2006 (gmt 0)

10+ Year Member



The site uses ColdFusion and has a separate page for each product they manufacture.

jatar_k

8:07 am on Nov 24, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Sounds like the answer is mod_rewrite, or whatever rewrite option you have available on your server.
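For a ColdFusion site whose product pages live behind query strings, that could look something like the following .htaccess sketch. The path pattern, template name, and parameter are purely illustrative assumptions, not taken from this thread.

```apache
# Illustrative only: expose a crawlable path like /product/12345
# and rewrite it internally to a hypothetical ColdFusion
# template's query-string form.
RewriteEngine On
RewriteRule ^product/([0-9]+)/?$ /product.cfm?id=$1 [L,QSA]
```

Internal links would then point at the clean /product/12345 form, so the bot never sees the long query strings.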

idolw

9:21 am on Nov 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just get links to the internal pages.
What is the site about?

briggidere

9:42 am on Nov 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



vite,

I get 1,830 showing in Google's main index and 2,620 with the supplementals, so I don't think you are missing anything.

I have a couple of questions for people with sites that have millions of pages. Are they all selling things, with millions of products? Who has written the content for the millions of pages, and is it valuable, unique content?

Unless there is a massive product database, I don't see how people could have such a large site.

If you were able to write 20 pages of content a day and you wanted a 1,000,000 page site, it would take you about 137 years to get the content done. How can anyone do this? Even if you wrote 100 pages a day, it would still take 27 years. That said, this assumes one person doing it, not 100 people, so I know it's achievable; but if anyone who's running these sites can give a bit more detail, I'd appreciate it.

tedster

10:32 am on Nov 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



One site that I'm working with is the online version of a long-established brick & mortar business, plus their mail order catalog. They have a huge staff and a very technical product line, with decades' worth of valuable content built up. In their field, the print catalog is considered a near "bible" for aficionados, with detailed overviews of the technologies involved and guides to product selection and differentiation.

They start with a 250,000-SKU database, without even counting category, sub-category, and sub-sub-category pages, comparison pages, product details, online manuals, and so on. There are many businesses like this learning to make good use of the web, in many cases playing catch-up.

There's no doubt that their existing customers now want to be able to access all this information online, and I'm sure new customers will too. The challenge is to avoid presenting a profile to the search engines that looks like a spam play while all this is made available.

gpmgroup

10:50 am on Nov 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Unless there is a massive product database i don't see how people could have such a large site.

Really?

We've just built a site with over a million pages, and it isn't a product database; and yes, before you ask, all the pages are unique (not just slicing and dicing).

This 39-message thread spans 2 pages.