My question is this: how many pages are too many? Do you get penalized if you have too many? What I am planning on doing is adding extra information to my book price comparison site "removed". The new info I will be adding will be info from an open source encyclopedia, dictionary and thesaurus. If the user chooses the extra-info form of the search, they will get a page something like this mockup "removed" with the extra info on each side. They will also be able to browse the encyclopedia, dictionary and thesaurus to find articles on authors, related words to search with, etc. Between the encyclopedia with 200,000-plus entries, the dictionary, the thesaurus, the browsable book database (also on the way) and the authors database (also on the way), I should be creating well over 500,000 dynamic pages for crawling when it's all done. Most pages will have info from the other sources and will be optimized as best I can figure out how to do, including using mod_rewrite to make nice friendly URLs.
Now is this a good idea?
I keep forgetting about removing links; the other forums I use don't have that rule.
[edited by: Sulla at 6:02 am (utc) on Aug. 18, 2003]
It may not be a good idea to have that many indexed regularly anyway because it will eat up a ton of bandwidth.
If I were in your position, I would use mod_rewrite to make the URLs friendly. Then I would build a site map and place it near the top of each page to feed the bots the exact pages I wanted indexed. If they index more, great, but at least they would do the ones I want. If you do this, be careful not to put more than 100 links per page.
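To make the 100-links-per-page advice concrete, here is a minimal Python sketch of splitting a long list of pages into site map pages of at most 100 links each. The function names, the (url, title) tuple format and the sample URLs are all assumptions made up for illustration, not anything from the site being discussed.

```python
from html import escape

# Limit suggested in the post above; adjust if you prefer a different cap.
MAX_LINKS_PER_PAGE = 100


def chunk_links(links, size=MAX_LINKS_PER_PAGE):
    """Split a list of (url, title) tuples into groups of at most `size`."""
    return [links[i:i + size] for i in range(0, len(links), size)]


def render_sitemap_page(links):
    """Render one group of links as a plain HTML unordered list."""
    items = "\n".join(
        '<li><a href="{}">{}</a></li>'.format(escape(url, quote=True), escape(title))
        for url, title in links
    )
    return "<ul>\n" + items + "\n</ul>"


if __name__ == "__main__":
    # Hypothetical friendly URLs, as mod_rewrite might expose them.
    sample = [("/book/%d" % i, "Book %d" % i) for i in range(250)]
    pages = chunk_links(sample)
    print(len(pages), "site map pages")
    print(render_sitemap_page(pages[-1][:2]))
```

With 500,000 pages a single level of site map pages would itself be huge, so in practice you would probably only list the pages you most want indexed, or link the site map pages from an index page.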
There are also a lot of bad bots out there that think nothing of sucking up entire sites at a fairly rapid rate. If you do a search here, you'll find a very good list of them. I used it as my starting point and continue to add to it as others show up.
For example, somehow my member directory of thousands and thousands of members was spidered by Google. I was quite surprised because it went through the whole dynamically driven membership list, and then started going through every single member's profile page and indexing it. Every page is the same except for the user's personal information (i.e. age, gender, username, etc.).
My site has a truly pathetic ranking on Google, and I wonder if this is contributing to that. Otherwise it is quite an optimized site. I think something must have happened to the poor site that got it onto Google's "bad list" or something.
If you don't want him wasting time and bandwidth indexing your members, then I recommend you try wildcard patterns in robots.txt. Googlebot understands "*" (any run of characters) and "$" (end of the URL); these are a limited pattern-matching extension, not full regular expressions, and other bots may ignore them. You can do something like this:
User-agent: Googlebot
Disallow: /*members=*$
Disallow: /*profile=*$
or whatever is used in the URL to distinguish those pages. I had to use something similar to get him and his relatives to behave on my site. The pages I didn't want him crawling are taking a while to drop out of the index. You may see the same thing. But at least he follows that robots.txt, wildcard patterns included.
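If you want to check which of your URLs those Disallow lines would catch before you upload the file, here is a rough Python sketch that imitates the "*" and "$" matching. It is only an approximation I am assuming is close enough for a quick check: it ignores Allow lines and rule precedence, and the function names and sample paths are made up.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern using '*' and '$' into a regex.

    '*' matches any run of characters; a trailing '$' anchors the end of
    the URL. Without '$' the pattern only has to match a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(path, disallow_patterns):
    """Return True if any Disallow pattern matches the path plus query string."""
    return any(robots_pattern_to_regex(p).match(path) for p in disallow_patterns)


if __name__ == "__main__":
    rules = ["/*members=*$", "/*profile=*$"]
    # Hypothetical URLs; substitute whatever your member pages really use.
    for path in ["/index.php?members=all", "/view.php?profile=42", "/books/some-title"]:
        print(path, is_disallowed(path, rules))
```

One thing this makes visible: the trailing "*$" does no extra work, since "/*members=" already matches any URL containing "members=" as a prefix match.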
Also, I had someone tell me they could just do it with MultiViews instead of using mod_rewrite. I can't find much info on MultiViews. Would it be better to use than mod_rewrite?