The site in question is a database-driven store with about 50K widget models in about 500 categories. All 500 category pages are listed in Google. The category pages contain the widget models in drop-down "<select>" lists. So far I have not let the Google spider "see" the 50K individual widget pages (each one is a database-driven page that comes up only on submitting the form). It is very tempting to let Google spider and index those 50K pages because they can be very targeted. My fear is that if Google considers them spam, I will end up losing much more. So the question is: can that be considered spam? My concerns:
1) Are 50,000 database-driven pages too many from a single site?
2) All 50,000 pages are very similar to each other, because the only things that change in the template are category, model number and price. The rest of the page is the same.
3) Title of the page in each case will be slightly different because I will use category, model number of the widget in that.
So in the end no two pages are exactly same .. but all 50K pages are very similar. Should I go ahead and let google spider them or not?
regards
Jaski
If users can click on a drop down menu and view the individual pages, then there certainly isn't anything wrong with Google seeing it also.
dcheney, perhaps I failed to communicate properly. Each page is absolutely unique because it's one page per model. It's just that in kilobyte terms there will be only about a 2% difference per page, because the rest of the page template is the same every time.
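For anyone who wants to put a number on "very similar", here is a rough sketch that renders two pages from one template and compares them. The template, the sample categories, models and prices are all made up for illustration, and this is plain text similarity, not a claim about how Google measures duplicate content.

```python
from difflib import SequenceMatcher

# Hypothetical page template: only category, model and price vary per page.
TEMPLATE = (
    "<html><head><title>{category} {model}</title></head><body>"
    "<h1>{category}: model {model}</h1><p>Price: ${price}</p>"
    + "<p>Shared navigation, shipping and footer text.</p>" * 20
    + "</body></html>"
)


def render(category, model, price):
    """Fill the shared template with one widget's details."""
    return TEMPLATE.format(category=category, model=model, price=price)


def similarity(a, b):
    """Return a 0..1 text similarity ratio between two pages."""
    # autojunk=False stops difflib from ignoring frequently repeated
    # characters, which would distort the ratio on long strings.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


page_a = render("Blue Widgets", "2341B", "19.99")
page_b = render("Red Widgets", "7710C", "24.50")
score = similarity(page_a, page_b)
print(f"similarity: {score:.2%}")
```

The two pages are never identical, but the ratio stays close to 1 because the shared template dominates the byte count.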
Thanks for answering both of you.
Now, if you have ALL 500 of your categories listed and all of your 50K products are listed on those 500 pages, then each of your products has at least some representation in Google. But, as soon as you open the gates for Google to crawl those 50K pages, you can expect that a good number of pages at your site won't make it into the index. Among those pages will likely be some of your 500 category pages. Therefore, you'll have some categories and products well represented, but many cats and products will have no representation at all. That's a major bummer.
TIP: If you really want to give this a try, put your category pages in your root directory or at least put the specific pages deeper in the directory tree than your cat pages. Googlebot "seems" (and the jury is still out on this) to like higher level pages more than deeper ones. Setting it up this way might encourage the bot to get all of your cat pages first then crawl the products until it gets bored or time runs out.
Also - there's no need to worry about duplicate content, unless you've got pages that actually are identical. I've got many pages in my movie database where the actor has only been in one film. The page for those people contains my site template and nothing else different except for their name, "no picture available", and the name of the one movie they were in. I've seen no penalties for this other than the "pages have been filtered because they are very similar - click here to see the unfiltered results." They're still in the index and the relevant one "does" show in the SERPS.
Hope that helps.
G.
If it is indeed a danger to lose some of my category level pages when I open gates to google for widget level pages then it will be a poor trade off in my case. So this post by Grumpus really gets me thinking.
But if making the widget-level links harder for Google to find helps keep the category-level pages intact, I think I can write a PHP script just for Googlebot in which there are 6-7 levels of links to follow (I can make them "noindex,follow") before it gets down to the widget-level links. But that really sounds odd.
Any more suggestions :)
/category.php?Cat=X
/details.php?Prod=Y
You could move the details page to:
/Products/details.php?Prod=Y
This would help in lowering the PR of the detail pages and put more priority on the bot crawling your 500 category pages first, then hitting the 50K.
Again, and I stress this: if it were me, I wouldn't take the risk. I've had little or no luck in nudging the bot to index what I want it to index. It doesn't go in the order it finds things on a page and it doesn't go in the order that it finds a link to a page. I've even created a little maze for the bot via my robots.txt where it:
a) hits my main page
b) isn't allowed to hit any of the "browse" pages except for the "browse by what's new" pages. It CAN go directly to detail pages that are directly linked from my homepage, though.
So, you'd expect that it'd hit the detail pages that are directly linked from my homepage and the "what's new" site map pages first, right?
Nah. It will usually get the "what's new" pages and some of the detail pages on the homepage (but even then, I've got some of those detail pages listed in the DMOZ and they STILL don't get crawled). Once it hits the site map pages, it'll hit a details page (at random, of course) and then immediately start following links on that page.
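To make the robots.txt maze concrete, here is a minimal sketch that checks which URLs such a file would open up. The paths (/browse/, /browse/new/, /details.php) and the example.com host are assumptions for illustration, not the actual layout of the site described above. It only shows what the rules permit, which, as noted, says nothing about the order the bot actually crawls in.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the "browse" pages except "what's new".
# Python's parser applies rules in file order, so Allow is listed first.
ROBOTS_TXT = """\
User-agent: *
Allow: /browse/new/
Disallow: /browse/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="Googlebot"):
    """Return True if the rules let the given agent fetch this path."""
    return parser.can_fetch(agent, "http://www.example.com" + path)


for path in ("/", "/browse/new/index.html",
             "/browse/by-title/a.html", "/details.php?Prod=42"):
    print(path, "allowed" if allowed(path) else "blocked")
```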
Now, I realize that my site has a much more complicated navigation system than most because of all the data and how it all relates to other data. It's designed so that a visitor can get from one page to the related information they want with as few clicks as possible, so there's a veritable "web" of navigation and not a discernable route from the homepage to the end of the site. For me, it'd be silly to create this route as my visitors wouldn't be able to get to what they want - a web site works like an encyclopedia where you go right to what you want and then go to related topics. Unfortunately, bots have to look at web sites like a Novel where they start at the beginning and read to the end (or in the case of large sites, read until they hit the deadline and have to go write the book report before they finish reading it).
G.
I.e., if the page is for Widget Model 2341B, edit the programming so it generates a page that includes this keyword in the title. E.g.: <title> Order the Widget 2341B from us! </title>
Google should not have any problem spidering the page. It probably won't spider all 50,000 at once, but it will come back over time and grab them in chunks.
The beauty of this is that when someone does a keyword search for widget 2341b, your page will rank high because that keyword is in the title of the page.
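A quick sketch of the title idea, in case it helps. The function name and the wording of the title are just placeholders, and the site in question runs on PHP, so treat this Python version as pseudocode for whatever the real template does.

```python
from html import escape


def build_title(category, model):
    """Return a <title> element that is unique to one widget page."""
    # escape() keeps characters like & or < in the data from breaking the tag.
    return f"<title>Order the {escape(category)} {escape(model)} from us!</title>"


print(build_title("Widget", "2341B"))
```

Since the model number differs on every page, no two of the 50K titles come out the same.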
enjoy.