Will google consider this as spam?

50,000 pages database driven site.

         

jaski

5:02 am on Aug 29, 2002 (gmt 0)

10+ Year Member



Hi All,
I am in a fix .. "To do or not to do?"

The site in question is a database driven store with about 50K widget models in about 500 categories. All 500 category pages are listed in Google. The category pages contain widget models in drop down "<select>" lists. So far I have not let the Google spider "see" those individual 50K widget pages (each is a database driven page that only comes up on submitting the form). It is very tempting to let the Google spider see and index those 50K pages because they can be very, very targeted. My fear is that if Google considers those pages SPAM .. I will end up losing much more. The question is .. can that be considered spam?

1) Are 50,000 database driven pages too many from a single site?

2) All 50,000 pages are very similar to each other .. because the only things that change in the template are category, model number and price .. rest of the page is all same.

3) Title of the page in each case will be slightly different because I will use category, model number of the widget in that.

So in the end no two pages are exactly same .. but all 50K pages are very similar. Should I go ahead and let google spider them or not?

regards
Jaski

dcheney

5:10 am on Aug 29, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



To me the question is about your #2. If there isn't anything unique on the page that someone is going to be searching for, then why have it in google?

WebGuerrilla

6:28 am on Aug 29, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member




The best "money terms" are individual product pages that include specific model numbers or brand names. If you have a database of 50,000 different types of widgets, you'd be foolish not to make that content available.

If users can click on a drop down menu and view the individual pages, then there certainly isn't anything wrong with Google seeing it also.

jaski

7:57 am on Aug 29, 2002 (gmt 0)

10+ Year Member



Thanks WebGuerrilla .. that was what I thought but was just not confident enough .. I will let google gobble them up this time then.

dcheney, perhaps I failed to communicate properly. Each page is absolutely unique because it's one page per model. It's just that in kilobyte terms there will be about a 2% difference per page, because the rest of the page template is the same every time.

Thanks for answering both of you.

Grumpus

11:28 am on Aug 29, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I disagree. Speaking from experience, it is impossible to "guide" googlebot to crawl the pages that you want her to. My site is PR5 and that means somewhere between 7500 and 12000 pages get hit each month on my database driven site.

Now, if you have ALL 500 of your categories listed and all of your 50K products are listed on those 500 pages, then each of your products has at least some representation in Google. But, as soon as you open the gates for Google to crawl those 50K pages, you can expect that a good number of pages at your site won't make it into the index. Among those pages will likely be some of your 500 category pages. Therefore, you'll have some categories and products well represented, but many cats and products will have no representation at all. That's a major bummer.

TIP: If you really want to give this a try, put your category pages in your root directory or at least put the specific pages deeper in the directory tree than your cat pages. Googlebot "seems" (and the jury is still out on this) to like higher level pages more than deeper ones. Setting it up this way might encourage the bot to get all of your cat pages first then crawl the products until it gets bored or time runs out.

Also - there's no need to worry about duplicate content, unless you've got pages that actually are identical. I've got many pages in my movie database where the actor has only been in one film. The page for those people contains my site template and nothing else different except for their name, "no picture available", and the name of the one movie they were in. I've seen no penalties for this other than the "pages have been filtered because they are very similar - click here to see the unfiltered results." They're still in the index and the relevant one "does" show in the SERPS.

Hope that helps.

G.

jaski

1:01 pm on Aug 29, 2002 (gmt 0)

10+ Year Member



In my case there are no directories .. just one PHP script which generates a page from database depending on what is requested. So there are no deep pages in that sense. The only thing that changes is the query string (ie. the things after "?" in [mydomain.com...] ). There are three levels of links ie cat_level1 > cat_level2 > widgets but that is based only on the query string and not directories.

If it is indeed a danger to lose some of my category level pages when I open gates to google for widget level pages then it will be a poor trade off in my case. So this post by Grumpus really gets me thinking.

But if making widget level links more difficult for Google to find helps keep the category level links intact, I think I can write a PHP script just for Googlebot in which there are 6-7 levels of links to follow (I can make them "noindex,follow") before it gets down to widget level links. But that really sounds odd.

Any more suggestions :)

Grumpus

1:31 pm on Aug 29, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you have this now:

/category.php?Cat=X
/details.php?Prod=Y

You could move the details page to:
/Products/details.php?Prod=Y

This would help lower the PR of the detail pages and push the bot to crawl your 500 category pages first, then hit the 50K.

Again - and I stress this: if it were me, I wouldn't take the risk - I've had little or no luck in nudging the bot to index what I want it to index. It doesn't go in the order it finds things on a page and it doesn't go in the order that it finds a link to a page. I've even created a little maze for the bot via my robots.txt where it:

a) hits my main page
b) isn't allowed to hit any of the "browse" pages except for the "browse by what's new" pages. It CAN go directly to detail pages that are directly linked from my homepage, though.

So, you'd expect that it'd hit the detail pages that are directly linked from my homepage and the "what's new" site map pages first, right?

Nah. It will usually get the "what's new" pages and some of the detail pages on the homepage (but even then, I've got some of those detail pages listed in the DMOZ and they STILL don't get crawled). Once it hits the site map pages, it'll hit a details page (at random, of course) and then immediately start following links on that page.
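For anyone wanting to try a similar "maze", here is a minimal sketch of how such robots.txt rules could be checked before going live. The paths (/browse/, /browse/new, /details.php) are made up for illustration and are not the actual layout of the site described above; the check uses Python's standard urllib.robotparser, which applies the first matching rule, so the Allow line is listed before the broader Disallow.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the "browse" pages except "what's new".
# All paths here are assumptions for illustration only.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /browse/new
Disallow: /browse/
"""

def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

def allowed(parser: RobotFileParser, path: str) -> bool:
    """Return True if Googlebot may fetch the given path under these rules."""
    return parser.can_fetch("Googlebot", path)

parser = build_parser(ROBOTS_TXT)
for path in ["/", "/browse/new", "/browse/by-title", "/details.php?Prod=42"]:
    print(path, "allowed" if allowed(parser, path) else "blocked")
```

This only tells you what the bot is permitted to fetch. As described above, it says nothing about the order in which the bot will actually crawl the permitted pages.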

Now, I realize that my site has a much more complicated navigation system than most because of all the data and how it all relates to other data. It's designed so that a visitor can get from one page to the related information they want with as few clicks as possible, so there's a veritable "web" of navigation and not a discernable route from the homepage to the end of the site. For me, it'd be silly to create this route as my visitors wouldn't be able to get to what they want - a web site works like an encyclopedia where you go right to what you want and then go to related topics. Unfortunately, bots have to look at web sites like a Novel where they start at the beginning and read to the end (or in the case of large sites, read until they hit the deadline and have to go write the book report before they finish reading it).

G.

Weblamer

1:32 pm on Aug 29, 2002 (gmt 0)

10+ Year Member



Jaski, here is a BIG tip when it comes to database driven pages. One of the main things that googlebot looks at these days is page title. So a good thing to do is make sure the title changes on each generated page.

E.g., if the page is for Widget Model 2341B, edit the programming so the generated page includes this keyword in the title: <title> Order the Widget 2341B from us! </title>

Google should not have any problem spidering the pages. It probably won't spider all 50,000 at once, but it will come back over time and grab them in chunks.

The beauty of this is that when someone does a keyword search for widget 2341b, your page will rank high because that keyword is in the title of the page.
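A minimal sketch of the per-page title idea, written in Python for brevity (the same logic would carry over to the PHP script). The function name and the wording of the title are assumptions for illustration; the only real point is that the category and model number from the database row end up in the <title> tag, escaped so that odd characters in a model name cannot break the HTML.

```python
from html import escape

def product_title(category: str, model: str) -> str:
    """Build a <title> tag that contains the category and model number.

    Hypothetical helper: the phrase around the keywords is just an example.
    """
    text = f"Order the {category} {model} from us!"
    # escape() turns &, < and > into entities so the tag stays well-formed.
    return f"<title>{escape(text)}</title>"

print(product_title("Widget", "2341B"))
```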

enjoy.

ciml

4:22 pm on Aug 29, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Grumpus, I would think of the PageRank decreasing with link depth rather than URL depth. So moving those pages one link further from the home page might be useful, but adding an extra '/' to the URL shouldn't make a difference.
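To make the distinction concrete, here is a rough sketch that measures both kinds of depth on a tiny made-up link graph. The graph itself is hypothetical and only reuses the URL shapes mentioned earlier in the thread; link depth is the number of clicks from the home page (found with a breadth-first search), while URL depth is just the number of directories in the path.

```python
from collections import deque
from urllib.parse import urlsplit

def link_depths(links: dict, home: str) -> dict:
    """Breadth-first search: fewest clicks needed to reach each page from home."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

def url_depth(url: str) -> int:
    """Number of directories in the URL path, ignoring the query string."""
    return max(0, urlsplit(url).path.count("/") - 1)

# Hypothetical site: the category page links to the same kind of detail page
# at two different URLs, one in the root and one under /Products/.
links = {
    "/": ["/category.php?Cat=X"],
    "/category.php?Cat=X": ["/details.php?Prod=Y", "/Products/details.php?Prod=Y"],
}

depths = link_depths(links, "/")
for url in ["/details.php?Prod=Y", "/Products/details.php?Prod=Y"]:
    print(url, "link depth:", depths[url], "URL depth:", url_depth(url))
```

Both detail URLs sit two clicks from the home page even though one has an extra directory in its path, which is the situation the post above describes: the extra '/' changes the URL depth but not the link depth.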