Newbie querystring question

Google indexing problem

         

mansfield smooth

6:59 pm on Sep 12, 2002 (gmt 0)

10+ Year Member




Hi

I run a two year old ecom site that has recently been updated.

Before the update, Google was spidering and indexing all the dynamic ASP pages, from the index page through the category pages and right down to the product pages.

The update altered the querystring; however, the page and category content stayed the same.

Three months have passed. Googlebot has visited many times but does not index further than the second level (category pages), despite there being deep links to the 2nd and 3rd levels (category and product pages) on the index page.

Does anyone know why this may be?

--------------------------------------------------
Example urls:

Before update (these were indexed):

www.domain.com/prod_info.asp?pid=1 (product page)
www.domain.com/prod_list.asp?cid=2 (category page)

After update:

www.domain.com/store.asp?section=dept&deptid=1 (category page) is indexed

but the product page is not indexed: www.domain.com/store.asp?section=dept&deptid=1&parentid=2

AkanDian rain

7:03 pm on Sep 12, 2002 (gmt 0)

10+ Year Member



It's common for GoogleBot to stop after going a couple levels in. The exception tends to be sites that have high PageRank values.

WebGuerrilla

7:16 pm on Sep 12, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member




Google is pretty good about crawling urls with a single variable. But when you start adding more, the crawl percentage drops quite a bit.
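To check how many variables each URL in this thread carries, here is a minimal Python sketch. It is not something Google publishes or uses; `count_params` is a hypothetical helper name, and the URLs are the examples from the first post.

```python
from urllib.parse import urlsplit, parse_qsl

def count_params(url: str) -> int:
    """Return the number of name=value pairs in the URL's query string."""
    # The thread's URLs have no scheme, so prefix "//" to make urlsplit
    # treat "www.domain.com" as the host rather than part of the path.
    if "//" not in url:
        url = "//" + url
    query = urlsplit(url).query
    return len(parse_qsl(query, keep_blank_values=True))

urls = [
    "www.domain.com/prod_info.asp?pid=1",
    "www.domain.com/store.asp?section=dept&deptid=1",
    "www.domain.com/store.asp?section=dept&deptid=1&parentid=2",
]
for u in urls:
    print(count_params(u), u)
```

The old product URL carries one variable, while the new product URL carries three, which is the difference being discussed here.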

mansfield smooth

7:47 pm on Sep 12, 2002 (gmt 0)

10+ Year Member



Wow, thanks for the quick replies.

We are at PR6 at the moment, so that should imply it is because of too many variables in the querystring.

Will try to amend this and hopefully will get indexed again.

Thanks again :)

DaveN

8:40 pm on Sep 12, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



mansfield_smooth,

Have you tried putting a product site map in the root?

DaveN
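A product site map of this kind is just a static page of plain links. As a rough sketch only (the product IDs, the `build_site_map` name, and the choice of the old `prod_info.asp?pid=` pattern are assumptions for illustration), it could be generated like this:

```python
from html import escape

def build_site_map(product_ids, base="/prod_info.asp?pid="):
    """Return a plain HTML page with one link per product page."""
    links = [
        f'<li><a href="{escape(base + str(pid), quote=True)}">Product {pid}</a></li>'
        for pid in product_ids
    ]
    return "<html><body><ul>\n" + "\n".join(links) + "\n</ul></body></html>"

# Hypothetical product IDs; a real site would read these from its catalogue.
print(build_site_map([1, 2, 3]))
```

Saving the output as a static file in the root and linking to it from the index page gives the spider a short path to every product.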

bcc1234

8:42 pm on Sep 12, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



that should imply it is because of too many variables in the querystring.

Get rid of all parameters. That will get your site crawled up to 7 levels deep.

Grumpus

8:58 pm on Sep 12, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Remember, Google will only crawl pages whose PR is within a certain amount of your front page's PR. If the page hasn't been crawled and doesn't have incoming links, it makes a PR assumption to determine whether to crawl it. (This is where the directory structure and PR -1 per "/" comes into play.)

Similarly, your "&" works the same way. Three "&"s and one "/", as in your example, mean that if your "/index.asp" has a PR of 5, your product page has a base PR of 1.

Here's another interesting thing...

If "/index.asp" has a PR5
Then "/Prods/Prods.asp" should have a PR of 4, right? WRONG!
And "/Prods/Prods.asp?ID=1" should be three, right? WRONG!
And "/Prods/Prods.asp?ID=1&Cat=1" should be 2, right? RIGHT!

Somehow Google takes into account that the page sometimes has parameters and sometimes (as in the second example above) doesn't, and assumes that the version without parameters probably duplicates content served via a parameter.

In the example above, the guessed/base PR goes in the following order (from top to bottom): 5, 3, 4, 2.

The "?" actually helps.

(You can see an example of this because I've been too lazy to fix the link on my site since I noticed it.) Go to my page in my profile and click on the "Movie Updates" in the "See What's New" box along the right. Now, check the toolbar PR. Click "NEXT", which will add an "offset=X", and check the PR. Then click "PREV" and it'll bring you back to the first page, but the "offset=0" parameter will be there. Check the PR. Voilà! Pretty funky, huh? I really should get around to fixing that, eh?

G.

Giacomo

7:30 am on Sep 13, 2002 (gmt 0)

10+ Year Member Top Contributors Of The Month



mansfield_smooth,

We are at PR6 at the moment, that should imply it is because of too many variables in the querystring.

I suspect the reason for your product pages not being indexed is not "too many parameters", but the fact that your current product pages share their filename (store.asp) with your category pages: there is obviously a limit to the max. no. of dynamic pages (same filename + different querystring params) that Google indexes, so I guess that going from "prod_info.asp" and "prod_list.asp" to "store.asp" made Googlebot stop crawling your site earlier.
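To see how concentrated a site's dynamic pages are on one filename, a quick tally per script can help. This is only an illustrative sketch: `pages_per_script` is a hypothetical helper, the URL list is taken from the examples in the first post, and nothing here reflects how Google actually applies any limit.

```python
from collections import Counter
from urllib.parse import urlsplit

def pages_per_script(urls):
    """Count how many distinct URLs share each script path."""
    counts = Counter()
    for url in set(urls):  # ignore exact duplicates
        # Prefix "//" so scheme-less URLs parse with a proper host and path.
        path = urlsplit(url if "//" in url else "//" + url).path
        counts[path] += 1
    return counts

old_urls = [
    "www.domain.com/prod_info.asp?pid=1",
    "www.domain.com/prod_list.asp?cid=2",
]
new_urls = [
    "www.domain.com/store.asp?section=dept&deptid=1",
    "www.domain.com/store.asp?section=dept&deptid=1&parentid=2",
]
print(pages_per_script(old_urls))
print(pages_per_script(new_urls))
```

With the old scheme the pages are spread over two filenames; with the new one every category and product page counts against `/store.asp`.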

I definitely agree with bcc1234's suggestion: make your URLs spider-friendly (i.e., no querystring at all) if you want to have all your content indexed. Unfortunately, URL rewriting is slightly more complicated to implement with ASP+W2K than with (PHP or Perl)+Apache.
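The core of any URL-rewriting setup is a mapping from a parameter-free path to the real querystring URL. A minimal sketch of that mapping, written in Python purely for illustration: the `/store/dept/<deptid>/<parentid>/` scheme and the `rewrite` function are assumptions, not something the site uses, and on a live server the translation would be done by the web server or a rewrite filter rather than by application code like this.

```python
import re
from urllib.parse import urlencode

# Hypothetical friendly scheme:
#   /store/dept/<deptid>/             -> category page
#   /store/dept/<deptid>/<parentid>/  -> product page
FRIENDLY = re.compile(r"^/store/dept/(\d+)/(?:(\d+)/)?$")

def rewrite(path: str):
    """Translate a friendly path to the internal store.asp URL, or None if it doesn't match."""
    m = FRIENDLY.match(path)
    if m is None:
        return None
    params = [("section", "dept"), ("deptid", m.group(1))]
    if m.group(2) is not None:
        params.append(("parentid", m.group(2)))
    return "/store.asp?" + urlencode(params)

print(rewrite("/store/dept/1/"))    # /store.asp?section=dept&deptid=1
print(rewrite("/store/dept/1/2/"))  # /store.asp?section=dept&deptid=1&parentid=2
```

The spider only ever sees the friendly path, while store.asp keeps receiving the same parameters it does today.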

mansfield smooth

11:13 am on Sep 13, 2002 (gmt 0)

10+ Year Member



Thanks guys.

Will try to tweak the querystring first.

If still no luck then will try a full url rewrite.

Your advice should now get my content indexed.

Cheers :)