

Session IDs Freshbot & Deepbot tend to favor

Recent activity observations on 2 new dynamic sites

skipfactor

8:18 pm on May 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I’m comparing 2 similar product sites here (California Widgets vs. New York Widgets):
1st site is 2 months old, PR3, single Yahoo backlink, product pages are laid out like: “product-detail.asp?id=00”
2nd site is 3 months old, PR4, many more backlinks, product pages are: “product-detail.asp?abc=00000”

1st site:
First Deepcrawl picked up all 77 products (id=1 thru id=77)

2nd site:
Has been through 3 Deepcrawls and to date has yet to have any product detail pages (abc=00001-99999) indexed. It should be noted that here I used a 3-letter parameter name ("abc") that doesn’t contain “id”, whereas I used the actual “id=” on the 1st site. In addition, both sites have fewer than 100 products; I just used a 5-digit product id numbering convention on this 2nd site.

On May 9th, Freshbot grabbed a new product added to the DB. The kicker is I didn’t have a new product id at the time, so I assigned it a “0”. So Freshbot grabbed a “product-detail.asp?abc=0”.

My Opinion:
It appears that, at least on newer sites, Deepbot & Freshbot initially shy away from large ID numbers. Makes sense: they must think there’s perhaps more work there than they’re ready to commit to yet on a new customer.

Whenever possible, start ids at “id=1” instead of “id=10000”. Nice lesson from Freshbot & more DB work to do before the crawlers return. I feel like the next Deepcrawl will pick up my “id=10000”s, but it’s another long wait if he doesn’t.
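The renumbering suggested above could look something like this minimal Python sketch. It is not the poster's code: `assign_compact_ids` and `product_url` are hypothetical helpers, and it assumes the old ids are numeric strings such as "10000". It keeps a map from old to new ids so old URLs can still be redirected.

```python
# Hypothetical sketch: renumber products so ids start at 1 instead of 10000.
# Assumes the old ids are numeric strings (e.g. "10000", "10001").


def assign_compact_ids(old_ids):
    """Return a dict mapping each old id to a new integer id starting at 1.

    New ids follow the numeric order of the old ids, so the renumbering is
    stable and the same map can be reused to redirect old URLs.
    """
    ordered = sorted(set(old_ids), key=int)
    return {old: new for new, old in enumerate(ordered, start=1)}


def product_url(product_id, param="id"):
    """Build a product detail URL in the thread's format, e.g. product-detail.asp?id=1."""
    return f"product-detail.asp?{param}={product_id}"


if __name__ == "__main__":
    mapping = assign_compact_ids(["10005", "10000", "10001"])
    # Print old -> new URLs in order of the new ids.
    for old_id, new_id in sorted(mapping.items(), key=lambda kv: kv[1]):
        print(f"{product_url(old_id)} -> {product_url(new_id)}")
```

Sorting by the numeric value of the old id (rather than by string) keeps the new numbering in the same order as the old catalog.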

kpaul

9:49 pm on May 13, 2003 (gmt 0)

10+ Year Member



Interesting observation. Thanks for sharing.

I wonder if using 'id' has anything to do with it?

sullen

10:01 pm on May 13, 2003 (gmt 0)

10+ Year Member



hmmm, I think this might be a coincidence. I have a site with similar dynamic product pages, with the form product.asp?prodid=ABC0089 and so far, 17 of 20 have been indexed (they've been available to Google for a month, previously banned by robots.txt for various technical reasons).

Google itself states that indexing of dynamic pages is dictated by a number of factors including PR and site size (I think... possibly I read that somewhere else). How big are your sites? Could that have something to do with it? Or I guess it could be that the cut-off is between 7 and 12 characters.

skipfactor

10:36 pm on May 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The sites are very similar. They are both regional: one sells under 100 big widgets for one company, the other sells 100 big widgets for another company.

They both went online with around 75 pages. The 2nd older site (PR4, should be PR5 this update but who knows now) has put on 10 more static pages and many more backlinks but the products on both hover under 100.

It's just odd that Freshbot ONLY grabbed the first product detail page ever off of the site, and it had an anomalous new one-digit id with product detail links all around it carrying 5-digit ids. Freshbot has returned since the first grab and again only grabbed the one-digit id.

I'd say there are very high odds it's just a coincidence, but it just might expedite indexing of a new dynamic site if one starts at "id=1" whenever possible. It couldn't hurt.

skipfactor

4:55 am on May 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



On the 2nd site mentioned above, I changed all of my IDs to read "product-detail.asp?abcID=00" instead of "product-detail.asp?abcID=00000".

Freshbot is at the moment picking up all of these changed 2-digit id product detail urls/pages. And it's not because it's fresh w/ the digit change: these product detail URLs are dynamically generated and change frequently; Freshbot (& Deepbot) never gave them a look until that accidental "id=0".
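For anyone wanting to make the same change mechanically (e.g. to build a redirect list from old to new URLs), here is a rough sketch using Python's standard `urllib.parse`. It assumes the parameter is named `abcID` and that ids fit in two digits, since both catalogs stay under 100 products; `shorten_product_id` is a hypothetical name. Ids of 100 or more are kept at their natural width rather than truncated.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def shorten_product_id(url, param="abcID", width=2):
    """Rewrite the zero-padded product id in `url` to `width` digits.

    e.g. product-detail.asp?abcID=00042 -> product-detail.asp?abcID=42
    Other query parameters are preserved; URLs without `param` are returned unchanged.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    if param not in query:
        return url
    # Drop the old padding, then re-pad to the new width (zfill never truncates).
    query[param] = [str(int(query[param][0])).zfill(width)]
    new_query = urlencode(query, doseq=True)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, new_query, parts.fragment))


if __name__ == "__main__":
    for old in ("product-detail.asp?abcID=00042", "product-detail.asp?abcID=00007"):
        print(old, "->", shorten_product_id(old))
```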

skipfactor

1:34 pm on May 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I wonder if using 'id' has anything to do with it?

It didn't appear to matter here, as I changed only the digits of the id and left the "abcID=" format intact.

This APPARENTLY had an effect on Freshbot, as he is now so aroused that he's disobeying robots.txt and gobbling up disallowed duplicate product detail pages (same product, different picture, different disallowed page).
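Before blaming the crawler, one way to confirm the duplicate pages really are disallowed is to test the rules locally with Python's `urllib.robotparser`. The robots.txt contents and the `/product-detail-alt.asp` path below are made up for illustration, and `robotparser` applies simple prefix matching from the original robots.txt convention, so it won't necessarily reproduce every detail of how Googlebot reads a file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the duplicate "different picture" pages are assumed
# to live under /product-detail-alt.asp (a made-up path for this example).
ROBOTS_TXT = """\
User-agent: *
Disallow: /product-detail-alt.asp
"""


def build_parser(robots_txt):
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser, url, agent="Googlebot"):
    """True if the parsed rules let `agent` fetch `url` (prefix matching on the path)."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for url in (
        "http://www.example.com/product-detail.asp?abcID=07",
        "http://www.example.com/product-detail-alt.asp?abcID=07",
    ):
        print(url, "allowed" if is_allowed(rp, url) else "disallowed")
```

If the alternate pages come back as allowed here, the rule itself is the problem rather than the crawler.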