Forum Moderators: open
I have around 25,000 unique data pages on one site. There is a link structure on the site that allows all of these pages to be indexed, and it has worked quite successfully so far.
Theoretically there are a couple of extreme possibilities for the structure of the link hierarchy.
I could have a single page, reached from the home page, with 25,000 links on it. This would have the advantage that each data page would be only two clicks from the home page. However, I am aware that not only would there be substantial PageRank "dilution" across so many links, but the spiders would not crawl all of such a big page.
At the other extreme, I suppose, you could have a chain of 25,000 pages, each with a single link to the next page. The dilution per link would be small, but the last page would be a LONG way from the home page!
No doubt the ideal structure lies somewhere in between, but where?
At present I have a nested structure that works by having around 50 links on each page. The first page links to 50 pages, and each of those pages links to a further 50 pages, each of which has around 50 links on - giving me "capacity" for 125,000 data items at this level.
So, 50 links per page and the data pages are 4 clicks from the home page.
I have simplified this somewhat to make it more general - but apart from certain obvious things affecting my own case (eg reducing the number of links per page to the cube root of my total pages), has anyone played around with balancing the depth vs number of links question?
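The trade-off above can be put in rough numbers: with k links per page, the number of index levels needed to cover N pages grows as log base k of N. A minimal sketch (using the figures from this thread - 25,000 pages, 50 links per page - purely for illustration):

```python
import math

def levels_needed(total_pages, links_per_page):
    """Index levels needed so that links_per_page ** levels >= total_pages."""
    return math.ceil(math.log(total_pages) / math.log(links_per_page))

def clicks_from_home(total_pages, links_per_page):
    """Clicks from the home page to a data page: one click onto the
    top index page, then one click per index level."""
    return 1 + levels_needed(total_pages, links_per_page)

# 50 links per page covers 25,000 pages in 3 levels - i.e. data pages
# sit 4 clicks from the home page, as described above.
print(levels_needed(25_000, 50))      # 3
print(clicks_from_home(25_000, 50))   # 4
```

Dropping to the cube root (about 30 links per page) still gives 3 levels for 25,000 pages, which is why the "ideal" fan-out has some slack in it.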
All opinions very welcome...
There is a threshold beyond which PageRank seems to be lost (i.e. the pages linked to get less than the parent's PR, multiplied by the damping factor 'd', divided by the number of links).
The threshold is more than 50 and less than 300 (I'm pretty sure of that). Google's guideline [google.com] of fewer than 100 links per page seems to be the best indicator we have of where it is.
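For reference, under the classic published PageRank formula (with damping factor d, usually taken as 0.85), the share each linked page receives from a parent below that threshold works out like this - a sketch only, since Google's actual behaviour past the threshold is exactly what's unknown here:

```python
DAMPING = 0.85  # the damping factor 'd' from the published PageRank formula

def pr_passed_per_link(parent_pr, num_links):
    """PageRank contribution each linked page receives from one parent,
    under the classic formula PR(A) = (1 - d) + d * sum(PR(T) / C(T))."""
    return DAMPING * parent_pr / num_links

# With 50 links each child gets 1.7% of the parent's PR;
# with 300 links, under 0.3% - before any threshold effect kicks in.
print(pr_passed_per_link(1.0, 50))    # 0.017
print(pr_passed_per_link(1.0, 300))
```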
Also, look at it from a crawling perspective:
[stanford.edu...]
slides 13 to 17 (worst case scenario)
[stanford.edu...]
On my own large site, visitors have 5 basic "views" of the database that they can drill down on, although only the alphabetical one leads down to every single page.
I agree with you in principle - and it may be something that we change - but the simplest and best access to our data for our visitors is supplied by dedicated search forms, which will be much faster in getting the users to relevant content.
Unfortunately, as we know, the spiders won't follow these, much less fill them in with a variety of suitable terms!
The links are accessible to users but I haven't made them more user-friendly because I don't want users, particularly new users, to think that this is the way we present our data.
It was only the fact that SEs can't access the data that led me to build the index...
I suppose if it is there it might as well be usable. Another site owned by my employer (but not run by me) has adopted your approach with massive success...
cheers, Henry
Look at Amazon, for example. Probably most people *initially* use the search box, but Amazon's database is criss-crossed with relevant links: books by author, bestseller lists, customer recommendation lists, books by subject, "Customers who bought this book also bought" and so on.