Forum Moderators: open
I have around 25,000 unique data pages on one site. There is a link structure on the site that allows all of these pages to be indexed, and it has worked quite successfully so far.
Theoretically there are a couple of extreme possibilities for the structure of the link hierarchy.
I could have a single page, reached from the home page, with 25,000 links on it. This would have the advantage that each data page would be only two clicks from the home page. However, I am aware that not only would there be substantial PageRank "dilution" across so many links, but the spiders would not crawl all of such a big page.
At the other extreme, I suppose, you could have a chain of 25,000 pages, each with a single link to the next page. The dilution per link would be small, but the last page would be a LONG way from the home page!
No doubt the ideal structure lies somewhere in between, but where?
At present I have a nested structure that works by having around 50 links on each page. The first page links to 50 pages, and each of those pages links to a further 50 pages, each of which has around 50 links on - giving me "capacity" for 125,000 data items at this level.
So, 50 links per page and the data pages are 4 clicks from the home page.
I have simplified this somewhat to make it more general - but apart from certain obvious things affecting my own case (eg reducing the number of links per page to the cube root of my total pages), has anyone played around with balancing the depth vs number of links question?
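The trade-off above can be put in rough numbers: with k links per page, the number of index levels needed to cover N pages grows as log base k of N. A minimal sketch (using the figures from this thread - 25,000 pages, 50 links per page - purely for illustration):

```python
import math

def levels_needed(total_pages, links_per_page):
    """Index levels needed so that links_per_page ** levels >= total_pages."""
    return math.ceil(math.log(total_pages) / math.log(links_per_page))

def clicks_from_home(total_pages, links_per_page):
    """Clicks from the home page to a data page: one click onto the
    top index page, then one click per index level."""
    return 1 + levels_needed(total_pages, links_per_page)

# 50 links per page covers 25,000 pages in 3 levels - i.e. data pages
# sit 4 clicks from the home page, as described above.
print(levels_needed(25_000, 50))      # 3
print(clicks_from_home(25_000, 50))   # 4
```

Dropping to the cube root (about 30 links per page) still gives 3 levels for 25,000 pages, which is why the "ideal" fan-out has some slack in it.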
All opinions very welcome...
There is a threshold beyond which PageRank seems to be lost (i.e. the pages linked to get less than the parent's PR, multiplied by the damping factor 'd', divided by the number of links).
The threshold is more than 50 and less than 300 (I'm pretty sure of that). Google's guideline [google.com] of fewer than 100 links per page seems to be the best indicator we have of where it is.
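For reference, under the classic published PageRank formula (with damping factor d, usually taken as 0.85), the share each linked page receives from a parent below that threshold works out like this - a sketch only, since Google's actual behaviour past the threshold is exactly what's unknown here:

```python
DAMPING = 0.85  # the damping factor 'd' from the published PageRank formula

def pr_passed_per_link(parent_pr, num_links):
    """PageRank contribution each linked page receives from one parent,
    under the classic formula PR(A) = (1 - d) + d * sum(PR(T) / C(T))."""
    return DAMPING * parent_pr / num_links

# With 50 links each child gets 1.7% of the parent's PR;
# with 300 links, under 0.3% - before any threshold effect kicks in.
print(pr_passed_per_link(1.0, 50))    # 0.017
print(pr_passed_per_link(1.0, 300))
```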
Also, look at it from a crawling perspective:
[stanford.edu...]
slides 13 to 17 (worst case scenario)
[stanford.edu...]
On my own large site, visitors have 5 basic "views" of the database that they can drill down on, although only the alphabetical one leads down to every single page.
I agree with you in principle - and it may be something that we change - but the simplest and best access to our data for our visitors is supplied by dedicated search forms, which will be much faster in getting the users to relevant content.
Unfortunately, as we know, the spiders won't follow these, much less fill them in with a variety of suitable terms!
The links are accessible to users but I haven't made them more user-friendly because I don't want users, particularly new users, to think that this is the way we present our data.
It was only the fact that SEs can't access the data that led me to build the index...
I suppose if it is there it might as well be usable. Another site owned by my employer (but not run by me) has adopted your approach with massive success...
cheers, Henry
Look at Amazon, for example. Probably most people *initially* use the search box, but Amazon's database is criss-crossed with relevant links: books by author, bestseller lists, customer recommendation lists, books by subject, "Customers who bought this book also bought" and so on.