Forum Moderators: coopster & phranque

Message Too Old, No Replies

DB Schema for Directory Database

Advice on best set up?

         

Nick_W

8:53 am on Jan 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi all,

I'm at the beginning stage of creating some directory software and hoped to get some thoughts/advice on the following. My main issue is the editors table, I'm assuming I can find the category admin by following the parent category to its root.

Does anyone see any problems with the following?

Categories table

* catid -> int pri key auto inc
* editorid -> int not null
* catparent -> int null
* catname -> varchar(100) not null
* catdesc -> text null

Links table

* linkid -> int pri key auto inc
* catid -> int not null
* title -> varchar(100) not null
* desc -> varchar(255) not null
* url -> varchar(255) not null
* dateadded -> date null
* email -> varchar(100) not null
* validated -> enum(y,n)

Editor table

* editorid -> int pri key auto inc
* name -> varchar(100) not null
* email -> varchar(100) not null
* status -> enum(masteradmin, catadmin, editor)

The masteradmin can edit anything and, add new catagories. The catadmin can edit anything and add sub-catagories in his section. An editor can only manipulate the listings in his own category.

Many thanks for your thoughts...

Nick

Woz

9:19 am on Jan 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Assuming you are going to make the directory searchable, I would, read that as I do, add keywords to both the Cats and Links table. Despite the misuse of the Keyword tag in times past, a good set of keywords can be very useful for returning relevant results. If you have administrative control over entries you can assure the quality of the keywords. It also reduces the prospect of returning "Sorry, no results found" which is a turnoff for the user.

If you are going to rank the entries rather than list them Alphabetically, then you need ranking fields.

I would add DateLastEdited to Categories, and DateLastConfirmed to links which can be either set manually or set by a checking spider.

I also add an Active field to both Cats and Links to turn either on or off. If you find a link 404 or server down, better to turn the link off pending confirmation than delete it and have to re-enter the information. Having the active field in Cats means you can populate a new category behind the scenes and then switch it on when you are happy with the contents.

A few thoughts

Onya
Woz

jmccormac

9:33 am on Jan 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



An alias table may be a good addition. The catid sound be unique by default (it is I think in the cat table).

Also extending Woz's 'Active' suggestion to the editor table would be a good thing. The editorid in the categories table seems to be unique and may cause a separate cat id to be generated if there is another editor (been working through the night so this may not make sense. :) ) It may be better to have a separate categories/editors permissions table. The duplicate catid has been at the core of Dmoz's problems for a while now.

I think that there was a directory database example (Php/MySQL) in one of the Wrox Beginning PHP4 books.

Regards...jmcc

bcc1234

9:47 am on Jan 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you just want to have authorization in the binary form (allowed/not allowed) then you can create another table that would link editors to categories. And use application logic to populate that table.

Something like:

category
-catid
...

editor
-editorid
...

cerel
-catid
-editorid

After that just add all combinations of users and categories.

This approach does not scale well, but works out pretty good for smaller systems.

If you want to implement multiple access levels (ie, can edit categories, but not add) - then you might want to create user groups and tasks (or commands).

You might create independent modules with app logic (commands).
For example:
EditAnyCategory
EditOwnedCategory
EditOwnedAndBelowCategory

And user groups that would contain user lists.

Then you would have a tables with groups,command and with permissions:

editor
...

egrel
-editorid
-groupid

group
...

command
...

cmdaccess
-commandid
-groupid
-access

So if a particual user is in a group that has access to EditOwnedCategory - he can only edit that category; if any - well then any category; if he has access to EditOwnedAndBelow - ....

So you can just create different commands and define tasks -
AddAnyCategory
AddCategoryBelowOwned
RemoveAnyCategory ...
EditLinkInOwnedCategory
EditLinkInAnyCategory

That's the most flexible approach I've ever seen.
You can just split the functionality into independent commands. You would also need a command dispatcher to check access and execute the logic.

So EditAnyCategory - would not need to check if the ediot has access to a particular category, but EditOwnedCategory would need to check if the editor is actually in the group that is linked to the category in cmdaccess.

Also, you might want to user another table to relate links to categories, just in case you will need to place the same link in several categories. Also, you might want to add some sequence variable if you ever need to control the order of the entries (not but the name or date).
Something like:

lcrel
catid
linkid
seqnum
Make seqnum real (float). That will make things easier later on. You can basically return results sorting first by the seqnum and then by alphabet. That way if seqnum is null (or some set value) by defaul - it won't be relevant until you need it.

You can do the same thing with categories. Instead of any category having only one parent - make it a separate table.

P.S. I don't feel like checking spelling. This thing came out way too long :)

aspdaddy

10:01 am on Jan 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi Nick, I have done one of these db's and the problems surface when its full of links and hard to change the design :(

So I would change a couple of things, just for peace of mind.

Dont allow any nulls in the database, use -1, empty string, special date etc to indicate missing information. I you need to scale it or change platform later on nulls in the schema can cause a lot of headaches.

Make sure link_id is type long int :)
Make validated a bit, or bool

I would put status in its own table, and a status_id in the editor table again just for consistency. If you keep the design normalised (3NF) all your querying will be consistent and lfe will be much easier

To rank links I added times_viewed(long) to the main table, this caused overhead but couldnt think of another way. Guess you could do the same for the cats table and come up with a formula.
rank = catrank / linkrank?

For keywords/searching I added a table , with two cols & composite key:
link_id(long int), keyword(string 32)
In retrospect this was a bad idea, coz every link has to have keywords entered. If could spider the link and bring back the k/words though, it may be ok.

Hope this helps.

Nick_W

10:55 am on Jan 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



add keywords to both the Cats and Links table
Yep, silly oversight. Cheers!

If you are going to rank the entries rather than list them Alphabetically, then you need ranking fields
Yes, I'll do a 'recommended' style thing for the top 2/3 and the rest alphabetically.

DateLastEdited to Categories, and DateLastConfirmed
Yes, not certain on how to implement DateLastConfirmed in the admin module but I'm sure I can work it out ;)

add an Active field to both Cats and Links to turn either on or off
Damn fine idea...

An alias table may be a good addition. The catid sound be unique by default (it is I think in the cat table).
Do you mean instead of the catname in the categories table I would have a seperate table for the category names? - Allowing me to change names without muching things up?

The editorid in the categories table seems to be unique and may cause a separate cat id to be generated if there is another editor (been working through the night so this may not make sense. ) It may be better to have a separate categories/editors permissions table. The duplicate catid has been at the core of Dmoz's problems for a while now.
Sorry, I just can't grasp that. Sometimes I'm not as quick as I'd like to be :) Can someone explain in idiot terms what this means?

bcc1234

This approach does not scale well, but works out pretty good for smaller systems
It's gotta scale...

I only half get what you're saying about groups and commands but it looks pretty neat and flexible. I'll have to re-read a few times to see if I can get my brain round it. In the meantime, if anyone can can make it more wilsonfriendly then that would be smashing!

aspdaddy

I you need to scale it or change platform later on nulls in the schema can cause a lot of headaches.
I can't envision having to port it to anything but another SQL DB, would nulls still be an issue?

Make validated a bit, or bool
How's that done in SQL?

I would put status in its own table, and a status_id in the editor table again just for consistency. If you keep the design normalised (3NF) all your querying will be consistent and lfe will be much easier
Yes, I thought about this for a while and even wrote it down but in the end settled for the above as I could see no benefit to having it seperate. So, what am I missing? ;)

Thanks All!
It's at times like this I really appreciate WebmasterWorld, sometimes I almost take it for granted but, this thread just goes to show how much value we all get here huh?

Many thanks guys...

Nick

bcc1234

1:05 pm on Jan 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I only half get what you're saying about groups and commands but it looks pretty neat and flexible. I'll have to re-read a few times to see if I can get my brain round it. In the meantime, if anyone can can make it more wilsonfriendly then that would be smashing!

Ok, I'll give you an example of a few use-cases and you can generalize from there.

You have the following objects: user, group, command, category, link.

In your database you have:
user
-usernum
...

group
-groupnum
...

command
-cmdnum
-cmdname
...

groupcatrel
-groupnum
-catnum

cmdaccess
-cmdnum (fk)
-groupnum (fk)
-acclevel (either boolean or an integer if you want multiple access levels for your commands)

groupuserrel
-groupnum
-usernum

I'll drop the rest of the tables like category and link.

You create a command dispatcher that would be invoked with a command name among other parameters. The job of the command dispatcher is to keep user session and check access in cmdaccess. Once the checks are complete the appropriate command code is invoked.

Let's say you want some particular set of users to be able to add listings to THEIR categories, and edit the name of the category itself.

So you create two commands:
-AddListingToOwnedCategory
-EditOwnedCategoryName

Let's say you have two such users id 1 and 2.
You create a group for them (For all users that are allowed to only do those two things).
The group id would be 1.

You also have a category "Stuff" with id 1.

So you have:
user
1 "mike"
2 "john"

group
1 "limited mods"

groupuserrel
1 1
1 2

category
1 "Stuff"

groupcatnum
1 1

command
1 AddListingToOwnedCategory
2 EditOwnedCategoryName

cmdaccess
1 1 true
2 1 true

What should those commands do:
AddListingToOwnedCategory - once the dispatcher calls this command, the command knows that the user is authorized to call it. The command itself has to check if the user is in the group linked to that category. If he is - the command adds a listing specified by other request parameters. If not - displays an error saying the user cannot add to THIS category.

What happends with EditOwnedCategoryName - pretty much the same thing. If the user's group is linked to the cat - the name edit goes through.

So the job of the command dispatcher is to track user's session and check the access.
As I mentioned before, if you create another command AddListingToAnyCategory - the command itself does not need to check the user's permissions regarding the category, it can just add the listing. If the dispatcher allowed that command to execute for that particular user - it's all fine.

So basically you create as many fine-grained commands as you need to perform tasks.
You can have something like
AddListingToOwnedCategory
AddListingToOwnedOrBelowCategory
AddListingToAnyCategory

All three commands do the same thing, but limit acceess in different ways (that's not command authorization, but business logic permissions). The authorization is done by the command dispatcher. No need to have redundant code in hundreds of commands.

You use groups to assign what particular users can do.

After you planned all your commands, plan your groups.
For example:
"Plain mods" - might be linked to commands: AddListingToOwnedCategory, EditListingInOwnedCategory, DeleteListingFromOwnedCategory.

"Meta mods" - will have the same access and in addition: AddListingToOwnedOrBelowCategory, etc...

"Admin" - will have commands like: AddListingToAnyCategory, DelteAnyListing, EditAnyCategory.

Uff.

That's about it.

It might look like a huge overkill, but I designed a system based on the same model over 3 years and the software works great (4 major revisions and still the same model). I have my store (the one in the profile) using it. It only has something around 5 basic commands: AddToCart, AddPaymentToOrder, etc..

The same framework supports message boards, accounting/inventory tracking intranet apps, and lots of other stuff I developed for my clients.

If you spend a bit more time and include error handling into the command dispatcher - your commands might just pass errors to the dispatcher for a formatted output.
So you'll have a basic framework to build upon.

Hope this helps.

Hmm, I don't feel like checking my post. Again, it got too long. If something does not make sense - I'll go over it tomorrow.

Nick_W

4:12 pm on Jan 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Right, I understand much better now, thanks for taking the time to show me what you mean ;)

However (hehe, you knew it was coming...) I don't see why my permission checks can't be handled by a seperate script/class that will do all the perm checks and then execute the function.

Seems to me to be much simpler that way?

Here is the DB Schema after all the help given in this thead:

Categories table

* catid int pri key auto inc
* edid int not null
* catparent int null
* active enum(y,n)
* catname varchar(100) not null
* catdesc text null
* catkeys text null // keywords
* catlastedit date null

Links table

* linkid int pri key auto inc
* catid int not null
* active enum(y,n)
* title varchar(100) not null
* desc varchar(255) not null
* keys text null // keywords
* url varchar(255) not null[blue]
* rec [blue]enum(y,n)
// recommended?
* added date null
* lastchecked date null
* email varchar(100) not null
* validated enum(y,n)

Editor table

* editorid long int pri key auto inc
* name varchar(100) not null
* email varchar(100) not null
* status enum(masteradmin, catadmin, editor)

Can I normalize or improve on that guys?

Cheers

Nick

jmccormac

12:33 am on Jan 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



>>The editorid in the categories table seems to be unique and may cause a separate cat id to be generated if there is another editor (been working through the night so this may not make sense. ) It may be
better to have a separate categories/editors permissions table. The duplicate catid has been at the core of Dmoz's problems for a while now.

The catid is auto incrementing but the editorid in the categories table is singular. What happens when you add another editor to a category? (I may be wrong on this as my SQL is not that good.) It looks like a new row will be created for each different editorid thus creating a number of catids for one particular category. Having a unique catid is very important for the structure of the directory. If it is not unique, then there is the risk of links ending up in the wrong category or not being included at all.

It would be best to keep the editorid aspect out of the categories table and create a number of catid fields in the editor table to allow an editor to edit multiple tables.

The alias table would essentially give the names by which the category is also known as:
catid, catname, alias

The editors/catid permissions table would essentially be:
catid, catname, editorid

You could tweak it and allow editor type and active in that table as well.

Regards...jmcc

brotherhood of LAN

12:50 am on Jan 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



are you going to add a search to the directory?

If the category names are repeated, i.e. "associations","news" you could replace them with integers and have a table of words/phrases with id's

If someone searched for "directories" when your cat titles are "directory" you could also add a 'stem' field in the word table to morph different words with the same meaning into one useful word by backreferencing to the word "directory".

aspdaddy

10:34 am on Jan 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>The catid is auto incrementing but the editorid in the categories table is singular.

In this design thats how it should be, because editorid is a foreign key on the editors table.

>It would be best to keep the editorid aspect out of the categories table

If you want multiple editors per cat that is the way to go. The relation between editor and category would be a new index (table):

category_editor_index:
editor_id, category_id

It all depends if you really want a normalised design, it would mean adding more indexes for keywords:

keyword_link_index:
keyword_id, link_id

keyword_category_index:
keyword_id, category_id

and a keyword table:
keyword_id, keyword

ukgimp

10:49 am on Jan 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Nick

If any of these ideas have been covered and I have missed them then sorry folks.

1. It’s a pain but directories need to be reviewed periodically, so why not create a field for next review date that is set to 6 months since last checked. Then you can get them to flag up when a cat editor views their section. After all site can still be automatically linked checked for code 200 but what if the content has changed drastically or the domain hijacked.

2. Click thru tracking option for those that want it. OK it can make the links “invisible” bust customers may want this option.

3. What happens if you want more than one editor per cat. I was told of a hack where you create a field separated by $ or double pipe? But that will not scale well I reckon.

4. A separate table of countries so that if you want a local can be set.

5. A classification scheme to the content of the site (pdf, sound, video etc) so it can be shown and searched for in an advanced search.

6. Would a separate table for user levels be an option that way more could easily be added and someone could be more than one level if required. Might be useful for test/admin procedures..

7. Rather than active/non active enum type stuff. What about more than on/off. You may have a site that you never want included but why risk and editor adding it after another has declined it. So active/not active/do not include.

8. Good admin search for URL’s keywords etc.

9. To aid with ranking what about some sort of quality quotient that would be subject (score out of ten). In addition to an editors choice.

That is for now.

Cheers

Nick_W

2:47 pm on Jan 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member




>It would be best to keep the editorid aspect out of the categories table

If you want multiple editors per cat that is the way to go. The relation between editor and category would be a new index (table):

category_editor_index:
editor_id, category_id

I like this idea but, would it not slow things down some? There is no logical way to add a key to it?

Nick

jmccormac

1:35 am on Jan 25, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



>>>It would be best to keep the editorid aspect out of the categories table
>>If you want multiple editors per cat that is the way to go. The relation between editor and category would be a new index (table):
>>category_editor_index:
>>editor_id, category_id

>I like this idea but, would it not slow things down some? There is no logical way to add a key to it?

As a quick fix you could have an auto incrementing id as the primary key. The solution for the index above using editorid and catid would work and would probably be fast enough.

Regards...jmcc

Clark

2:12 am on Jan 25, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'll have to bookmark this link. An awesome discussion on how to set up a database and the types of issues that come up. Well done!

aspdaddy

5:47 pm on Jan 25, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>There is no logical way to add a key to it?

Most db support a composite key, both fields together will be unique, but you can add an identity/autonum if you like.

Speed can be an issue beacuse inner joins are seen as expensive, but on the other hand you have smaller better indexed data.