Database design

Forum Moderators: open

Message Too Old, No Replies

Database design

rknuppel

5:04 pm on Aug 26, 2005 (gmt 0)

I have a quick question regarding database design. I run several MYSQL databases and I have always got in the habit of including an autoincrement ID along with each row of almost all tables. This is so I can link up to any row by an ID. My question is: Is this bad database design? For example.

Lets say I have a Teams table with, TeamName, Mascot, Wins, Losses. I include an ID at the beginning so I can link on a team by their ID. Should I not include this and just link using the TeamName? I do this sort of thing all over the place. Then I even have certain linking tables that pretty much contain all ID's... like TeamsConferences has ID, TeamID, ConferenceID.

I'd love to hear your thoughts on this kind of setup. And how performance might be affected by going either way here.

txbakers

5:42 pm on Aug 26, 2005 (gmt 0)

having an auto-increment for a row ID is a good practice. It is an easy and accurate way of identifying a unique row.

Using that ID as the foreign key (to link tables, etc.) is OK also, but a bit more dangerous. If ever you need to reformat that table, it's possible that those keys could indeed change, rendering your joins a mess.

I think the best solution is to still use the ID as the unique key in the table, but to create another id number for that item throughout your database.

A bit more programming, but safer in the long run.

rknuppel

5:52 pm on Aug 26, 2005 (gmt 0)

That is exactly the problem I'm encountering. I'm trying to reload a table and now other tables are gonna be screwed up because of this auto id. It won't be the same. So if I used another ID to join on, what is the purpose of the autoincrement ID still?

txbakers

8:07 pm on Aug 26, 2005 (gmt 0)

The autoincrement will give you a unique ID to THAT ROW, regardless of the data. That's very useful if you have to modify/delete something from just that table, you don't have to reference it by other sources.

PLUS, you can control your semi-unique numbers based on the data.

globalissa

9:54 pm on Aug 26, 2005 (gmt 0)

Thought an actual structure might help illustrate the relationship:

CREATE TABLE `your_tablename` (
`id` int(10) NOT NULL auto_increment,
`username` varchar(40) default NULL,
`password` varchar(40) default NULL,
`lastupdate` timestamp(14) NOT NULL,
`foreign_key` int(6) default NULL,
PRIMARY KEY (`id`)
)

... so the id field is autoincrementing and you can have as many as you like. The foreign_key field is used to link into a related record in another table.

physics

9:56 pm on Aug 26, 2005 (gmt 0)

I think the best solution is to still use the ID as the unique key in the table, but to create another id number for that item throughout your database.

Can you clarify this a little?

You mean have like
autoid,id,Team,...
34234,5,My team,...

Where autoid is auto-increment and id is say created by your script when the team is entered?

Under what conditions do the auto increment field values change when using, say MySQL?

ergophobe

3:32 pm on Aug 27, 2005 (gmt 0)

If you have it do full inserts when you dump your mysql data, the autoincrement fields should never change (by which I mean that they shouldn't have to change).

Where they would change is if your MYSQL dumps look like this for the table above

INSERT INTO `your_tablemname` (`id`, `username`, `password`, `lastupdate`, `foreign_key`) VALUES (NULL, 'me', 'mypass', NOW(), 'fk');

INSERT INTO `your_tablemname` (`username`, `password`, `lastupdate`, `foreign_key`) VALUES ('me', 'mypass', NOW(), 'fk');

In this case you will get all new consecutive autoincrement IDs. If you do full inserts like

INSERT INTO `your_tablemname` (`id`, `username`, `password`, `lastupdate`, `foreign_key`) VALUES ('21', 'me', 'mypass', NOW(), 'fk');

Your autmo-increment value will stay the same.

physics

3:52 am on Aug 28, 2005 (gmt 0)

OK, thanks for the clarification. But it seems so redundant to me to have two id fields. Also, if you're using scripts to enter the data say from user input then you have to calculate a secondary unique id every time you insert something as opposed to just letting mysql do it's job with auto increment? Not saying it's wrong just saying it's a pain.
So if I back up my databases with, say, mysqldump --opt then is there any danger of the auto increment fields getting messed up? If not I might prefer to avoid the extra complications of the extra id field and just build in some sort of 'rebuild' function into my script that will rebuild the relationships between the tables with the new unique ids should a situation arise where the ids got messed up.

txbakers

2:52 pm on Aug 28, 2005 (gmt 0)

It is a bit redundant to have two ID fields, but they are for different purposes. Remember - a table has rows. The auto-increment field gives a perfect ID to THAT ROW. this is regardless of the data in the databse. In fact, I usually call my auto-increment field "rrn" which back in main frame days stood for "relative record number". It has nothing to do with the data.

Then, within the data, I have different codes to identify users. So if one user creates four "teams", I would also write the code to give each team a unique ID number, which would be part of the data, NOT dependent on the row.

Yes, it's a bit more work. But it serves two purposes, first, it gives a data-centric ID, which serves as the identified for that TEAM throughout the database, regadless of the row it's on. And TWO - it keeps the numbers a reasonable length.

I kind of regret using the RRN for entry ID now. In one table I'm up over 100,000 entries, so the resulting "entry number" is indeed in 6 figures. So for another 6 years I'll be OK with a 6 digit number, but it didn't have to be that way, if I made the number unique for the USER, not the DATABASE.

Hope that clears it up a bit. All you said was true, it is a bit redundant, and does require extra coding, but if ever I need to reformat the tables the DATA identifier will be there while the ROW identifier might change.

rknuppel

3:00 am on Aug 29, 2005 (gmt 0)

Thanks for all the discussion guys! Good stuff. I think I understand the idea of having a row ID (RID, autoincrement) and a data ID (DID, generated when the row is inserted). But my main question now is, how do you go about generating a DID? Like if I had 4 teams to insert into a Teams table. What kind of logic would you use to give these teams a DID?

Also, not sure I see the benefit of having a RID too? I mean shouldn't the DID be enough? I still don't see a reason I'd ever use a RID if I had a DID (which is unique to each row right).

Thanks.

physics

4:29 am on Aug 29, 2005 (gmt 0)

Good points rknuppel. Yeah for tables that just hold one type of data such as team info then a unique DID for each row should be enough (I THINK ... ).
The logic I'd use would be: when I want to insert a new row, find the maximum value for all existing DID for that table and then add 1 to it ... use this for the new record's DID.
Of course this is more or less what auto-increment is supposed to do (and I assume it does it more efficiently) so it's a shame really that it's possible for auto-increment fields to get changed on a table reformat.

txbakers

12:07 pm on Aug 29, 2005 (gmt 0)

The reason for having an auto-increment RID would be to help you make updates/deletes easily.

If your table had 10 columns, and you didn't have a unique RID, you would have to use all 10 fields to get a unique row (if you could) to update/delete that row.

If you had a unique ID for the Row, you could just reference that.

To make a unique DID, you would have to check the MAX(DID) and add 1.

The other advantage to do ing that is that the identifiers for each team seem to make sense. Why would a new team start out with team ID 646?

Still, for all my pontificating, I use both. it's more work to do the unique DID, I think it's better programming, but when I don't have much time I rely on the unique RID.

aspdaddy

1:28 pm on Aug 29, 2005 (gmt 0)

I think the autoincrement is a good idea as long as you remember its maintained by the system and plan carefully for any import etc. I use it on all my tables and use it in the joins and so far havent had any problems with it.

On MSSQL you have the option of turning IDENTITY INSERT on and off which helps when importing data.

You can also add a compurted column to the table to create a more user friendly key, generated from the system key e.g (T-SQL)

[Sales ID] AS (right((replicate('0',6) + convert(varchar(6),[ID])),6))

The main arguments against that I have heard are for distributed servers where maintaining an increment over several servers is problematic.

The main benefit as stated by txt is it removes the need for composite keys to locate a single record. But you should still make those keys as unique constraints, just not primary keys.

I dont really see the reformat problem. How can you delete records when theres a foreign key constraint?

ergophobe

12:45 pm on Sep 1, 2005 (gmt 0)

build in some sort of 'rebuild' function into my script that will rebuild the relationships between the tables with the new unique ids should a situation arise where the ids got messed up.

Devil's advocate asks: how are you going to rebuild your relationships if your ids get messed up? (sounds like a question about identity theft!)

it's a shame really that it's possible for auto-increment fields to get changed on a table reformat.

Ahh, but what would you say if it were impossible to change auto-inc values? That would be a shame as well.

how do you go about generating a DID?

Is this going to be merely a foreign key or also a reference number (i.e. "your reservation number is" doesn't really want a 64 character string)? In PHP you can use a timestamp, the uniqid() function with added entropy and some hash function and virtually guarantee a unique id. Of course you still have to check, so it's a query to the server either way (i.e. check for max value and guarantee uniqueness or check for uniqueness and possibly regenerate).

The main arguments against that I have heard are for distributed servers where maintaining an increment over several servers is problematic.

That's the only time I've bothered myself - when you need to ensure uniqueness across different machines. E.g. I had an Access database where independent replicas got synchronized and I wanted the ids to stay the same, so I avoided collisions by essentially using an id that included a reference to the data enterer so say I could never end up inputting a record with the same number as a number that you input, because the last digits always keyed to me.

I dont really see the reformat problem. How can you delete records when theres a foreign key constraint?

I don't really see the reformat problem either. Or rather, I don't see how having a second ID helps you there. To answer the question though, MySQL does not enforce referential integrity (yet), so you can delete anything anywhere.

grandpa

1:08 pm on Sep 1, 2005 (gmt 0)

MySQL does not enforce referential integrity

Which means a little more work for a programmer. I'm in with the RID group. I only use them to update or delete a row, knowing that no other row was affected. There is no relationship between any tables using auto-index. Enforcing data relationships is my job. Processing rows is the job of MySQL.

ergophobe

8:46 pm on Sep 3, 2005 (gmt 0)

Enforcing data relationships is my job.

That's sort of like comparing programming languages and saying "type juggling is my job". It is in C, but it isn't in Scheme. The folks who write low-level embedded software naturally have no interest in Scheme and want to type-juggle. The AI crowd want the computer to do as much as possible so they can focus on a higher level of abstraction.

The only RDMS I've worked with is MySQL, so I've always had to enforce referential integrity myself, but it's only my job because I happen to work with a given platform.

txbakers

10:33 pm on Sep 3, 2005 (gmt 0)

but it's only my job because I happen to work with a given platform.

Exactly. In fact, it's one of the main reasons I'm considering moving to mssql in the future. Certainly would be a lot easier to issue one delete command rather than three.

ergophobe

11:59 pm on Sep 3, 2005 (gmt 0)

And if you're still on mysql 3.x, you don't have transactions either, so you can never be sure that you really have ensured data integrity without going to great lengths. I end up with "rollback" arrays so that after all the deletes or inserts are done, I can check to make sure that they went okay and, if not, try again and if not, restore the data to the prior state.

Honestly, I think all that should be the RDMS's job, not the programmer's. But then again, I like Scheme. So there you have it.

physics

3:06 am on Sep 7, 2005 (gmt 0)

Devil's advocate asks: how are you going to rebuild your relationships if your ids get messed up?

Right, doh!

aspdaddy

9:57 am on Sep 8, 2005 (gmt 0)

Certainly would be a lot easier to issue one delete command rather than three.

Not so. If I delete an Invoice on my databas it automatically deletes all the invoice lines for me. But if I try to delete a Product it moans because there are related sales and I dont allow cascades on that table.

If a database doesnt support foreign key constraints, cascades and triggers then its not much better than a multi-user spreadsheet is it?

Honestly, I think all that should be the RDMS's job, not the programmer's.

It is - by definition, thats what the R stands for in RDBM. Rule 10.
[en.wikipedia.org...]

Database design

rknuppel

txbakers

rknuppel

txbakers

globalissa

physics

ergophobe

physics

txbakers

rknuppel

physics

txbakers

aspdaddy

ergophobe

grandpa

ergophobe

txbakers

ergophobe

physics

aspdaddy

Join The Conversation

Moderators and Top Contributors

Hot Threads This Week