Database Normalization

I have what seems to me a fairly complex database (MySQL) question/discussion.

Background:
The site in question involves widgets (how original!). There is a bunch of information stored in the database about each widget (name, model #, region, bubbles-per-hour, category). On the site, users can browse the information by selecting one of several sorting schemes (alpha by name, alpha by region, etc). One such sorting scheme is category-based. Users can click a category name and the page builds a list of links to the widgets in that category.

Implementation:
Here's what I'm currently working with. All information is stored in one table. For sorting by name, listing the contents is easy and fast: simply grab all the rows, alphabetize the name strings, and output the list. Same for alpha by region.

Sorting by category is somewhat more complicated. Here I have to pull all the rows, then loop through them, searching the comma-separated string in each entry's 'categories' field for a match to the user-selected category.

Each widget can fall into one or more categories. (If each widget had only one category, this would be easy: just build a table for each category and output all the contents.) Because of that, this ends up being a lot of searching. At a few hundred widgets, this is no big deal. But when numbers reach into the thousands, or, heaven forbid, the tens of thousands, searching every 'categories' field in the DB will quickly become unwieldy.
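
As a rough sketch of the category lookup described above (Python is used purely for illustration; the row data, category names, and function name are made up, and the site's actual code is not shown here):

```python
# Hypothetical rows standing in for the single widget table.
# The names and category values are invented for illustration.
widgets = [
    {"id": 1, "name": "Alpha Widget", "categories": "gizmos,gadgets"},
    {"id": 2, "name": "Beta Widget", "categories": "gadgets"},
    {"id": 3, "name": "Gamma Widget", "categories": "doohickeys,gizmos"},
]


def widgets_in_category(rows, category):
    """Return sorted names of rows whose comma-separated 'categories' field lists category."""
    matches = []
    for row in rows:  # every row is examined, whichever category was picked
        # Split on commas so "gizmos" cannot match inside a longer category name.
        cats = [c.strip() for c in row["categories"].split(",")]
        if category in cats:
            matches.append(row["name"])
    return sorted(matches)


print(widgets_in_category(widgets, "gizmos"))
```

The point to notice is that the loop touches every row and parses every string, so the work grows with the total number of widgets rather than with the number of widgets in the chosen category.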

A Solution (My Question)?
After reading about normalizing databases, I came up with the following 'plan'. What I don't know is whether this is really any better than the situation described above. I was hoping someone with more DB experience than myself might be able to tell me whether this will prove to be a more efficient way to organize and retrieve the data...

Remove the 'categories' column from the widget_info table and create a new table called widget_categories. Name the columns in this table according to the category names used on the page. For each widget entered into the DB, create an identical unique ID in each table. Stuff the widget info into widget_info, and set a boolean TRUE in any column of widget_categories that matches the categories of the widget.

When the user selects a category to sort widgets by, the script goes to the widget_categories table and searches for rows where that category is set to TRUE. It then takes all of the IDs that match the search, and pulls THOSE rows from widget_info to build the item list.
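
A minimal sketch of that plan, under some loud assumptions: Python's built-in sqlite3 module with an in-memory database stands in for MySQL, the category column names (gizmos, gadgets, doohickeys) and the sample rows are hypothetical, and the booleans are stored as 0/1 integers. It follows the two-step lookup as described (IDs first, then the widget_info rows).

```python
import sqlite3

# Assumption: these category names are invented; the real ones come from the site.
CATEGORY_COLUMNS = ("gizmos", "gadgets", "doohickeys")

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.execute("CREATE TABLE widget_info (id INTEGER PRIMARY KEY, name TEXT, region TEXT)")
conn.execute(
    "CREATE TABLE widget_categories ("
    "id INTEGER PRIMARY KEY, "
    "gizmos INTEGER NOT NULL DEFAULT 0, "
    "gadgets INTEGER NOT NULL DEFAULT 0, "
    "doohickeys INTEGER NOT NULL DEFAULT 0)"
)

# The same ID is used in both tables for a given widget.
conn.executemany(
    "INSERT INTO widget_info VALUES (?, ?, ?)",
    [(1, "Alpha Widget", "North"), (2, "Beta Widget", "South"), (3, "Gamma Widget", "East")],
)
conn.executemany(
    "INSERT INTO widget_categories VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 0), (2, 0, 1, 0), (3, 1, 0, 1)],
)


def widgets_flagged(conn, category):
    """Return sorted widget names whose flag for the given category column is set."""
    # Column names cannot be bound as query parameters, so check against a whitelist.
    if category not in CATEGORY_COLUMNS:
        raise ValueError(f"unknown category: {category}")
    # Step 1: collect the IDs where the category column is TRUE (stored as 1).
    ids = [row[0] for row in conn.execute(
        f"SELECT id FROM widget_categories WHERE {category} = 1")]
    if not ids:
        return []
    # Step 2: pull those rows from widget_info.
    placeholders = ",".join("?" * len(ids))
    rows = conn.execute(
        f"SELECT name FROM widget_info WHERE id IN ({placeholders}) ORDER BY name", ids)
    return [row[0] for row in rows]


print(widgets_flagged(conn, "gizmos"))
```

The two steps could also be collapsed into a single query by joining the two tables on id, which avoids passing a long ID list back to the database.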

The Question:
Again, what I'm not sure of is whether or not this search (of widget_categories for TRUE in a given column of each row) will be faster and more efficient than pulling the string of category names from the column in widget_info and searching for a match. My suspicion is that it would be, but my experience with MySQL is limited and my knowledge of how databases run on a programmatic level is practically non-existent. Implementing this change will involve quite a bit of re-coding, so I want to make sure I've mapped out the most efficient means possible before starting to write the code. (Which means any other suggestions for increasing the speed of the DB interaction would be welcomed and appreciated!)
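
One way to test that suspicion before re-coding is to time both lookups on synthetic data. The sketch below is only a harness, not an answer: it assumes SQLite (via Python's sqlite3) in place of MySQL, a made-up widget count and made-up category names, so its timings say nothing definite about a real MySQL server. Note also that the flag columns here carry no index, so both lookups still visit every row.

```python
import sqlite3
import timeit

N = 5000  # assumed number of widgets
CATS = ("gizmos", "gadgets", "doohickeys")  # hypothetical category names

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE widget_info (id INTEGER PRIMARY KEY, name TEXT, categories TEXT)")
conn.execute(
    "CREATE TABLE widget_categories "
    "(id INTEGER PRIMARY KEY, gizmos INTEGER, gadgets INTEGER, doohickeys INTEGER)"
)

for i in range(1, N + 1):
    # Deterministic synthetic membership: category k applies when bit k of i is set.
    flags = [(i >> k) & 1 for k in range(len(CATS))]
    names = ",".join(c for c, f in zip(CATS, flags) if f)
    conn.execute("INSERT INTO widget_info VALUES (?, ?, ?)", (i, f"Widget {i}", names))
    conn.execute("INSERT INTO widget_categories VALUES (?, ?, ?, ?)", (i, *flags))


def scan_strings(category):
    """Current approach: fetch every row and search its comma-separated string."""
    rows = conn.execute("SELECT id, categories FROM widget_info ORDER BY id").fetchall()
    return [wid for wid, cats in rows if category in cats.split(",")]


def scan_flags(category):
    """Proposed approach: let the database filter on the boolean column."""
    if category not in CATS:  # column names cannot be bound as parameters
        raise ValueError(f"unknown category: {category}")
    query = f"SELECT id FROM widget_categories WHERE {category} = 1 ORDER BY id"
    return [row[0] for row in conn.execute(query)]


# Both approaches must agree before their timings mean anything.
assert scan_strings("gadgets") == scan_flags("gadgets")

for fn in (scan_strings, scan_flags):
    seconds = timeit.timeit(lambda: fn("gadgets"), number=20)
    print(f"{fn.__name__}: {seconds:.4f}s for 20 runs")
```

Running the same comparison against a copy of the real MySQL data would give numbers that actually apply to the site.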

Also, I know you can set fields in MySQL to NULL/NOT NULL, but is there a way to set the data type for a field to a boolean (true, false) value?

Thanks in advance for any responses.

cEM

