Forum Moderators: phranque
Suppose there is an accomodation database which had a column to store the type of accomodation in. There are hotels, b&b´s, lodging houses, appartments, farm holidays, campsites etc. And there are Hotels, Frühstückspensionen, Pensionen, Bauernhöfe, Campingplätze. I.e. the type of accomodation needs to be stored in different languages. Neither type is mutually exclusive.
Now my question is this:
Do I do my own bit bashing and de- and encoding on storage or retrieval?
Do I use said SET column type which does the bit bashing for me at the price of having to store redundant information in my database.
I know I can do it both ways. But what are the implications when it comes to speed, maintainability, etc?
I think the way I would personally do it would be to have some kind of translation table and store only numeric values for each accomodation type.
I was thinking about accomodation types being numbered 1-10 and then depending on what lang the part of the site is in it could use the de_type table or the en_type table, for example. On selects you could maybe do joins with the appropriate lang table and maybe this would be the quickest.
It really is a tough situation. Maybe you can carry on from there and we can see what ideas other people might have.
I was thinking about accomodation types being numbered 1-10 and then depending on what lang the part of the site is in it could use the de_type table or the en_type table, for example. On selects you could maybe do joins with the appropriate lang table and maybe this would be the quickest.
This approach would work quite well if the relationship between company and accomodation type would be 1:1. With the two languages added to the picture this would result in a 1:n relationship.
main table
id ¦ name ¦ type
----------------
01 ¦ wxyz ¦ 0001
02 ¦ abcd ¦ 0002
en_type table
id ¦ name
----------
01 ¦ hotel
02 ¦ b&b
de_type table
id ¦ name
----------
01 ¦ Hotel
02 ¦ Frühst
However, since the relationship between company and accomodation type is 1:n we are left with a n:n relationship. We would have to use a separate table.
main table
id ¦ name
---------
01 ¦ wxyz
02 ¦ abcd
company to types
id_comp ¦ id_type
-----------------
0000001 ¦ 0000001 // company 1 is hotel and b&b
0000001 ¦ 0000002
0000002 ¦ 0000001 // company 2 is just hotel
en_type table
id ¦ name
----------
01 ¦ hotel
02 ¦ b&b
de_type table
id ¦ name
----------
01 ¦ Hotel
02 ¦ Frühst
I´d consider that pretty messy just for the sake of storing a dozen names in two languages.
we can see what ideas other people might have.
I´d consider that pretty messy just for the sake of storing a dozen names in two languages.
Isn't that what a relational database is for? I'm so far from an expert that it isn't even funny, but I've done a some work on a multi-lingual shopping cart and some other sites that use a lot of such tables.
SPEED - In terms of speed, it benchmarks pretty well if your queries are well-structured (use the MySQL EXPLAIN and also turn on the MySQL logging logs while developing and look at the logs to see which queries are hurting you). The query on the company_to_type table should be super fast as long as you index the company col (no point in indexing the type col as there will be too much redundancy) and so should the query on the types tables since they are integer queries on the primary key, not to mention that they are small tables.
MAINTAINIBILITY with many tables - I think it's better than the alternatives. Lets say you want to track user ratings of given hotels. Users rate them on a scale of 1-10. You track number of users and total points and use that to calculate rating. Now you work on a local copy of the site and want to upload new hotel data. If the rating info is in the same table as the hotel info, you have to synchronize the local and live tables. If they are separate, you simply overwrite hotel info table and the rating info take care of itself.
One small thing, I would suggest the table names
types_en
types_de
rather than en_type, de_type, since I think it's usually easier if the names are such that related tables sort together during alphabetical sorts.
Also, depending on what you're trying to do, you don't necessarily need a table for each language (this is the OSCommerce way; not sure whether it's better or worse)
main table
id ¦ name
---------
01 ¦ wxyz
02 ¦ abcd
company to types
id_comp ¦ id_type
-----------------
0000001 ¦ 0000001 // company 1 is hotel and b&b
0000001 ¦ 0000002
0000002 ¦ 0000001 // company 2 is just hotel
hotel_type table
id ¦ type ¦ language ¦ name
----------
01 ¦ 0001 ¦ 00000001 ¦ hotel
02 ¦ 0001 ¦ 00000002 ¦ hotel
03 ¦ 0002 ¦ 00000001 ¦ B&B
04 ¦ 0002 ¦ 00000002 ¦ Fruhst
languages
id ¦ language
----------
01 ¦ English
02 ¦ Deutsch
OSCommerce does this to store things like product descriptions which are only used 1x/language and are frequently set/changed/updated by regular employees rather than admins.
For interface items that the user would not typically change, they often use files with constants and the DB type is a constant name. So in PHP/MySQL, you have a constant in the default language
HOTEL
B_AND_B
And then you have files for each page that get included based on the language. So you can have
include("hotel_details/" . $language . "/hotel_details_words.php/");
hotel_details_words.php is just a list of constants like
DEFINE(B_AND_B, "Frühst");
I think the DB approach that just does everything using the power of the relation DB is much easier overall.
Tom
Isn't that what a relational database is for?
I guess you are right. And if there were hundreds or thousands of entries in the types table I probably wouldn´t have batted an eye about the whole setup.
I like your point on maintainability with many tables. Again this seems to be the whole point of normalizing your database.
Implementation I choose
There is just a type column of type int in the main table. On retrieval it is passed to the following function:
function make_description($data, $array) {
$s = array();
foreach($array as $key => $value) {
if($data & $key) {
array_push($s, sprintf("<li>%s</li>", $value));
}
}
return '<ul>' . implode('', $s) . '</ul>';
} $data is the number contained in the db. $array is the array contained in $lang['typ']['de'], $lang['typ']['en'], etc. as seen below.
$lang['typ']['de'] = array(1<<0 => 'Hotel',
1<<1 => 'Pension',
...
1<<6 => 'Campingplatz');
This way I just have to do one simple query and get nicely formatted html output very easily. While I do realize that usually (if there are a lot of data involved) it is a good way to let the RDBMS do as much work as possible, the situation at hand seems different and IMO warrants a different approach.
Andreas
Again this seems to be the whole point of normalizing your database.
Actually the example I gave is sort of denormalized in order to increase maintainability. If it were normalized, each record would have only information that is uniquely related to the primary key, and all information that is uniquely related to the primary key. My idea is actually to violate that in order to make the whole thing easier to synchronize.
Out of curiousity, how many classifications are you going to have? Will users have the ability to make new types? If only the programmers will be adding types and it's only a few (say 20 or so), you could just put them straight into an array with no DB call.
$type_array = array(
"en" => array("1" => "Hotel", "2" => "B and B", ...)
"de" => array ("1" => "Hotel", "2" => "Fruhst", ....)
);
$type = $type_array[$lang][$type_num];
Can't get much faster than that! I use the associative index "1", "2", etc to override the automatic indexing so that if someone deletes an array element, it should still work. I believe that PHP is smart enough to avoid conflict if you have integers or strings that evaluate to integers as associative indices.
Tom
If the rating info is in the same table as the hotel info, you have to synchronize the local and live tables. If they are separate, you simply overwrite hotel info table and the rating info take care of itself.
Your second proposal comes close to my chosen implementation. I´m using bit operations to store more than one type (hotel and appartments) in one number. This comes at the price that I have to loop through the types array and bitwise AND the type number with the keys.
From your example I am not sure if you were intending to do the same. Would you still use the company2types table to account for the fact that one accomodation could be of more than one type? Then you could in fact do a simple lookup in the type hash.
There will be less than 20 types and the users cannot add new types. I will be using the same approach for other types of classifications too: offered services, leisure activities, etc.
Andreas
Would you still use the company2types table to account for the fact that one accomodation could be of more than one type? Then you could in fact do a simple lookup in the type hash.
That's basically what I was suggesting, yes. You still need a mechanism for efficiently tracking all types of accomdations at each place and that seems like the simplest solution.
I´m using bit operations to store more than one type (hotel and appartments) in one number. This comes at the price that I have to loop through the types array and bitwise AND the type number with the keys.
I wonder how much you really save by eliminating multiple queries and then parsing the integer bitwise. I'd be curious to benchmark it against something like...
--- settings.php ---
define('DEFAULT_LANG', 1);
$lang_array('en' => 1, 'de' => 2);
if ($HTTP_GET_VARS['lang'] && in_array($HTTP_GET_VARS['lang'], $lang_array) {
$lang = $HTTP_GET_VARS['lang'];
} else {
$lang = DEFAULT_LANG;
}
$type_array = array(
"en" => array("1" => "Hotel", "2" => "B and B", ...)
"de" => array ("1" => "Hotel", "2" => "Fruhst", ....)
);
--- show_page.php ----
include(settings.php);
$query = "SELECT type_index as ind
. FROM accom_type at, companies c, companies_2_types c2t
. WHERE c.id = $company_id AND c2t.company_id = c.id AND c2t.accom_id = at.id";
$result = mysql_query($query);
$accomodation_options = "";
while ($type = mysql_fetch_array($result)) {
$accomodations_options .= $type_array[$lang][$type['ind']];
}
---------------
When I was saying that this seems to be the whole point of normalizing your database I was referring to the following statement of yours:
If the rating info is in the same table as the hotel info, you have to synchronize the local and live tables. If they are separate, you simply overwrite hotel info table and the rating info take care of itself.
I understood that. My point was that in the case you are quoting, the tables are actually denormalized. Usually people think of denormalized tables as combining tables that should be split. The example that I gave in fact splits tables that should be combined in order to be normalized. Since the page rating relates only to the primary key and to nothing else, full normalization would ask that both tables be one, with page_id as the primary key. I have denormalized it in order to facilitate synchronizing. Rather than normalizing and then synchronizing using SQL, I'm suggesting denormalizing and then just overwriting the part of the database that is no based on user input, but only designer input.
Cheers,
Tom
[edited by: Marcia at 6:27 pm (utc) on Sep. 19, 2002]
Each accomodation was processed once in random order. The figures show the average time for processing one accomodation.
multiple queries..: 0,00166389942169 seconds
bitwise comparison: 0,00118190050125 seconds
--------------------------------------------
difference........: 0,00048199892044 seconds
Test Script
<?phprequire_once("shared/DatabaseConnection.inc.php");
require_once("shared/Functions.inc.php");function getmicrotime($t) {
list($usec, $sec) = explode(" ",$t);
return ((float)$usec + (float)$sec);
}function populate() {
$t = <<<END
CREATE TABLE companies (
id int(10) unsigned NOT NULL auto_increment,
name char(50) default NULL,
mytype int(10) unsigned NOT NULL default '0',
PRIMARY KEY (id)
) TYPE=MyISAM;CREATE TABLE companies_2_types (
id_comp int(10) unsigned NOT NULL default '0',
id_type int(10) unsigned NOT NULL default '0',
KEY id_comp_index (id_comp)
) TYPE=MyISAM;CREATE TABLE accom_type (
id int(10) unsigned NOT NULL auto_increment,
type int(10) unsigned NOT NULL default '0',
PRIMARY KEY (id),
KEY type_index (type)
) TYPE=MyISAM
END;$x = array(
array(
'name' => 'Aaron',
'type' => array(1,5),
),
array(
'name' => 'Nick',
'type' => array(1,4,8),
),
array(
'name' => 'Zack',
'type' => array(2,3),
),
array(
'name' => 'Tay',
'type' => array(6),
),
array(
'name' => 'Isaac',
'type' => array(5,7),
),
array(
'name' => 'Jesse',
'type' => array(2,4,6),
),
array(
'name' => 'Matt',
'type' => array(1,3),
),
array(
'name' => 'Chris',
'type' => array(5,6),
),
array(
'name' => 'Leo',
'type' => array(3,5),
),
array(
'name' => 'Gil',
'type' => array(2,7),
),
);foreach(explode(';', $t) as $sql) {
my_db_query($sql);
}
foreach($x as $key => $value) {
$mytype=0;
foreach($value['type'] as $type) {
$mytype += 1<<($type-1);
}
$sql = "INSERT INTO companies SET name='$value[name]', mytype=$mytype";
$id_comp = my_db_query($sql);
foreach($value['type'] as $type) {
$sql = "INSERT INTO companies_2_types SET id_comp=$id_comp, id_type=$type";
my_db_query($sql);
}
for($i=1;$i<9;$i++) {
my_db_query("INSERT INTO accom_type SET type=$i");
}
}
}#populate();exit;
$lang = 'en';
$type_array = array(
"en" => array(
"1" => "Hotel",
"2" => "B and B",
"3" => "Campsite",
"4" => "Appartments",
"5" => "Country House",
"6" => "Pub",
"7" => "Boarding House",
"8" => "Shack",
)
);$company_ids = range (1,10);
srand ((float)microtime()*1000000);
shuffle ($company_ids);$start = microtime();
while (list (, $company_id) = each ($company_ids)) {
$query = "SELECT type as ind
FROM accom_type at, companies c, companies_2_types c2t
WHERE c.id = $company_id AND c2t.id_comp = c.id AND c2t.id_type = at.id";
$result = mysql_query($query);
$s = "";while ($type = mysql_fetch_assoc($result)) {
$s .= $type_array['en'][$type['ind']];
}
}$end = microtime();
$t1 = (getmicrotime($end) - getmicrotime($start))/10;echo "multiple queries: $t1 seconds<br>";
$lang = array();
$lang['typ']['de'] = array(1<<0 => 'Hotel',
1<<1 => 'B and B',
1<<2 => 'Campsite',
1<<3 => 'Appartments',
1<<4 => 'Country House',
1<<5 => 'Pub',
1<<6 => 'Boarding House',
1<<7 => 'Shack');reset($company_ids);
$start = microtime();
while (list (, $company_id) = each ($company_ids)) {
$sql = "SELECT mytype FROM companies WHERE id='$company_id'";
$result = mysql_query($sql);
$type = mysql_fetch_assoc($result);$s = "";
# while (list ($key, $value) = each ($lang['typ']['de'])) {
foreach($lang['typ']['de'] as $key => $value) {
if((int)$type['mytype'] & $key) {
$s .= $value;
}
}
# reset($lang['typ']['de']);
}$end = microtime();
$t2 = (getmicrotime($end) - getmicrotime($start))/10;
$td = $t1 - $t2;echo "bitwise comparison: $t2 seconds<br><br>difference: $td seconds";
?>
Uncomment populate line to build tables and insert data.