Forum Moderators: coopster
id flags term
1 3 Computer
2 3 Central Processing Unit
3 3 Computing Machine
4 3 Computing Device
etc..
The table is used in conjunction with other tables for a glossary/dictionary.
The 'flags' column is a 4 byte integer that surprisingly is a 'flag' to indicate certain things about the 'term'. For instance, the first 5 bits of the flag represents the first letter of the term
i.e. a = 1,b = 2...z = 26
So when I do a query for "what terms are under the letter A", I'd query the last 5 bits of the flags column. My question, is there a preferred way to do this query?
SELECT DISTINCT(term) as term FROM dictionary_terms WHERE CONV(RIGHT(BIN(dictionary_terms.flags),5),2,10) = 3 ORDER BY term LIMIT 0,100
This is what I've came up with, somehow it looks 'greedy' when it does the lookup :)
Anyway since you say your storing in a 4 byte integer you have 32bits (8bits to a byte).
I'd be more likely to take a bitwise comparison approach. Since the way your example works now each field in the database has to call a function. Big penalty.
So to keep things simple lets talk about a 1 byte field. So with that we have:
00000000 which each position can either be a 1 or 0 (on or off).
so say we wanted to test:
00100011 to see if bit 2 was set we would compare it to
00000010 with an or comparison.
ie: if( 00100011 == (00100011 ¦ 00000010) ) Since it's an "OR" match if either value has the bit set the resulting bit will be set. If we compare that back to the original test value and they equal we know that it has that bit set. If they differ we know the original does not have the bit set.
With that being said (hopefully I didn't bore you to much) you basically have to build your test value and use binary comparisons to check if the right bits are set. This requires no function calls and should be noticably faster depending on your data set.
Since you know your using the first 5 bits to store your first letter come up with your 26 test matches with (ie a value that just has your value for "a" set and all other bits in the 4 byte int turned off). Say for example that value is 2 (since it will be a valid int number) write your query as follows:
SELECT DISTINCT(term) as term FROM dictionary_terms WHERE dictionary_terms.flags = (dictionary_terms.flags ¦ 2) ORDER BY term LIMIT 0,100
Hopefully that makes sense and I didn't overexplain.
daisho.
vince, I was wondering the same. Since I'm in the planning stages of this it's the best time for me to find out...didnt want to end up with 000's of 'wasteful' queries being done, I thought my original code was bound to be slow.
daisho, it's probably not looking clear (maybe my description of the prob at hand) but thinking it's more because it's bad code :)
>>call a function. Big penalty
That's what worried me, I assume the query would have to do that function on each row?
I have a rough gist of these bitwise operators, and sort of planned to make use of them, 1 bit sure sounds like a good saving on a byte.
I'll be looking at the database structure, as already hinted its for storing a dictionary, but with all the various "add ons" like a glossary, flags for synonyms etc i wanted to plan this down to a T...was thinking about pasting the schema either of you are up for suggestions....cant beat a bit db normalisation once in a while eh :)
Cheers
Though for the scrifice of a little space you man want to create a seperate field for every "type" you want to store.
Ie one field for the first letter and so on. Even make it an ENUM so the database has a better idea of what you are doing.
As for the function calls yes you are right the functions will be called for "EVERY" row since that value changed on every row and cannot be precalculated. If you have to use functions always try to keep them to static (ie passed in) values. Ofcourse some things you have to use functions for on the fields and that's what they are their for.
daisho.