homepage Welcome to WebmasterWorld Guest from 54.204.94.228
register, free tools, login, search, pro membership, help, library, announcements, recent posts, open posts,
Become a Pro Member

Home / Forums Index / Code, Content, and Presentation / Perl Server Side CGI Scripting
Forum Library, Charter, Moderators: coopster & jatar k & phranque

Perl Server Side CGI Scripting Forum

This 62 message thread spans 3 pages: 62 ( [1] 2 3 > >     
Writing an Efficient Query
Help with an issue with SQL query
Nick_W




msg:440028
 10:56 am on May 29, 2003 (gmt 0)

Hi all,

I have to select rows from a table where one of the fields is an integer. I only want records that have '1,3,4,6 and 17' as this number. Not the others.

What is the best way to write the query? Here are a couple of ways I've thought of:

SELECT * FROM tbl WHERE colum IN $array_of_acceptable_integers;

SELECT * FROM tbl WHERE column = $val1 OR $val2 etc....

As you can see, once we get past the very basics I'm none to bright so anyone has any 'authorative tutorials' on writhing these kind of things would also be welcome.

Thanks!

Nick

 

aspr1n




msg:440029
 11:15 am on May 29, 2003 (gmt 0)

No authorative advice I'm afraid Nick, just another 'IMO'!

I think I'd select where <17 and then use PHP to iterate through the array, or pop()/unset() them.

You could try <7 and == 17 to return fewer results as well.

asp

Nick_W




msg:440030
 11:19 am on May 29, 2003 (gmt 0)

Sorry, they're not real numbers ;) just examples....

Nick

jpjones




msg:440031
 11:25 am on May 29, 2003 (gmt 0)

I do this by using your first suggestion.
SELECT * FROM tbl WHERE colum IN (1,3,4,6,17);

This field has an index on it, I presume?

JP

Nick_W




msg:440032
 11:30 am on May 29, 2003 (gmt 0)

Absolutely!

I'ts a forum id, I'm trying to create an 'selective' active list...

Nick

brotherhood of LAN




msg:440033
 11:45 am on May 29, 2003 (gmt 0)

Nick,

Have you tried these queries on a large table yet? If no maybe you could setup a table and fill it with junk values and see which one is quicker.

Wouldn't mind knowing myself - I didnt know about the "in" syntax before you posted it here- and looks like it's what I'm after.

When querying a CHAR column with a few dozen "ORs" in the WHERE clauses was quite slow for me, slower than doing them singly.....at a guess I'd hedge my bet with the "IN" query being faster.

Nick_W




msg:440034
 11:47 am on May 29, 2003 (gmt 0)

I've never used it either, but seeing as it's clearly purpose built for this kind of work I'm sure it's a winner.... ;)

If I do test, I'll post!

Thanks...

Nick

ukgimp




msg:440035
 12:10 pm on May 29, 2003 (gmt 0)

Will two queries first one builds up the list of values that are not active.

$sql1 = "get records that are not active (whichever way round)"

The use the values to build the IN parameter for the second query ($IN)

Sql2 = "SELECT * FROM tbl WHERE colum IN ($IN)"

I managed to get this type of query going for a subject table so that only bottom level subjects appeared and it seemed to do the job well.

Cheers

BCMG_Scott




msg:440036
 7:00 pm on May 29, 2003 (gmt 0)

Hi Nick, using IN is a better bet - according to MySQL docs, if all the values are constants (which they are) it will perform a binary search. Also, you should run that query through the EXPLAIN command and see if and what index it's using. Finally, if you have/do a lot of deletes on the table you should analyze/optimize the table - this will remove the empty records left behind after a delete.

Scott Geiger

dkubb




msg:440037
 7:07 am on May 30, 2003 (gmt 0)

Here's some general things I do when trying to optimize my SQL queries:

Don't ever do a "SELECT *". You're pulling back extra columns that may or may not be necessary to your application. This wastes memory both on the database server and in your application. On a heavily loaded system this could cause memory to run out, and swapping, which will kill your performance worse than just about anything else. Explicitly select only the columns you are using in your application. Think minimalist when writing SQL queries -- only do what is specifically necessary to get the job done.

There are other very valid reasons to not use SELECT *, but it would be OT to talk about it in this thread, perhaps another time.

Make sure the data-type you're using for the field is the best for the job. Use the most restrictive datatype you can that will still fit the expected values. I often see people use an unsigned INT when they could fit the values into a signed TINYINT. Of course, don't just use the data in the table to make the decision; think about what the data is and how it can vary. If the data has a well defined range that it cannot exceed, it is a perfect candidates for reduction to the smallest possible data type. With data that is unbounded it is a little trickier, the best you can do is be as restrictive as possible, and check back often to see if your assumptions hold up or if you need to expand the data type.

In general the smaller the data the quicker it's read from the file system, the smaller the index is which makes it faster to find information in the index.

If you know a value is going to be unique within a table, mark it as such with the UNIQUE attribute. If it can't contain a NULL value, mark it as NOT NULL. If its an integer and can't be negative make sure its UNSIGNED. If you have a char field, and you know it should never be more than 20 characters, set the column to CHAR(20).

Be as strict as possible. The idea is to reduce the number of unknowns that the database has to deal with. You're giving the database more clues what the data should look like (the more meta-data, the better) and the more info the database has to work with, the better it can optimize the queries.

You'll also appreciate knowing specifically what the data is supposed to look like when you write data-validation routines within the application code. (oh, the database says country_id can't be more than 3 digits long, cannot be null, and must be between 1 and 999? I'd better check all that before accepting it from the user) Your database's data model can become a concise blueprint for your application's data validation if you allow it.

Sorry, got carried away. I'll try to stay on-topic.. :)

Make sure you index any fields you're querying in the WHERE clause. Also remember that only one index per table can be used at a time (in MySQL) per SQL query. You can't just create three indexes for three columns in a table, and expect all three will be used when you query it. MySQL will look at the available indexes and choose just one of them that will allow it to eliminate the largest number of rows from consideration. What I mean by this is: MySQL always trys to do the most work in the shortest number of steps, it will do as much as possible up-front to lower the workload. Chosing the best index is one way it does this. The EXPLAIN command will show you specifically which indexes MySQL considered when preparing the query, and also show which index it thought was the best shortcut.

If you want to query multiple fields, and you want an index used for all of them, you need to set up a single index that includes all the fields together. In general remember to put the most accessed fields first in the index, and the least accessed fields last. (trust me on this, order is significant in an index. I can go into more detail if anyone is interested)

IMHO use the IN clause. I can't say specifically if its much faster; MySQL probably optimizes the OR clause and IN clause down to the same instructions under the hood anyway. I've never seen a noticable increase in speed difference between the two. I use an IN clause in this situation because its cleaner, easier to maintain and saves programmer time. Anything that saves your time allows you to spend more time structuring the code and queries to be simpler. Simpler code can be profiled and optimized easier when/if you need to do so the future.

Nick_W




msg:440038
 7:32 am on May 30, 2003 (gmt 0)

Awsome post dkubb. Thanks! ;)

I've read it once, I'll re-read it after lunch and a few (dozen) coffeess.....

Thanks for taking the time to explain so much, so well!

Nick

ukgimp




msg:440039
 7:51 am on May 30, 2003 (gmt 0)

>> I can go into more detail if anyone is interested

Please do, this sort of optimisation is rarely covered but hugely useful to all levels of user.

Thanks

RonPK




msg:440040
 10:44 am on May 30, 2003 (gmt 0)

Hello dkubb, thx for your posting. It made me realize that I've got lots of work to do, as I have several tables with more than one index...

Also remember that only one index per table can be used at a time (in MySQL) per SQL query. You can't just create three indexes for three columns in a table, and expect all three will be used when you query it. MySQL will look at the available indexes and choose just one of them that will allow it to eliminate the largest number of rows from consideration.

I had a look at the manual, and noticed it mentions USE INDEX to solve this:
SELECT ... FROM table_name USE INDEX (key_list) WHERE ...

Nick_W




msg:440041
 11:14 am on May 30, 2003 (gmt 0)

Hmmm....

I presume there is a reason for that, can we get the lowdown on this from one of the SQL gurus?

Nick

BCMG_Scott




msg:440042
 4:11 pm on May 30, 2003 (gmt 0)

dkubb, some good points. I have to add a couple items (and perhaps a correction).

First, according to MySQL docs the char vs varchar field does some funky things. If you try and set a field as char(20) it will "silently" change it to varchar(20). Conversely, if you try and set a field to varchar(2) it will set it to char(2). See 6.5.3.1 Silent Column Specification Changes of the MySQL doc for more information.

Make sure you index any fields you're querying in the WHERE clause

On the surface this would seem logical and correct, it could be wrong though. Let's say you only have one SELECT query that you ever use:

select col1, col2, col3, col4
from table
where col1 = 'blah'
and col2 = 'blahblah'
and col3 not in ('blah','blah blah');

should you create indexes on col1, col2 and col3 (thus creating 3 indexes)? NO. You will have 2 unused indexes taking up space. As you noted a query can only use one index at a time. So the better choice is to figure out how your select queries will be using indexes and which indexes each will use.

Scott

RonPK




msg:440043
 11:54 pm on May 30, 2003 (gmt 0)

>> Also remember that only one index per table can be used at a time (in MySQL) per SQL query.

> SELECT ... FROM table_name USE INDEX (key_list) WHERE ...

I did some tests with EXPLAIN SELECT and noticed that even for a query with USE INDEX (key1, key2, key3) MySQL uses only one key. One of the specified keys; it won't use key4.

So if it is necessary to use more than one key, you will indeed have to make one index for all the columns you need.

daisho




msg:440044
 4:23 am on May 31, 2003 (gmt 0)


should you create indexes on col1, col2 and col3 (thus creating 3 indexes)? NO. You will have 2 unused indexes taking up space. As you noted a query can only use one index at a time. So the better choice is to figure out how your select queries will be using indexes and which indexes each will use.

No you create a compound index.

create index idx_table_compound on table (col1,col2,col3)

dkubb did mention this in his post.

daisho.

musicales




msg:440045
 5:39 am on Jun 2, 2003 (gmt 0)

I'm trying to create an 'selective' active list

Nick_W - anything like a 'highlighted forum posts' page by any chance ;)

I'd definitely go with the in version. Simple and effective.

brotherhood of LAN




msg:440046
 5:58 am on Jun 2, 2003 (gmt 0)

The "in" clause and compound indexes are new to me, I scanned through the mysql manual with a highlighter and couldnt see much about either.

Does anyone have good resources on either?

They're making my queries more efficient anyway :) ....

dkubb




msg:440047
 6:14 am on Jun 2, 2003 (gmt 0)

I had planned to explain how MySQL uses (or fails to use) multi-column indexes, and add some extra things I remembered since my original post in this thread, but I thought it would be better to point everyone to Chapter 5 of MySQL's manual. It goes very much in depth into how to optimize queries, table designs, server setup, and even OS config for maximum performance.

[mysql.com ]

If anyone has any tidbits to share that are not within this document I would love to hear them. I'll do the same after refreshing my memory -- its been a few months since I read the document in its entirety.

BCMG_Scott: Thanks, I was aware of the silent column change that MySQL performs with CHAR to VARCHAR. I was trying not to go into any of MySQL's quirky behaviours and possibly confusing people who were wondering about optimization; although you're correct that there is some relevance to it that I had forgotten about.

However, one small point, its not entirely correct that a CHAR(20) will be converted into a VARCHAR(20) automatically in all cases. This only occurs when you attempt to create a CHAR field within a table that also contains a variable length type: VARCHAR, TEXT, or BLOB. It is very possible to create pure CHAR fields within a MySQL table, as long as you refrain from using those other column types within the same table. This can be anywhere from very simple to very difficult depending on the type of data you are storing and the degree that it can be decomposed.

From an optimization point of view, you should try to stay away from using variable length columns such as VARCHAR, TEXT or BLOB. (see section 5.2.12 in the MySQL manual -- incidentally that's one of my favorite sections in the manual, lots of good stuff to learn there)

Xuefer




msg:440048
 11:06 am on Jun 2, 2003 (gmt 0)

thx dkubb
so you are going to explain "multi-column indexes"
i have some questions:
1. what's the absolute vantage for multi rather than single column index
2. can u explain it on some sample table? e.g.: on Forum/Post table, or Category/Products table, this is more common used table struct IMHO
3. and tell us what situation we should/shouldn't use it
4. and what else should we take care of to do the best optimize
5. when will your plan be done, and where to get it? here?
:)

Clark




msg:440049
 11:51 am on Jun 2, 2003 (gmt 0)

I hope we have Nick's permission to expand this discussion beyond his original query as it's fascinating and quite useful. If not, we can always start a new thread.

So here are a couple more questions and points to people who may not quite understand what an index means or does.

(I'm making some assumptions, this is not based on actually reading the documentation so please correct errors.)

The table obviously has to be stored in some order. When you do a select on an unindexed table, it will search through the table in the order it is stored on the hard drive. I believe there is a command to sort the table in place, in a particular order (but I forgot the command he).

When you have an index, what it does is sort the database in the order you specify on the index. How do you specify an order? You give a list of fields to index and it will be sorted in that order. For example, say you have fields X and Y. Here is some data:

# ¦ X ¦ Y
---------
1 ¦ A ¦ B
2 ¦ A ¦ A
3 ¦ A ¦ C
4 ¦ B ¦ A
5 ¦ A ¦ B
6 ¦ D ¦ A
7 ¦ C ¦ D

If the index were only on X, then record 5 would switch positions with record 4 and record 7 would switch positions with record 6. An index on X AND Y would mean that the index would rearrange it like so:

# ¦ X ¦ Y
---------
2 ¦ A ¦ A
1 ¦ A ¦ B
5 ¦ A ¦ B
3 ¦ A ¦ C
4 ¦ B ¦ A
7 ¦ C ¦ D
6 ¦ D ¦ A

The actual order in the database would NOT change. There would be NEW data stored on the server telling it what the table order is for that index. Every time you add a new record, there is a new entry in every single index you specified on the table telling it in what order that new entry goes into. I'm not exactly sure how the index looks. I would guess it is just a list of record numbers telling MYSQL what order the data is in. But I'm not quite sure because then if it's search for a record starting with D, how does it know to start with record 6 if the X (and Y) field are not stored in duplicate along with the index. Perhaps someone will pipe in with the answer.

Anyways, just sharing this for some people who might not quite understand what it means to have an index and why it helps. In particular, what it means to index on more than one column.

It is important therefore to realize that adding indexes does affect performance, since more data has to be stored and every insert requires more processing....

If you only do a select once a day, and it's a batch process that spits out to you a report once a day, it doesn't matter how long it takes to generate the report. And if the entire day is spent on inserts, that is an example where indexes will not help you, it will only degrade the performance of your app.

Something else to understand, you can have one index on COLUMN X and another on COLUMN X and Y. The second index can still do work for you since COLUMN X gets indexed at the same time. It just depends on what you are trying to accomplish.

Hmm, I wanted to ask a question or two but in the midst of putting this down I forgot. I guess it'll come back to me. Hope this helps.

Nick_W




msg:440050
 12:05 pm on Jun 2, 2003 (gmt 0)

>>I hope we have Nick's permission to expand this discussion

Not required, though the courtesy is appreciated. Don't think I'm not following this just 'cos I'm quiet. It's a riveting read ;-)

Nick

Xuefer




msg:440051
 12:23 pm on Jun 2, 2003 (gmt 0)

thx Clark
if we can select unique row from table using 1 column index (the forum id that Nick_W referenced to)
do we still need multi-column index?

Clark




msg:440052
 1:08 pm on Jun 2, 2003 (gmt 0)

No, in that case you only need one column index and it will be better to only use that one.

Depending on the complexity of what you build, the amount of tuning you can do is endless. Tune your php code, your selects, optimize your my.cnf file, your php file, your indexes. Your database layout. In fact, I was recently working like crazy trying to figure out how to get the data to look the way I wanted it to, then realized that by doing a major change in the way the database looked, mysql would automatically make the data look the way I wanted it to automagically without doing any queries.

If there is anything that is important in tuning up any database, it is understanding normalization really really well and when to apply it.

lorax




msg:440053
 5:06 pm on Jun 2, 2003 (gmt 0)

And since I'm late to the party I'll just throw in another tidbit. Make sure your table structure is normalized to an appropriate level for the application you're building or it won't matter much how efficient your query is.

That's the funny thing about databases - they are relational and it really means you have to think and work with the many different pieces all concurrently when designing the initial structure or suffer performance.

sun818




msg:440054
 5:46 pm on Jun 2, 2003 (gmt 0)

Clark said:
one index on COLUMN X and another on COLUMN X and Y. The second index can still do work for you since COLUMN X gets indexed at the same time.

If you already have a multi-field index for column X and Y, what is the benefit in having the first index? There is a performance hit as two indexes are being updated unnecessarily during an insert or delete instead of one. Your second index is able to handle the job of the first index. From mySQL manual: 5.4.3 How MySQL Uses Indexes [mysql.com]:
If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimiser to find rows. For example, if you have a three-column index on (col1,col2,col3), you have indexed search capabilities on (col1), (col1,col2), and (col1,col2,col3).

As for a forum example, we can use this web site as an example. A multi-column index (forum,discussion) may sort for (forum) or (forum,discussion). This index could be used effectively in the following queries:

select * from table
where forum=13
or
select * from table
where forum=13
and discussion=2910

But the index would not work for these queries:
select * from table
where discussion=2910
or
select * from table
where discussion=2910
and forum=13

martekbiz




msg:440055
 10:04 pm on Jun 2, 2003 (gmt 0)

>>>PHP to iterate through the array

Has t the the worst advice I've ever read.

Why would you want to use resources to cycle through on record after another until you find the results oyu are looking for?

Use the DB! That's what it is there for. It'll be much quicker and less server resources/OH used.

Personally, I like using in() queries on arrays to find matching results.

Aaron

Clark




msg:440056
 3:10 pm on Jun 3, 2003 (gmt 0)

Thanks Sun. You clarified a few things for me I was uncertain about.

I'm also still a bit confused on JOINS. I've fooled around and experimented (besides reading..which helped me understand it WHILE reading but never to retain in my head the meaning of the different joins).

The only JOIN I ever remember using was LEFT JOIN. I remember cases where I thought RIGHT JOIN would work, but it didn't and LEFT did.

Anyone have a clear, concise and ez way to understand the different JOINS.

Also, am I the only one that finds the typical documentation for say MYSQL or PHP to be rather confusing unless you actually already understand the concepts behind what they're saying? IOW, if you understand the concept, it does a great job of explaining how MYSQL handles that function, but if you don't know already what the function does, you really have to work hard to get it.

lorax




msg:440057
 3:14 pm on Jun 3, 2003 (gmt 0)

Clark,
That's a pretty fair assessment of the documentation. I still have to focus harder to read their documentation at times.

Re: JOINS. Good gawd - if you find such an explanation you'll be rich. JOINS are so flexible it's hard to grasp all of the possibilities let alone explain them clearly.

This 62 message thread spans 3 pages: 62 ( [1] 2 3 > >
Global Options:
 top home search open messages active posts  
 

Home / Forums Index / Code, Content, and Presentation / Perl Server Side CGI Scripting
rss feed

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
Home ¦ Free Tools ¦ Terms of Service ¦ Privacy Policy ¦ Report Problem ¦ About ¦ Library ¦ Newsletter
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved