Page is a not externally linkable
dkubb - 7:07 am on May 30, 2003 (gmt 0)
Don't ever do a "SELECT *". You're pulling back extra columns that may or may not be necessary to your application. This wastes memory both on the database server and in your application. On a heavily loaded system this could cause memory to run out, and swapping, which will kill your performance worse than just about anything else. Explicitly select only the columns you are using in your application. Think minimalist when writing SQL queries -- only do what is specifically necessary to get the job done. There are other very valid reasons to not use SELECT *, but it would be OT to talk about it in this thread, perhaps another time. Make sure the data-type you're using for the field is the best for the job. Use the most restrictive datatype you can that will still fit the expected values. I often see people use an unsigned INT when they could fit the values into a signed TINYINT. Of course, don't just use the data in the table to make the decision; think about what the data is and how it can vary. If the data has a well defined range that it cannot exceed, it is a perfect candidates for reduction to the smallest possible data type. With data that is unbounded it is a little trickier, the best you can do is be as restrictive as possible, and check back often to see if your assumptions hold up or if you need to expand the data type. In general the smaller the data the quicker it's read from the file system, the smaller the index is which makes it faster to find information in the index. If you know a value is going to be unique within a table, mark it as such with the UNIQUE attribute. If it can't contain a NULL value, mark it as NOT NULL. If its an integer and can't be negative make sure its UNSIGNED. If you have a char field, and you know it should never be more than 20 characters, set the column to CHAR(20). Be as strict as possible. The idea is to reduce the number of unknowns that the database has to deal with. You're giving the database more clues what the data should look like (the more meta-data, the better) and the more info the database has to work with, the better it can optimize the queries. You'll also appreciate knowing specifically what the data is supposed to look like when you write data-validation routines within the application code. (oh, the database says country_id can't be more than 3 digits long, cannot be null, and must be between 1 and 999? I'd better check all that before accepting it from the user) Your database's data model can become a concise blueprint for your application's data validation if you allow it. Sorry, got carried away. I'll try to stay on-topic.. :) Make sure you index any fields you're querying in the WHERE clause. Also remember that only one index per table can be used at a time (in MySQL) per SQL query. You can't just create three indexes for three columns in a table, and expect all three will be used when you query it. MySQL will look at the available indexes and choose just one of them that will allow it to eliminate the largest number of rows from consideration. What I mean by this is: MySQL always trys to do the most work in the shortest number of steps, it will do as much as possible up-front to lower the workload. Chosing the best index is one way it does this. The EXPLAIN command will show you specifically which indexes MySQL considered when preparing the query, and also show which index it thought was the best shortcut. If you want to query multiple fields, and you want an index used for all of them, you need to set up a single index that includes all the fields together. In general remember to put the most accessed fields first in the index, and the least accessed fields last. (trust me on this, order is significant in an index. I can go into more detail if anyone is interested) IMHO use the IN clause. I can't say specifically if its much faster; MySQL probably optimizes the OR clause and IN clause down to the same instructions under the hood anyway. I've never seen a noticable increase in speed difference between the two. I use an IN clause in this situation because its cleaner, easier to maintain and saves programmer time. Anything that saves your time allows you to spend more time structuring the code and queries to be simpler. Simpler code can be profiled and optimized easier when/if you need to do so the future.
Here's some general things I do when trying to optimize my SQL queries: