|MySQL concurrent inserts and fixed table format|
I have a large MySQL table of about 50 million records. The table is heavy with inserts, updates and selects (about 150 per second). I'm considering to change the format from dynamic to fixed(static) to improve performance. The only varchar column is 12 characters, so it shouldn't add much to the size of the table.
A few times during the day, a couple of records are deleted from this table. Not much, maybe 10 or 20 records. I read everywhere that MyISAM supports concurrent INSERT and SELECT if there are no deleted rows in middle of the table. I assume this is due to the fragmentation of the table. What I failed to find out is whether the fixed table format is a good workaround if you delete a few records from time to time. As I understand, new inserts would will the gap, and since table format is fixed, the new records would fit the gap completely and there wouldn't be any more fragmentation. Since deletes are rare and only a few records is deleted, I assume the gap would be filled really fast (in a couple of seconds) and I would have concurrent inserts all the time. Do you know of any article, of have a personal experience confirming this?
|brotherhood of LAN|
- Are you using InnoDB?
- Do you use a Primary KEY?
- Is the PK numeric and inserted in ascending order
- Are these the records being deleted?
If your answer is yes to all then you don't have to do anything as InnoDB fills up the gap and frees the disk space (using innodb-file-per-table). I'm not sure what MyISAM does in these cases.
If you have a column that you mainly query by and is in an ascending order (like time) then I'd check out InnoDB as PK searches are very quick with it.
If the records you delete are at the start of the table, you might want to consider partitioning so you can 'chop off' the older values by removing older partitions and recover the space.
Either/or, try running OPTIMIZE TABLE and if its quick enough you could periodically use that on your table.
Hopefully that gives you some ideas.
- PK is composed of 3 columns: mediumint,date,tinyint
Everything I wrote in initial post is about MyISAM. Since MyISAM uses table-lock instead of row-lock like InnoDB, locking out concurrent selects while inserting can make a huge difference in performance.
Most of the queries used all three components of primary key, but the 2nd and 3rd field are often queried in range. Table looks like this:
create table points (
Most queries are like:
select * from points where userid = ? and playdate = ? and game = ?
select * from points where userid = ? and playdate between ? and ?
Deleted records are mostly closer to the end of the table (in the last 1000 records or so).
OPTIMIZE TABLE runs way too slow to run it every day. Maybe weekly would be possible.
|brotherhood of LAN|
If all your SELECT queries use the playdata column I'd consider moving that to the start of the PK and changing the table to InnoDB. If you are able to, benchmark them to compare. If you're familiar with PHP or other scripting you can see how many queries per second each setup can do to compare for SELECT speed. You can also fork the script to simulate lots of users.
The reason for changing the PK order is that they would be somewhat in ascending order (date is always increasing, userid may be random). I'd also leave the variable length column as is with innodb.
Regarding partitioning... it would split your table up into sub-tables and could speed up a simple OPTIMIZE. Again, if your queries all include the playdata column, partition by DATE... making sure none of your regularly performed queries use more than a couple partitions, preferably one. SELECTing from 100,000 rows is much better than a million.
|As I understand, new inserts would will the gap, and since table format is fixed, the new records would fit the gap completely and there wouldn't be any more fragmentation |
I think that's right...
It's really worth testing a few combinations and seeing what works for your particular case.
Actually, play data is rarely used, it's only for archival purposes (in case player complains about the points he got). There are many queries WHERE USERID = ?, so that's why userid is the first column in PK.
I didn't think about simulating users with scripts, that's a very good idea. Thanks.
|brotherhood of LAN|
A couiple other things you can try
|SHOW TABLE STATUS LIKE 'table_name' |
The 'data free' column will tell you how much wasted space there is in the table. Compare it ot the data size. If it's continually getting bigger then you'll want to OPTIMIZE at some point or the table will gradually slow.
RE: innodb, I would suggest it as a whole MyISAM table is locked when you're updating/deleting (and INSERTs perhaps as you say (MySQL manual) [dev.mysql.com] ).
For MyISAM you could partition on userid (since your queries use that all the time) and if you don't have queries that return lots of random userid's as partitioning can be slower that way. You can optimize partitions and since they'd be smaller, would be quicker to optimize on their own. Also, PK lookups would be faster, including for data manipulation as not all 50M PK's would have to be considered to begin with.