Forum Moderators: open

Message Too Old, No Replies

insert strategies to busy myisam table

lots of reads per second; even with load data infile inserts are unbearable

         

phoenix_fly

3:54 am on Jun 26, 2008 (gmt 0)

10+ Year Member



Hello my friends

Here's the problem I'm facing: I have a table that has about 20 selects per second. Every hour, about 5 vendors (it's a e-commerce website) uploads additions to their inventory, inserting about 5.000 records each one and taking some precious 2-3 seconds while I get 40-60 queries sitting there and waiting to be run. (so, the next second I have 60-80 queries trying to run at the same time)

The 20 queries/second themselves are already a very intense load. I'd say if it was only them, we'd be at 60% of what we can expect to get from this server we're in. But when the inserts happen, our query time really climbs from 50 ms average to more than 2 seconds sometimes. And the mess takes many seconds to be solved by itself.

We're talking about a 3 million row CD/DVD inventory table, with many fulltext indexes (one for artist, other for album name, other for description and another one for all of them). So our inserts certainly are paying the price for that. But the fact is that we really have to keep all these indexes.

Engine: MyISAM
Version: 10
Row_format: Dynamic
Rows: 2853361
Avg_row_length: 244
Data_length: 696384596
Max_data_length: 281474976710655
Index_length: 876121088
Data_free: 0

+----------+------------+----------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
¦ Table ¦ Non_unique ¦ Key_name ¦ Seq_in_index ¦ Column_name ¦ Collation ¦ Cardinality ¦ Sub_part ¦ Packed ¦ Null ¦ Index_type ¦ Comment ¦
+----------+------------+----------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
¦ inventory ¦ 0 ¦ PRIMARY ¦ 1 ¦ inv_id ¦ A ¦ 2853363 ¦ NULL ¦ NULL ¦ ¦ BTREE ¦ ¦
¦ inventory ¦ 1 ¦ index_user_id ¦ 1 ¦ user_id ¦ A ¦ 5907 ¦ NULL ¦ NULL ¦ YES ¦ BTREE ¦ ¦
¦ inventory ¦ 1 ¦ index_genre ¦ 1 ¦ genre ¦ A ¦ 75 ¦ NULL ¦ NULL ¦ YES ¦ BTREE ¦ ¦
¦ inventory ¦ 1 ¦ index_ativo ¦ 1 ¦ ativo ¦ A ¦ 3 ¦ NULL ¦ NULL ¦ YES ¦ BTREE ¦ ¦
¦ inventory ¦ 1 ¦ index_label ¦ 1 ¦ label ¦ A ¦ 45291 ¦ 5 ¦ NULL ¦ YES ¦ BTREE ¦ ¦
¦ inventory ¦ 1 ¦ index_inv_id_user_id ¦ 1 ¦ user_id ¦ A ¦ 5907 ¦ NULL ¦ NULL ¦ YES ¦ BTREE ¦ ¦
¦ inventory ¦ 1 ¦ index_inv_id_user_id ¦ 2 ¦ inv_id ¦ A ¦ 2853363 ¦ NULL ¦ NULL ¦ ¦ BTREE ¦ ¦
¦ inventory ¦ 1 ¦ index_author ¦ 1 ¦ author ¦ NULL ¦ 1 ¦ NULL ¦ NULL ¦ ¦ FULLTEXT ¦ ¦
¦ inventory ¦ 1 ¦ index_title ¦ 1 ¦ title ¦ NULL ¦ 1 ¦ NULL ¦ NULL ¦ YES ¦ FULLTEXT ¦ ¦
¦ inventory ¦ 1 ¦ index_authoretitle ¦ 1 ¦ author ¦ NULL ¦ 1 ¦ NULL ¦ NULL ¦ ¦ FULLTEXT ¦ ¦
¦ inventory ¦ 1 ¦ index_authoretitle ¦ 2 ¦ title ¦ NULL ¦ 1 ¦ NULL ¦ NULL ¦ YES ¦ FULLTEXT ¦ ¦
¦ inventory ¦ 1 ¦ index_descr ¦ 1 ¦ descr ¦ NULL ¦ 1 ¦ NULL ¦ NULL ¦ YES ¦ FULLTEXT ¦ ¦
+----------+------------+----------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+

I've tried LOAD DATA INFILE, but the insert didn't get much faster.

I am now thinking about some approach that deals with some temporary tables. For example: having the data inserted to a temporary table INVENTORY_TEMP and them doing only one insert per hour on the production 'inventory' table.

This way I'd tell my vendors: "Hey, ok, your inventory table was read, but the data will only show up at the searches at the next hour, ok?"

But I've done some testing, and the
INSERT INTO INVENTORY SELECT * FROM INVENTORY_TEMP;
doesn't get any faster than the sum of the inserts I would have avoided, though. (I thought that transfering from one table to another could somehow be faster, with mysql somehow merging the already built indexes. But it seems it doesn't do that)

I know INNODB's row-level lock would help not to make readers wait untill the 2 second write is done. But the fact is that I need to keep the fulltext indexes - and innodb doesn't support them, as you know. (Then I thought: "Mmm, what if I had one MYISAM table for the searches (reads) and other INNODB for the writes. But, despite the initial feeling, I couldn't go much further with the concept. Ended up thinking it was a stupid irrelevant idea I had.)

Any ideas, folks? Has anyone ran into this problem and found a good strategy?

Thanks a lot

Phoenix

PS: I'm running mysql 5 on RHEL 5.

physics

4:52 pm on Jun 26, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



One approach might be pull the switcheroo using rename.

0) [only necessary first time] Create a table called inventory_staging which is a copy of inventory

1) Add new info to inventory_staging

2) Copy inventory_staging to inventory_staging_bk

3) SWAP the two table names inventory and inventory_staging (see docs for howto)

4) DROP table inventory_staging

5) RENAME inventory_staging_bk to inventory_staging

This is just a theory but it's worth a shot and I think it should work as long as the RENAME op is as fast as I think it is.

[dev.mysql.com...]
[p.s. the docs do indicate that the move op is atomic!]

LifeinAsia

6:02 pm on Jun 26, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Another option may be to look into replication: 1 DB as the master where all the "writes" (inserts/updates) go to, which pushes updates to a slave where all the reads (selects) go. The master queues up changes to the slave, which processes them in chunks so as to not affect the reads.

We had a similar problem, although it was ongoing- stat logging frequently caused slowdowns and timeouts all day long. Finally going to replication pretty much eliminated the issue overnight.

phoenix_fly

8:40 pm on Jun 26, 2008 (gmt 0)

10+ Year Member



Thanks for the replies, Phisics and LifeinAsia

Phisics, #2 takes a lot of minutes with full power CPU. Or can I do that through the unix shell, creating copies of the tables's data files?

LifeinAsia, in fact, I'll check that, thanks for the idea.

Any other approaches, folks?

physics

9:15 pm on Jun 26, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I was just thinking that if copying the entire inventory_staging table to inventory_staging_bk is too intensive you could modify the steps by skipping step #2. Then after swapping (step 3) insert the data again into inventory_staging (otherwise the 'new' inventory staging will be missing the new data). Then you can skip 4 and 5.

LifeinAsia might be onto a more solid solution though...

physics

4:41 pm on Jun 30, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Any update on this phoenix_fly? We're interested to find out what ends up working out for you :)

phoenix_fly

8:38 pm on Jul 1, 2008 (gmt 0)

10+ Year Member



Hello phisics, thanks for asking. I was gathering some new info, sorry for the delay.

I've set up some new info at my logs, to take a closer look at how many records my vendors are inserting each time. And:
Good news: surprise, they are inserting way less than I thought.
Bad news: a one-row insert query is taking 300ms to complete!

Here's the log: (nrecs, query time, date)

+------+------+---------------------+
¦ recs ¦ elaps¦ log ¦
+------+------+---------------------+
¦ 2 ¦ 0.36 ¦ 2008-06-30 19:26:24 ¦
¦ 72 ¦ 0.44 ¦ 2008-06-30 19:33:44 ¦
¦ 1 ¦ 0.33 ¦ 2008-06-30 19:35:18 ¦
¦ 41 ¦ 0.43 ¦ 2008-06-30 19:36:51 ¦
¦ 5 ¦ 0.32 ¦ 2008-06-30 19:37:12 ¦
¦ 1 ¦ 0.33 ¦ 2008-06-30 19:45:43 ¦
¦ 200 ¦ 0.54 ¦ 2008-06-30 19:50:28 ¦
¦ 44 ¦ 0.35 ¦ 2008-06-30 19:52:39 ¦
¦ 3 ¦ 0.39 ¦ 2008-06-30 20:08:31 ¦
¦ 313 ¦ 1.29 ¦ 2008-06-30 20:13:03 ¦
¦ 2 ¦ 0.86 ¦ 2008-06-30 20:21:29 ¦
¦ 163 ¦ 1.88 ¦ 2008-06-30 20:29:30 ¦
¦ 50 ¦ 0.44 ¦ 2008-06-30 20:44:52 ¦
¦ 46 ¦ 0.48 ¦ 2008-06-30 21:39:25 ¦
¦ 0 ¦ 0.32 ¦ 2008-06-30 21:42:16 ¦
¦ 60 ¦ 0.39 ¦ 2008-06-30 21:42:41 ¦
¦ 27 ¦ 0.35 ¦ 2008-06-30 21:58:54 ¦
¦ 19 ¦ 0.36 ¦ 2008-06-30 22:39:27 ¦
¦ 259 ¦ 0.81 ¦ 2008-06-30 22:59:50 ¦
¦ 46 ¦ 0.36 ¦ 2008-06-30 23:18:24 ¦
¦ 30 ¦ 0.39 ¦ 2008-06-30 23:35:14 ¦
¦ 36 ¦ 0.82 ¦ 2008-07-01 00:25:49 ¦
¦ 79 ¦ 0.75 ¦ 2008-07-01 01:52:30 ¦
¦ 200 ¦ 0.39 ¦ 2008-07-01 08:26:34 ¦
¦ 120 ¦ 0.35 ¦ 2008-07-01 08:27:06 ¦
¦ 80 ¦ 0.35 ¦ 2008-07-01 08:37:46 ¦
¦ 10 ¦ 0.31 ¦ 2008-07-01 08:46:28 ¦
¦ 7 ¦ 0.33 ¦ 2008-07-01 08:52:15 ¦
¦ 80 ¦ 0.34 ¦ 2008-07-01 08:57:19 ¦
¦ 1 ¦ 0.31 ¦ 2008-07-01 09:09:01 ¦
¦ 59 ¦ 0.41 ¦ 2008-07-01 09:20:54 ¦
¦ 45 ¦ 0.37 ¦ 2008-07-01 09:21:38 ¦
¦ 61 ¦ 0.75 ¦ 2008-07-01 10:28:57 ¦
¦ 1 ¦ 0.32 ¦ 2008-07-01 10:36:16 ¦
¦ 2 ¦ 0.31 ¦ 2008-07-01 10:53:30 ¦
¦ 21 ¦ 0.33 ¦ 2008-07-01 10:54:33 ¦
¦ 30 ¦ 0.33 ¦ 2008-07-01 10:56:15 ¦
¦ 29 ¦ 0.35 ¦ 2008-07-01 11:25:05 ¦
¦ 117 ¦ 0.51 ¦ 2008-07-01 11:31:13 ¦
¦ 50 ¦ 0.48 ¦ 2008-07-01 11:45:31 ¦
¦ 7 ¦ 0.32 ¦ 2008-07-01 11:50:43 ¦
¦ 60 ¦ 0.44 ¦ 2008-07-01 11:52:33 ¦
¦ 30 ¦ 0.45 ¦ 2008-07-01 11:56:13 ¦
¦ 29 ¦ 0.40 ¦ 2008-07-01 11:58:30 ¦
¦ 3 ¦ 0.38 ¦ 2008-07-01 12:04:46 ¦
¦ 37 ¦ 0.34 ¦ 2008-07-01 12:18:04 ¦
¦ 0 ¦ 0.42 ¦ 2008-07-01 12:22:57 ¦
¦ 91 ¦ 0.33 ¦ 2008-07-01 12:24:11 ¦
¦ 31 ¦ 0.57 ¦ 2008-07-01 12:24:25 ¦
¦ 30 ¦ 0.35 ¦ 2008-07-01 12:39:04 ¦
¦ 10 ¦ 1.11 ¦ 2008-07-01 13:05:45 ¦
¦ 59 ¦ 0.36 ¦ 2008-07-01 13:16:16 ¦
¦ 200 ¦ 0.48 ¦ 2008-07-01 13:17:29 ¦
¦ 319 ¦ 1.98 ¦ 2008-07-01 13:27:33 ¦
¦ 30 ¦ 0.33 ¦ 2008-07-01 13:40:23 ¦
¦ 52 ¦ 0.34 ¦ 2008-07-01 13:47:01 ¦
¦ 301 ¦ 1.91 ¦ 2008-07-01 14:12:06 ¦
¦ 90 ¦ 0.85 ¦ 2008-07-01 14:13:14 ¦
¦ 10 ¦ 0.30 ¦ 2008-07-01 14:20:11 ¦
¦ 66 ¦ 0.34 ¦ 2008-07-01 14:20:16 ¦
¦ 19 ¦ 0.30 ¦ 2008-07-01 14:20:39 ¦
¦ 20 ¦ 0.43 ¦ 2008-07-01 14:33:11 ¦
¦ 240 ¦ 0.59 ¦ 2008-07-01 14:36:32 ¦
¦ 36 ¦ 0.33 ¦ 2008-07-01 14:44:37 ¦
¦ 175 ¦ 1.48 ¦ 2008-07-01 14:54:02 ¦
¦ 46 ¦ 1.75 ¦ 2008-07-01 15:13:10 ¦
¦ 35 ¦ 0.57 ¦ 2008-07-01 15:17:39 ¦
¦ 28 ¦ 0.32 ¦ 2008-07-01 15:20:09 ¦
¦ 109 ¦ 0.56 ¦ 2008-07-01 15:29:27 ¦
¦ 30 ¦ 0.47 ¦ 2008-07-01 15:34:22 ¦
¦ 25 ¦ 0.48 ¦ 2008-07-01 15:35:27 ¦
¦ 2 ¦ 0.33 ¦ 2008-07-01 15:44:28 ¦
¦ 1 ¦ 0.33 ¦ 2008-07-01 15:46:11 ¦
¦ 21 ¦ 0.37 ¦ 2008-07-01 15:52:04 ¦
¦ 119 ¦ 0.57 ¦ 2008-07-01 16:03:50 ¦
¦ 53 ¦ 0.37 ¦ 2008-07-01 16:48:14 ¦
¦ 30 ¦ 8.80 ¦ 2008-07-01 17:02:03 ¦
¦ 415 ¦ 4.72 ¦ 2008-07-01 17:05:02 ¦
¦ 10 ¦ 0.43 ¦ 2008-07-01 17:08:52 ¦
¦ 43 ¦ 0.44 ¦ 2008-07-01 17:13:32 ¦
+------+------+---------------------+

Is this normal? I mean, like we were discussing, I was thinking about some table swapping technique. But the point is: is it a really legitimate problem this one? If I had the insert demand I thought I had (5 thousand rows several times a day) ok, but do I have to use a technique like that because of so few rows inserts?

Is there something I shall do?

Thanks a lot folks!

Phoenix

PS: By the way:
I'm on a 2 Quad-core processor machine, with 16Gb ram, and RAID10 disks
And here's my my.cnf.

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
#old_passwords=1
skip-locking
key_buffer = 1024M
max_allowed_packet = 40M
table_cache = 2048
sort_buffer_size = 512M
read_buffer_size = 256M
read_rnd_buffer_size = 256M
myisam_sort_buffer_size = 256M
thread_cache_size = 16
thread_concurrency = 8
query_cache_size = 256M
tmp_table_size=64M
back_log = 100
max_connect_errors = 10000
join_buffer_size=64M
open-files = 20000
bulk_insert_buffer_size=128M
concurrent_insert=2
interactive_timeout = 600
wait_timeout = 600
max_connections=500
log-slow-queries=/var/lib/mysql/slow.log
long_query_time=1
max_heap_table_size=200M
ft_stopword_file=/etc/stopwords_pt.txt
ft_min_word_len=3

server-id = 1

physics

7:46 pm on Jul 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's interesting that inserting 1 record takes on the same order of time as inserting 200

¦ 1 ¦ 0.33 ¦ 2008-06-30 19:45:43 ¦
¦ 200 ¦ 0.54 ¦ 2008-06-30 19:50:28 ¦

I'm not sure what's causing this but it could be due to your indexes.


We're talking about a 3 million row CD/DVD inventory table, with many fulltext indexes

It might be that inserting even one record is triggering the re-index which takes about 300ms. I noticed in the above that you have ft_min_word_len=3. I often do this too but it makes things harder on the fulltext indexes!

Perhaps you can increase the number of records added per insert since it seems that the re-index is the heaviest operation. Or you could use delayed key writing:


For MyISAM tables, another strategy for reducing index flushing is to use the DELAYED_KEY_WRITE table option. With this option, data rows are written to the data file immediately as usual, but the key cache is flushed only occasionally rather than after each insert. To use delayed index flushing on a server-wide basis, start mysqld with the --delay-key-write option.

[informit.com...]

phoenix_fly

9:41 pm on Jul 3, 2008 (gmt 0)

10+ Year Member




#1 - But, tell me, is this 300 ms time ok? I mean, does mysql take all this time to add to the index the keywords from only one 244kb record? (the average row_length mysql says I have on this table)

#2 - I heard about DELAYED_KEY_WRITE option, but I didn't find out at the docs how to include it at the my.cnf. This is an command line option, right? Also, another thing that worries me about this is that updates are delayed too, and this can be bad, as we're a online store, and products may be sold twice with this delay.

physics

9:11 pm on Jul 7, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



#1 - I don't know if 300ms is "OK". But I have also experienced delays on indexed columns even when adding just one entry - how long depends very much on the number of entries, length of text, etc.

#2 - More info on my.cnf and how to use command line type args in my.cnf
[dev.mysql.com...]