Forum Moderators: open

Message Too Old, No Replies

Full-text search on a HEAP/MEMORY table

How to efficiently search HEAP/MEMORY table for a partial text string?

         

Jonatan

4:04 pm on Apr 9, 2007 (gmt 0)

10+ Year Member



Hi,

I will have a very large (about 2-4 GB of data) MEMORY table with 3 CHAR(100) columns that I need to make a partial text string search on. MySQL full-text search would be perfect, but since FULLTEXT indices aren't available for MEMORY tables, I wonder if I can search the table efficiently in some other way?

"WHERE column LIKE '%word%'..." would work as well, but MySQL doesn't use any index on that kind of query, so it would be much slower than a full-text search.

I would be glad for any kind of useful input, thanks!

/Jonatan

justageek

12:00 am on Apr 11, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hmm. You said that a fulltext would work for you which means that you'd be searching for a single whole word or a phrase of whole words.

If you are doing single words then you could just split your text strings up into single words and then use a regular index.

If you are searching for phrases, and don't mind limiting the amount of whole words to be searched, you could break up the strings into single words and then put them back together in chunks where the chunk is no longer than the amount of words you want to search on. Just do it sequentially and it'll work.

JAG

Jonatan

3:55 pm on Apr 11, 2007 (gmt 0)

10+ Year Member



Hi and thanks for the reply.

Do you mean to divide the three columns that I want to be able to search, into many columns where each represent a word? The problem is that if I divide my three columns that I want to be searchable into many columns, I will limit the number of words that can be used and also I will have to limit the number of chars in each word. So if I know that I need to support 11 words for each of the searchable fields, and that every word can be up to 20 characters, it will be 220 bytes per field and 660 bytes for just one row. Compare this to if I use only one column per searchable field where I only need to support 75 characters. So to divide the strings I want to search for into multiple columns would result in an almost three times bigger database.

Also I would get very many ORs in the SQL-query, and I'm not sure how much this would affect the request time. Searching for 4 words would result in 4*11*3=132 ORs in the query.

I really can't see any reason for why MySQL doesn't have full-text support for MEMORY tables. Fulltext indexes are stored as b-tree indexes which MEMORY tables support.

Any other thoughts?

Thanks!

justageek

4:29 pm on Apr 11, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Do you mean to divide the three columns that I want to be able to search, into many columns where each represent a word?

Nope. Just split up the words into one column and relate those records (should equal the number of words) back to the original text string.

If you want to do more than one word and can live with a limit then just use one column again and start adding your words together until you get to the max you want to search for. Once you get to the max then start removing one word from the beginning and add one to the end of the string.

It'll work but it'll be ugly looking.

You're best bet may be to just turn on the query cache and keep a disk based table with a fulltext index. When you start the db just make sure to seed the query cache. Of course the effectiveness of seeding depends on how often the data changes.

JAG

Jonatan

4:44 pm on Apr 11, 2007 (gmt 0)

10+ Year Member



We will have about 700 rows inserted to the table every second and the nature of the data is quite temporary (one row will be in the database for an average of 4-6 hours). Therefore it would be _much_ more effective to use a MEMORY table to avoid a lot of harddrive access.

Thanks again for the replies.

justageek

4:53 pm on Apr 11, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hmm. If you have to live from disk then maybe Falcon is a choice. You could at least spread the pain around to many disks with the partitioning which would greatly increase the speed.

JAG