Forum Moderators: open

Message Too Old, No Replies

Optimize query by removing UNIONS

rewrite query

         

redhatlab

11:00 am on Apr 26, 2009 (gmt 0)

10+ Year Member



Hi,

I have a big query per say and in top of that the query is also mean as a full text match query.

I have a phrase X and I split the phrase in unique combination of the words it is made of. The number of combination is (2^n)-1 where n is the number of words on the phrase.

The problem I am having is that long phrases create huge queries as I join them to execute them only once. Query are so big that a full Quad Hex with a ton of ram crash.

I am in the middle of a dilemma as the results are great, but it is killing the server.

What can I do?

Thank you

particleman

1:54 pm on Apr 26, 2009 (gmt 0)

10+ Year Member



why are you splitting those words up into separate queries? Are you talking mysql? If you can post the query it would probably be easier to help you out.

lammert

4:40 pm on Apr 27, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If I understand it correctly, you are trying to create some sort of multiple keyword search function over a number of database text entries. This is normally implemented by building inverse indexes of all words in all texts.

For example your database table `lines` contains two entries:

ID, text
1, "the quick brown fox"
2, "the yellow dog is not brown"

You run a script over this table and constructs the following word list.

ID, word, text_ID
1, "the", 1
2, "the", 2
3, "quick", 1
4, "brown", 1
5, "brown", 2
6, "fox", 1
7, "yellow", 2
8, "dog", 2
9, "is", 2
10, "not", 2

You create a new table `words` with all these words as separate rows.

Now if you want to search for all lines in your `lines` table which contain the words "the" and "fox" you create a query like (non-optimized pseudo SQL)

SELECT `lines`.`ID` AS `ID`, `lines`.`text` AS `text`
FROM `lines`,`words` AS `w1`,`words` AS `w2`
WHERE `w1`.`word` = 'the' AND
`w2`.`word` = 'fox' AND
`w1`.`text_ID` = `w2`.`text_ID` AND
`w1`.`text_ID` = `lines`.`ID`;

This shouldn't kill your server if you add more instances of `words` in the query, if you optimize the joins correctly.