Forum Moderators: open

Message Too Old, No Replies

multiple self join

         

jshanman

4:55 pm on Oct 24, 2007 (gmt 0)

10+ Year Member



Is there a better way to compare against the same table other then multiple self joins or multiple queries?

Table: entry
entry_id ¦ class_id ¦ sort_order ¦ status

Table: data
data_id ¦ entry_id ¦ col_id ¦ data_int

SELECT DISTINCT entry.entry_id
FROM entry,data as d0,data as d1,data as d2
WHERE
entry.class_id = '6' AND
entry.status = '1' AND
(
(entry.entry_id = d0.entry_id AND
d0.data_int = '99' AND
d0.col_id = '26'
) OR (
entry.entry_id = d1.entry_id AND
d1.data_int = '101' AND
d1.col_id = '26'
) OR (
entry.entry_id = d2.entry_id AND
d2.data_int = '109' AND
d2.col_id = '26')
)
ORDER BY entry.sort_order,entry.entry_id DESC;

I get this: #1053 - Server shutdown in progress
this takes about 30 seconds to come back with this error.

However, if I run the same query without the third data link (d2) and its conditions, it returns the results fine very quickly.

Any Thoughts?

LifeinAsia

5:39 pm on Oct 24, 2007 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I believe the following gives you the exact same logic:
SELECT DISTINCT entry.entry_id
FROM entry,data
WHERE
entry.class_id = '6' AND
entry.status = '1' AND
entry.entry_id = data.entry_id AND
data.col_id = '26' AND
data.data_int IN ('99','101','109')
ORDER BY entry.sort_order,entry.entry_id DESC;

(However, depending on your DB, it might balk at including a column in your ORDER BY statement that's not part of the SELECT.)

Oh, and to speed things up, make sure the tables are indexed properly. And are all those values really VARCHAR? Changing to INT should greatly speed things.

[edited by: LifeinAsia at 5:41 pm (utc) on Oct. 24, 2007]

jshanman

5:58 pm on Oct 24, 2007 (gmt 0)

10+ Year Member



All these columns are bigint.

The concept of the application is that I want to filter entry_id off any data_id,col_id for multiple values, so while your shortened version makes sense in this case (since all col_id's where the same), it won't in the way the query is built. The user can try to filter for multiple values on multiple columns.

My main question is why would the db fail when self joining 3 tables? Is there a limit to the number of self joins I can/should make?

btw, I'm using mysql 5.

Thanks for the reply.

indexes are created for all these columns as well...

LifeinAsia

6:11 pm on Oct 24, 2007 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



If the columns are BIGINT, then drop the single quotes around the values- you're probably wasting processing converting the strings to integers.

I don't see why you need to alias the data table as 3 distinct identities. I don't know th einternal workings of MYSQL, but you could be wasting resources anyway.

Without seeing the actual query you're trying to accomplish, it's hard to give further advice. But from what you gave, there is a lot of duplication in each ORed section that can be brought out logically to save processing. So diagram out the logic of the actual query and see if there is anything you can save by moving common calls out of the ORs.

jshanman

3:15 pm on Oct 25, 2007 (gmt 0)

10+ Year Member



Here is a better example of a now working query:

SELECT DISTINCT entry.entry_id
FROM
entry,
data as d25,
data_disp as dd25,
data as d26,
data as d28,
data as d29
WHERE
entry.class_id = '6' AND
entry.status = '1' AND
(
(entry.entry_id = d25.entry_id AND
d25.col_id = 25 AND
d25.data_id = dd25.data_id AND
dd25.data_lang_id = 1 AND
dd25.data_disp_varchar LIKE '%Lo%'
) AND (
entry.entry_id = d26.entry_id AND
d26.data_int IN(99,101,100) AND
d26.col_id = 26
) AND (
entry.entry_id = d28.entry_id AND
d28.data_int IN(116,97) AND
d28.col_id = 28
) AND (
entry.entry_id = d29.entry_id AND
d29.data_int IN(102,105) AND
d29.col_id = 29)
)
ORDER BY entry.entry_id DESC;

I put a bunch more logic in to detect if what the user entered was numeric, then don't include the ' around the value. I also detected if there were multiples of the same col_id and put those values in an IN as you suggested.

The query works great now, thanks for the pointers. I was thinking to 'big picture' on how the whole query setup needed to happen.

LifeinAsia

3:23 pm on Oct 25, 2007 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Glad it's working now. But again, I don't see the reason to create multiple aliases for the data table.

Also, since everything is ANDed (unless the stuff you left out changes that), you can drop the parentheses.

jshanman

4:45 pm on Oct 25, 2007 (gmt 0)

10+ Year Member



The multiple aliases are needed because there are possibly many data_id's per entry_id per col_id...

it's hard to explain.

the ('s are mainly for my sanity when I look at the query that my php pukes out :)

Thanks again.