| mysql query help
|
enotalone

msg:4091029 | 8:43 pm on Mar 3, 2010 (gmt 0) | Hello, my mysql knowledge is limited and after doing some reading, tests I am unable to find any solution and would appreciate any help. The task is to find products that with specific tags. The main products table CREATE TABLE IF NOT EXISTS `ds_item` ( `iid` int(11) NOT NULL auto_increment, `status` int(5) NOT NULL default '0', `title` varchar(255) NOT NULL default '', `date` int(11) NOT NULL default '0', `dupdate` int(11) NOT NULL default '0', `sale` smallint(2) NOT NULL default '0', `price_list` varchar(50) NOT NULL default '0.00', `price_sale` varchar(50) NOT NULL default '0.00', `price_sale_avg` decimal(15,2) NOT NULL default '0.00', `price_save` decimal(15,2) NOT NULL default '0.00', `store_id` int(11) NOT NULL default '0', `pid` varchar(110) NOT NULL, `sku` varchar(100) NOT NULL, `upc` varchar(100) NOT NULL, `url` text NOT NULL, `description` text NOT NULL, PRIMARY KEY (`iid`), KEY `store_id` (`store_id`), KEY `status` (`status`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=2124683 ; tags' table CREATE TABLE IF NOT EXISTS `ds_item_cat` ( `rel_id` int(11) NOT NULL auto_increment, `iid` int(11) NOT NULL default '0', `cat` int(11) NOT NULL default '0', PRIMARY KEY (`rel_id`), KEY `iid_cat` (`iid`,`cat`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=6127161 ; stores' table CREATE TABLE IF NOT EXISTS `ds_store` ( `store_id` int(11) NOT NULL auto_increment, `store_name` varchar(255) NOT NULL default '', `store_status` smallint(3) NOT NULL default '0', `currency` char(3) NOT NULL default 'usd', `note` varchar(255) NOT NULL, PRIMARY KEY (`store_id`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=539 ; now a sample query to get Women's Black Steve Madden Sandals SELECT ds_item.iid, title, sale, price_list, price_sale, price_sale_avg, ds_item.store_id, description, store_name, currency FROM ds_item INNER JOIN ds_item_cat AS c1 ON ds_item.iid=c1.iid INNER JOIN ds_item_cat AS gender ON ds_item.iid=gender.iid AND gender.cat='4118' INNER JOIN ds_item_cat AS brand ON ds_item.iid=brand.iid AND brand.cat='715' INNER JOIN ds_item_cat AS color ON ds_item.iid=color.iid AND color.cat='4127' LEFT JOIN ds_store ON (ds_item.store_id=ds_store.store_id) WHERE status='0' AND c1.cat IN ('4160') ORDER BY sale DESC, ds_item.dupdate DESC LIMIT 0, 10 The problem: when I run explain on this query it shows that about 93947 rows need to be scanned. Any way to avoid this by optimizing the query, tables? Thank you. the results of explain [img218.imageshack.us...]
|
syber

msg:4091426 | 3:05 pm on Mar 4, 2010 (gmt 0) | Is there any reason that you need to JOIN ds_item_cat 5 times? Also doing a LEFT JOIN at the end is dragging everything back, even if it doesn't match. Couldn't you just say: WHERE status='0' AND c1.cat = '4160' to find what you are looking for?
|
enotalone

msg:4091497 | 4:33 pm on Mar 4, 2010 (gmt 0) | Thanks syber, that many joins are needed to pull products that are color: white, gender:women, brand:Steve Madden and category:Sandals. with WHERE status='0' AND c1.cat = '4160' it would show sandals (cat='4160') that are in stock (status='0'). The last left join is simply use to connect store information where they are sold to products table on a common key (store_id). I will try the query without joining ds_store, but I have a feeling it is not going to change much.
|
enotalone

msg:4091499 | 4:36 pm on Mar 4, 2010 (gmt 0) | Just tried without the last join, no difference. explain SELECT ds_item.iid, title, sale, price_list, price_sale, price_sale_avg, ds_item.store_id, description FROM ds_item INNER JOIN ds_item_cat AS c1 ON ds_item.iid=c1.iid INNER JOIN ds_item_cat AS gender ON ds_item.iid=gender.iid AND gender.cat='4118' INNER JOIN ds_item_cat AS brand ON ds_item.iid=brand.iid AND brand.cat='715' INNER JOIN ds_item_cat AS color ON ds_item.iid=color.iid AND color.cat='4127' WHERE status='0' AND c1.cat IN ('4160') ORDER BY sale DESC, ds_item.dupdate DESC LIMIT 0, 10 explain shows 152341 rows being scanned.
|
LifeinAsia

msg:4091514 | 4:48 pm on Mar 4, 2010 (gmt 0) | It *might* be more efficient if you combine the ds_item_cat JOINs with something like (may need to do some tweaking): SELECT ds_item.iid, title, sale, price_list, price_sale, price_sale_avg, ds_item.store_id, description, store_name, currency FROM ds_item INNER JOIN (SELECT ds_item.iid FROM ds_item_cat AS gender ON ds_item.iid=gender.iid AND gender.cat='4118' INNER JOIN ds_item_cat AS brand ON ds_item.iid=brand.iid AND brand.cat='715' INNER JOIN ds_item_cat AS color ON ds_item.iid=color.iid AND color.cat='4127') AS c1 ON ds_item.iid=c1.iid LEFT JOIN ds_store ON (ds_item.store_id=ds_store.store_id) WHERE status='0' AND c1.cat IN ('4160') ORDER BY sale DESC, ds_item.dupdate DESC You'll have to try and see if that helps any.
|
syber

msg:4091516 | 4:53 pm on Mar 4, 2010 (gmt 0) | I still don't understand why you need to do so many joins of the same table. Can't you filter on gender, brand and color just using a where clause?
|
LifeinAsia

msg:4091621 | 6:56 pm on Mar 4, 2010 (gmt 0) | Those are all different rows in the same table. How do you propose to write the WHERE statement?
|
LifeinAsia

msg:4091625 | 7:02 pm on Mar 4, 2010 (gmt 0) | Actually, a more efficient way might be something like: SELECT ds_item.iid, title, sale, price_list, price_sale, price_sale_avg, ds_item.store_id, description, store_name, currency FROM ds_item INNER JOIN (SELECT iid FROM ds_item_cat WHERE cat IN ('4118','715','4127') GROUP BY iid HAVING COUNT(*)=3) AS c1 ON ds_item.iid=c1.iid LEFT JOIN ds_store ON (ds_item.store_id=ds_store.store_id) WHERE status='0' AND c1.cat IN ('4160') ORDER BY sale DESC, ds_item.dupdate DESC
|
enotalone

msg:4091677 | 8:35 pm on Mar 4, 2010 (gmt 0) | Thank you LifeinAsia, I tried the sql in your last post, it was saying unknown column c1.cat so I changed it around to: explain SELECT ds_item.iid, title, sale, price_list, price_sale, price_sale_avg, ds_item.store_id, description, store_name, currency FROM ds_item INNER JOIN (SELECT iid FROM ds_item_cat WHERE cat IN ('4118','715','4127', '4160') GROUP BY iid HAVING COUNT(*)=4) AS c1 ON ds_item.iid=c1.iid LEFT JOIN ds_store ON (ds_item.store_id=ds_store.store_id) WHERE status='0' ORDER BY sale DESC, ds_item.dupdate DESC But now according to explain ds_item_cat needs to scan 1761475 rows.
|
syber

msg:4091732 | 9:42 pm on Mar 4, 2010 (gmt 0) | try correlated subqueries:
SELECT ds_item.iid, title, sale, price_list, price_sale, price_sale_avg, ds_item.store_id, description, store_name, currency FROM ds_item INNER JOIN ds_item_cat AS c1 ON ds_item.iid=c1.iid WHERE status='0' and cat = '4160' AND c1.iid IN (SELECT iid FROM ds_item_cat WHERE iid = c1.iid AND cat = '4118') AND c1.iid IN (SELECT iid FROM ds_item_cat WHERE iid = c1.iid AND cat = '715') AND c1.iid IN (SELECT iid FROM ds_item_cat WHERE iid = c1.iid AND cat = '4127')
|
LifeinAsia

msg:4091733 | 9:43 pm on Mar 4, 2010 (gmt 0) | See if switching the order helps: explain SELECT ds_item.iid, title, sale, price_list, price_sale, price_sale_avg, ds_item.store_id, description, store_name, currency FROM (SELECT iid FROM ds_item_cat WHERE cat IN ('4118','715','4127', '4160') GROUP BY iid HAVING COUNT(*)=4) AS c1 INNER JOIN ds_item ON ds_item.iid=c1.iid LEFT JOIN ds_store ON (ds_item.store_id=ds_store.store_id) WHERE status='0' ORDER BY sale DESC, ds_item.dupdate DESC It may be that with your table structure, your original query was the most efficient. How many rows do you have in the ds_item and ds_item_cat tables?
|
enotalone

msg:4091743 | 9:59 pm on Mar 4, 2010 (gmt 0) | Thanks syber, the query works but the explain shows 152698 rows being scanned by ds_item. Thanks LifeinAsia, works but did not change anything, still ds_item_cat scans 1761588 rows. at the moment ds_item has 480,007 rows, ds_item_cat 1,761,588 rows. I am starting to think like you said that there just might be no better way of doing this. ds_item_cat has a key "KEY `iid_cat` (`iid`,`cat`)" (table structure in my first post). I am not sure if indexing iid and cat separately will improve anything.
|
syber

msg:4091771 | 11:00 pm on Mar 4, 2010 (gmt 0) | one more query to try:
SELECT ds_item.iid, title, sale, price_list, price_sale, price_sale_avg, ds_item.store_id, description, store_name, currency FROM ds_item INNER JOIN ds_item_cat AS c1 ON ds_item.iid=c1.iid WHERE status='0' and cat = '4160' AND EXISTS (SELECT * FROM ds_item_cat WHERE iid = c1.iid AND cat = '4118') AND EXISTS (SELECT * FROM ds_item_cat WHERE iid = c1.iid AND cat = '715') AND EXISTS (SELECT * FROM ds_item_cat WHERE iid = c1.iid AND cat = '4127')
|
enotalone

msg:4091787 | 11:29 pm on Mar 4, 2010 (gmt 0) | Thanks Syber, still the explain shows 152698 rows being scanned. I think either I have to change my keys, table structure somehow or get used to the idea that I have a query who enjoys scanning rows.
|
|
|