homepage Welcome to WebmasterWorld Guest from 23.23.22.200
register, free tools, login, search, subscribe, help, library, announcements, recent posts, open posts,
Subscribe to WebmasterWorld

Home / Forums Index / Code, Content, and Presentation / Databases
Forum Library, Charter, Moderators: physics

Databases Forum

    
mysql query help
enotalone




msg:4091029
 8:43 pm on Mar 3, 2010 (gmt 0)

Hello, my mysql knowledge is limited and after doing some reading, tests I am unable to find any solution and would appreciate any help. The task is to find products that with specific tags.

The main products table

CREATE TABLE IF NOT EXISTS `ds_item` (
`iid` int(11) NOT NULL auto_increment,
`status` int(5) NOT NULL default '0',
`title` varchar(255) NOT NULL default '',
`date` int(11) NOT NULL default '0',
`dupdate` int(11) NOT NULL default '0',
`sale` smallint(2) NOT NULL default '0',
`price_list` varchar(50) NOT NULL default '0.00',
`price_sale` varchar(50) NOT NULL default '0.00',
`price_sale_avg` decimal(15,2) NOT NULL default '0.00',
`price_save` decimal(15,2) NOT NULL default '0.00',
`store_id` int(11) NOT NULL default '0',
`pid` varchar(110) NOT NULL,
`sku` varchar(100) NOT NULL,
`upc` varchar(100) NOT NULL,
`url` text NOT NULL,
`description` text NOT NULL,
PRIMARY KEY (`iid`),
KEY `store_id` (`store_id`),
KEY `status` (`status`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=2124683 ;


tags' table

CREATE TABLE IF NOT EXISTS `ds_item_cat` (
`rel_id` int(11) NOT NULL auto_increment,
`iid` int(11) NOT NULL default '0',
`cat` int(11) NOT NULL default '0',
PRIMARY KEY (`rel_id`),
KEY `iid_cat` (`iid`,`cat`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=6127161 ;


stores' table

CREATE TABLE IF NOT EXISTS `ds_store` (
`store_id` int(11) NOT NULL auto_increment,
`store_name` varchar(255) NOT NULL default '',
`store_status` smallint(3) NOT NULL default '0',
`currency` char(3) NOT NULL default 'usd',
`note` varchar(255) NOT NULL,
PRIMARY KEY (`store_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=539 ;


now a sample query to get Women's Black Steve Madden Sandals

SELECT ds_item.iid, title, sale, price_list, price_sale, price_sale_avg, ds_item.store_id, description, store_name, currency FROM ds_item
INNER JOIN ds_item_cat AS c1 ON ds_item.iid=c1.iid
INNER JOIN ds_item_cat AS gender ON ds_item.iid=gender.iid AND gender.cat='4118'
INNER JOIN ds_item_cat AS brand ON ds_item.iid=brand.iid AND brand.cat='715'
INNER JOIN ds_item_cat AS color ON ds_item.iid=color.iid AND color.cat='4127'
LEFT JOIN ds_store ON (ds_item.store_id=ds_store.store_id)
WHERE status='0' AND c1.cat IN ('4160')
ORDER BY sale DESC, ds_item.dupdate DESC
LIMIT 0, 10

The problem: when I run explain on this query it shows that about 93947 rows need to be scanned. Any way to avoid this by optimizing the query, tables? Thank you.


the results of explain

[img218.imageshack.us...]

 

syber




msg:4091426
 3:05 pm on Mar 4, 2010 (gmt 0)

Is there any reason that you need to JOIN ds_item_cat 5 times? Also doing a LEFT JOIN at the end is dragging everything back, even if it doesn't match.

Couldn't you just say:

WHERE status='0' AND c1.cat = '4160'

to find what you are looking for?

enotalone




msg:4091497
 4:33 pm on Mar 4, 2010 (gmt 0)

Thanks syber, that many joins are needed to pull products that are color: white, gender:women, brand:Steve Madden and category:Sandals.

with WHERE status='0' AND c1.cat = '4160' it would show sandals (cat='4160') that are in stock (status='0').

The last left join is simply use to connect store information where they are sold to products table on a common key (store_id). I will try the query without joining ds_store, but I have a feeling it is not going to change much.

enotalone




msg:4091499
 4:36 pm on Mar 4, 2010 (gmt 0)

Just tried without the last join, no difference.

explain SELECT ds_item.iid, title, sale, price_list, price_sale, price_sale_avg, ds_item.store_id, description FROM ds_item
INNER JOIN ds_item_cat AS c1 ON ds_item.iid=c1.iid
INNER JOIN ds_item_cat AS gender ON ds_item.iid=gender.iid AND gender.cat='4118'
INNER JOIN ds_item_cat AS brand ON ds_item.iid=brand.iid AND brand.cat='715'
INNER JOIN ds_item_cat AS color ON ds_item.iid=color.iid AND color.cat='4127'
WHERE status='0' AND c1.cat IN ('4160')
ORDER BY sale DESC, ds_item.dupdate DESC
LIMIT 0, 10

explain shows 152341 rows being scanned.

LifeinAsia




msg:4091514
 4:48 pm on Mar 4, 2010 (gmt 0)

It *might* be more efficient if you combine the ds_item_cat JOINs with something like (may need to do some tweaking):
SELECT ds_item.iid, title, sale, price_list, price_sale, price_sale_avg, ds_item.store_id, description, store_name, currency
FROM ds_item INNER JOIN (SELECT ds_item.iid FROM ds_item_cat AS gender ON ds_item.iid=gender.iid AND gender.cat='4118'
INNER JOIN ds_item_cat AS brand ON ds_item.iid=brand.iid AND brand.cat='715'
INNER JOIN ds_item_cat AS color ON ds_item.iid=color.iid AND color.cat='4127') AS c1 ON ds_item.iid=c1.iid
LEFT JOIN ds_store ON (ds_item.store_id=ds_store.store_id)
WHERE status='0' AND c1.cat IN ('4160')
ORDER BY sale DESC, ds_item.dupdate DESC

You'll have to try and see if that helps any.

syber




msg:4091516
 4:53 pm on Mar 4, 2010 (gmt 0)

I still don't understand why you need to do so many joins of the same table. Can't you filter on gender, brand and color just using a where clause?

LifeinAsia




msg:4091621
 6:56 pm on Mar 4, 2010 (gmt 0)

Those are all different rows in the same table. How do you propose to write the WHERE statement?

LifeinAsia




msg:4091625
 7:02 pm on Mar 4, 2010 (gmt 0)

Actually, a more efficient way might be something like:
SELECT ds_item.iid, title, sale, price_list, price_sale, price_sale_avg, ds_item.store_id, description, store_name, currency
FROM ds_item INNER JOIN (SELECT iid
FROM ds_item_cat
WHERE cat IN ('4118','715','4127')
GROUP BY iid
HAVING COUNT(*)=3) AS c1 ON ds_item.iid=c1.iid
LEFT JOIN ds_store ON (ds_item.store_id=ds_store.store_id)
WHERE status='0' AND c1.cat IN ('4160')
ORDER BY sale DESC, ds_item.dupdate DESC

enotalone




msg:4091677
 8:35 pm on Mar 4, 2010 (gmt 0)

Thank you LifeinAsia, I tried the sql in your last post, it was saying unknown column c1.cat so I changed it around to:

explain SELECT ds_item.iid, title, sale, price_list, price_sale, price_sale_avg, ds_item.store_id, description, store_name, currency
FROM ds_item INNER JOIN (SELECT iid
FROM ds_item_cat
WHERE cat IN ('4118','715','4127', '4160')
GROUP BY iid
HAVING COUNT(*)=4) AS c1 ON ds_item.iid=c1.iid
LEFT JOIN ds_store ON (ds_item.store_id=ds_store.store_id)
WHERE status='0'
ORDER BY sale DESC, ds_item.dupdate DESC

But now according to explain ds_item_cat needs to scan 1761475 rows.

syber




msg:4091732
 9:42 pm on Mar 4, 2010 (gmt 0)

try correlated subqueries:


SELECT ds_item.iid, title, sale, price_list, price_sale, price_sale_avg, ds_item.store_id, description, store_name, currency
FROM ds_item INNER JOIN ds_item_cat AS c1 ON ds_item.iid=c1.iid
WHERE status='0' and cat = '4160' AND
c1.iid IN (SELECT iid FROM ds_item_cat
WHERE iid = c1.iid AND cat = '4118') AND
c1.iid IN (SELECT iid FROM ds_item_cat
WHERE iid = c1.iid AND cat = '715') AND
c1.iid IN (SELECT iid FROM ds_item_cat
WHERE iid = c1.iid AND cat = '4127')

LifeinAsia




msg:4091733
 9:43 pm on Mar 4, 2010 (gmt 0)

See if switching the order helps:
explain SELECT ds_item.iid, title, sale, price_list, price_sale, price_sale_avg, ds_item.store_id, description, store_name, currency
FROM (SELECT iid
FROM ds_item_cat
WHERE cat IN ('4118','715','4127', '4160')
GROUP BY iid
HAVING COUNT(*)=4) AS c1 INNER JOIN ds_item ON ds_item.iid=c1.iid
LEFT JOIN ds_store ON (ds_item.store_id=ds_store.store_id)
WHERE status='0'
ORDER BY sale DESC, ds_item.dupdate DESC

It may be that with your table structure, your original query was the most efficient. How many rows do you have in the ds_item and ds_item_cat tables?

enotalone




msg:4091743
 9:59 pm on Mar 4, 2010 (gmt 0)

Thanks syber, the query works but the explain shows 152698 rows being scanned by ds_item.

Thanks LifeinAsia, works but did not change anything, still ds_item_cat scans 1761588 rows. at the moment ds_item has 480,007 rows, ds_item_cat 1,761,588 rows. I am starting to think like you said that there just might be no better way of doing this.

ds_item_cat has a key "KEY `iid_cat` (`iid`,`cat`)" (table structure in my first post). I am not sure if indexing iid and cat separately will improve anything.

syber




msg:4091771
 11:00 pm on Mar 4, 2010 (gmt 0)

one more query to try:


SELECT ds_item.iid, title, sale, price_list, price_sale, price_sale_avg, ds_item.store_id, description, store_name, currency
FROM ds_item INNER JOIN ds_item_cat AS c1 ON ds_item.iid=c1.iid
WHERE status='0' and cat = '4160' AND
EXISTS (SELECT * FROM ds_item_cat
WHERE iid = c1.iid AND cat = '4118') AND
EXISTS (SELECT * FROM ds_item_cat
WHERE iid = c1.iid AND cat = '715') AND
EXISTS (SELECT * FROM ds_item_cat
WHERE iid = c1.iid AND cat = '4127')

enotalone




msg:4091787
 11:29 pm on Mar 4, 2010 (gmt 0)

Thanks Syber, still the explain shows 152698 rows being scanned. I think either I have to change my keys, table structure somehow or get used to the idea that I have a query who enjoys scanning rows.

Global Options:
 top home search open messages active posts  
 

Home / Forums Index / Code, Content, and Presentation / Databases
rss feed

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
Home ¦ Free Tools ¦ Terms of Service ¦ Privacy Policy ¦ Report Problem ¦ About ¦ Library ¦ Newsletter
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved