Forum Moderators: open
I hope you guys can help me on this. Hereīs the problem: I have a table named 'catalog', thatīs where the products are kept. Itīs something like this:
table 'catalog'
product_id
author
title
label
description
price
vendor_id
log
From time to time, my vendors send me a .txt file with their updated inventory, wich has the following columns:
author
title
label
description
price
(same as the products table, except for the 'product_id', wich is an id for my mysql database, and not theirs [wich can be many kinds of storing data] and also the vendor_id and the log)
So, thatīs when I have to check wich CDs are new at theis inventory, you know? I could just erase their entire online inventory, and then record the new one theyīve just sent, but this ainīt good because itīs important that the product_ids of the veteran products donīt change, you know?
Then, what I do? I insert the bew data on a table I call 'pivot', wich has the same fields as the ones send me, and also a received_id:
table 'pivot'
received_id
author
title
label
description
price
Then I perform this inner join below, to find out the received_ids for the products that are in fact veteran [wich will be erased from the 'pivot' table before inserting the remaining into the catalog [so there are only new products there].
select pivot.received_id
from catalog inner join pivot on
( catalog.author = pivot.author and
catalog.titulo = pivot.titulo and
catalog.label = pivot.label and
catalog.description = pivot.description and
catalog.price = pivot.price );
The problem is this inner join with multiple ons takes too much effort and time - weīre talking about a 100k records inner join!
Isnīt there another way to compare these two tables without this inner join? I heard about some 'exists' clause, but I had problems adapting it to my situation here.
I hope you guys can give me a hand here.
Thanks a lot!
phoenix_fly
CREATE TABLE `products` (
`id` smallint(6) NOT NULL auto_increment,
`cid` smallint(6) default NULL,
`Path` varchar(50) NOT NULL default '',
`Title` varchar(50) NOT NULL default '',
PRIMARY KEY (`id`),
UNIQUE KEY `IDX_products_1` (`Path`,`id`),
UNIQUE KEY `IDX_products_2` (`Path`,`cid`),
KEY `IDX_products_3` (`Path`),
KEY `IDX_products_4` (`cid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Well, in fact I donīt. You think itīd make the inner join feasible? How could I add them - sorry, Iīm not very familiar with indexes.
Once your tables grow beyond a couple of entries, indexes are essential, and will generally speed up your query immensely.
The MySQL manual has more info:
[dev.mysql.com...]
As a rule you need an index on each column being joined. I'm a bit fuzzy on MySQL and its indexes but I think you'd need to
CREATE INDEX index1 ON catalog (author, title, label, description, price);
CREATE INDEX index2 ON pivot (author, title, label, description, price);
Why not just use the natural key to solve the problem? Make it a unique constraint (not really an index) on the table and set your own key to auto-increment.
ALTER TABLE [myTable] WITH NOCHECK ADD
CONSTRAINT [myIndex] UNIQUE NONCLUSTERED
(
[author],
[title],
[label],
[price],
) ON [PRIMARY]
Now just insert all the new data, any that already exists wont be duplicated.
Be careful with indexes as they actually slow down performance for inserts (they have to be recreated), but can help speed up SELECT queries using WHERE clauses. Never have more than a couple of indexexs per table if you can help it.
select pivot.received_id
from catalog inner join pivot on
( binary catalog.author = binary pivot.author and
binary catalog.titulo = binary pivot.titulo and
binary catalog.label = binary pivot.label and
binary catalog.description = binary
pivot.description and
binary catalog.price = binary pivot.price );
Iīve put the indexes and things didnīt get any better. Then I decided to get less perfectionist, and took away all the binaries... ok, mysql, no problem, ths isnīt really thaaaat necesary for me. Well, then the join that was taking 3 minutes, surprised me with the output screen in, guess what, 1 second!
Thanks for all the inputs, my fellow coders.And sorry for not having mentioned this 'binary' issue, as I thought it might not be it. But as you directed me to the only one thing that could explain the slowness, and it didnīt solve it, I could start suspecting about the binary.
Take care
phoenix_fly