|Index in join query|
| 12:02 am on Oct 16, 2010 (gmt 0)|
Hi, I am using a LEFT JOIN between my 2 tables, each with around 2 million records, and the query takes around 8 hours to run:
SELECT t1.id, t1.expense
FROM t1
LEFT JOIN t2
  ON (t1.expense = t2.expense
  AND t1.code    = t2.code
  AND t1.name    = t2.name
  AND t1.price   = t2.price
  AND t1.charges = t2.charges
  AND t1.url     = t2.url)
WHERE t2.id IS NULL
What would be the best index for this query? I haven't added any indexes yet, and I'd like to cut the run time from hours to minutes.
| 11:03 pm on Oct 16, 2010 (gmt 0)|
The first thing I notice when reading your post is the large number of columns you join on, which probably means a great deal of duplicate data between your tables. Have you considered whether it is possible to denormalize your tables? If the multiplicity of the relationship between the tables is small, that might be a good option.
That said, it is usually a good idea to put indexes on the columns you seek on, e.g. those that you join on (expense, code, name, price, charges, url and id).
Specifically, you could try creating a compound index on each table containing the columns you join on (expense, code, name, price, charges, url in that order), and an index on the id column of table t2.
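As a sketch (assuming your tables really are called t1 and t2 as in the query — the index names below are just placeholders, and you should adjust them to your schema), the indexes could be created like this:

```sql
-- Compound index on the join columns, in the same order as the ON clause.
-- Note: MySQL limits total index key length, so long VARCHAR columns such
-- as name or url may need a prefix length, e.g. url(100).
CREATE INDEX idx_t1_join ON t1 (expense, code, name, price, charges, url);
CREATE INDEX idx_t2_join ON t2 (expense, code, name, price, charges, url);

-- Index on t2.id so the "t2.id IS NULL" anti-join check is cheap.
CREATE INDEX idx_t2_id ON t2 (id);
```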
Hope it helps.
| 4:53 am on Oct 17, 2010 (gmt 0)|
Thanks Peter. I added an index on the id field of both tables and verified with EXPLAIN that it is used; the run time has dropped from 8 hours to around 40 minutes, which is great. Is there anything more I can tweak to make it even faster?
| 10:31 am on Oct 17, 2010 (gmt 0)|
Glad to hear it! That's a 10x improvement... Let's see if we can get another 10x. ;-)
If you haven't tried it yet, I would try the compound index, that is, one index covering all the columns you join on, in the same order as you use them in your query. In your case, one index covering expense, code, name, price, charges and url, in that order.
How many transactions do the tables get?
Also if you could post your table definitions I might be able to provide more help.
| 10:08 am on Dec 23, 2010 (gmt 0)|
It works fine and comes back within 30 minutes for 1 million records, but once it reaches about 1.5 million records it never finishes.
Below are the EXPLAIN results; it returns 2 rows:

id | select_type | table | type | possible_keys | key  | key_len | ref   | rows    | Extra
1  | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL  | 1548249 |
1  | SIMPLE      | t2    | ref  | model,mer     | mer  | 304     | t1.id | 1       | Using where; Not exists

t1 table size = 704 MB, t2 table size = 521 MB
Any help please.
| 12:06 pm on Dec 23, 2010 (gmt 0)|
Well, it seems the join is scanning table t1 in full (type=ALL over 1.5 million rows) without ever using an index on it. This is bad news.
My best suggestion - which could be more specific if you provided your database schema - is to get rid of t2 altogether and just add a boolean column to t1. Instead of adding a row to t2, you would simply set this flag (update it to true). Finally, you would add an index on this column and run the query as:
|Select id, expense From t1 Where newColumn = false; |
I guarantee you this query will be fast, even with 1.5 million rows. :)
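The schema change itself could be sketched like this (assuming the flag really is called newColumn as in the query above — the column, index names and the literal id are placeholders to adapt to your schema):

```sql
-- Add the flag column with a default, then index it.
ALTER TABLE t1 ADD COLUMN newColumn BOOLEAN NOT NULL DEFAULT FALSE;
CREATE INDEX idx_t1_newColumn ON t1 (newColumn);

-- Wherever you previously inserted a matching row into t2,
-- set the flag on t1 instead (123 is a placeholder id):
UPDATE t1 SET newColumn = TRUE WHERE id = 123;

-- The anti-join then collapses to a simple indexed lookup:
SELECT id, expense FROM t1 WHERE newColumn = FALSE;
```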
Alternatively, the big compound index on all your joined columns is my only other suggestion. Either way, let me know how it works.
| 4:00 pm on Dec 23, 2010 (gmt 0)|
Thanks Peter. I actually need to compare the records of the 2 tables, each with approximately 1.5 million records; I join them to find updated or new records in t1.
I am going with your second option, and I am already using the big index on all my joined columns.