homepage Welcome to WebmasterWorld Guest from 54.197.110.151
register, free tools, login, search, pro membership, help, library, announcements, recent posts, open posts,
Pubcon Platinum Sponsor 2014
Home / Forums Index / Code, Content, and Presentation / Databases
Forum Library, Charter, Moderator: open

Databases Forum

    
Selecting data from one table that's not in another
Matthew1980

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4643275 posted 2:14 pm on Feb 7, 2014 (gmt 0)

Hi there Guru's,

I posted this same topic last year (almost to the date!) First try! [webmasterworld.com]

I've got an issue that's driving me mad, this code:-


Select serialnumber from
serial
where id_number not in
(SELECT serialnumber FROM newbuild


Works fine on my mock table that's got about 20 rows in it (trialing the code first), then when I try it on my test server which compares the data of one table (serial, approx 6000 rows), to the other (newbuild, approx 10000 rows).

This takes in excess of 1 min. 50 secs to complete, and constantly returns ALL rows from the serial table.

I am stumped; the two rows being evaluated are both primary keys in their repective tables and are both ints.

I've checked the collation of each table and made sure that they are identical in all structure.

Why would it work on 20 rows with a 4 row check where 1 row isn't in the 20 row table!

Is there a more efficient way of checking for data present in on table that isn't in another?

Failing this; I will re design the database to be more fluent and less clunky! I've inherited this system from someone who wasn't that bothered about longevity of task.

Thanks,
MRb

 

graeme_p

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4643275 posted 8:46 am on Feb 8, 2014 (gmt 0)

Have you tried EXPLAIN?

Does using a JOIN as suggested in the previous thread make a difference?

DaveWave



 
Msg#: 4643275 posted 8:38 am on Feb 9, 2014 (gmt 0)

Never use not in, it's terrible. Use an outer join and test for null instead.

Matthew1980

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4643275 posted 2:12 pm on Feb 19, 2014 (gmt 0)

Hi there Guy's,

After MUCH deliberation and the help of going through each row manualy myself I have figured out a potential reason for why this is so slow. Yes the code works well, does what it is meant to (ameliorating my last "returns all rows" comment, it returns exactly what it should, just my interpretation of data was fuddled!), but it seems that there are ALOT more records NOT IN than are in. I was expecting there to be IRO 50-100 records difference between the two tables, but it seems that the ratio is more like 5:1 which is why (I think) there is a huge delay whilst this list is compiled.

If this logic isn't correct, then I suspect that I need to change the query and seeing as "JOIN" is being suggested, I need to try this again..

@DaveWave: I couldn't get JOIN to work, so I gave up for other methods...

@Graemw_p: I'm not sure about explain, could you give me some context on that please? I'm knee deep in a C project at the moment and this has taken a back seat for a few days, but I can't help but still ponder it!

Again, thanks for the advice.

Cheers,
MRb

DaveWave



 
Msg#: 4643275 posted 10:10 am on Feb 23, 2014 (gmt 0)

Try this:
Select s.serialnumber from
serial s
Left outer Join newbuild n on n.serialnumber =s.id_number
where id_number is null


Not sure why you have serialnumber in table Serial and you are testing for id_number in table newbuilds, bit confusuing to me but I've just taken your query and converted it to an outer join.

graeme_p

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4643275 posted 10:17 am on Feb 23, 2014 (gmt 0)

In general EXPLAIN will show you exactly what a query does. For example it will show if a subquery is executing just once of repeatedly, whether it is being optimised to a join, etc.

LifeinAsia

WebmasterWorld Administrator lifeinasia us a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4643275 posted 3:26 pm on Feb 23, 2014 (gmt 0)

Make sure your tables are indexed properly- there's no reason it should take so long with such small tables. I do similar joins on tables with millions of rows and get results in less than a few seconds (and most of the time is actually from displaying the results, not the actually pulling of data).

Are you on shared hosting? Things could be slow from some other processing hog(s).

graeme_p

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4643275 posted 10:47 am on Feb 24, 2014 (gmt 0)

EXPLAIN will also tell you if the indices are being used as expected.

Also, it may help at this point to share (the relevants bits of) your database schema.

Matthew1980

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4643275 posted 10:27 pm on Mar 13, 2014 (gmt 0)

Hi there Guys,

Thanks for all of your advice on this, I'll report back soon as it will make a huge difference to a project that just keeps getting larger and larger!

I'll check out explain too, looks like it might help.

Cheers,
MRb

Global Options:
 top home search open messages active posts  
 

Home / Forums Index / Code, Content, and Presentation / Databases
rss feed

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
Home ¦ Free Tools ¦ Terms of Service ¦ Privacy Policy ¦ Report Problem ¦ About ¦ Library ¦ Newsletter
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved