Forum Moderators: coopster

Message Too Old, No Replies

Select distinct in sub query

Can't get my head around it

         

chrisjoha

5:09 pm on Sep 7, 2005 (gmt 0)

10+ Year Member



A very simplified version of one of my database tables looks like this:

city_pairs (
pairid,
city1,
city2
)

The city_pairs table contain cities in pairs where a pair may occur twice
row 1: city1: 'new york', city2: 'new mexico'
row 2: city1: 'new mexico', city2: 'new york'

This is expected behaviour. Now I need to select the pairid of all pairs, but I want unique pairs. Ie, in the example above I want only the firs row. How can I do this?

I tried something like
SELECT pairid FROM city_pairs WHERE city1 in
(SELECT city1 FROM (SELECT distinct city1, city2 FROM city_pairs) as distinct_cities)

I think this query looks somewhat clumsy, and even more important, it doesn't act as expected - it returns all rows. I guess this is because ("city1", "city2") and ("city2", "city1") really are two distinct rows... Any takers?

coopster

7:57 pm on Sep 7, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Could the pairs ever occur more than twice? For example, is this ever possible?
1, 'New York', 'New Mexico' 
2, 'New Mexico', 'New York'
...
8, 'New York', 'New Mexico'

chrisjoha

8:22 pm on Sep 7, 2005 (gmt 0)

10+ Year Member



No, all city pairs appear twice, once as (city1, city2) and once as (city2, city1). Ah, it feels so obvious, yet I cannot see it... If it helps all the cities are stored in a separate table. I've tried many combinations but they all end up giving me all hits.

coopster

8:29 pm on Sep 7, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Then you could join the table to itself, pulling only those rows where a column in the right hand table is null (no duplicate record) or only one of the unique id columns (one is always going to be less than the other, assuming the
pairid
is a unique identifier).
SELECT 
a.pairid,
a.city1,
a.city2
FROM city_pairs AS a
LEFT JOIN city_pairs AS b
ON (a.city1 = b.city2 AND a.city2 = b.city1)
WHERE b.city1 IS NULL OR a.pairid < b.pairid
;

chrisjoha

8:51 pm on Sep 7, 2005 (gmt 0)

10+ Year Member



It works! Awesome! :) Thanks alot man! I never would have though of joining the table with itself, but when you think of it, I guess it makes sense. Only thing I wonder about now is that there were no rows in "table b" that was null. Is this a test in case not all city pairs are represented twice?

coopster

9:02 pm on Sep 7, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Exactly. The LEFT JOIN will grab all rows from the left hand table represented with the ALIAS of "a". It will then try to join itself on the condition specified. Well, when there is no matching columns as specified in the join, the LEFT JOIN says to still bring back a result row and since the right hand table ("b") won't having a match, the columns come back NULL.

Have a look at message number 11 in this thread, a bit more of an explanation:
[webmasterworld.com...]