| How to group by one field and order by another? "group by" is overriding my "order by" |
MichaelBluejay

msg:4503582 | 5:30 pm on Oct 3, 2012 (gmt 0) | I'm trying to pull the most recent posts out of a table, but have each topic represented only once. This gets the posts, but takes more than one from each topic: | SELECT topic_id,posted,poster,message FROM posts ORDER BY posted DESC LIMIT 20 |
| Adding a GROUP BY makes sure each thread is represented only once, but then I no longer get the most recent post from each topic: | SELECT topic_id,posted,poster,message FROM posts GROUP BY topic_id ORDER BY posted DESC LIMIT 20 |
| I tried to put "ORDER BY" before "GROUP BY", but that just gives me a syntax error: | SELECT topic_id,posted,poster,message FROM posts ORDER BY posted DESC LIMIT 20[ GROUP BY topic_id |
| How should I write my query?
|
haggul

msg:4509449 | 4:15 pm on Oct 18, 2012 (gmt 0) | Google for "sql top 1 in each group" - there should be loads of articles - I'd post a link but not sure if I am allowed to?
|
LifeinAsia

msg:4509454 | 4:49 pm on Oct 18, 2012 (gmt 0) | You can do it in a 2-step process: 1) Find the max posted date for each topic- SELECT topic_id, MAX(posted) AS LastPost FROM posts GROUP BY topic_id 2) Match the posts by topic_id and LastPost The following should work on some systems: SELECT p.topic_id,p.posted,p.poster,p.message FROM posts p INNER JOIN (SELECT topic_id, MAX(posted) AS LastPost FROM posts GROUP BY topic_id) m ON (p.topic_id=m.topic_id AND p.posted=m.LastPost)
|
MichaelBluejay

msg:4509693 | 5:56 am on Oct 19, 2012 (gmt 0) | Thanks for the responses. In the meantime, I figured out another way to do it: SELECT topic_id,posted,poster,message FROM posts temp1 WHERE posted=(SELECT MAX(temp2.posted) FROM posts temp2 WHERE temp1.topic_id = temp2.topic_id) AND poster in (SELECT poster FROM posts) ORDER BY posted DESC LIMIT 20 |
|
|
LifeinAsia

msg:4509889 | 3:53 pm on Oct 19, 2012 (gmt 0) | A few problems with that method... You're making 3 calls to the posts table: 1) SELECT poster FROM posts (which has to scan the entire table and keep the results in memory) 2) SELECT MAX(temp2.posted) FROM posts 3) SELECT topic_id,posted,poster,message FROM posts temp1 I see #1 as a complete waste because it pulls everything (including duplicate poster values). Plus it doesn't do anything to match the actual poster with the posts identified in #2. In fact, the "AND poster in (SELECT poster FROM posts)" clause is basically the same as saying "AND 1=1" in this case. Try your query with and without "AND poster in (SELECT poster FROM posts)" and I suspect you'll get the same results (except that it should run faster without it). Otherwise, the logic is basically the same as what I posted- not sure which is more efficient. (And depending on your DBMS, it may rewrite both into a more efficient 3rd version under the hood. :) )
|
MichaelBluejay

msg:4510367 | 11:45 am on Oct 21, 2012 (gmt 0) | Thanks again for the help. I left something out of my initial post to keep it simple, but I guess I should be more specific: I also need to pull only posts where the poster's very first post to the whole forum was made more than 24 hours ago. This ensures that I don't pull any spam posts which I haven't yet reviewed. So, the AND isn't actually completely useless. Here's my actual code, which uses a Perl variable for the time: SELECT topic_id,posted,poster,message FROM posts temp1 WHERE posted=(SELECT MAX(temp2.posted) FROM posts temp2 WHERE temp1.topic_id = temp2.topic_id) AND poster in (SELECT poster FROM posts WHERE posted < $twentyFourHoursAgo) ORDER BY posted DESC LIMIT 20 |
| With that in mind, does my query look okay to you now?
|
|
|