homepage Welcome to WebmasterWorld Guest from 54.147.248.118
register, free tools, login, search, pro membership, help, library, announcements, recent posts, open posts,
Become a Pro Member
Home / Forums Index / Code, Content, and Presentation / Databases
Forum Library, Charter, Moderator: open

Databases Forum

    
How to group by one field and order by another?
"group by" is overriding my "order by"
MichaelBluejay

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4503580 posted 5:30 pm on Oct 3, 2012 (gmt 0)

I'm trying to pull the most recent posts out of a table, but have each topic represented only once.

This gets the posts, but takes more than one from each topic:
SELECT topic_id,posted,poster,message FROM posts ORDER BY posted DESC LIMIT 20


Adding a GROUP BY makes sure each thread is represented only once, but then I no longer get the most recent post from each topic:
SELECT topic_id,posted,poster,message FROM posts GROUP BY topic_id ORDER BY posted DESC LIMIT 20


I tried to put "ORDER BY" before "GROUP BY", but that just gives me a syntax error:
SELECT topic_id,posted,poster,message FROM posts ORDER BY posted DESC LIMIT 20[ GROUP BY topic_id


How should I write my query?

 

haggul

5+ Year Member



 
Msg#: 4503580 posted 4:15 pm on Oct 18, 2012 (gmt 0)

Google for "sql top 1 in each group" - there should be loads of articles - I'd post a link but not sure if I am allowed to?

LifeinAsia

WebmasterWorld Administrator lifeinasia us a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4503580 posted 4:49 pm on Oct 18, 2012 (gmt 0)

You can do it in a 2-step process:
1) Find the max posted date for each topic-
SELECT topic_id, MAX(posted) AS LastPost
FROM posts
GROUP BY topic_id

2) Match the posts by topic_id and LastPost

The following should work on some systems:
SELECT p.topic_id,p.posted,p.poster,p.message
FROM posts p INNER JOIN (SELECT topic_id, MAX(posted) AS LastPost FROM posts GROUP BY topic_id) m ON (p.topic_id=m.topic_id AND p.posted=m.LastPost)

MichaelBluejay

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4503580 posted 5:56 am on Oct 19, 2012 (gmt 0)

Thanks for the responses. In the meantime, I figured out another way to do it:
SELECT topic_id,posted,poster,message FROM posts temp1 WHERE
posted=(SELECT MAX(temp2.posted) FROM posts temp2 WHERE temp1.topic_id = temp2.topic_id)
AND poster in (SELECT poster FROM posts)
ORDER BY posted DESC LIMIT 20

LifeinAsia

WebmasterWorld Administrator lifeinasia us a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4503580 posted 3:53 pm on Oct 19, 2012 (gmt 0)

A few problems with that method...

You're making 3 calls to the posts table:
1) SELECT poster FROM posts (which has to scan the entire table and keep the results in memory)
2) SELECT MAX(temp2.posted) FROM posts
3) SELECT topic_id,posted,poster,message FROM posts temp1

I see #1 as a complete waste because it pulls everything (including duplicate poster values). Plus it doesn't do anything to match the actual poster with the posts identified in #2.

In fact, the "AND poster in (SELECT poster FROM posts)" clause is basically the same as saying "AND 1=1" in this case.

Try your query with and without "AND poster in (SELECT poster FROM posts)" and I suspect you'll get the same results (except that it should run faster without it).

Otherwise, the logic is basically the same as what I posted- not sure which is more efficient. (And depending on your DBMS, it may rewrite both into a more efficient 3rd version under the hood. :) )

MichaelBluejay

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4503580 posted 11:45 am on Oct 21, 2012 (gmt 0)

Thanks again for the help. I left something out of my initial post to keep it simple, but I guess I should be more specific: I also need to pull only posts where the poster's very first post to the whole forum was made more than 24 hours ago. This ensures that I don't pull any spam posts which I haven't yet reviewed. So, the AND isn't actually completely useless. Here's my actual code, which uses a Perl variable for the time:
SELECT topic_id,posted,poster,message FROM posts temp1 WHERE
posted=(SELECT MAX(temp2.posted) FROM posts temp2 WHERE temp1.topic_id = temp2.topic_id)
AND poster in (SELECT poster FROM posts WHERE posted < $twentyFourHoursAgo)
ORDER BY posted DESC LIMIT 20

With that in mind, does my query look okay to you now?

Global Options:
 top home search open messages active posts  
 

Home / Forums Index / Code, Content, and Presentation / Databases
rss feed

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
Home ¦ Free Tools ¦ Terms of Service ¦ Privacy Policy ¦ Report Problem ¦ About ¦ Library ¦ Newsletter
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved