homepage Welcome to WebmasterWorld Guest from 54.161.240.10
register, free tools, login, search, pro membership, help, library, announcements, recent posts, open posts,
Become a Pro Member
Visit PubCon.com
Home / Forums Index / Code, Content, and Presentation / Databases
Forum Library, Charter, Moderators: physics

Databases Forum

    
MySQl GROUP BY Issue
Spire2001




msg:3709327
 3:42 pm on Jul 28, 2008 (gmt 0)

Hi all,

New member here.

I'm having an issue with figuring out a problem I'm having with MySQL's GROUP BY clause.

Let's say my table looks like this:

id ¦ file_name ¦ folder ¦ int_time
----------------------------------
1 ¦ name1.gif ¦ docs ¦ 0807281350
2 ¦ name1.gif ¦ docs ¦ 0807281353
3 ¦ name2.gif ¦ docs ¦ 0807281350
4 ¦ name2.gif ¦ docs ¦ 0807281355
5 ¦ name2.gif ¦ stuff ¦ 0807281358

I want to create a query that returns only unique names within a cerain folder. To further complicate things, I want the rows returned to be the most recently added (the highest int_time value)

"SELECT id, file_name, MAX(int_time) FROM tbl WHERE folder = 'docs' GROUP BY file_name";

...gets me close, but instead of returning the highest 'int_time' value, it seems to return the lowest 'id' value.

Any ideas? I hope what I'm looking for is clear...

 

Spire2001




msg:3709328
 3:44 pm on Jul 28, 2008 (gmt 0)

I didn't find an edit button, so I'm replying.

My expected result should be:

id file_name folder int_time
----------------------------------
2 name1.gif docs 0807281353
4 name2.gif docs 0807281355

JamieBrown




msg:3709389
 4:46 pm on Jul 28, 2008 (gmt 0)

Could you show us what results you actually get from the query you wrote? Also, what type is the int_time field?

Spire2001




msg:3709427
 5:08 pm on Jul 28, 2008 (gmt 0)

Thanks for your reply.

Small error in my original example:

id file_name date_uploaded ¦ folder int_time
----------------------------------- --------------
1 name1.gif 281350Z Jul08 ¦ docs 0807281350
2 name1.gif 281353Z Jul08 ¦ docs 0807281353
3 name2.gif 281350Z Jul08 ¦ docs 0807281350
4 name2.gif 281355Z Jul08 ¦ docs 0807281355
5 name2.gif 281358Z Jul08 ¦ stuff 0807281358

"SELECT id, file_name, date_uploaded, folder, MAX(int_time) FROM tbl WHERE folder = 'docs' GROUP BY file_name";

Results returned are:

id file_name date_uploaded ¦ folder int_time
--------------------------------------------------
1 name1.gif 281350Z Jul08 ¦ docs 0807281353
3 name2.gif 281350Z Jul08 ¦ docs 0807281355

Note that the correct 'int_time' results are returned. They do not however correspond to the 'date_uploaded' field for the same row. data type of the 'int_time' field is INT(12).

Again, what I would like to have is a single row for each unique file name within each folder, and for said row to be of the latest addition. Such as:

id file_name date_uploaded ¦ folder int_time
----------------------------------- --------------
2 name1.gif 281353Z Jul08 ¦ docs 0807281353
4 name2.gif 281355Z Jul08 ¦ docs 0807281355

Each file name is returned once (per folder). Each row is the most recent, not the lowest 'id' #.

Demaestro




msg:3709434
 5:14 pm on Jul 28, 2008 (gmt 0)

Try this. with just the two fields first.

select distinct file_name, max( int_time) from tbl WHERE folder = 'docs' group by file_name;

[edited by: Demaestro at 5:15 pm (utc) on July 28, 2008]

Demaestro




msg:3709436
 5:17 pm on Jul 28, 2008 (gmt 0)

Crap that isn't right because of the ids.... hold on I need to try something else, but I think this is close.

Spire2001




msg:3709441
 5:23 pm on Jul 28, 2008 (gmt 0)

Thanks for your reply.

There doesn't seem to be any difference when I add DISTINCT to the query. The 'MAX(int_time)' is returning the highest value, but when I add 'date_uploaded' to the query, it's clear that what is being return is from two different rows.

JamieBrown




msg:3709447
 5:29 pm on Jul 28, 2008 (gmt 0)

Ah - I've got you. That's actually quite complex and I can't think of an elegant way to do it right now.

This is horribly nasty, but would get the result:

SELECT file_name, max_time,

(SELECT id FROM tbl WHERE tbl.file_name=maxtimes.file_name AND tbl.int_time=maxtimes.max_time ORDER BY int_time DESC LIMIT 1) AS max_id,

(SELECT date_uploaded FROM tbl WHERE tbl.file_name=maxtimes.file_name AND tbl.int_time=maxtimes.max_time ORDER BY int_time DESC LIMIT 1) AS max_date

FROM

(SELECT file_name, MAX(int_time) as max_time FROM tbl) AS maxtimes

I haven't tested it, so it may need some bug fixes. I wouldn't recommend that though unless its a really small table that's not used a lot.

Instead it might be worth actually extracting the data you need into another summary table on a regular basis.

Hopefully there's a better way I haven't thought of!

Demaestro




msg:3709461
 5:37 pm on Jul 28, 2008 (gmt 0)

Ok I think I have it.... this works in Postgres anyways.

select
tbl.*
from
tbl,
(select max(int_time) as int_time, file_name from tbl group by file_name) as max_times
where
tbl.folder = 'docs'
and max_times.int_time = tbl.int_time
and max_times.file_name = tbl.file_name

[edited by: Demaestro at 5:39 pm (utc) on July 28, 2008]

Spire2001




msg:3709466
 5:46 pm on Jul 28, 2008 (gmt 0)

@ JamieBrown -

That query returns the following error: "MIXING OF GROUP COLUMNS (MIN(), MAX(), COUNT()...) WITH NO GROUP COLUMNS IS ILLEGAL IF THERE IS NO GROUP BY CLAUSE".

I've tried to debug, but no luck no luck so far.

--------------

@ Demaestro -

Your query returns 0 results. No errors though.

--------------

Thanks to you both. I will keep trying. Any other ideas would be welcome.

bcolflesh




msg:3709472
 5:53 pm on Jul 28, 2008 (gmt 0)

SELECT t1.* FROM test AS t1 WHERE t1.folder = "docs" AND t1.int_time = ANY (SELECT max(t2.int_time) FROM test AS t2 WHERE t2.folder = "docs" GROUP BY t2.file_name)

*edit* - replace "test" w/your table name, obviously.

Spire2001




msg:3709475
 6:01 pm on Jul 28, 2008 (gmt 0)

@ bcolflesh -

Thanks for your reply, but sadly, this query also returned 0 results.

Demaestro




msg:3709476
 6:02 pm on Jul 28, 2008 (gmt 0)

Weird.... tweak mine a bit.... It should work... here is the logic thought out loud.

Tester
id file_name folder int_time
----+------------+--------+-----------
1 name_1.gif docs 807281350
2 name_1.gif docs 807281353
3 name_2.gif docs 807281350
4 name_2.gif docs 807281355

Write a query that gets the max int_time for each unque file_name.......

******************
testing=> select max(int_time), file_name from tester group by file_name;

max file_name
-----------+------------
807281353 name_1.gif
807281355 name_2.gif
(2 rows)
******************
^^^You should get this for that query^^^

Now that we have the max int_time for each unique file_name we now want to find their ids.... so add the above to another query to get the ids by matching on file_name and int_time... so like this

******************
testing=> select
tester.*
from
tester, (select max(int_time) as int_time, file_name from tester group by file_name) as max_times
where
tester.folder = 'docs'
and max_times.int_time = tester.int_time
and max_times.file_name = tester.file_name;

id file_name folder int_time
----+------------+--------+-----------
2 name_1.gif docs 807281353
4 name_2.gif docs 807281355
(2 rows)
******************

Mind you this is in Postgres but I made this table and this is what I am getting... the syntax may differ in MySQL, I assume that select statements are allowed. It could be that syntax doesn't work but if you break it into a view then it might

To do what I did in a view logic first create view based on that first query (as long as you get the same results I did)

CREATE VIEW view_name AS
select max(int_time), file_name
from tester
group by file_name

This view should have all the max int_time for each unique file_name... so then select from the view like this.

select
tbl.*
from
tbl, view_name
where
tbl.folder = 'docs'
and view_name.int_time = tbl.int_time
and view_name.file_name = tbl.file_name;

bcolflesh




msg:3709477
 6:03 pm on Jul 28, 2008 (gmt 0)

Mine works - I'm using your data with the latest version of MySQL.

ID file_name folder int_time
2 name1.gif docs 2008-07-28 13:53:00
4 name2.gif docs 2008-07-28 13:55:00

Note that I made int_time a datetime field.

Spire2001




msg:3709506
 6:23 pm on Jul 28, 2008 (gmt 0)

@ bcolflesh -

I neglected to change 'docs' to my correct folder name. I am now getting results, but the final 'GROUP BY t2.file_name' doesn't appear to be taking, as all the rows for the given 'folder' are being returned.

----------

@ Demaestro -

You query is adding an extra 'int_time' and 'file_name' column to the resultset. It also seems to be missing some rows entirely. I'm looking into why certain rows are not being returned.

bcolflesh




msg:3709598
 7:29 pm on Jul 28, 2008 (gmt 0)

What version of MySQL server are you using? As noted, the query returns the exact set you expect for me, with the exact table data you provided in your post.

JamieBrown




msg:3710049
 8:12 am on Jul 29, 2008 (gmt 0)

Sorry - try this:

SELECT file_name, max_time,

(SELECT id FROM tbl WHERE tbl.file_name=maxtimes.file_name AND tbl.int_time=maxtimes.max_time ORDER BY int_time DESC LIMIT 1) AS max_id,

(SELECT date_uploaded FROM tbl WHERE tbl.file_name=maxtimes.file_name AND tbl.int_time=maxtimes.max_time ORDER BY int_time DESC LIMIT 1) AS max_date

FROM

(SELECT file_name, MAX(int_time) as max_time FROM tbl GROUP BY file_name) AS maxtimes

Lets hope that was the only bug! :-)

Global Options:
 top home search open messages active posts  
 

Home / Forums Index / Code, Content, and Presentation / Databases
rss feed

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
Home ¦ Free Tools ¦ Terms of Service ¦ Privacy Policy ¦ Report Problem ¦ About ¦ Library ¦ Newsletter
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved