Forum Moderators: open
New member here.
I'm having an issue with figuring out a problem I'm having with MySQL's GROUP BY clause.
Let's say my table looks like this:
id ¦ file_name ¦ folder ¦ int_time
----------------------------------
1 ¦ name1.gif ¦ docs ¦ 0807281350
2 ¦ name1.gif ¦ docs ¦ 0807281353
3 ¦ name2.gif ¦ docs ¦ 0807281350
4 ¦ name2.gif ¦ docs ¦ 0807281355
5 ¦ name2.gif ¦ stuff ¦ 0807281358
I want to create a query that returns only unique names within a cerain folder. To further complicate things, I want the rows returned to be the most recently added (the highest int_time value)
"SELECT id, file_name, MAX(int_time) FROM tbl WHERE folder = 'docs' GROUP BY file_name";
...gets me close, but instead of returning the highest 'int_time' value, it seems to return the lowest 'id' value.
Any ideas? I hope what I'm looking for is clear...
Small error in my original example:
id ¦ file_name ¦ date_uploaded ¦ folder ¦ int_time
----------------------------------- --------------
1 ¦ name1.gif ¦ 281350Z Jul08 ¦ docs ¦ 0807281350
2 ¦ name1.gif ¦ 281353Z Jul08 ¦ docs ¦ 0807281353
3 ¦ name2.gif ¦ 281350Z Jul08 ¦ docs ¦ 0807281350
4 ¦ name2.gif ¦ 281355Z Jul08 ¦ docs ¦ 0807281355
5 ¦ name2.gif ¦ 281358Z Jul08 ¦ stuff ¦ 0807281358
"SELECT id, file_name, date_uploaded, folder, MAX(int_time) FROM tbl WHERE folder = 'docs' GROUP BY file_name";
Results returned are:
id ¦ file_name ¦ date_uploaded ¦ folder ¦ int_time
--------------------------------------------------
1 ¦ name1.gif ¦ 281350Z Jul08 ¦ docs ¦ 0807281353
3 ¦ name2.gif ¦ 281350Z Jul08 ¦ docs ¦ 0807281355
Note that the correct 'int_time' results are returned. They do not however correspond to the 'date_uploaded' field for the same row. data type of the 'int_time' field is INT(12).
Again, what I would like to have is a single row for each unique file name within each folder, and for said row to be of the latest addition. Such as:
id ¦ file_name ¦ date_uploaded ¦ folder ¦ int_time
----------------------------------- --------------
2 ¦ name1.gif ¦ 281353Z Jul08 ¦ docs ¦ 0807281353
4 ¦ name2.gif ¦ 281355Z Jul08 ¦ docs ¦ 0807281355
Each file name is returned once (per folder). Each row is the most recent, not the lowest 'id' #.
This is horribly nasty, but would get the result:
SELECT file_name, max_time,
(SELECT id FROM tbl WHERE tbl.file_name=maxtimes.file_name AND tbl.int_time=maxtimes.max_time ORDER BY int_time DESC LIMIT 1) AS max_id,
(SELECT date_uploaded FROM tbl WHERE tbl.file_name=maxtimes.file_name AND tbl.int_time=maxtimes.max_time ORDER BY int_time DESC LIMIT 1) AS max_date
FROM
(SELECT file_name, MAX(int_time) as max_time FROM tbl) AS maxtimes
I haven't tested it, so it may need some bug fixes. I wouldn't recommend that though unless its a really small table that's not used a lot.
Instead it might be worth actually extracting the data you need into another summary table on a regular basis.
Hopefully there's a better way I haven't thought of!
select
tbl.*
from
tbl,
(select max(int_time) as int_time, file_name from tbl group by file_name) as max_times
where
tbl.folder = 'docs'
and max_times.int_time = tbl.int_time
and max_times.file_name = tbl.file_name
[edited by: Demaestro at 5:39 pm (utc) on July 28, 2008]
That query returns the following error: "MIXING OF GROUP COLUMNS (MIN(), MAX(), COUNT()...) WITH NO GROUP COLUMNS IS ILLEGAL IF THERE IS NO GROUP BY CLAUSE".
I've tried to debug, but no luck no luck so far.
--------------
@ Demaestro -
Your query returns 0 results. No errors though.
--------------
Thanks to you both. I will keep trying. Any other ideas would be welcome.
Tester
id ¦ file_name ¦ folder ¦ int_time
----+------------+--------+-----------
1 ¦ name_1.gif ¦ docs ¦ 807281350
2 ¦ name_1.gif ¦ docs ¦ 807281353
3 ¦ name_2.gif ¦ docs ¦ 807281350
4 ¦ name_2.gif ¦ docs ¦ 807281355
Write a query that gets the max int_time for each unque file_name.......
******************
testing=> select max(int_time), file_name from tester group by file_name;
max ¦ file_name
-----------+------------
807281353 ¦ name_1.gif
807281355 ¦ name_2.gif
(2 rows)
******************
^^^You should get this for that query^^^
Now that we have the max int_time for each unique file_name we now want to find their ids.... so add the above to another query to get the ids by matching on file_name and int_time... so like this
******************
testing=> select
tester.*
from
tester, (select max(int_time) as int_time, file_name from tester group by file_name) as max_times
where
tester.folder = 'docs'
and max_times.int_time = tester.int_time
and max_times.file_name = tester.file_name;
id ¦ file_name ¦ folder ¦ int_time
----+------------+--------+-----------
2 ¦ name_1.gif ¦ docs ¦ 807281353
4 ¦ name_2.gif ¦ docs ¦ 807281355
(2 rows)
******************
Mind you this is in Postgres but I made this table and this is what I am getting... the syntax may differ in MySQL, I assume that select statements are allowed. It could be that syntax doesn't work but if you break it into a view then it might
To do what I did in a view logic first create view based on that first query (as long as you get the same results I did)
CREATE VIEW view_name AS
select max(int_time), file_name
from tester
group by file_name
This view should have all the max int_time for each unique file_name... so then select from the view like this.
select
tbl.*
from
tbl, view_name
where
tbl.folder = 'docs'
and view_name.int_time = tbl.int_time
and view_name.file_name = tbl.file_name;
I neglected to change 'docs' to my correct folder name. I am now getting results, but the final 'GROUP BY t2.file_name' doesn't appear to be taking, as all the rows for the given 'folder' are being returned.
----------
@ Demaestro -
You query is adding an extra 'int_time' and 'file_name' column to the resultset. It also seems to be missing some rows entirely. I'm looking into why certain rows are not being returned.
SELECT file_name, max_time,
(SELECT id FROM tbl WHERE tbl.file_name=maxtimes.file_name AND tbl.int_time=maxtimes.max_time ORDER BY int_time DESC LIMIT 1) AS max_id,
(SELECT date_uploaded FROM tbl WHERE tbl.file_name=maxtimes.file_name AND tbl.int_time=maxtimes.max_time ORDER BY int_time DESC LIMIT 1) AS max_date
FROM
(SELECT file_name, MAX(int_time) as max_time FROM tbl GROUP BY file_name) AS maxtimes
Lets hope that was the only bug! :-)