homepage Welcome to WebmasterWorld Guest from 54.235.39.132
register, free tools, login, search, pro membership, help, library, announcements, recent posts, open posts,
Become a Pro Member
Home / Forums Index / Code, Content, and Presentation / PHP Server Side Scripting
Forum Library, Charter, Moderators: coopster & jatar k

PHP Server Side Scripting Forum

    
SQL "DISTINCT" and multi-fields
Hmmm... can't quite do what I want
trillianjedi




msg:1259231
 6:02 pm on Jul 16, 2005 (gmt 0)

OK, I've spent a couple of hours trying to find some suitable walk-throughs on the net to do what I need to do and failed (I'm thinking "books are good, time for a trip to the library - the internet is overrated" ;-)).

I want to do this:-

SELECT DISTINCT topic_id,poster_id,post_time FROM phpbb_posts WHERE forum_id IN (<some set>) ORDER BY post_time DESC limit 15;

To get the last 15 posts on a phpBB forum.

The problem is the DISTINCT statement works across all fields - so they'll probably all be unique anyway. I need the "uniqueness" (have to remember that one for next weeks word game) to be operative on the topic_id field only (first field).

An example of the problem is the query will return:-


1536 7813 1121535963 18
2048 3480 1121534535 1
2110 3932 1121530350 4
2110 761 1121518426 4
2048 9663 1121515337 1

Spot the two 2110's in the first column? That's where I need it to be unique and ignore the duplicate.

I'm guessing that "DISTINCT" is not the thing to use, but I can't find anything else.

Any help appreciated (and I suspect I'll be back later for help on the INNER JOIN when I've got this part working to get the username!).

Thanks,

TJ

 

dreamcatcher




msg:1259232
 6:17 pm on Jul 16, 2005 (gmt 0)

You can do:

SELECT DISTINCT(topic_id),poster_id,post_time FROM phpbb_posts WHERE forum_id IN (<some set>) GROUP BY topic_id DESC limit 15;

or maybe:

SELECT DISTINCT(topic_id) as dis_field FROM phpbb_posts WHERE forum_id IN (<some set>) ORDER BY dis_field DESC limit 15;

I think both of those might work.

[edited by: jatar_k at 6:46 pm (utc) on July 16, 2005]
[edit reason] turned off smilies [/edit]

trillianjedi




msg:1259233
 6:36 pm on Jul 16, 2005 (gmt 0)

Hi Dreamcatcher - thanks, but unfortunately the first one doesn't work (I also tried that as a guess) and the second one doesn't give me the other fields, which I need....

TJ

ergophobe




msg:1259234
 8:37 pm on Jul 16, 2005 (gmt 0)

Why doesn't the first one work? I tried a similar query on a hotel booking DB and it worked as desired

select distinct (property_id), checkin from bookings
WHERE property_id IN('18', '22', '11', '8', '15', '16', '1', '2', '3')
GROUP BY property_id
order by checkin DESC
limit 5;

It gives me the last five properties in the set to have checkin activity. Isn't that the equivalent of what you want?

trillianjedi




msg:1259235
 11:33 pm on Jul 16, 2005 (gmt 0)
OK - I wonder if I'm doing something wrong here?

Here's the exact query I'm using based on using brackets to select the "unique" field:-

[code]
SELECT DISTINCT (`topic_id`), `poster_id` , `post_time` , `forum_id`
FROM `phpbb_posts`
WHERE `forum_id` IN ("1", "3", "4", "7", "9", "10", "15", "17", "18", "21", "24")
ORDER BY `post_time` DESC
LIMIT 15
[/code]

Which produces:-

[code]
2048 3480 1121534535 1
2110 3932 1121530350 4
2110 761 1121518426 4
2048 9663 1121515337 1
[/code]

I cut this short for the example, but the two 2110's in that first field show that the DISTINCT operand doesn't seem to be working.

Or am I missing something....?

Thanks!

TJ

ergophobe




msg:1259236
 2:27 am on Jul 17, 2005 (gmt 0)

You left out the GROUP BY. Without that, you will get multiple results for a given topic_id b/c the row will not be distinct. In fact, the DISTINCT is not necessary, just the GROUP BY.

If things get way too complicated and it really can't be handled with GROUP BY and DISTINCT, I've selected just the col I want to be distinct into a temporary table so it holds only the ids I want, then just do a join on the existing table. You shouldn't have to do that here though.

ergophobe




msg:1259237
 2:29 am on Jul 17, 2005 (gmt 0)

In fact, just had a look at the manual [sunsite.mff.cuni.cz] it says much the same:


DISTINCT is converted to a GROUP BY on all columns, DISTINCT combined with ORDER BY will in many cases also need a temporary table.

trillianjedi




msg:1259238
 8:46 am on Jul 17, 2005 (gmt 0)

OK - thanks - starting to make sense to me now at least(!) but it still doesn't work (in the sense that I can't get the data I want out like that).

The query built using the GROUP BY seems to have the added side effect of sorting the resulting set into descending order by the column grouped, so:-

SELECT `topic_id` , `poster_id` , `post_time` , `forum_id`
FROM `phpbb_posts`
WHERE `forum_id`
IN ("1", "3", "4", "7", "9", "10", "15", "17", "18", "21", "24")
GROUP BY `topic_id`
ORDER BY `post_time` DESC
LIMIT 8

Produces:-


2107 9098 1121455252 24 <--- early datetimestamp
2105 10002 1121433743 10
2103 8174 1121431672 1
2102 6487 1121428825 21
2100 6052 1121423849 10
2098 9098 1121390368 24
2097 5707 1121370628 18

The left column (the grouped "topic_id") is in descending order.

If I remove the GROUP BY:-


1536 7813 1121535963 18 <--- this is one that should be on top
2048 3480 1121534535 1
2110 3932 1121530350 4
2110 761 1121518426 4
2048 9663 1121515337 1
1931 9631 1121511986 1
2048 9771 1121505161 1
2048 9771 1121503697 1

The above is more accurate (1536 is the most recently posted to thread).

Does the GROUP BY get effected before my SORT BY desc date? Is there any way that I can reverse that - so essentially I get the above table, but then grouped which would remove the duplicate instance of 2048 and 2110.

Many thanks for the help - sorry, SQL is not my area of expertise at all - I only get my feet wet with it when I have to!

Thanks,

TJ

grandpa




msg:1259239
 10:19 am on Jul 17, 2005 (gmt 0)

I have not used UNION and so this construct might need some tweaking, but it looks promising.

Notes:
If you don't use the keyword ALL for the UNION, all returned rows are unique, as if you had done a DISTINCT for the total result set.
If you want to use an ORDER BY or LIMIT clause to sort or limit the entire UNION result, parenthesize the individual SELECT statements and place the ORDER BY or LIMIT after the last one.

(SELECT topic_id FROM phpbb_posts WHERE forum_id IN ("1", "3", "4", "7", "9", "10", "15", "17", "18", "21", "24")) UNION (SELECT poster_id, post_time, forum_id FROM phpbb_posts WHERE forum_id IN ("1", "3", "4", "7", "9", "10", "15", "17", "18", "21", "24"))
ORDER BY post_time DESC
LIMIT 8

trillianjedi




msg:1259240
 11:17 am on Jul 17, 2005 (gmt 0)

Hi Grandpa,

Thanks - it did look promising, but it seems with UNION both select statements require the same number of fields:-

ERROR MySQL said:

#1222 - The used SELECT statements have a different number of columns

Tricky isn't it...

TJ

grandpa




msg:1259241
 10:46 pm on Jul 17, 2005 (gmt 0)

tj, you might want to look at the first user comment on the MySQL UNION documentation [dev.mysql.com]. By utilizing the parenthesis with a single SELECT statement you are effectively creating a UNION?

This is the reason I left a good job after they decided to migrate to *nix

ergophobe




msg:1259242
 2:50 am on Jul 18, 2005 (gmt 0)

Well, if you just can't get it to work, you can do the following.

create a temp table with a single integer column

do a select insert into the temp table where you only select the topic_id that you want

INSERT INTO tempTopicTable(topic_id) SELECT DISTINCT topic_id FROM phpbb_posts WHERE forum_id IN (<some set>) ORDER BY post_time DESC limit 15.

Now you can get whatever you want.

SELECT p.topic_id, p.poster_id,p.whatever_field
FROM phpbb_posts p, tempTopicTable t
WHERE t.topic_id = p.topic_id;

Then destroy the temp table.

Yes, it's four queries, but you are only retrieving one result set

trillianjedi




msg:1259243
 7:37 am on Jul 18, 2005 (gmt 0)

OK thanks guys.

For anyone following this, here's the query so far (now gets username etc):-


//Private/Members only forums are hidden
$visibleforums = "(\"1\", \"3\", \"4\", \"7\", \"9\", \"10\", \"15\", \"17\", \"18\", \"21\", \"24\")";

//Maximum number of posts to show
$maxposts = "15";

//main query
$sql_query = "SELECT phpbb_posts.topic_id , phpbb_posts.poster_id , phpbb_posts.post_time , phpbb_posts.forum_id , phpbb_users.username , phpbb_topics.topic_title
FROM phpbb_posts
INNER JOIN phpbb_users ON phpbb_posts.poster_id = phpbb_users.user_id
INNER JOIN phpbb_topics ON phpbb_topics.topic_id = phpbb_posts.topic_id
WHERE (
phpbb_posts.forum_id
IN ".$visibleforums."
)
ORDER BY phpbb_posts.post_time DESC

LIMIT ".$maxposts;

Yes, it's four queries, but you are only retrieving one result set

By utilizing the parenthesis with a single SELECT statement you are effectively creating a UNION?

I really wanted to keep this simple if possible - it's displayed on the homepage, so tight and small is always good.

There is already a free open-source module which does this for pnpbb2 (called MultiBlock I think) but it's a thousand lines of unnecessary code. I prefer to keep it custom and only doing exactly what I want of it.

The above code works fine, it just displays more than one post to a single thread as two entries.

I'm starting to think laterally. If I want say, the last 15 posts, I grab the last 30 from the query but only echo out the uniques (so I deal with it in my loop). Any thoughts to that, or would it be less efficient that doing the four queries?

Thanks,

TJ

grandpa




msg:1259244
 7:46 am on Jul 18, 2005 (gmt 0)

I'm starting to think laterally. If I want say, the last 15 posts, I grab the last 30 from the query but only echo out the uniques (so I deal with it in my loop).

I was tempted to suggest that option yesterday, but I really wanted to see if there was a solution as a single query. I don't think you need to worry about efficiency with such a small recordset. I would maintain a filter and a counter in the while loop - for preventing duplicates and displaying only 15 results.

trillianjedi




msg:1259245
 9:34 am on Jul 18, 2005 (gmt 0)

Thanks Grandpa - I think I'll give that a try.

I'm a bit unsure about how I maintain my list of "already used topic ID's" though? I guess I would use an array of 15 strings and check each time to see if one has already been used?

Is there a more elegant way?

Here's the basic code that follows the query executed above:-


for (;!$result->EOF; $result->MoveNext() )
{

//Let's keep link text to < 21 chars
if (strlen($result->fields[5]) > 18)
{
$subject = substr($result->fields[5], 0, 15)."...";
}
else
{
$subject = $result->fields[5];
}

echo " <li><a href=\"h*ttp://www.mydomain.com/forum/topic/".$result->fields[0].".html\" title=\"".$result->fields[5]."\">".$subject."</a>";
echo "<br />By ".$result->fields[4]."<br />".date("m/d/y G:i", $result->fields[2])."</li>\n";
}

So I guess what I need is something in there to look at $result->fields[0], and, if it's already been used, skip a record and try the next one.

Like I say though, do I have to do this with an array, and then loop the array every iteration of the above loop?

TJ

grandpa




msg:1259246
 10:08 am on Jul 18, 2005 (gmt 0)

I think this will give you the results you want. Duplicates will need to be in sequential order. This is easier than using an array, but you can do that too if you want.

for (;!$result->EOF; $result->MoveNext() )
{

if ($result->fields[5] <> $save_result) {
$save_result = $result->fields[5];
if (strlen($result->fields[5]) > 18)
{
$subject = substr($result->fields[5], 0, 15)."...";
}
else
{
$subject = $result->fields[5];
}

echo " <li><a href=\"h*ttp://www.mydomain.com/forum/topic/".$result->fields[0].".html\" title=\"".$result->fields[5]."\">".$subject."</a>";
echo "<br />By ".$result->fields[4]."<br />".date("m/d/y G:i", $result->fields[2])."</li>\n";
}
}

trillianjedi




msg:1259247
 10:32 am on Jul 18, 2005 (gmt 0)

Thanks - that makes sense to me. Nice and simple.

They are not in sequential order due to the order by date that I need to make sure that I have the most recent posts.

Perhaps the ultimate solution for me would be to see if we can get this query to first order by date, then order by topic ID so that your duplicate detection code inside the loop will do the rest?

Many thanks for the help,

TJ

Global Options:
 top home search open messages active posts  
 

Home / Forums Index / Code, Content, and Presentation / PHP Server Side Scripting
rss feed

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
Home ¦ Free Tools ¦ Terms of Service ¦ Privacy Policy ¦ Report Problem ¦ About ¦ Library ¦ Newsletter
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved