Forum Moderators: coopster

Message Too Old, No Replies

mysql query question, grouping site_id. limit to 2 results per domain

i can't solve it, can you?

         

ikbenhet1

10:36 am on Feb 13, 2004 (gmt 0)

10+ Year Member



Hi, i've been wondering and searching for this for a couple of months now, any help would be appriciated.
I have this query to retrieve search results from a database.. i can post the one i use later if needed:

select site_id, page_id, count(points) as SScore from search_database group by concat(site_id,page_id) order by SScore DESC

How can i alter this query to give me a maximum of 2 results per site_id, now it returns unlimited results per site_id. It only should return the first 2 page_id 's from the site_id it encounters after grouping and ordering because they are ranking the higest.

Thanks for any help.

coopster

1:03 pm on Feb 13, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Can you clarify something? Are you stating that you would like to see only the top 2 results returned? Or the top 2 results for each and every
site_id
?

Also, what is in your

points
column? Are you sure you don't want to be using
SUM
rather than
COUNT
?

ikbenhet1

1:45 pm on Feb 13, 2004 (gmt 0)

10+ Year Member



yes, i want the top 2 results for each and every site_id.
i'm sorry that's a typo, that should say sum..

points is just the number of backlinks and some points for having the search-phrase in title in headers or in content

coopster

3:20 pm on Feb 13, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



I think you are going to have to loop through your result set, which is probably what you are currently doing.

ikbenhet1

3:27 pm on Feb 13, 2004 (gmt 0)

10+ Year Member



But isn't that slow?
i would need to retrieve 50 or 100 or 200 results in order to display 10. And if the first 50 - 100 -200 result are all from the same site_id then i end up with only 2 results.

Dooing the with a query would be "the bomb"...

Basicly it needs to count the site_id's in the result set and use something like "where count(site_id) < 3"
but i don't know how to count the site_id's in the result set.

Like i said, i am searching for this for a couple of months now and also trying to create a query myself, so i know it is not easy...
Thanks for taking the time to post an answer.

scumm_bar2

4:38 pm on Feb 13, 2004 (gmt 0)

10+ Year Member



I'm sure it would help to see your table structure and have a bit of sample data to play with. Although your last suggestion would narrow the results to those that had only 2 site id occurences, and not limit the amount of results on an individual site id basis.

ikbenhet1

5:46 pm on Feb 13, 2004 (gmt 0)

10+ Year Member



That count is not a ordinary count it's a imaginary count on the site_id in the result set which it has collected so far, i need to figure out how to that.
if i am not using group then i cannot use sum..

No point in giving you all that since i still use url_id , i will make the page_id and site_id after i have a working query for limiting to the top 2 page_id's of every site_id.
But thanks for reading

ikbenhet1

7:28 pm on Feb 13, 2004 (gmt 0)

10+ Year Member



This query is an example of how to do it... right? or does nothing insert until the while query is done?
I'll go make site_id and page_id to test, but love to hear if it will work..
INSERT INTO testx2
SELECT x.id, count( y.id ) as rep
FROM search_database x, testx2 y
GROUP BY x.id having rep<2
thanks.

<nevermind doesn't seem to work>

coopster

7:41 pm on Feb 13, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



>>That count is not a ordinary count it's a imaginary count on the site_id in the result set which it has collected so far, i need to figure out how to that.

You said it all right there. You are trying to push report level-breaking logic back into the result set returned by your query. If it can be done in a single statement, it is beyond my current level of knowledge. It can be done using a temporary table and a couple user variables:


CREATE TEMPORARY TABLE tmp(
site_id SMALLINT UNSIGNED NOT NULL,
page_id SMALLINT UNSIGNED NOT NULL,
points SMALLINT UNSIGNED NOT NULL,
counter SMALLINT UNSIGNED NOT NULL,
prev_site_id SMALLINT UNSIGNED
);
--
--
INSERT INTO tmp
SELECT
site_id,
page_id,
SUM(points) AS SScore,
0,
NULL
FROM search_database
GROUP BY site_id, page_id
ORDER BY site_id, SScore DESC, page_id
;
--
-- SELECT * FROM tmp;
--
SET @prev_site_id = 0;
SET @counter = 0;
--
--
UPDATE tmp
SET
counter = @counter:= IF(@prev_site_id <> site_id, 1, @counter + 1),
prev_site_id = @prev_site_id:= IF(site_id <> @prev_site_id, site_id, @prev_site_id)
;
--
-- SELECT * FROM tmp;
--
SELECT site_id, page_id, points FROM tmp WHERE counter <= 2;
--
--
DROP TABLE tmp;

Uncomment [mysql.com] the SELECT statements if you want to run this from a command line to view the tables during the population process.

ikbenhet1

11:06 pm on Feb 13, 2004 (gmt 0)

10+ Year Member



that seems very nice i can use that if cannot find the right query or give up...which won't take long anymore.

while experimenting i got this error message:
INSERT TABLE 'temp' isn't allowed in FROM table list

i searched and then found this:
Mysql prior ver. 4.0.15 does not support insertion with subselect taken from the same table.

This suggest it is possible after that version.
I have 4.0.17 but i get a error. Maybe mean 4.1 have you got any idea?

coopster

2:15 pm on Feb 14, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



I just ran the code successfully on MySQL 3.23.58 and on 4.1. Are you sure you don't have a syntax error?

[url=http://www.mysql.com/doc/en/INSERT_SELECT.html]INSERT INTO tmp SELECT[/url]...

ikbenhet1

8:49 pm on Feb 17, 2004 (gmt 0)

10+ Year Member



Sorry to bother you again, this will be the last time but do these queries work in MySql 4.1?

create temp
select site_id , count( (select site_id from temp) ) from search_database
or:
create temp
select site_id , (select count(site_id) from temp) from search_database

Thank you.

Currently i am using:
$site_id=$get_info['site_id'];
$$site_id++;
if (${$site_id}<3)
{$id[$nummer]=$get_info['id'];
$nummer++;}

coopster

1:24 pm on Feb 18, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



You lost me on this one. Are you trying to count records in the temporary table as you are inserting into it? You can't do that. You will get an
1137 ERROR -- Can't reopen table: temp
.

That's why you need to set a user-defined variable and increment/reset it yourself as I described in my post earlier -- I couldn't think of any other way to get around it.

ikbenhet1

3:03 pm on Feb 18, 2004 (gmt 0)

10+ Year Member



<removed, spoke to soon>

slade7

3:08 pm on Feb 18, 2004 (gmt 0)

10+ Year Member



Coopster, you are a SQL master.

I had a similar problem once and the solution was a HAVING clause.

ikbenhet1

9:19 pm on Feb 18, 2004 (gmt 0)

10+ Year Member



ok. i did it! This works:

insert into test SELECT a.id, a.site_id, count( c.page_id )
FROM id_super_fast_database a, test c
GROUP BY a.id
HAVING count( c.page_id ) < 3
LIMIT 10

CREATE TABLE `test` (
`ID` int(5) NOT NULL default '0',
`page_id` int(11) NOT NULL default '0',
`int` int(5) NOT NULL default '0'
) TYPE=MyISAM;

the table test already needs to already exist and have al least 1 record in it or else it wont select or insert anything.
tested on mysql 4.0.17 .
Dont ask me how...

coopster

11:09 pm on Feb 18, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



But I thought the original intent was to get the top two rankings for each and every site? I don't see how you are going to do that with this query yet...?