Forum Moderators: open

Message Too Old, No Replies

Need some help with a query

Bit too complex for me to know where to start!

         

trillianjedi

10:42 am on Apr 8, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi guys, wonder if you can help a jedi out with this one.

I'm looking to get a set of numerical data representing the number of times a field appears in a DB table (MySQL), based on the top 10 in terms of quantity.

Example NAME field in the table might look like this:-

Bob
Adam
James
Bob
James
Andrew
Cecil
etc

The data I would need to get back would be:-

Bob - 2
James - 2
Adam - 1
Andrew - 1
Cecil - 1

.... order by most appearing first, then in descending order, limit of 10.

But I don't know what the names are beforehand, so I can't just count the number of times "james" appears, then the number of times "bob" appears etc.

Is it possible to do this in one or two queries?

Thanks!

TJ

coopster

11:17 am on Apr 8, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



COUNT the name column and GROUP BY the name column. I'm not sure which database you are using so you will have to modify the LIMIT clause if it is not MySQL or PostgreSQL:
SELECT 
name,
COUNT(name) AS occurs
FROM names
GROUP BY name
ORDER BY occurs DESC, name
LIMIT 10
;

trillianjedi

1:44 pm on Apr 8, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks Coop, that's fantastic - working really well!

One further addition (if this is possible) - the names are actually user-agents, and each has a version number which sometimes differs, eg:-

Adam 2.1.3
Adam 2.1.4
Adam 2.0.1.5

etc

But basically as far as my script is concerned, I'm only interested in whether or not it's "Adam".

Is it possible to implement a REGEX or "LIKE" into the query to ensure that duplicates of "Adam" with different version numbers are not returned separately, but all rolled into the one result?

Thanks again,

TJ

coopster

4:55 pm on Apr 8, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Yes, you could use a number of options. This is standard SQL, should work in any RDBMS that supports subqueries:
SELECT  
name,
COUNT(name) AS occurs
FROM
(SELECT
CASE
WHEN POSITION(' ' IN name) > 0
THEN SUBSTRING(name FROM 1 FOR (POSITION(' ' IN name)) - 1)
ELSE name
END AS name
FROM names) AS t1
GROUP BY name
ORDER BY occurs DESC, name
LIMIT 10
;

trillianjedi

5:28 pm on Apr 8, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That's just pure genious Coop!

Brilliant. One small issue, I seem to have one name cropping up twice:-

Adam/sgr-3.1.3(LS) 20
Adam/sgr-2.0.12(LS) 15

Any thoughts as to what that might be? I have to confess I don't understand how your query works!

Thanks for all your help again,

TJ

trillianjedi

5:37 pm on Apr 8, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



OK, I think I'm beginning to understand how this query works. It basically looks at the previous record to see if it's the same?

Hmmm... problem is I have one in between:-

Adam/sgr-3.1.3(LS) 20
BillyBob
Adam/sgr-2.0.12(LS) 15

This might prove tricky.... I suppose I could always grab a set consisting of more records than I need, then parse them down?

coopster

6:16 pm on Apr 8, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



No problem, let me explain how it works.

We are going to SELECT the name from your database table but rather than select the entire name we just want the actual *name* part of it, not the numeric string. When you first showed the names it looked as if there was always a space in between the actual *name* and the *user-agent* part of it, the numbers.

Adam 2.1.3 
1234567890

That is what the POSITION function is doing, locating the first space in the name field and telling us which POSITION it found that space in. Subtract one from that value an we know where the end of our name is, get it? Looking at the example above and using the little ruler I put there you can see the first space is in position 5. Subtract 1 and we have 4, which is the letter 'm' in the name 'Adam'. Now it should be pretty clear.

Next, we are using that value to extract just the name portion. SUBSTRING is taking just a piece of the name column and returning it for us. Remember that POSITION - 1 returned 4, so SUBSTRING(name FROM 1 FOR 4) gives us 'Adam'. I used a CASE logic construct just in case you had any user agents that did not have any spaces or numbers after their name, just the plain old name. If we don't do that, we would get a 0 value returned instead of the name because the POSTION would not find any space in the name, returning zero!

So, knowing that you can either add to your WHEN structure or as you said before you could use alternatives. REGEX is an option but not as portable. Sounds like you are using MySQL here, true? If so we could just throw down a REGEX. To adapt the existing query and keep it compliant you could:

SELECT  
name,
COUNT(name) AS occurs
FROM
(SELECT
CASE
WHEN POSITION(' ' IN name) > 0
THEN SUBSTRING(name FROM 1 FOR (POSITION(' ' IN name)) - 1)
WHEN POSITION('/' IN name) > 0
THEN SUBSTRING(name FROM 1 FOR (POSITION('/' IN name)) - 1)
ELSE name
END AS name
FROM names) AS t1
GROUP BY name
ORDER BY occurs DESC, name
LIMIT 10
;

trillianjedi

10:57 pm on Apr 8, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Ah, yes, that makes perfect sense - thanks for the explanation (can probably do the next one on my own now ;)).

Query works great too (yes, MySQL).

Thanks again Coop - you're a star!

TJ

coopster

12:29 am on Apr 9, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Hey, you're welcome TJ. I was going to throw down a REGEX option too but I don't think that is going to work. No way to match the position. A REGEX returns TRUE or FALSE, not what is found where. Now, wouldn't that be slick though? A REGEX function that returns the match as well as the position? hehe, anybody from the RDBMS community listening? :)