Need some help with a query

Forum Moderators: open

Message Too Old, No Replies

Need some help with a query

Bit too complex for me to know where to start!

trillianjedi

10:42 am on Apr 8, 2006 (gmt 0)

Hi guys, wonder if you can help a jedi out with this one.

I'm looking to get a set of numerical data representing the number of times a field appears in a DB table (MySQL), based on the top 10 in terms of quantity.

Example NAME field in the table might look like this:-

Bob
Adam
James
Bob
James
Andrew
Cecil
etc

The data I would need to get back would be:-

Bob - 2
James - 2
Adam - 1
Andrew - 1
Cecil - 1

.... order by most appearing first, then in descending order, limit of 10.

But I don't know what the names are beforehand, so I can't just count the number of times "james" appears, then the number of times "bob" appears etc.

Is it possible to do this in one or two queries?

Thanks!

coopster

11:17 am on Apr 8, 2006 (gmt 0)

COUNT the name column and GROUP BY the name column. I'm not sure which database you are using so you will have to modify the LIMIT clause if it is not MySQL or PostgreSQL:

SELECT 
 name, 
 COUNT(name) AS occurs 
 FROM names 
 GROUP BY name 
 ORDER BY occurs DESC, name 
 LIMIT 10 
;

trillianjedi

1:44 pm on Apr 8, 2006 (gmt 0)

Thanks Coop, that's fantastic - working really well!

One further addition (if this is possible) - the names are actually user-agents, and each has a version number which sometimes differs, eg:-

Adam 2.1.3
Adam 2.1.4
Adam 2.0.1.5

etc

But basically as far as my script is concerned, I'm only interested in whether or not it's "Adam".

Is it possible to implement a REGEX or "LIKE" into the query to ensure that duplicates of "Adam" with different version numbers are not returned separately, but all rolled into the one result?

Thanks again,

coopster

4:55 pm on Apr 8, 2006 (gmt 0)

Yes, you could use a number of options. This is standard SQL, should work in any RDBMS that supports subqueries:

SELECT  
 name, 
 COUNT(name) AS occurs 
 FROM 
  (SELECT 
   CASE 
    WHEN POSITION(' ' IN name) > 0 
    THEN SUBSTRING(name FROM 1 FOR (POSITION(' ' IN name)) - 1) 
    ELSE name 
   END AS name 
   FROM names) AS t1  
 GROUP BY name 
 ORDER BY occurs DESC, name 
 LIMIT 10  
;

trillianjedi

5:28 pm on Apr 8, 2006 (gmt 0)

That's just pure genious Coop!

Brilliant. One small issue, I seem to have one name cropping up twice:-

Adam/sgr-3.1.3(LS) 20
Adam/sgr-2.0.12(LS) 15

Any thoughts as to what that might be? I have to confess I don't understand how your query works!

Thanks for all your help again,

trillianjedi

5:37 pm on Apr 8, 2006 (gmt 0)

OK, I think I'm beginning to understand how this query works. It basically looks at the previous record to see if it's the same?

Hmmm... problem is I have one in between:-

Adam/sgr-3.1.3(LS) 20
BillyBob
Adam/sgr-2.0.12(LS) 15

This might prove tricky.... I suppose I could always grab a set consisting of more records than I need, then parse them down?

coopster

6:16 pm on Apr 8, 2006 (gmt 0)

No problem, let me explain how it works.

We are going to SELECT the name from your database table but rather than select the entire name we just want the actual *name* part of it, not the numeric string. When you first showed the names it looked as if there was always a space in between the actual *name* and the *user-agent* part of it, the numbers.

Adam 2.1.3 
1234567890

That is what the POSITION function is doing, locating the first space in the name field and telling us which POSITION it found that space in. Subtract one from that value an we know where the end of our name is, get it? Looking at the example above and using the little ruler I put there you can see the first space is in position 5. Subtract 1 and we have 4, which is the letter 'm' in the name 'Adam'. Now it should be pretty clear.

Next, we are using that value to extract just the name portion. SUBSTRING is taking just a piece of the name column and returning it for us. Remember that POSITION - 1 returned 4, so SUBSTRING(name FROM 1 FOR 4) gives us 'Adam'. I used a CASE logic construct just in case you had any user agents that did not have any spaces or numbers after their name, just the plain old name. If we don't do that, we would get a 0 value returned instead of the name because the POSTION would not find any space in the name, returning zero!

So, knowing that you can either add to your WHEN structure or as you said before you could use alternatives. REGEX is an option but not as portable. Sounds like you are using MySQL here, true? If so we could just throw down a REGEX. To adapt the existing query and keep it compliant you could:

SELECT  
 name,  
 COUNT(name) AS occurs  
 FROM  
  (SELECT  
   CASE  
    WHEN POSITION(' ' IN name) > 0  
    THEN SUBSTRING(name FROM 1 FOR (POSITION(' ' IN name)) - 1)  
    WHEN POSITION('/' IN name) > 0  
    THEN SUBSTRING(name FROM 1 FOR (POSITION('/' IN name)) - 1)  
    ELSE name  
   END AS name  
   FROM names) AS t1  
 GROUP BY name  
 ORDER BY occurs DESC, name  
 LIMIT 10  
;

trillianjedi

10:57 pm on Apr 8, 2006 (gmt 0)

Ah, yes, that makes perfect sense - thanks for the explanation (can probably do the next one on my own now ;)).

Query works great too (yes, MySQL).

Thanks again Coop - you're a star!

coopster

12:29 am on Apr 9, 2006 (gmt 0)

Hey, you're welcome TJ. I was going to throw down a REGEX option too but I don't think that is going to work. No way to match the position. A REGEX returns TRUE or FALSE, not what is found where. Now, wouldn't that be slick though? A REGEX function that returns the match as well as the position? hehe, anybody from the RDBMS community listening? :)