Forum Moderators: coopster
I need to select a few of these columns and do some statistical analysis on them like this:
codename¦¦name¦¦owner¦¦figure1¦¦figure2¦¦limiter¦¦identifier
In my table, each row is an event involving two parties. one party is irrelevant, and the other party is codename+owner (both of which are varchar and may appear far more than once in the table. The identifier is actually just a string containing the codename and the owner I use for working with the data in excel.
What I need to do is SELECT COUNT for every row in which the limiter (a float) is greater than 0, and it needs to be grouped by identifier (which has the same effect as GROUP BY codename, owner) and I need the AVG of all the values in figure1 and figure2 combined for each.. but averaged by the total number of values, not rows (which is really the COUNT *2 for each grouped set, because if the limiter is > 0 every row will have a value for fig1 and fig2) Then I want to print the top 20 averages onto a page like this ordered by avg DESC:
codename name¦¦owner¦¦COUNT¦¦AVG
Right now I do this by
SELECT codename, name, owner, COUNT(*)
WHERE limiter > '0'
GROUP BY codename, owner //identifier would work here too
then I loop through all of these w/ PHP applying other simple queries to each to obtain the average figure, and dump the end result into excel, re-sort, delete unwanted and dump that into another table and SELECT from there to display on web page. Is there a select that would do all that better? The number of rows returned by my query above is about 1700-1800 usually. I want the top 20 Averages
My apologies if this is too long or inappropriate. Brain is fried.
I want to save myself the time of having to compile the averages manually and display from a compiled table.
I have a testing server set up and am looking at ways of splitting up my data to achieve this, but its a difficult problem. Most of the readily available advice out there deals with operational databases and there's not much I've found about OLAP.
I guess a similar situation would be if you had to calculate batting averages from a table of many, many at bats - each row being one at bat.
SELECT codename, name, owner,
AVG(fig1 + fig2) AS daavg,
COUNT(*) AS dacount
FROM table
WHERE limiter > '0'
GROUP BY identifier
HAVING COUNT(*) >=5
ORDER BY whatever
I'd like to limit the results to HAVING COUNT >=5 AND HAVING AVG(fig1 + fig2) > 'whatever' - but it doesn't seem to work that way