homepage Welcome to WebmasterWorld Guest from
register, free tools, login, search, pro membership, help, library, announcements, recent posts, open posts,
Become a Pro Member

Home / Forums Index / Code, Content, and Presentation / Perl Server Side CGI Scripting
Forum Library, Charter, Moderators: coopster & jatar k & phranque

Perl Server Side CGI Scripting Forum

This 62 message thread spans 3 pages: < < 62 ( 1 [2] 3 > >     
Writing an Efficient Query
Help with an issue with SQL query

 10:56 am on May 29, 2003 (gmt 0)

Hi all,

I have to select rows from a table where one of the fields is an integer. I only want records that have '1,3,4,6 and 17' as this number. Not the others.

What is the best way to write the query? Here are a couple of ways I've thought of:

SELECT * FROM tbl WHERE colum IN $array_of_acceptable_integers;

SELECT * FROM tbl WHERE column = $val1 OR $val2 etc....

As you can see, once we get past the very basics I'm none to bright so anyone has any 'authorative tutorials' on writhing these kind of things would also be welcome.





 3:29 pm on Jun 3, 2003 (gmt 0)


How about this one.

Table 1 layout:

productid INT 10
buyerid INT 10
sellerid INT 10

Table 2 layout:

userid INT 10
username varchar 50

Now table 2 has all the usernames for ALL userids. So if you want to get out of table 1 the product name you do a left join to a product table (not shown), but if you want to get the usernames out of buyerid AND sellerid by joining both buyerid and sellerid to Table 2, how do you do it?

(I'm working on this as we speak so if I find an answer, I may pipe in).


 3:33 pm on Jun 3, 2003 (gmt 0)

I could be off, but I've used a LIKE function with success. Example:

<cfquery name="LoanDisp" datasource=#MM_Brokers_DSN# username=#MM_Brokers_USERNAME# password=#MM_Brokers_PASSWORD#>
SELECT ID, TimeDate, LastName, City, State, LoanType, Status
FROM PreQualData
WHERE Brokers LIKE '#Session.BrokerID#' AND (DLDate < #future# OR DLDate IS Null) OR Lender LIKE '#Session.BrokerID#'


 3:39 pm on Jun 3, 2003 (gmt 0)

Table 1 layout:

productid INT 10
buyerid INT 10
sellerid INT 10

Table 2 layout:

userid INT 10
username varchar 50

Now table 2 has all the usernames for ALL userids. So if you want to get out of table 1 the product name you do a left join to a product table (not shown), but if you want to get the usernames out of buyerid AND sellerid by joining both buyerid and sellerid to Table 2, how do you do it?

Ummm.. what type of data does buyerid and sellerid contain?

the userid?


 3:49 pm on Jun 3, 2003 (gmt 0)

Select table2.username
From table2
Left Join table1 on table2.userid = table1.buyerid
Left Join table1 on table2.userid = table1.sellerid

I believe this will get you what you're after.


 3:59 pm on Jun 3, 2003 (gmt 0)

coloryan - Using LIKE in an SQL will often perform a table scan and not utilize the index. I'd stay away from using LIKE if an alternative if available. See the mySQL manual [mysql.com] for details.

clark - I use this great guide on SQL joins [w3schools.com] all the time. Much easier to comprehend than the mySQL join manual [mysql.com]. Even after being sent to training, database concepts are still difficult to comprehend.

Anyway, It sounds like you want to pull the matching records from two fields in table 1 (buyerid, sellerid) joining on one field in table 2 (userid). I would try something like this:

select * from table1, table2
where table1.buyerid = table2.userid
or table1.sellerid = table2.userid


 4:20 pm on Jun 3, 2003 (gmt 0)

Did I say thanks for the great discussion?

Well, thanks!

I have a question if I may: Lets say I have a table that contains products. One of the fields is an auto int id, one is an int that corresponds to the 'section' the product is in and one is a smallint that corresponds to a price rang. Like 1 => <$100 2 => $101- $200 etc....

Sometimes people only want products in a section, sometimes they want prd's in a section in a price range and sometimes they want prd's in any section in a price range.

So, is this a good candidate for a composite key of section/range? And if section calls are much more frequent, is there any disadvantage?

Many thanks!



 5:24 pm on Jun 3, 2003 (gmt 0)

Hi Nick_W, yes I think so. If I understand you correctly, a composite key is an index composed of more than one column. I see two beneficial indexes in your case.

products in a section, sometimes they want prd's in a section in a price range
composite key on (section, price_range)
prd's in any section in a price range.
key on (price_range)

On a broader note, I find it important to have open communication between your code developer and database administrator. Its easy if you are both. :) When new features or code changes occur, the database may need adjustment for efficient queries. I know one of eCommerce wholesaler that reduced their home page load time from an unbearable 10 seconds to mere seconds after they adjusted the SQL code and database. Personally, I would have tested both before unleashing it to the public, but its not my site...


 5:57 pm on Jun 3, 2003 (gmt 0)

Great, thanks!

That's what threw me, it hadn't occured to me that I could have a composite and a standalone key (of one of the composite columns) at the same time.

No reason whatsoever why not of course, this info on composites has me thrown though so my thinking is a little askew. Well, that's my story anyway ;)




 8:04 pm on Jun 3, 2003 (gmt 0)

Thanks for so many responses guys. While you were all typing I figured out a way to do it. I'm running out but will check if the responses here are a better way to do it. I did something like adding on the select line
t2.username as buyerusername,t2.username as sellerusername

ON t1.buyerid=buyerusername.userid
ON t1.sellerid=sellerusername.userid

I wrote this quick so I may have some of the fields wrong, but that's the essence of it.


 11:44 pm on Jun 3, 2003 (gmt 0)

[mysql.com ]

You can refer to the same table more than once by using table aliases. A common example would be when you have a staff table that contains both staff and manager's details. If you wanted a list of staff, with their corresponding manager's details as well, you would refer to the same table twice.
SELECT s1.StaffId, s1.StaffName, s1.Position, s1.ManagerId, s2.StaffName AS ManagerName, s2.Position AS ManagerPosition
FROM staff AS s1 INNER JOIN staff AS s2 ON(s1.ManagerId=s2.StaffId)

This can work multiple times, referring back to the same table. You can also join with other tables as normal at the same time, although you need to specify which table alias (and thus which rows, eg. the manager's or the staff's) you would like to join on.

You can find the quote at the above link, search the page for "twice."



 9:39 am on Jun 4, 2003 (gmt 0)

excellent thread, thanks all.

One note to the poster of the original question: since you want to get a list of all the threads in your forum that are active, instead of a SELECT (...) WHERE COLUMN IN (3,4,17,28), why don't you just add an extra field to this table called "active"? So then you have SELECT (...) WHERE active = 1 . A thread's being "active" is a bit of information that is apparently useful enough for you to want to choose threads on this criterium, why not then actually store it then in the database? When new threads become active and old threads become inactive, you can either change their status via your admin program, instead of having to rewrite your sql query every time. Or you might set up a function that's executed every once in a while to automatically determine which threads are active / inactive, and then update this field for you.


 10:29 am on Jun 4, 2003 (gmt 0)


Sql automatically convert

SELECT Something FROM T1
WHERE SomethingElse IN ('1', '2', '3')


SELECT Something FROM T1
WHERE SomethingElse = '1' OR SomethingElse = '2' OR SomethingElse = '3'

so second choice is the best.

Also try to use covered index in your query:

If you want to query Member name from a table of members do the next:

SELECT Member FROM T1 WHERE Member = 'Nick_W'

Doing so you have ALL information you need in the index (CLUSTERED is better).

According to composite index also the next are covered:

SELECT Member, EMail FROM T1 WHERE Member = 'Nick_W'
(Assume you have composite index on Member/Email)

For long varchar fields you can use index on computed columns(but this is feature of sql server).



 9:09 pm on Jun 4, 2003 (gmt 0)

just to respond to martekbiz comments regarding my statement of using PHP:

>>>Use PHP to iterate through the array

Has t the the worst advice I've ever read.

Why would you want to use resources to cycle through on record after another until you find the results oyu are looking for?

Use the DB! That's what it is there for. It'll be much quicker and less server resources/OH used.

I'm sorry martekbiz but that's just a complete load of old rubbish, and an argument I see floated about quite frequently, and grossly inaccurate.

You need to understand the bigger picture, and that is the unit of execution - the thread.

When you execute a MySQL query, thread 'A' will attempt a read lock on the table in question (there are some caveats to this, but rtfm for more). Thread 'B' wants to run a query and attempts to get a read lock on the table as well. Only trouble is because thread 'A' is deciding to query for: "select * FROM world AND cat", thread 'B' can't get a lock, so what does it do? The thread SLEEPS and takes your query with it!

In most applications the thread is dropped by the processor, and placed (usually) on the back of the processor run queue, awaiting it's turn again. However in MySQL's case, the thread is placed on the "Read Lock Queue".

Thread 'B' sleeps here until it is woken by MySQL as it is now (potentially) able to aquire the "Read Lock", then the thread is usually placed back onto the processors run queue to await it's turn at execution.

The number one rule of any application that utilizes threading is simple:

Hold exclusive locks or semaphores on data for the minimum time possible.

Therefore in heavily used senarios, it makes far more sense to get more data than neccesary, and "get out of the database" so you release your lock, then manipulate in PHP where the PHP thread (or more usually Apache) will be able to manipulate data in its own memory space with far less risk of lock contention.



 9:29 pm on Jun 4, 2003 (gmt 0)


1,000,000 records that you need to iterate through and find matches (or whatever the criteria is).

I will GLADY throw the execution of using the DB to do the work as opposed to PHP any day of the week and twice on Sunday's.

PHP is a powerful language. We develop purely in PHP here and I most certinly would not have it do something that a DB could easily do in half the time and with less strain on the server.



 9:43 pm on Jun 4, 2003 (gmt 0)

fine, now do it with 20 or 30 clients simultaniously and see which one grinds to a halt first.

PHP is a powerful language.

Certainly is, but it's not about the power, it's about who's doing the work, the thread that has a lock on your database stopping other threads executing or the one in Apache's memory space that can do so while other queries are running.

Using your senario it would make far more sense to query the records with "< value" and then do the rest of the work out of the db, rather than "== value and == value and == value" etc.

It's actually about finding a happy medium anyway, the one that gets the most from MySQL, with the least work and most importantly time.

BTW. Loads of PHP functions are coded in 'C' not PHP anyway.


[edited by: aspr1n at 9:48 pm (utc) on June 4, 2003]


 9:46 pm on Jun 4, 2003 (gmt 0)

>>Loads of PHP functions are coded in 'C' not PHP anyway.

Indeed they are.


 1:43 pm on Jun 9, 2003 (gmt 0)

RE the original post:

I don'T see the point of gettign the values, bludgeoning htem into a bit of SQL string and sending them out again, why not have a simple multi join return the whole thing?

furthermore, to add to the compound index debate, please note taht in a atraditionalk many-to-many relationship it's often very usefull to have the compound index and it'S reverse at the same time, like so:

TABLE (messageID,topicID)
INDEX topicToMessage(topicID,messageID)
INDEX messageToTopic(messageID,topicID)

This way you can quickly get all the messages in one topic and all te topics a message belongs to (asuming it'S a true many-to-many rel, ok this is probably a bad example)

Note that in this case your index will be twice as big as your database data. But of course it's usually just to integers per row, so in these cases it might be acceptable. I use this extensively to link keywords from the stemmed table to the documents, get all documents for a keyword, get all keywordsw for a document.

I can see the need for an IN table ONLY where the elements are either static, or come form userinput. if they are selected form another table by some criteria, then these tables should be linked directly in a single query.

BTW: once you dig deeper into database engine design, you will realise it's ALWAYS better to do as much as possible in the join, rather then do "manual" joins in the host app. Read contention is not a big problem, as MySQL does NOT lock a table exclusively for reading, it's only a problem if you mix read and writes a lot which is hardly ever the case. In this case MySQL is happily multi threaded, and I'd bet my left thumb it'll out-optimize your manual PHP join any second of the day.

further note that doing an index on column1 of you have a compound on (column1,column2,column3) is not needed at all, but an index on column2 may be.

In fact in the case of a log database where you may want to extract different info from a large dataset, you will see you can use indexes on (col1,col2,col4) and (col1,col3,col4) seperately. In fact I ahve several multi gig tables where the index file is larger then the datafile by almost 50%. this makes sense cos the table is built once in a long operation, but hten needs to spit out detailed reports often. and even updateing such a table is not a major job, jsut not somithng you wanna do sevral tiems per second of course.



 7:52 pm on Jun 9, 2003 (gmt 0)

daisho - thanks for pointing out that dkubb mentioned compund indexes - missed that.

dkubb - I hesitated mentioning the quirky silent changes as well, but I thought it was important in mentioning. I did forget about the fact (as you pointed out) that if there are NO variable length column types then char will stay char. And I have to agree - section 5.x is a GREAT section!

One more optimizing thing to throw into the mix - myisampack, I have just started messing with it (no real need until recently). If you have a table that you only need to select from (no inserts, updates, or deletes) myisampack can be helpful (according to the docs). It would be useful in a datawharehouse environment. Anyone have more experience with it?



 11:05 pm on Jun 9, 2003 (gmt 0)

re: the silent type changes.

if you have something liek a log table, where you have mostly short types, like ids dates and short strigns and only one or two unbounded strigns (TEXT) then it might be worth it to split it into two tables, with all teh short columns in one and hte variable ones in the other linked by an id. because it's then usual to do all the selectign on the short bits of data and only fetching the long TEXT fields on demand, and hte linking will be fast thangs to a one-on-one on the primary key.

Another note for log tables. On a decently sized website, it quickly reaches millions of rows and gigs of data. I've foudn that by splitign user agents and path info into seperate tales I could cut out HUGE chunks, sicne user_agents have only a few 1000 different values, not a million or more, thame with the URLS on your site once you exclude the path_info.



 5:51 am on Jun 11, 2003 (gmt 0)

I'm working on some data now where I'm INSERTING INTO table data F1,F2,F3 SELECT BLAH
from a long select, with a GROUP BY function and HAVING function. In order to get at the data the way I want to I also have a COUNT(DISTINCT field1,f2,f3). Now there has to be one field in the INSERT statement for every field in the SELECT statement. So essentially the COUNT(DISTINCT F1,F2,...) field is JUNK.

For now I created a JUNK field, but I'm wondering if there's a way in the INSERT table to have it point to a NULL field?

PS I just added another COUNT(DISTINCT) for another reason and have 2 Junk fields now. I tried to point both to JUNK but mysql wouldn't let me. Ideas appreciated..


 8:28 am on Jun 11, 2003 (gmt 0)

If you don't need to count value in the target DB, simply don'T include it in the query. What other reason, besides getting the count data into the target table is there for you to include the count statements in the select?



 9:20 am on Jun 11, 2003 (gmt 0)

I'm using the result of

"COUNT(DISTINCT fields) as constrainit"

to constrain my results with

"HAVING constrainit=1"


 10:45 am on Jun 11, 2003 (gmt 0)

Then you're using it in hte wrong way, only put in the SELECT columns what you need as output data.

You should write
FROM tableX
WHERE whereClause
HAVING COUNT(distinct w)



 4:35 pm on Jun 11, 2003 (gmt 0)

I was getting errors with that. OK here's a full STATEMENT and the error I get:
REPLACE INTO bad(id,isallowed)

SELECT id, isallowed
FROM full
GROUP BY id,isallowed
HAVING isallowed = 1 AND COUNT(DISTINCT tape, magazine, book, movie) > 1

Even without the REPLACE INTO statement just doing the select I get:
Unknown column 'tape' in 'having clause'


 9:25 am on Jun 12, 2003 (gmt 0)

I see your problem. This is actually bny design. you cannot apply group functions to fields that aren't listed or grouped by.

The traditional solution in this case would be a temporary table.

In other SQL servers you'd use nested selects. If you'Re running MySQL 4.0+ than you can try that (I'm using 3.xx)

Hope you'll find a workable solution. I don't think "Junk" fields are a good idea though.



 3:30 pm on Jun 12, 2003 (gmt 0)

I'm running MySQL 4.0.12. Nested selects? Never heard of that, hmm...

I guess a temp table is possible. In fact I'm using it in another instance where I have a lot less data to go through, but using it in this monster table just means I have to go through the data twice. Why is that better than junk fields (other than the fact that it's ugly and I hate seeing the word "junk" in my db heh heh)?


 4:04 pm on Jun 12, 2003 (gmt 0)

Well in a nested select you can internally produce the count as a valid field, use the field as condition, but drop it in the outermost select. In a way it's really just a temp table on the fly though.

How about including the column but ignoring it when you write the data to an output file?



 4:21 pm on Jun 12, 2003 (gmt 0)

I guess I'll have to read up on subselects, sounds very very useful. Hmm, may be able to help me out instead of all these workarounds I've had to do.

On the output file, there is none. It will be displayed on the web and the junk field is basically ignored.


 4:33 pm on Jun 12, 2003 (gmt 0)

Subquerys are very useful, they work because in SQL a result set is the same type as a table or view.

You can use a query result in a FROM clause :
SELECT fields FROM ( SELECT fields FROM tables)

Or a simple view:
SELECT fields FROM myQuery


 9:00 pm on Jun 13, 2003 (gmt 0)

Very very useful.

Here's another question. How to best index this query:

There is a product name and a product number and they are tied together so theoretically productname can be normalized into another table.

The product number is alphabetical because the higher the number the higher end the product. But the name would sort alphabetically. So I'd like to do one query, and get the product name,productid and sort by productid which would give me the product quality at the same time.

I'm using a complicated php function to achieve a few other things at the same time and am concerned that I set it up suboptimally.

I purposely did NOT normalize on this data in order to not have to bother with too many joins for minimal/zero performance gain...

So here's the query:

SELECT DISTINCT productname,productid FROM t1 ORDER BY productid.

So while I'd love to have something like

SELECT productname WHERE DISTINCT productid FROM t1 ORDER BY productid

but naturally this is illegal, so I had to stick in the productname (plus, hiding in a function, I didn't quite realize what the end query looked like until very late into the project).

So if I index the table with 2 columns, "productname,productid", it will wind up being an alphabetical sort, which is nonsensical in this case.

If I index on productid which makes sense, since the query is on DISTINCT productname,productid, I think that the index wouldn't even be used... So how should I best index it without having to rewrite the function and checking on all the php pages that call it?

Hmm, while writing this I'm thinking if I can call the function with SELECT productid,productname then I can index on both of those and avoid this problem...


 11:02 pm on Jun 13, 2003 (gmt 0)

remember that indexes are NOT used for the select part
only for the where and group by and order by parts. So in this case as there are no further wheres or joins (shown) it would be futile to index by product name

An index by productid seems to be most appropriate. If you can do a unique index, even better for the select distinct performance (but to be honest I don't believe distinct is optimized using indices, not that it would maek a big difference in your case)

But of course if you have joins and where clauses it all changes.


This 62 message thread spans 3 pages: < < 62 ( 1 [2] 3 > >
Global Options:
 top home search open messages active posts  

Home / Forums Index / Code, Content, and Presentation / Perl Server Side CGI Scripting
rss feed

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
Home ¦ Free Tools ¦ Terms of Service ¦ Privacy Policy ¦ Report Problem ¦ About ¦ Library ¦ Newsletter
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved