Simple MySQL select query very slow
| 7:22 pm on Sep 13, 2010 (gmt 0)|
First time posting on here...
Okay, I was hoping to get some feedback on my specific MySQL query and, ultimately, how to speed it up. The query in question currently takes about 8 seconds on average to execute.
The table in question has only 702,750 records. There are no indexes at the moment (which I am hoping someone can help me with), and the server has the following configuration (variable dump):
auto_increment_increment = 1
auto_increment_offset = 1
automatic_sp_privileges = ON
binlog_cache_size = 32,768
bulk_insert_buffer_size = 8,388,608
character_set_client = utf8
character_set_connection = utf8
character_set_database = utf8
character_set_filesystem = binary
character_set_results = utf8
character_set_server = utf8
character_set_system = utf8
character_sets_dir = /usr/share/mysql/charsets/
datetime_format = %Y-%m-%d %H:%i:%s
default_week_format = 0
delay_key_write = ON
delayed_insert_limit = 100
delayed_insert_timeout = 40
delayed_queue_size = 1,000
div_precision_increment = 4
keep_files_on_create = OFF
engine_condition_pushdown = OFF
expire_logs_days = 0
ft_boolean_syntax = + -><()~*:""&|
ft_max_word_len = 84
ft_min_word_len = 4
ft_query_expansion_limit = 20
ft_stopword_file = (built-in)
group_concat_max_len = 1,024
have_blackhole_engine = YES
have_community_features = NO
have_dynamic_loading = YES
have_example_engine = YES
have_federated_engine = YES
have_merge_engine = YES
have_query_cache = YES
have_rtree_keys = YES
innodb_additional_mem_pool_size = 1,048,576
innodb_autoextend_increment = 8
innodb_buffer_pool_awe_mem_mb = 0
innodb_buffer_pool_size = 8,388,608
innodb_commit_concurrency = 0
innodb_concurrency_tickets = 500
innodb_data_file_path = ibdata1:10M:autoextend
innodb_data_home_dir =
innodb_adaptive_hash_index = ON
innodb_fast_shutdown = 1
innodb_file_io_threads = 4
innodb_file_per_table = OFF
innodb_flush_log_at_trx_commit = 1
innodb_flush_method =
innodb_force_recovery = 0
innodb_lock_wait_timeout = 50
innodb_locks_unsafe_for_binlog = OFF
innodb_log_arch_dir =
innodb_log_archive = OFF
innodb_log_buffer_size = 1,048,576
innodb_log_file_size = 5,242,880
innodb_log_files_in_group = 2
innodb_log_group_home_dir = ./
innodb_max_dirty_pages_pct = 90
innodb_max_purge_lag = 0
innodb_mirrored_log_groups = 1
innodb_open_files = 300
innodb_rollback_on_timeout = OFF
innodb_support_xa = ON
innodb_sync_spin_loops = 20
innodb_table_locks = ON
innodb_thread_concurrency = 8
innodb_thread_sleep_delay = 10,000
innodb_use_legacy_cardinality_algorithm = ON
join_buffer_size = 1,048,576
key_buffer_size = 104,857,600
key_cache_age_threshold = 300
key_cache_block_size = 1,024
key_cache_division_limit = 100
large_files_support = ON
large_page_size = 0
lc_time_names = en_US
locked_in_memory = OFF
log_bin_trust_function_creators = OFF
log_queries_not_using_indexes = OFF
log_slave_updates = OFF
log_slow_queries = OFF
long_query_time = 10
low_priority_updates = OFF
lower_case_file_system = OFF
lower_case_table_names = 0
max_allowed_packet = 419,430,400
max_binlog_cache_size = 18446744073709547520
max_binlog_size = 1,073,741,824
max_connect_errors = 20
max_delayed_threads = 20
max_error_count = 64
max_heap_table_size = 33,554,432
max_insert_delayed_threads = 20
max_join_size = 18446744073709551615
max_length_for_sort_data = 1,024
max_prepared_stmt_count = 16,382
max_relay_log_size = 0
max_seeks_for_key = 18446744073709551615
max_sort_length = 1,024
max_sp_recursion_depth = 0
max_tmp_tables = 32
max_user_connections = 0
max_write_lock_count = 18446744073709551615
multi_range_count = 256
myisam_data_pointer_size = 6
myisam_max_sort_file_size = 9223372036853727232
myisam_mmap_size = 18446744073709551615
myisam_recover_options = OFF
myisam_repair_threads = 1
myisam_sort_buffer_size = 16,777,216
myisam_stats_method = nulls_unequal
net_buffer_length = 16,384
net_read_timeout = 30
net_retry_count = 10
net_write_timeout = 60
open_files_limit = 2,158
optimizer_prune_level = 1
optimizer_search_depth = 62
pid_file = /var/lib/mysql/***************.pid (website name omitted)
preload_buffer_size = 32,768
query_alloc_block_size = 8,192
query_cache_limit = 104,857,600
query_cache_min_res_unit = 4,096
query_cache_size = 104,857,600
query_cache_type = ON
query_cache_wlock_invalidate = OFF
query_prealloc_size = 8,192
range_alloc_block_size = 4,096
read_buffer_size = 4,194,304
read_rnd_buffer_size = 4,194,304
relay_log_index =
relay_log_info_file = relay-log.info
relay_log_purge = ON
relay_log_space_limit = 0
rpl_recovery_rank = 0
secure_file_priv =
skip_external_locking = ON
skip_show_database = OFF
slave_compressed_protocol = OFF
slave_load_tmpdir = /tmp/
slave_net_timeout = 3,600
slave_skip_errors = OFF
slave_transaction_retries = 10
slow_launch_time = 2
sort_buffer_size = 4,194,304
sql_big_selects = ON
system_time_zone = CDT
table_lock_wait_timeout = 50
thread_cache_size = 64
tmp_table_size = 67,108,864
transaction_alloc_block_size = 8,192
transaction_prealloc_size = 4,096
updatable_views_with_limit = YES
version_comment = MySQL Community Edition (GPL)
version_compile_machine = x86_64
version_compile_os = unknown-linux-gnu
My query is as follows:
select `ListingID`, `ListDate`, `Description`, `Status`, `City`, `State_Abbrev`, `region_1`, `region_2`, `Address`, `PostalCode`, `Latitude`, `Longitude`, `ListPrice`, `PropertyType`, `Bedrooms`, `Bathrooms`, `main_img`
from `listings`
where `Status` = 'For Sale'
order by `ListDate` desc
Result: Showing rows 0 - 29 (623,177 total, Query took 7.4942 sec)
Table `listings` structure:
I have tried changing a few things, and just when something seems to help, I get one query that takes around 15 seconds. I know there are sites with millions of records running in a fraction of the time my queries take.
Any help would be greatly appreciated.
| 7:54 pm on Sep 13, 2010 (gmt 0)|
I would change the status field to an integer and encode the values, e.g. 1 = for sale, 2 = sold, and so on.
Create an index on `Status` and `ListDate`.
With the explain statement you can see which indexes are used to execute the query, see [dev.mysql.com ]
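The suggested changes might be sketched like this (the index name is my own; table and column names come from the thread):

```sql
-- Index the column used in the WHERE clause together with the one
-- used in ORDER BY, so both can be served from the same index:
ALTER TABLE `listings`
  ADD INDEX `idx_status_listdate` (`Status`, `ListDate`);

-- Check which indexes the optimizer actually uses:
EXPLAIN
SELECT `ListingID`, `ListDate`, `ListPrice`
FROM `listings`
WHERE `Status` = 'For Sale'
ORDER BY `ListDate` DESC;
```

A compound index on (`Status`, `ListDate`) lets MySQL both filter on `Status` and read the matching rows already in `ListDate` order, avoiding a filesort over 600,000+ rows.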
| 7:59 pm on Sep 13, 2010 (gmt 0)|
Thanks for the reply. I am implementing this now.
| 3:41 pm on Sep 14, 2010 (gmt 0)|
Correct, numeric lookups will **always** be faster, but when dealing with large record sets you need to normalize your database tables.
The problem, or at least a large part of it, is your tables aren't normalized. Look at this section:
Wow. :-) First, does every record have all these values filled in, or are a lot of them empty? Will many of these records have the exact same value in the same fields? What I'm getting at is that this table can be made narrower by splitting this off into two other tables and joining on them only when needed.
Field         Type        Null  Key  Default  Extra
ListingID     bigint(15)  NO    PRI  NULL     auto_increment
State_Abbrev  varchar(5)  NO         NULL
(Region fields removed ...)
First, a junction table, `listingRegions`:
id int(11) | ListingID bigint(15) | regionID int(11)
So now you have a long list of numeric values to query. Numeric queries will always be lightspeed faster than text queries.
Next you have the actual regions table.
id int(11) | regionName varchar(50) | region_slug varchar(50)
(or maybe RegionName and region_slug are the same thing, only one needed, better yet.)
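The two tables described above might be created like this (a sketch; column sizes follow the thread, the index choices are mine):

```sql
-- One row per region; the slug gets a unique index since it is
-- what URL-based lookups will search on.
CREATE TABLE Regions (
  id          INT(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
  regionName  VARCHAR(50) NOT NULL,
  region_slug VARCHAR(50) NOT NULL,
  UNIQUE KEY (region_slug)
);

-- One row per (listing, region) pair; indexed on both foreign keys
-- so joins in either direction stay fast.
CREATE TABLE listingRegions (
  id        INT(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
  ListingID BIGINT(15) NOT NULL,
  regionID  INT(11) NOT NULL,
  KEY (ListingID),
  KEY (regionID)
);
```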
A typical select might be
select Listings.*, Regions.regionName, Regions.region_slug
from Listings
join listingRegions on Listings.ListingID = listingRegions.ListingID
join Regions on listingRegions.regionID = Regions.id
- You will only have as many region records in listingRegions as you need for each listing, no overhead.
- The number of regions for each listing is unlimited, but can be as few as one. You are no longer limited to 14 regions.
- Your Regions table can grow as you need it, it can have five regions or five million.
- When you build your select lists in forms for regions, you no longer have to edit select lists, you just pull the values dynamically directly from the Regions table.
- Caveat: Currently, when someone is adding a listing, do they enter a textual value or select from a list? If they are entering text values, this plays hell on your searches: UperWest Side, Upper West Side, and upper westside will all be different values and will never match in a user-entered search. The approach above will eliminate that, and it will be faster. Instead of
.... where region='Upper West Side'
you use
.... where region=1234
Again eliminating a textual query. You don't even need to use the join in a user search - just query the value, then query the region table when they click for details.
Anywhere you can eliminate a text value for a numeric value, you should. Another example:
State_Abbrev varchar(5) NO NULL
If these are U.S. states, you only need two characters - another optimization, make your fields only as wide as you need them. Better yet, build a states table
id int(11) |abbrev char(2) |full_name varchar(20)
then in your listings table,
State int(3) NO NULL
And you have another join, PLUS you have the choice of selecting the abbreviated or full name depending on the context. The same is true of countries, if you use them. This makes your queries more complex, but as your database grows you will be glad you did.
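A sketch of the states lookup described above (table and alias names are illustrative):

```sql
-- Two-character abbreviation plus full name, keyed by a small integer:
CREATE TABLE States (
  id        INT(3) NOT NULL AUTO_INCREMENT PRIMARY KEY,
  abbrev    CHAR(2) NOT NULL,
  full_name VARCHAR(20) NOT NULL
);

-- Pick the abbreviation or the full name depending on context:
SELECT l.ListingID, s.abbrev, s.full_name
FROM Listings l
JOIN States s ON l.State = s.id;
```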
Try a search in Google for normalization site:webmasterworld.com for many more helpful threads around here.
| 4:11 pm on Sep 14, 2010 (gmt 0)|
Wow, i really appreciate you taking the time to lend a hand!
The `region_0`, etc., fields do not all have values. Typically, up to about `region_5` or so will contain a value on average. Just to point out, `region_x` vs. `region_x_slug` works like so:
I do my queries against `region_x_slug` as it is a stripped version of `region_x`, which is the "pretty/display" version, ie. `region_x` might equal 'Beverly Hills', where `region_x_slug` would then equal 'beverly-hills'.
I like the idea of casting regions to integers for the sake of querying. However, I am all-in on SEO in terms of displaying the region name in the URL, e.g. /california/beverly-hills/ as opposed to /?region=1234, for obvious reasons; otherwise, a numeric value system would have been at the forefront of my mind when I began.
As usual though, things start out really small and escalate very, very quickly, and I'll be the first to admit that I was not fully prepared for the growth.
Again, my biggest struggle when I began with this table was the inconsistency of the data. For starters, I currently receive 99% of my entries from an external feed/source. Not all listings have many regions, so I figured I'd just try and cover the maximum number of regions that might ever be available .. big mistake, I see.
"Currently when someone is adding a listing, do they enter a textual value or select from a list?"
They are only able to select from a drop-down, pre-populated list generated from the db. My thoughts were yours exactly in that allowing the user to enter in their own information would only lead to thousands of discrepancies.
I am going to go through everything you've said again and see if I can implement your recommendations without blowing things up. The database will only grow from here on out, so your recommendations are pretty much a must-do.
Thanks again for your time and help!
I'll follow-up shortly with my progress.
| 4:03 pm on Sep 16, 2010 (gmt 0)|
|They are only able to select from a drop-down, pre-populated list generated from the db. |
Good, now you'll have
<option value="1234">Beverly Hills</option>
instead of
<option value="Beverly Hills">Beverly Hills</option>
What if their region isn't listed? You let them build the list for you by offering an "other region" field, which you can *temporarily* store in a table somewhere, associated by the listing ID. When it comes up, if these are moderated (which I'm guessing they should be), you can add the new region to your regions table and update their record.
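One way to sketch that "other region" holding table (table and column names here are hypothetical):

```sql
-- Temporary holding area for user-submitted region names,
-- tied back to the listing that proposed them:
CREATE TABLE pending_regions (
  id             INT(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
  ListingID      BIGINT(15) NOT NULL,
  submitted_name VARCHAR(100) NOT NULL,
  submitted_on   DATETIME NOT NULL
);
-- Once a moderator approves a name, insert it into the regions
-- table and update the junction row for that ListingID.
```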
| 4:28 pm on Sep 16, 2010 (gmt 0)|
There is a good, scratch that, great chance that I do not have *every* city/region/location on the map, so I do offer the opportunity for a user to submit a location for inquiry. I then manually reference that location to confirm, and input into the db if it's legit.
I made the mistake several years ago of allowing users to enter their own city by way of text-field. I ended up with 20 different variations to each city. It was a nightmare.
In the meantime (while working locally on implementing a new structure), I placed a few indexes in place and the system seems to be working much better.
My only issue with the proposed structure is: how does the system know, based on the location in the URL (for SEO reasons), where to draw listings from? For example...
That is an example URL. Currently, I query the db based on the last location in the URL (los-angeles). I use a PHP function that breaks apart the URL to give me an array of all the locations in the URL, with los-angeles, in this case, being the last entry in the array. It's a very simple way of querying the db; for example, the where clause is (... WHERE `region_x_slug` = 'los-angeles' ...).
How would I now, without changing the URL structure whatsoever, work with a new db structure consisting of numeric values only (as I would no longer be querying based on text values)?
I hope all that makes sense ;)
Thanks for all the replies!
| 4:32 pm on Sep 16, 2010 (gmt 0)|
Search your table of cities based on the last part of the URL to get the CityID, then search your listings from there.
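That two-step lookup can also be collapsed into a single query, sketched here against the normalized tables discussed earlier in the thread (table and column names as proposed there):

```sql
-- Resolve the slug from the URL to a numeric id and filter in one pass:
SELECT l.*
FROM Listings l
JOIN listingRegions lr ON l.ListingID = lr.ListingID
JOIN Regions r         ON lr.regionID = r.id
WHERE r.region_slug = 'los-angeles'
ORDER BY l.ListDate DESC;
```

The text comparison now happens once, against a small, uniquely indexed slug column, rather than against every row of the listings table.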
| 4:39 pm on Sep 16, 2010 (gmt 0)|
I will have a go at it. Thank you.
Back to the original post, does anybody see any issues with the setup of the mysql configuration?
The server specs:
CPU = 5.09 GHz
RAM = 3801 MB
in case those are relevant.
Thanks again, everybody.