homepage Welcome to WebmasterWorld Guest from 54.211.157.103
register, free tools, login, search, pro membership, help, library, announcements, recent posts, open posts,
Become a Pro Member
Home / Forums Index / Code, Content, and Presentation / Databases
Forum Library, Charter, Moderators: physics

Databases Forum

    
Anyone Using SSD for their DB?
Improving performance
brotherhood of LAN




msg:4240781
 12:52 am on Dec 10, 2010 (gmt 0)

I have been benchmarking a MyIsam MySQL database with some queries that'll regularly be performed. One simple table is this one, that is regularly accessed and contains unique URL strings:

CREATE TABLE IF NOT EXISTS `http_urls_list` (
`id` int(10) unsigned NOT NULL,
`url` text NOT NULL,
UNIQUE KEY `id` (`id`)
) ENGINE=MyISAM

The table is just shy of 30 million rows and lookups to the `id` column are very fast, but any query that fetches the `url` column can be up to 10 times slower. I am looking into using SSD to store such tables to improve performance in fetching non-indexed data.

Does anyone have any views/experiences to share using SSD to improve database performance?

As an aside, I'm more interested in read performance. There is a separate `id`/`md5` table used for inserts that is only accessed when adding new urls.

 

lammert




msg:4240804
 3:33 am on Dec 10, 2010 (gmt 0)

SSD's are nice for random read access systems because they don't have the latency due to head movements seen on conventional disks. In a situation with a heavily accessed MySQL database table with 30 million rows of tens of bytes each, I would expect the RAM cache to be big enough to hold that table, or at least the most used parts of it. SSD will only speed up if your RAM cache isn't large enough and you need regular lookups to get the URL fields from the table. Otherwise you won't see any speed improvement.

brotherhood of LAN




msg:4241020
 2:46 pm on Dec 10, 2010 (gmt 0)

Hey Lammert, indeed RAM and its cache are super-fast, and will help for some queries on this large table. I load most of the important indexes into memory, the full database has over 32GB and growing.

For queries using this table, the cache is going to be missed a lot as outside of a couple of popular URL strings, the rest will be accessed fairly randomly.

jmccormac




msg:4241364
 9:45 am on Dec 11, 2010 (gmt 0)

You have an index on 'id' but not on 'url'? If you are storing the http : // preamble in the url data, it might be a good thing to strip it. But will the length of any of the URLs be longer than 250 characters or so? If not then specifiying the the 'url' as a varchar might be better.

I'm currently playing with a 43 million row table in MyISAM. The data is essentially the new/deleted/transfer dates for some domains in .com TLD over 2000-2010. The main index uses the domain name. The select time for a query is about 0.15 seconds on a cheap dual core desktop box. Even with a compressed table (this is historical data so there will be no additions or deletions), the query time drops to about 0.11. Perhaps you might look at the choice of indices before spending money on an SSD?

Regards...jmcc

brotherhood of LAN




msg:4241394
 3:41 pm on Dec 11, 2010 (gmt 0)

Thanks for the reply jmcc,

This particular table never queries the `url` field, only the `id` field, and it's only storing the path. Domain parts are stored in different tables. I actually have a separate table for URLs if I wanted to find a specific url string:

CREATE TABLE IF NOT EXISTS `http_urls_hashes` (
`id` int(10) unsigned NOT NULL,
`segment` tinyint(3) unsigned NOT NULL,
`hash` binary(16) NOT NULL,
KEY `segment` (`segment`),
KEY `hash` (`hash`)
) ENGINE=MyISAM
/*!50100 PARTITION BY RANGE (segment)
(PARTITION p0 VALUES LESS THAN (1) ENGINE = MyISAM, ... PARTITION p255 VALUES LESS THAN (256) ENGINE = MyISAM)

The table is partitioned by the `segment` field, ranging from 0 to 255, `hash` is an MD5 of the URLs. Doing lookups and joining onto my first table is quick enough (100s of queries a second)

It's only when I JOIN to the first table and add the `url` field to the select statement that queries slow down a lot (10s per second), so indexes aren't a problem and I feel it doesn't make sense to index the `url` field when it's not used in WHERE statements.

I've benchmarked between TEXT, VARCHAR and CHAR for the `url` field and the query time difference was small.

As another benchmark, I put the `url` values into a flatfile of fixed length (255), where an example `id` could be 100000 and the url would be at ((100000 * 255) - 255)th byte. This was slightly quicker but fseek obviously takes time to move to the correct part of the file.

I am thinking SSD is better for 'random accesses' of the table, and will generally improve the fetching of non-indexed data columns.

xnavigator




msg:4257483
 7:32 pm on Jan 24, 2011 (gmt 0)

which site do you manage? bit.ly? :D

>> It's only when I JOIN to the first table and add the `url` field to the select statement that queries slow down a lot

anyway:

with this query are you getting a lists of record? Or you are just searching for 1 specific url?

Global Options:
 top home search open messages active posts  
 

Home / Forums Index / Code, Content, and Presentation / Databases
rss feed

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
Home ¦ Free Tools ¦ Terms of Service ¦ Privacy Policy ¦ Report Problem ¦ About ¦ Library ¦ Newsletter
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved