|All sites on server pulling from 1 DB - optimization for heavy traffic|
Had a question for you all because over the past couple of weeks my sites have been getting quite a bit more traffic (probably around 5000+ hits/day overall) and they are all heavily reliant on a central database/table that just exceeds 1 million entries (MySQL) to pull records from.
In addition to pulling records, the table in this DB has to be updated with new/modified entries a minimum of once a day (but more often than not around 2-5 times/day) since the content always has to be correct.
This was working fine, but now, with the increase in traffic, my dedicated server seems to eat up all of its memory and the entire server freezes and I have to have it re-booted.
I am wondering if there is a better way to go about this. I'm at a point where I am not really sure what avenue to pursue and the server is freezing up and has to be re-booted 1-3 times a week now when the server is getting too many requests...almost to the point of it happening every day, over the past 3 days.
Is it appropriate for me to have all sites pulling from the same table in the DB with that many records? Anyone have any advice as to how I could go about setting up/modifying/correcting my current set up?
I just really am not sure what to do at this point.
Any and all advice is greatly appreciated - thank you in advance.
Summary of Server and DB/Table:
Intel Xeon X3330 @ 2.66GHz
4 GB DDR3 RAM
500 GB SATA2
1,100,000 rows (approx.)
Are you sure it's your database that's eating up the memory? I'd check that out first, it may be the programming that's using it up.
assuming that you have an index on the table, then 5,000 database calls a day is trivial, i think your bottleneck is elsewhere.
Ya, I have indexes on the database.
From what I can see, and from what I have been told from my hosting company's support department when they looked into it when having to re-boot and such, is that the mysql server is getting heavily loaded.
One problem I think i have is that quite a bit of my pages don't simply do a simple look-up query for a single item. A lot of my item specific pages (which are the ones getting the most traffic), and which all are dynamically created, search for the single item along with either similar items or items within a particular radius (which i think is a heavier recourse taking process).
I think there are probably a ton of more efficient ways to go about setting something like this up - but have not been able to com up with a better solution as of yet...
When I look at WHM on the server, under the Process Manager, mysql is always by far the highest process running. Other mysql tables on the server seem to run fast but they are also not being hit very hard and are a ton smaller in size.
Are there any tools/things you can do to try to find where things are bottle-necking?
On another note, the sites are not currently set up to cache the pages for X amount of days/hours...I am sure this would help a little bit but would really like to try and find out where my main problem is in programming is before moving to adding on that type of stuff.
An example set of queries that are probably my most commonly used are the below snippets from my detail pages (grab item and grab nearby items)...let me know if this is completely horrible to be running on the DB on every visit....I know I have quite a bit of improvement to make.
$query = "select * from `database`.`table`
where id='" . $db->escape($id) . "'";
$query2 = "SELECT column1, column2....column15,
( 3959 * acos( cos( radians('".$db->escape($lat)."') ) * cos( radians( Latitude ) ) * cos( radians( Longitude ) - radians('".$db->escape($lng)."') ) + sin(radians('".$db->escape($lat)."') ) * sin( radians( Latitude ) ) ) ) AS distance
WHERE City = '".$city."' AND status IN ('open','closed','pending') AND
((DATE_SUB(CURDATE(),INTERVAL 1 YEAR) <= DateClosed OR DATE_SUB(CURDATE(),INTERVAL 6 MONTH) <= EntryDate)) HAVING distance < ".$db->escape($radius)."
ORDER BY distance LIMIT 0 , 10";
Restating that I know it is probably horrible to have that run each page load (or at least I think it's probably horrible), just don't know how I could go about displaying such content on a page without running such resource intensive tasks.
Seeing the above, does it make more sense that mysql would be suffering a bit? Or, in your experience, should this still be manageable levels?
I really appreciate and thank you for your time.
Tec4, you look to be doing some GIS type calculations on the fly. This will cause a lot of processing and may be the cause of your spiking CPU.
Is it possible to pre-calculate this distance rather than doing it on the fly?
Also do a search on GIS donut, as that is a concept to reduce the load of looking for results in a GIS situation.
@Dijkgraaf - I'm trying to play with the idea of storing the closest entities in another table and just doing another simple look-up based on the entry's ID. I'm trying to figure out how/if it would work with my current set up....I think it would work decently for 1 of the 2 types of entities that I have this going for but not sure about the other - example:
has about 1 million rows. This table is updated/modified approximately every 5 hours w/ new entries and modifications to existing entries. Some of the entries, once modified, are updated to certain status's that cannot be displayed to the end user anymore and thus cannot be contained in the "nearby items" group. So, not really 100% sure how I could store this one efficiently...but then again, maybe any storing would be more efficient than doing it all on the fly. Would also need to take into consideration that I would probably need to update the nearby items at a minimum of a weekly basis locally, on a couple of my home computers, and upload and overwrite to the existing data - which isn't really much of a problem I don't think -- just trying to wrap my head around how to do it the best way possible.
I am actually already storing the closest items from this DB to other items in this DB (this one is much more static and never changes to a non-show-able status)...this one is about 210,000 rows and also queries against Database#1 for "nearby items" (this one is also done on the fly...but when looking for nearest entries from DB#2 to DB#2, it is stored in an array in the database).
Thanks for the pic regarding GIS donut, i'll look that up a bit later tonight when I'm back on the PC.
I'll try some ideas I come up with over tonight and see if I can come up with anything that works - it does seem a bit odd to be doing all the resource intensive calculations on the fly....any ideas or thoughts are always welcome :)
|maybe any storing would be more efficient than doing it all on the fly |
Yes, this is almost always true.
|Some of the entries, once modified, are updated to certain status's that cannot be displayed to the end user anymore |
If some portion of your data is invalidated on a regular basis, just build your caching mechanism to take that into account. For example, if you typically display the 10 nearest locations, but you expect some of those to be invalid, then cache the 20 nearest locations.
Primary key lookups on those 20 records should still be MUCH faster than calculating the distances of 10 records. If a few of them are no longer valid, you should still have enough rows to display what you need.
You can do a lot of primary key lookups with almost no hit on the CPU -- it's SELECT statements that perform math operations with the data in the table that kill the server.
Seems a quick fix would be to separate your DB from that server and setup a DB only server at least until you can get a handle on the memory issues. I would say that is a temporary solution if you know you have issues with you queries.
You need to test these queries and use the explain keyword to 1. See how long your queries are taking against real data and 2. Verify they are using indexes properly
The OP states 'in the past few weeks ... more traffic ... server slows down'.
It was fine before but now not is. The change is increase in traffic.
This looks like an optimisation issue. Your mysql and apache settings are not optimal.
download and run the following scripts:
They will show how much memory is being used by MySQL and apache. the tuner script will also offer suggestions as to what to tune.
As for apache you may have the server installed with default parameters which is just wasting your memory. You will need to tune some of them just so slightly and then check if the memory footprint reduces.
For Apache you want to be looking at changing
Keep Alives / Timeouts
For MySQL change
The quickest option is to just put more ram in the box. RAM is dirt cheap. You could put 8Gb in there for around $50 bucks (depending on type required, depending on if your DC will charge an exorbitant monthly fee).
Note too that your O/S may not be able to see more than 4Gb. In this case you just need to tune apache and / or MySQL.
@whoisgregg - thanks for the note on the primary key look-ups. I figured this previously but nothing really deemed it a super priority, in my mind, before because everything was running smoothly until the increase in traffic.
Since your post, I have gone ahead and switched my radius search to per-calculate and store the closest 20 items (by primary key) in a column in the table like: 'key1','key2',....,'key20'. Then upon page loading I use this to inject the keys into a function in one of my classes to search for the nearest items, like this:
select column1,column2,....,column12 from database.table where PrimaryKey in ($keys);
Planning on updating these entries once a week to make sure items are up to date and such. Does that seem like an appropriate way of going about it? Or is there a better way to store nearby entries in a table, etc?
@particleman - wish I could slap up a database server, I'd be on that in an instant! lol unfortunately would probably cost quite a bit to have my current hosting provider to do so - luckily the modification to the queries is helping quite a bit! Have modified them quite a bit so far - later today I will get on the running of queries with the explain keyword to try and hone them in even further - thanks for the recommendation :)
@Frank_Rizzo - will download those scripts tonight and see what it reports - thanks for pointing those out, had not heard of those so will check those out. Should definitely add a couple gigs of RAM in the box, as well, for sure.
Thank you all for the help!