Page speed lagging/half loading - Need input on troubleshooting
Looking for some 3rd party input on a performance issue that has crept up with our website recently. I'm afraid I'd rather not divulge our website address at the current time, but I've included quite a few facts below that I'm hoping will trigger some ideas we haven't thought of yet. Here's the scoop:
- The site's hosted on a popular Texas-based cloud server (shared hosting, shared database). For the purposes of this post I'll mask the hosting company's name as ********.
- The site is built in PHP (using Smarty templates) & MySQL.
- Since early October, Google's "time spent downloading" figure in Google Webmaster Tools has been increasing steadily. Before that, the site had behaved fairly consistently for roughly 10 months. Lately, some of our products in Google's Merchant Center have been "delisted" because Google timed out trying to fetch them as well.
- No programming changes were made in the weeks/months prior to the slowdown.
- Another symptom we experience is an occasional "lag", whereby the page will half-load, sit there for 15+ seconds and then continue to load. It tends to happen every 15-20 page loads.
- "Lagged" pages always take about 30 seconds to load - regardless of the type of page being looked at. (Typical page loads are 5 seconds for category & product pages, 12 seconds for homepage.)
- Lag always starts at the same spot. Timewise, it's roughly equivalent to where the page would normally be done if watching the page load progress with a monitoring tool. Location-wise, it's at about 1/4 of the page-load. After the lag, the remainder of the page loads very quickly.
- Since nothing had changed prior to the performance degradation, the server(s) seemed to be the likely culprit, so we did quite a lot of back-and-forth with *********. They indicated no issues with the web or database servers, and provided stats such as peak usage % and I/O wait during the day, which were negligible.
- Because of the way it was originally built, each page load does *a lot* of queries. Hundreds in some cases. They're simple queries, but a ton of them. Because of this, we always start by assuming it's a "too many queries" or "locking" problem.
- We set up a dedicated database server to ensure that we have *all* the power of the database server to ourselves, but it performs about the same and has the same lag issues.
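A rough back-of-the-envelope sketch of why "hundreds of simple queries" can be sensitive to the path between the web and database servers: if the queries run one after another, each one pays the network round trip. The query count, round-trip times, and per-query execution time below are made-up illustrative numbers, not measurements from this site.

```python
def estimated_db_time(num_queries, round_trip_ms, exec_ms=0.5):
    """Estimate seconds spent on sequential queries for one page load.

    Assumes queries are issued one at a time, so every query pays one
    full network round trip plus its own execution time.
    """
    return num_queries * (round_trip_ms + exec_ms) / 1000.0


# Hypothetical scenarios: 300 queries per page, varying round-trip time.
for label, rtt_ms in [("same host", 0.1), ("same rack", 0.5), ("congested path", 10.0)]:
    print(f"{label}: {estimated_db_time(300, rtt_ms):.2f} s")
```

The point is only that a small per-query delay multiplies quickly, so a problem that doesn't show up in either server's own stats can still dominate page time.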
We've done the following with regards to database queries & indexing:
- All tables are InnoDB to mitigate potential table locking.
- MySQL slow query log (with 10s timeout) on the new server does not show any slow queries. (But it does experience the 15+ second lags)
- MySQL slow query log on the shared server shows a few slow queries (eg. 12 seconds), but oddly not those that would be expected to be slow. (eg. A query that pulls 1 record from a table with only 1 record, or a query that pulls a record from a table of 5 records based on its primary key). There does not appear to be any lock-time associated with any slow queries.
- The vast majority of queries are well optimized (including indexes that include 'order by' criteria in the indexes to avoid filesorts) and those that are complex and slower occur only on one or two pages that aren't necessarily those that indicate locking behavior.
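For anyone wanting to repeat the slow-log check, here is a minimal sketch that pulls out entries which were slow despite touching very few rows (like the 1-record table above). It assumes the standard "# Query_time: ... Lock_time: ... Rows_sent: ... Rows_examined: ..." header line of the MySQL slow query log; the sample entries, hostnames, and thresholds are invented for illustration.

```python
import re

# Header line written before each statement in the slow query log.
HEADER = re.compile(
    r"# Query_time: (?P<query>[\d.]+)\s+Lock_time: (?P<lock>[\d.]+)\s+"
    r"Rows_sent: (?P<sent>\d+)\s+Rows_examined: (?P<examined>\d+)"
)


def parse_slow_log(text):
    """Return one dict per slow-log entry, with timings and the SQL text."""
    entries, current = [], None
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            current = {
                "query_time": float(match["query"]),
                "lock_time": float(match["lock"]),
                "rows_examined": int(match["examined"]),
                "sql": [],
            }
            entries.append(current)
        elif current is not None and line and not line.startswith(("#", "SET timestamp")):
            current["sql"].append(line.strip())
    return entries


def slow_but_trivial(entries, min_seconds=5.0, max_rows=100):
    """Entries that took long although they examined only a few rows."""
    return [e for e in entries
            if e["query_time"] >= min_seconds and e["rows_examined"] <= max_rows]


# Invented sample log, for illustration only.
SAMPLE = """\
# Time: 111102 10:15:30
# User@Host: shop[shop] @ web01 [10.0.0.5]
# Query_time: 12.000456  Lock_time: 0.000031 Rows_sent: 1  Rows_examined: 1
SET timestamp=1320228930;
SELECT * FROM settings WHERE id = 1;
# User@Host: shop[shop] @ web01 [10.0.0.5]
# Query_time: 11.200000  Lock_time: 0.000040 Rows_sent: 500  Rows_examined: 250000
SET timestamp=1320228990;
SELECT * FROM orders ORDER BY created;
"""

for entry in slow_but_trivial(parse_slow_log(SAMPLE)):
    print(entry["query_time"], entry["lock_time"], " ".join(entry["sql"]))
```

A long query time with near-zero lock time and a handful of rows examined suggests the time went somewhere other than the query work itself, which would be consistent with a stall outside MySQL, though it doesn't prove it.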
We're currently looking into non-database-specific, inefficient code, but the site's been running well for ages so it seems strange that a programming issue would pop up out of the blue.
Any thoughts as to where we might've left a stone unturned?
is it getting held up on a specific GET request?
have you tried using a waterfall chart to see the load times for each resource on the page?
1) do a traceroute to your server. This will show you each jump along the way to your server and show any slow spots. I'm sure there are traceroute tools available for windows but I'm sure I don't know what they are :).
2) if it's not between you and the server as shown by traceroute, see if you can get a command prompt on your server. I use ssh to log on, again I'm not sure how to do that from windows to a linux server, but I'm sure it's possible. Anyway, issue the command 'top'. That will show you the processes that are running on the server as well as the avg cpu load. Nothing specific to add here, other than to see if the server looks to be overloaded.
3) Re: phranque's waterfall test - the one that I've recently found is webpagetest.org (hopefully this link will pass; I found it because Google links to the site, and it's free). That'll show you a few interesting things. First, it'll show you what's taking so long: downloading, or waiting for your server. Maybe you just have a big pig of a file. Or maybe your server is delayed in responding (and thus it's likely your host). Anyway, that page gave me some very valuable information; for example, I've got a plugin in my WordPress install that is adding a huge amount of code to my site.
4) If you suspect it's your website database, do some googling on MySQL slow query logging. There are some tools out there that will log all slow MySQL queries to a file. You can then use that file to determine which queries are holding up the works and either modify them or determine that, again, it's just the host.
But my best guess? You're just on a slow server :).
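If you'd rather script the "waiting vs. downloading" split than read it off a waterfall chart, something like this rough sketch could do it. It relies on `urllib.request.urlopen` returning once the response headers have arrived, so the time up to that point approximates "waiting for the server" and the time spent in `read()` approximates "downloading". The `opener` parameter and the example URL are my own additions for illustration.

```python
import time
import urllib.request


def time_request(url, opener=urllib.request.urlopen, timeout=60):
    """Split one page fetch into time-to-headers and body download time."""
    start = time.perf_counter()
    with opener(url, timeout=timeout) as response:
        waiting = time.perf_counter() - start   # headers received
        body = response.read()                  # pull the whole body
    total = time.perf_counter() - start
    return {"waiting": waiting, "downloading": total - waiting, "bytes": len(body)}


# Example call (hypothetical URL, substitute your own page):
# print(time_request("http://www.example.com/category/widgets"))
```

A big "waiting" number with a small "downloading" number points at the server generating the page; the reverse points at a large file or a slow pipe.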
Just to confirm, is everything hosted on the one server? No ads, widgets or similar held elsewhere?
Thanks for the ideas folks. Definitely some good food-for-thought there - some of which we had already dug into but I forgot to mention. I've addressed each item below.
is it getting held up on a specific GET request?
- We monitored the page load using the "Network" panel of Google Chrome's developer tools to watch the requests and the slowdown is happening on the base page load itself, not any of the components that make up the page. Because of that, we dug into the database as the most likely cause.
Where the various components are hosted. All on the same server?
- Most things are on the same server (though it's a cloud), but there are a few 3rd party hosted widgets like a Starfield tech and Verisign widget and some Facebook & Twitter "like" buttons. The lag occurs on pages that don't have the FB/Twitter buttons though. We've set up the two SSL widgets to use "deferred scripts" where possible and put them at the bottom of the page to prevent them from causing any page loading issues.
- We didn't do extensive traceroute checking because we're seeing the performance hit from pretty much any network and a variety of clients, etc. But… I see how it could be an internal router on ********'s side that might cause it, so thanks for bringing that up. Did a few tests today and it looks like response times are between 20 and 70 ms for each hop, which seems reasonable.
Top / Process Load
- Unfortunately because it's a shared hosting environment, we don't have shell access at which we can run the 'top' command. ******** has "assured" us that both the web and DB servers are lightly used, however.
- As mentioned earlier, we used Google Chrome's network panel which does a waterfall-style analysis. Seems to be the base page, not any of its components.
- This was definitely our first thought, and we dug deeply into it, the slow query log, etc. We've since set up the site itself on the dedicated DB server as well, to see how it behaves when both it and the DB are on the same server (i.e. DB traffic never needs to leave the server), and it appears to work very fast with no lagging there. That seems like a pretty strong indicator that the issue is the "path between the servers" at *******'s end, and/or that there's just a lot of data going back and forth, so even on a fast network connection it takes time. Switching the site over to that server is definitely looking like our best option at this stage. But…
Our biggest concern now is the potential impact of changing our IP address at this point. Does anyone have any experience with changing IPs and how it might affect SERPs? I suspect we'll need to make sure the IP isn't a "bad neighborhood" IP, but other than that, will a "sudden" IP change hurt SERPs?
Is the server slow with a static page, ie just html and predefined text?
I'm suggesting a plain page with no data from other processes or even images.
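One way to run that comparison is to time a few dozen loads of the plain page and of a normal PHP page, then see whether the occasional long stall shows up on both. This is only a sketch: `fetch` is any zero-argument callable you supply (for example a function that downloads one URL), and the "3x the median" cutoff is an arbitrary choice of mine.

```python
import statistics
import time


def sample_load_times(fetch, runs=40, pause=0.0):
    """Call fetch() repeatedly and return the elapsed seconds for each call."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fetch()
        timings.append(time.perf_counter() - start)
        time.sleep(pause)
    return timings


def find_stalls(timings, factor=3.0):
    """Return (index, seconds) for loads slower than factor x the median."""
    typical = statistics.median(timings)
    return [(i, t) for i, t in enumerate(timings) if t > factor * typical]


# Illustration with made-up timings: 5 s loads with a ~30 s lag now and then.
made_up = [5.0] * 14 + [30.0] + [5.0] * 14 + [31.0]
print(find_stalls(made_up))
```

If the static page stalls at about the same rate as the PHP pages, the database and the application code are unlikely to be the cause; if only the PHP pages stall, the problem is somewhere in what they do.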