Apache web server HELP! Extremely laggy pages
I work for a company with a highly trafficked site (about 100,000 unique daily visitors). We have our "MANAGED" hosting through Rackspace for quite the pretty penny per month. It includes a dedicated back end / DB server and cloud servers for the front end.
The setup has been functioning perfectly for 10 months now, but this past Monday the speed of the site dropped abruptly. Page load times went from 1-2 seconds to 10-20 seconds, and sometimes pages don't load at all. As far as we know (and as far as Rackspace says), no server settings were modified. No new code was introduced on our end. It's a mainly static site, with minimal user interaction with the backend.
Can any expert offer some advice? We've monitored the traffic, checked IPs, etc. We've even tuned down several site features in the interest of reducing server load. After a server reboot, the active threads/processes running on it IMMEDIATELY jump back up to maxed-out levels. It seems like once our traffic reaches 10MB/s, a type of queue forms and the delays begin. Rackspace assures us that we're not limited to that.
Please advise - thanks! -Jay-
EDIT: Some more background info: The site is typically busiest from 7am until 3pm EST. For the past few days, we've noticed that between 7pm and 9-10pm the server has just lagged incredibly. However, at around that 9-10pm mark, something changes and the pages go back to loading almost instantly. (There is still decent traffic though.) Then at around 7am again it slows to a crawl.
Rackspace has offered solutions such as spinning up another server and incorporating their load balancing - they are in the process of this BUT they do NOT think the traffic is the issue. At one point they actually said there was potential packet loss somewhere in the network, but no progress has been made.
Versions in Use:
OS: CentOS on cloud
OS: Red Hat on dedicated server
PHP: 5.3 / MySQL: 5.1.69
Preliminary question: What do you see when you ping the server? If lags are literally 10-20 seconds, you should be able to see the trouble spot with the naked eye.
Admittedly it's hard to explain why the problem would affect only your site and not everyone else living at (I assume) the same colo facility. Unless there's something the host isn't telling you...
ping to our server IP is between 120 and 150 ms with no loss.
traceroute steps in ms: <1, 28,12, 13, 28, 27, 28, 27, (here it hits our web host) 27, 27, 28, 29, 135
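For what it's worth, the hop times above can be scanned for the biggest jump with a few lines of Python. This is only a sketch: the list is copied from the traceroute output, the "<1" first hop is entered as 1 ms (an assumption), and `largest_jump` is a made-up helper name.

```python
# Hop latencies in ms, copied from the traceroute above.
# The first hop was reported as "<1"; it is entered as 1 here (assumption).
hops_ms = [1, 28, 12, 13, 28, 27, 28, 27, 27, 27, 28, 29, 135]


def largest_jump(latencies):
    """Return (hop_number, increase_ms) for the biggest rise between consecutive hops.

    hop_number is 1-based and refers to the hop where the higher latency shows up.
    """
    best_hop, best_delta = None, 0
    for i in range(1, len(latencies)):
        delta = latencies[i] - latencies[i - 1]
        if delta > best_delta:
            best_hop, best_delta = i + 1, delta
    return best_hop, best_delta


hop, delta = largest_jump(hops_ms)
print(f"Largest increase: +{delta} ms at hop {hop}")
```

On these numbers the only big jump is at the final hop, which lines up with the 120 to 150 ms ping. A jump that shows only on the last hop can also just mean the target host is slow to answer ICMP, so by itself it doesn't prove much.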
Rackspace hasn't been much help for over 2 days now. From what I've read in forums today, others with similar symptoms have traced them to network traffic or hardware failure. Since the site has functioned 100% OK for the past 10 months with only a minimal (5%-ish) traffic increase, they very well could be keeping quiet about something until it's fixed...
We've had packet loss when pinging from one server to the other, and the servers are in the same datacenter. We believe there is some type of network issue. Rackspace originally told us that it looked to be a DDoS attack; they then said it was much more than our normal traffic; and then they said they were investigating their hardware. Eventually we determined SOMETHING had to be done, so another server was thrown up. It helped immensely... for a price, of course.
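If it helps to put a number on the server-to-server loss, here is a rough Python sketch that pulls the sent/received counts out of a ping summary line and works out the percentage. It assumes the summary format printed by the Linux (iputils) ping command; `packet_loss_percent` is a hypothetical helper, and the sample line is made up purely to show the format.

```python
import re

# Pattern for the summary line printed by the Linux (iputils) ping command.
SUMMARY_RE = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def packet_loss_percent(ping_output):
    """Return packet loss as a percentage, computed from the sent/received counts."""
    match = SUMMARY_RE.search(ping_output)
    if match is None:
        raise ValueError("no ping summary line found")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("ping reported zero packets sent")
    return 100.0 * (sent - received) / sent


# Made-up sample, only to show the format; NOT output from the servers in this thread.
sample = "200 packets transmitted, 188 received, 6% packet loss, time 199312ms"
print(f"{packet_loss_percent(sample):.1f}% loss")
```

Running the same check during the slow hours and again after 9-10pm would show whether the loss follows the slowdown.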
I'm still curious as to how, out of nowhere, one day the web server could no longer handle the load. It wasn't like the page load times were slowly getting worse; one day the site just slowed to a crawl. Here are 2 charts that show our standard traffic:
I believe the cap looks to be at/around 10 Mbps. Our IT Director said that he believes we do have a gigabit card in the hardware. Thoughts on why the traffic shows a nice rollercoaster chart for the past 8 months, but for 4 days last week (until we load balanced with another server on Friday) it kept plateauing? (During these times pages would take 15-20 seconds to load, sometimes timing out, and this would last for a good 10 hours from 7am through the early evening hours; then all of a sudden at 8 or 9 pm the site would start loading instantly again.)
We don't much mind the cost of the additional server if it's necessary; we just feel there is something else at play here.
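A quick bit of arithmetic on that cap, as a hedged sketch: the thread quotes both "10MB/s" and "10 Mbps", and those differ by a factor of 8, so it is worth pinning down which one the charts show. The variable names below are just for illustration, and the gigabit figure is the nominal rate of the card the IT Director believes is installed.

```python
def mbps_to_megabytes_per_sec(mbps):
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbps / 8.0


plateau_mbps = 10      # the apparent cap read off the charts
gigabit_mbps = 1000    # nominal rate of a gigabit card

# 10 Mbps expressed in megabytes per second.
print(mbps_to_megabytes_per_sec(plateau_mbps))   # 1.25

# The plateau as a percentage of what a gigabit card can nominally carry.
print(100.0 * plateau_mbps / gigabit_mbps)       # 1.0
```

If the plateau really is 10 Mbps, that is about 1% of a gigabit card, which would suggest the limit sits somewhere other than the card itself. That is an inference from the numbers, not something Rackspace has confirmed.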
welcome to WebmasterWorld, Jay!
i would try running the "top" command during the slow period to see if that shows anything hogging resources on your server.
then maybe something like iftop to monitor your network traffic.
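top and iftop are the right tools for this. If you'd rather log the numbers over the day and compare them with the 10 Mbps plateau, a rough stand-in is to sample the byte counters Linux exposes in /proc/net/dev. The sketch below assumes a Linux box; `parse_net_dev`, `throughput_mbps` and `sample_interfaces` are hypothetical names, not part of any tool.

```python
import time


def parse_net_dev(text):
    """Parse the contents of /proc/net/dev into {interface: (rx_bytes, tx_bytes)}."""
    counters = {}
    for line in text.splitlines()[2:]:          # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 9:
            continue
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def throughput_mbps(bytes_before, bytes_after, seconds):
    """Average rate in megabits per second between two byte-counter readings."""
    return (bytes_after - bytes_before) * 8 / seconds / 1_000_000


def sample_interfaces(interval=5, path="/proc/net/dev"):
    """Read the counters twice, `interval` seconds apart, and return Mbps per interface."""
    with open(path) as f:
        before = parse_net_dev(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_net_dev(f.read())
    rates = {}
    for iface, (rx, tx) in after.items():
        if iface in before:
            rx0, tx0 = before[iface]
            rates[iface] = (throughput_mbps(rx0, rx, interval),
                            throughput_mbps(tx0, tx, interval))
    return rates
```

Calling `sample_interfaces()` on the web server during the slow period returns a dict of (inbound, outbound) Mbps per interface. For scale, 6,250,000 bytes moved in 5 seconds works out to 10 Mbps.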