trying to figure out why my server went dark

My Apache server on AWS EC2 went dark for an hour early this morning. Trying to figure out why but having a little trouble deciphering the logs. Any comments or insights would be most appreciated.
This is Apache 2.4.34 on Amazon Linux 2, 4GB memory (typically only 25-30% used). It hosts a single WordPress site, with the database hosted on a dedicated RDS instance.

Looking at my metrics and logs, it appears there was a big memory spike just after 2am, it went up to over 90%, then there's an hour and a half gap during with the server was unresponsive (and no metrics being collected during that time, no log entries).

httpd error log shows nothing right before the crash, but has this at the recovery point:

03:49:36 [Mon Oct 01 06:20:54.569826 2018] [proxy_fcgi:error] [pid 18162:tid 140310726268672] (70007)The timeout specified has expired: [client 11.111.111.111:43245] AH01075: Error dispatching request to : (polling)

messages log shows this:

Oct 1 06:10:01 ip-111-11-11-111 systemd: Starting User Slice of root.
02:10:02
Oct 1 06:10:01 ip-111-11-11-111 systemd: Started Session 2880 of user root.
02:10:02
Oct 1 06:10:01 ip-111-11-11-111 systemd: Starting Session 2880 of user root.
02:10:02
Oct 1 06:10:01 ip-111-11-11-111 systemd: Removed slice User Slice of root.
02:10:02
Oct 1 06:10:01 ip-111-11-11-111 systemd: Stopping User Slice of root.
02:39:28
Oct 1 06:19:42 ip-111-11-11-111 dhclient[3079]: XMT: Solicit on eth0, interval 113520ms.
03:49:36
Oct 1 06:19:46 ip-111-11-11-111 dhclient[3079]: XMT: Solicit on eth0, interval 129820ms.
03:49:36
Oct 1 06:39:28 ip-111-11-11-111 kernel: amazon-cloudwat invoked oom-killer: gfp_mask=0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=(null), order=0, oom_score_adj=0
03:49:36
Oct 1 06:39:28 ip-111-11-11-111 kernel: amazon-cloudwat cpuset=/ mems_allowed=0
03:49:36
Oct 1 06:39:28 ip-111-11-11-111 kernel: CPU: 0 PID: 5768 Comm: amazon-cloudwat Not tainted 4.14.62-70.117.amzn2.x86_64 
03:49:36
Oct 1 06:39:28 ip-111-11-11-111 kernel: Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
03:49:36
Oct 1 06:39:28 ip-111-11-11-111 kernel: Call Trace:
03:49:36
Oct 1 06:39:28 ip-111-11-11-111 kernel: dump_stack+0x5c/0x82
03:49:36
Oct 1 06:39:28 ip-111-11-11-111 kernel: dump_header+0x94/0x229
03:49:36
Oct 1 06:39:28 ip-111-11-11-111 kernel: ? do_try_to_free_pages+0x2a1/0x330
03:49:36
Oct 1 06:39:28 ip-111-11-11-111 kernel: ? get_page_from_freelist+0x525/0xba0
03:49:36
Oct 1 06:39:28 ip-111-11-11-111 kernel: oom_kill_process+0x213/0x410
03:49:36
Oct 1 06:39:28 ip-111-11-11-111 kernel: out_of_memory+0x2af/0x4d0
03:49:36
Oct 1 06:39:28 ip-111-11-11-111 kernel: __alloc_pages_slowpath+0xa19/0xe00
03:49:36

And ssl_error log has this:

02:39:28
[Mon Oct 01 06:11:26.699501 2018] [proxy_fcgi:error] [pid 18162:tid 140310625556224] (70007)The timeout specified has expired: [client 11.11.11.111:6018] AH01075: Error dispatching request to : (polling), referer: https://example.com/product/phoebe-hat-mitts-kit/
03:49:36
[Mon Oct 01 06:11:26.699465 2018] [proxy_fcgi:error] [pid 18162:tid 140310785017600] (70007)The timeout specified has expired: [client 11.11.1.111:48590] AH01075: Error dispatching request to : (polling), referer: https://example.com/product-category/hand-knitting-patterns/

What exactly happened to cause the memory spike?

Do I want to rethink my configuration in my httpd.conf file? Here's what I have now:


<IfModule mpm_event_module>
 ServerLimit 300
MaxRequestWorkers 300
 StartServers 2
 MinSpareThreads 25
 MaxSpareThreads 75
 ThreadLimit 64
 ThreadsPerChild 25
MaxConnectionsPerChild 1000
</IfModule>

Please help me understand exactly what happened here and how I can fix it. Thanks!

[01-Oct-2018 07:49:36] NOTICE: fpm is running, pid 3119 [01-Oct-2018 07:49:36] NOTICE: ready to handle connections [01-Oct-2018 07:49:36] NOTICE: systemd monitor interval set to 10000ms [01-Oct-2018 15:51:30] NOTICE: [pool www] child 3215 exited with code 0 after 28914.058528 seconds from start [01-Oct-2018 15:51:30] NOTICE: [pool www] child 29529 started [01-Oct-2018 15:51:39] NOTICE: [pool www] child 3217 exited with code 0 after 28922.768615 seconds from start [01-Oct-2018 15:51:39] NOTICE: [pool www] child 29537 started [01-Oct-2018 15:51:39] NOTICE: [pool www] child 3218 exited with code 0 after 28922.995524 seconds from start [01-Oct-2018 15:51:39] NOTICE: [pool www] child 29538 started [01-Oct-2018 15:52:51] NOTICE: [pool www] child 3216 exited with code 0 after 28995.217703 seconds from start [01-Oct-2018 15:52:51] NOTICE: [pool www] child 29681 started [01-Oct-2018 15:53:23] NOTICE: [pool www] child 4710 exited with code 0 after 27921.342289 seconds from start [01-Oct-2018 15:53:23] NOTICE: [pool www] child 29710 started [01-Oct-2018 20:40:35] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 3 idle, and 11 total children [01-Oct-2018 20:40:36] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 16 children, there are 3 idle, and 12 total children [02-Oct-2018 08:59:53] NOTICE: [pool www] child 13161 exited with code 0 after 44356.284643 seconds from start [02-Oct-2018 08:59:53] NOTICE: [pool www] child 19861 started [02-Oct-2018 09:08:52] NOTICE: [pool www] child 13182 exited with code 0 after 44875.929775 seconds from start [02-Oct-2018 09:08:52] NOTICE: [pool www] child 20324 started [02-Oct-2018 09:18:52] NOTICE: [pool www] child 13180 exited with code 0 after 45476.997745 seconds from start [02-Oct-2018 09:18:52] NOTICE: [pool www] child 20859 started [02-Oct-2018 14:01:18] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 2 idle, and 15 total children [02-Oct-2018 14:01:34] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 1 idle, and 16 total children [03-Oct-2018 10:46:37] NOTICE: [pool www] child 12282 exited with code 0 after 64784.146714 seconds from start [03-Oct-2018 10:46:37] NOTICE: [pool www] child 4290 started

#configure MPM event module # Optimized settings for AWS EC2 t2.medium <IfModule mpm_event_module> ServerLimit 500 StartServers 2 MinSpareThreads 25 MaxSpareThreads 75 ThreadLimit 64 ThreadsPerChild 25 MaxRequestWorkers 500 MaxConnectionsPerChild 1000 </IfModule>

trying to figure out why my server went dark

Megunticook

justpassing

iamlost

justpassing

Megunticook

Megunticook

robzilla

Join The Conversation

Moderators and Top Contributors

Hot Threads This Week