Hi to all,
I'm in trouble, I have two Linux machine that have this featurese:
1°st machine
24 core
14 GB RAM
53% free disk space
2°nd machine
16 core
6 GB
87% free disk space
Actually I have:
netstat -putan | grep ESTA | wc -l
152
ps aux | grep httpd | wc -l
247
This first machine receive a lots of connection (up to 5000 every hours) through a Facebook apps where the user can see some video (hosted in 2nd machine) and they can upload and vote the video that will be approved by a CMS hostend in 1st machine.
So, for this reason the bandwidth of two machine are 250 Mbps. Actually, with iftop I can see 130 Mbps (video machine) + 20 Mbps used (html and backend). The video are encoded with ffmpeg.
Into apache server status folder, actually, I can see:
CPU Usage: u65.15 s223.49 cu0 cs0 - 12.6% CPU load
100 requests/sec - 1.2 MB/second - 12.3 kB/request
219 requests currently being processed, 56 idle workers
For each visitor we have 6,72 page for visit. In order to see the video (hosted in 2nd server) the user must be view 3 different page: Welcome page, random videos, the video. So I imagine that the user can see 2 different video each visit before rate it.
Into my logwatch I can see up to 1 TB transferred every day from 2nd server (video).
I have modified httpd.conf and my.cnf of the first machine in this manner:
httpd.conf
KeepAlive On
MaxKeepAliveRequests 150
KeepAliveTimeout 2
<IfModule prefork.c>
StartServers 30
MinSpareServers 10
MaxSpareServers 20
ServerLimit 512
MaxClients 512
MaxRequestsPerChild 4000
</IfModule>
<IfModule worker.c>
StartServers 2
MaxClients 256
MinSpareThreads 25
MaxSpareThreads 75
ThreadsPerChild 25
MaxRequestsPerChild 0
</IfModule>
my.cnf:
[mysqld]
datadir = /var/lib/mysql
socket = /var/lib/mysql/mysql.sock
user=mysql
symbolic-links=0
default-time-zone = '+01:00'
skip-external-locking
port = 3306
key_buffer = 128M
max_allowed_packet = 1M
table_cache = 25000
max_tmp_tables = 25000
sort_buffer_size = 1M
read_buffer_size = 1M
read_rnd_buffer_size = 1M
myisam_sort_buffer_size = 1M
thread_cache_size = 64
query_cache_size = 512M
query_cache_limit = 512M
read_rnd_buffer_size = 1M
join_buffer_size = 1M
max_connections = 512
tmp_table_size = 64M
max_heap_table_size = 64M
innodb_buffer_pool_size = 32M
interactive_timeout=100
# Reduced wait_timeout to prevent idle clients holding connections.
wait_timeout=15
connect_timeout=10
Actually the 1st machine have this average:
load average: 5.35, 5.69, 5.54
So I have seen that during the night and in the first half of the day everything works fine but in hte afternoon the situation stop to work or something like this.
I don't understand why but the bandwith decrease (today, for example above 40 Mbps), the Facebook app and the CMS will be unavailable. If I restart apache the situation become faster and after few minutes again unavailable.
I have tested the CMS with various PEN test tool (arachni, foca, wapity, netsparker free and so on) but nothing appear.
I have checked the connection and I have seen that when httpd is up to MaxClients anything works. I have try to increase the MaxClients connection up to 1000 but the system still does not work correctly; the connection (nestat -putan | grep ESTA | wc -l) up to 1004 and the system will be unstable. If I set MaxClients up to 2000 the httpd die after few minutes.
Today I have increase the SYN Flood protection of that machine in order to mitigate the Syn Flood connection but I'm not sure that I have solved the problem. The last attack will be at 20.37 Italian time.
Actually everything works fine but is midnight and that's is normal.
This is the actual iptables rules:
Table: mangle
Chain PREROUTING (policy ACCEPT)
num target prot opt source destination
Chain INPUT (policy ACCEPT)
num target prot opt source destination
Chain FORWARD (policy ACCEPT)
num target prot opt source destination
Chain OUTPUT (policy ACCEPT)
num target prot opt source destination
Chain POSTROUTING (policy ACCEPT)
num target prot opt source destination
Table: filter
Chain INPUT (policy ACCEPT)
num target prot opt source destination
1 DROP all -- 109.169.74.200 0.0.0.0/0
2 DROP icmp -- 0.0.0.0/0 0.0.0.0/0 icmp type 8
3 DROP all -- 109.169.74.200 0.0.0.0/0
Chain FORWARD (policy DROP)
num target prot opt source destination
1 DROP icmp -- 0.0.0.0/0 0.0.0.0/0 icmp type 8
Chain OUTPUT (policy ACCEPT)
num target prot opt source destination
1 DROP icmp -- 0.0.0.0/0 0.0.0.0/0 icmp type 8
2 DROP icmp -- 0.0.0.0/0 0.0.0.0/0 icmp type 8
Chain syn-flood (0 references)
num target prot opt source destination
1 RETURN all -- 0.0.0.0/0 0.0.0.0/0 limit: avg 10/sec burst 50
2 LOG all -- 0.0.0.0/0 0.0.0.0/0 LOG flags 0 level 4 prefix `SYN flood: '
3 DROP all -- 0.0.0.0/0 0.0.0.0/0
Please, can anyone help me to understand what's happened and how to solve it definetly ?
Thanks in advance for any suggestion,
Leofire