I have a strange problem with a couple of my dedicated servers. I use them to host OpenVZ VPSs, and they have been running without incident for weeks -- but in the last couple of days, at random intervals, one of them will start experiencing huge lag spikes (ping in the thousands of ms) and usually times out eventually. If I let it go long enough, the ping returns to normal, but then it falls into this pattern:
64 bytes from xx.xx.xx.xx: icmp_seq=54048 ttl=42 time=133.463 ms
64 bytes from xx.xx.xx.xx: icmp_seq=54049 ttl=42 time=135.088 ms
Request timeout for icmp_seq 119586
Request timeout for icmp_seq 119587
Request timeout for icmp_seq 119588
Request timeout for icmp_seq 119589
Request timeout for icmp_seq 119590
Request timeout for icmp_seq 119591
Request timeout for icmp_seq 119592
Request timeout for icmp_seq 119593
Request timeout for icmp_seq 119594
Request timeout for icmp_seq 119595
Request timeout for icmp_seq 119596
Request timeout for icmp_seq 119597
Request timeout for icmp_seq 119598
Request timeout for icmp_seq 119599
64 bytes from xx.xx.xx.xx: icmp_seq=54064 ttl=42 time=133.666 ms
64 bytes from xx.xx.xx.xx: icmp_seq=54065 ttl=42 time=135.369 ms
It repeats like that, pinging a few times then timing out for several seconds, then a burst of normal pings, etc.
The only way to fix it is to use the host's automatic hardware reset request option.
So far this has happened on 2 different servers (not simultaneously). They're not similar besides each running CentOS and hosting VPSs. I have other VPS servers that this has never happened to...
I'm thinking this is some kind of network error in the DC, but when I ask them about it they say their network has had no problems and that they need a two-way traceroute/ping to diagnose the problem.
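To put numbers on that pattern before going back to the DC, a saved ping log can be summarized with something like the sketch below. It assumes the log uses the same two line formats as the output quoted above; the function name `summarize_ping_log` is made up for illustration.

```python
import re

# Line formats as they appear in the ping output quoted above.
REPLY_RE = re.compile(r"icmp_seq=(\d+) ttl=\d+ time=([\d.]+) ms")
TIMEOUT_RE = re.compile(r"Request timeout for icmp_seq (\d+)")


def summarize_ping_log(lines):
    """Return (reply_count, timeout_count, burst_lengths, max_rtt_ms).

    burst_lengths lists the length of each run of consecutive timeouts,
    in the order the runs appear in the log.
    """
    replies = 0
    timeouts = 0
    bursts = []
    current = 0      # length of the timeout run currently in progress
    max_rtt = 0.0
    for line in lines:
        reply = REPLY_RE.search(line)
        if reply:
            replies += 1
            max_rtt = max(max_rtt, float(reply.group(2)))
            if current:
                # A reply ends the current run of timeouts.
                bursts.append(current)
                current = 0
        elif TIMEOUT_RE.search(line):
            timeouts += 1
            current += 1
    if current:
        bursts.append(current)
    return replies, timeouts, bursts, max_rtt
```

For example, after saving the output with `ping xx.xx.xx.xx > ping.log` (the file name is arbitrary), `summarize_ping_log(open("ping.log"))` gives the reply count, the timeout count, the length of each timeout burst, and the worst round-trip time seen.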
Just as you captured the ping output, run traceroutes to the server IPs while the problem is occurring. A traceroute shows at which hop the latency or packet loss begins, which gives a much clearer idea of where the problem is. It could be a network problem at the DC, an intermittent problem somewhere along the route, or, less likely, an attack on your server.
So a traceroute, together with monitoring the server's processes while the problem occurs, should help you get to a solution.
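Since the problem shows up at random intervals, it may be easier to let a small script grab the traceroute for you the moment a ping fails. This is only a rough sketch: it assumes the standard `ping` and `traceroute` commands are installed, and the function names, the `trace.log` file name and the 5-second interval are arbitrary choices.

```python
import subprocess
import time
from datetime import datetime


def host_is_up(host):
    """Send one ICMP echo request; True if ping exits with status 0."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=15,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def capture(cmd, timeout=120):
    """Run a command and return its output, or a note if it could not run."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        return done.stdout + done.stderr
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"could not run {cmd[0]}: {exc}\n"


def format_entry(title, body, when=None):
    """Format one timestamped log entry."""
    when = when or datetime.now()
    return f"=== {when:%Y-%m-%d %H:%M:%S} {title} ===\n{body.rstrip()}\n\n"


def watch(host, commands, logfile="trace.log", interval=5):
    """Ping host forever; whenever a ping fails, log the output of each command."""
    while True:
        if not host_is_up(host):
            with open(logfile, "a") as log:
                for cmd in commands:
                    log.write(format_entry(" ".join(cmd), capture(cmd)))
        time.sleep(interval)
```

From your own machine you could call `watch("xx.xx.xx.xx", [["traceroute", "-n", "xx.xx.xx.xx"]])`. Run on the server itself, with your own IP as the target and `["top", "-b", "-n", "1"]` added to the command list, the same loop would give you the reverse traceroute for the DC plus a snapshot of the running processes at the time of the failure.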
It could be a DoS attack. Run 'iptraf' to see which IP is consuming the most bandwidth. You should also do as the DC asks and run traceroute tests from several different locations; that will help diagnose the issue.
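If iptraf is not installed, a rough first check is to count open connections per remote IP. Note that this is a different measure from what iptraf shows (connections, not bandwidth), and the sketch below assumes the usual Linux `netstat -ntu` column layout, with the foreign address in the fifth column; the function name is made up.

```python
from collections import Counter


def count_remote_ips(netstat_text):
    """Count connections per remote IP in `netstat -ntu` style output."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip the header lines; only tcp/udp rows carry addresses.
        if len(fields) < 5 or not fields[0].startswith(("tcp", "udp")):
            continue
        # The foreign address is "ip:port"; drop the port after the last colon.
        remote_ip = fields[4].rsplit(":", 1)[0]
        counts[remote_ip] += 1
    return counts
```

Feed it the output of `netstat -ntu` (for example via `subprocess.run(["netstat", "-ntu"], capture_output=True, text=True).stdout`) and look at `count_remote_ips(text).most_common(10)`. A single address holding hundreds of connections would be worth a closer look.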