Thread: SAVVIS has routing issues?
-
09-13-2006, 07:32 AM #1Web Hosting Guru
- Join Date
- Feb 2006
- Posts
- 345
SAVVIS has routing issues?
We've been having off-and-on slow-downs and have traced them to issues within SAVVIS routing. We bought servers from Servstra/LayeredTech/SAVVIS to get into the SAVVIS data center so we wouldn't have these problems. Traceroutes performed to servers outside of the data center result in no errors. Traceroutes performed to servers inside SAVVIS always have 2 to 4 errors, with the timeout set at the default 4 seconds. All errors occur once the trace hits the SAVVIS network. The routing problems are unrelated to traffic volume. They have been happening since yesterday and are still going on at 5 AM EDT. This problem has no doubt been going on for some time, and now I'm beginning to doubt the wisdom of being in the SAVVIS data center. Server-to-server connections between two of our SAVVIS servers inside the data center, on the same Class B, seem fast. Traceroutes run from the servers in the data center to servers outside the data center show the same problem as traceroutes run to them from the outside.
-
09-13-2006, 09:48 AM #2.
- Join Date
- Jun 2006
- Location
- East Coast // NYC
- Posts
- 1,698
Which Savvis datacenter are you referring to? Our servers are in the Jersey City, NJ datacenter and have had absolutely no problems -- just incredible pings, blazing fast bandwidth.
-
09-13-2006, 10:04 AM #3Web Hosting Guru
- Join Date
- Feb 2006
- Posts
- 345
Originally Posted by MrRadic
When I checked originally, I settled here because it was the fastest I could find of any data center. However, I'm seeing, and most likely have been for quite a while, a problem at the SAVVIS TX center. The border router seems to be OK in this most recent trace, but it had problems earlier as well.
1 1 ms <1 ms <1 ms 10.1.128.1
2 11 ms 9 ms 9 ms My external IP
3 11 ms 9 ms 18 ms 12.244.250.193
4 16 ms 15 ms 13 ms 12.118.112.9
5 25 ms 93 ms 21 ms tbr2-p012401.dtrmi.ip.att.net [12.123.139.142]
6 24 ms 95 ms 22 ms tbr2-cl18.cgcil.ip.att.net [12.122.10.134]
7 25 ms 19 ms 19 ms ggr2-p390.cgcil.ip.att.net [12.123.6.37]
8 23 ms 20 ms 30 ms 192.205.33.154
9 21 ms 23 ms 22 ms dcr2-so-5-0-0.Chicago.savvis.net [204.70.192.46]
10 45 ms 47 ms 43 ms dcr1-so-4-2-0.Denver.savvis.net [204.70.193.221]
11 56 ms 55 ms 64 ms dcr1-so-0-0-0.dallas.savvis.net [204.70.192.94]
12 56 ms 53 ms 52 ms bhr1-pos-12-0.fortworthda1.savvis.net [208.172.131.82]
13 49 ms 61 ms 45 ms 216.39.66.26
14 * 55 ms * 216.39.66.26
15 * * 48 ms 154.205.My.IP.reverse.layeredtech.com [72.36.Server.IP]
This happens with both of our servers in the DC. The servers are configured completely differently, and another one that I don't own or maintain has the same problem. It's just as big a mess when you run traceroute from the server.
-
09-13-2006, 10:30 AM #4.
- Join Date
- Jun 2006
- Location
- East Coast // NYC
- Posts
- 1,698
If you'd like, I can try a tracert from one of our servers -- it may just be your ISP. Let me know.
-
09-13-2006, 10:39 AM #5Web Hosting Guru
- Join Date
- Feb 2006
- Posts
- 345
Originally Posted by MrRadic
-
09-13-2006, 10:47 AM #6.
- Join Date
- Jun 2006
- Location
- East Coast // NYC
- Posts
- 1,698
I also sent you a PM with this --
Tracing route to XXXXX.org [72.36.HIS.IP]
over a maximum of 30 hops:
1 <1 ms <1 ms <1 ms reliablesite.net [64.237.33.193]
2 <1 ms <1 ms <1 ms 0.te1-1.cr1.ewr1.choopa.net [64.237.32.158]
3 <1 ms <1 ms <1 ms ge-6-21.car2.Newark1.Level3.net [4.79.236.9]
4 <1 ms <1 ms <1 ms ae-1-55.bbr1.Newark1.Level3.net [4.68.99.129]
5 <1 ms 3 ms <1 ms ae-0-0.bbr1.NewYork1.Level3.net [64.159.1.41]
6 1 ms 1 ms 1 ms ge-6-0-0-55.gar3.NewYork1.Level3.net [4.68.97.132]
7 1 ms <1 ms <1 ms dcr6-so-6-1-0.NewYork.savvis.net [4.68.127.206]
8 5 ms 5 ms 7 ms bcs2-so-4-0-0.Washington.savvis.net [204.70.192.1]
9 5 ms 19 ms 5 ms bcs1-so-7-0-0.Washington.savvis.net [204.70.192.33]
10 38 ms 39 ms 19 ms dcr1-so-3-0-0.Atlanta.savvis.net [204.70.192.53]
11 39 ms 19 ms 19 ms dcr2-as0-0.Atlanta.savvis.net [204.70.192.42]
12 40 ms 40 ms 39 ms csr2-ve240.fortworthda1.savvis.net [216.39.64.35]
13 40 ms 38 ms 40 ms 216.39.66.26
14 39 ms 39 ms * bhr1-pos-12-0.fortworthda1.savvis.net [208.172.131.82]
15 40 ms 40 ms 39 ms 154.205.HIS.IP.reverse.layeredtech.com [72.36.SERVER.IP]
Trace complete.
-
09-13-2006, 10:59 AM #7Web Hosting Guru
- Join Date
- Feb 2006
- Posts
- 345
Yours is as good as I've gotten. You got bombed only once at the perimeter, and none inside. Most of mine didn't bomb at the perimeter. I just did it twice in a row. One time it bombed twice, and a minute later 4 times.
-
09-13-2006, 12:28 PM #8Web Hosting Master
- Join Date
- Jul 2003
- Location
- Texas
- Posts
- 787
The results you are seeing are normal. Your traceroutes are trying to hit devices which do not have reverse DNS records set up for them, so you will get a failure there, and which also either do not respond to ICMP packets or treat them with a low priority. Once you are inside our network and pass the edge routers we use RFC 1918 IP space which is also blocked on outbound packets so any inbound traceroute will show a * or similar on the first hop in.
$ traceroute -n 72.36.154.xxx
traceroute to 72.36.154.xxx(72.36.154.xxx), 64 hops max, 40 byte packets
1 209.67.208.177 0.348 ms 0.302 ms 0.307 ms <-- Pod 1 Host
2 10.1.3.1 0.495 ms 0.450 ms 0.382 ms
3 216.39.69.49 0.426 ms 0.330 ms 0.407 ms
4 216.39.64.41 0.488 ms 0.375 ms 0.297 ms
5 216.39.64.26 0.945 ms 0.663 ms 0.610 ms
6 216.39.69.238 1.131 ms 1.056 ms 0.970 ms
7 10.1.4.14 0.933 ms 0.879 ms 0.853 ms
8 72.36.154.xxx 0.759 ms 0.714 ms 0.669 ms <-- Your Host
This is a traceroute to your server from our POD1 network, which is located in the same DC but uses a 100% diverse network, to your host, which is on our POD2 network. You can see that with the -n flag there are no errors, and the trace shows the IP of the host where you would typically see a *.
Use the -n flag, which prevents an nslookup of each IP, and you will get the same results.
You can also test the throughput of the connection by downloading a test file via wget / fetch on your host; it should get 1-1.2 MB/s on a 10 Mb/s link or 10-12 MB/s on a 100 Mb/s link.
Thanks,
Jeremy
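The expected figures quoted in this post follow from dividing the link speed by 8 bits per byte and then allowing for protocol overhead. A minimal sketch of that arithmetic is below. The 80% and 96% efficiency factors are assumptions chosen only to bracket the numbers in the post; they are not figures published by the provider.

```python
def expected_mb_per_s(link_mbit, efficiency):
    """Convert a link speed in megabits/s to megabytes/s at a given efficiency."""
    return link_mbit / 8.0 * efficiency


def expected_range(link_mbit, low_eff=0.80, high_eff=0.96):
    """Return (low, high) expected throughput in MB/s.

    low_eff and high_eff are assumed overhead factors (hypothetical),
    not measured values.
    """
    return (round(expected_mb_per_s(link_mbit, low_eff), 2),
            round(expected_mb_per_s(link_mbit, high_eff), 2))


for link in (10, 100):
    low, high = expected_range(link)
    print(f"{link} Mb/s link: roughly {low}-{high} MB/s")
```

A transfer that lands far below the low end of the range for its link is worth investigating; one inside the range is behaving as the post describes.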
-
09-13-2006, 01:08 PM #9Web Hosting Guru
- Join Date
- Feb 2006
- Posts
- 345
Jeremy,
>Once you are inside our network and pass the edge routers we use RFC 1918 IP space which is also blocked on outbound packets so any inbound traceroute will show a * or similar on the first hop in.<
Thank you HUGE for that reply. We have a ZABBIX install that we are not done with for this server. This has been hugely frustrating because the slow-downs happen at random times and do not follow the normal daily traffic curve. When we need to move a large site to a different data center to serve our customers, that's a problem. When the slow-downs occur, the server load goes down with them. We were just about to close on two more high-end servers from you with mirror and fail-over and are getting concerned about what we see.
I will post back results when I get them.
Thanks again!
-
09-13-2006, 01:21 PM #10Web Hosting Guru
- Join Date
- Feb 2006
- Posts
- 345
Originally Posted by LTADMIN
Here is my latest without looking up the IP:
Trace
1 1 ms <1 ms <1 ms 10.1.128.1
2 18 ms 10 ms 8 ms 73.43.source.address
3 11 ms 17 ms 8 ms 12.244.250.193
4 26 ms 15 ms 13 ms 12.118.112.9
5 23 ms 21 ms 25 ms 12.123.139.142
6 22 ms 22 ms 21 ms 12.122.10.134
7 30 ms 103 ms 21 ms 12.123.6.37
8 56 ms 21 ms 23 ms 192.205.33.154
9 22 ms 21 ms 20 ms 204.70.192.46
10 45 ms 43 ms 48 ms 204.70.192.98
11 60 ms 64 ms 53 ms 208.172.129.230
12 55 ms 53 ms 53 ms 208.172.131.82
13 61 ms 47 ms 59 ms 216.39.64.59
14 * * 56 ms 216.39.66.26
15 49 ms * * 72.36.destination address
16 48 ms 54 ms 47 ms 72.36.destination address
Trace complete.
-
09-13-2006, 01:27 PM #11.
- Join Date
- Jun 2006
- Location
- East Coast // NYC
- Posts
- 1,698
Usually a * means a packet was lost due to no reply. But if Jeremy says that the routers treat those packets with low importance, they could just be ignoring them at times -- which also means that during no-load times (3am) the routers should almost always respond to each ping.
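To compare traces like the ones in this thread, it can help to count the unanswered probes per hop instead of eyeballing the stars. The sketch below parses Windows tracert-style lines (three probe columns, then the host) and lists the hops with at least one timeout. The parser and its function names are illustrative, and the sample lines are two hops copied from a trace posted earlier in the thread. A star on an intermediate hop, with later hops answering normally, is consistent with a router deprioritizing its own ICMP replies rather than dropping forwarded traffic.

```python
def parse_hop(line):
    """Parse one tracert line into (hop, replies, timeouts, host), or None."""
    tokens = line.split()
    if len(tokens) < 4 or not tokens[0].isdigit():
        return None
    hop = int(tokens[0])
    i, replies, timeouts = 1, 0, 0
    for _ in range(3):  # tracert sends three probes per hop
        if i >= len(tokens):
            return None
        if tokens[i] == "*":
            timeouts += 1
            i += 1
        elif i + 1 < len(tokens) and tokens[i + 1] == "ms":
            replies += 1  # covers both "23 ms" and "<1 ms"
            i += 2
        else:
            return None
    return hop, replies, timeouts, " ".join(tokens[i:])


def lossy_hops(trace_text):
    """Return the parsed hops that had at least one unanswered probe."""
    hops = (parse_hop(line) for line in trace_text.splitlines())
    return [h for h in hops if h is not None and h[2] > 0]


SAMPLE = """\
13 61 ms 47 ms 59 ms 216.39.64.59
14 * * 56 ms 216.39.66.26
"""

for hop, replies, timeouts, host in lossy_hops(SAMPLE):
    print(f"hop {hop} ({host}): {timeouts} of 3 probes unanswered")
```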
-
09-13-2006, 01:37 PM #12Web Hosting Guru
- Join Date
- Feb 2006
- Posts
- 345
Originally Posted by MrRadic
-
09-14-2006, 01:17 PM #13Newbie
- Join Date
- Apr 2006
- Posts
- 14
I have the same issue: pings get lost at the border router, although pings to my server itself are NEVER lost.
I guess the border router treats pings as low priority, as he said. (I am in India, so these ping times to US servers are normal.)
1 <1 ms <1 ms <1 ms 192.168.1.1
2 28 ms 27 ms 25 ms 61.17.201.1
3 53 ms 36 ms 27 ms 202.54.10.62
4 32 ms 31 ms 29 ms 203.197.72.150
5 44 ms 31 ms 31 ms 202.54.2.162
6 252 ms 231 ms 247 ms 202.54.2.130
7 366 ms 311 ms 311 ms 204.70.151.29
8 312 ms 311 ms 311 ms 204.70.193.45
9 297 ms 295 ms 297 ms 204.70.192.1
10 296 ms 311 ms 297 ms 204.70.192.33
11 299 ms 300 ms 309 ms 204.70.192.53
12 298 ms 309 ms 309 ms 204.70.192.42
13 302 ms 301 ms 547 ms 216.39.64.59
14 350 ms 307 ms 299 ms 216.39.66.26
15 300 ms 299 ms * 208.172.131.82
16 299 ms 299 ms 301 ms 72.36.my.ip
Last edited by [MaxX]; 09-14-2006 at 01:23 PM.
-
09-14-2006, 01:46 PM #14Web Hosting Guru
- Join Date
- Feb 2006
- Posts
- 345
Originally Posted by [MaxX]
1 1 ms <1 ms 1 ms 10.1.128.1
2 12 ms 8 ms 9 ms 73.43.My.IP
3 10 ms 9 ms 9 ms 12.244.250.193
4 17 ms 26 ms 13 ms 12.118.112.9
5 35 ms 22 ms 21 ms 12.123.139.142
6 24 ms 21 ms 21 ms 12.122.10.134
7 22 ms 21 ms 21 ms 12.123.6.37
8 22 ms 21 ms 21 ms 192.205.33.154
9 * 29 ms * 204.70.192.46
10 46 ms 46 ms 44 ms 204.70.193.221
11 55 ms 52 ms 59 ms 204.70.192.94
12 56 ms 52 ms 63 ms 208.172.131.82
13 49 ms 59 ms 48 ms 216.39.66.26
14 * 55 ms * 216.39.66.26
15 * * 48 ms 72.36.Server.IP
Note:
- There are no stars except on the SAVVIS network.
- Check the last three lines
I ran a packet sniffer on an FTP session this morning
17248 My.Server DELL_9400 82 0:05:21.056441 FTP Data Src= 20,Dst= 5006,.A....,S=4062253421,L= 0,A=2774573529,W=33304 TCP Slow Segment Recovery (1.040303 seconds from packet 17,184)
17249 My.Server DELL_9400 82 0:05:21.056716 FTP Data Src= 20,Dst= 5006,.A....,S=4062253421,L= 0,A=2774573529,W=31856
17250 My.Server DELL_9400 70 0:05:21.087577 FTP Data Src= 20,Dst= 5006,.A....,S=4062253421,L= 0,A=2774621313,W= 9412 TCP Slow Segment Recovery (1.038276 seconds from packet 17,186)
17251 My.Server DELL_9400 70 0:05:21.087854 FTP Data Src= 20,Dst= 5006,.A....,S=4062253421,L= 0,A=2774621313,W=13508
17252 My.Server DELL_9400 70 0:05:21.088145 FTP Data Src= 20,Dst= 5006,.A....,S=4062253421,L= 0,A=2774621313,W=21700
17253 My.Server DELL_9400 70 0:05:21.088400 FTP Data Src= 20,Dst= 5006,.A....,S=4062253421,L= 0,A=2774621313,W=17604
17254 My.Server DELL_9400 70 0:05:21.088689 FTP Data Src= 20,Dst= 5006,.A....,S=4062253421,L= 0,A=2774621313,W=25796
17255 My.Server DELL_9400 70 0:05:21.088956 FTP Data Src= 20,Dst= 5006,.A....,S=4062253421,L= 0,A=2774621313,W=29892
17256 My.Server DELL_9400 70 0:05:21.089223 FTP Data Src= 20,Dst= 5006,.A....,S=4062253421,L= 0,A=2774621313,W=33304
17257 My.Server DELL_9400 82 0:05:21.158471 FTP Data Src= 20,Dst= 5006,.A....,S=4062253421,L= 0,A=2774622761,W=33304
17258 My.Server DELL_9400 90 0:05:21.192125 FTP Data Src= 20,Dst= 5006,.A....,S=4062253421,L= 0,A=2774622761,W=33304 TCP Slow Acknowledgement (1.001926 seconds from packet 17,202)
17259 DELL_9400 My.Server 1518 0:05:21.192300 FTP Data Src= 5006,Dst= 20,.A....,S=2774656065,L= 1448,A=4062253421,W=65535
17260 My.Server DELL_9400 90 0:05:21.218485 FTP Data Src= 20,Dst= 5006,.A....,S=4062253421,L= 0,A=2774622761,W=33304 TCP Slow Acknowledgement (1.000211 seconds from packet 17,204)
17261 DELL_9400 My.Server 1506 0:05:21.218647 FTP Data Src= 5006,Dst= 20,.A....,S=2774622761,L= 1436,A=4062253421,W=65535 TCP Retransmission
17262 DELL_9400 My.Server 82 0:05:21.218712 FTP Data Src= 5006,Dst= 20,.A....,S=2774624197,L= 12,A=4062253421,W=65535 TCP Retransmission
17263 DELL_9400 My.Server 1506 0:05:21.218770 FTP Data Src= 5006,Dst= 20,.A....,S=2774625657,L= 1436,A=4062253421,W=65535 Non-Responsive Server
17264 DELL_9400 My.Server 82 0:05:21.218823 FTP Data Src= 5006,Dst= 20,.A....,S=2774627093,L= 12,A=4062253421,W=65535 Non-Responsive Server
The trace was not full of these, and the speed would in any case be limited to my upstream speed. However, I didn't get these kinds of errors on the non-SAVVIS server. There the download went well, with no more than the normal errors.
-
09-14-2006, 09:01 PM #15PHP for breakfast
- Join Date
- May 2004
- Location
- Lansing, MI, USA
- Posts
- 1,548
Hey guys, when you're trying to hide your IP, be tactful about it...
15 * * 48 ms 154.205.My.IP.reverse.layeredtech.com [72.36.Server.IP]
... reverse dns records ... default to displaying in reverse. So, you blocked out the first two octets on the reverse resolve, and the last two on the direct ip... and ... one can put one and the other together to get your IP.
Just figured I would point that out as I'd seen a couple of people do it in this thread.
Jacob - WebOnce Technologies - 30 Day 100% Satisfaction Guarantee - Over 5 Years Going Strong!
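The point about partial masking can be shown in a few lines. The sketch below assumes a reverse-DNS-style name whose first four labels are the IP octets in reverse order, as in the hostnames posted in this thread, and fills in the masked octets of the bracketed IP from the unmasked labels of the name. The function names are illustrative, and the example uses a documentation address range and a made-up domain, not anyone's real server.

```python
def ip_from_reverse_name(name):
    """Recover the IP when all four leading labels are octets in reverse order."""
    octets = name.split(".")[:4]
    if len(octets) < 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        return None
    return ".".join(reversed(octets))


def combine_partial(reverse_name, partial_ip):
    """Fill masked octets of partial_ip from the labels of reverse_name.

    Both inputs may contain placeholder words (e.g. "My", "Server") in
    place of some octets. Returns None if an octet is masked in both.
    """
    from_name = reverse_name.split(".")[:4][::-1]  # put labels in forward order
    from_ip = partial_ip.split(".")
    if len(from_name) != 4 or len(from_ip) != 4:
        return None
    octets = []
    for direct, reversed_label in zip(from_ip, from_name):
        if direct.isdigit():
            octets.append(direct)
        elif reversed_label.isdigit():
            octets.append(reversed_label)
        else:
            return None
    return ".".join(octets)


# Hypothetical example: the name hides the first two octets of the IP,
# the bracketed IP hides the last two, and together they reveal all four.
print(combine_partial("10.2.My.IP.reverse.example.com", "192.0.Server.IP"))
```

Masking the same octets in both the hostname and the IP, or dropping the hostname entirely, avoids the leak.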
-
09-14-2006, 10:34 PM #16Web Hosting Guru
- Join Date
- Feb 2006
- Posts
- 345
Originally Posted by WO-Jacob
-
09-14-2006, 11:46 PM #17WHT Addict
- Join Date
- Jun 2004
- Location
- Canada
- Posts
- 132
Post tcptraceroutes, mtrs and tcpdumps.
-
09-15-2006, 05:57 PM #18Web Hosting Guru
- Join Date
- Feb 2006
- Posts
- 345
Originally Posted by comm
Traceroute on the other one shows no errors, no repeated IP addresses, and 11 hops instead of 16.
-
09-16-2006, 10:44 PM #19Web Hosting Guru
- Join Date
- Feb 2006
- Posts
- 345
I have pkt files from the packet sniffer that show standard HTML pages and PHP pages. Both HTML and PHP have the same problem: delays of as long as 15 seconds before responding. It happens several times during the page loads. Sometimes you get 4, 6, 10, or 12 second delays, but 15 is a popular number, which coincides with the Keep-Alive timeout. The VPS server is also in the same data center and it does not have the problem.
FTP transfer for a 64MB test zip file:
- From the SAVVIS dedicated server to my notebook 5:00 minutes avg.
- From the SAVVIS VPS server to my notebook 1:46 minutes avg.
- From the Non-SAVVIS server to my notebook 1:50 minutes avg.
*Big difference in FTP performance even from two servers in the same data center.
- From the SAVVIS dedicated server to the SAVVIS VPS server 1:13
- From the SAVVIS VPS server to the SAVVIS dedicated server 1:10
WGET transfer for a 64MB test zip file:
- From the SAVVIS Test Server to dedicated server using wget, 7 seconds.
With the FTP protocol I don't see large lags that I see with HTML/PHP, but I do get Low Throughput messages from the packet traces. I commented out eAccelerator from the PHP.INI and restarted Apache, but it made no difference, so I put it back.
Any kind of Apache problem does not explain the wide gap in FTP performance. I have no way of knowing if an in-data center ftp transfer should take that long because I have nothing to compare it with. All that I know is wget takes 7 seconds from the test server.
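Converting the timings above into throughput makes the gap easier to compare against the expected link figures. This is a minimal sketch that assumes "64MB" means 64 megabytes and that the times are minutes:seconds; the labels are shorthand for the transfers listed in this post.

```python
FILE_MB = 64  # size of the test zip file, taken as megabytes

# Transfer times as posted, in minutes:seconds.
TIMINGS = {
    "SAVVIS dedicated -> notebook (FTP)": "5:00",
    "SAVVIS VPS -> notebook (FTP)": "1:46",
    "non-SAVVIS -> notebook (FTP)": "1:50",
    "SAVVIS dedicated -> SAVVIS VPS (FTP)": "1:13",
    "SAVVIS VPS -> SAVVIS dedicated (FTP)": "1:10",
    "SAVVIS test server -> dedicated (wget)": "0:07",
}


def to_seconds(mmss):
    """Convert an 'M:SS' string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def throughput_mb_s(size_mb, mmss):
    """Average throughput in MB/s for a transfer of size_mb taking mmss."""
    return size_mb / to_seconds(mmss)


for label, elapsed in TIMINGS.items():
    print(f"{label:40s} {throughput_mb_s(FILE_MB, elapsed):5.2f} MB/s")
```

On these numbers the dedicated server delivers about 0.21 MB/s to the notebook, against roughly 0.6 MB/s from the other two servers over the same last mile, while the in-data-center wget runs at a little over 9 MB/s.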
A couple of interesting traceroutes. No time-outs when done internally, even without the -n option, even on the same router.
- Laptop to SAVVIS VPS server
1 <1 ms <1 ms <1 ms 10.1.128.1
2 10 ms 13 ms 15 ms 73.43.32.1
3 8 ms 7 ms 19 ms 12.244.250.193
4 13 ms 17 ms 16 ms 12.118.112.9
5 22 ms 21 ms 35 ms 12.123.139.142
6 22 ms 21 ms 23 ms 12.122.10.134
7 20 ms 23 ms 20 ms 12.123.6.69
8 24 ms 21 ms 20 ms 208.175.10.93
9 21 ms 22 ms 24 ms 204.70.192.46
10 44 ms 51 ms 43 ms 204.70.193.221
11 59 ms 52 ms 58 ms 204.70.192.94
12 46 ms 47 ms 53 ms 208.172.131.82
13 57 ms 60 ms 59 ms 216.39.64.18
14 * 60 ms * 216.39.69.238
15 59 ms * 55 ms 72.36.VPS.Server
- SAVVIS dedicated server to SAVVIS VPS server
1 153.205.36.72.reverse.layeredtech.com (72.36.205.153) 0.341 ms 0.842 ms 0.491 ms
2 10.1.5.9 (10.1.5.9) 0.653 ms 0.510 ms 0.406 ms
3 216.39.79.37 (216.39.79.37) 0.367 ms 0.441 ms 0.333 ms
4 bhr1-po-1.fortworthda1.savvis.net (216.39.64.33) 0.590 ms 0.862 ms *
5 216.39.64.18 (216.39.64.18) 1.326 ms 1.129 ms 1.306 ms
6 216.39.69.238 (216.39.69.238) 1.117 ms 0.689 ms 0.724 ms
7 10.1.4.14 (10.1.4.14) 0.849 ms 0.640 ms 0.915 ms
8 www.blurstorm.com (72.36.VPS.Server) 0.969 ms 1.339 ms 1.957 ms
Summary:
One cannot make a case for Apache/PHP because of the unrelated abysmal FTP performance. Moreover, FTP performance internal to the SAVVIS data center is on par with other servers. There isn't much wrong with the wget times to the test server, which work out to approximately 9MB/sec, not far from the 10MB-12MB/sec expected. Thus it isn't a server throughput problem. Since the site always has traffic, one might think that it's the traffic. I unloaded Apache and did the FTP timings again. The timings are within 8 seconds of each other, with the lower time being with Apache loaded. So far, everything still points toward a network problem.
-
09-16-2006, 11:06 PM #20Web Hosting Guru
- Join Date
- Feb 2006
- Posts
- 345
Note: When FTP downloading from the SAVVIS VPS server or the non-SAVVIS server, the needle and readout on the packet sniffer show a relatively steady 775 pps +- 70. In the case of the SAVVIS dedicated server, the load fluctuates wildly, averaging around 275 with lows of 151, gusting to 450 with occasional peak gusts of 511.
-
09-20-2006, 10:04 AM #21Web Hosting Guru
- Join Date
- Feb 2006
- Posts
- 345
Progress:
"The NOC technicians have updated us that they have reports of errors in the switch-port, and they shall move the server to a different port to sort this out."
"The NOC team has updated us that they have to move your host to a different slot on their network. But this may require a re-IP of your host to use the new locations port and IP address space. "
"I have exchanged the switch port on this server... " They also said that they will move it to another location if they cannot get the necessary throughput on the new port, or even move it to another DC if necessary. At least SAVVIS/LayeredTech/Servstra will kill the problem if you provide them with irrefutable evidence.
I have tested before and after the switch port change. The difference is astounding. Pages that took up to 54 seconds to load are now difficult to time. Whatever it is, it's less than 2 seconds, like our other servers. We used to see multiple 15 second delays, 11 second delays, 9 second delays, 6 second delays, and 2 second delays in one page load. We haven't been able to make that happen since yesterday. We will be testing again today. The maximum total of the delays that we see on a page load now is 1.75 seconds, which is what we see on our other sites both inside and outside of SAVVIS.
This experience has been massively expensive torture since the beginning many months ago. I don't understand why they either don't have SNMP monitoring on the switches, or didn't pay any attention to the errors. There is nothing natural about packet errors between a host and a switch. I assumed that being inside SAVVIS meant this was taken care of for me, so we spent our time working on the server to get it to perform better.
The bottom line for me is, there is no excuse for what happened. The Servstra/LayeredTech/SAVVIS connection will do what is necessary to fix the problem.
I appreciate all those who chipped in on this thread to help solve this problem. I'll keep you posted if there are further developments.
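For anyone wanting to watch for this kind of switch-port problem themselves, monitoring usually comes down to polling an interface error counter (for example IF-MIB ifInErrors, a 32-bit counter) and looking at the change between polls. The sketch below shows only that delta arithmetic. It assumes the counter values have already been fetched by some SNMP poller, the sample numbers are hypothetical, and nothing here reflects how the provider actually monitors its switches.

```python
COUNTER32_MODULUS = 2 ** 32  # a 32-bit SNMP counter wraps back to 0 at this value


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Increase between two counter samples, allowing for one wraparound."""
    return (current - previous) % modulus


def error_fraction(prev_errors, curr_errors, prev_delivered, curr_delivered):
    """Fraction of received frames that were errored between two polls.

    'delivered' stands for a count of good packets passed up the stack,
    so the total received is delivered plus errored.
    """
    errors = counter_delta(prev_errors, curr_errors)
    delivered = counter_delta(prev_delivered, curr_delivered)
    total = errors + delivered
    return errors / total if total else 0.0


# Hypothetical samples from two polls of one port.
print(counter_delta(4294967290, 5))          # counter wrapped between polls
print(error_fraction(100, 150, 1000, 1950))  # share of frames in error
```

On a healthy link between a host and its switch this fraction should sit at or very near zero, so even a small sustained value is a reason to open a ticket.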
-
09-20-2006, 12:50 PM #22Web Hosting Guru
- Join Date
- Feb 2006
- Posts
- 345
Originally Posted by IT_Architect
-
09-20-2006, 10:09 PM #23Web Hosting Guru
- Join Date
- Feb 2006
- Posts
- 345
We're going to call it good enough and close the issue. It's not completely clean, but it is on par with our other servers inside and outside of the SAVVIS DC. At least the sites are serviceable and we may migrate some of them back that we rushed off to VPSs.
Thanks All!
-
09-21-2006, 05:17 PM #24Web Hosting Guru
- Join Date
- Feb 2006
- Posts
- 345
They said they wanted to wait a couple of days and test some more before closing the ticket. The sites on the dedicated server are performing very well today. In fact, they are outperforming our VPS sites for the first time, as they should. While this problem should never have happened, I'm satisfied with the persistence and commitment that SAVVIS / LayeredTech / Servstra exhibited to resolve the root cause of the problem. I plan to try a couple more servers with them, except this time I will ensure that their communications path can handle the traffic before I go live with them.
Last edited by IT_Architect; 09-21-2006 at 05:29 PM.