  1. #1
    Join Date
    Feb 2006
    Posts
    345

    SAVVIS has routing issues?

    We've been having off-and-on slow-downs and have traced them to issues within SAVVIS routing. We bought servers from Servstra/LayeredTech/SAVVIS to get into the SAVVIS data center so we wouldn't have these problems. Traceroutes to servers outside the data center show no errors. Traceroutes to servers inside SAVVIS always show 2 to 4 errors, with the timeout at the default 4 seconds, and all of the errors occur once the trace reaches the SAVVIS network. The routing problems are unrelated to traffic volume. They have been happening since yesterday and were still going on at 5 AM EDT. This problem has no doubt been going on for some time, and now I'm beginning to doubt the wisdom of being in the SAVVIS data center. Server-to-server connections between two of our SAVVIS servers inside the data center, on the same Class B, seem fast. Traceroutes run from the servers in the data center to servers outside it show the same problem as traceroutes run to them from the outside.

  2. #2
    Join Date
    Jun 2006
    Location
    East Coast // NYC
    Posts
    1,698
    Which Savvis datacenter are you referring to? Our servers are in the Jersey City, NJ datacenter and have had absolutely no problems -- just incredible pings, blazing fast bandwidth.

  3. #3
    Join Date
    Feb 2006
    Posts
    345
    Quote Originally Posted by MrRadic
    Which Savvis datacenter are you referring to? Our servers are in the Jersey City, NJ datacenter and have had absolutely no problems -- just incredible pings, blazing fast bandwidth.
    These are the Texas ones.

    When I checked originally, I settled here because it was the fastest data center I could find. However, I'm seeing (and most likely have been for quite a while) a problem at the SAVVIS TX center. The border router seems OK in this most recent trace, but it showed problems earlier as well.

    1 1 ms <1 ms <1 ms 10.1.128.1
    2 11 ms 9 ms 9 ms My external IP
    3 11 ms 9 ms 18 ms 12.244.250.193
    4 16 ms 15 ms 13 ms 12.118.112.9
    5 25 ms 93 ms 21 ms tbr2-p012401.dtrmi.ip.att.net [12.123.139.142]
    6 24 ms 95 ms 22 ms tbr2-cl18.cgcil.ip.att.net [12.122.10.134]
    7 25 ms 19 ms 19 ms ggr2-p390.cgcil.ip.att.net [12.123.6.37]
    8 23 ms 20 ms 30 ms 192.205.33.154
    9 21 ms 23 ms 22 ms dcr2-so-5-0-0.Chicago.savvis.net [204.70.192.46]
    10 45 ms 47 ms 43 ms dcr1-so-4-2-0.Denver.savvis.net [204.70.193.221]
    11 56 ms 55 ms 64 ms dcr1-so-0-0-0.dallas.savvis.net [204.70.192.94]
    12 56 ms 53 ms 52 ms bhr1-pos-12-0.fortworthda1.savvis.net [208.172.131.82]
    13 49 ms 61 ms 45 ms 216.39.66.26
    14 * 55 ms * 216.39.66.26
    15 * * 48 ms 154.205.My.IP.reverse.layeredtech.com [72.36.Server.IP]

    This happens with both of our servers in the DC. The servers are configured completely differently, and another one that I don't own or maintain has the same problem. It's just as big a mess when you run traceroute from the server.

  4. #4
    Join Date
    Jun 2006
    Location
    East Coast // NYC
    Posts
    1,698
    If you'd like, I can try a tracert from one of our servers -- it may just be your ISP, let me know.

  5. #5
    Join Date
    Feb 2006
    Posts
    345
    Quote Originally Posted by MrRadic
    If you'd like, I can try a tracert from one of our servers -- it may just be your ISP, let me know.
    I PMed you with the address of one of the servers. However, it won't be my ISP, because it works everywhere else. It's only SAVVIS TX where I see the problem. When I do a trace to your server, it comes out fine.

  6. #6
    Join Date
    Jun 2006
    Location
    East Coast // NYC
    Posts
    1,698
    I also sent you a PM with this --

    Tracing route to XXXXX.org [72.36.HIS.IP]
    over a maximum of 30 hops:

    1 <1 ms <1 ms <1 ms reliablesite.net [64.237.33.193]
    2 <1 ms <1 ms <1 ms 0.te1-1.cr1.ewr1.choopa.net [64.237.32.158]
    3 <1 ms <1 ms <1 ms ge-6-21.car2.Newark1.Level3.net [4.79.236.9]
    4 <1 ms <1 ms <1 ms ae-1-55.bbr1.Newark1.Level3.net [4.68.99.129]
    5 <1 ms 3 ms <1 ms ae-0-0.bbr1.NewYork1.Level3.net [64.159.1.41]
    6 1 ms 1 ms 1 ms ge-6-0-0-55.gar3.NewYork1.Level3.net [4.68.97.132]
    7 1 ms <1 ms <1 ms dcr6-so-6-1-0.NewYork.savvis.net [4.68.127.206]
    8 5 ms 5 ms 7 ms bcs2-so-4-0-0.Washington.savvis.net [204.70.192.1]
    9 5 ms 19 ms 5 ms bcs1-so-7-0-0.Washington.savvis.net [204.70.192.33]
    10 38 ms 39 ms 19 ms dcr1-so-3-0-0.Atlanta.savvis.net [204.70.192.53]
    11 39 ms 19 ms 19 ms dcr2-as0-0.Atlanta.savvis.net [204.70.192.42]
    12 40 ms 40 ms 39 ms csr2-ve240.fortworthda1.savvis.net [216.39.64.35]
    13 40 ms 38 ms 40 ms 216.39.66.26
    14 39 ms 39 ms * bhr1-pos-12-0.fortworthda1.savvis.net [208.172.131.82]
    15 40 ms 40 ms 39 ms 154.205.HIS.IP.reverse.layeredtech.com [72.36SERVER.IP]

    Trace complete.

  7. #7
    Join Date
    Feb 2006
    Posts
    345
    Yours is as good as I've gotten. You got bombed only once, at the perimeter, and none inside. Most of mine didn't bomb at the perimeter. I just did it twice in a row: one time it bombed twice, and a minute later 4 times.

  8. #8
    Join Date
    Jul 2003
    Location
    Texas
    Posts
    787
    The results you are seeing are normal. Your traceroutes are trying to hit devices which do not have reverse DNS records set up for them, so you will get a failure there; they also do not respond to ICMP packets, or treat them with a low priority. Once you are inside our network and past the edge routers, we use RFC 1918 IP space, which is also blocked on outbound packets, so any inbound traceroute will show a * or similar on the first hop in.

    $ traceroute -n 72.36.154.xxx
    traceroute to 72.36.154.xxx(72.36.154.xxx), 64 hops max, 40 byte packets
    1 209.67.208.177 0.348 ms 0.302 ms 0.307 ms <-- Pod 1 Host
    2 10.1.3.1 0.495 ms 0.450 ms 0.382 ms
    3 216.39.69.49 0.426 ms 0.330 ms 0.407 ms
    4 216.39.64.41 0.488 ms 0.375 ms 0.297 ms
    5 216.39.64.26 0.945 ms 0.663 ms 0.610 ms
    6 216.39.69.238 1.131 ms 1.056 ms 0.970 ms
    7 10.1.4.14 0.933 ms 0.879 ms 0.853 ms
    8 72.36.154.xxx 0.759 ms 0.714 ms 0.669 ms <-- Your Host

    That is a traceroute to your server from our POD1 network, located in the same DC but on a 100% diverse network path to your host, which is on our POD2 network. You can see that with the -n flag there are no errors, and it shows the IP of each hop where you would typically see a *.

    Use the -n flag, which skips the reverse DNS lookup (nslookup) of each IP, and you will get the same results.
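    To see which hops in a traceroute -n listing like the one above fall in RFC 1918 space, here is a minimal Python sketch using only the standard ipaddress module. The hop list is copied from the POD1 trace above (the masked final host is left out); the helper names are illustrative, and the sketch only classifies addresses, it does not test whether they are filtered.

```python
import ipaddress
from typing import List, Tuple

# The three RFC 1918 blocks, listed explicitly rather than relying on
# ip_address().is_private, which also covers loopback, link-local, etc.
RFC1918 = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(addr: str) -> bool:
    """True if addr falls inside one of the RFC 1918 private blocks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918)


def classify_hops(hops: List[str]) -> List[Tuple[str, str]]:
    """Label each hop address from a traceroute -n run as private or public."""
    return [(h, "RFC 1918" if is_rfc1918(h) else "public") for h in hops]


if __name__ == "__main__":
    hops = ["209.67.208.177", "10.1.3.1", "216.39.69.49", "216.39.64.41",
            "216.39.64.26", "216.39.69.238", "10.1.4.14"]
    for addr, kind in classify_hops(hops):
        print(f"{addr:16} {kind}")
```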

    You can also test the throughput of the connection by downloading a test file via wget / fetch on your host; it should get 1-1.2MB/s on a 10Mb/s link or 10-12MB/s on a 100Mb/s link.
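    The rule of thumb above follows from dividing the link rate in megabits by 8 to get megabytes, then allowing something for protocol overhead. A minimal Python sketch; the efficiency factor is an assumption for illustration, not a figure from the thread:

```python
def expected_mb_per_s(link_mbit: float, efficiency: float = 1.0) -> float:
    """Convert a link rate in Mb/s to an expected transfer rate in MB/s.

    efficiency is an assumed fraction in (0, 1] for protocol overhead;
    the default of 1.0 gives the raw ceiling (8 bits per byte).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return link_mbit / 8 * efficiency


if __name__ == "__main__":
    for link in (10, 100):
        low = expected_mb_per_s(link, efficiency=0.8)
        high = expected_mb_per_s(link, efficiency=0.96)
        print(f"{link} Mb/s link: about {low:.1f}-{high:.1f} MB/s")
```

    expected_mb_per_s(100) gives 12.5 MB/s as the raw ceiling of a 100 Mb/s link, so the quoted 10-12 MB/s corresponds to roughly 80-96% of it.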

    Thanks,

    Jeremy

  9. #9
    Join Date
    Feb 2006
    Posts
    345
    Jeremy,

    >Once you are inside our network and pass the edge routers we use RFC 1918 IP space which is also blocked on outbound packets so any inbound traceroute will show a * or similar on the first hop in.<

    Thank you HUGE for that reply. We have a ZABBIX install for this server that we haven't finished yet. This has been hugely frustrating because the slow-downs happen at random times and do not follow the normal daily traffic curve. When we need to move a large site to a different data center to serve our customers, that's a problem. When the slow-downs occur, the server load goes down with them. We were just about to close on two more high-end servers from you, with mirroring and fail-over, and are getting concerned with what we see.

    I will post back results when I get them,
    Thanks again!

  10. #10
    Join Date
    Feb 2006
    Posts
    345
    Quote Originally Posted by LTADMIN
    The results you are seeing are normal. Your traceroutes are trying to hit devices which do not have reverse DNS records set up for them, so you will get a failure there; they also do not respond to ICMP packets, or treat them with a low priority. Once you are inside our network and past the edge routers, we use RFC 1918 IP space, which is also blocked on outbound packets, so any inbound traceroute will show a * or similar on the first hop in.
    I understand what you are saying and that these * may be normal. It's just that I don't get them from the other data centers. I will continue to investigate.

    Here is my latest, without resolving the IPs:
    Trace
    1 1 ms <1 ms <1 ms 10.1.128.1
    2 18 ms 10 ms 8 ms 73.43.source.address
    3 11 ms 17 ms 8 ms 12.244.250.193
    4 26 ms 15 ms 13 ms 12.118.112.9
    5 23 ms 21 ms 25 ms 12.123.139.142
    6 22 ms 22 ms 21 ms 12.122.10.134
    7 30 ms 103 ms 21 ms 12.123.6.37
    8 56 ms 21 ms 23 ms 192.205.33.154
    9 22 ms 21 ms 20 ms 204.70.192.46
    10 45 ms 43 ms 48 ms 204.70.192.98
    11 60 ms 64 ms 53 ms 208.172.129.230
    12 55 ms 53 ms 53 ms 208.172.131.82
    13 61 ms 47 ms 59 ms 216.39.64.59
    14 * * 56 ms 216.39.66.26
    15 49 ms * * 72.36.destination address
    16 48 ms 54 ms 47 ms 72.36.destination address
    Trace complete.

  11. #11
    Join Date
    Jun 2006
    Location
    East Coast // NYC
    Posts
    1,698
    Usually a * means a packet was lost due to no reply. But if Jeremy says that the routers treat those packets with low importance, they could just be ignoring them at times -- which also means that during low-load times (3am) the routers should almost always respond to each ping.
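    One way to test this hypothesis is to tally the * entries per hop across many runs at different times of day and see whether the loss at the SAVVIS hops falls off at 3am. A minimal Python sketch that parses Windows tracert hop lines like those posted in this thread; it assumes the three-probe layout shown above, and the function names are illustrative. As a general rule of thumb, a hop that drops replies while later hops answer normally usually points to the router rate-limiting or deprioritizing its own ICMP replies rather than dropping forwarded traffic.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# One probe column from Windows tracert output: "*" (no reply), "<1 ms" or "56 ms".
PROBE = r"(\*|<?\d+ ms)"
HOP_LINE = re.compile(rf"^\s*(\d+)\s+{PROBE}\s+{PROBE}\s+{PROBE}\s+(.+?)\s*$")


def parse_hop(line: str) -> Optional[Tuple[int, str, int]]:
    """Return (hop number, host, probes lost) or None for non-hop lines."""
    m = HOP_LINE.match(line)
    if m is None:
        return None
    probes = [m.group(i) for i in (2, 3, 4)]
    return int(m.group(1)), m.group(5), sum(p == "*" for p in probes)


def loss_by_host(lines: Iterable[str]) -> Dict[str, float]:
    """Fraction of probes that got no reply, per host, across all lines given."""
    sent: Dict[str, int] = defaultdict(int)
    lost: Dict[str, int] = defaultdict(int)
    for line in lines:
        parsed = parse_hop(line)
        if parsed is None:
            continue  # skip headers, "Trace complete." and similar
        _, host, n_lost = parsed
        sent[host] += 3
        lost[host] += n_lost
    return {host: lost[host] / sent[host] for host in sent}


if __name__ == "__main__":
    # Hop lines copied from traces posted in this thread.
    sample = [
        "13 49 ms 61 ms 45 ms 216.39.66.26",
        "14 * 55 ms * 216.39.66.26",
        "15 * * 48 ms 72.36.Server.IP",
    ]
    for host, loss in loss_by_host(sample).items():
        print(f"{host:20} {loss:.0%} no reply")
```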

  12. #12
    Join Date
    Feb 2006
    Posts
    345
    Quote Originally Posted by MrRadic
    Usually a * means a packet was lost due to no reply. But if Jeremy says that the routers treat those packets with low importance, they could just be ignoring them at times -- which also means that during low-load times (3am) the routers should almost always respond to each ping.
    I suspect that will not be the case unless something changes. I will devise a way to feed ZABBIX data on both internal and external routing.

  13. #13
    I have the same issue: pings get lost at the border router, although pings to my server itself are NEVER lost.

    I guess the border router treats pings as low priority, as he said. (I am in India, so these ping times are normal for US servers.)

    1 <1 ms <1 ms <1 ms 192.168.1.1
    2 28 ms 27 ms 25 ms 61.17.201.1
    3 53 ms 36 ms 27 ms 202.54.10.62
    4 32 ms 31 ms 29 ms 203.197.72.150
    5 44 ms 31 ms 31 ms 202.54.2.162
    6 252 ms 231 ms 247 ms 202.54.2.130
    7 366 ms 311 ms 311 ms 204.70.151.29
    8 312 ms 311 ms 311 ms 204.70.193.45
    9 297 ms 295 ms 297 ms 204.70.192.1
    10 296 ms 311 ms 297 ms 204.70.192.33
    11 299 ms 300 ms 309 ms 204.70.192.53
    12 298 ms 309 ms 309 ms 204.70.192.42
    13 302 ms 301 ms 547 ms 216.39.64.59
    14 350 ms 307 ms 299 ms 216.39.66.26
    15 300 ms 299 ms * 208.172.131.82
    16 299 ms 299 ms 301 ms 72.36.my.ip
    Last edited by [MaxX]; 09-14-2006 at 01:23 PM.

  14. #14
    Join Date
    Feb 2006
    Posts
    345
    Quote Originally Posted by [MaxX]
    I have the same issue: pings get lost at the border router, although pings to my server itself are NEVER lost.
    I've been doing quite a few tests. Here is a test from midnight last night:
    1 1 ms <1 ms 1 ms 10.1.128.1
    2 12 ms 8 ms 9 ms 73.43.My.IP
    3 10 ms 9 ms 9 ms 12.244.250.193
    4 17 ms 26 ms 13 ms 12.118.112.9
    5 35 ms 22 ms 21 ms 12.123.139.142
    6 24 ms 21 ms 21 ms 12.122.10.134
    7 22 ms 21 ms 21 ms 12.123.6.37
    8 22 ms 21 ms 21 ms 192.205.33.154
    9 * 29 ms * 204.70.192.46
    10 46 ms 46 ms 44 ms 204.70.193.221
    11 55 ms 52 ms 59 ms 204.70.192.94
    12 56 ms 52 ms 63 ms 208.172.131.82
    13 49 ms 59 ms 48 ms 216.39.66.26
    14 * 55 ms * 216.39.66.26
    15 * * 48 ms 72.36.Server.IP
    Note:
    - There are no stars except on the SAVVIS network.
    - Check the last three lines

    I ran a packet sniffer on an FTP session this morning:
    17248 My.Server DELL_9400 82 0:05:21.056441 FTP Data Src= 20,Dst= 5006,.A....,S=4062253421,L= 0,A=2774573529,W=33304 TCP Slow Segment Recovery (1.040303 seconds from packet 17,184)
    17249 My.Server DELL_9400 82 0:05:21.056716 FTP Data Src= 20,Dst= 5006,.A....,S=4062253421,L= 0,A=2774573529,W=31856
    17250 My.Server DELL_9400 70 0:05:21.087577 FTP Data Src= 20,Dst= 5006,.A....,S=4062253421,L= 0,A=2774621313,W= 9412 TCP Slow Segment Recovery (1.038276 seconds from packet 17,186)
    17251 My.Server DELL_9400 70 0:05:21.087854 FTP Data Src= 20,Dst= 5006,.A....,S=4062253421,L= 0,A=2774621313,W=13508
    17252 My.Server DELL_9400 70 0:05:21.088145 FTP Data Src= 20,Dst= 5006,.A....,S=4062253421,L= 0,A=2774621313,W=21700
    17253 My.Server DELL_9400 70 0:05:21.088400 FTP Data Src= 20,Dst= 5006,.A....,S=4062253421,L= 0,A=2774621313,W=17604
    17254 My.Server DELL_9400 70 0:05:21.088689 FTP Data Src= 20,Dst= 5006,.A....,S=4062253421,L= 0,A=2774621313,W=25796
    17255 My.Server DELL_9400 70 0:05:21.088956 FTP Data Src= 20,Dst= 5006,.A....,S=4062253421,L= 0,A=2774621313,W=29892
    17256 My.Server DELL_9400 70 0:05:21.089223 FTP Data Src= 20,Dst= 5006,.A....,S=4062253421,L= 0,A=2774621313,W=33304
    17257 My.Server DELL_9400 82 0:05:21.158471 FTP Data Src= 20,Dst= 5006,.A....,S=4062253421,L= 0,A=2774622761,W=33304
    17258 My.Server DELL_9400 90 0:05:21.192125 FTP Data Src= 20,Dst= 5006,.A....,S=4062253421,L= 0,A=2774622761,W=33304 TCP Slow Acknowledgement (1.001926 seconds from packet 17,202)
    17259 DELL_9400 My.Server 1518 0:05:21.192300 FTP Data Src= 5006,Dst= 20,.A....,S=2774656065,L= 1448,A=4062253421,W=65535
    17260 My.Server DELL_9400 90 0:05:21.218485 FTP Data Src= 20,Dst= 5006,.A....,S=4062253421,L= 0,A=2774622761,W=33304 TCP Slow Acknowledgement (1.000211 seconds from packet 17,204)
    17261 DELL_9400 My.Server 1506 0:05:21.218647 FTP Data Src= 5006,Dst= 20,.A....,S=2774622761,L= 1436,A=4062253421,W=65535 TCP Retransmission
    17262 DELL_9400 My.Server 82 0:05:21.218712 FTP Data Src= 5006,Dst= 20,.A....,S=2774624197,L= 12,A=4062253421,W=65535 TCP Retransmission
    17263 DELL_9400 My.Server 1506 0:05:21.218770 FTP Data Src= 5006,Dst= 20,.A....,S=2774625657,L= 1436,A=4062253421,W=65535 Non-Responsive Server
    17264 DELL_9400 My.Server 82 0:05:21.218823 FTP Data Src= 5006,Dst= 20,.A....,S=2774627093,L= 12,A=4062253421,W=65535 Non-Responsive Server

    However, the trace was not full of this, and the speed would be limited to the upstream speed. Still, I didn't get these kinds of errors on the non-SAVVIS server; that download went well, with no more than the normal errors.
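    The retransmissions in the capture above can be spotted from sequence numbers alone: a data segment whose sequence number is below the highest byte already sent in that direction is being sent again (or arrived out of order). A minimal Python sketch of that heuristic; the function name is illustrative, it is far simpler than a real protocol analyzer's logic, and it ignores 32-bit sequence-number wraparound. The sample values are the DELL_9400 -> My.Server segments (packets 17259-17264) from the capture.

```python
from typing import Iterable, List, Optional, Tuple


def find_retransmissions(segments: Iterable[Tuple[int, int]]) -> List[int]:
    """Return indexes of data segments whose sequence number falls below
    the highest byte already seen in the same direction.

    segments: (seq, length) pairs for ONE direction of ONE TCP flow, in
    capture order. Pure ACKs (length 0) are skipped. Sequence-number
    wraparound is ignored for brevity.
    """
    highest_end: Optional[int] = None
    flagged = []
    for i, (seq, length) in enumerate(segments):
        if length == 0:
            continue
        if highest_end is not None and seq < highest_end:
            flagged.append(i)
        end = seq + length
        if highest_end is None or end > highest_end:
            highest_end = end
    return flagged


if __name__ == "__main__":
    capture = [
        (2774656065, 1448),  # 17259
        (2774622761, 1436),  # 17261, marked "TCP Retransmission"
        (2774624197, 12),    # 17262, marked "TCP Retransmission"
        (2774625657, 1436),  # 17263, marked "Non-Responsive Server"
        (2774627093, 12),    # 17264, marked "Non-Responsive Server"
    ]
    print(find_retransmissions(capture))
```

    This heuristic flags all four later segments, including the two the sniffer labeled "Non-Responsive Server", since each starts below byte 2774657513, which packet 17259 had already reached.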

    You can also test the throughput of the connection by downloading a test file via wget / fetch on your host; it should get 1-1.2MB/s on a 10Mb/s link or 10-12MB/s on a 100Mb/s link.
    The download speed of the test file within SAVVIS wasn't that bad. It took about 10 seconds on average for a 64MB file, which indicates about 55%-65% of what it should be. Nothing that would cause slowdowns like we are seeing. To the user, the site appears to hang, and then it takes off again. Sometimes it's fast, but never for long, and it's slowly getting worse. The server loads are naturally going down because people can't navigate the pages. I'm super glad we moved the hotel site out of there. I'm not waiting much longer for the weather site. We've spent untold thousands chasing this problem.
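    The percentage estimate above can be checked directly. A minimal Python sketch; the function name is illustrative, and the 10-12 MB/s range is the one quoted earlier for a 100 Mb/s link:

```python
from typing import Tuple


def percent_of_expected(size_mb: float, seconds: float,
                        expected_mb_s: Tuple[float, float]) -> Tuple[float, float, float]:
    """Return (measured MB/s, % of the upper expectation, % of the lower one)."""
    measured = size_mb / seconds
    low, high = expected_mb_s
    return measured, 100 * measured / high, 100 * measured / low


if __name__ == "__main__":
    # 64MB in about 10 seconds against 10-12 MB/s.
    rate, vs_high, vs_low = percent_of_expected(64, 10, (10, 12))
    print(f"{rate:.1f} MB/s = {vs_high:.0f}%-{vs_low:.0f}% of expected")
```

    With these inputs the result is 6.4 MB/s, or about 53%-64% of the expected rate, which roughly matches the 55%-65% estimate.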

  15. #15
    Join Date
    May 2004
    Location
    Lansing, MI, USA
    Posts
    1,548
    Hey guys, when you're trying to hide your IP, be careful about it...

    15 * * 48 ms 154.205.My.IP.reverse.layeredtech.com [72.36.Server.IP]

    Reverse DNS names list the octets in reverse order. So you blocked out the first two octets on the reverse name and the last two on the direct IP, and one can put the two together to get your IP.

    Just figured I would point that out, as I'd seen a couple of people do it in this thread.
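    The point is that PTR-style names list the octets in reverse, and the generic hostnames in this thread (the ...reverse.layeredtech.com names) follow the same pattern, so masking different octets in the name and in the bracketed IP gives the whole address away. A minimal Python sketch with the standard ipaddress module, using the documentation address 192.0.2.4 and a hypothetical example.net hostname rather than any address from the thread:

```python
import ipaddress


def ptr_name(addr: str) -> str:
    """Standard in-addr.arpa name for an IPv4 address (octets reversed)."""
    return ipaddress.IPv4Address(addr).reverse_pointer


def ip_from_reversed_hostname(hostname: str) -> str:
    """Rebuild the address from a hostname whose first four labels are the
    octets in reverse order, e.g. '4.2.0.192.reverse.example.net'."""
    octets = hostname.split(".")[:4]
    # IPv4Address raises ValueError if the labels are not four valid octets.
    return str(ipaddress.IPv4Address(".".join(reversed(octets))))


if __name__ == "__main__":
    print(ptr_name("192.0.2.4"))
    print(ip_from_reversed_hostname("4.2.0.192.reverse.example.net"))
```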

  16. #16
    Join Date
    Feb 2006
    Posts
    345
    Quote Originally Posted by WO-Jacob
    So, you blocked out the first two octets on the reverse resolve, and the last two on the direct ip... and ... one can put one and the other together to get your IP.
    Oh duh!

  17. #17
    Join Date
    Jun 2004
    Location
    Canada
    Posts
    132
    Post tcptraceroutes, mtrs and tcpdumps.
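    tcptraceroute and mtr are separate tools and are not reproduced here. As a rough complement to ICMP-based traces, the time to complete a TCP handshake with the server's own web port is an end-to-end measurement that does not depend on how intermediate routers answer ICMP. A minimal Python sketch; the host, port and 4-second timeout (chosen to match the tracert default mentioned earlier) are assumptions, and passing an IP rather than a name keeps DNS lookup time out of the measurement:

```python
import socket
import time
from typing import Optional


def tcp_connect_ms(host: str, port: int = 80, timeout: float = 4.0) -> Optional[float]:
    """Milliseconds to complete a TCP connect, or None on failure/timeout."""
    start = time.perf_counter()
    try:
        # create_connection returns once the three-way handshake completes.
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:  # includes refused connections and socket timeouts
        return None
    return (time.perf_counter() - start) * 1000.0


if __name__ == "__main__":
    # Placeholder target: substitute your own server's IP and port.
    samples = [tcp_connect_ms("127.0.0.1", 80, timeout=1.0) for _ in range(3)]
    print(samples)
```

    Repeating the measurement over time and logging the None results gives a failure rate that can be compared between the SAVVIS and non-SAVVIS servers.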

  18. #18
    Join Date
    Feb 2006
    Posts
    345
    Quote Originally Posted by comm
    Post tcptraceroutes, mtrs and tcpdumps.
    We have the hotel site on a server inside SAVVIS and on a smaller machine outside. The smaller machine is outperforming the big machine by a lot. On the SAVVIS side it is sometimes lightning fast and sometimes very slow. That doesn't happen with the non-SAVVIS hotel site. Now that the hotel site is out of there and working, I will be doing some more testing on the weather site, which is still on the SAVVIS network.

    Traceroute on the other one shows no errors, no repeated IP addresses, and 11 hops instead of 16.

  19. #19
    Join Date
    Feb 2006
    Posts
    345
    - I have pkt files from the packet sniffer that show standard HTML pages and PHP pages. Both HTML and PHP have the same problems: delays of as long as 15 seconds before responding, several times during a page load. Sometimes you get 4, 6, 10 or 12 second delays, but 15 is a popular number, which coincides with the Keep-Alive timeout. The VPS server is also in the same data center, and it does not have the problem.
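    Delays like these can be pulled out of a packet capture by looking for long gaps between consecutive packets of one connection and checking how many sit near the 15-second Keep-Alive figure. A minimal Python sketch; it assumes per-packet timestamps (in seconds) for a single connection have already been exported from the sniffer, and the 2-second threshold, 0.5-second tolerance and sample timestamps are made up for illustration:

```python
from typing import List, Sequence, Tuple


def find_stalls(timestamps: Sequence[float], threshold: float = 2.0) -> List[Tuple[int, float]]:
    """Return (index, gap in seconds) for every gap of at least `threshold`
    seconds between consecutive packet timestamps."""
    stalls = []
    for i in range(1, len(timestamps)):
        gap = timestamps[i] - timestamps[i - 1]
        if gap >= threshold:
            stalls.append((i, gap))
    return stalls


def near_value(stalls: List[Tuple[int, float]], target: float,
               tolerance: float = 0.5) -> List[Tuple[int, float]]:
    """Keep only the stalls whose length is within `tolerance` of `target`."""
    return [(i, gap) for i, gap in stalls if abs(gap - target) <= tolerance]


if __name__ == "__main__":
    # Made-up timestamps for illustration, not data from the thread.
    ts = [0.0, 0.5, 15.5, 15.75, 21.75]
    stalls = find_stalls(ts)
    print(stalls)
    print(near_value(stalls, 15.0))
```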

    FTP transfer for a 64MB test zip file:
    - From the SAVVIS dedicated server to my notebook 5:00 minutes avg.
    - From the SAVVIS VPS server to my notebook 1:46 minutes avg.
    - From the Non-SAVVIS server to my notebook 1:50 minutes avg.
    *Big difference in FTP performance even from two servers in the same data center.

    - From the SAVVIS dedicated server to the SAVVIS VPS server 1:13
    - From the SAVVIS VPS server to the SAVVIS dedicated server 1:10

    WGET transfer for a 64MB test zip file
    - From the SAVVIS Test Server to dedicated server using wget, 7 seconds.
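    Converting the timings above to rates makes the gaps easier to compare. A minimal Python sketch; it treats "64MB" as 64 units of whatever MB the tools report and simply divides by the elapsed time:

```python
def parse_duration(text: str) -> float:
    """'5:00' -> 300.0 seconds, '1:46' -> 106.0, '7' -> 7.0."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def mb_per_s(size_mb: float, duration: str) -> float:
    return size_mb / parse_duration(duration)


if __name__ == "__main__":
    transfers = {
        "FTP, SAVVIS dedicated -> notebook": "5:00",
        "FTP, SAVVIS VPS -> notebook": "1:46",
        "FTP, non-SAVVIS -> notebook": "1:50",
        "FTP, SAVVIS dedicated -> SAVVIS VPS": "1:13",
        "FTP, SAVVIS VPS -> SAVVIS dedicated": "1:10",
        "wget, test server -> dedicated": "7",
    }
    for label, duration in transfers.items():
        print(f"{label:40} {mb_per_s(64, duration):6.2f} MB/s")
```

    This puts the dedicated server's FTP rate to the notebook at about 0.21 MB/s, against roughly 0.6 MB/s for the other two servers, and the wget run at about 9.1 MB/s.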

    With the FTP protocol I don't see the large lags that I see with HTML/PHP, but I do get Low Throughput messages in the packet traces. I commented out eAccelerator in PHP.INI and restarted Apache, but it made no difference, so I put it back.

    No Apache problem would explain the wide gap in FTP performance. I have no way of knowing whether an in-data-center FTP transfer should take that long, because I have nothing to compare it with. All I know is that wget takes 7 seconds from the test server.

    A couple of interesting traceroutes: no time-outs when done internally, even without the -n option, even on the same router.
    - Laptop to SAVVIS VPS server
    1 <1 ms <1 ms <1 ms 10.1.128.1
    2 10 ms 13 ms 15 ms 73.43.32.1
    3 8 ms 7 ms 19 ms 12.244.250.193
    4 13 ms 17 ms 16 ms 12.118.112.9
    5 22 ms 21 ms 35 ms 12.123.139.142
    6 22 ms 21 ms 23 ms 12.122.10.134
    7 20 ms 23 ms 20 ms 12.123.6.69
    8 24 ms 21 ms 20 ms 208.175.10.93
    9 21 ms 22 ms 24 ms 204.70.192.46
    10 44 ms 51 ms 43 ms 204.70.193.221
    11 59 ms 52 ms 58 ms 204.70.192.94
    12 46 ms 47 ms 53 ms 208.172.131.82
    13 57 ms 60 ms 59 ms 216.39.64.18
    14 * 60 ms * 216.39.69.238
    15 59 ms * 55 ms 72.36.VPS.Server

    - SAVVIS dedicated server to SAVVIS VPS server
    1 153.205.36.72.reverse.layeredtech.com (72.36.205.153) 0.341 ms 0.842 ms 0.491 ms
    2 10.1.5.9 (10.1.5.9) 0.653 ms 0.510 ms 0.406 ms
    3 216.39.79.37 (216.39.79.37) 0.367 ms 0.441 ms 0.333 ms
    4 bhr1-po-1.fortworthda1.savvis.net (216.39.64.33) 0.590 ms 0.862 ms *
    5 216.39.64.18 (216.39.64.18) 1.326 ms 1.129 ms 1.306 ms
    6 216.39.69.238 (216.39.69.238) 1.117 ms 0.689 ms 0.724 ms
    7 10.1.4.14 (10.1.4.14) 0.849 ms 0.640 ms 0.915 ms
    8 www.blurstorm.com (72.36.VPS.Server) 0.969 ms 1.339 ms 1.957 ms

    Summary:
    One cannot make a case for Apache/PHP, because FTP, which doesn't involve them, performs just as abysmally. Moreover, FTP performance within the SAVVIS data center is on par with other servers. There isn't much wrong with the wget time from the test server either: it works out to approximately 9MB/sec, which is not far from the 10-12MB/sec expected, so it isn't a server throughput problem. Since the site always has traffic, one might think it's the traffic, so I unloaded Apache and did the FTP timings again. The timings were within 8 seconds of each other, with the lower time being with Apache loaded. So far, everything still points toward a network problem.

  20. #20
    Join Date
    Feb 2006
    Posts
    345
    Note: When FTP downloading from the SAVVIS VPS server or the non-SAVVIS server, the needle and readout on the packet sniffer show a relatively steady 775 pps +/- 70. In the case of the SAVVIS dedicated server, the load fluctuates wildly, averaging around 275 pps with lows of 151, gusting to 450 with occasional peak gusts of 511.

  21. #21
    Join Date
    Feb 2006
    Posts
    345
    Progress:

    "The NOC technicians have updated us that they have reports of errors in the switch-port, and they shall move the server to a different port to sort this out."

    "The NOC team has updated us that they have to move your host to a different slot on their network. But this may require a re-IP of your host to use the new locations port and IP address space. "

    "I have exchanged the switch port on this server... " They also said that they will move it to another location if they cannot get the necessary throughput on the new port, or even move it to another DC if necessary. At least SAVVIS/Layered Tech/Servstra will kill the problem if you provide them with irrefutable evidence.

    I have tested before and after the switch port change. The difference is astounding. Pages that took up to 54 seconds to load are now difficult to time. Whatever it is, it's less than 2 seconds, like our other servers. We used to see multiple 15-second, 11-second, 9-second, 6-second and 2-second delays in one page load. We haven't been able to make that happen since yesterday. We will be testing again today. The maximum total of the delays that we see on a page load now is 1.75 seconds, which is what we see on our other sites both inside and outside of SAVVIS.

    This experience has been massively expensive torture since the beginning, many months ago. I don't understand why they either don't have SNMP monitoring on the switches or didn't pay any attention to the errors. There is nothing natural about packet errors between a host and a switch. I assumed that, being inside SAVVIS, this was taken care of for me, so we spent our time working on the server to make it perform better.
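    Switch-port errors like the ones the NOC found are what interface error counters exist for. The thread doesn't say how SAVVIS monitors its switches; as an illustration, assuming you poll the IF-MIB counters ifInErrors and ifInUcastPkts for a port at two points in time, a rough error rate between polls can be computed as below. These Counter32 values wrap at 2^32, so the difference is taken modulo 2^32 (assuming at most one wrap between polls); busy high-speed ports are usually better served by the 64-bit ifHC packet counters, for which the modulus would be 2^64. The SNMP polling itself is not shown, and the readings in the example are made up:

```python
COUNTER32_MODULUS = 2 ** 32


def counter_delta(prev: int, curr: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Increase of a wrapping SNMP counter between two polls (at most one wrap)."""
    return (curr - prev) % modulus


def port_error_rate(prev_errors: int, curr_errors: int,
                    prev_packets: int, curr_packets: int) -> float:
    """Inbound errors per inbound unicast packet between two polls."""
    errors = counter_delta(prev_errors, curr_errors)
    packets = counter_delta(prev_packets, curr_packets)
    return errors / packets if packets else 0.0


if __name__ == "__main__":
    # Made-up readings: the packet counter wrapped between the two polls.
    rate = port_error_rate(prev_errors=120, curr_errors=170,
                           prev_packets=2 ** 32 - 1000, curr_packets=9000)
    print(f"{rate:.2%} of inbound packets had errors")
```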

    The bottom line for me is, there is no excuse for what happened. The Servstra/LayeredTech/SAVVIS connection will do what is necessary to fix the problem.

    I appreciate all those who chipped in on this thread to help solve this problem. I'll keep you posted if there are further developments.

  22. #22
    Join Date
    Feb 2006
    Posts
    345
    Quote Originally Posted by IT_Architect
    I'll keep you posted if there are further developments.
    It looks like things are getting back where they were. I have 6 second delays again on static pages. Not as bad as before, but not as good as yesterday. I guess one day in a row is not a track record. NUTS! I may need to move.

  23. #23
    Join Date
    Feb 2006
    Posts
    345
    We're going to call it good enough and close the issue. It's not completely clean, but it is on par with our other servers inside and outside of the SAVVIS DC. At least the sites are serviceable and we may migrate some of them back that we rushed off to VPSs.

    Thanks All!

  24. #24
    Join Date
    Feb 2006
    Posts
    345
    They said they wanted to wait a couple of days and test some more before closing the ticket. The sites on the dedicated server are performing very well today. In fact, they are outperforming our VPS sites for the first time, as they should. While this problem should never have happened, I'm satisfied with the persistence and commitment that SAVVIS / LayeredTech / Servstra exhibited to resolve the root cause of the problem. I plan to try a couple more servers with them, except this time I will ensure that their communications path can handle the traffic before I go live with them.
    Last edited by IT_Architect; 09-21-2006 at 05:29 PM.
