shaun
09-26-2000, 05:28 PM
This post is prompted by the spate of recent outages at Alabanza. There are two main points to my post, which are concerned with:
1. The hardware setup at Alabanza
2. The software setup at Alabanza
First, the hardware side of things. Let me start with a quote from the Alabanza website, http://www.alabanza.com:
Alabanza recognizes that downtime is not an option for your servers. That's why we invest heavily in hardware and facilities that ensure that your clients are up and running on the web 24 hours a day, seven days a week.....(then follows lots of descriptions of different redundancies built in to their system)
Now I quote from a different thread on this messageboard:
Greetings,
I would just like to clear up the issue of the network outage that was experienced on the evening of 9/19/00 and again for a short time in the afternoon of 9/20/00. On the evening of the 19th one of our primary routers failed. This router is connected directly to a backbone. This failure caused us to lose about 50% of traffic. The outage occurred between 11:00 PM and 2:00 AM Eastern time at which time it was resolved. On the afternoon of the 20th we experienced a similar outage, but this lasted only approximately 5 minutes. We will be replacing the problematic router and will continue to monitor both routers very closely for any signs of problems over the next several days.
This problem only affected around 50% of the connections so many people did not experience this problem however those people who did experience problems may have experienced anything from an unusually slow connection to no connection at all.
This problem was hardware related and should be completely resolved when the faulty hardware is replaced.
We appreciate everyone's patience and understanding as we work to resolve this issue.
Sincerely,
Alabanza_Dan
Alabanza Technical Support
The second quote suggests that some of Alabanza's hardware setup is not redundant. Why is this? Since a system like this is only as good as its weakest point, if Alabanza has one (or more) weak points in its hardware setup, then all the other measures are rendered useless when those weak points fail. As interested parties, we have to trust Alabanza to hire technicians who are good enough to analyze their entire network and eliminate its weaknesses. Should our trust be diminished as a result of these recent incidents and the explanation offered above? (Disclaimer - I'm not an expert in these matters. How much does the offending router cost? Alabanza obviously didn't have a spare router on standby, but if they had, how long would it have taken to swap out the faulty one and replace it with a good one? Maybe there is just no way to avoid an outage such as the one described above. My personal guess is that avoiding such outages is a matter of system design - which comes down to paying lots of money for really good people to analyze/design the systems, and paying lots of money for backup/redundant equipment.)
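To put rough numbers on the "weakest point" argument, here is a minimal Python sketch. Every figure in it is made up for illustration (I have no idea what Alabanza's real component availabilities are), the function names are my own, and it assumes components fail independently of each other, which real hardware often doesn't.

```python
from math import prod

HOURS_PER_YEAR = 24 * 365


def series_availability(parts):
    """Availability of a chain in which every part must be working."""
    return prod(parts)


def redundant_availability(availability, copies=2):
    """Availability of `copies` identical parts where one working copy is enough.

    Assumes the copies fail independently (an idealization).
    """
    return 1 - (1 - availability) ** copies


def downtime_hours(availability):
    """Expected hours of downtime per year for a given availability."""
    return (1 - availability) * HOURS_PER_YEAR


# Hypothetical figures: three well-protected components and one weak router.
well_protected = [0.9999, 0.9999, 0.9999]
weak_router = 0.99

single = series_availability(well_protected + [weak_router])
paired = series_availability(well_protected + [redundant_availability(weak_router)])

print(f"single router: {downtime_hours(single):.1f} hours down per year")
print(f"router pair:   {downtime_hours(paired):.1f} hours down per year")
```

With these invented numbers the single weak router dominates the total downtime, no matter how good the other components are. That is the point of the argument above: redundancy everywhere else doesn't help much if one link in the chain has no backup.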
Now on to the software setup at Alabanza.
It seems like a very good idea that Alabanza machines allow only ssh, and not telnet. With telnet, the password is sent as plain text over the internet - if someone else were to intercept it, they could then log in to your machine and do very bad things to it (including affecting the entire machine and all the other sites hosted on it). With ssh, the whole session, password included, is encrypted before it goes over the internet, so that even if the traffic is intercepted, it can't be used by a third party to log in to the machine.
However, such a system is only as good as its weakest link. And in this case, in the default Alabanza setup, there is a very weak link in the ssh process. I'm not going to go into details here, since I don't want to expose any machines to attacks. (From my limited knowledge, I believe the weak link exists in the default setup from Alabanza, but I know that some hosts have changed this default so that their system is not exposed to this particular weakness. However, there is a general design/security flaw which I believe affects every host.)
The fact that an entire system like this might be set up with the appearance of safety, but with weaknesses that negate these safety measures, makes me wonder about the management and design processes that led to the software and policies that are currently in place. As end-users, we are forced to trust Alabanza to have paid good people to come up with a combination of software and policies which have a high level of safety. Is our trust warranted?
shaun
