Could you list the downtimes you have already faced and how you dealt with them? Could you have prevented them?
- 8 hours downtime after I tried to install watchdog and did something wrong (don't know what)
I didn't care too much about preventing it because it was my first month of hosting, so I hosted just my own sites...
but it cost me an OS reload
- 22 hours downtime because my credit card was rejected...
could have prevented it by having 2 cards from 2 different companies (my 3 MasterCards were rejected, but a VISA was accepted... but it took me 6 hours to find one)
I'm not sure you'll find many hosts with as much candor as you have. I personally wouldn't want to air my company's dirty laundry. Using your post as an example, why would you let possible customers and competitors know that...
A. you have trouble locking down your server
B. You may not be around in a month if your Visa card is maxed out.
(pardon me if that was overly harsh)
I take the mindset that anything I say on an internet forum can be seen by anybody anywhere anytime for good or bad.
the reason for asking that is to try to improve services and to be aware of things that may happen (for example, i would never imagine my 3 credit cards would be declined) before they really happen
everybody and every company may have problems... the difference is the way you deal with them
at least now I don't play with my server anymore (I stopped doing that once I started hosting people who pay for the service) and I've already ordered a VISA card in my name that I will use only for paying the datacenter
so i won't have those same problems again
btw... now my server has watchdog running (which has nothing to do with security)
Why anyone would use a visa card or credit card of any type and not have the money to pay the bill at the end of the month is beyond me.
But staying on topic, the most downtime I've suffered would probably be around 4 hours because of network trouble. I don't really have downtime on my production server that's related to me playing with the server. I make sure I know what I'm doing when I do it. That's where my system administrator comes in most of the time. Now if we're talking about my "playing around" server, the one I've got here at home has become unbootable many times and I've had to reinstall the OS in some cases, but that's not critical to anything.
Originally posted by Lem0nHead btw... now my server has watchdog running (which has nothing to do with security)
I never can keep that newfangled Linux stuff straight, but that actually brings up a corollary to my statement about making information available... what if a customer had done a search about your host (assuming they knew your username was attached to your company), seen the watchdog remark and come to the same conclusion I had? It may turn them off.
Openness is a good thing most of the time but you have to make sure your statements stay within context.
Downtimes happen, it's how you react to them, and how you make sure they don't happen again that your clients will appreciate most. It's also how honest you are with your clients, and how well you communicate it to them. We, like anyone else, have downtime, not a lot..but it happens..all servers have their bad days sooner or later, however, be honest with your clients, and communicate with them well while you are having your problems, and most will understand.
With that said, we of course maintain a 100% open .COMmunity, as well as readily available Alertra reports....don't try to hide your downtime, it won't work.
Your best chance of finding honest information about the cause of a host's downtime is when one of them comes to this forum to ask for help because they themselves don't know how to fix it. Happens a lot. If you search a bit I'll bet you'll find some useful stuff.
Feel free to post what you find
A Collection of Web Hosts
Small biographies on hosts, uptime reports and some reviews
Feel free to add your review or add a host that isn't on the list.
Originally posted by 93.3 Why anyone would use a visa card or credit card of any type and not have the money to pay the bill at the end of the month is beyond me.
A lot of people have difficulties paying their credit card bills, especially after Christmas. Certain times of the year do put (self-imposed) stress on household cashflows, and thus they experience cashflow difficulties.
•AussieHost.com• Aussie Bob, host since 2001 • • Host Multiple Domains on Fast Australian Servers!! •
Originally posted by jbigelow I personally wouldn't want to air my company's dirty laundry.
I guess that depends exactly what kind of stains are on your laundry. We think being absolutely honest is a good thing (when that honesty isn't too harmful to others) and we keep a detailed public log.
Our worst outage was:
Problem: ethernet switch failure.
Down time: 4 hours, 49 minutes.
Cause: Primary switch to our hosting servers stopped functioning (crashed by buggy SNMP software) and we decided to depend on datacenter staff to do a reset. Unfortunately, the DC 'smart hand' on duty that night wasn't all that smart and couldn't reset the switch... and yet another external tech we sent out had an "unknown equipment problem" and failed to reset the switch as well.
Solution: We gave up on the DC staff and drove in and fixed the switch ourselves.
Prevention: We made sure this (4.75 hours downtime for a 10 minute fix) will never happen again by adding a backup switch the DC staff can patch in without requiring console access to any of the hardware. We also decided not to depend on the DC "smart hands" in the future unless as a last resort... and never on a Sunday night.
Sometimes I wish others would share their school-of-hard-knock stories so I would not have a head that’s so mushy from being knocked around so much.
Our worst case of downtime came from installing a patch by hand when the automated system from the vendor didn’t work. The installation corrupted the operating system (yes, you can corrupt the RedHat Linux operating system; I’m told it is extremely rare, but I can verify it can happen).
The result was close to 10 hours of down time as we had to get technicians near the data center to reload parts of the operating system. Within three weeks, we migrated to a new machine.
I’m not sure how it could have been prevented. It is easy to look back and say we should have gone back to the vendor to have them do it. But the same steps were done on a similarly configured server at the same time with no down time. Life happens.
After a couple of years in business, we've had a lot of downtime across the servers. I think the most was a massive DDoS attack where the server was offline for 6 hours+, and then there were some incidents of IPs going missing from a server. That was fun. Then you have CPU/case fans blowing every now and then. That's about it, and all the gory details are listed in the HTTPme server announcement forums. They do make for some good reading.
Originally posted by SROHost I guess that depends exactly what kind of stains are on your laundry. We think being absolutely honest is a good thing (when that honesty isn't too harmful to others) and we keep a detailed public log.
So it looks like I'm in the minority, but I will point out that the reasons listed here are usually reasons outside of the host's control...
"the colo wouldn't accept Mastercard...."
"script kiddies DDoS'd us...."
"Cisco's crappy hardware failed...."
Will we see anybody listing a reason such as ....
"Alan the Admin got wasted on Tequila the other night and took a leak on the edge routers causing about 10 hours of downtime."
Our longest period of downtime occurred back during the NAC fire. I think that was like 18 months ago or somewhere around then. We were down for about 30-40 minutes if I recall correctly. Our clients were quite understanding considering the circumstances. We kept in constant contact with them throughout the whole ordeal. That fire was preceded by a power failure some months earlier which had also caused quite a bit of downtime, although I honestly forget how much time that was. NAC had an undersized generator at the time (something they have since rectified) and there was quite a bit of explaining to do about the downtime.
We moved to ServInt after that and it's been smooth sailing ever since...
Originally posted by Watcher_TVI We moved to ServInt after that and it's been smooth sailing ever since...
If you consider the 40 minute NAC fire outage bad...surely you didn't completely forget about the 30 minute ServInt outage back in February? There were also 2 outages back in December....
Point being...all providers have downtime at some point or another...as you said, communicate well with your customers, be upfront, and do as much as you can to prevent it from happening again, and most customers will understand.
(Note: I am a paying customer of ServInt as well, in no way is this a negative post about ServInt. I am also a very pleased paying customer of NAC.)
I wasn't down for 30 minutes with ServInt, I was down for 9 minutes during that incident as recorded by Alertra on 5 minute intervals. The other 2 issues you reference did not affect everyone, just a few people that were located overseas from what I saw. In almost a year at ServInt we have had 4 outages totaling 0 hrs, 24 mins, 2 secs which results in 99.983% uptime. That also includes 4 minutes and 39 seconds of downtime from a site update on our hosting site and was not reflective of the uptime of that server.
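For anyone who wants to check uptime figures like these against their own monitoring reports, the arithmetic is just downtime divided by the length of the monitoring window. The sketch below is only an illustration: the function name `uptime_percent` is made up for this example, and the 365-day window is an assumption, since the post doesn't say exactly which period the Alertra report covered.

```python
def uptime_percent(downtime_seconds: float, window_seconds: float) -> float:
    """Return uptime as a percentage of the monitoring window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    if not 0 <= downtime_seconds <= window_seconds:
        raise ValueError("downtime must be between 0 and the window length")
    return 100.0 * (1.0 - downtime_seconds / window_seconds)


# Total downtime quoted in the post: 0 hrs, 24 mins, 2 secs.
downtime = 24 * 60 + 2  # 1442 seconds

# ASSUMPTION: a full 365-day window, chosen only for illustration.
# The actual reporting period is not stated in the post.
window = 365 * 24 * 60 * 60

print(f"{uptime_percent(downtime, window):.3f}%")
```

The percentage depends heavily on the window, so the same 24 minutes of downtime gives a lower figure over a shorter reporting period. When comparing hosts, make sure the reports cover the same length of time.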
I am not knocking NAC either, they addressed the issues they needed to when they ran into problems. This wasn't really about what DC has the best uptime, it's how you handle the downtime.
How we face the issue of unannounced downtime is by trying to eliminate as much downtime as possible by choosing the DC we feel can provide the most reliable network. Then we communicate with our clients openly and often when any downtime does occur.
Just stating..everyone has their downtimes...I agree, it's how you handle it. ServInt has also done a great deal to make their network stronger, but you cannot deny that they had their issues too.
Both datacenters have done very well at improving on their weaknesses.