Results 1 to 25 of 45
  1. #1
    Join Date
    Oct 2006
    Location
    Montreal, QC, Canada
    Posts
    139

    Your worst nightmare while colocating

    Hi,

    I just started colocating my own box. I was wondering what the worst situation is that you have faced while colocating. Did 2 hard drives in a RAID5 configuration break at the same time? Did all your hardware and hard drives burn due to electrical problems? I want to know what the most difficult situations are that I could face.

  2. #2
    Join Date
    Dec 2004
    Location
    New York, NY
    Posts
    10,710
    Well, the question is, how far are you from the data center?

  3. #3
    Box fried at HE.

    Before I knew, I submitted a ticket to reboot - guess what, they claimed I'm not on the ACL (this has happened more than 3 times before). Then, once they finally figured it out, it took them an hour to respond to a simple request (they're all just playing games in the NOC anyway).

    Not sure if HE power fried the box or the PSU just died.
    Silicon Valley Web Hosting - Bay Area Bare-Metal and 1U to Full Rack Colocation

    www.svwh.net

  4. #4
    Join Date
    Oct 2004
    Location
    Nevada
    Posts
    887
    This is going to be a fun thread....

    1. 130F in the cabinet at night. They were turning off the HVAC to save money at night.

    2. Turned out that colo had a UPS with 15 minutes of batteries, but the backup generator was off-site, on a trailer. And the battery was dead so the generator would not start. Maintenance had not been done in months. By the time they got the trailer to the colo, and managed to start the generator, hours later, the power was already back on.

    3. FM200 system hooked up, everything 'looked good'. No FM200 chemical in the tank; the colo couldn't afford to fill it.

  5. #5
    Join Date
    Jan 2001
    Location
    Flagstaff
    Posts
    127
    Two words: Charles Baker

  6. #6
    Join Date
    Oct 2006
    Location
    Montreal, QC, Canada
    Posts
    139
    how far are you from the data center?
    I am 15 miles away. However, I think I am paranoid. I have a second server at home with the exact same spec. I keep a complete offsite backup of data and a system image of my server. Should my box explode at the DC, I could have my second server online with almost no data loss within 1-2 hours.

    Now I am thinking what would happen if my server would fry at the DC while my server at home is stolen at the same time...

  7. #7
    Join Date
    Jun 2002
    Location
    PA, USA
    Posts
    5,143
    Just one word of advice when choosing colocation: location. Find one that's closest to you.

    We have many RAID5 servers, and have yet to have two drives fail at the same time (knock on wood). You can keep spare parts (or a second server) for your primary server. But I don't think there is an easy way to replace a failed RAID controller. The RAID controller is the least redundant part in any redundant server one can build. If the RAID controller fails, there is no easy way to rebuild the whole array.

    Quote Originally Posted by Jeffyt
    Two words: Charles Baker
    Not too many people will remember Mr. Baker, so it's a surprise to hear you mention this name. I would agree with you, from my personal experience dealing with him (non-colocation related).
    Fluid Hosting, LLC - Enterprise Cloud Infrastructure: Cloud Shared and Reseller, Cloud VPS, and Cloud Hybrid Server

  8. #8
    Join Date
    Oct 2006
    Location
    Montreal, QC, Canada
    Posts
    139
    If the RAID controller fails, there is no easy way to rebuild the whole array.
    If you have a similar controller, you don't need to rebuild the array. You just plug it in and it will detect the old array. I never had to replace a RAID controller, but that's what tomshardware says.

  9. #9
    Join Date
    Aug 2006
    Location
    Ashburn VA, San Diego CA
    Posts
    4,615
    Back when I was first piddling with Colo, I noticed my box down one morning... after several frantic calls and emails, I finally got a canned response -- sorry, we're out of business. When can you pick up your box?
    Fast Serv Networks, LLC | AS29889 | DDOS Protected | Managed Cloud, Streaming, Dedicated Servers, Colo by-the-U
    Since 2003 - Ashburn VA + San Diego CA Datacenters

  10. #10

    How do you image?

    Quote Originally Posted by WebDevourer
    I am 15 miles away. However, I think I am paranoid. I have a second server at home with the exact same spec. I keep a complete offsite backup of data and a system image of my server. Should my box explode at the DC, I could have my second server online with almost no data loss within 1-2 hours.

    Now I am thinking what would happen if my server would fry at the DC while my server at home is stolen at the same time...
    Any particular software you use to keep an offsite backup and image?

    I've heard that the top hardware issues with servers are:
    1. Drive failure
    2. Power supply failure
    3. Fan failure
    Thanks.

  11. #11
    Join Date
    Jun 2002
    Location
    PA, USA
    Posts
    5,143
    Quote Originally Posted by WebDevourer
    If you have a similar controller, you don't need to rebuild the array. You just plug it in and it will detect the old array. I never had to replace a RAID controller, but that's what tomshardware says.
    Hm, I am not sure about that. I am not sure what Tom tested. But on our RAID controllers (Dell PERC), the RAID information is stored on the NVRAM on the raid controller.
    Fluid Hosting, LLC - Enterprise Cloud Infrastructure: Cloud Shared and Reseller, Cloud VPS, and Cloud Hybrid Server

  12. #12
    Join Date
    Oct 2006
    Location
    Montreal, QC, Canada
    Posts
    139
    the RAID information is stored on the NVRAM on the raid controller
    If RAID info is stored on the card and the card fries, the array is gone for good. I think Promise and 3ware store RAID info on the hard disks, which lets you survive a RAID card crash.

    Any particular software you use to keep an offsite backup and image?
    I use Acronis True Image to make an OS image and Handy Backup to transfer this image every night by FTP.

  13. #13
    Join Date
    May 2004
    Location
    Atlanta, GA
    Posts
    3,872
    Quote Originally Posted by WebDevourer
    If RAID info is stored on the card and the card fries, the array is gone for good. I think Promise and 3ware store RAID info on the hard disks, which lets you survive a RAID card crash...
    I will 2nd the comment. Array configuration created by 3ware or Areca is stored on all member drives. If you have to change the RAID card, the array will boot right up; no data loss, and no re-sync is required.

    Adaptec is another story: changing the RAID card, even with the same firmware version, will still trigger the re-sync process. If the new Adaptec card comes with firmware not identical to the old one, YES, there is a great risk that the array will be lost.

    We did see some instances where a 2nd array member failed (the darn Seagate desktop drives!) before the RAID-5 re-sync was completed. Nowadays, we just refuse to use any non-WD RAID Edition SATA drives in large-scale arrays.
    C.W. LEE, Apaq Digital Systems
    http://www.apaqdigital.com
    sales@apaqdigital.com

  14. #14
    Join Date
    Nov 2005
    Posts
    3,944
    Got attacked multiple times, which shot my 95th percentile up, so I was told I owed the colo company an extra $2,600... That was my worst experience... Luckily my colo company was understanding.
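
For anyone unfamiliar with 95th-percentile billing, here is a rough sketch of why an attack can produce a bill like that. It assumes the commonly described scheme: the provider samples your traffic rate every 5 minutes, throws away the top 5% of samples for the month, and bills on the highest one left. The 5-minute interval, the 30-day month, and all the traffic figures are illustrative assumptions, not details from this thread, and `percentile_95` is a hypothetical helper name.

```python
def percentile_95(samples_mbps):
    """Return the billable rate: the highest sample left after
    discarding the top 5% of samples (assumed billing scheme)."""
    if not samples_mbps:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_mbps)
    # Integer ceiling of 0.95 * n, then convert to a 0-based index.
    keep = -(-len(ordered) * 95 // 100)
    return ordered[keep - 1]


# Assumed month: 30 days * 288 five-minute samples = 8640 samples.
# The top 5% is 432 samples, i.e. 36 hours of traffic that is "free".
BASELINE_MBPS = 10   # hypothetical normal usage
ATTACK_MBPS = 900    # hypothetical attack traffic

# 30 hours of attack (360 samples) fits inside the discarded 5%.
short_attack = [BASELINE_MBPS] * 8280 + [ATTACK_MBPS] * 360
# 40 hours of attack (480 samples) spills past it.
long_attack = [BASELINE_MBPS] * 8160 + [ATTACK_MBPS] * 480

print(percentile_95(short_attack))  # 10
print(percentile_95(long_attack))   # 900
```

Under these assumptions, an attack that adds up to less than about 36 hours in the month does not move the billable rate at all, while anything beyond that bills the whole month at the attack rate.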

  15. #15
    Join Date
    Dec 2004
    Location
    San Francisco, CA
    Posts
    1,912
    Worst nightmare? Being asked by your bandwidth provider to stop using their IPs (2000 of them, almost 2 years ago) and giving you a 5 day notice to switch.
    That was a rough week and the lowest point for me in the 4 years in this business. Whenever I have a new issue on hand, I always look back at this incident and say the present situation can't be that bad.

  16. #16
    Join Date
    Mar 2004
    Location
    Singapore
    Posts
    6,990
    Mine was an entire datacentre power failure. It happened not once but twice, at the two datacentres I was in. Both were total power failures within a month of each other. I was supposed to be attending my MBA exams. That was a nightmare!

  17. #17
    Join Date
    Apr 2003
    Location
    Bluesquare dc, Uk
    Posts
    1,591
    Quote Originally Posted by WebDevourer
    I am 15 miles away. However, I think I am paranoid. I have a second server at home with the exact same spec. I keep a complete offsite backup of data and a system image of my server. Should my box explode at the DC, I could have my second server online with almost no data loss within 1-2 hours.

    Now I am thinking what would happen if my server would fry at the DC while my server at home is stolen at the same time...
    In the past we've suffered power trips in racks. That's really really fun... particularly when it happens again an hour later. You learn from these things.

    I've had a server with 500 hosted domains have its motherboard fry at 2am, New Year's Day. That one had me almost in tears. I couldn't believe my luck.

    Personally, I'm like you, absolutely paranoid. That's why I'm sat in an onsite office with our suite downstairs. If you know you can get to the kit fast, and you have backup plans for every possible outcome, try and relax a little. You really are *much* better off than 99% of people who have never seen the servers they use.

    But actually if you are 15 mins from the DC, chill. There are people I know who sublease racks / colo / dedicated servers / transit and they are hours away from their facility. They don't seem to sweat.

    What I might advise is if what you are doing is mission critical, have an identical hot swappable server on hand. You never know when you might need it.
    Olly | INX-Gaming
    Call of Duty 4 hosting

  18. #18
    Join Date
    Jun 2003
    Location
    Las Vegas, NV
    Posts
    858
    I would definitely have to say that my worst colo experience was trying to negotiate a contract extension on our cabinets, and having the sales manager tell me that they were going to charge us nearly double the price for fewer cabinets, when we weren't exactly paying "bargain basement" prices to begin with. When I said we would have to move due to the massive price increase, the sales manager said something along the lines of "we don't really like dealing with you smaller customers anyway. We'll just resell your space for double what you are paying to some other customer". We moved, as did a number of other tenants from what I understand, and the sales manager was fired 6 months later. Go figure.

    On the upside, we completed our move and love our new facility, so I suppose there is always a silver lining to those dark clouds that come rolling in from time to time.

    We also had our share of "learning experiences": tripped breakers, fried HDDs, and once our primary ethernet drop was accidentally pulled out of the switch by someone working in our cabinet. Just a few of the many reasons I now live less than 5 minutes from our facility.
    Rob Tyree
    Versaweb - DDoS Protected Cloud and Dedicated Server Hosting
    Fiberhub - Affordable Colocation Services in Las Vegas, Dallas, Miami, and Seattle

  19. #19
    Join Date
    Oct 2006
    Location
    Montreal, QC, Canada
    Posts
    139
    Well, according to your real life experience, your worst nightmare is either human stupidity or bad luck.

    In human stupidity we now have:

    - NOC claims customer is not on ACL.
    - 130F in the cabinet at night because they were turning off the HVAC to save money.
    - UPS with 15 minutes backup.
    - Power generator off-site on a trailer.
    - No FM200 chemical in the tank; the colo couldn't afford to fill it.
    - Bandwidth provider asks customer to stop using 2000 IPs with 5 days' notice.
    - Sales manager tells the customer they will charge double for less space.

    Bad luck:

    - DC out of business.
    - 2 drives crash in a RAID5 array.
    - Got attacked and was billed $2,600 in bandwidth fees.
    - Two datacenter power failures within the same month while doing exams.
    - Motherboard fries at 2am, New Year's Day.

    I'm still laughing when I think about this power generator on a trailer. Oh God.

  20. #20
    Join Date
    Aug 2002
    Location
    Seattle
    Posts
    5,525
    I won't drop names, but I'll say "rapid price increases to encourage vacancies for most profitable customers" as a general theme on a couple of occasions.

  21. #21
    Join Date
    Mar 2005
    Posts
    246
    Let me think.

    Just off of the top of my head:

    • Having datacenter techs move machines to another cabinet without notification, and then breaking a machine in transit.
    • Switch hardware failure bringing the network offline.
    • Various hardware failures, drive failures, memory failures, etc.
    ColoCrossing - Connecting Business
    Alex Vial | avial@colocrossing.com | 1.800.518.9716

    Enterprise-Class Colo & Dedicated Servers in BUF, CHI, DFW, NYC, SJC, ATL & SEA

  22. #22
    Join Date
    Apr 2004
    Location
    Los Angeles, CA
    Posts
    168
    Back before we started doing colo ourselves: getting a phone call that the building transformer had exploded, and oh, by the way, the generator on the roof that was meant to supply backup power "Hasn't Been Finished Yet" (even though the lease stipulated that the building was supplying backup power).

    8 hours for DWP to hook up a temporary generator (that sat in the street), and over a week for the building to replace the transformer that exploded.
    Jay Smith - Evocative Data Centers
    http://www.evocative.com
    (888)365-2656 - sales@evocative.com
    Los Angeles - San Jose - Emeryville - Phoenix - Dallas - Northern Virginia

  23. #23
    Join Date
    Nov 2002
    Location
    WebHostingTalk
    Posts
    8,901
    The worst? Had a rack in a data center a few years ago... a US federal law enforcement agency served a warrant and the data center gave them 4 of my servers, thinking they were the servers they were looking for.

    Luckily, we got the down alerts pretty quick and got someone to the datacenter quickly enough to prove the DC had turned over the wrong servers and that we were in no way involved with the warrant.

    Sirius
    I support the Human Rights Campaign!
    Moving to the Tampa, Florida area? Check out life in the suburbs in Trinity, Florida.

  24. #24
    Join Date
    Oct 2006
    Location
    Montreal, QC, Canada
    Posts
    139
    and then breaking a machine in transit.
    Did they refund your server?

  25. #25
    Worst nightmare I had while colo'ing had to do with the staff I hired... I am located in Canada, and this Vietnamese tech worker I hired lived in Cali... We needed a staff member down there to work on our servers colocated in Cali. At the interview he spoke fluent English and seemed very knowledgeable... but every time I called him on his phone to do work, he could barely speak English, sounded like he had 5 kids in his car screaming, and didn't know anything about Linux...

    Since that colo facility was also costing me big bucks and I couldn't rely on my own staff, I had to can the whole operation and move it up to Canada. Saved me a ton of headaches... Sadly, moving the operation up to Canada has limited price competitiveness, but life has been so much easier these past 5/6 years.
