  1. #1
    Join Date
    May 2005
    Location
    Toronto / New York
    Posts
    163

    GigeNet Cloud Filesystem Corrupted Beyond Repair

    Hello.

    My VM went down last night after some maintenance and never came back up. I found out this morning when I saw my inbox full of monitoring alerts. Apparently the filesystem on my VM is corrupted beyond repair. They are saying it is due to journal overruns, which corrupted the journal and the system binaries. Now I need to spin up a new VM, build it out, and restore from backups.

    Is anyone else having this issue?

    This is easily the 7th outage I've had with GigeNet since signing up with them in June 2010.

  2. #2
    Join Date
    Mar 2005
    Location
    Chicago
    Posts
    60
    I want you to know that I will work with you personally and figure out what happened to ensure it doesn't happen again. As we have grown, we have had several problems with legacy storage devices, but we keep pushing forward and creating new VMs on the new technology. We have most likely migrated you in the past (hence the downtime).


    I want you to know we are doing everything we possibly can to send out alerts when you run out of RAM and STORAGE so that things like this don't happen. In the new version of our portal, which is set to be released shortly, we plan to push alerts to clients when they are using all the RAM on their VMs, because it's easy to create a 512MB VM and run out of memory without even thinking about it.
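    A minimal sketch of the kind of RAM/storage alert described above, assuming a Linux guest where memory figures come from /proc/meminfo. The function names and the 90% thresholds are hypothetical illustrations, not GigeNet's portal:

```python
import shutil

# Hypothetical alert thresholds (fraction of capacity in use).
MEM_ALERT_FRACTION = 0.90
DISK_ALERT_FRACTION = 0.90


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of values in kB."""
    values = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def memory_used_fraction(meminfo):
    """Estimate the fraction of RAM in use."""
    total = meminfo["MemTotal"]
    if "MemAvailable" in meminfo:
        available = meminfo["MemAvailable"]
    else:
        # Older kernels (MemAvailable appeared in 3.14) lack the field, so
        # fall back to the classic free + buffers + cached estimate.
        available = (meminfo.get("MemFree", 0) + meminfo.get("Buffers", 0)
                     + meminfo.get("Cached", 0))
    return 1.0 - available / total


def check_resources(meminfo_text, path="/"):
    """Return human-readable alerts for high memory or disk usage."""
    alerts = []
    mem = memory_used_fraction(parse_meminfo(meminfo_text))
    if mem >= MEM_ALERT_FRACTION:
        alerts.append(f"memory {mem:.0%} used")
    usage = shutil.disk_usage(path)
    disk = usage.used / usage.total
    if disk >= DISK_ALERT_FRACTION:
        alerts.append(f"disk {path} {disk:.0%} used")
    return alerts
```

    On a real VM you would pass it the contents of /proc/meminfo (for example from a cron job) and email any alerts it returns.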

    Please contact me directly and I'll take good care of you [email protected]

  3. #3
    Join Date
    Mar 2005
    Location
    Chicago
    Posts
    60
    I want to update you.

    We are uncertain of the exact cause, but basically the journal on your server was overrun and its contents were not flushed to disk. There was no data loss, only loss of the journal. The data is still there, but there is nothing to map to it.

    This was isolated to SPECIFIC versions of CentOS 5 and did not affect other distributions or Windows AT ALL.

    For now we have concluded that updating CentOS to a newer kernel and version should prevent this from happening again; we reached this conclusion through our observations alone. We are also working on recovery options for you as we speak, so please work with us through the ticket system.

  4. #4
    Join Date
    Mar 2011
    Location
    Dallas, Texas
    Posts
    354
    Work with Chris; he seems to care and is working hard to help you and make sure everything is working correctly.

    Good luck and keep us posted. Situations like this, which in the end are really no one's fault, are why up-to-date backups are such an excellent thing to have. Even if you purchase a backup service, I always recommend keeping your own personal backup as well, since you never know what might happen these days.
    FusionNET Solutions - US/UK Locations | Adult/IRC Allowed! | DDoS Protected Networks!
    Fusion Powered: Web Hosting - Virtual Servers - Dedicated Servers | Native IPv6 Available!
    202-505-HOST | LiteSpeed Support! | Live Web Chat!

  5. #5
    Join Date
    May 2005
    Location
    Toronto / New York
    Posts
    163
    Chris,

    Thanks for your candid answer regarding the failure. I hope you guys will come out of this stronger and more stable, and that the other users that were impacted are able to get their backup data successfully. Unfortunately for me, that hasn't been the case.

    I was hoping to get an answer to my question in the ticket because I wanted to rebuild my server tonight, but after waiting 3 hours I'm getting impatient. Usually you guys respond pretty quickly, and I would have called, but it's late and I'm headed to bed.

    I appreciate your support, and I have spoken highly of you guys on WHT for your services, but at this point I'm just frustrated with the constant problems and outages I have experienced with GigeNet Cloud. On top of that, I'm easily paying $9-10 more per month for my VM after your recent price hikes. I'm not looking forward to the daunting task of rebuilding my machine again. This isn't my full-time job, and I wasn't planning to spend hours working on my server today/tomorrow. Yes, it shouldn't take long, but honestly I'm sitting here thinking: if I need to reinstall everything anyway, why not just do it with another provider?

    Please get back to me about my question in the ticket regarding the age of the restored user data and the missing databases.

    Thanks.

  6. #6
    Join Date
    Mar 2005
    Location
    Chicago
    Posts
    60

  7. #7
    Same frustrating experience here. My virtual machine got corrupted after the maintenance.
    I have been submitting tickets back and forth for the past 24 hours and finally got an answer that my database can't be recovered at all.

    Unfortunately I had no backups and my business depends on it...

    I must also say that this is the third time I've had corrupted databases after their maintenance. Fortunately, the first two times everything was resolved with a quick repair, but not this time.

    Chris was the only one who really tried to help me, but even he couldn't recover the needed db files.

    The other support guys are really rude and not helpful.
    One of them even closed my ticket and said there was nothing they could do.
    Like it was my fault they messed up the maintenance.
    And the best thing is they even banned me from their forum so I wouldn't post my negative experience.
    Talk about customer support!

    Anyway, at this point I will just try to recover the other files and move to a more reliable web host.

    Any recommendations?

  8. #8
    Join Date
    Mar 2005
    Location
    Chicago
    Posts
    60
    I'm not familiar with your case, but that doesn't sound like something we would do. Our forums ban me all the time because they are broken to some extent; we will fix that when we migrate to a new forum backend.

    It is unfortunate, but if your journal was overrun, that's what will happen. The data is still there, but the journal is completely useless, so there is nothing to map to that data.

    Like I said before, this was isolated to a very small number of kernel versions and seems to affect only CentOS 5.5 and below.

  9. #9
    Join Date
    May 2005
    Location
    Toronto / New York
    Posts
    163
    Quote Originally Posted by bbrock32
    One of them even closed my ticket and said there was nothing they can do. Like it was my fault they messed up the maintenance.
    ^This^. It's so annoying that they CLOSE my tickets mid-conversation like that and I have to reopen them. But yeah, it's just an annoyance, not a huge deal, and I'm guessing some of their techs just aren't well trained with the ticket system.

    Overall, their customer support has been great over the past year. Namely, the two Chrises have always been awesome.

    Chris, the last message I got from Erik pretty much said "it is what it is" - there has been data loss and no way of recovering it now since my R1soft backups started failing a few days ago.

    And yes, I already HAVE R1soft backups for free; you guys gave them to me as a freebie after a previous severe outage.

  10. #10
    I've been a customer for more than a year and there is no reason I should lie about my situation.

    If you want to check, my ticket ID is 5090.

    Also, the fact that it was isolated to a few kernel versions makes no difference to me. I lost my database and my clients with it.

    Anyway, I can understand that things like this can happen, but the treatment I got from support (except Chris, who was really helpful) wasn't appropriate given that they had screwed up my virtual machine.

  11. #11
    Join Date
    Mar 2005
    Location
    Chicago
    Posts
    60
    I don't want to play this down, because you both have a valid concern. There are some things we can definitely do better, and we are striving towards that. I think the biggest problem is the fact that VMs can run out of memory or storage without really being noticed by the customer.

    I want to fix both of your problems, but bbrock32's problem, at least from what I understand, is unfixable. Erik was just trying to be straight up and honest and not string you along for the next few days. Backups are extremely important. The same thing has happened to me with certain boxes I have run over the years, and believe me, I would be EXTREMELY MAD and frustrated in your situation.

    Please understand that I would be glad to extend you as much help as I can muster, and if you message me privately, I will help you out in any way I can. [email protected]

  12. #12
    Join Date
    May 2005
    Location
    Toronto / New York
    Posts
    163
    Chris, good talking to you. Thanks for helping with my concerns. You guys do offer a great service, and I hope things stabilize and you guys continue to do well. Best of luck.

  13. #13
    Mine lost connection last night; no idea when it came back on. Kind of bummed out: I haven't been with them 30 days and so far there have been two downtimes. I only use it to run a small trading platform, but I need it to be always on/connected and running Windows.

    Not sure where to try next.

  14. #14
    Join Date
    Nov 2002
    Location
    Chicago IL
    Posts
    885
    Quote Originally Posted by Kajun
    Mine lost connection last night, no idea when it came back on. Kind of bummed out, I havent been with them 30 days and so far two downtimes. I only use it to run a small trading platform but I need it to be always on/connected and running windows.

    Not sure where to try next.
    kajun,

    Contact Chris; I am sure we can find a solution for you. This is not the norm and seems to be an issue with CentOS and its use of archaic kernels/drivers. You can also PM me or drop me an email at ameen @ gigenet (.) com

    I will gladly help sort out this situation and do whatever it takes to regain optimal uptime.
    GigeNET
    Dedicated Servers + Cloud Servers + Colocation + DDOS Protection + IP Transit with FCP optimized routing
    Locations in Chicago Los Angeles and Ashburn

  15. #15
    Join Date
    Mar 2011
    Posts
    136
    Quote Originally Posted by chrisarmer
    I think the biggest problem is the fact that VM's can run out of memory or storage without really being noticed by the customer.
    You've said this several times now, but what exactly does that have to do with Gigenet corrupting both these clients' data? Neither of these clients mentioned anything at all about HD space or ram troubles. The biggest problem in their worlds right now is that your company hosed their VM.
    Respectfully,
    Phillip

  16. #16
    Join Date
    Sep 2003
    Location
    Chicago, IL
    Posts
    164
    Quote Originally Posted by phil29
    You've said this several times now, but what exactly does that have to do with Gigenet corrupting both these clients' data? Neither of these clients mentioned anything at all about HD space or ram troubles. The biggest problem in their worlds right now is that your company hosed their VM.
    The sad fact is, the data is gone. It's impossible to get back now. We can think of ways to prevent it in the future, and none of the issues were widespread. The client operating system and the client filesystem obviously (to us now) have an incompatibility with the block drivers.

    The system should have gone read-only and forced a reboot because it was not able to commit the journal. But for some reason the kernel blocked the journal task, which prevented it from performing its normal crash operations. It also let data continue to be written to the disk, which eventually corrupted the filesystem. The very same thing happens when a box runs out of memory and the kernel drops "dirty pages" that happen to be journal information.

    What I have narrowed this down to is the xenblk drivers vs. the 2.6.18 kernel build that ships with CentOS. Couple that with ext3 and it's almost a perfect storm. A change in any part of the scenario would have prevented this: a newer kernel, a different filesystem, or no xenblk drivers. It took all three to allow this to happen.
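    The "it took all three" reasoning above can be written as a simple check. This is a hypothetical sketch, assuming you already know the guest's `uname -r` output, its root filesystem (read here from /proc/mounts-style text), and the name of its paravirtual block driver. It flags only the combination described in this post, not every risky setup:

```python
from dataclasses import dataclass


@dataclass
class GuestProfile:
    kernel_release: str  # e.g. output of `uname -r`
    root_fstype: str     # e.g. "ext3"
    block_driver: str    # e.g. "xenblk"


def root_fstype(mounts_text):
    """Return the filesystem type mounted at / in /proc/mounts-style text."""
    fstype = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == "/":
            # Keep the last match: "rootfs" often appears before the real root.
            fstype = fields[2]
    return fstype


def matches_reported_failure(profile):
    """True only when all three factors named in the post are present."""
    old_kernel = profile.kernel_release.startswith("2.6.18")
    uses_ext3 = profile.root_fstype == "ext3"
    uses_xenblk = profile.block_driver == "xenblk"
    return old_kernel and uses_ext3 and uses_xenblk
```

    Because the check is a plain AND, changing any single factor (newer kernel, different filesystem, different block driver) makes it return False, which mirrors the claim that removing any one piece would have avoided the failure.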

    Now, what may be on the table here is what caused the problem in the first place. That came from a SAN whose RAID card failed. The system failed over as expected, and 98% of the clients on that SAN continued with business as usual. This particular software combination on a large, busy disk is the only thing that proved to be a problem. For some reason the xenblk drivers threw an error that the kernel apparently didn't know how to deal with. So instead of performing the usual crash operations that prevent data loss, the system continued to write its way into a black hole.

    For the few other people who were affected by this, we have done everything we can to help recover as much data as possible, and our techs have been all hands on deck. As much as we would love to have up-to-the-minute backups of our entire cloud, this isn't physically possible, especially not free of charge. We offer several different backup methods and will gladly place customer VMs on opposing SANs upon request.

    As unfortunate as this is, customers must take backups and data redundancy seriously. At the provider level we do everything we can to prevent this (redundant RAID, network, hard drives, servers), but we still can't prevent a filesystem from corrupting itself. I have seen this happen on boxes that didn't experience any type of hardware failure. EXT3 has been known to corrupt itself on large data sets. If I were running a server with a large amount of data on it, I would go with XFS, JFS, ReiserFS, NTFS, anything extent-based, even EXT4, or at the very least use a different operating system. Windows seems to handle I/O errors the best of all.

    We will continue to help anyone we can. This should be a shout-out to everyone to give R1Soft a shot. It's an extremely good idea to have CDP (continuous data protection); it makes it much easier to rest at night. I can say from experience: I work at a datacenter and still back up all of my data to a box at my house. It can't hurt!
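    Since a backup job can fail quietly for days (as with the R1soft backups mentioned earlier in the thread), a freshness check is a cheap safeguard. A minimal sketch, assuming you can obtain a last-successful-run timestamp for each job yourself, for example from a marker file touched after each successful run; the marker-file approach, job names, and 24-hour policy are hypothetical and not an R1Soft feature:

```python
import os
from datetime import datetime, timedelta

# Hypothetical policy: every job must have succeeded within the last day.
MAX_BACKUP_AGE = timedelta(hours=24)


def last_success_from_marker(path):
    """Read a job's last success time from a marker file's mtime, if present."""
    if not os.path.exists(path):
        return None
    return datetime.fromtimestamp(os.path.getmtime(path))


def stale_backups(last_success, now, max_age=MAX_BACKUP_AGE):
    """Return sorted names of jobs that never succeeded or are too old."""
    return sorted(
        name
        for name, ts in last_success.items()
        if ts is None or now - ts > max_age
    )
```

    Treating a missing timestamp as stale is deliberate: a job that has never reported success should alert just as loudly as one that stopped.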
    Last edited by Winstyn; 04-29-2011 at 12:43 AM.
    eSited LLC - Dedicated Servers, VPS, Managed Hosting
    Nullivex LLC - Web Services, PHP Development, System administration.
    █ Visit http://www.esited.com/ or Email contact[at]nullivex.com

  17. #17
    Join Date
    Mar 2005
    Location
    Chicago
    Posts
    60
    That's exactly the problem. We don't have notifications in place to let us know when this happens. By the time the machine is paused/rebooted, it's too late.
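    One possible notification source is the guest's kernel log. A rough sketch, assuming ext3/jbd-style log messages and a kernel that has the hung-task detector enabled (not every kernel build emits those warnings). The pattern list is illustrative, not exhaustive, and is not how GigeNet monitors its cloud:

```python
import re

# Illustrative kernel-log patterns for the ext3/jbd failure mode discussed.
# Hung-task warnings appear only if the kernel has that detector enabled.
PATTERNS = {
    "hung_journal_task": re.compile(
        r"INFO: task (?:kjournald|jbd2/\S+):\d+ blocked for more than \d+ seconds"),
    "journal_aborted": re.compile(r"Aborting journal on device \S+"),
    "fs_error": re.compile(r"EXT[34]-fs error \(device [^)]+\)"),
}


def scan_kernel_log(lines):
    """Return (pattern_name, line) pairs for every suspicious log line."""
    hits = []
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((name, line.rstrip()))
    return hits
```

    You could feed it the lines of `dmesg` output or /var/log/messages on a schedule and alert on any hits.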

    I'll update with further details as soon as I get them, but right now we are more focused on the particular issues that have come up. After that, we are going to do a large-scale investigation into the exact root cause, if we can duplicate it in our test environment.
