  1. #1
    Join Date
    Jun 2007
    Location
    UK and Hong Kong
    Posts
    243

    Need urgent help from HyperVM/XEN experts!

    Hello,

    I am having some very strange problems with my HyperVM host node.

    Here is the set-up.

    i7-950 Processor
    2x2TB HDD in SW RAID1
    24GB DDR3 RAM
    CentOS 5.6
    HyperVM with XEN

    I have some 26 VMs running on the server and in the last week, the server has managed to 'crash' 4 times requiring a total reboot of the whole machine.

    I thought it might be one of the users using too much CPU or something, but having ensured that the allowances for each VM were set to a sensible level, it still happened.

    Now, what happens is that all the VMs can still be pinged, but none of their services work. No SSH, no HTTP, nothing.

    The host node itself can also be pinged, and you can reach the login screen of both the HyperVM control panel and SSH. However, you cannot get past the login.

    To resolve, I have to do a hard reset of the whole server.

    This particular node is clearly unstable and I am desperately seeking a solution.

    If there are any experts out there, I would be willing to pay for your expert time (assuming it is affordable).

    At the moment the VPS is running fine. Since the problem seems to occur every 24 to 48 hours, I would hopefully be able to find someone to take a look and find and resolve the issue before the next 'problem'.

    Cheers in advance all!

  2. #2
    Join Date
    Jul 2008
    Posts
    456
    It sounds like you have an issue affecting your networking, probably some sort of flooding / DoS attack or even a VM user overloading their connection.

    Regardless, finding the source of the issue should be very simple.

  3. #3
    Join Date
    Jun 2007
    Location
    UK and Hong Kong
    Posts
    243
    I sincerely hope it is simple. I really hope it is not some underlying problem with the software or physical components of the server....

    There was one account that used 7x more processing power than the rest of the VMs combined, but this would be the case for days and days and the server would run without a hitch. I booted it just in case, but I have no idea whether that is actually having an effect...

    Server Prodigy, are you able to offer some assistance, or are you just pointing out that it should be easy?

    Cheers!

  4. #4
    Join Date
    Jul 2008
    Posts
    456
    I've sent you a PM. I don't want to violate any rules on the forum.

    If you want some tips and advice on what to look for I can provide that via this discussion if you want.

  5. #5
    Join Date
    Jun 2007
    Location
    UK and Hong Kong
    Posts
    243
    I have responded to your PM already, thanks.

    But if you don't mind sharing some of your ideas, I always think public discussions are good for future reference. Perhaps someone else will have a similar problem and your tips may prove to be useful to them :-). I know I have resolved many a problem in this way.

  6. #6
    Join Date
    Jul 2010
    Location
    ~/
    Posts
    1,382
    Hi HHKNet,

    Sounds like a nightmare. Have you contacted http://www.smartservermanagement.com/ to get a node audit done? These guys are life savers.

    Have you got a cronjob dumping and emailing the output of xentop to you? If you don't, I suggest setting that up, and even running it as often as every 10 minutes for now, so you at least have a basic record of activity to look at even while the server is offline.
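A minimal sketch of the xentop logging idea, for anyone wanting a starting point. It assumes a reasonably recent Python 3 is available on dom0 (stock CentOS 5 ships a much older Python, so this may need adapting), and the function names and log path are made up for illustration. It uses xentop's batch mode with two iterations, on the assumption that the first sample has no earlier reading to compute CPU percentages from. Emailing the result is left out.

```python
import subprocess
from datetime import datetime


def format_snapshot(xentop_output, now=None):
    """Prefix one batch of xentop output with a timestamp header."""
    now = now or datetime.now()
    header = "=== xentop snapshot {} ===".format(now.strftime("%Y-%m-%d %H:%M:%S"))
    return header + "\n" + xentop_output.rstrip("\n") + "\n\n"


def capture_xentop():
    """Run xentop in batch mode and return its text output."""
    # -b = batch (non-interactive), -i 2 = two iterations, -d 1 = one second apart.
    raw = subprocess.check_output(["xentop", "-b", "-i", "2", "-d", "1"])
    return raw.decode("utf-8", "replace")


def log_snapshot(log_path, capture=capture_xentop):
    """Append one timestamped snapshot to log_path (a hypothetical location)."""
    with open(log_path, "a") as log:
        log.write(format_snapshot(capture()))
```

Calling `log_snapshot("/var/log/xentop-snapshots.log")` from a small script run by cron every 10 minutes would leave a trail to read after the next hang. Because the file is only appended to, it survives a hard reset.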

    Also, if you have not already, you should get iftop installed so you can see which VPS is swallowing your bandwidth, if that happens to be the case.
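If iftop is not installed yet, a rough per-VM picture can also be had from the kernel's own interface counters. The sketch below parses two samples of /proc/net/dev and ranks guest interfaces by bytes moved in between. It assumes the guest interfaces start with "vif", which is the usual Xen naming but may differ on a HyperVM setup, and the function names are purely illustrative.

```python
def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines carry no colon
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue  # not a full counter row
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def busiest_vifs(before, after, prefix="vif"):
    """Rank guest interfaces by total bytes moved between two samples."""
    deltas = []
    for name, (rx, tx) in after.items():
        if not name.startswith(prefix) or name not in before:
            continue
        old_rx, old_tx = before[name]
        # rx and tx are summed, so the direction (as seen from dom0) does not matter.
        deltas.append((name, (rx - old_rx) + (tx - old_tx)))
    return sorted(deltas, key=lambda item: item[1], reverse=True)
```

Reading /proc/net/dev, sleeping for a few seconds, reading it again and passing both parsed samples to `busiest_vifs` gives a busiest-first list for that interval.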

    Few questions:

    I assume you're renting a dedi box. Does your DC give you the option to monitor bandwidth to your server in real time?

    I do tend to agree it sounds like flooding causing the server to become unresponsive.

    Only other thought off the top of my head: you said you have 2 x SATA disks in software RAID 1, which means you don't even have the disk performance of a single SATA disk on your server, and with 26 VMs on the server your disk latency is going to be insane.

    Do you have regular RAID syncs? Is your RAID in sync (check with 'cat /proc/mdstat')? Have you placed a limit on the default weekly RAID sync?

    It COULD be that your RAID is having issues and is constantly syncing. Given your disk setup and 26 VMs, even if it is only trying to sync at 20MB/s it's going to bring your server to an absolute crawl.
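The /proc/mdstat check can be scripted so it runs alongside other logging. This is only a sketch: it assumes the usual layout of that file (an `mdN :` line per array, a `[UU]`-style member status, and a progress line such as `resync = 5.2%` while a sync runs), and `parse_mdstat` is an illustrative name. The resync speed itself is normally capped through `/proc/sys/dev/raid/speed_limit_max`, which this sketch does not touch.

```python
import re


def parse_mdstat(text):
    """Summarise each md array found in /proc/mdstat content.

    Returns {array: {"degraded": bool, "activity": str or None}}.
    """
    arrays = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            arrays[current] = {"degraded": False, "activity": None}
            continue
        if current is None:
            continue
        # Member status looks like [UU]; an underscore marks a missing disk.
        status = re.search(r"\[([U_]+)\]", line)
        if status and "_" in status.group(1):
            arrays[current]["degraded"] = True
        # A running sync appears as "resync = 12.3%", "recovery = ..." or "check = ...".
        activity = re.search(r"\b(resync|recovery|check)\s*=\s*([\d.]+)%", line)
        if activity:
            arrays[current]["activity"] = "{} {}%".format(*activity.groups())
    return arrays
```

Feeding it `open("/proc/mdstat").read()` every few minutes and logging any array with `degraded` set or a non-empty `activity` would show whether the hangs line up with a sync.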

    Just some ideas. If you're really stuck I am happy to have a look for you (free of course). HyperVM is not my thing, but I doubt that has anything to do with it.

  7. #7
    Join Date
    Jun 2007
    Location
    UK and Hong Kong
    Posts
    243
    backtogeek,

    Thank you for your reply.

    I have contacted the guys at smartservermanagement. They no longer support HyperVM and although I have no problem with migrating to SolusVM, they said they are too busy to take on this work right now and will have to get back to me some time later.

    I do not have a cronjob dumping and emailing the xentop output, so I think I might set up just that.

    I do not think bandwidth is an issue. I can see the total traffic of the server and it's nothing the network can't handle.

    The server is a colo box, not a rental. And yes, I can monitor traffic in real time.

    Flooding... possible, but for some reason I find it unlikely. If a reboot can fix it, then I don't think it's a flood... am I missing something?

    /proc/mdstat shows the RAID is in sync, but even when it is not, the performance never seems to be that bad. That being said, I am certainly considering a hardware RAID card now!

    Thanks for the advice! I'll keep this thread apprised.

  8. #8
    Join Date
    Sep 2006
    Location
    The Not So Deep South
    Posts
    931
    How much RAM do you have reserved for Dom0? Is the master in a DomU?

  9. #9
    Join Date
    Dec 2005
    Posts
    3,110
    This is clearly a Xen/low-level issue; HyperVM is just a control panel.

    Drop us another email and we will have a look. It wasn't clear this was the original fault when you contacted us.

    Thanks
    Chris

  10. #10
    Join Date
    Jun 2007
    Location
    UK and Hong Kong
    Posts
    243
    Hey guys.

    First, the whole server has 24GB RAM, of which dom0 currently has 5GB available. I made this change but it doesn't seem to have made a difference (actually, Server Prodigy has been helping and he made this particular change).

    Edit /etc/grub.conf using your favorite editor.
    In the "kernel /versionnumberhere" line add the following to the end of the line: dom0_mem=512M
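To double-check what is actually written in grub.conf, something like the sketch below lists the `dom0_mem` value on every `kernel` line. It assumes the usual CentOS 5 Xen layout, where the `kernel` line loads the hypervisor and the dom0 kernel sits on a `module` line. `find_dom0_mem` is just an illustrative name.

```python
import re


def find_dom0_mem(grub_conf_text):
    """Return the dom0_mem value from each 'kernel' line in grub.conf text.

    Entries without the option are reported as None so a missing setting is visible.
    """
    results = []
    for line in grub_conf_text.splitlines():
        stripped = line.strip()
        # Skip comments and anything that is not a kernel line.
        if stripped.startswith("#") or not stripped.startswith("kernel"):
            continue
        match = re.search(r"\bdom0_mem=(\S+)", stripped)
        results.append(match.group(1) if match else None)
    return results
```

Comparing the result with the memory shown for Domain-0 in `xm list` after a reboot is one way to tell whether the setting is being honoured.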

    I plan on doing the following.

    Set up a new server with identical hardware (this time I will use the motherboard's 'HW' RAID), install CentOS 5.5 (not 5.6) and set it up with XEN and HyperVM again. Transfer the VMs over and I guess just 'pray' that this time things are stable.

    Chris (PCS), if you think you know what the issue is, I would be more than happy for you to take a look. I've already had 3 people look at this, one of whom I know to be highly skilled, and I have not received any definitive answers. But if you are willing to look, and you think you can fix it, I am more than happy to pay! Of course I expect it to be a long-term solution that allows my server to run stably for many hours.

    Since I started this thread, the server has crashed some 3 times. Downtime has been 'minimal', as I have written a script that checks the status of the server and does a hard reset after 15 mins of it being unresponsive.
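For reference, the core of such a watchdog might look like the sketch below. This is not the actual script, just the 15-minute logic under some assumptions: it runs on a separate machine, the caller supplies the health check, and the caller performs the reset. Since ping keeps answering during these hangs, the check would need to go deeper, for example completing a full login or fetching a page. `UnresponsiveTracker` is a made-up name.

```python
import time


class UnresponsiveTracker:
    """Track how long a host has been failing health checks."""

    def __init__(self, reset_after_seconds=15 * 60):
        self.reset_after_seconds = reset_after_seconds
        self.first_failure = None  # start time of the current failure streak

    def record(self, healthy, now=None):
        """Record one check result; return True when a hard reset is due."""
        now = time.time() if now is None else now
        if healthy:
            # Any successful check ends the streak.
            self.first_failure = None
            return False
        if self.first_failure is None:
            self.first_failure = now
        return now - self.first_failure >= self.reset_after_seconds
```

A loop that calls `record(check())` once a minute and triggers the reset when it returns True would give the behaviour described. Clearing the streak on any success avoids resetting a server that merely had a slow moment.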

    PCS-Chris, should I reply by E-Mail?

    Cheers!

  11. #11
    Join Date
    Jul 2008
    Posts
    456
    I'm going to take a shot at filling in the missing info before people waste a lot of time and energy on this.

    The client upgraded his server past CentOS 5.5 and ran into some of the weird networking bugs that are affecting Xen systems. A friend of his talked him into trying to downgrade the system. Since then, the server goes unstable anywhere from every few hours to every 24 hours, without any logic or any indication in the logs of what is going on.

    I've worked this same issue a half dozen times in the past month for hosts who tried to update VPS servers like this, and the only solid, long-term solution is to reinstall the OS and software and restore the client VMs.

    At first, he did not have another server we could move the clients to, so we had to go with the option of making backups of all the client VMs and then seeing how quickly we could restore the system with as little downtime as possible. (We're still doing the backups just in case the server goes sideways, as he had no backups of client VMs to start with other than RAID-based system state backups on the same machine.)

    Now that he has another server, the plan is to do a HyperVM-to-HyperVM migration to the new machine. He still has this glimmer of hope that someone can magically fix the old server, though, which resulted in someone adjusting things last night and taking the server down for an hour (non-stop reboots, then no RAM for the VMs, etc.). The current system has massive corruption issues. I've never seen a server ignore the dom0 RAM limit setting (in the kernel line of grub.conf), for example. It also tends to begin freezing up for an hour before it crashes. There is no indication of hardware problems: no seg faults, etc.

    IMO, having multiple people jump on a server and try to fix things is asking for trouble, but if anyone disagrees and thinks they have a better solution, please post it.

  12. #12
    Join Date
    Jun 2007
    Location
    UK and Hong Kong
    Posts
    243
    Indeed, I think it is a bad idea to have too many people try and fix things at the same time (too many chefs). But I don't think it's that bad an idea for someone to look (or is it?).

    In any case, I am under no illusion: the solution is most likely going to be a full migration to a new box. But can you blame me for hoping?

    IF there is a fix, and some masterful admin knows what it is, then I'm happy to hear it. But I am proceeding as if that's not going to come.

    And yes, curse my friend's recommendation to downgrade the kernel, and curse him for changing the dom0-min-mem to 0!!! I won't be listening to him again.
