-
08-23-2011, 01:46 AM #1 Disabled
- Join Date
- Jun 2007
- Location
- UK and Hong Kong
- Posts
- 243
Need urgent help from HyperVM/XEN experts!
Hello,
I am having some very strange problems with my HyperVM host node.
Here is the set-up.
i7-950 Processor
2x2TB HDD in SW RAID1
24GB DDR3 RAM
CentOS 5.6
HyperVM with XEN
I have some 26 VMs running on the server and in the last week, the server has managed to 'crash' 4 times requiring a total reboot of the whole machine.
I thought it might be one of the users using too much CPU or something, but having ensured that the allowances for each VM were set to a sensible level, it still happened.
Now, what happens is that all the VMs can still be pinged. Just none of the services work. No SSH, no HTTP, nothing.
The host node itself can also be pinged, and you can get to the login screen of both the HyperVM control panel and SSH. However, you cannot get past the login.
To resolve, I have to do a hard reset of the whole server.
This particular node is clearly unstable and I am desperately seeking a solution.
If there are any experts out there, I would be willing to pay for your expert time (assuming it is affordable).
At the moment the VPS is running fine. Since it seems to happen every 24 to 48 hours, I would hopefully be able to find someone to take a look, then find and resolve the issue before the next 'problem'.
Cheers in advance all!
-
08-23-2011, 02:51 AM #2 Web Hosting Evangelist
- Join Date
- Jul 2008
- Posts
- 456
It sounds like you have an issue which is affecting your networking, probably some sort of flooding / DoS attack, or even a VM user overloading their connection.
Regardless, finding the source of the issue should be very simple.
-
08-23-2011, 02:55 AM #3 Disabled
- Join Date
- Jun 2007
- Location
- UK and Hong Kong
- Posts
- 243
I sincerely hope it is simple. I really hope it is not some underlying problem with the software or physical components of the server....
There was one account that used 7 times more processing power than the rest of the VMs combined, but that had been the case for days on end while the server ran without a hitch. I booted it just in case, but I have no idea whether that is actually having an effect...
Server Prodigy, are you able to offer some assistance, or are you just pointing out that it should be easy?
Cheers!
-
08-23-2011, 03:08 AM #4 Web Hosting Evangelist
- Join Date
- Jul 2008
- Posts
- 456
I've sent you a PM. I don't want to violate any rules on the forum.
If you want some tips and advice on what to look for, I can provide that in this discussion.
-
08-23-2011, 03:10 AM #5 Disabled
- Join Date
- Jun 2007
- Location
- UK and Hong Kong
- Posts
- 243
I have responded to your PM already, thanks.
But if you don't mind sharing some of your ideas, I always think public discussions are good for future reference. Perhaps someone else will have a similar problem and your tips may prove to be useful to them :-). I know I have resolved many a problem in this way.
-
08-23-2011, 04:51 AM #6 Web Hosting Master
- Join Date
- Jul 2010
- Location
- ~/
- Posts
- 1,382
Hi HHKNet,
Sounds like a nightmare. Have you contacted http://www.smartservermanagement.com/ to get a node audit done? These guys are life savers.
Have you got a cron job dumping the output of xentop and emailing it to you? If you don't, I suggest setting that up, running as often as every 10 minutes for now, so that you at least have a basic log of activity from the period when the server is unresponsive.
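For anyone finding this thread later, here is a minimal sketch of the kind of snapshot job suggested above. It assumes a Python 3 interpreter is available on the node (stock CentOS 5 does not ship one) and that xentop's batch mode (`-b`) with an iteration count (`-i`) is what you want to capture. The function name, the log path and the cron schedule are illustrative assumptions, not anything from the original setup. It only appends to a local file; mailing the output is left out.

```python
import datetime
import subprocess

# Assumed command: xentop in batch mode, two iterations, because the first
# batch sample may not carry meaningful CPU figures yet.
XENTOP_CMD = ["xentop", "-b", "-i", "2"]


def snapshot(cmd, log_path, timeout=60):
    """Run cmd once and append its output to log_path under a timestamp."""
    stamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout
        )
        body = result.stdout + result.stderr
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Record the failure too: a hung or missing tool is itself a clue.
        body = "command failed: {}\n".format(exc)
    entry = "=== {} ===\n{}\n".format(stamp, body)
    with open(log_path, "a") as log:
        log.write(entry)
    return entry


# Example call (hypothetical log path, run from cron every 10 minutes):
# snapshot(XENTOP_CMD, "/var/log/xentop-snapshots.log")
```

Because each entry is appended and timestamped, the last few entries before a freeze are still on disk after the hard reset, which is the whole point of the exercise.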
Also, if you have not already, you should get iftop installed so you can see which VPS is swallowing your bandwidth, if that happens to be the case.
A few questions:
I assume you're renting a dedi box. Does your DC give you the option to monitor bandwidth to your server in real time?
I do tend to agree it sounds like flooding causing the server to become unresponsive.
The only other thought off the top of my head: you said you have 2 x SATA disks in software RAID 1, which means you don't even have the disk performance of a single SATA disk on your server, and with 26 VMs on the server your disk latency is going to be insane.
Do you have regular RAID syncs? Is your RAID in sync ('cat /proc/mdstat')? Have you placed a limit on the default weekly RAID sync?
It COULD be that your RAID is having issues and is constantly syncing. Given your disk setup and 26 VMs, even if it is only trying to sync at 20MB/s it's going to bring your server to an absolute crawl.
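As a rough illustration of the `/proc/mdstat` check mentioned above, the sketch below flags arrays that are degraded or in the middle of a resync, recovery or check. The sample text follows the usual mdstat layout but is made up for illustration; it is not output from the server in this thread, and the function name is an assumption.

```python
import re

# Illustrative sample in the usual /proc/mdstat layout; not taken from the
# server discussed in this thread.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid1 sdb2[1] sda2[0]
      1953407936 blocks [2/1] [U_]
      [=>...................]  recovery =  8.5% (166039616/1953407936) finish=310.2min speed=96000K/sec

unused devices: <none>
"""


def parse_mdstat(text):
    """Return {array: {"degraded": bool, "activity": str or None}}."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            arrays[current] = {"degraded": False, "activity": None}
            continue
        if current is None:
            continue
        # Member map such as [UU] (healthy) or [U_] (one member missing).
        members = re.search(r"\[([U_]+)\]\s*$", line)
        if members and "_" in members.group(1):
            arrays[current]["degraded"] = True
        # A running resync, recovery or check shows up with a percentage.
        activity = re.search(r"(resync|recovery|check)\s*=\s*([\d.]+)%", line)
        if activity:
            arrays[current]["activity"] = "{} {}%".format(*activity.groups())
    return arrays
```

On a live node you would feed it `open("/proc/mdstat").read()` instead of the sample. The ceiling for resync speed is normally exposed in `/proc/sys/dev/raid/speed_limit_max`, which is where a limit on the periodic sync would be set.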
Just some ideas. If you're really stuck I am happy to have a look for you (free of course). HyperVM is not my thing, but I doubt that has anything to do with it.
-
08-23-2011, 05:02 AM #7 Disabled
- Join Date
- Jun 2007
- Location
- UK and Hong Kong
- Posts
- 243
backtogeek,
Thank you for your reply.
I have contacted the guys at smartservermanagement. They no longer support HyperVM and although I have no problem with migrating to SolusVM, they said they are too busy to take on this work right now and will have to get back to me some time later.
I do not have a cron job dumping and emailing the xentop output, so I think I will set that up.
I do not think bandwidth is an issue. I can see the total traffic of the server and it's nothing the network can't handle.
The server is a colo box, not a rental. I can monitor traffic in real time, however.
Flooding... possible, but for some reason I find it unlikely. If a reboot can fix it, then I don't think it's a flood... am I missing something?
/proc/mdstat shows the RAID is in sync, but even when it is not, the performance never seems to be that bad. That being said, I am certainly considering a hardware RAID card now!
Thanks for the advice! I'll keep this thread apprised.
-
08-23-2011, 06:58 AM #8 Southern Yankee
- Join Date
- Sep 2006
- Location
- The Not So Deep South
- Posts
- 931
How much RAM do you have reserved for Dom0? Is the master in a DomU?
-
08-24-2011, 04:52 PM #9 Web Hosting Master
- Join Date
- Dec 2005
- Posts
- 3,110
This is clearly a Xen/low-level issue; HyperVM is just a control panel.
Drop us another email and we will have a look. It wasn't clear this was the original fault when you contacted us.
Thanks
Chris
-
08-25-2011, 12:33 AM #10 Disabled
- Join Date
- Jun 2007
- Location
- UK and Hong Kong
- Posts
- 243
Hey guys.
First, the whole server has 24GB RAM, of which dom0 currently has 5GB available. This change was made, but it doesn't seem to have made a difference (actually, Server Prodigy has been helping and he made this particular change).
Edit /etc/grub.conf using your favorite editor.
In the "kernel /versionnumberhere" line add the following to the end of the line: dom0_mem=512M
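The edit described in those two lines can be sketched as a small text transformation. This is only an illustration under assumptions: it treats the hypervisor line as the `kernel` line that mentions `xen.gz` (the usual layout on a CentOS 5 Xen host), and the sample stanza, version strings and function name are made up rather than taken from the server in this thread.

```python
import re

# Made-up grub.conf stanza for illustration; version strings are placeholders.
SAMPLE_GRUB = """\
title CentOS (2.6.18-xen-example)
        root (hd0,0)
        kernel /xen.gz-2.6.18-example
        module /vmlinuz-2.6.18-example ro root=/dev/md1
        module /initrd-2.6.18-example.img
"""


def set_dom0_mem(grub_text, value="512M"):
    """Add or replace dom0_mem= on hypervisor (xen.gz) kernel lines only."""
    out = []
    for line in grub_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("kernel") and "xen.gz" in stripped:
            # Drop any existing dom0_mem option so it is not duplicated.
            line = re.sub(r"\s+dom0_mem=\S+", "", line.rstrip())
            line = "{} dom0_mem={}".format(line, value)
        out.append(line)
    return "\n".join(out) + "\n"
```

The function only returns the new text; writing it back to /etc/grub.conf (after taking a copy) is up to you, and the setting only takes effect on the next boot of the host.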
Set up a new server with identical hardware (this time I will use the motherboard's 'HW' RAID), install CentOS 5.5 (not 5.6) and set it up with XEN and HyperVM again. Transfer the VMs over and, I guess, just 'pray' that this time things are stable.
Chris (PCS), if you think you know what the issue is, I would be more than happy for you to take a look. I've already had 3 people look at this, one of whom I know to be highly skilled, and I have not received any definitive answers. But if you are willing to look, and you think you can fix it, I am more than happy to pay! Of course I expect it to be a long-term solution that allows my server to run stable and for many hours.
Since originally starting this thread, the server has crashed some 3 times. Downtime has been 'minimal' as I have written a script to check the status of the server and do a hard reset after 15 mins of being unresponsive.
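The decision logic of a check like that can be sketched as below. This is not the script mentioned above, just a minimal illustration of "reset after 15 minutes of continuous failure". The probe and the reset action are deliberately left out as hypothetical pieces: since ping keeps answering during these hangs, the probe would have to exercise a real service (for example, fetching a page from one of the VMs) rather than ICMP.

```python
class Watchdog:
    """Decide when a host has been unresponsive long enough to reset it."""

    def __init__(self, threshold_seconds=15 * 60):
        self.threshold = threshold_seconds
        self.down_since = None  # time of the first failed probe in a row

    def observe(self, healthy, now):
        """Record one probe result; return True when a reset is due."""
        if healthy:
            self.down_since = None
            return False
        if self.down_since is None:
            self.down_since = now
        if now - self.down_since >= self.threshold:
            # Start a fresh window so the reset is not re-fired every probe.
            self.down_since = None
            return True
        return False
```

In use, a loop would call `observe(probe(), time.monotonic())` once a minute or so and trigger the remote power cycle when it returns True. Clearing the window after a reset gives the host another full 15 minutes to come back before it is cycled again.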
PCS-Chris, should I reply by E-Mail?
Cheers!
-
08-25-2011, 12:54 AM #11 Web Hosting Evangelist
- Join Date
- Jul 2008
- Posts
- 456
I'm going to take a shot at filling in the missing info before people waste a lot of time and energy on this.
The client upgraded his server past CentOS 5.5 and ran into some of the weird networking bugs that are affecting Xen systems. A friend of his talked him into trying to downgrade the system. Since then, the server goes unstable anywhere from every few hours to every 24 hours, without any logic or any indication in the logs of what is going on.
I've worked this same issue a half dozen times in the past month for hosts who tried to update a VPS server like this, and the only solid, long-term solution is to reinstall the OS and software and restore the client VMs.
At first, he did not have another server we could move the clients to, so we had to go with the option of making backups of all the client VMs and then seeing how quickly we could restore the system with as little downtime as possible. (We're still doing the backups just in case the server goes sideways, as he had no backups of client VMs to start with other than RAID-based system state backups on the same machine.)
Now that he has another server, the plan is to do a HyperVM-to-HyperVM migration to the new machine. He still has this glimmer of hope that someone can magically fix the old server, though, which resulted in someone adjusting things last night and taking the server down for an hour (non-stop reboots, then no RAM for the VMs, etc.). The current system has massive corruption issues. I've never seen a server ignore the dom0 RAM limit setting (in the kernel line of grub.conf), for example. It also tends to begin freezing up for an hour before it crashes. There is no indication of hardware problems: no seg faults, etc.
IMO, having multiple people jump on a server and try and fix things is asking for trouble but if anyone disagrees and thinks they have a better solution, please post it.
-
08-25-2011, 01:00 AM #12 Disabled
- Join Date
- Jun 2007
- Location
- UK and Hong Kong
- Posts
- 243
Indeed, I think it is a bad idea to have too many people try and fix things at the same time (too many chefs). But I don't think it's that bad an idea for someone to look (or is it?).
In any case, I am under no illusion that the solution is most likely going to be a full migration to a new box, but can you blame me for hoping?
IF there is a fix, and some masterful admin knows what it is, then I'm happy to hear it. But I am proceeding as if that's not going to come.
And yes, curse my friend's recommendation to downgrade the kernel, and curse him for changing the dom0-min-mem to 0!!! I won't be listening to him again.