-
04-14-2012, 07:32 AM #1WHT Addict
- Join Date
- Jan 2008
- Location
- Montreal, Canada
- Posts
- 133
R1SOFT v3 brings down servers (I/O issue)
Hi,
Has anyone else had an issue with R1SOFT v3 Enterprise? Each week it brings a server down because of an I/O issue. The servers are in RAID10, and we have to reboot them to get them back online (we see many "CDP I/O" entries in the process manager...).
Thank you WHT,
-
04-14-2012, 11:28 AM #2
You might have full block scan turned on for the backups. You should only be doing a full block scan for the initial backup or, potentially, for a backup where the R1Soft server isn't sure it has a good record of all the file deltas (for example, if you had to reinstall the CDP agent after a kernel upgrade). Full block scan is an option you can turn on or off in the backup policy, so make sure it's off. It's also worthwhile to schedule the R1Soft backup for an off-peak time of day. For all of these reasons (and more) we started doing daily backups instead of hourly, which also helps here.
IOFLOOD.com -- We Love Servers
Phoenix, AZ Dedicated Servers in under an hour
★ Ryzen 9: 7950x3D ★ Dual E5-2680v4 Xeon ★
Contact Us: sales@ioflood.com ★
-
04-15-2012, 12:34 AM #3WHT Addict
- Join Date
- Jan 2008
- Location
- Montreal, Canada
- Posts
- 133
Hi,
Thank you for your answer. Full block scan is not active, so I'll contact R1SOFT directly.
If anyone else has had the same issue, please let me know.
Best Regards,
-
04-15-2012, 01:01 AM #4The Linux Specialist
- Join Date
- Mar 2003
- Location
- /root
- Posts
- 23,991
Moved > Specialty Hosting and Markets.
-
04-15-2012, 06:00 PM #5Disabled
- Join Date
- Jun 2005
- Posts
- 3,455
It's normal. They will crash servers frequently, and sometimes cause file corruption on the drives as well, which you then need to fix manually. Welcome to the world of CDP.
-
04-15-2012, 06:41 PM #6Rebooting is a hack, not a fix
- Join Date
- May 2008
- Location
- Citrus Heights, CA
- Posts
- 1,887
iWebFusion.Net - Shared / Reseller / VPS / Bare Metal / Colocation / IP Transit / Networking
*Simply Hosting - Wholly owned networks, in-house staff, legions of fans!
-
04-15-2012, 08:40 PM #7Web Hosting Master
- Join Date
- Jun 2002
- Location
- PA, USA
- Posts
- 5,143
We rarely have issues with CDP. If we do, then my admins have not told me about them.
What kind of drives and how many of them do you have on your RAID10? How many servers are you backing up?
-
04-15-2012, 10:17 PM #8Web Hosting Evangelist
- Join Date
- Mar 2003
- Location
- Kansas City, Missouri
- Posts
- 462
* Upgrade to the latest available version
* Build a new kernel module (r1soft-setup --get-module) and then restart your CDP agent (/etc/init.d/cdp-agent restart)
Lots of older versions of their kernel module created I/O issues. Please verify you are on the latest and greatest versions. We back up quite a few systems without issues.
=> • Admo.net Managed Hosting •
=> Managed Hosting • Dedicated Servers • Colocation
=> Dark Fiber Access to 1102 Grand, Multiple Public Providers
=> Over •Sixteen• Years of Service
-
04-15-2012, 10:23 PM #9Problem Solver
- Join Date
- Mar 2003
- Location
- California USA
- Posts
- 13,681
Are you using mdadm raid and cloudlinux?
Steven Ciaburri | Industry's Best Server Management - Rack911.com
Software Auditing - 400+ Vulnerabilities Found - Quote @ https://www.RACK911Labs.com
Fully Managed Dedicated Servers (Las Vegas, New York City, & Amsterdam) (AS62710)
FreeBSD & Linux Server Management, Security Auditing, Server Optimization, PCI Compliance
-
06-05-2012, 09:25 AM #10Junior Guru Wannabe
- Join Date
- Jul 2006
- Posts
- 95
I've been having this issue since January with no end in sight. Did 4.0 fix it? Nope.
The related error is this:
An exception occurred during the request. Unable to stop snapshot for device '/dev/xvda#' with id 1: Operation not permitted
Then, when the next scheduled backup starts, it can't tell that something is still running, which causes the CPU to surge and kills the server.
They keep saying it will be fixed, but nothing.
Also, every attempt I've made to stop the CDP process after the first bad backup has failed, so you STILL have to reboot to clear the issue (although at least you can turn off the backup so your server won't hang, and then reboot at a good time).
Yep, I'm using CloudLinux.
EthicalHost - Green, Socially Responsible Web Hosting.
-
06-05-2012, 10:14 AM #11The Guru!
- Join Date
- Nov 2007
- Location
- India, USA and Amsterdam
- Posts
- 2,581
Are you using CloudLinux 6?
Check this thread out; people there seem to be reporting performance issues:
http://www.webhostingtalk.com/showth...1155043&page=2
-
06-05-2012, 03:16 PM #12WHT Addict
- Join Date
- Jan 2007
- Posts
- 158
In our CDP world we use Box Backup and have never had a single stability issue with it. We have also rolled our own CDP-like backup system using some custom server-side scripts called by Bacula.
-
06-05-2012, 03:37 PM #13Junior Guru Wannabe
- Join Date
- Jul 2006
- Posts
- 95
-
12-11-2012, 01:39 PM #14Disabled
- Join Date
- Jun 2005
- Posts
- 3,455
The issue is back like never before in version 5. I have had 2 crashes in 1 week since I upgraded to Idera Server Backup version 5. In all of them R1Soft was doing a backup, and it not only crashed the VM, as was normal in v3, but crashed the whole dom0 node!!! The whole hardware went crazy because of high I/O load.
When I rebooted the node, the agent was still doing the backup. It never failed, even while the hardware was being rebooted. Five minutes after the node was back online it hung again, because the R1Soft server was still hitting it. Cancelling the backup task immediately made the node respond again. This is not bad. This is AWFUL!!!
XenServer 6 will give all types of errors under load, such as input/output errors, without letting you enter any command at all. Stopping the backup task solves the problem.
-
12-17-2012, 11:43 PM #15Junior Guru Wannabe
- Join Date
- Jul 2006
- Posts
- 95
I've found mine to still be pretty stable so far with 5. What did support say about it? I don't want to see this happening again.
Thanks
-
12-18-2012, 12:28 AM #16Disabled
- Join Date
- Jun 2005
- Posts
- 3,455
I don't contact support anymore. They never found a solution years back, so why would that have changed today?
The issues are extremely hard to pin down, as you need to report them while they are happening, and I don't know about you, but I cannot have a server down for days.
Usually I reboot the machine immediately, and R1Soft will cause all kinds of corruption in the file system as it's still running.
This seems to happen when I/O is already high on a server. Last time, I rebooted a machine and it went down about 4 minutes later, and on the CDP v5 server the backup was still running. It never detected the server reboot either; the task kept running as if nothing had happened, and the server went crazy, so I cancelled the running backup task and the server started to respond again.
There seems to be something you can replicate on your servers. While a backup is running on a server, the I/O will slowly increase; slowly, but it does increase. For example, if it's at 0.90 it will increase to 0.91 after one or two minutes, then to 0.92, and so on.
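If you want to check for that slow creep yourself, here is a minimal Python sketch. It assumes the figure being watched is the 1-minute load average, which is only a guess, since the post does not say which metric the 0.90 / 0.91 / 0.92 values come from. The names `is_creeping` and `watch` are made up for illustration and have nothing to do with the R1Soft agent.

```python
import os
import time


def is_creeping(samples, min_run=5):
    """Return True if the last `min_run` samples never decrease
    and the last one is higher than the first of that run."""
    if len(samples) < min_run:
        return False
    tail = samples[-min_run:]
    non_decreasing = all(b >= a for a, b in zip(tail, tail[1:]))
    return non_decreasing and tail[-1] > tail[0]


def watch(interval_s=60, count=10):
    """Sample the 1-minute load average `count` times (Unix only)
    and print a warning whenever the recent samples keep rising."""
    samples = []
    for _ in range(count):
        samples.append(os.getloadavg()[0])
        if is_creeping(samples):
            print("load is creeping upward:", samples[-5:])
        time.sleep(interval_s)
    return samples
```

Run `watch()` in a terminal while a backup is in progress; a steady run like 0.90, 0.91, 0.92, 0.93, 0.94 trips the warning, while a flat or bouncing series does not.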
So you had better hope that backups do not take too long to complete, otherwise you have a potential problem.
Now, if the server is under load this is a problem, in particular when for some strange reason the task freezes and just keeps running forever. Then I/O will climb without limit, because it keeps increasing while the agent is running, and since the task never completes and never stops either, after some hours your server will crash, and in a very violent way, causing all types of corruption on the file system. I had this years back, and I had it last year with v4 as well.
R1Soft caused me huge downtime because of this, as the file systems go into read-only mode after such a crash, and you need to take the machine offline to repair them, which can take hours and hours for huge drives.
-
12-18-2012, 01:27 AM #17Web Hosting Master
- Join Date
- Apr 2003
- Location
- Los Angeles, CA
- Posts
- 820
All those horror stories about CDP make me wonder why people use that solution.
Have you looked into ZFS snapshots + zfs send / zfs recv? We've had really good luck with it. Snapshots take a second to make and a few seconds to destroy, even on datasets hundreds of GB in size. zfs send/recv pretty much saturates the gigabit link between hosts, so moving 10 GB of incremental differences takes only a couple of minutes without any load issues.
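For anyone who has not used it, here is a rough Python sketch of what one incremental round looks like. It only builds the command lines and runs nothing. The dataset, snapshot, and host names are placeholders invented for the example, and the transfer estimate assumes the link runs at full line rate with no protocol overhead.

```python
import shlex


def incremental_send_commands(dataset, prev_snap, new_snap,
                              remote_host, remote_dataset):
    """Build the three commands for one incremental round:
    take a new snapshot, send the delta, receive it remotely."""
    snapshot_cmd = ["zfs", "snapshot", f"{dataset}@{new_snap}"]
    # -i sends only the blocks that changed between the two snapshots
    send_cmd = ["zfs", "send", "-i",
                f"{dataset}@{prev_snap}", f"{dataset}@{new_snap}"]
    recv_cmd = ["ssh", remote_host, "zfs", "recv", remote_dataset]
    return snapshot_cmd, send_cmd, recv_cmd


def as_shell_pipeline(send_cmd, recv_cmd):
    """Render send | recv as a single shell pipeline string."""
    return f"{shlex.join(send_cmd)} | {shlex.join(recv_cmd)}"


def transfer_seconds(gigabytes, link_gbit_per_s=1.0):
    """Best-case transfer time: 8 bits per byte, no overhead."""
    return gigabytes * 8 / link_gbit_per_s
```

With the placeholder names tank/www, monday, tuesday, backup1 and backup/www, the pipeline comes out as `zfs send -i tank/www@monday tank/www@tuesday | ssh backup1 zfs recv backup/www`. `transfer_seconds(10)` gives 80.0, so 10 GB at full gigabit speed is about 80 seconds in the best case, which fits the "couple of minutes" figure once real-world overhead is added.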
Potential drawbacks include that ZFS is a COW file system so writes are fast, but reads can get slower over time due to fragmentation (hasn't really been an issue for us after ~1 year of use) and that the Linux port is still 0.6.x.
Pick your poison.
Pings <1 ms, Unlimited Transfer, Lowest Price: http://localhost/
-
12-18-2012, 05:24 AM #18WHT Addict
- Join Date
- Jan 2007
- Posts
- 158
-
12-18-2012, 05:54 AM #19Disabled
- Join Date
- Jun 2005
- Posts
- 3,455
Well, usually it works well, but then one or two times a year something strange crashes a server out of the blue, and it's almost always traced to exactly the time a backup was running. Coincidence? You make the guess.
One simple solution would be the ability to set limits on the agent itself. For example, if the server is at XX I/O, the agent would not run backups, or would abort them immediately, instead of you having to manually log into the server and stop the agent...
Or if a backup is taking more than XX minutes to finish, abort the task.
If these 2 things could be configured on the agent side, it would solve a lot of issues.
You should be able to set these configs on the CDP server for centralized management, with the server sending the configuration update to the agent. The settings should then be saved and enforced in the agent, not in the CDP server.
This would avoid having to configure agents manually, but would still leave the agent to enforce the limits in case it lost communication with the CDP server.
These 2 settings on the agent side are very basic, but they could solve some of the issues people have had in the past. In particular, when a new version is released that is unstable, they could stop the agent from just going on a killing spree on a server.
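To make the idea concrete, here is a minimal Python sketch of the two checks. This is not something the R1Soft agent offers; it is only an illustration of the proposal. `AgentLimits`, `may_start` and `abort_reason` are hypothetical names, and "load" stands in for whatever I/O figure you would actually measure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentLimits:
    """Limits pushed from the central server but stored and
    enforced locally, so they still apply if the link drops."""
    max_load: float     # refuse to start, or abort, at or above this
    max_minutes: float  # abort a backup that runs longer than this


def may_start(current_load: float, limits: AgentLimits) -> bool:
    """A backup may start only while the server is below the limit."""
    return current_load < limits.max_load


def abort_reason(current_load: float, elapsed_minutes: float,
                 limits: AgentLimits) -> Optional[str]:
    """Return why a running backup should be aborted, or None."""
    if current_load >= limits.max_load:
        return f"load {current_load:.2f} >= limit {limits.max_load:.2f}"
    if elapsed_minutes > limits.max_minutes:
        return (f"ran {elapsed_minutes:.0f} min > "
                f"limit {limits.max_minutes:.0f} min")
    return None
```

With `AgentLimits(max_load=2.0, max_minutes=90)`, a backup running at load 0.5 for 30 minutes is left alone, one at load 3.25 is aborted on load, and one that has run for 120 minutes is aborted on time.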
Almost all the issues I had, and that users here reported, were with the server running the agent. This means the agent has too much priority on the server, and even when it's causing huge loads and huge disk reads and writes, it will not stop. It will keep running as if nothing were wrong, and it will just blow up the server's drives. The agent just wants to finish its backup once it has started, and this is wrong.
I have also noticed that from version to version, from 2 to 3, from 3 to 4 and now 5, the agent is hungrier for resources. You keep upgrading hardware, but the agent keeps wanting more and more with each version. It now requires plenty of RAM, and while it's doing a backup it's very CPU intensive, even if it only has to replicate a few hundred megabytes.
On every load graph I have there are spikes, and each one falls exactly when a backup is running. So if your servers with 4 or 8 cores normally sit at a load of 0.20 to 0.50, then while a backup is running they will easily stay at a load of 2 until the backup finishes.