Has anyone else had an issue with R1Soft v3 Enterprise? Every week it brings a server down because of an I/O issue. The servers are on RAID10, and we have to reboot them to get them back online (we see many "CDP I/O" entries in the process manager...).
You might have full block scan turned on for the backups. You should only be doing a full block scan for the initial backup or, potentially, for a backup where the R1Soft server isn't sure it has a good record of all the file deltas (for example, if you had to reinstall the CDP agent after a kernel upgrade). Full block scan is an option you can turn on or off in the backup policy, so make sure it's off. It's also worthwhile to schedule the R1Soft backup for an off-peak time of day. For all of these reasons (and more) we started doing daily backups instead of hourly, which also helps here.
Steven Ciaburri | Proactive Linux Server Management- Rack911.com System Administration Extraordinaire | Follow us on twitter:@Rack911Labs Managed Servers (AS62710), Server Management, and Security Auditing. www.HostingSecList.com - Security notices for the hosting community.
I've been having this issue since January with no end in sight. Did 4.0 fix it? Nope.
The related error is this:
An exception occurred during the request. Unable to stop snapshot for device '/dev/xvda#' with id 1: Operation not permitted
Then, when the next scheduled backup starts, it can't tell that something is still running, so the CPU surges and this kills the server.
They keep saying it will be fixed, but nothing so far.
Also, even if you catch it after the first bad backup, every attempt I've made to stop the CDP process has failed, so you STILL have to reboot to clear the issue (although at least you can turn off the backup so your server won't hang, and then reboot at a good time).
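For anyone who wants to catch the stuck state before the next scheduled backup piles on top of it, here is a rough Python sketch. It is only an illustration, not anything from R1Soft: it assumes the hung agent shows up in `ps` as a process in uninterruptible sleep (state `D`) and that its command name contains "cdp", as in the "CDP I/O" entries mentioned above. The function name `find_stuck_processes` and the `name_hint` parameter are made up for this example.

```python
import subprocess


def find_stuck_processes(ps_output, name_hint="cdp"):
    """Return (pid, state, command) for processes in uninterruptible
    sleep (state D) whose command name contains name_hint.

    ps_output is the text printed by `ps -eo pid,stat,comm`.
    name_hint is an assumption about how the agent's processes are named.
    """
    stuck = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        # Skip the header line and anything that is not "PID STAT COMMAND".
        if len(parts) < 3 or not parts[0].isdigit():
            continue
        pid, stat, comm = parts
        comm = comm.strip()
        if stat.startswith("D") and name_hint.lower() in comm.lower():
            stuck.append((int(pid), stat, comm))
    return stuck


def current_ps_output():
    """Capture the live process table (Linux procps `ps`)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,stat,comm"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    for pid, stat, comm in find_stuck_processes(current_ps_output()):
        print(f"possibly stuck: pid={pid} state={stat} command={comm}")
```

Run from cron a little before the backup window, something like this could at least warn you to disable the policy and plan a reboot. A process sitting in `D` state is not always hung, so treat a hit as a hint to look closer, not as proof.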
The issue is back like never before in version 5. I've had 2 crashes in 1 week since I upgraded to Idera Server Backup version 5. Each time R1Soft was running a backup, and it not only crashed the VM, as was normal in v3, but crashed the whole dom0 node!!! The whole hardware went crazy because of the high I/O load.
When I rebooted the node, the agent was still running the backup; it never failed, even while the hardware was being rebooted. Five minutes after the node came back online it hung again, because the R1Soft server was still hitting it. Cancelling the backup task immediately made the node respond again. This is not bad. This is AWFUL!!!
XenServer 6 will give all types of errors under load, such as input/output errors, without letting you enter any command at all. Stopping the backup task solves the problem.