Web Hosting Talk

RZ failover response time or alternatives?


mikehall
10-25-2007, 03:06 PM
I've been reading a lot of mixed reviews of Reseller Zoom here (as is the case with any host, I suppose). Has anyone actually had the failover service take effect? I've heard it doesn't run as smoothly as it claims, and I wasn't sure what that meant. Support told me it takes over after 3-5 minutes of downtime.

Alternatively, does anyone else offer failover services (preferably in the US)? I've heard people mention getting 2 hosts and using rsync, but I think that is difficult with a reseller account. My current host has had several outages lasting hours at a time. I really need some way to prevent this and don't mind spending the money to do so.
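
For reference, the two-host rsync approach could look roughly like the sketch below. The user, host name, and paths are placeholders, and it assumes SSH access on both accounts, which a reseller plan may not provide.

```python
import shlex

def build_rsync_command(source_dir, backup_user, backup_host, dest_dir):
    """Build an rsync command that mirrors source_dir to a second host over SSH."""
    # A trailing slash on the source copies the directory's contents,
    # not the directory itself.
    source = source_dir.rstrip("/") + "/"
    target = f"{backup_user}@{backup_host}:{dest_dir}"
    # -a preserves permissions and timestamps, -z compresses in transit,
    # --delete removes files on the backup that are gone from the primary.
    return ["rsync", "-az", "--delete", "-e", "ssh", source, target]

# Placeholder account details, for illustration only.
cmd = build_rsync_command(
    "/home/example/public_html", "backup", "backup.example.com",
    "/home/example/public_html",
)
print(shlex.join(cmd))
```

This only mirrors files. Databases, mail, and pointing DNS at the second host would still have to be handled separately.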

onthespot
10-25-2007, 03:33 PM
It depends on what they use. If they use Heartbeat, then the normal ARP timeout is 2 minutes.
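
As a rough sketch of where a delay of a few minutes can come from, the total is approximately the detection time plus the takeover time. The settings below are made-up illustrations, not Heartbeat defaults and not RZ's actual configuration.

```python
def failover_delay_seconds(check_interval_s, missed_checks, takeover_s):
    """Approximate time from failure until the standby is serving traffic.

    The monitor declares the primary dead only after `missed_checks`
    consecutive failed checks, then the standby needs `takeover_s`
    to claim the address and start services.
    """
    detection_s = check_interval_s * missed_checks
    return detection_s + takeover_s

# Hypothetical settings: check every 30 s, require 4 misses, 60 s to take over.
print(failover_delay_seconds(30, 4, 60))  # 180 seconds, i.e. 3 minutes
# Tighter hypothetical settings: check every 2 s, 5 misses, 10 s to take over.
print(failover_delay_seconds(2, 5, 10))   # 20 seconds
```

Shorter check intervals cut the delay but make a false failover on a brief network blip more likely.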

Nnyan
10-25-2007, 04:39 PM
Interesting question. I'm not aware of anyone who has actually gone through that process (my experience with RZ was pre-failover plans). But it would not be all that difficult to test.

Cartika offers clustered service which I think is a step up from a failover system.

armata
10-30-2007, 01:17 PM
As an ex-user of the RZ Failover service, I must say that it was VERY unreliable. In fact, the downtime was worse than with the non-failover option (which was pretty bad too).

mikehall
10-30-2007, 01:19 PM
Can you give me more details? Does the service just not function? Would the backup server also go down, or were there data center issues?

PremiumHost
10-30-2007, 08:02 PM
Interesting question. I'm not aware of anyone who has actually gone through that process (my experience with RZ was pre-failover plans). But it would not be all that difficult to test.

Cartika offers clustered service which I think is a step up from a failover system.

I think an H-Sphere cluster doesn't work the same way as what RZ offers.
I don't know much about the reliability of the RZ failover service though.

Jedito
10-30-2007, 08:34 PM
Cartika offers clustered service which I think is a step up from a failover system.

There's no relationship between failover and clustering. In a clustered system, if a server goes down, no other server comes up to cover for it. In a failover system, an alternate server mirrors the data of the primary server and comes up when the primary goes down.

cartika-andrew
10-30-2007, 09:25 PM
There's no relationship between failover and clustering. In a clustered system, if a server goes down, no other server comes up to cover for it. In a failover system, an alternate server mirrors the data of the primary server and comes up when the primary goes down.

Jedito is obviously correct - there isn't a "failover" server in our cluster or clusters like ours - however, our system and cluster seem to have better uptime than any failover system I have seen (and not just by a little). The issue with failover systems is that there is a delay to "failover", and typically, since hardware failures aren't really the primary cause of service interruptions in shared environments, failover accomplishes nothing more than carrying an issue over to a new server and causing an outage or outages in the meantime. Failover systems are a poor man's cluster, and I have never seen one that will outperform or experience better uptime than a services cluster - all else being equal....

ldcdc
10-30-2007, 09:48 PM
and typically, since hardware failures aren't really the primary cause of service interruptions in shared environments, failover accomplishes nothing more than carrying an issue over to a new server and causing an outage or outages in the meantime.

Even more so when the backup server is slightly less powerful than the failed server (to reduce costs), or the job of 2 servers is switched to a single one, as apparently is done here: http://resellerzoom.com/failover-technology.shtml

Annex
10-30-2007, 10:02 PM
Jedito is obviously correct - there isn't a "failover" server in our cluster or clusters like ours - however, our system and cluster seem to have better uptime than any failover system I have seen (and not just by a little). The issue with failover systems is that there is a delay to "failover", and typically, since hardware failures aren't really the primary cause of service interruptions in shared environments, failover accomplishes nothing more than carrying an issue over to a new server and causing an outage or outages in the meantime. Failover systems are a poor man's cluster, and I have never seen one that will outperform or experience better uptime than a services cluster - all else being equal....

In a DDoS attack, failover would stay up better than a cluster. Zombies cache the IP of the site they are attacking, so they can continually ping without using a lot of BW. A failover would switch servers, and therefore IPs, while a cluster would simply buckle.

cartika-andrew
10-30-2007, 10:06 PM
In a DDoS attack, failover would stay up better than a cluster. Zombies cache the IP of the site they are attacking, so they can continually ping without using a lot of BW. A failover would switch servers, and therefore IPs, while a cluster would simply buckle.

That's a poor example - if you are relying on "failover" to protect you from a DDoS, then you are using the wrong tool for the task. Firewalls and Intrusion Detection/Prevention devices should be used to mitigate DDoS attacks - again however, DDoS attacks make up a very small percentage of total outages for shared hosting - and I mean very small (probably less than hardware failures) - so, not sure how this is at all relevant...

cartika-andrew
10-30-2007, 10:21 PM
Even more so when the backup server is slightly less powerful than the failed server (to reduce costs), or the job of 2 servers is switched to a single one, as apparently is done here: http://resellerzoom.com/failover-technology.shtml

Well, I honestly do not know ANYTHING about RZ's failover - but what you have described above would almost be less reliable than a single server setup... might as well stick with single server cPanel setups from really reliable and proven cPanel providers (ie downtownhost or bluefur or dynamicnet or rochen, etc) - as you will be MUCH better off....

Shared hosting is a different beast - we were (and are) SO close to launching our load balanced clustered solution - but, the more we looked at this, and the more we tested - the more we began to understand why all of these load balanced, high availability systems (or "Grid" systems as some call them) have not been able to outperform our existing services cluster. The reason is simple - each website hosted in a shared environment needs to be considered a single point of failure - and the reality is, most outages in a shared environment are caused by websites and users (directly or indirectly) - so, increasing hardware redundancy does little or nothing to increase reliability in shared hosting - and in fact, if you have massive arrays of servers and just load on even more domains and users - well guess what - more points of failure and lower overall uptime....
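
The "more sites, more points of failure" argument can be sketched with simple probability. This assumes each site independently has some small chance of causing an environment-wide incident in a given month. The 0.1% figure is purely hypothetical, not a measured number.

```python
def prob_incident_free(sites, p_site_incident):
    """Chance that none of `sites` independent sites triggers an incident."""
    return (1.0 - p_site_incident) ** sites

# Hypothetical 0.1% chance per site per month of taking the environment down.
for sites in (100, 500, 2000):
    print(sites, round(prob_incident_free(sites, 0.001), 3))
```

Under these assumptions the chance of an incident-free month drops quickly as sites are added, regardless of how redundant the hardware underneath is.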

Sure, it is easy to launch a system with redundant load balancers, redundant load balanced nodes and redundant shared storage - but that's not what gives you higher availability. We are getting 3-4 9's of uptime right now on our shared services cluster (accomplished by minimizing the # of users per x units of environment, by leaving excess capacity within the system and by having specialized server nodes configured specifically for certain services) - in order to do this, we need to be more expensive - but, proof is in the pudding and the results have been there - so, when looking at launching a system with higher availability it has become painfully obvious that load balancing and higher availability will cost even more money - all of this talk about using clustering to reduce costs to consumers and streamline resource utilization, etc - is basically a load of crap (pardon the French) - cheap shared hosting crammed with too many users is still cheap shared hosting crammed with too many users - doesn't matter what sort of super mega cluster or failover system you are using. This shouldn't really be a surprise - each 9 of hosting costs money - always has and always will - not sure why companies market massive shared plans on "grid" or "failover" or "cluster" systems saying "redundant everything, 100% uptime" - then go and jam 1000's of websites onto a given environment. Either they honestly didn't know or understand what causes downtime, or they didn't care and just marketed the hell out of it and hoped for the best....
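
To put "each 9" in concrete terms, here is a small sketch that converts a number of nines into allowed downtime per year. It is plain arithmetic on a 365-day year, not a measurement of any provider.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years

def downtime_hours_per_year(nines):
    """Allowed downtime per year for an availability of N nines (3 -> 99.9%)."""
    unavailability = 10 ** (-nines)
    return HOURS_PER_YEAR * unavailability

for n in (2, 3, 4):
    print(f"{n} nines: {downtime_hours_per_year(n):.2f} hours/year")
```

Three nines allows about 8.76 hours of downtime a year, and four nines allows under an hour.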

sgarbus
10-30-2007, 10:21 PM
The issue with failover systems is that there is a delay to "failover"

Failing over can take as little as 20 seconds -- not long at all.

failover accomplishes nothing more than carrying an issue over to a new server and causing an outage or outages in the meantime....

Care to elaborate on that?

Failover systems are a poor man's cluster, and I have never seen one that will outperform or experience better uptime than a services cluster - all else being equal....

I wouldn't necessarily call it a "poor man's cluster", as the price range for failover servers can get pretty high. As far as performance, a highly-optimized failover environment can easily outperform a clustered environment.

Care to test?

cartika-andrew
10-30-2007, 10:43 PM
I wouldn't necessarily call it a "poor man's cluster", as the price range for failover servers can get pretty high.

Sure they can - the price for any system can get expensive - but load balanced clusters will be more expensive than anything. Failover systems traditionally weren't used in shared hosting - they evolved with the sole intention of giving single server operations a platform to try and compete with clustering and "Grid" type systems... frankly, they are not an adequate solution. Now, please don't get me wrong - if you manage your resources and your environment well, you certainly will have an advantage over the average single server cPanel provider - and you certainly will have a value add over them - but comparing failover systems to clusters or load balanced clusters isn't appropriate...

As far as performance, a highly-optimized failover environment can easily outperform a clustered environment.

Any highly optimized environment would beat out any non-optimized environment - whether it is a failover, clustered, grid, or single server environment has very little to do with this....

Look at medialayer - they probably have one of the nicest environments - and they are built on single server solutions - but they have taken the time and the care to build their environment customized and specialized for applications - they have under-allocated their environments, they use specialized web servers, specialized configs, etc - it has nothing to do with the fact they are single server or failover or cluster.

Care to test?

Already tested the crap out of them internally - but, thanks anyway.... we have tested pretty much every configuration before deciding which way we were going to go. Failover is a poor man's solution for "high availability" and isn't the preferred mechanism for achieving increased availability. Every test we have done in shared environments shows that a services cluster is the way to go. After that, if you want to increase availability - you better go load balancing and shared storage - but, it has to be done right.... if it's done to sell massive packages and cram even more domains on there - then heck, might as well stick with single server solutions with failover - because you will probably get better availability....

ldcdc
10-30-2007, 11:06 PM
Well, I honestly do not know ANYTHING about RZ's failover

I know what they're explaining on their site. Hopefully I didn't get it wrong.

layer0
10-30-2007, 11:34 PM
In a DDoS attack, failover would stay up better than a cluster. Zombies cache the IP of the site they are attacking, so they can continually ping without using a lot of BW. A failover would switch servers, and therefore IPs, while a cluster would simply buckle.

The IPs do not change in ResellerZoom's setup. If they did, it wouldn't be "instant".

JohnCrowley
10-30-2007, 11:45 PM
...Every test we have done in shared environments shows that a services cluster is the way to go. After that, if you want to increase availability - you better go load balancing and shared storage - but, it has to be done right.... if it's done to sell massive packages and cram even more domains on there - then heck, might as well stick with single server solutions with failover - because you will probably get better availability....

Amen. I'll back Andrew on this statement.
We offer mainly single server solutions, although our setup does tend to use some service clusters to offload more intensive "add-on" services. And we've found the same type of results as Andrew. A properly optimized and balanced environment on high-end single servers (whether all-inclusive or cluster-based) that are closely monitored and not overloaded can get 99.95+% uptime year after year.

Using "good" CPUs, dual power supplies, SCSI drives in a RAID array with a high-end hardware controller, and multiple backup systems can be extremely reliable, and the simplicity of the setup can avoid the "complex" problems that crop up when running an LB / grid-like super system.

- John C.

minipro
10-31-2007, 12:19 AM
RZ's failover works! It fails over and over again!

It used to be pretty good, but the last 3-4 months have been pathetic. Most of the time the reason given has been a network outage, DC problems, or DDoS attacks.

Jedito
10-31-2007, 01:31 AM
Jedito is obviously correct - there isn't a "failover" server in our cluster or clusters like ours - ....

I would like to clarify that I wasn't talking about Cartika in particular, but about clustered servers in general :) It was my fault because your company name was there and I did not clarify that in my post.

koii
10-31-2007, 01:51 PM
As an ex-user of the RZ Failover service, I must say that it was VERY unreliable. In fact, the downtime was worse than with the non-failover option (which was pretty bad too).

Hello,

I'm sorry you feel that way about our service. I remember asking you several times for a ticket number so that I could properly look into and address the issue you were having, but I never got a response from you.

http://www.webhostingtalk.com/showpost.php?p=4672740&postcount=31
http://www.webhostingtalk.com/showpost.php?p=4677972&postcount=51

I would still like to review them if you can provide me those details via PM even though you are an ex-customer.

koii
10-31-2007, 02:00 PM
The issue with failover systems is that there is a delay to "failover", and typically, since hardware failures aren't really the primary cause of service interruptions in shared environments, failover accomplishes nothing more than carrying an issue over to a new server and causing an outage or outages in the meantime.

You are correct, there is a small delay (a few minutes) when the failover does take place. We're currently developing version 2 of our system for immediate failover with no delay.

Hardware failure may not be the primary reason for the majority of downtime but it is the main reason for prolonged downtime. The purpose of our failover system is to prevent prolonged downtime. We've all been through instances where hours of downtime are experienced due to hardware issues.
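
A rough sketch of this point, using made-up numbers: software incidents may be far more frequent, but a single long hardware outage can still account for most of the downtime hours. The incident counts, durations, and the 5-minute switch delay below are all hypothetical.

```python
# Hypothetical yearly incidents on one shared server: (cause, count, hours each).
INCIDENTS = [
    ("runaway site or user script", 12, 0.25),
    ("hardware failure", 1, 6.0),
]

def annual_downtime_hours(incidents, failover_delay_h=None):
    """Sum downtime; with failover, hardware outages are capped at the switch delay."""
    total = 0.0
    for cause, count, hours in incidents:
        if failover_delay_h is not None and cause == "hardware failure":
            hours = min(hours, failover_delay_h)
        total += count * hours
    return total

print(annual_downtime_hours(INCIDENTS))          # 9.0 hours without failover
print(annual_downtime_hours(INCIDENTS, 5 / 60))  # about 3.08 hours with a 5-minute failover
```

In this sketch failover does nothing for the 12 software incidents, which carry over to the standby, but it removes most of the hours lost to the one hardware failure.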


Failover systems are a poor man's cluster, and I have never seen one that will outperform or experience better uptime than a services cluster - all else being equal....

Our system has the MySQL server on a dedicated machine apart from the web/email server. It's obviously not a full-blown cluster, but it does help to separate MySQL from the web server.

koii
10-31-2007, 02:12 PM
Even more so when the backup server is slightly less powerful than the failed server (to reduce costs), or the job of 2 servers is switched to a single one, as apparently is done here: http://resellerzoom.com/failover-technology.shtml

Both of the servers within our cluster have the exact same specs. In our case, I haven't seen performance drop even when the system is failed over. For example, one of the servers in one of our clusters currently has a motherboard issue that requires a replacement. That cluster has been in a failed-over state for a few days while we wait for the part to arrive at the datacenter, and it has performed perfectly normally.

koii
10-31-2007, 02:14 PM
RZ's failover works! It fails over and over again!

It used to be pretty good, but the last 3-4 months have been pathetic. Most of the time the reason given has been a network outage, DC problems, or DDoS attacks.

If those are the reasons given, there isn't much we can do about it. I do know we had switch problems in one of our cabinets that caused issues with the servers connected there, which may explain the network/DC problems, but that has been resolved.