We host a number of websites with varying levels of traffic, about 15 sites in total. My current setup is two Linux VPSes with two different companies. Both are fairly low-spec (MySQL, PHP, nginx), and I'm not really having performance issues on either, although I do have a few new sites going live soon.
My issue is with downtime. I've experienced some downtime with one of the hosts, and it causes us problems as a company when our clients' sites go down (obviously)...
I like the idea of having everything load balanced, or at least having a failover should one server stop responding. My question is, what's the best way to go about this?
Should I get a third VPS, point all my domains at it, and have it look at the two VPS servers and send traffic to whichever has the most resources free? Is there any point in a load-balanced or failover setup across different hosts like this? What are the performance implications of them not being on the same network?
In an ideal world, I'd have two or three VPSes all mirroring each other (would mirroring between different hosts cause problems or affect performance?), and then a load balancer of some kind to fire traffic around the 'farm'...
That would still leave me open to the load balancer going down, so is there anything else anyone can suggest?
Any advice or suggestions would be much appreciated.
With the load balancer solution, you still have the same problem: if the load balancer dies, it all dies.
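Within a single datacenter, the usual way to take the load balancer itself out of the single-point-of-failure equation is a floating IP managed by VRRP, e.g. with keepalived. A minimal sketch, assuming two LB boxes on the same subnet, `eth0` as the public interface, and placeholder IPs/passwords:

```
# /etc/keepalived/keepalived.conf on the primary LB
# (the backup LB uses state BACKUP and a lower priority)
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100           # backup box: e.g. 90
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass s3cret   # placeholder
    }
    virtual_ipaddress {
        203.0.113.10       # example IP; this is what DNS points at
    }
}
```

Note that VRRP needs both boxes on the same layer-2 segment, so across two unrelated VPS providers this doesn't apply and you're back to DNS-level failover.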
Load balancers also introduce sync issues unless you use sticky sessions. This means that once a client visits the site and is directed to one VPS, they should keep being directed to that same VPS for a set period of time (say 24 hours). This avoids sync issues where they see something and it then suddenly disappears because they've been redirected to another VPS that hasn't synced yet. The load balancer also needs to be able to see when a VPS is down, because if a client is stuck to one VPS and it goes down, they can't access the site.
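As a sketch of both points in nginx (assuming nginx is the balancer, with placeholder backend IPs): `ip_hash` pins a client to one backend by source IP, and the passive `max_fails`/`fail_timeout` checks take a dead backend out of rotation.

```nginx
upstream farm {
    ip_hash;                                    # same client IP -> same VPS
    server 192.0.2.10 max_fails=3 fail_timeout=30s;
    server 192.0.2.11 max_fails=3 fail_timeout=30s;
}

server {
    listen 80;
    location / {
        proxy_pass http://farm;                 # backends marked down are skipped
    }
}
```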
Instead of using a VPS host, have you looked into cloud solutions that provide automatic failover? If the node your VPS is on dies, it automatically starts up on another node with no downtime.
Mirroring data between servers can be an extremely complicated process. You need peer-to-peer replication between the MySQL databases, and also for the files on disk (if those change).
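For the database half, classic asynchronous MySQL replication is the usual starting point; here's a minimal my.cnf sketch (server IDs are placeholders, and something like rsync or lsyncd would handle the file side separately):

```ini
# primary's my.cnf
[mysqld]
server-id = 1
log_bin   = mysql-bin

# replica's my.cnf
[mysqld]
server-id = 2
relay_log = mysql-relay
read_only = 1
```

The replica is then pointed at the primary with a `CHANGE MASTER TO ...; START SLAVE;` and you monitor lag with `SHOW SLAVE STATUS`. True multi-master (both sides writable) is where it gets genuinely complicated.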
All of this can be very costly, so ask yourself: how expensive is downtime to you?
First, it seems that all or most of your issues are with one provider? How about getting a better provider? Overall I agree the best solution is going to be a provider that gives you failover; there are a number that do this. I have gone down the DIY path with services like DNSMadeEasy and file/DB replication across different providers, and it does basically work most of the time. But there is still some downtime; this approach just minimizes it. If I needed true failover and HA, I would look at a provider like Cartika (I am a big fan of them for this type of hosting).
a) There's no such thing as no downtime. A SAN-backed solution takes anywhere from 3 to 30 minutes to recover, depending on the VM boot sequence and fsck. The only things that switch faster are synchronized hot mirrors. Still, it's simple and hands-off if you can live with that.
b) memcached-backed session storage in PHP is typically used when you have a load balancer in front, so any front end can serve any client's session.
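For reference, pointing PHP sessions at a shared memcached pool is just a php.ini change (the hosts below are placeholders); with the `memcached` extension it looks like:

```ini
; php.ini -- store sessions in memcached so any FE behind the LB can read them
session.save_handler = memcached
session.save_path    = "192.0.2.20:11211,192.0.2.21:11211"
```

(The older `memcache` extension uses the same settings but with `tcp://host:port` URLs in `session.save_path`.)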
c) Because it's rare to span memcached or active databases across datacenters, it's best to pull everything into a primary DC A. DC B is set up like A but only acts as a warm spare. DNS failover controls which DC gets the traffic, and you only need to spin up as many FEs in B as your tolerance allows.
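DNS failover only reacts as fast as your TTLs allow, so the failover name is usually kept on a short TTL. A sketch of the relevant zone entries (names and IPs are placeholders); the failover service rewrites the A record when DC A stops answering health checks:

```
; zone file fragment -- 60s TTL keeps cached answers short-lived
www   60   IN   A   203.0.113.10   ; LB-A in DC A (normal operation)
; on failover, the provider swaps this to 198.51.100.10 (LB-B in DC B)
```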
Below is your basic cross-datacenter cluster setup.
     DC A                  DC B
    |    |                |    |
  FE  FE  FE               (FE)
  MC  MC  MC               (MC)
   DB(pri)  ----------  DB(slave)
Loss of LB-A: DNS Failover routes to LB-B. Small hiccup in reachability.
Loss of a FE-A: LB-A routes automatically to remaining FE-A.
Loss of all FE-A: LB-A routes to FE-B by priority (degraded but available); cached sessions/data in DC A are effectively lost, since FE-B uses DC B's cache.
Loss of MC-A: Lose part of stored cache sessions/data.
Loss of all MC-A: Lose all stored cache sessions/data.
Loss of DB-A: Lose the DB replication window. Promote DB-B to primary and route to it via MySQL Proxy. Then decide whether to force a full switch to DC B or bring up a new primary in DC A.
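The promotion step is a manual sketch along these lines (assuming classic MySQL replication; the exact procedure depends on your version and whether GTIDs are in use):

```sql
-- on DB-B, once it has applied everything it received from DB-A:
STOP SLAVE;
RESET SLAVE ALL;           -- forget the old primary
SET GLOBAL read_only = 0;  -- start accepting writes
```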
It might seem a bit complex, but it balances performance with availability. Most machine failures will result in a slight session hiccup, but otherwise people can log back in. If you need better memcache resilience, you could try something like Couchbase, which replicates entries within its cluster. That pushes session loss down to situations like network failures or whole-DC outages.
In the cases where you do end up spanning both DCs, you can make the DNS failover decision manually. That's done at your leisure, and you could choose to leave things temporarily as-is if you think the problem will be resolved quickly. If you do make the switch, customers behind cached DNS will still hit the old LB-A until their cache expires; the LB (if it happens to be the only thing alive in DC A) can route everything over to DC B and back.
The above is generic, and you can whittle each cluster down to a single FE/MC/DB stack and skip the LB. That collapses it to basic DNS failover between two machines at different DCs, which obviously gives a little less flexibility in bringing things up and down and switching over.