  1. #1
    Join Date
    Dec 2000
    Location
    Montreal
    Posts
    539

Clustering/fault tolerance

    There is so much discussion about load balancers on the boards, but no one mentions anything about true redundancy, where if one server goes down it is replaced by another. I'm not referring to pure mirroring, because that's too expensive; what I'm referring to is something more like the RAID 5 of servers, where there is one mirrored or redundant server for many others. I'm also interested to hear about other solutions out there. At the end of the day, the ultimate goal is uptime.

  2. #2
    Join Date
    Jan 2005
    Location
    Richmond, VA
    Posts
    3,102
    Netfirms has this with their new Enterprise service. I'm planning to upgrade one of my sites to it very soon, but I'm holding out mainly to hear how others who are using it are doing. So far, it's been very hard to find folks who are. But anyway, that's just one example.
    Daniel B., CEO - Bezoka.com and Ungigs.com
    Hosting Solutions Optimized for: WordPress Joomla OpenCart Moodle
    Data Centers in: Chicago (US), London (UK), Sydney (AU), Sofia (BG), Pori (FI)
    Email Daniel directly: ceo [at] bezoka.com

  3. #3
    Join Date
    Jun 2004
    Location
    Tampa Florida
    Posts
    428
    sharkman,
    Load balancing, with quality equipment, allows any server to die and the extra requests to be picked up by others in the pool.
    Prior to reliable load balancers, many used the hot spare method. But load balancing allows you to utilize that spare hardware during the long periods where there are no outages.

    Basically, it saves money over the hot spare method and provides the same level of fault tolerance.

    As a side note, most good load balancers support the hot spare method of failover in case the load balancer itself dies, similar to VRRP or HSRP in routers.
    Rock solid hosting and dedicated servers since 1998!
    StabilityHosting Where stability and uptime are king!

  4. #4
    There is so much discussion about load balancers on the boards, but no one mentions anything about true redundancy, where if one server goes down it is replaced by another. I'm not referring to pure mirroring, because that's too expensive; what I'm referring to is something more like the RAID 5 of servers, where there is one mirrored or redundant server for many others. I'm also interested to hear about other solutions out there. At the end of the day, the ultimate goal is uptime.
    Excellent questions/points... and love the comparison to RAID - because that is actually how it works - brilliant analogy - thanks!

    As Sharkman pointed out - this is all about uptime - and uptime is achieved through identification and removal of single points of failure...

    A single server can be more reliable, with fewer single points of failure, than a load balanced array - and that is what is often missed.

    for example:

    situation 1:
    Dual processor, redundant everything - NICs, HDDs (RAID), fans, power supplies - running at 100% of capacity in peak periods

    situation 2:
    2x single-processor machines - load balanced, single fans, single power supply, single hard drive, single NIC - LB array is running at 100% of capacity at peak periods

    Situation 1 will always be more reliable than situation 2, as situation 2 has a lot more single points of failure than situation 1 (because if either node fails in situation 2, the array will fail). Now, this is an extreme example to illustrate a point.

    Load balancing, with quality equipment, allows any server to die and the extra requests to be picked up by others in the pool.
    Vantage, obviously completely agree - ideally this is how it would work, and the resultant benefits would be higher uptime by removing single points of failure, not to mention greater performance through streamlined utilization of resources. This is of course assuming that it is done correctly, with redundant storage, load balancers, etc... The problem is load - the problem always has been load and always will be load. If you are asking an array of servers to handle more load than they are capable of, you are creating multiple points of failure (more so than if you are asking a single server to handle loads beyond its capabilities) - as, if any of those nodes go down, the array may crash and/or performance will be dramatically affected. Evidence of this can be seen everywhere - many of these HA solutions have been brought down by just this very thing...

    I guess my convoluted point is - load balancing is wonderful and can improve uptime and performance, but it needs to be done properly, and enough resources need to be thrown at the array to accommodate x number of units failing (depending how redundant you want to be).

    This is very much like RAID arrays (to get back to Sharkman's original point) and disk drives - where RAID 1 allows for 1 hard drive to fail, and higher levels of RAID will allow for more hard drives to fail. In an LB array, you need to identify how many nodes can fail without affecting service - and in order to accomplish this, your array needs to be running at an overall capacity which can accommodate 1 or more nodes failing during peak periods without affecting service levels. Otherwise, you are just creating unnecessary single points of failure...
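    To put numbers on that rule of thumb, here is a quick back-of-the-envelope check (a Python sketch; every figure in it is hypothetical - plug in your own measurements):

    Code:
    # Back-of-the-envelope N+1 capacity check for a load-balanced array.
    # All numbers are hypothetical - plug in your own measurements.
    node_capacity = 100.0   # requests/sec one node handles comfortably
    nodes = 4               # nodes in the array
    peak_load = 250.0       # measured peak load across the array (req/s)
    tolerate = 1            # node failures you want to survive

    headroom = (nodes - tolerate) * node_capacity - peak_load
    if headroom >= 0:
        print("OK: %d node(s) can fail at peak, %.0f req/s to spare" % (tolerate, headroom))
    else:
        print("Under-provisioned: short %.0f req/s when %d node(s) fail at peak" % (-headroom, tolerate))

    With these numbers, 3 surviving nodes give 300 req/s of capacity against a 250 req/s peak, so the array survives one failure with room to spare.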

  5. #5
    Join Date
    Dec 2000
    Location
    Montreal
    Posts
    539
    Lol, Cartika my friend, that was very technical. At the end of the day, do you have a recommendation for a solution that duplicates RAID 5 in a server format, or do you recommend just a good load balancer?

  6. #6
    LOL sharkman, made me smile

    Sure I have a recommendation - redundant load balancers, 3 servers to handle the load of 2 (4 to handle the load of 3, and so on - so if 1 server goes down, your array will not be affected) and a storage array.

    The higher your budget, the better the quality of everything you can get, and the fewer single points of failure...

    I like NetApp for storage because the solution has zero points of failure (if you buy the right model - I think the 900 series or higher) - but they are quite expensive.

    Another solution for storage is to buy 2 less expensive NAS devices and then load balance/mirror between them.

    Or, since NAS devices are pretty reliable, you can just ensure you have RAID protection in the NAS and plan for a potential failure with a speedy mean time to recovery. This obviously means an additional single point of failure you need to manage - but nonetheless it is an option...

  7. #7
    Join Date
    Nov 2006
    Location
    College Station, TX
    Posts
    185
    I'm not a huge fan of NAS... for a lot of applications, esp. database, you have to buy an immense amount of NAS with a big storage buffer to handle the 50 storage requests that are coming in at once. I've seen I/O wait go through the roof even with three or four servers hitting the NAS at once... and depending on how you handle mounting that network share, NFS isn't exactly the most stable beast on the block, and SMB's so resource-intensive that you may as well have the content local for all the work the processor's having to do to get to the content.

    Fibre channel is a whole 'nother deal, of course... but we have our 'control panel' for our four or five sites hosted on one server, and the load balancers route requests for the subdomain it's on to that server, which then pushes content to the other servers in the cluster. That control server can also take a content server out of the load balancing pool if it can't reach it.

    If you haven't already, read Brad Fitzpatrick's description of how Livejournal does their clustering, and the scaling issues they ran into and ultimately solved.

  8. #8
    I'm not a huge fan of NAS... for a lot of applications, esp. database, you have to buy an immense amount of NAS with a big storage buffer to handle the 50 storage requests that are coming in at once. I've seen I/O wait go through the roof even with three or four servers hitting the NAS at once... and depending on how you handle mounting that network share, NFS isn't exactly the most stable beast on the block, and SMB's so resource-intensive that you may as well have the content local for all the work the processor's having to do to get to the content.
    Completely agree. When we first started testing, redundant NAS seemed like the way to go. Cheap, and NFS is supposed to be painless. However, every test we ran showed the same results: HUGE I/O wait issues and NFS permission problems (amongst other NFS issues - they can be overcome, but it certainly isn't ideal). The end result is we went with NetApp (fibre) - a much larger investment, but definitely worth it.

    The last hurdle for us is how we handle SSL - it works now, but not "good enough" for our liking...

    Having said this, for a small company just starting out and wanting to implement load balancing with shared storage, or for companies selling HA solutions to SMBs, it's a decent solution (not ideal, but workable - and still, in my humble opinion, superior to "HA" achieved through rsyncing servers and such...).

    karlkatzke, I would be interested in your opinion on this - do you believe a basic 2 server LB situation with a NAS shared storage is a workable, affordable HA solution for the SMB space? (it is something we are thinking about offering, but are still undecided - would love your input)

    If you haven't already, read Brad Fitzpatrick's description of how Livejournal does their clustering, and the scaling issues they ran into and ultimately solved.
    Great read - I don't think you can research HA, shared storage, etc. without running into it. I am sure a lot of people will see good value in it.

  9. #9
    Join Date
    Jun 2004
    Location
    Tampa Florida
    Posts
    428
    I renewed my Sun certifications a while ago, and there was a guy in my group that worked on huge DB clusters (by huge I mean 15 or 20 SunFire 25K servers using Oracle RAC clustering), and they had a storage solution that sounded really interesting. I couldn't believe that it was fast enough, though. But that was totally my biased opinion based on my NFS history...

    They ran 4 large EMC SANs.
    Mirrored.
    Into 8 (2 each) Sun SunFire 280Rs (dual 1.34GHz SPARC IIIs with 16GB of RAM).
    Then they had 10Gb Ethernet cards from the 280Rs into the 25Ks. Using NFS...

    They were using Sun's IP address redundancy on everything. He claimed that they could lose 3 of anything behind the 25Ks and the DB servers wouldn't notice it at all.
    I mentioned that I didn't think NFS was a great way to deal with this, and he said that with all the RAM in the 280Rs the buffer took care of any speed issues..... I am still rather dubious about this setup. But it was an impressive cluster, and if you dump that much money into something I would think it would be well thought out and would work pretty well.
    Rock solid hosting and dedicated servers since 1998!
    StabilityHosting Where stability and uptime are king!

  10. #10
    I mentioned that I didn't think NFS was a great way to deal with this, and he said that with all the RAM in the 280Rs the buffer took care of any speed issues.....

    and if you dump that much money into something I would think it would be well thought out and would work pretty well.
    LOL - I believe that - but I think it's more cost-effective to go with NetApp (never thought I would say that)

  11. #11
    Join Date
    Jun 2004
    Location
    Tampa Florida
    Posts
    428
    Just thinking that basically a NetApp is an NFS box... I may give a try to using a few T1000s and an EMC I have not been able to find a use for, and set something like this up (only MUCH MUCH smaller). NFSv4 has proven to be MUCH faster than v3, and with it natively integrated into the ZFS filesystem on Solaris 10 it may respond that much more quickly.

    I am currently using said hardware to play around with iSCSI... but have been unimpressed by the transfer speeds.
    Rock solid hosting and dedicated servers since 1998!
    StabilityHosting Where stability and uptime are king!

  12. #12
    Join Date
    Nov 2006
    Location
    College Station, TX
    Posts
    185
    Quote Originally Posted by CartikaHosting
    karlkatzke, I would be interested in your opinion on this - do you believe a basic 2 server LB situation with a NAS shared storage is a workable, affordable HA solution for the SMB space? (it is something we are thinking about offering, but are still undecided - would love your input)
    If by SMB you mean Samba, I don't think I understand the question. (I think you're using SMB as 'Small to Medium Business' ... which is a marketing question that I'm not qualified to answer, as I don't have a lot of experience in the hosting world -- just some small business stuff I've done on my own. I'll take a crack at it, though.)

    Conceptually, the technical implementation is right for business hosting - as long as you have the details correct. You'll want a lot of RAM in the storage device to buffer reads, and you'll want logfiles and page caches written locally. Most upper-end SMB sites (as in, the kind of company that's actually going to deal with this much traffic, and for whom their website is mission-critical) are dynamic these days, so you want to have a lot of the code that they're executing cached. You'll want dual/dual everything in the NAS. You'll want good battery backups. You'll want to have the database either running a cluster edition, or in a redundant mirrored config, or behind its own L4 load balancer with mirroring on the backplane.

    I'd probably turn to something like Redhat Cluster Suite if I was going to pursue this end of the business. They're supposed to release a new version in March that'll really be the cat's pajamas.

    Quite honestly, what *I* would find valuable as a small business would be geographically balanced, affordable-for-small-businesses hosting. Geographical balancing is possible with a lot of newer load balancers, and while yes, you'd have to do mirroring, most sites aren't changing frequently enough for that to become a huge issue as long as you provide a 'live mirror' button. With the majority of businesses being in disaster-prone areas, I'd love it if my website could survive a tsunami on the west coast, an early blizzard in the midwest, thunderstorms that spawn tornadoes in the south/southeast, and a tornado or nor'easter making its way up the Atlantic coast... simultaneously. Seems we're entering an active weather cycle that's going to put disaster recovery plans to the test. I'm not sure that the small business market is ready for it, but there are a lot of medium-sized e-commerce businesses who are on VPSs or dedicated servers, and it's technically feasible as long as you figure out some way to geographically balance and integrate the *database*...
    Last edited by karlkatzke; 12-30-2006 at 01:39 PM.

  13. #13
    You do get better uptime, but you can't get true 100% uptime, because when your mail server or Apache needs troubleshooting, by the time you realize it, your server has been down several minutes - even if it only takes several more minutes to "unplug" the bad server, reload it, and plug it back in.
    Josh Lieber

    iTechPath | Fully managed servers with 24/7/365 support.
    PHP 5, MySQL 5, RHEL, cPanel & rvskins, and much more...

  14. #14
    Load Balancers have checks they perform to confirm that the server is still active and able to accept new connections. Once that server stops responding, it's taken out of the mix and no longer given traffic until it starts to respond again.
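    A minimal sketch of that check loop (Python; the hostnames and the /healthcheck URL are hypothetical - real load balancers run these checks continuously in hardware):

    Code:
    # Poll each backend's health URL and keep only responsive servers in
    # the active pool - a toy version of a load balancer's health checks.
    import urllib.request

    BACKENDS = ["web1.example.com", "web2.example.com", "web3.example.com"]

    def healthy(host, timeout=2):
        try:
            with urllib.request.urlopen("http://%s/healthcheck" % host, timeout=timeout) as r:
                return r.status == 200
        except OSError:          # connection refused, timeout, HTTP error...
            return False

    # Servers that stop responding simply drop out of the rotation, and
    # rejoin on the next pass once they start answering again.
    active_pool = [h for h in BACKENDS if healthy(h)]
    print("routing traffic to:", active_pool)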
    Datums Internet Solutions, LLC
    Systems Engineering & Managed Hosting Services
    Complex Hosting Consultants

  15. #15
    Hello

    I think you're using SMB as 'Small to Medium Business'
    Sorry for the confusion - by SMB I meant Small and Medium Business - not Samba (my apologies for the acronyms)

    I'd probably turn to something like Redhat Cluster Suite if I was going to pursue this end of the business.
    Exactly

    The technical specs you outlined pretty much agree with what we are looking at. Our only concern with this sort of "HA" configuration is the single points of failure in the NAS - and we are wondering out loud if our default "services" cluster configuration that we sell for high-end application hosting won't, in fact, be more reliable than what was described above - though obviously an LB array of web servers could probably handle more load than our traditional services cluster.

    Quite honestly, what *I* would find valuable as a small business would be geographically balanced, affordable-for-small-businesses hosting. Geographical balancing is possible with a lot of newer load balancers, and while yes you'd have to do mirroring, most sites aren't changing frequently enough for that to become a huge issue as long as you provide a 'live mirror' button. With the majority of businesses being in disaster-prone areas, I'd love if my website could survive a tsunami on the west coast, an early blizzard in the midwest, thunderstorms that spawn tornados in the south/southeast, and a tornado or nor'easter making it's way up the atlantic coast... simultaneously. Seems we're entering an active weather cycle that's going to put disaster recovery plans to the test. I'm not sure that the small business market is ready for it, but there are a lot of medium-sized e-Commerce businesses who are on VPS's or dedicated servers, and it's technically feasible as long as you figure out some way to geographically balance and integrate the *database*...
    Very interesting concept - doesn't really fit our model as of now - but I think you are onto something there

    You do get better uptime, but you can't get true 100% uptime, because when your mail server or Apache needs troubleshooting, by the time you realize it, your server has been down several minutes - even if it only takes several more minutes to "unplug" the bad server, reload it, and plug it back in.
    As Datums has indicated, this really isn't an issue - most load balancers will not only direct traffic to a server that is available, but will also direct traffic to the server under the lightest load. The only time you would have an issue is with sticky sessions - if the server that a particular session was assigned to went down, those users would lose their session. However, this, in my opinion, isn't a big deal - it just needs to be understood. (Also, we are playing with ways to force-terminate a session when that happens and/or force a user on a sticky session to another machine when the one they are tied to stops responding - obviously they would still get logged out of a site, and if they were working on something like a CMS editing pages, etc., they would lose that data - but that's not too bad of an option in a worst-case scenario.)
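    For what it's worth, here is a toy model of that sticky-session fallback (Python; server names are hypothetical, and the "least-loaded pick" is just a stand-in):

    Code:
    # Sessions pin to one server; if that server leaves the pool, the user
    # is force-moved and loses the session state, as discussed above.
    pool = {"web1", "web2", "web3"}
    sessions = {"alice": "web1", "bob": "web2"}

    def route(user):
        server = sessions.get(user)
        if server not in pool:        # pinned server died, or user is new
            server = sorted(pool)[0]  # stand-in for a least-load pick
            sessions[user] = server   # re-pin; the old session state is gone
        return server

    pool.discard("web1")              # web1 stops responding to health checks
    print(route("alice"))             # alice is forced onto another machine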

  16. #16
    Join Date
    Nov 2006
    Location
    College Station, TX
    Posts
    185
    Yeah, RHCS is nice because of GFS ... the global filesystem. Also available for Fedora Core and CentOS, of course, without the absurd licensing fees (and without the support, unfortunately).

    Honestly, I'd like to see a 'mesh filesystem' developed for clustering -- where you have a disk partition on each server that every other server mirrors automatically... if the partition goes bad on one server, the server can still use the other partitions on the other servers while its own array rebuilds... kind of like a scaled-up RAID 5.

    Simple to describe, I'm sure it's frighteningly complex to integrate.

  17. #17
    Hello,

    If you want to do it more cheaply... then go for hosting your site with 2 different hosting companies. That way you can also maintain your site's guaranteed uptime.

    Thank you.

    Regards,

  18. #18
    Join Date
    Mar 2006
    Location
    Reston, VA
    Posts
    3,132
    Along the lines of fault-tolerant dedicated machines, www.openqrm.org looks promising. We haven't tested it yet, but in theory, if you're running a dedicated machine and it goes down for whatever reason, your OS image would then be booted on a standby server. Downfall? You need a nice little SAN or a NetApp on fibre to keep speeds good.

    openQRM looks to do clustering on the fly by allocating standby servers with a given operating image; swap is stored on the host computer and the filesystem etc. is stored on the SAN.

    Anyone else deal with qrm yet?

    But as far as an HA dedicated server goes, anyone can mirror a server to another machine: set the min active hosts to 1 server, set the priority to 10 for the "leading server" and 1 for the failover server - lead box goes down, failover takes its place (see the sketch below). The only drawback would be MySQL replication. But hey, that's where the new MySQL 5.1 clustering comes into play, whenever that becomes stable, if it hasn't already.
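    A small sketch of that priority election, in the spirit of VRRP/heartbeat (Python; the node names, priorities, and liveness flags are all illustrative):

    Code:
    # The live node with the highest priority leads; when it dies, the
    # next-highest-priority live node takes its place.
    nodes = {
        "lead-server":     {"priority": 10, "alive": True},
        "failover-server": {"priority": 1,  "alive": True},
    }

    def elect_master(nodes):
        live = [n for n, c in nodes.items() if c["alive"]]
        if not live:
            raise RuntimeError("no active hosts left (min active hosts = 1)")
        return max(live, key=lambda n: nodes[n]["priority"])

    print(elect_master(nodes))             # -> lead-server
    nodes["lead-server"]["alive"] = False  # lead box goes down
    print(elect_master(nodes))             # -> failover-server takes its place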

  19. #19
    Join Date
    Nov 2006
    Location
    College Station, TX
    Posts
    185
    The problem with MySQL clustering is that it's all RAM-based ... so if your database size exceeds what your RAM/kernel/hardware can deal with, you're screwed.

    Haven't dealt with qrm, but I'm leery of anything that has to *boot* an image. I've dealt with LTSP a lot and... well, the booting is the biggest struggle.

  20. #20
    Join Date
    Oct 2006
    Posts
    68
    It seems the one point of failure that sooner or later generates half an hour of downtime is when your RAID 1 or RAID 5 array is degraded: you need to stop your machine to change the drive and rebuild the array.

    One possibility to avoid it would be to have a RAID 1+1 array with the 2 pairs of disks located on 2 different machines. But then we wouldn't be able to rebuild the RAID array without stopping disk I/O...

    Is there a storage solution that gives you 100% uptime and lets you change failed disks on the fly without having to stop I/O on the surviving disks? In other words, a solution that doesn't force you to pull the plug to change defective hard drives.

  21. #21
    Join Date
    Oct 2006
    Posts
    68
    Does GFS provide 100% uptime? What happens when the HDD that hosts your heavily accessed database dies? Does it require stopping everything while you change the drive?

  22. #22
    Join Date
    Dec 2006
    Location
    /dev/null
    Posts
    41
    Quote Originally Posted by karlkatzke
    The problem with MySQL clustering is that it's all RAM-based ... so if your database size exceeds what your RAM/kernel/hardware can deal with, you're screwed.

    Haven't dealt with qrm, but I'm leery of anything that has to *boot* an image. I've dealt with LTSP a lot and... well, the booting is the biggest struggle.
    Don't forget the space needed for buffers and cache. On hosts with large databases with lots of indexes and heavy access, this could easily be another 512MB of RAM.
    Caro.Net: Support is everything
    Offering High Quality Dedicated Servers.

  23. #23
    Join Date
    Dec 2006
    Location
    /dev/null
    Posts
    41
    Quote Originally Posted by TigerHosting
    It seems the one point of failure that sooner or later generates half an hour of downtime is when your RAID 1 or RAID 5 array is degraded: you need to stop your machine to change the drive and rebuild the array.

    With a controller that supports hot swap, you shouldn't have to down the box at all. You should be able to replace the drive and rebuild on the fly. Other than the disk I/O performance going down a bit, there shouldn't be any negative results from doing a hot swap and rebuild.
    Caro.Net: Support is everything
    Offering High Quality Dedicated Servers.

  24. #24
    Join Date
    Oct 2006
    Posts
    68
    Won't there be data inconsistency, as users are writing data to the surviving HDD while the other HDD is rebuilding at the same time?

  25. #25
    Join Date
    Dec 2006
    Location
    /dev/null
    Posts
    41
    The RAID controller should take care of that. An example of this in action would be the use of a hot spare.
    Caro.Net: Support is everything
    Offering High Quality Dedicated Servers.

  26. #26
    Join Date
    Dec 2000
    Location
    Montreal
    Posts
    539
    I love opening such great discussions on WHT. So I would love to bring this one to some form of conclusion. For a small to medium-size shared hosting company, what would be the most cost-effective yet stable solution to get a clustering/fault tolerance system going? I do consider NAS to be fairly affordable, but from what I hear in this thread, it's not that stable.

  27. #27
    May I suggest CentOS Cluster Suite with GFS?
    Josh Lieber

    iTechPath | Fully managed servers with 24/7/365 support.
    PHP 5, MySQL 5, RHEL, cPanel & rvskins, and much more...

  28. #28
    Join Date
    Dec 2006
    Location
    Missoula,MT
    Posts
    45

    Our MySQL hosting - failover architecture

    Ok, here is our concept of how we mitigate multiple levels of failure with our managed MySQL hosting business.

    This requires a little overview of our system architecture.

    We have an active/active pair of 2 machines. The machines have the following hardware:

    4 x AMD Opteron 8212 (8 x 1 MB Cache) (Dual Core - 64bit capable)
    8 GB DDRII RAM (667)
    3 x 73 GB SCSI - 15k rpm drives
    Gig Ethernet
    2 NICs, with private VLAN

    The host OS is Red Hat Enterprise Linux 4. On each node we support MySQL 4.1.20 and MySQL 5.1.14.

    Ok, now with that said, we have a series of what we call "sensors". These are essentially monitoring scripts that assist our nodes in determining when and how to fail over failed resources.

    We have isolated the following failure scenarios:

    A little bit of info:

    We have 2 machines, which we call A and B. Each machine has 3 drives: the / partition is one drive, and it contains the host OS, the logs from all MySQL processes, and an "internal" MySQL instance. The other two drives are mounted as /disk1 and /disk2, with no RAID.

    For each version of MySQL we do the following:

    ex. mysql 4.1.20

    1. The data directory for 4.1.20 on server A exists in the following format:
    /disk1/sA/mysql4/4.1.20/vA
    /disk2/sA/mysql4/4.1.20/vB

    *the vA is our internal naming convention for "volume A"
    *the sA is our internal naming convention for "server A"

    2. Because we also have the exact same configuration on server B, it gets a mirror of server A's content, but doesn't run any of the resources of server A until failure. A "df" of /disk1 and /disk2 would look identical.

    For Server A Volumes on Server B we have the same paths:

    /disk1/sA/mysql4/4.1.20/vA
    /disk2/sA/mysql4/4.1.20/vB

    Server B would have its own native volumes as well, which look like:

    /disk1/sB/mysql4/4.1.20/vA
    /disk2/sB/mysql4/4.1.20/vB

    What this does is allow us to map 4 IPs for our MySQL 4.1.20 processes: each IP is mapped to a specific server and volume, with 2 per node.

    What happens when server A has a failure that would cause loss of service to MySQL?

    1. The sensors would trip.

    2. A series of cleanup protocols would be enacted on server A, from within server A, if it is alive. If A is down completely, server B handles bringing up resources on itself automatically and will do things like repair MySQL issues, turn on monitors that used to be local to server A, and so on.

    3. A kill of the MySQL instance on A that was the problem, if the failure is isolated to the instance. The data would be synced again, with MySQL off, and alarms tripped that server A is down.

    4. Server B would bring up the resources of A and notify of the takeover.

    The end result is that the IPs and MySQL content from server A are brought up, in a structured, automated way, on server B, complete with notifications and alarms. The MySQL processes continue to run, while immediate investigation occurs with some useful information about what happened.
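    For readers who want the shape of it in code, here is a heavily compressed sketch of that sensor/takeover flow (Python; the hostnames, commands, and checks are illustrative - this is not their actual tooling):

    Code:
    # Toy sensor: if server A's MySQL stops answering, claim its service IP
    # on server B and start its instance from the mirrored volume.
    import socket, subprocess

    def mysql_alive(host, port=3306, timeout=3):
        try:
            socket.create_connection((host, port), timeout=timeout).close()
            return True
        except OSError:
            return False

    def take_over(volume_path, service_ip):
        # steps 1-3 (sensor trip, cleanup, data sync) are assumed done;
        # step 4: claim A's service IP and start its MySQL instance on B
        subprocess.run(["ip", "addr", "add", service_ip, "dev", "eth0"], check=True)
        subprocess.Popen(["mysqld_safe", "--datadir=%s" % volume_path])
        print("took over %s, serving %s - alarms/notifications sent" % (service_ip, volume_path))

    if not mysql_alive("server-a.internal"):
        take_over("/disk1/sA/mysql4/4.1.20/vA", "192.0.2.10/24")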

    Also, by splitting the volumes over two disks per machine, we are able to have one disk fail on A, and the MySQL resources will come up on B, giving us time to handle a replacement of A or whatever and then bring it back into rotation - while, at the same time, databases on server A volume B continue without seeing any interruption of service.

    Now, this isn't a million-dollar EMC "you can bash 3 blades with a sledgehammer and MySQL won't miss 10 bytes of data" type of solution. The concept is to use dedicated servers, add software to handle the failures with a robust sensor and monitoring system, plus MySQL technical and admin support that fixes these failures - although by then the failure has already been handled, which is better than having to open a ticket. In all cases we strive to be the first to know about a failure, and our policy is that if a drive fails at 2am, you will not know about it until you check your email at 8am - and it won't matter, because the failover happened, the failed component was repaired, and the original service was reinstated, all while you slept, with minimal loss of data and minimal downtime (typically about 6 to 11 seconds).

    We decided on the beefy machines versus 5 or 10 medium or smaller machines because, with replication, the number of slaves that can potentially get "out of sync" when a master failure happens is not worth the investment. Also, seeing as how we have logically 8 procs per machine, and one query in MySQL will grab one proc at a time, having a multi-proc box with decent RAM and fast drives while reducing the number of replicated slaves gives us better performance. And we don't limit the usage of the system resources, which allows things like data imports at 2AM to grab 8 CPUs and max out 8GB of RAM - as the only decent way to improve write performance in MySQL is with hardware. Also, the paradigm is easier for us to manage: these server pairs are A or B and have identical configurations.

    Ok, I have hijacked the thread long enough.

  29. #29
    Join Date
    Dec 2006
    Location
    /dev/null
    Posts
    41
    The answer as to what's going to work for you is highly dependent upon what kind of services and data you're trying to make HA, what/who is writing data (and how frequently), what level of interruption and/or downtime you're willing to endure, how much you're willing to spend, and what your risk is if you're down (the "cause and effect" of being offline). Your original post was too vague to truly give you an educated opinion as to what you need to look into.


    HA solutions for plain Apache are going to be different than ones using PHP and MySQL to provide content. They're also going to be different than those providing SMTP/POP3/IMAP services.

    Equally so, data written by your software is different than data uploaded by users (i.e. profile updates vs. image uploads). Is this data time-sensitive (must these changes be reflected immediately upon submit)?

    Every solution has its own ways of failing. You must assume at some point in time your solution will fail to some degree. How are you going to handle this and how many faults can your solution endure before you're functionally offline?

    What is your budget like? Can you put enough money into the solution to properly design and maintain the hardware/software? There's obviously different levels of HA and the higher your budget the closer you can get to true 100% uptime.

    What is the risk to you, and your customer(s), if your solution goes offline completely? Are your customers simply going to open a ticket inquiring why their service is offline, or are you going to be sued because the doctor trying to access his patient files couldn't get to them in time to save a life (a little far-fetched, but plausible)? All service providers have an obligation to provide quality service to the best of their abilities, but I'd like to think that if you're providing a truly critical service, your responsibilities are significantly greater. As such, it should also influence your design.


    I realize this doesn't really answer your question directly, but hopefully it will give you an idea as to what questions you need to answer yourself so that we can better help you.
    Caro.Net: Support is everything
    Offering High Quality Dedicated Servers.

  30. #30
    Join Date
    Nov 2006
    Location
    College Station, TX
    Posts
    185
    Chris said pretty much everything that needed to be said for a medium-sized hosting company. Keep in mind that your requirements for an individual corporation are going to be VERY different -- and currently, most load balancing technology and development has been done with single companies hosting few domains in mind. I think the Coyote Point guys mentioned to me last time I was chatting with their tech sales staff that they were bringing out some things that will make it easier for hosts to deal with load balancing.

    One point of clarification: NAS is stable, but *can be* slow, causing LOTS of iowait on your servers. NFS is stable when configured correctly (but can be a REAL pain in the *** to get configured correctly), and many NAS mounts use NFS. ... NAS is also WAY too slow for I/O-intensive activities like databases.

    RHEL/CentOS Cluster Suite is not a magic button you can push and have everything work, which was said many times... please read the thread before tossing a one-liner reply in. To expand on the "not magic": Cluster Suite is a rebranded LVS combined with the formerly proprietary GFS - GFS being another shared filesystem along the lines of NFS. There's quite a bit that scares me, frankly, about RHEL/CentOS Cluster Suite from a "guy whose *** is on the line if the thing shits the bed" standpoint. I ran down the choices when I was looking at a solution for work.

    If *I* were to run a small or medium host, I'd probably have a BEEEEFY MySQL server with a hot-swap replication slave running behind it, several pizza-box web servers running behind a Coyote Point or other 'hardware' load balancer, and I'd set up a 'point of entry' for clients (cPanel, FTP, etc.) with a custom daemon that watches for new content or setting changes and rsyncs the changes from the point of entry to the web servers (a rough sketch of that daemon idea is below).
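    A rough sketch of that point-of-entry daemon (Python; paths and hosts are hypothetical, and a production version would use inotify rather than polling mtimes):

    Code:
    # Watch a docroot for changes on the point-of-entry box and rsync them
    # out to the web servers whenever something changes.
    import os, subprocess, time

    DOCROOT = "/var/www/clients"
    WEB_NODES = ["web1.example.com", "web2.example.com"]

    def latest_mtime(path):
        return max((os.path.getmtime(os.path.join(d, f))
                    for d, _, files in os.walk(path) for f in files), default=0)

    last_seen = latest_mtime(DOCROOT)
    while True:
        time.sleep(10)
        now = latest_mtime(DOCROOT)
        if now > last_seen:            # something changed on the entry box
            for node in WEB_NODES:
                subprocess.run(["rsync", "-a", "--delete",
                                DOCROOT + "/", node + ":" + DOCROOT + "/"])
            last_seen = now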

    I haven't handled SMTP/POP/IMAP clustering, but that would probably live on a NAS-type device and have several processor nodes handling the requests via layer 4 load balancing. Nothing generates more complaints than slow email, and in a shared hosting environment, it's too easy to get spammers on your system who blow your SMTP/POP/IMAP out of the water.

    Also, everything would be vendor equipment -- HP, Dell, or Sun, with support contracts and some spare parts like hard drives kept in stock. None of the 'buy a server from Newegg' crap. That'll save you $1000 in the short run and cost you a bajillion dollars the first time the fit hits the shan. I'd rather buy some year-old vendor servers off of eBay for processor nodes than have a big collection of random crap.

  31. #31
    Join Date
    Sep 2005
    Posts
    45
    What about blade server solutions on this topic?
    Blade servers can work in load-balancing mode, with files located on a shared storage solution?

  32. #32
    Join Date
    Jun 2007
    Posts
    64
    If you're looking for a clustered environment, I personally use imountain.com's hosting solution today. Very reasonably priced and good performance (I don't want to say great yet, as I have not run any tests against them). Great support also.

    Cartika Hosting and reliablesite.net are a couple of other hosts that offer this, although I have not personally tried them yet. Cartika seems to have great support. I'm not even a customer yet, but I emailed them some technical questions and they got back to me quickly with some pretty lengthy explanations.
