Web Hosting Talk

Clustering and other Big Computer advice needed...


Brian S
11-15-2002, 02:10 AM
Hi Folks,

Welp, the time has come where my free hosting service is growing to its technical limits. I'm starting to chew on several options for where to go from here. Inevitably, I will need some sort of custom solution.

I'm hoping some of you could point to useful information on hosting and cluster technology. Does useful and affordable clustering exist outside of the science field? Ultimately, I'd like to have a solution act as one big virtual server, where I can add storage and servers at will and have it seamlessly integrate into the system. Anything out there like that?

My main problems are common: storage and processing power. I can no longer have storage and processing in one server, so I'm looking for ideas on how to host the service across servers while maintaining a simplified single administration and user interface.

If you've got ideas, or vendors to point to, feel free to respond here, or pop me an email at brian@250Host.com .

Thanks,

Brian

daveman
11-15-2002, 02:19 AM
I'm not at this point yet but definitely am interested in the responses. :)

rusko
11-15-2002, 02:44 AM
you are both welcome to pm me so i can give you an IM address to discuss this. i don't claim to be an expert on the subject, but the discussion may be worth it =]

FHDave
11-15-2002, 02:51 AM
Just note, the extra performance will not scale linearly as you add more and more web servers. In fact, it will saturate after several servers (this might depend on what program you use).
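
A rough way to picture the saturation is Amdahl's law, which the post does not name; it is swapped in here only as an illustration. The 10% "shared" fraction (work that every request still funnels through, such as a database or storage box) is an assumed number, not a measurement.

```python
def speedup(servers: int, shared_fraction: float) -> float:
    """Amdahl's law: overall speedup when `shared_fraction` of the work
    cannot be spread across the web servers."""
    if servers < 1 or not 0.0 <= shared_fraction <= 1.0:
        raise ValueError("need servers >= 1 and 0 <= shared_fraction <= 1")
    return 1.0 / (shared_fraction + (1.0 - shared_fraction) / servers)


if __name__ == "__main__":
    # Assumed: 10% of each request hits a shared resource.
    for n in (1, 2, 4, 8, 16):
        print(n, "servers ->", round(speedup(n, 0.10), 2), "x")
```

With that assumed fraction the curve flattens quickly and can never pass 10x, no matter how many servers are added.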

It may be worth noting that, to some extent, Apache can be your bottleneck. Starting at around 150-200 httpd connections, you will start to see some server slowdown (not sure whether this is the case with newer Apache versions). Please read the following article: http://www.acme.com/software/thttpd/benchmarks.html .
For simple pages, images, etc., you may do better with thttpd. Or you can let Apache handle dynamic pages and let thttpd handle your static pages. I have heard of people doing this, although at this point I am not yet sure how to.
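
One possible way to think about the static/dynamic split is a routing rule keyed on the file extension. This is only a sketch of the decision, not a working proxy: the extension list and the backend labels "thttpd" and "apache" are illustrative assumptions, and a real setup would put the rule in whatever sits in front of the two servers.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed list of extensions that can be served straight from disk.
STATIC_EXTENSIONS = {".html", ".htm", ".css", ".js", ".gif", ".jpg", ".jpeg", ".png"}


def choose_backend(url: str) -> str:
    """Return which server should answer `url`: 'thttpd' for static files,
    'apache' for everything else (CGI, PHP, directory requests, ...)."""
    path = urlsplit(url).path                      # drop the query string
    extension = posixpath.splitext(path)[1].lower()
    return "thttpd" if extension in STATIC_EXTENSIONS else "apache"


if __name__ == "__main__":
    for url in ("/images/logo.GIF", "/cgi-bin/signup.cgi?user=a", "/"):
        print(url, "->", choose_backend(url))
```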

Best of wishes!

neil
11-15-2002, 05:21 AM
voxel.net does managed clusters.

timelord
11-26-2002, 12:20 AM
(This is a summary of some information about clusters that I sent to Brian. I figured I would post a summary to the thread.)

The first question is: What do you mean by "clustering"? It means different things to different people/operating systems.

o Windows: Clusters are mostly a hot/fail scenario. You have a service running on one machine, and if it fails, it restarts on another machine. YOU CANNOT SHARE A FILE ACROSS MEMBERS OF THE CLUSTER. (There are rumors that Microsoft is working on a clustered file system, but it is not currently available.)

o Networking: This generally refers to load balancing solutions, where you have a service running on N servers, and you have something either in front of the servers (load balancer) or managed by the servers (LVS on Linux, Microsoft Application Server, etc.) so that there can be one virtual server (a shared IP address) and the actual work is shared among the N actual servers.

o Linux: Linux supports multiple clustering models:
- Hot/Fail clustering
- Linux Virtual Server (load balancing)
- Parallel Computing: the "clusters" used in research facilities
- Cluster file server/applications: a single file system can be shared among multiple servers, and applications (such as Oracle) can run on multiple servers accessing the same database!
- SSI (Single System Image): True clustering - the servers make up one big server. Every server "sees" the processes on all servers, they share one file system, distributed lock manager, etc.

o Blade computers: A whole other approach! (Everybody saw Dell's announcement of their blade computer? It's worth understanding it.)
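
To make the load-balancing model in the list above a bit more concrete, here is a minimal round-robin sketch with a crude failover step (servers marked down are skipped). The class name, the server names, and the idea of marking servers up or down by hand are all assumptions for illustration; real balancers do health checks and connection tracking that this leaves out.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out backend servers in turn, skipping any marked as down."""

    def __init__(self, servers):
        self._servers = list(servers)
        if not self._servers:
            raise ValueError("at least one server is required")
        self._healthy = set(self._servers)
        self._ring = cycle(self._servers)

    def mark_down(self, server):
        self._healthy.discard(server)

    def mark_up(self, server):
        if server in self._servers:
            self._healthy.add(server)

    def pick(self):
        # Try each server at most once per call before giving up.
        for _ in range(len(self._servers)):
            candidate = next(self._ring)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy servers available")


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["web1", "web2", "web3"])  # hypothetical hosts
    print([balancer.pick() for _ in range(4)])
    balancer.mark_down("web2")
    print([balancer.pick() for _ in range(2)])
```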

Probably one of the best (most elegant) solutions is http://www.egenera.com - it is what Merrill Lynch used when they went to Linux.

So, at the end of the day, whether "clusters" are "useful and affordable" really depends on both your definitions and your needs. Database and application servers need a very different sort of cluster than a web server. And, as a side note, clusters have been standard in business environments that require very high availability (banking, telecommunications, health care, etc.) for over a decade.

[Sorry it took me so long to reply, I had to first get a WHT username!]

cperciva
11-26-2002, 12:31 AM
Originally posted by Brian S
Ultimately, I'd like to have a solution act as one big virtual server, where I can add storage and servers at will and have it seamlessly integrate into the system. Anything out there like that?

Short of SGI Origin systems and suchlike, no. (And if anyone tells you otherwise, they don't know what they're talking about.)

That said, for many applications you can get "close enough" to that ideal; but you have to do such things at the application level.

apollo
11-26-2002, 02:23 AM
Well, there are some clustered filesystems ... have a look at Sistina's GFS......

Brian S
11-26-2002, 03:48 AM
That's what I wanted to avoid -- modding programs to understand the virtual server. I think a decent compromise would be to use a clustering system like Linux Virtual Server or a hardware load balancer, and have requests distributed among a farm of servers, all used in conjunction with a large NAS or SAN storage array. But I think even that is overkill for me at this point.

I think my plan for the short term is to purchase a beefy dual Xeon system with about a Terabyte of RAID 5 storage. It should meet my needs reasonably.
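
For sizing an array like that, the usual RAID 5 rule is that one disk's worth of space goes to parity, so n disks give (n - 1) disks of usable capacity. A quick sketch of the arithmetic follows; the 120 GB drive size is an assumed example, not something stated in the thread.

```python
import math


def raid5_usable_gb(disks: int, disk_gb: float) -> float:
    """Usable capacity of a RAID 5 set: one disk's worth is lost to parity."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least three disks")
    return (disks - 1) * disk_gb


def raid5_disks_needed(target_gb: float, disk_gb: float) -> int:
    """Smallest number of equal disks whose RAID 5 set holds `target_gb`."""
    return max(3, math.ceil(target_gb / disk_gb) + 1)


if __name__ == "__main__":
    disk_gb = 120  # assumed drive size
    n = raid5_disks_needed(1000, disk_gb)
    print(n, "disks ->", raid5_usable_gb(n, disk_gb), "GB usable")
```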

Brian

Brian S
11-26-2002, 03:52 AM
On the topic of Clustered File Systems, I could find *no* information regarding Oracle's CFS. I've heard about systems where the cluster file system shares all the server farm's Hard Drives and they act as one big drive, but I've seen no information regarding this. If anyone has info about filesystems that can do this, and the benefits and drawbacks, I'd appreciate it.

Thanks,

Brian

timelord
11-26-2002, 02:47 PM
As was mentioned, you do have GFS/OpenGFS (www.opengfs.org); however, it requires a dedicated node to handle the distributed lock management.

You may want to read the press release where RedHat talks about how they were going to distribute Oracle's cluster file system (http://www.redhat.com/about/presscenter/2002/press_ocfs.html) as well as articles written when Oracle announced it earlier this year (http://www.databasejournal.com/features/oracle/article.php/1446381). The best one is Oracle's FAQ on their cluster file system (http://otn.oracle.com/tech/linux/htdocs/ocfs_faq_110602.html).

In addition, there is also CIFS (common internet file system). It was designed by Microsoft, but Samba has a freeware implementation (and I believe it is in the Linux kernel).

For your activity, I think you would be better served by a storage controller (an external RAID controller).

You also asked about the benefits and drawbacks. From a high level perspective, the benefits are obvious - the ability to see your files from all your systems. However, there are several drawbacks:
o Locking: How is locking information shared? Does this produce extra traffic and delays?
o How do remote nodes get notified of dirty data/cache (to prevent different nodes from having different opinions about the contents of a file)?
o Increased latency and slower data access (because of network overhead).

Stuff that has a high write rate (such as log files) should probably NOT be placed on this type of shared storage (a SAN being an exception).
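
The dirty-cache drawback in the list above can be shown with a toy model: a node that trusts its local cache can keep serving old contents after another node has written the file, unless it pays for a version check against the shared store on every read. Everything here (class names, the version counter, the `revalidate` flag) is a made-up illustration and not how GFS or OCFS actually do it.

```python
class SharedStore:
    """Stands in for the shared file system; tracks a version per file."""

    def __init__(self):
        self.data = {}
        self.version = {}

    def write(self, name, content):
        self.data[name] = content
        self.version[name] = self.version.get(name, 0) + 1


class Node:
    """A cluster member with its own local cache of file contents."""

    def __init__(self, store):
        self.store = store
        self.cache = {}  # name -> (version, content)

    def read(self, name, revalidate):
        cached = self.cache.get(name)
        if cached is not None:
            version, content = cached
            # The version check is the extra network round trip.
            if not revalidate or self.store.version.get(name) == version:
                return content
        content = self.store.data[name]
        self.cache[name] = (self.store.version[name], content)
        return content


if __name__ == "__main__":
    store = SharedStore()
    reader = Node(store)
    store.write("index.html", "old page")
    reader.read("index.html", revalidate=False)         # fills the cache
    store.write("index.html", "new page")                # written elsewhere
    print(reader.read("index.html", revalidate=False))   # stale copy
    print(reader.read("index.html", revalidate=True))    # fresh copy
```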

Brian S
12-02-2002, 08:24 PM
I wanted to thank Dean of TimeLord Consulting. I exchanged quite a few emails with him and he went out of his way to educate me on the aspects of High Availability and Load Balancing. Considering this is his business, he really didn't have to be so helpful.

Thanks again Dean. I'll be keeping you in mind.

Brian

zerphyte
12-02-2002, 09:53 PM
You can do a fbsd ha cluster pretty easy for that. I have put together several of them for freehosts before. What os are you currently running and what freehost scripts do you use?

sqposter
12-02-2002, 10:51 PM
Just a note to timelord: don't waste your time with blade systems as of yet. I did some research (someone else did some also) and came to the following conclusion, based on Unix (Linux) systems.

Comparing current pricing for 2 full racks (330 to 448 servers), total processing power, and floor space cost at a carrier-neutral facility, blades do not work out as well as sticking with 1U servers and related equipment. The only need for blades is in a situation where compactness is critical and clustering/load balancing is required. Web hosting (even on the Google level) does not yet require blades.

SQ

anantatman
12-03-2002, 01:18 AM
I agree.

The cost of ISP real estate and BW is so cheap nowadays that using blade servers to optimize it is not really very cost effective. At a certain point, if criticality is so important, it's better to go ahead and invest in two or three mainframes or midrange machines than to quibble with 1/10U PC servers.

I do look forward to seeing PowerC or PowerPC blade servers with SCSI HDs...

300-400 Quality Processors in 1 Rack: That's some real power in a small space...

Brian S
12-03-2002, 01:28 AM
Originally posted by zerphyte
You can do a fbsd ha cluster pretty easy for that. I have put together several of them for freehosts before. What os are you currently running and what freehost scripts do you use?
I run Linux, currently with a modified version of HomeFree 3.x. For the new server(s), I will likely use HomeFree Pro. I'm curious how you take care of the storage dilemma. Offering 250MB per user, 250Free's storage must be cheap and plentiful.
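
As a back-of-the-envelope check on that storage problem, this sketch counts how many 250MB accounts fit on a volume. The 1 TB volume size, decimal units (1 GB = 1000 MB), and the `avg_fill` knob (the share of quota a typical user actually fills) are assumptions for illustration; no real usage figures are implied.

```python
def accounts_supported(volume_mb: float, quota_mb: float, avg_fill: float = 1.0) -> int:
    """Accounts that fit if each uses `avg_fill` of its quota on average.
    avg_fill=1.0 means every user fills the whole quota (no overcommit)."""
    if not 0.0 < avg_fill <= 1.0:
        raise ValueError("avg_fill must be in (0, 1]")
    return int(volume_mb // (quota_mb * avg_fill))


if __name__ == "__main__":
    one_terabyte_mb = 1000 * 1000  # assumed volume size, decimal units
    print(accounts_supported(one_terabyte_mb, 250))       # every quota full
    print(accounts_supported(one_terabyte_mb, 250, 0.5))  # quotas half full
```

At full quota a terabyte holds 4000 accounts, so any larger user base relies on most users staying well under 250MB.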

Feel free to email me at brian@250Host.com if you wish.

Thanks,

Brian

zerphyte
12-03-2002, 01:41 AM
Ah yes, HomeFree, pretty nifty script. I had some clients that used to use that one. I have done several things in the past. Are you on a budget? Do you want to use IDE or SCSI drives?

timelord
12-03-2002, 01:26 PM
Blade servers are one of those "interesting things". What a blade server -is- varies from vendor to vendor.

Now, if you were talking about Compaq's blade server (where each blade is an independent server with its own network connection and a built-in hard drive), then its advantage would be where you were using 20-30 1U boxes and you would have easier management and [possibly] cheaper pricing, as a blade should be half the price of a 1U server (and I'm talking about a Compaq 1U server vs. a Compaq blade - home-built boxes don't count!). These blades would only be "clusterable" in a load balance sense.

However, if you are talking about a blade server like www.egenera.com has, well that is a horse of a different color :cool: . This type of blade server has a single connection to the storage subsystem for the entire blade server, a single network connection for the entire blade server, and the ability to dynamically add CPUs into a virtual server. Where would I use this type of blade server? For big processing applications (banking, health care, trading firms) where the virtualization and the ability to move CPUs around would be very handy.

Would I be looking at a blade server to reduce floor space? No - since most coloc/ISP space is for 2 (a half cage) or 4 racks. (Qwest being different - they set up in units of racks instead of cages.) However, if you had 420 1U servers (which, by definition, requires 10 racks), the difference in floor space rental rate at a coloc/ISP between 1 rack and 10 racks can be significant (about $6K-12K/month).
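
The rack arithmetic above can be checked with a few lines. The 42 usable units per rack is an assumption (it is what makes 420 1U servers come out to exactly 10 racks); a rack with fewer usable units after switches and power gear would need more racks. The per-rack figure is simply the quoted $6K-12K spread divided over the 9 extra racks.

```python
def racks_needed(servers: int, units_per_server: int = 1, usable_units_per_rack: int = 42) -> int:
    """Whole racks required, rounding up (integer ceiling division)."""
    total_units = servers * units_per_server
    return -(-total_units // usable_units_per_rack)


def implied_cost_per_extra_rack(monthly_difference: float, racks: int, baseline_racks: int = 1) -> float:
    """Monthly cost per additional rack implied by a total price difference."""
    return monthly_difference / (racks - baseline_racks)


if __name__ == "__main__":
    racks = racks_needed(420)
    print(racks, "racks")
    for difference in (6000, 12000):  # the $6K-12K/month range from the post
        print(round(implied_cost_per_extra_rack(difference, racks), 2), "per extra rack per month")
```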

As in all things, blade servers are an answer to a problem and a solution in and of itself. It is always important to make sure you understand your problem.

sqposter
12-03-2002, 02:21 PM
Originally posted by timelord
Blade servers are one of those "interesting things". What a blade server -is- varies from vendor to vendor.

However, if you had 420 1U servers (which, by definition, requires 10 racks),

As in all things, blade servers are an answer to a problem and a solution in and of itself. It is always important to make sure you understand your problem.

A few notes:

420 1U servers = 6 or 10 racks, depending on the rack system you use.

You make an absolutely perfect case about what the problem and the possible solution are with blades in general. Thumbs up on that.

When blade servers come to the market with hosting as the basic design goal, then I think we will see blades advance; the current RCL setup is worthless overall to any host.

When I did my research, I looked at about 5 different blade designs that had to be Windows and/or Unix compatible. I chose the most expensive floor space I could find (carrier-neutral hotels in NY) and asked the basic question of cost for hosting Windows and Unix. I came to the answer that with current pricing on lease-based systems (used Dell and Gateway and baseline control) I could build out to 900 servers before I needed to think about cost-effective space concerns (New York location; other areas had different costs).

Now, where you do get some savings is the global "servicing" of the servers (heck, you could at times update 23 servers at once), so you need to include that time savings, but on Unix that is moot. On Windows it becomes a bigger concern.

SQ