Questions about how to handle HUGE site

  #1  
Web Hosting Master
 
Join Date: Jun 2002
Location: San Diego, California
Posts: 788

Questions about how to handle HUGE site


Well, this may or may not happen, but I am just curious how something like this would be handled: a site that would be primarily for voting. Let's just say we have a PHP script that will handle this.

The entire process will be two pages: a page to select between two options, and a page that thanks you for your vote and tells you it has been counted. Both pages will be around 100 KB (including images) and will be accessed anywhere from 25-80 million times (all unique visitors); each visitor will go through this process and have their vote added into a MySQL database, all in less than a one-hour period. Considering the servers are running MySQL 4.1, thttpd and PHP 4.3.10, how would something like this be handled? What possible cluster options are there, and what possible hardware options are there, say you had $15,000 to handle JUST the servers and around a $15,000 budget for bandwidth? (Those are interchangeable; if you think one might cost less and the other more, balance it out.) If you believe this is not possible, please explain why and what you think a reasonable outcome would be. Thank you for your input.

-Tee



  #2  
Retired Moderator
 
Join Date: Sep 2004
Location: Flint, Michigan
Posts: 5,765
I have never handled anything close to that size, so take that into consideration.

You are talking about 2.5 TB to 8 TB of data having to be transferred in a one-hour period.

That's 41 to 133 GB a minute, or 0.68 to 2.2 GB a second.

That's obviously a LOT of transfer by itself, but it should be able to be handled by caching the pages in a ramdisk-type setup and serving from there, so it won't be hitting disk reads or writes.

As for bandwidth speed, you are talking about needing around:

5440 Mbit/sec to 17,600 Mbit/sec

(These are all rough numbers, not definitive.)

At the cheap price of $10/Mbit you are talking about $54,000-$176,000 (to lease the lines for a month; I'm sure something could be worked out) just to take care of the transfer.
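For reference, the arithmetic above boils down to something like this (a throwaway sketch; the visitor counts, ~100 KB per visitor and the $10/Mbit rate are the figures from this thread, worked in decimal units, so the rounding differs slightly from the numbers above):

Code:
#include <stdio.h>

/* Back-of-the-envelope numbers for a one-hour voting window.
 * Visitor counts, ~100 KB per visitor and $10/Mbit are the figures
 * from this thread; everything is decimal (1 KB = 1000 bytes).
 */
int main(void)
{
    const double page_bytes     = 100e3;   /* ~100 KB served per visitor   */
    const double window_seconds = 3600.0;  /* everything happens in 1 hour */
    const double usd_per_mbit   = 10.0;    /* the "cheap" $10/Mbit rate    */
    const double visitors[]     = { 25e6, 80e6 };

    for (int i = 0; i < 2; i++) {
        double total_tb = visitors[i] * page_bytes / 1e12;
        double mbit_sec = visitors[i] * page_bytes * 8.0 / 1e6 / window_seconds;
        printf("%2.0fM visitors: %.1f TB total, %5.0f Mbit/s sustained, ~$%.0f\n",
               visitors[i] / 1e6, total_tb, mbit_sec, mbit_sec * usd_per_mbit);
    }
    return 0;
}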

Please understand that I do not know any of this for sure and as I said I have never done anything of this sort. I am sure I am missing something that would completely drop the prices much much lower.

As for handling the needs of this, you would definitely have to use a cluster situation. I would go with a high-end load balancer across several web nodes and a few HIGH end database machines in a master-master setup.

__________________
Mike from Zoodia.com
Professional web design and development services.
In need of a fresh hosting design? See what premade designs we have in stock!
Web design tips, tricks, and more at MichaelPruitt.com

  #3  
Web Hosting Master
 
Join Date: Jun 2002
Location: San Diego, California
Posts: 788
Yeah, well, considering it would only be a one-time thing if it was done, it really wouldn't be smart to lease the lines for a whole month...

(Note: I'm not planning on doing this; this is just a thought / estimation of something.)

  #4  
Retired Moderator
 
Join Date: Sep 2004
Location: Flint, Michigan
Posts: 5,765
Oh, as I said, I am not a guru at all on any of this stuff; I just like to dabble with the thought of what would happen. I'm sure a provider could take care of you if they had enough bandwidth available and could make a deal with you for only a few hours. Although if you do lease the lines for a month, I call dibs on using them for a day because I did the math for you!

Best of luck with everything.

P.S. Why are the pages 100K in size?

__________________
Mike from Zoodia.com
Professional web design and development services.
In need of a fresh hosting design? See what premade designs we have in stock!
Web design tips, tricks, and more at MichaelPruitt.com

  #5  
Web Hosting Master
 
Join Date: Jun 2002
Location: San Diego, California
Posts: 788
Like I said, I'm not actually planning on doing this after reading those estimations; I wouldn't be making much and overall it wouldn't be worth it. It was just an estimation for a business deal I might have considered...

The pages would be around 100 KB in size due to an image or two and a very, very lightweight design...

Thank you for the help, though. If anyone else wants to discuss this feel free to post, but I'm pretty much done. Thanks for your help!

  #6  
Web Hosting Master
 
Join Date: Jan 2003
Posts: 1,715
We (at the office) batted this around a while back in a "what would it take for an American Idol web vote" scenario, although I assumed a minimal page size.

Unless you need a live result, a database app, and certainly an external database server, is a bad road to travel. You will be running 20,000 * QUERY_COUNT queries per second. Even if they were SELECT 1+1, the context switches (or network latency + CS <cringe>) alone would be a crushing blow. We settled on a log file, although a ReiserFS directory should work. The idea is to have the requests cross paths as little as possible, preferably not at all.
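A minimal sketch of that log-file hot path (my own illustration of the idea, not production code; the spool path is made up): each request does a single append, and O_APPEND keeps concurrent appends from separate processes from clobbering each other, so there is no locking and no database in the request path.

Code:
/* Sketch of the append-only vote log idea (illustrative only).
 * Each vote becomes one line: "<unix-time> <client-ip> <choice>".
 * O_APPEND keeps concurrent appends from stepping on each other on a
 * local filesystem, so the hot path needs no locks and no database.
 */
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int log_vote(const char *client_ip, int choice)
{
    char line[128];
    int  len = snprintf(line, sizeof line, "%ld %s %d\n",
                        (long)time(NULL), client_ip, choice);

    int fd = open("/var/spool/votes/votes.log",          /* path is made up */
                  O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    int ok = (write(fd, line, len) == len) ? 0 : -1;     /* one write per vote */
    close(fd);
    return ok;
}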

You would want to run the images and the entry page (which could be static) on separate servers from the voting system, so you can run thttpd over there. Similarly, you would want a single-process httpd with an embedded language, such as AOLserver, for the voting side. If you fork, you die; swap, you die; even context-switch, you die.

You could do the hardware side for under 15k but, unless development costs are separate from the hardware and bandwidth budgets, I wouldn't bother trying.

__________________
Game Servers are the next hot market!
Slim margins, heavy support, fickle customers, and moronic suppliers!
Start your own today!

  #7  
Junior Guru Wannabe
 
Join Date: Jul 2004
Posts: 40
Why use a database? Couldn't you just keep a tally in a flat file?

I guess there would have to be user sessions, so then a DB would be needed.

Also, 100 KB seems a lot, but I guess those fancy Idol-type sites are pretty heavy.

  #8  
Retired Moderator
 
Join Date: Oct 2004
Location: Southwest UK
Posts: 1,159
My opinion on this is quite simple: parallelise it.

Unless you need instant results (i.e. where you cannot spend half an hour collating results), simply have a dozen (or ten dozen) servers running exactly the same thing. It doesn't sound like you care about people voting multiple times, so just load balance the site.

If you do care about people voting multiple times, you would need to load balance intelligently: instead of the usual round-robin DNS setup, you'd want to point people to one of your servers according to their IP address. Then they will always go to the same server if they try to vote multiple times, and can be caught. (If they come in through a different proxy you're stuffed anyway, as they won't be using a single address to vote twice.)
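As a rough illustration of that "same IP always hits the same server" routing (a sketch only; a real setup would do this at the load balancer or DNS layer rather than in application code):

Code:
/* Sketch of deterministic IP-to-server mapping for duplicate catching.
 * Hashing the client address means repeat voters land on the same node.
 */
#include <arpa/inet.h>
#include <stdio.h>

int server_for(const char *client_ip, int num_servers)
{
    struct in_addr a;
    if (!inet_aton(client_ip, &a) || num_servers <= 0)
        return -1;
    return (int)(ntohl(a.s_addr) % (unsigned)num_servers);
}

int main(void)
{
    printf("10.1.2.3 -> server %d of 12\n", server_for("10.1.2.3", 12));
    return 0;
}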

If you still need speed, I wouldn't use MySQL to store the votes; you'd want a simple record-the-vote-and-write custom app (a very simple bit of C code would do: it writes to a flat file, or even keeps votes in memory until it has a bit of spare CPU, at which point it writes them all to disk). The results can then be bulk imported into a DB outside the voting window and queried (unless it's really simple voting, in which case the C app can write out the running totals and you just add them up from all the flat files that are generated).
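A rough sketch of that vote-and-write app (my illustration of the idea; it assumes a single-process server so the counters need no locking, and the two-option count and output path are made up):

Code:
/* Sketch of the "keep votes in memory, flush when idle" approach.
 * One counter per option; a flush dumps the running totals to a flat
 * file that can be summed across all the servers after the voting
 * window closes. Assumes a single-process server, so no locking.
 */
#include <stdio.h>

#define NUM_OPTIONS 2

static unsigned long tally[NUM_OPTIONS];

void record_vote(int option)              /* called from the request path */
{
    if (option >= 0 && option < NUM_OPTIONS)
        tally[option]++;
}

int flush_tally(const char *path)         /* called when there is spare CPU */
{
    FILE *fp = fopen(path, "w");          /* e.g. /var/spool/votes/totals */
    if (!fp)
        return -1;
    for (int i = 0; i < NUM_OPTIONS; i++)
        fprintf(fp, "option %d: %lu\n", i, tally[i]);
    return fclose(fp);
}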

The cost of hardware... you'd rent the space from Akamai, so the cost wouldn't be that much. The cost of bandwidth, again, you'd probably talk to Akamai for that too. In fact, look at http://www.akamai.com/en/html/servic...g_contest.html and get them to do it all for you.

  #9  
Web Hosting Master
 
Join Date: Feb 2004
Posts: 2,197
I would agree with the above: spread it across several servers using direct routing. It will usually save you a lot of cost as well, as you could do it over multiple providers. Unless you require live stats on the voting, the votes can be kept across multiple DB servers, all of which can be merged at a later date (checking for duplicate votes by looking for duplicate IPs).

With that sort of budget it's not worth it; American Idol would have a much bigger budget than $15k per voting session.

__________________
crucialparadigm - Affordable, Reliable, Professional :
Web Hosting
24/7 Support Web Hosting Reseller Hosting Cloud/VPS Plans Dedicated Servers

  #10  
Web Hosting Master
 
Join Date: Oct 2002
Posts: 702
I would handle this in software as much as possible. Write the poll in C and embed it in thttpd. Have all the votes written to memory within the process, then have a script insert the data from the process into MySQL every minute or so. You just went from 20,000 MySQL queries per second to less than one. After you had that working, you would need to contact a distributed content network and have them host your script on 10 to 20 servers throughout their network. Hosting this at one facility would be stupid, and putting the program as close to the voter as possible will reduce bandwidth costs.
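One way that once-a-minute MySQL step could look, as a sketch under assumed names (the votes_minute table, its columns and the spool path are not from this thread, just placeholders): the embedded handler counts in memory and dumps one batched INSERT that a cron job pipes into the mysql command-line client.

Code:
/* Sketch of the once-a-minute flush into MySQL (illustrative; table and
 * path names are placeholders). Rather than linking libmysqlclient into
 * the web server, the process writes one INSERT per minute and a cron
 * job feeds it to the mysql client, so the DB sees ~1 query a minute.
 */
#include <stdio.h>
#include <time.h>

static unsigned long tally[2];            /* counts kept inside the server */

void count_vote(int option)               /* called on each request */
{
    if (option == 0 || option == 1)
        tally[option]++;
}

int flush_to_sql(const char *path)        /* called once a minute */
{
    unsigned long a = tally[0], b = tally[1];
    FILE *fp = fopen(path, "w");          /* e.g. /var/spool/votes/flush.sql */

    if (!fp)
        return -1;
    fprintf(fp, "INSERT INTO votes_minute (ts, option_a, option_b)"
                " VALUES (%ld, %lu, %lu);\n", (long)time(NULL), a, b);
    tally[0] -= a;                        /* keep only votes since this flush */
    tally[1] -= b;
    return fclose(fp);
}

/* Assumed crontab entry on the same box:
 *   * * * * *  mysql votedb < /var/spool/votes/flush.sql
 */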

__________________
ServerMatingProject.com
The World's first server mating experiment
We give new meaning to I/O intensive and hot swap

  #11  
Web Hosting Master
 
Join Date: Nov 2003
Posts: 1,093
Quote:
Originally posted by jasjbow
Why use a database? Couldn't you just keep a tally in a flat file?

I guess there would have to be user sessions, so then a DB would be needed.

Also, 100 KB seems a lot, but I guess those fancy Idol-type sites are pretty heavy.
Flat files would be worse in my opinion...

__________________
ManageMyServices was sold by me in September 2009. I no longer have any affiliation with this company.

  #12  
Web Hosting Master
 
Join Date: Jan 2003
Posts: 1,715
Because you're thinking 'flat file database'. Just append each vote to the file in a log style. If you need a 'live' result, have a separate process occasionally run through the logs and total up the results, discarding any repeats it finds. The filesystem will serialize the writes, but the voting system is entirely lock-free, which is the primary goal.
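A sketch of that separate tallying process (my illustration; it assumes the "<unix-time> <ip> <choice>" log format from the earlier sketch, two options, and IPv4-only duplicate detection via a 512 MB bitmap, which is fine for a one-off batch job):

Code:
/* Sketch of the offline tally pass over the vote log (illustrative).
 * Reads "<unix-time> <ipv4> <choice>" lines on stdin, discards repeat
 * IPs with a 2^32-bit bitmap (~512 MB), and prints the totals.
 * Run as:  ./tally < votes.log
 */
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    unsigned char *seen = calloc(1u << 29, 1);   /* one bit per IPv4 address */
    unsigned long  totals[2] = { 0, 0 };
    char ip[64];
    long ts;
    int  choice;

    if (!seen)
        return 1;

    while (scanf("%ld %63s %d", &ts, ip, &choice) == 3) {
        struct in_addr a;
        if (!inet_aton(ip, &a) || choice < 0 || choice > 1)
            continue;                            /* skip malformed lines */
        unsigned long n = ntohl(a.s_addr);
        if (seen[n >> 3] & (1u << (n & 7)))
            continue;                            /* repeat IP: discard */
        seen[n >> 3] |= 1u << (n & 7);
        totals[choice]++;
    }
    printf("option 0: %lu\noption 1: %lu\n", totals[0], totals[1]);
    free(seen);
    return 0;
}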

__________________
Game Servers are the next hot market!
Slim margins, heavy support, fickle customers, and moronic suppliers!
Start your own today!

  #13  
Web Hosting Master
 
Join Date: Jun 2002
Location: San Diego, California
Posts: 788
I was thinking that a flat-file system would be bad because all the servers wouldn't be able to easily share results and IPs if necessary, and it would also result in file I/O off the charts; otherwise it would be a good idea.

  #14  
Web Hosting Master
 
Join Date: Sep 2001
Location: Seattle, WA
Posts: 3,084
Well, it depends; if it's vital to block repeat visitors, you would need to have some form of shared IP table that shows who has voted and who hasn't.

Alternatively, you could post-process the results and throw away duplicates, so you wouldn't have to worry about doing it at the time of vote.

__________________
Jim Reardon - jim/amusive.com
SiteSurvival Professional, Expensive Hosting -=- Shrink URLs Down For Posting!

  #15  
Web Hosting Master
 
Join Date: Jan 2003
Posts: 1,715
Quote:
Originally posted by Tee
I was thinking that a flat-file system would be bad because all the servers wouldn't be able to easily share results and IPs if necessary, and it would also result in file I/O off the charts; otherwise it would be a good idea.
On a local filesystem, reads would be cached, and even a large record size would mean writing under 10 MB/sec across the server group. Exceptional, but well within even IDE's abilities. It would be beyond the reach of NFS or GFS, probably even over Myrinet, which have network-latency locking and refreshes. Besides, you'll need to deal with that I/O in any storage format, unless you intend to buy 40 GB of RAM.

In the flat-file approach, you would throw out the duplicates during post-process, as amusive said. You'd probably use cookies to provide the 'you have already voted' feedback to keep the honest people honest.

If you really need live duplicate checking, I think you're back to the big-box topology. You would need a fast record search for duplicate checking. You could build your own shared binary or B+ tree, but I'd just abuse ReiserFS or XFS directories, which already use them and automatically manage the memory footprint.
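A sketch of that directory abuse (the spool path is made up): creating an empty file named after the IP with O_CREAT|O_EXCL makes the "seen this IP before?" test and the "mark it seen" step one atomic operation, and ReiserFS/XFS keep the huge directory in a B+ tree for you.

Code:
/* Sketch of using a filesystem directory as the duplicate-check tree.
 * Returns 1 on the first vote from an IP, 0 on a repeat, -1 on error.
 * The directory path is a placeholder for the example.
 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int first_vote_from(const char *client_ip)
{
    char path[256];
    snprintf(path, sizeof path, "/var/spool/votes/seen/%s", client_ip);

    int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0644);
    if (fd >= 0) {
        close(fd);                    /* marker created: first vote */
        return 1;
    }
    return (errno == EEXIST) ? 0 : -1;
}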

__________________
Game Servers are the next hot market!
Slim margins, heavy support, fickle customers, and moronic suppliers!
Start your own today!
