  1. #1
    Join Date
    Jun 2002
    Location
    San Diego, California
    Posts
    788

    Questions about how to handle a HUGE site

    Well, this may or may not happen, but I am curious how something like this would be handled: a site used primarily for voting. Let's just say we have a PHP script that handles it.

    The entire process will be two pages: a page to select two options, and a page that thanks you for your vote and tells you it has been counted. Both pages will be around 100 KB (including images) and will be accessed anywhere from 25 to 80 million times (all unique visitors). Each visitor will go through this process and have their vote added to a MySQL database, all in less than a one-hour period. Considering the servers are running MySQL 4.1, thttpd and PHP 4.3.10, how would something like this be handled? What cluster options are there, and what hardware options are there, given a $15,000 budget for JUST the servers and around a $15,000 budget for bandwidth? (The two are interchangeable: if you think one will cost less and the other more, balance it out.) If you believe this is not possible, please explain why and what you think a reasonable throughput would be. Thank you for your input.

    -Tee

  2. #2
    Join Date
    Sep 2004
    Location
    Flint, Michigan
    Posts
    5,766
    I have never handled anything close to that size, so keep that in consideration.

    You are talking about 2.5 TB to 8 TB of data having to be transferred in a one-hour period.

    That's roughly 42 to 133 GB a minute,
    or .69 to 2.2 GB a second.

    That's obviously a LOT of transfer by itself, but it should be manageable by caching the pages in a ramdisk-type setup, since serving them involves no database reads or writes.

    As for bandwidth, you are talking about needing around:

    5,550 Mbit/sec to 17,800 Mbit/sec

    (these are all rough numbers, not definitive).

    At the cheap price of $10/Mbit, you are talking about roughly $55,000-178,000 (to lease the lines for a month; I'm sure something could be worked out) just to take care of the transfer.
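    The arithmetic above can be reproduced in a few lines. The inputs are the thread's own assumptions (100 KB per visitor, 25 to 80 million visitors, one hour, $10 per Mbit/sec), using decimal units throughout; treat it as a back-of-envelope check, not a quote.

```python
# Back-of-envelope transfer, bandwidth and cost estimate for the vote.
# Inputs are the thread's assumptions; decimal units (1 KB = 1,000 bytes).
BYTES_PER_VISITOR = 100_000   # ~100 KB served per visitor
WINDOW_SECONDS = 3600         # everything happens within one hour
PRICE_PER_MBIT = 10           # dollars per Mbit/sec of bandwidth


def estimate(visitors: int) -> dict:
    total_bytes = visitors * BYTES_PER_VISITOR
    bytes_per_sec = total_bytes / WINDOW_SECONDS
    mbit_per_sec = bytes_per_sec * 8 / 1_000_000
    return {
        "total_tb": total_bytes / 1e12,
        "gb_per_min": total_bytes / 1e9 / 60,
        "gb_per_sec": bytes_per_sec / 1e9,
        "mbit_per_sec": mbit_per_sec,
        "cost_usd": mbit_per_sec * PRICE_PER_MBIT,
    }


for visitors in (25_000_000, 80_000_000):
    e = estimate(visitors)
    print(f"{visitors:,} visitors: {e['total_tb']:.1f} TB, "
          f"{e['gb_per_min']:.0f} GB/min, {e['gb_per_sec']:.2f} GB/s, "
          f"{e['mbit_per_sec']:,.0f} Mbit/s, ${e['cost_usd']:,.0f}")
```

    With those inputs it works out to about 5,556 and 17,778 Mbit/sec, or roughly $55,556 and $177,778 at $10/Mbit.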

    Please understand that I do not know any of this for sure and as I said I have never done anything of this sort. I am sure I am missing something that would completely drop the prices much much lower.

    As for handling the load, you would definitely have to use a cluster setup. I would go with a high-end load balancer in front of several web nodes and a few HIGH-end database machines in a master-master setup.
    Mike from Zoodia.com
    Professional web design and development services.
    In need of a fresh hosting design? See what premade designs we have in stock!
    Web design tips, tricks, and more at MichaelPruitt.com

  3. #3
    Join Date
    Jun 2002
    Location
    San Diego, California
    Posts
    788
    Yeah, well, considering it would only be a one-time thing if it were done, it really wouldn't be smart to lease the lines for a whole month...

    (Note: I'm not planning on doing this; this is just a thought experiment / estimate.)

  4. #4
    Join Date
    Sep 2004
    Location
    Flint, Michigan
    Posts
    5,766
    Oh, as I said, I am not a guru at all on any of this stuff. I just like to dabble with the thought of what would happen. I'm sure a provider could take care of you if they had enough bandwidth available, and make a deal with you for only a few hours. Although if you do lease the lines for a month, I call dibs on using them for a day because I did the math for you!

    Best of luck with everything.

    P.S. Why are the pages 100K in size?

  5. #5
    Join Date
    Jun 2002
    Location
    San Diego, California
    Posts
    788
    Like I said, I'm not actually planning on doing this after reading those estimates. I wouldn't be making much, and overall it wouldn't be worth it; it was just an estimate for a business deal that I might have considered...

    The pages would be around 100kb in size due to an image or two and a very very lightweight design...

    Thank you for the help, though. If anyone else wants to discuss this, feel free to post, but I'm pretty much done. Thanks for your help!

  6. #6
    Join Date
    Jan 2003
    Posts
    1,715
    We (at the office) batted this around a while back in a 'what would it take for an American Idol web vote' scenario, although I assumed a minimal page size.

    Unless you need a live result, a database app, and certainly an external database server, is a bad road to travel. You will be running 20,000 * QUERY_COUNT queries per second. Even if they were SELECT 1+1, the context switches (or network latency plus context switches <cringe>) alone would be a crushing blow. We settled on a log file, although a ReiserFS directory should work. The idea is to have requests cross paths as little as possible, preferably not at all.
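    A minimal sketch of the log-file approach, assuming one log file per web node and a hypothetical `timestamp<TAB>ip<TAB>choice` line format. Each vote is a single `write()` on a descriptor opened with `O_APPEND`, so every record lands at the current end of the file and the handler never reads or locks anything. (POSIX guarantees the append positioning; in practice small appends to a local file are not interleaved, but that part is not a formal guarantee.)

```python
import os
import time

# Hypothetical default path; one log per web node keeps writers from sharing a file.
LOG_PATH = "/tmp/votes.log"


def record_vote(ip: str, choice: str, log_path: str = LOG_PATH) -> None:
    """Append one vote as a single line: no locks, no reads, no database."""
    line = f"{int(time.time())}\t{ip}\t{choice}\n".encode()
    # O_APPEND makes each write() land at the current end of the file,
    # so concurrent handlers never overwrite each other's records.
    fd = os.open(log_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, line)  # one write() call per record
    finally:
        os.close(fd)
```

    A long-running server would open the file once at startup and keep the descriptor, rather than reopening it for every vote as this self-contained example does.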

    You would want to serve the images and the entry page (which could be static) from servers separate from the voting system, so you can run thttpd there. Similarly, you would want a single-process httpd with an embedded language, such as AOLserver, for the voting side. If you fork, you die; if you swap, you die; even if you context-switch, you die.

    You could do the hardware side for under 15k but, unless development costs are separate from the hardware and bandwidth budgets, I wouldn't bother trying.
    Game Servers are the next hot market!
    Slim margins, heavy support, fickle customers, and moronic suppliers!
    Start your own today!

  7. #7
    Join Date
    Jul 2004
    Posts
    40
    Why use a database? Couldn't you just keep a tally in a flat file?

    I guess there would have to be user sessions, so then a db would be needed.

    Also, 100k seems like a lot, but I guess those fancy Idol-type sites are pretty heavy.

  8. #8
    Join Date
    Oct 2004
    Location
    Southwest UK
    Posts
    1,175
    My opinion on this is quite simple: parallelise it.

    Unless you need instant results (i.e. where you cannot spend half an hour collating results), simply have a dozen (or ten dozen) servers running exactly the same thing. It doesn't sound like you care about people voting multiple times, so just load balance the site.

    If you do care about people voting multiple times, you would need to load balance intelligently: instead of the usual round-robin DNS setup, you'd want to point people to one of your servers according to their IP address. Then they will always go to the same server if they try to log on multiple times, and can be caught. (If they use a different proxy, you're stuffed anyway, as they won't be using a single login to vote twice.)
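    A minimal sketch of source-IP affinity, assuming a fixed, hypothetical list of voting nodes: hash the client address and take it modulo the number of servers, so the same IP always lands on the same node. SHA-256 is used only because it gives the same answer in every process (Python's built-in `hash()` of a string is randomized per process), not for security.

```python
import hashlib
import ipaddress

# Hypothetical pool of voting nodes; the names are placeholders.
SERVERS = ["vote1.example.com", "vote2.example.com", "vote3.example.com"]


def server_for(ip: str, servers=SERVERS) -> str:
    """Map a client IP to the same voting node every time (source-IP affinity)."""
    packed = ipaddress.ip_address(ip).packed   # validates and normalizes the address
    digest = hashlib.sha256(packed).digest()   # stable across processes and machines
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]
```

    One caveat: changing the size of the pool remaps most clients, so the pool should stay fixed for the duration of the vote (consistent hashing is the usual fix if it can't).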

    If you still need speed, I wouldn't use MySQL to store the votes; you'd want a simpler custom record-the-vote-and-write app (a very simple bit of C code would do) that writes to a flat file, or even keeps votes in memory until it has a bit of spare CPU and then writes them all to disk. The results can then be bulk imported into a DB outside the voting window and queried. If it's really simple voting, the C app can just write out the running totals, and you add them up from all the flat files that are generated.
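    The "keep votes in memory, write out running totals when there's spare CPU, then add up the files" idea might look roughly like this. It is sketched in Python rather than C for brevity; the per-node totals file and its JSON format are assumptions, and it assumes a single process handles votes one at a time (a threaded server would need a lock around the counter).

```python
import collections
import json
import os


class Tally:
    """In-memory vote counter for one node."""

    def __init__(self, path: str):
        self.path = path                      # hypothetical per-node totals file
        self.counts = collections.Counter()

    def vote(self, choice: str) -> None:
        self.counts[choice] += 1              # no disk I/O on the request path

    def flush(self) -> None:
        """Write the running totals, e.g. when the node has spare cycles."""
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(dict(self.counts), f)
        os.replace(tmp, self.path)            # atomic swap: readers never see half a file


def combine(paths) -> collections.Counter:
    """Add up the per-node totals files to get the overall result."""
    total = collections.Counter()
    for p in paths:
        with open(p) as f:
            total.update(json.load(f))        # Counter.update adds counts
    return total
```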

    The cost of hardware: you'd rent the space from Akamai, so the cost wouldn't be that much. For bandwidth, again, you'd probably talk to Akamai too. In fact, look at http://www.akamai.com/en/html/servic...g_contest.html and get them to do it all for you.

  9. #9
    I would agree with the above: spread it across several servers using direct routing. That will usually save you a lot of cost as well, since you could do it over multiple providers. Unless you require live stats on the voting, the votes can be kept across multiple DB servers and merged at a later date, checking for duplicate votes by looking for duplicate IPs.

    With that sort of budget it's not worth it; American Idol would have a much bigger budget than $15k per voting session.
    crucialparadigm - Affordable, Reliable, Professional :
    Web Hosting
    24/7 Support • Web Hosting • Reseller Hosting • Cloud/VPS Plans • Dedicated Servers •

  10. #10
    Join Date
    Oct 2002
    Posts
    705
    I would handle this in software as much as possible. Write the poll in C and embed it in thttpd. Have all the votes recorded in memory within the process, then have a script insert the data from the process into MySQL every minute or so. You just went from 20,000 MySQL queries per second to less than 1. Once you had that working, you would need to contact a content distribution network and have them host your script on 10 to 20 servers throughout their network. Hosting this at one facility would be stupid, and putting the program as close to the voter as possible will reduce bandwidth costs.
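    A sketch of the buffer-and-flush pattern: each vote only increments an in-memory counter, and a timer calls `flush()` (for example once a minute) to apply all pending counts in one transaction. `sqlite3` stands in for MySQL so the example runs anywhere, and the `totals` table is hypothetical; in MySQL the two statements would typically collapse into a single `INSERT ... ON DUPLICATE KEY UPDATE`. Like the post, it assumes one process handles votes serially; a threaded server would need a lock around `pending`.

```python
import collections
import sqlite3


class VoteBuffer:
    """Count votes in memory; push them to the database in one batch per interval."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.pending = collections.Counter()
        conn.execute(
            "CREATE TABLE IF NOT EXISTS totals (choice TEXT PRIMARY KEY, n INTEGER NOT NULL)"
        )

    def vote(self, choice: str) -> None:
        self.pending[choice] += 1             # request path touches only memory

    def flush(self) -> None:
        """Apply all pending counts in one transaction instead of one query per vote."""
        if not self.pending:
            return
        rows = list(self.pending.items())
        with self.conn:                       # commits on success, rolls back on error
            self.conn.executemany(
                "INSERT OR IGNORE INTO totals (choice, n) VALUES (?, 0)",
                [(choice,) for choice, _ in rows],
            )
            self.conn.executemany(
                "UPDATE totals SET n = n + ? WHERE choice = ?",
                [(n, choice) for choice, n in rows],
            )
        self.pending.clear()                  # only cleared once the commit succeeded
```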
    ServerMatingProject.com
    The World's first server mating experiment
    We give new meaning to I/O intensive and hot swap

  11. #11
    Join Date
    Nov 2003
    Posts
    1,093
    Originally posted by jasjbow
    Why use a database? Couldn't you just keep a tally in a flat file?

    I guess there would have to be user sessions, so then a db would be needed.

    Also, 100k seems like a lot, but I guess those fancy Idol-type sites are pretty heavy.
    Flat files would be worse in my opinion...
    ManageMyServices was sold by me in September 2009. I no longer have any affiliation with this company.

  12. #12
    Join Date
    Jan 2003
    Posts
    1,715
    Because you're thinking 'flat file database'. Just append each vote to the file in a log style. If you need a 'live' result, have a separate process occasionally run through the logs and total up the results, discarding any repeats it finds. The filesystem will serialize the writes, but the voting system is entirely lock-free, which is the primary goal.
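    The separate totalling process might look like this, assuming a hypothetical log line format of `timestamp<TAB>ip<TAB>choice` written by each node. It keeps only the earliest vote from each IP and skips malformed or truncated lines.

```python
import collections


def total_votes(log_paths) -> collections.Counter:
    """Total the vote logs, counting only the first vote seen from each IP."""
    records = []
    for path in log_paths:
        with open(path) as f:
            for line in f:
                parts = line.rstrip("\n").split("\t")
                if len(parts) != 3 or not parts[0].isdigit():
                    continue                  # skip a torn or malformed line
                ts, ip, choice = parts
                records.append((int(ts), ip, choice))
    records.sort(key=lambda r: r[0])          # earliest vote wins across all nodes
    seen = set()
    totals = collections.Counter()
    for _, ip, choice in records:
        if ip in seen:
            continue                          # repeat vote: discard
        seen.add(ip)
        totals[choice] += 1
    return totals
```

    This holds every record in memory; with tens of millions of votes you would sort the logs on disk first (for example with the Unix `sort` utility) and stream through them instead.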

  13. #13
    Join Date
    Jun 2002
    Location
    San Diego, California
    Posts
    788
    I was thinking that a flat-file system would be bad because the servers wouldn't be able to easily share results and IPs if necessary, and it would also send file I/O off the charts. Otherwise it would be a good idea.

  14. #14
    Join Date
    Sep 2001
    Location
    Seattle, WA
    Posts
    3,085
    Well, it depends. If it's vital to block repeat visitors, you would need some form of shared IP table that shows who has voted and who hasn't.

    Alternatively, you could post-process the results and throw away duplicates, so you wouldn't have to worry about doing it at the time of vote.
    Jim Reardon - jim/amusive.com

  15. #15
    Join Date
    Jan 2003
    Posts
    1,715
    Originally posted by Tee
    I was thinking that a flat-file system would be bad because the servers wouldn't be able to easily share results and IPs if necessary, and it would also send file I/O off the charts. Otherwise it would be a good idea.
    On a local filesystem, reads would be cached, and even a large record size would come to under 10 MB/sec of writes across the whole server group. That's exceptional, but well within even IDE's abilities. It would be beyond the reach of NFS or GFS, probably even over Myrinet, since their locking and cache refreshes incur network latency. Besides, you'll need to deal with that I/O in any storage format, unless you intend to buy 40 GB of RAM.
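    The I/O figures can be checked with a little arithmetic. The vote count and window come from the original question; the 400-byte record size is an assumption (a generous size for a timestamp, IP and choice).

```python
# Rough I/O budget for logging every vote.
VOTES = 80_000_000        # upper estimate from the original question
WINDOW_SECONDS = 3600     # one-hour voting window
RECORD_BYTES = 400        # assumed size of one log record

votes_per_sec = VOTES / WINDOW_SECONDS                  # about 22,222 votes/sec
write_mb_per_sec = votes_per_sec * RECORD_BYTES / 1e6   # about 8.9 MB/sec, all nodes combined
total_gb = VOTES * RECORD_BYTES / 1e9                   # about 32 GB for the whole hour

print(f"{votes_per_sec:,.0f} votes/s, {write_mb_per_sec:.1f} MB/s, {total_gb:.0f} GB total")
```

    Holding all of that in RAM instead, plus any index overhead, is what pushes the memory requirement into the tens of gigabytes.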

    In the flat-file approach, you would throw out the duplicates during post-process, as amusive said. You'd probably use cookies to provide the 'you have already voted' feedback to keep the honest people honest.
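    For the cookie-based "you have already voted" feedback, a sketch using Python's standard `http.cookies` module; the cookie name `voted` and the one-day lifetime are hypothetical choices. As the post says, this only keeps honest people honest, since anyone can clear or refuse cookies.

```python
from http.cookies import SimpleCookie

COOKIE_NAME = "voted"   # hypothetical cookie name


def already_voted(cookie_header: str) -> bool:
    """True if the browser sent our 'voted' cookie (trivially bypassed)."""
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    return COOKIE_NAME in jar


def voted_cookie_header(max_age: int = 86400) -> str:
    """Value for the Set-Cookie header sent with the 'thanks for voting' page."""
    jar = SimpleCookie()
    jar[COOKIE_NAME] = "1"
    jar[COOKIE_NAME]["max-age"] = max_age
    jar[COOKIE_NAME]["path"] = "/"
    return jar[COOKIE_NAME].OutputString()
```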

    If you really need live duplicate checking, I think you're back to the big-box topology. You would need a fast record search for duplicate checking. You could build your own shared binary or B+ tree, but I'd just abuse ReiserFS or XFS directories, which already use them and automatically manage the memory footprint.
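    Abusing a directory as the duplicate index could look like this: one empty file per voter, created with `O_CREAT | O_EXCL`, which fails atomically if the entry already exists, so two racing requests for the same IP cannot both win. The one-file-per-IP layout is an assumption, hashing the IP just keeps file names safe for IPv6 addresses, and it assumes a local filesystem (`O_EXCL` was historically unreliable over NFS). How fast the lookup is depends on the filesystem's directory indexing, which is the point of choosing ReiserFS or XFS.

```python
import hashlib
import os


def first_vote(ip: str, vote_dir: str) -> bool:
    """Return True the first time an IP votes, False on any repeat."""
    # Hex digest keeps names filesystem-safe (IPv6 addresses contain ':').
    name = hashlib.sha256(ip.encode()).hexdigest()
    path = os.path.join(vote_dir, name)
    try:
        # O_EXCL: creation fails if the entry already exists, atomically.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        return False
    os.close(fd)
    return True
```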
