I'm trying to help a friend who wants to start up a website that has the potential to draw millions of users per day. I figure at the start we can underestimate.
So I need recommendations not only on what kind of box(es) and bandwidth to start with, but also on who is a really reliable provider (this site will tank if there is downtime).
I'm open to things like doing a round-robin DNS to a few providers in order to support both growth and redundancy, but would like to hear from people who have done this.
There's little cash to start this project. Certainly not enough to just go buy a little server farm in a colo and hire someone to manage it. And that doesn't give the long-term redundancy I'd like to see (i.e., physically separate locations down the line).
Userbase will start in the thousands, quickly skyrocket to 5-6 figs. He estimates the 'interest' could hit 7 or even 8 figs at some point. I'm assuming that we need to presume maybe 100 users connected at any point, but could easily need to scale to 1000 quickly. If the userbase grows, I'm sure the money will 'appear' to help scale with new boxes, etc.
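To put a rough number on "users connected at any point", here is a minimal sketch using Little's law (concurrent users = arrival rate x average time on site). The function name, the session length, and the visitor counts are all made up for illustration; none of them are measurements.

```python
def concurrent_users(visitors_per_day, avg_session_seconds, active_hours=24):
    """Little's law: average concurrency = arrival rate * time spent on site.

    active_hours is the part of the day the visits are spread over
    (an assumption; real traffic is lumpier than a flat average).
    """
    arrivals_per_second = visitors_per_day / (active_hours * 3600)
    return arrivals_per_second * avg_session_seconds


if __name__ == "__main__":
    # Hypothetical inputs: a 5-minute average visit, spread over a full day.
    for visitors in (10_000, 100_000, 1_000_000):
        print(visitors, round(concurrent_users(visitors, 300)))
```

The result depends heavily on the assumed session length and on how tightly the visits bunch up, so treat it as a starting point for the 100-versus-1000 question, not an answer.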
At the start, looking for probably simple blog & gallery, maybe a simple BB in the background for comments, or maybe just the blog comments will do. Heavily cached pages, means we're delivering near-html (via PHP). If I get someone to help set up, we can do things like play around with lighttpd...
Images could be on the same server, or a different one (or a few different ones). Image downloads will be big. Just to be clear, this is NOT an adult site, this is a science & technology site.
I've heard of folks picking up a few 1000GB+ boxes, using maybe two to serve images with lighttpd or thttpd and one base Apache/lighttpd box for the PHP + MySQL core. Thumbnails could be hosted locally or on the secondary boxes.
I need some brainstorming. Site has the potential to grow rapidly -- I need possible solutions. I don't mind paying a bit more at the start for a 20Mbps line that we don't fully use, versus paying for 1TB of throughput and blowing by it by 2x or 3x, or 10x. Multiple servers would work. Multiple VPSes would work. Responsiveness of the main site is key (main php/html blog, and image thumbnails), whereas image downloads could take longer.
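For comparing a flat-rate line against a metered transfer quota, a quick conversion sketch. It assumes the "20mb line" means 20 megabits per second, decimal units (1 TB = 10^12 bytes), and a 30-day month; the function names are just for illustration.

```python
SECONDS_PER_DAY = 86_400


def monthly_transfer_tb(line_mbps, utilization=1.0, days=30):
    """Terabytes moved in `days` by a line running at line_mbps megabits/s.

    utilization is the fraction of the line actually used on average.
    """
    bits = line_mbps * 1_000_000 * utilization * SECONDS_PER_DAY * days
    return bits / 8 / 1e12


def average_mbps_for_quota(quota_tb, days=30):
    """Average megabits/s that would use up quota_tb terabytes in `days`."""
    bits = quota_tb * 1e12 * 8
    return bits / (SECONDS_PER_DAY * days) / 1_000_000


if __name__ == "__main__":
    print(monthly_transfer_tb(20))        # a saturated 20 Mbps line
    print(average_mbps_for_quota(1))      # average rate a 1 TB quota allows
```

Under those assumptions a fully saturated 20 Mbps line moves about 6.5 TB a month, while a 1 TB quota corresponds to an average of only about 3 Mbps, so the two offers are further apart than they sound.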
How big a box. How big a pipe. Multiple boxes? Round-robin? Clustered? Redundancy approaches?
Add to that there's the potential for streaming video hosting to the same crowd (possibly underwritten by a sponsor), and we need to have likely an outsourced company that can do that stuff...
Thanks for the brainstorming!
David Chait, Editor CHAITGEAR - Consumer electronics news and reviews
Not to be mean here, but you're asking for estimates for a site when nobody really knows exactly what it's going to do.
That would be like me saying I'm going to open up a new store and it has the potential to be the next big thing, where should I build it, how many parking spaces should I have, and what color should it be? These things all depend on the website itself.
Load balancing over multiple datacenters is a very difficult thing to do. Round-robin DNS is great until one machine goes down for whatever reason. Then, with two machines in the rotation, 1/2 of all requests are not going to be answered (in general, one request in N for N machines). You also have the problem that the simple "user 1 goes here, user 2 goes here" method does not balance the load at all. Server 1 may end up way overloaded while server 2 ends up lightly loaded.
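A toy model of both problems, as a sketch only: it treats round-robin as a strict rotation and ignores DNS caching and client retries, which real resolvers and browsers add on top. The server names and request costs are hypothetical.

```python
from itertools import cycle


def failed_fraction(servers, down, n_requests):
    """Fraction of requests handed to a dead server under strict rotation."""
    rotation = cycle(servers)
    failed = sum(1 for _ in range(n_requests) if next(rotation) in down)
    return failed / n_requests


def load_per_server(request_costs, n_servers):
    """Total work each server receives when requests are dealt out in turn.

    request_costs is a list of per-request costs in arbitrary units.
    """
    load = [0] * n_servers
    for i, cost in enumerate(request_costs):
        load[i % n_servers] += cost
    return load


if __name__ == "__main__":
    # One of two machines is down: half the requests go nowhere.
    print(failed_fraction(["web1", "web2"], {"web2"}, 1000))
    # Heavy and light requests alternate: equal counts, unequal load.
    print(load_per_server([10, 1, 10, 1, 10, 1], 2))
```

The second call is a deliberately bad case: each server gets the same number of requests, but one of them gets all the expensive ones, because the rotation knows nothing about how much work a request costs.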
For starters you could start out on a small VPS and grow as you need to. If you are this confident that you are going to have that many users, then why not look into writing a proposal for investors to get the capital going that you need to do the project right from the start?
You are saying that the userbase will start in the thousands, but without knowing more about the userbase it is really hard to make any suggestions. Is the userbase just people who stop by one time? Or are they active members of a portal or forum?
I think you can kind of see what I am getting at.
We need a bit more information before we can really suggest any specs or systems.
As to your latter question: I'm helping on the side and he's fully booked, so there's barely enough time to try and quickly build a site -- certainly not enough time to try and put together a business plan and pitch to investors. I've been down that road before... no interest.
As to your earlier points, message received on round-robin. I didn't know whether, in combination with dynamic DNS or something similar, it could be an effective method for distributing traffic.
Userbase is a question mark. My guess is that the site could become an active discussion area, and that many times per year there would be events that occurred over a course of days, and discussion lasts past that. Part of the draw is photos/images that are released in chunks, to try and draw people back -- or at least that'd be my strategy. Will there be a million visitors a day 365 days a year? I'm doubtful of that! However, there could easily be dozens of days per year where the million-size userbase all descends on the site together. Maybe a few days a month, each month, and then sporadic visits (some percentage) each day the rest of the month. But I'm just guessing on traffic. Could be 1/10th those numbers, could be more spread evenly throughout a month. Could be those numbers, hitting during half the day.
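To turn guesses like "a million visitors, hitting during half the day" into something a host can quote against, here is a back-of-the-envelope sketch. The pages-per-visit and kilobytes-per-page figures are pure placeholders, not estimates for this site, and the result is an average over the window, so real peaks would be higher.

```python
def spike_estimate(visitors, window_hours, pages_per_visit, kb_per_page):
    """Average page views per second and megabits per second over a window.

    kb_per_page is the total bytes served per page view (HTML plus
    thumbnails), in decimal kilobytes.
    """
    seconds = window_hours * 3600
    pages_per_second = visitors * pages_per_visit / seconds
    mbps = pages_per_second * kb_per_page * 1000 * 8 / 1_000_000
    return pages_per_second, mbps


if __name__ == "__main__":
    # Placeholder inputs: 1,000,000 visitors in 12 hours,
    # 5 page views each, 200 KB per page view.
    pages, mbps = spike_estimate(1_000_000, 12, 5, 200)
    print(f"{pages:.0f} page views/s, {mbps:.0f} Mbps")
```

With those placeholder inputs it works out to about 116 page views per second and about 185 Mbps. At 1/10th the visitors, or with the same visitors spread over a month, the same arithmetic gives very different answers, which is why the spike days matter more than the yearly average.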
Trying to chart out and estimate things of this magnitude is hard to do. The best >I< can do is get advice on how to approach setting up systems to scale rapidly as the userbase grows and we see patterns of usage.
I don't like the store analogy, but let me turn it around. I've basically told you I'm building some kind of superstore (with main departments being tech/science images, simple blog, and some comment threads, using php). I've told you I'm expecting high traffic, and that I have a concept of what high traffic means -- it means the store needs to be near a thruway, have decent parking to start, and maybe an option on an open lot next door for further parking. I'm not asking you to pick my store color, that's for me to do. And the target audience isn't one demographic, it's of interest to nearly all demographics, so I want the store to be accessible.
If I told you it was a php forum, with a huge userbase, up to 1000 people at once, you'd have a target. That target is likely too big. I'm not looking at the overhead of a forum -- it'll be more lightweight in general code and connectivity. Unlike the average forum, it'll be less everyone writing, more just reading (cached text pages, lots of thumbnails, image serving). Reasonably light php blog for content and comments, plus gallery of images, plus image serving.
I'm not sure I can give you more detail, as I'm actually trying to FOSTER some brainstorming here. Because the exact breadth or depth of content might change (more images, less discussion? or vice-versa).
But I'm sure people are out there who set up large sites, with an expectation of X, but with on-demand growth plans to scale to Y.
Apache, lighttpd, MySQL, PHP. Control panels as needed. At least semi-managed. Probably needs a 10Mbps line at minimum, if not 100Mbps or more, and over time maybe multiple servers to distribute the load better. Linux is preferable. A few VPSes, each on a separate box, with a host that could scale each one up to an individual full box, might not be a bad approach.
Brainstorm with me. What more details can I provide? I can make up numbers about how many exact users per day on average, but it's going to be a number that spikes so I'd rather deal with some big cases, scalability, etc. I can make up numbers about how many images will be hosted, how many will people look at per day -- but my guess is that anyone who has run a site based heavily around image serving will have some gut instinct calls. Etc.
Thanks for chiming in, let's keep knocking it around.
David Chait, Editor CHAITGEAR - Consumer electronics news and reviews