Hey all. I am currently developing a project that could potentially need to support a MySQL database that would be hit with millions of queries per minute.
Realistically, the load will likely never get that big, and if it did, I would first need a server capable of handling thousands of users.
Right now I have a shared host, which obviously won't cut it. I don't know that much about server management. I contacted a buddy of mine who once ran his own webhost out of a rack in Florida. I am waiting to hear back from him.
Every few seconds a user will be hitting the server with two queries and then updating a row. I can likely do some query caching to help alleviate the load, but the nature of the project means that most queries will be unique.
Will a dedicated server handle a few hundred users doing this at once? What about a few thousand? Obviously beyond that, if millions of folks are using it, I'd likely have to hire someone to manage the servers and run it ourselves, correct?
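To put some rough numbers on the question, here's a back-of-envelope sketch of the query rate, assuming each active user issues two reads plus one update every ~3 seconds (the cycle length and query count are illustrative assumptions, not measurements):

```python
# Rough steady-state load estimate for the workload described above.
# Assumption (hypothetical numbers): 3 queries per user every 3 seconds.
def queries_per_second(active_users, queries_per_cycle=3, cycle_seconds=3.0):
    """Approximate aggregate query rate for a given number of active users."""
    return active_users * queries_per_cycle / cycle_seconds

for users in (100, 1_000, 1_000_000):
    print(f"{users:>9} users -> ~{queries_per_second(users):,.0f} queries/sec")
```

Under those assumptions, a few hundred simultaneous users is only a few hundred queries per second, while "millions of users" is a completely different order of magnitude.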
Or do companies exist that handle this kind of scaling, from minimal use all the way up to millions of queries at once?
If you're just starting out with a query every couple of seconds, that's only one instance with a few operations, none concurrent. Most shared hosts would be able to handle even that. You could start out with a VPS somewhere, then just upgrade as your userbase/traffic grows. That doesn't account for spikes in traffic over a short period, though...
For something of this scale, you need a managed services partner who can evaluate your queries on an ongoing basis and make adjustments. At the end of the day, that's going to be far more important than what hardware you have in place.
Just for example, if the millions of queries are ever duplicates of each other, inserting a caching layer in the middle could stave off a large share of your database hits. If they're mostly read queries, you could scale out a master-slave replication cluster where you have many read-only slave nodes and one or a few write nodes.
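The caching-layer idea can be sketched as a read-through cache sitting in front of the database. This is a minimal in-process sketch (in practice you'd use something like memcached or Redis; `fetch_fn` stands in for the real database call and is an assumption of this example):

```python
import time

class QueryCache:
    """Read-through cache: answer repeated queries without hitting the DB."""

    def __init__(self, fetch_fn, ttl_seconds=60):
        self.fetch_fn = fetch_fn   # the real database call (hypothetical)
        self.ttl = ttl_seconds
        self._store = {}           # query string -> (expires_at, rows)

    def query(self, sql):
        now = time.time()
        hit = self._store.get(sql)
        if hit and hit[0] > now:   # fresh entry: skip the database entirely
            return hit[1]
        rows = self.fetch_fn(sql)  # miss or stale: go to the database
        self._store[sql] = (now + self.ttl, rows)
        return rows
```

Note this only pays off when identical queries actually repeat within the TTL, which is exactly the property the original poster says his workload lacks.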
You need a host who understands databases in great depth. The hardware will come naturally, and is the easy part.
Caching will be tricky, if not impossible, with this particular set of queries. I do not foresee any possibility that two queries will ever be the same. The results of two may be the same, but not the queries themselves.
Most data is read-only once it is created, so replication with read-only slaves would make a lot of sense.
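The read-heavy replication setup suggested above usually comes down to read/write splitting at the application layer: writes go to the master, reads fan out across the slaves. A minimal routing sketch, assuming placeholder connection handles (a real deployment would use a MySQL driver or a proxy such as ProxySQL instead of this hand-rolled check):

```python
import random

class ReplicatedRouter:
    """Route writes to the master and reads to a random read-only slave."""

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "REPLACE")

    def __init__(self, master, replicas):
        self.master = master       # single write node (hypothetical handle)
        self.replicas = replicas   # read-only slave nodes

    def route(self, sql):
        # Writes must hit the master so replication can fan them out;
        # reads can be served by any replica.
        if sql.lstrip().upper().startswith(self.WRITE_VERBS):
            return self.master
        return random.choice(self.replicas)
```

One caveat worth knowing: replication is asynchronous, so a read routed to a slave immediately after a write may briefly see stale data.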