Results 1 to 19 of 19
-
05-12-2009, 02:11 AM #1 Junior Guru Wannabe
- Join Date
- Sep 2007
- Posts
- 52
Setting up a CDN (sort of) with Multiple VPS
Hi there,
As a personal project I've recently begun thinking about setting up a CDN (sort of) with my various servers around the world. Basically, from one 'management' point or portal (assume one physical server), each user can update or create their webpage there.
From that main node, it will have to replicate in REAL TIME:
a) userdir (public_html etc.)
b) mysql db's (circular mysql replication?)
c) Any updates to their apache.conf, php.ini etc..
Now I've been looking into several options. DRBD seems to be one that a lot of people suggest for the actual data sync, but I'm wary of implementing something like that outside of a local private LAN. I've also looked into rsync/unison, which may work. I'm just throwing this out there to see what people have tried and are using.
DNS will be handled by a GeoIP/MaxMind BIND patch; my DNS servers will return the closest server, and the closest 'mirror' of the user's site will be served via that hostname.
Anyone done this before or tried? The actual caching part of CDNs and etc. I'm thinking I could always just deal with via squid later, but for now the main guts is what I'm concerned with.
Thanks.
Elsdon
-
05-12-2009, 09:14 AM #2 Web Hosting Guru
- Join Date
- Jun 2007
- Location
- Tokyo, Japan
- Posts
- 336
I'm interested in this as well.
The hardest part is the data and SQL replication (lag), in my opinion.
Yudai Yamagishi
-
05-12-2009, 09:47 AM #3 Junior Guru
- Join Date
- Mar 2008
- Location
- Savage, MN
- Posts
- 220
This is hard, and often requires a bunch of custom development, as there really isn't that much software that will do this all for you.
Circular replication works, but only if your applications are highly tuned to ensure there are never conflicts.
The way Akamai and such do the file replication is basically that if the file doesn't exist yet on one node, it sends the request off to the other node(s).. again, custom software development.
I'd love to hear if you figure out a way to get this going!
Just to note - I have yet to work with a CDN that provides a database solution (well, Akamai can write to a text file on NetStorage, but that doesn't quite count).. so if you can actually solve that problem in a way that scales and is seamless to end users, you've got a winner.
-nc
-
05-12-2009, 10:06 AM #4 Junior Guru Wannabe
- Join Date
- Sep 2007
- Posts
- 52
DB replication isn't so bad one-way.. but the real issue only arises when you want to update a DB bi-directionally. I wonder how places like Gmail do it.. or even Hotmail. I presume they're not all accessing one 'master' DB for writes and then all the geographical locations for 'reads'.. seems kind of counterproductive to me.
Custom development I'm not so much afraid of.. Being as that is my trade, it's within the scope of this project to write something that is both robust and dynamic. I'm just looking for more high-level ideas for now..
Yeah, circular replication.. integrity would be an issue, because which DB would have priority? Say two 'clients' try to write to the same row at the same time.. well, no, that really wouldn't work either. The easiest way would be to use NTP to sync all the system clocks to UTC, timestamp every DB transaction, then apply them all in chronological order. There would be some lag, but I think that is unavoidable. Anything in the milliseconds is acceptable, but once we're talking 10-20s, the time to update might become a factor. Interesting.
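That chronological-replay idea can be sketched in a few lines. A toy illustration (not a real replication protocol): assume each node keeps a locally ordered write log tagged with a UTC timestamp, and a merger replays all logs in global order:

```python
import heapq

# Hypothetical per-node write logs, each already sorted locally:
# (utc_timestamp, node_id, statement)
node_a = [(1.000, "a", "UPDATE row1"), (1.500, "a", "UPDATE row3")]
node_b = [(1.200, "b", "UPDATE row2"), (1.500, "b", "UPDATE row1")]

# Replay every log in global timestamp order; ties are broken by node id
# (tuple comparison), so every replica applies writes in the same order.
merged = list(heapq.merge(node_a, node_b))
for ts, node, stmt in merged:
    print(f"{ts:.3f} {node} {stmt}")
```

The deterministic tie-break matters more than the clocks themselves: as long as every node replays the same sequence, they converge even if NTP is a few milliseconds off.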
-
05-12-2009, 01:14 PM #5 Junior Guru Wannabe
- Join Date
- Dec 2007
- Posts
- 64
Not really; MySQL offers an auto_increment_offset that you can use on all of your master nodes. As long as your tables use an integer primary key with auto_increment, you'll have no conflicts if two users try to insert the same data into two different masters at the same time. It's basically the same thing you were suggesting with the timestamps, but at the database level instead of the application level.
I use that with a master-master cluster from Dallas <--> Chicago and I've yet to have any problems.
-
05-12-2009, 01:53 PM #6 Junior Guru Wannabe
- Join Date
- Sep 2007
- Posts
- 52
Hrm.. but do you have to design your application/tables around auto_increment_offset? I'd like something transparent to the application layer, where the underlying application code doesn't need to be changed.
That concept is the tough part.. If I was just writing one application to do this, it would be easy to self-manage all of it. But something where any random 'hosting client' let's say, could come in, upload their website, setup their db and have it run and replicate without them having to do anything extra on their side.
-
05-12-2009, 02:47 PM #7 Junior Guru Wannabe
- Join Date
- Dec 2007
- Posts
- 64
It's at the database level; no reconfiguring of the application. There are actually two variables you need to change: auto_increment_offset and auto_increment_increment.
auto_increment_increment changes how much the local MySQL will increment by to determine the next value. With a large enough increment value and correct offsets for each server, it's possible to guarantee non-conflicting writes to the same table, assuming no other unique constraints apply.
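For reference, on a two-master pair the settings would look roughly like this (values are illustrative; they can equally go in my.cnf so they survive a restart):

```sql
-- Master 1: generates keys 1, 3, 5, ...
SET GLOBAL auto_increment_increment = 2;  -- step between generated keys
SET GLOBAL auto_increment_offset    = 1;  -- this node's starting point

-- Master 2: generates keys 2, 4, 6, ...
SET GLOBAL auto_increment_increment = 2;
SET GLOBAL auto_increment_offset    = 2;
```

Setting the increment to more than the current number of masters (say 10) leaves room to add nodes later without renumbering.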
-
05-12-2009, 03:16 PM #8 Junior Guru Wannabe
- Join Date
- Sep 2007
- Posts
- 52
-
05-12-2009, 06:06 PM #9 Web Hosting Master
- Join Date
- Jun 2005
- Posts
- 2,752
Not exactly the server closest to the user, but the one closest to the (ISP) caching nameserver that the user's computer is configured to use to resolve DNS queries.
We are using GeoIP DNS and VPS/dedicated servers around the world to cache static content. The main issues are: 1) VPSes are a lot less stable than dedicated servers; 2) cheap VPSes may present low throughput, so a "distant" cache server can perform better than a "near" one.
To fix issues of type 1 and mitigate issues of type 2, we use several monitoring servers that provide "smart" failover, switching edge network nodes when they detect failures or low performance.
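A minimal sketch of that failover decision (hostnames and the probe function are made up for illustration; a real monitor would then push the winner into DNS):

```python
def pick_edge(nodes, probe):
    """Return the healthy node with the lowest measured latency.

    `probe` returns a latency in milliseconds or raises OSError on
    failure; both the node list and the probe are placeholders for
    real health checks (HTTP fetch, ICMP, throughput test, etc.).
    """
    best, best_ms = None, float("inf")
    for node in nodes:
        try:
            ms = probe(node)
        except OSError:
            continue  # node down -> skip it and fail over
        if ms < best_ms:
            best, best_ms = node, ms
    return best

# Simulated latencies: the "near" node is down, so traffic fails over.
latencies = {"tokyo.example.net": None, "dallas.example.net": 140.0}

def fake_probe(node):
    ms = latencies[node]
    if ms is None:
        raise OSError("timeout")
    return ms

print(pick_edge(latencies, fake_probe))
```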
Regarding the database problem: since we are software/SaaS developers and the CDN is "proprietary", built to run our own applications, we run monitored regional web application servers using partitioned tables where applicable, and most of the time web services plus stored procedures. The systems are designed from scratch for the CDN architecture/infrastructure. There is no way to use any automatic database sync, and MySQL is out of the question (sorry, MySQL fans).
IMO a CDN running generic applications without changes is a great challenge. I wish you good luck!
-
05-12-2009, 08:25 PM #10 Junior Guru
- Join Date
- Mar 2008
- Location
- Savage, MN
- Posts
- 220
-nc
-
05-13-2009, 05:19 AM #11 Web Hosting Master
- Join Date
- Jun 2005
- Posts
- 2,752
-
05-13-2009, 05:20 AM #12 Junior Guru Wannabe
- Join Date
- Sep 2007
- Posts
- 52
Hrm... Directly from ARIN, weird. I never knew they had a city/country level breakdown for MaxMind..
-
05-13-2009, 05:23 AM #13 Web Hosting Master
- Join Date
- Jun 2005
- Posts
- 2,752
-
05-13-2009, 05:25 AM #14 Web Hosting Master
- Join Date
- Jun 2005
- Posts
- 2,752
ftp://ftp.arin.net/pub/stats/
The IP data is updated daily, sometimes more than once a day.
-
05-13-2009, 09:40 AM #15 Junior Guru Wannabe
- Join Date
- Apr 2009
- Posts
- 30
Depending on the kind of DB load you have, you may be able to just set up a replication slave, spread your read queries over the two, and send the write queries to the master.
You can also do MySQL clustering, but that is very memory-intensive and isn't easy.
If you wanted to use DRBD but don't have a private network segment, then make one using OpenVPN.
▄▄▄ Argenta Hosting, LLC
███ Business Grade VPS -- World Class Support Services -- Zero Over-subscription, guaranteed!
█▀█ On-site Consultations available in Central Arkansas
▀▀▀ 12+ years of enterprise Linux experience
-
05-13-2009, 10:31 AM #16 Web Hosting Master
- Join Date
- Sep 2005
- Location
- Canada
- Posts
- 646
If your database usage is mostly reads rather than writes, a master-slave relationship works best. That's what Google/Yahoo do for search, for example: millions of reads, relatively few writes that need to be replicated between databases.
If you only have two database servers, a multi-master setup can work fine (auto_increment offsets in MySQL). Be sure you know how to resync them if a problem occurs, which will happen eventually.
The other files can be synced with any number of tools, rsync works well depending on the delay that's acceptable to you.
-
05-13-2009, 10:38 AM #17 Web Hosting Master
- Join Date
- Jun 2005
- Posts
- 2,752
All right. But the OP's proposal is to offer a CDN service that runs any application without changes.
That concept is the tough part.. If I was just writing one application to do this, it would be easy to self-manage all of it. But something where any random 'hosting client' let's say, could come in, upload their website, setup their db and have it run and replicate without them having to do anything extra on their side.
-
05-13-2009, 10:46 AM #18 Junior Guru Wannabe
- Join Date
- Sep 2007
- Posts
- 52
Yeah. There are a lot of half-way solutions I 'could' implement, but since we're in the conceptual part of the design, I'm striving for an 'optimal' solution off the bat, not a semi-hack that sort of almost works in the majority of cases.
Google/Yahoo, I'm aware their SEARCH is read-based.. but what about Gmail/Yahoo Mail, is what I'm wondering. Are those distributed applications via CDN? I wonder if they're just huge centralized applications, with 'file attachments' located on CDNs.
Please keep em coming guys. I wonder if we'll be able to devise something truly devious here.
-
05-13-2009, 11:17 AM #19 Junior Guru Wannabe
- Join Date
- Apr 2009
- Posts
- 30
Here's a crazy outside-the-box idea.. Create a parity scheme and distribute data stripes with parity over several master servers.. use MySQL to house the stripe and parity data as blobs. The data itself is in a file-based DB format, like Db2.
I suppose you'd have to have some sort of controller to route DB requests.. hm.
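The parity math itself is simple enough to sketch. A toy XOR-parity example (the blob contents are made up; each stripe would live as a blob on a different hypothetical node):

```python
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length buffers."""
    return bytes(x ^ y for x, y in zip(a, b))

def make_stripes(blob: bytes, n: int):
    """Split blob into n equal data stripes plus one XOR parity stripe."""
    size = -(-len(blob) // n)           # ceiling division
    blob = blob.ljust(size * n, b"\0")  # pad so all stripes line up
    stripes = [blob[i * size:(i + 1) * size] for i in range(n)]
    return stripes, reduce(xor, stripes)

def rebuild(stripes, parity, lost):
    """Recover stripe `lost` by XOR-ing the parity with the survivors."""
    survivors = [s for i, s in enumerate(stripes) if i != lost]
    return reduce(xor, survivors, parity)

data = b"page row blob from the file-based DB"
stripes, parity = make_stripes(data, 3)
print(rebuild(stripes, parity, 1) == stripes[1])  # -> True
```

Like RAID 5, this survives losing any one node; the hard part, as noted above, is the controller that routes live DB requests across the stripes.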
Probably wouldn't work, but it is creative. :-)