
05-12-2009, 02:11 AM
Junior Guru Wannabe
Join Date: Sep 2007
Posts: 52
Setting up a CDN (sort of) with Multiple VPS
Hi there,
As a personal project, I've recently begun thinking about setting up a CDN (sort of) with my various servers around the world. Basically, here's what it will do: from one 'management' point or portal (you can assume one physical server), each user can create and update their webpage.
From that main node, it will have to replicate in REAL TIME:
a) userdir (public_html etc.)
b) MySQL DBs (circular MySQL replication?)
c) Any updates to their apache.conf, php.ini, etc.
Now, I've been looking into several options. DRBD seems to be one that a lot of people suggest for the actual data sync, but I'm wary of implementing something like that outside of a local private LAN. I've also looked into rsync/Unison, which may work. I'm just throwing this out there to see what people have tried and are using.
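For the rsync route, here is a minimal sketch of the push side, assuming each edge server is reachable over SSH under a hypothetical `host:/path/` target. It only builds the command lines (one per mirror); running them from cron or a change watcher is left out.

```python
import shlex
from typing import List

def rsync_commands(src_dir: str, mirrors: List[str], delete: bool = True) -> List[List[str]]:
    """Build one rsync invocation per mirror for pushing a user's directory.

    src_dir: directory on the management node, e.g. "/home/alice/public_html"
    mirrors: hypothetical "host:/path/" targets, one per edge server
    """
    # -a preserves permissions, times and symlinks; -z compresses over the WAN.
    base = ["rsync", "-az"]
    if delete:
        # Propagate deletions so mirrors don't keep serving stale files.
        base.append("--delete")
    # A trailing slash copies the directory's contents rather than the directory itself.
    src = src_dir if src_dir.endswith("/") else src_dir + "/"
    return [base + ["-e", "ssh", src, target] for target in mirrors]

if __name__ == "__main__":
    targets = ["tokyo:/home/alice/public_html/", "dallas:/home/alice/public_html/"]
    for cmd in rsync_commands("/home/alice/public_html", targets):
        print(shlex.join(cmd))
```

Leaving out --delete is safer while testing, because with it a mistake on the management node gets propagated to every mirror.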
DNS will be handled by a GeoIP/MaxMind BIND patch, and my DNS servers will hand out the closest server. The closest 'mirror' of the user's site will be served via that hostname.
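Whatever the BIND patch does internally, the decision it makes boils down to a country-to-server lookup with a fallback. A minimal sketch, assuming a hypothetical mapping and documentation-range IPs; keep in mind that the country is that of the querying nameserver, not necessarily the visitor.

```python
from typing import Optional

# Hypothetical country -> edge server table (documentation-range addresses).
POPS = {
    "JP": "203.0.113.10",   # Tokyo edge
    "US": "198.51.100.10",  # Dallas edge
    "CA": "198.51.100.10",  # Canada also served from Dallas
}
DEFAULT_POP = "192.0.2.10"  # management/origin node as the fallback answer

def answer_for(country_code: Optional[str]) -> str:
    """Pick the A record to return for a resolver located in country_code."""
    return POPS.get((country_code or "").upper(), DEFAULT_POP)

if __name__ == "__main__":
    print(answer_for("jp"))  # 203.0.113.10
    print(answer_for(None))  # 192.0.2.10 (unknown location)
```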
Has anyone done this before or tried? As for the actual caching part of a CDN, I'm thinking I could always deal with that via Squid later, but for now the main guts are what I'm concerned with.
Thanks.
Elsdon

05-12-2009, 09:14 AM
Web Hosting Guru
Join Date: Jun 2007
Location: Tokyo, Japan
Posts: 333
I'm interested in this as well.
The hardest part is the data and SQL replication (lag) in my opinion.
__________________
Yudai Yamagishi

05-12-2009, 09:47 AM
Junior Guru
Join Date: Mar 2008
Location: Savage, MN
Posts: 217
This is hard, and often requires a bunch of custom development, as there really isn't that much software that will do this all for you.
Circular replication works, but only if your applications are highly tuned to ensure there are never conflicts.
The way Akamai and the like do file replication is basically this: if a file doesn't exist yet on one node, that node sends the request off to the other node(s). Again, custom software development.
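A toy sketch of that pull-through idea (Akamai's actual software isn't public; the class and method names here are hypothetical, and the "network" is just in-memory dicts): serve locally if possible, otherwise ask the other nodes and keep a copy.

```python
from typing import Dict, List, Optional

class EdgeNode:
    """Toy pull-through replication: serve locally, else fetch from peers and cache."""

    def __init__(self, name: str, files: Optional[Dict[str, bytes]] = None,
                 peers: Optional[List["EdgeNode"]] = None):
        self.name = name
        self.files: Dict[str, bytes] = dict(files or {})  # path -> content on this node
        self.peers: List["EdgeNode"] = list(peers or [])  # nodes to ask on a miss, in order

    def get(self, path: str) -> Optional[bytes]:
        if path in self.files:
            return self.files[path]
        # Miss: ask the other nodes in order and keep the first copy found.
        for peer in self.peers:
            content = peer.files.get(path)  # peers' local files only, so no request loops
            if content is not None:
                self.files[path] = content
                return content
        return None  # nobody has it; a real server would answer 404

if __name__ == "__main__":
    origin = EdgeNode("origin", {"/index.html": b"hello"})
    tokyo = EdgeNode("tokyo", peers=[origin])
    print(tokyo.get("/index.html"))      # b'hello' (fetched from origin)
    print("/index.html" in tokyo.files)  # True (now cached in Tokyo)
```

A real version would add HTTP fetching and expiry/invalidation of cached copies.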
I'd love to hear if you figure out a way to get this going!
Just to note: I have yet to work with a CDN that provides a database solution (well, Akamai can write to a text file on NetStorage, but that doesn't quite count), so if you can actually solve that problem in a way that scales and is seamless to end users, you've got a winner.

05-12-2009, 10:06 AM
Junior Guru Wannabe
Join Date: Sep 2007
Posts: 52
DB replication isn't so bad one-way. The real issue only arises when you want to update a DB bidirectionally. I wonder how places like Gmail, or even Hotmail, do it. I presume they're not all accessing one 'master' DB for writes and then all the geographical locations for 'reads'; that seems kind of counterproductive to me.
I'm not so much afraid of custom development. Since that is my trade, it's within the scope of this project to write something that is both robust and dynamic. I'm just looking for more high-level ideas for now.
Yeah, circular replication. Integrity would be an issue: which DB would have priority? Let's say two 'clients' are trying to write to the same row at the same time. Well, no, that really wouldn't work either. The easiest way would be to use NTP to sync all the system clocks to UTC, then timestamp every DB transaction and push them all in chronologically. There would be some lag, but I think that is unavoidable. I think anything in the milliseconds is acceptable, but once we're talking 10-20 s, the time to update might become a factor. Interesting.
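A minimal sketch of the "timestamp everything and replay in order" idea, assuming each node keeps its own log already sorted by NTP-synced UTC time (the log format is hypothetical). It only fixes an order; on a conflicting row, the later entry in that order simply wins.

```python
import heapq
from typing import Iterable, List, Tuple

# (utc_timestamp, node_id, statement); node_id breaks exact ties so that
# every node replays the same order.
Entry = Tuple[float, str, str]

def merge_logs(*logs: Iterable[Entry]) -> List[Entry]:
    """Merge per-node logs (each already sorted by time) into one global order."""
    return list(heapq.merge(*logs, key=lambda e: (e[0], e[1])))

if __name__ == "__main__":
    tokyo = [(1.0, "tokyo", "INSERT a"), (3.0, "tokyo", "UPDATE a")]
    dallas = [(2.0, "dallas", "INSERT b"), (3.0, "dallas", "UPDATE b")]
    for ts, node, stmt in merge_logs(tokyo, dallas):
        print(ts, node, stmt)
```

Clock skew between nodes still matters: NTP keeps it small, but two writes closer together than the skew can be replayed in the "wrong" real-world order.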

05-12-2009, 01:14 PM
Junior Guru Wannabe
Join Date: Dec 2007
Posts: 63
Quote:
Originally Posted by elsdon
DB replication isn't so bad one-way. The real issue only arises when you want to update a DB bidirectionally. I wonder how places like Gmail, or even Hotmail, do it. I presume they're not all accessing one 'master' DB for writes and then all the geographical locations for 'reads'; that seems kind of counterproductive to me.
I'm not so much afraid of custom development. Since that is my trade, it's within the scope of this project to write something that is both robust and dynamic. I'm just looking for more high-level ideas for now.
Yeah, circular replication. Integrity would be an issue: which DB would have priority? Let's say two 'clients' are trying to write to the same row at the same time. Well, no, that really wouldn't work either. The easiest way would be to use NTP to sync all the system clocks to UTC, then timestamp every DB transaction and push them all in chronologically. There would be some lag, but I think that is unavoidable. I think anything in the milliseconds is acceptable, but once we're talking 10-20 s, the time to update might become a factor. Interesting.
|
Not really. MySQL offers an auto_increment_offset setting that you can use on all of your master nodes. As long as your tables use an integer primary key with auto_increment, you'll have no conflicts if two users try to insert the same data into two different masters at the same time. It's basically the same thing you were suggesting with the timestamps, but at the database level instead of the application level.
I use that with a master-master cluster from Dallas <--> Chicago and I've yet to have any problems.

05-12-2009, 01:53 PM
Junior Guru Wannabe
Join Date: Sep 2007
Posts: 52
Hrm... but do you have to design your application/tables around auto_increment_offset? I'd like something transparent to the application layer, something where the underlying application code doesn't need to be changed.
That concept is the tough part. If I were just writing one application to do this, it would be easy to self-manage all of it. What I want is something where any random 'hosting client', let's say, could come in, upload their website, set up their DB, and have it run and replicate without having to do anything extra on their side.

05-12-2009, 02:47 PM
Junior Guru Wannabe
Join Date: Dec 2007
Posts: 63
Quote:
Originally Posted by elsdon
Hrm... but do you have to design your application/tables around auto_increment_offset? I'd like something transparent to the application layer, something where the underlying application code doesn't need to be changed.
That concept is the tough part. If I were just writing one application to do this, it would be easy to self-manage all of it. What I want is something where any random 'hosting client', let's say, could come in, upload their website, set up their DB, and have it run and replicate without having to do anything extra on their side.
|
It's at the database level; no reconfiguring of the application is needed. There are actually two variables you need to change: auto_increment_offset and auto_increment_increment.
auto_increment_increment sets the step by which the local MySQL server increments to determine the next value, and auto_increment_offset sets the starting point. With a large enough increment value and correct offsets for each server, it's possible to guarantee non-conflicting writes to the same table, assuming no other unique constraints apply.
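A simplified model of how those two settings keep masters apart, assuming auto_increment_offset is not larger than auto_increment_increment (the function names are hypothetical; MySQL does this internally). Each server only ever generates values of the form offset + N*increment.

```python
from typing import Dict

def next_auto_increment(current_max: int, increment: int, offset: int) -> int:
    """Smallest value offset + N*increment (N >= 0) that is greater than current_max.

    Simplified model of how MySQL combines auto_increment_increment and
    auto_increment_offset; assumes 1 <= offset <= increment.
    """
    if current_max < offset:
        return offset
    n = (current_max - offset) // increment + 1
    return offset + n * increment

def master_settings(num_masters: int, increment: int = 10) -> Dict[str, Dict[str, int]]:
    """Same increment on every master, distinct offsets 1..num_masters (names hypothetical)."""
    if num_masters > increment:
        raise ValueError("auto_increment_increment must be >= the number of masters")
    return {f"master{i}": {"auto_increment_increment": increment,
                           "auto_increment_offset": i}
            for i in range(1, num_masters + 1)}

if __name__ == "__main__":
    # Two masters replicating into each other: both see the other's rows.
    print(next_auto_increment(0, 10, 1), next_auto_increment(0, 10, 2))  # 1 2
    print(next_auto_increment(2, 10, 1), next_auto_increment(2, 10, 2))  # 11 12
    print(master_settings(2))
```

With two masters and an increment of 10, one hands out 1, 11, 21, ... and the other 2, 12, 22, ..., so their inserts can't collide even while both replicate into each other; the gaps in the IDs are the price.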

05-12-2009, 03:16 PM
Junior Guru Wannabe
Join Date: Sep 2007
Posts: 52
Quote:
Originally Posted by mchristen85
It's at the database level; no reconfiguring of the application is needed. There are actually two variables you need to change: auto_increment_offset and auto_increment_increment.
auto_increment_increment sets the step by which the local MySQL server increments to determine the next value, and auto_increment_offset sets the starting point. With a large enough increment value and correct offsets for each server, it's possible to guarantee non-conflicting writes to the same table, assuming no other unique constraints apply.
|
Hrm... but it would only work if each table has an auto_increment primary key, which I assume any non-reference table would. I will look into it. Thanks!
I guess file-level sync is pretty much all I have left to think about, and rsync does have some 'real-time'-like operations.
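rsync itself only does work when it is invoked, so "real-time" usually means triggering it whenever something changes (on Linux, the inotify API can deliver those events). Here is a portable polling sketch, assuming a plain directory tree and hypothetical function names, that works out what changed between two scans so only those paths need pushing:

```python
import os
from typing import Dict, List, Tuple

Snapshot = Dict[str, Tuple[float, int]]  # relative path -> (mtime, size)

def snapshot(root: str) -> Snapshot:
    """Record mtime and size of every file under root."""
    snap: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            snap[os.path.relpath(full, root)] = (st.st_mtime, st.st_size)
    return snap

def diff(old: Snapshot, new: Snapshot) -> Tuple[List[str], List[str]]:
    """Return (changed_or_added, deleted) relative paths between two snapshots."""
    changed = sorted(p for p, meta in new.items() if old.get(p) != meta)
    deleted = sorted(p for p in old if p not in new)
    return changed, deleted
```

A watcher loop would take a snapshot every few seconds, and when diff() reports anything, run rsync for that user's directory. Comparing mtime and size is the same cheap check rsync uses by default, so an edit that keeps both unchanged can be missed.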

05-12-2009, 06:06 PM
Web Hosting Master
Join Date: Jun 2005
Posts: 2,532
Quote:
Originally Posted by elsdon
The closest 'mirror' of the user's site will be served via that hostname.
|
Not exactly closest to the user, but closest to the (ISP) caching nameserver that the user's computer is configured to use to resolve DNS queries.
Quote:
Originally Posted by elsdon
Has anyone done this before or tried? As for the actual caching part of a CDN, I'm thinking I could always deal with that via Squid later, but for now the main guts are what I'm concerned with.
|
We are using GeoIP DNS and VPS/dedicated servers around the world to cache static content. The main issues are: 1) VPSes are a lot less stable than dedicated servers; 2) cheap VPSes may occasionally deliver low throughput, so a "distant" cache server may perform better than a "near" one.
To fix type 1 issues and mitigate type 2 issues, we use several monitoring servers that provide "smart" failover, changing the edge network node servers when they detect failures or low performance.
Regarding the database problem: as we are software/SaaS developers and the CDN is "proprietary", built to run our own applications, we run monitored regional Web application servers using partitioned tables where applicable, and most of the time Web services plus stored procedures. The systems were designed from scratch to use the CDN architecture/infrastructure. There's no way to use any automatic database sync, and MySQL is out of the question (sorry, MySQL fans).
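A minimal sketch of that kind of failover decision, assuming the monitors report per-edge up/down status and a measured throughput (the data shape and the 20 Mbps threshold are hypothetical, not dotHostel's actual logic). A slow local VPS counts as unhealthy, so a fast distant node wins, which is issue 2 above.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class EdgeStatus:
    name: str
    region: str
    up: bool
    throughput_mbps: float  # as measured by the monitoring servers

def pick_edge(region: str, edges: List[EdgeStatus],
              min_mbps: float = 20.0) -> Optional[EdgeStatus]:
    """Prefer a healthy edge in the requested region, else the best healthy edge anywhere.

    "Healthy" means up and at or above min_mbps.
    """
    healthy = [e for e in edges if e.up and e.throughput_mbps >= min_mbps]
    local = [e for e in healthy if e.region == region]
    pool = local or healthy
    if not pool:
        return None  # nothing usable: keep the previous DNS answer or point at origin
    return max(pool, key=lambda e: e.throughput_mbps)
```

The monitors would run this per region and push the chosen node into the GeoIP DNS answers.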
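One way to read "regional servers using partitioned tables", sketched under assumptions: every account has a single home region whose database owns its rows, and the application routes all of that account's queries there. The hostnames and the lookup table are hypothetical.

```python
from typing import Dict

# Hypothetical per-region database hosts.
REGION_DB: Dict[str, str] = {
    "us": "db-us.example.net",
    "eu": "db-eu.example.net",
    "asia": "db-asia.example.net",
}

def db_for_account(account_id: int, home_region: Dict[int, str]) -> str:
    """Route every read and write for an account to its home region's database.

    Each row has exactly one owning region, so there is nothing to reconcile
    between regions; the cost is extra latency when a user is far from home.
    """
    return REGION_DB[home_region[account_id]]

if __name__ == "__main__":
    homes = {42: "asia", 7: "eu"}
    print(db_for_account(42, homes))  # db-asia.example.net
```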
IMO a CDN running generic applications without changes is a great challenge. Wish you good luck!
Last edited by dotHostel; 05-12-2009 at 06:16 PM.

05-12-2009, 08:25 PM
Junior Guru
Join Date: Mar 2008
Location: Savage, MN
Posts: 217
Quote:
Originally Posted by dotHostel
We are using GeoIP DNS and VPS/dedicated servers around the world to cache static content.
|
Just curious - which geoIP implementation are you using?

05-13-2009, 05:19 AM
Web Hosting Master
Join Date: Jun 2005
Posts: 2,532
Quote:
Originally Posted by natecarlson
Just curious - which geoIP implementation are you using?
|
Initially we used the patched version of BIND and MaxMind's free country IP data. It works fine. But after a while we started getting the IP data directly from ARIN. We have been using GeoIP DNS for over 2 years.

05-13-2009, 05:20 AM
Junior Guru Wannabe
Join Date: Sep 2007
Posts: 52
Hrm... directly from ARIN, weird. I never knew they had a city/country-level breakdown like MaxMind's.

05-13-2009, 05:23 AM
Web Hosting Master
Join Date: Jun 2005
Posts: 2,532
Quote:
Originally Posted by elsdon
Hrm... directly from ARIN, weird. I never knew they had a city/country-level breakdown like MaxMind's.
|
They don't. You can just get the blocks of IPs allocated to each country. We have our own routines to process these blocks and to search them. We couldn't base our entire business on secondary data.
Last edited by dotHostel; 05-13-2009 at 05:29 AM.

05-13-2009, 05:25 AM
Web Hosting Master
Join Date: Jun 2005
Posts: 2,532
ftp://ftp.arin.net/pub/stats/
The IP data is updated daily, sometimes more than once a day.
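A minimal sketch of such a routine, assuming the pipe-delimited "delegated" stats layout the RIRs publish (registry|cc|type|start|value|date|status, where value is the address count for IPv4 blocks). The function names are hypothetical, only IPv4 is handled, and ARIN's own records only cover ARIN's region, so worldwide coverage needs the other RIRs' files too.

```python
import bisect
import ipaddress
from typing import Iterable, List, Optional, Tuple

Range = Tuple[int, int, str]  # (first_ip, last_ip, country_code)

def parse_delegated(lines: Iterable[str]) -> List[Range]:
    """Turn delegated-stats lines into sorted (first_ip, last_ip, country) ranges."""
    ranges: List[Range] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("|")
        # Skip the version header, "*" summary lines and non-IPv4 records.
        if len(f) < 7 or f[2] != "ipv4" or f[1] == "*":
            continue
        if f[6] not in ("allocated", "assigned"):
            continue
        first = int(ipaddress.IPv4Address(f[3]))
        ranges.append((first, first + int(f[4]) - 1, f[1]))
    ranges.sort()
    return ranges

def country_of(ip: str, ranges: List[Range]) -> Optional[str]:
    """Binary-search for the last block starting at or before ip, then check it covers ip."""
    n = int(ipaddress.IPv4Address(ip))
    i = bisect.bisect_right(ranges, (n, float("inf"), "")) - 1
    if i >= 0 and ranges[i][0] <= n <= ranges[i][1]:
        return ranges[i][2]
    return None
```

Because the file is refreshed daily, the table would be rebuilt on each download and swapped in atomically.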

05-13-2009, 09:40 AM
Junior Guru Wannabe
Join Date: Apr 2009
Posts: 30
Depending on the kind of DB load you have, you may be able to just set up a replication slave, spread all of your read queries over the two servers, and send the write queries to the master.
You can also do MySQL clustering, but that is very memory-intensive and isn't easy.
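A minimal sketch of that read/write split at the application or proxy layer, assuming hypothetical server names and a naive first-keyword check for writes. Reads that must see a just-committed write should also go to the master, since the slave can lag behind.

```python
import itertools
from typing import List

class ReadWriteRouter:
    """Send writes to the master and spread reads round-robin over master + slaves."""

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP")

    def __init__(self, master: str, slaves: List[str]):
        self.master = master
        self._readers = itertools.cycle([master] + list(slaves))

    def route(self, sql: str) -> str:
        parts = sql.split(None, 1)  # first whitespace-separated word = statement verb
        verb = parts[0].upper() if parts else ""
        if verb in self.WRITE_VERBS:
            return self.master
        return next(self._readers)
```

Keyword sniffing misses things like SELECT ... FOR UPDATE or statements inside transactions, so in practice the application usually picks the connection explicitly.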
If you wanted to use DRBD but don't have a private network segment, then make one using OpenVPN. 
__________________
Argenta Hosting, LLC
Business Grade VPS, World Class Support Services, Zero Over-subscription, guaranteed!
On-site Consultations available in Central Arkansas
12+ years of enterprise Linux experience