  1. #1
    Join Date
    Sep 2007
    Posts
    52

    Setting up a CDN (sort of) with Multiple VPS

    Hi there,

    As a personal project I've recently begun thinking about setting up a CDN (sort of) with my various servers around the world. Basically, what it will do is: from one 'management' point or portal (you can assume one physical server), each user can update/create their webpage there..

    From that main node, it will have to replicate in REAL TIME:
    a) userdir (public_html etc.)
    b) mysql db's (circular mysql replication?)
    c) Any updates to their apache.conf, php.ini etc..

    Now I've been looking into several options.. DRBD seems to be the one a lot of people suggest for the actual data sync, but I'm wary of implementing something like that outside of a local private LAN. I've also looked into rsync/unison, which may work.. I'm just throwing this out there to see what people have tried and are using.

    DNS will be handled by a GeoIP/MaxMind BIND patch, and the closest server will be returned by my DNS servers. The closest 'mirror' of the user's site will then be served via that hostname.
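    For reference, the GeoIP-patched BIND setup usually boils down to per-region views. A rough sketch follows; the exact match-clients syntax varies between patch versions and the zone file names are made up, so treat this as illustrative only:

    ```
    // named.conf sketch: hand out a different A record per region.
    view "north_america" {
        match-clients { country US; country CA; };   // patch-specific syntax
        zone "example.com" { type master; file "db.example.com.us"; };
    };
    view "default" {
        match-clients { any; };                      // everyone else
        zone "example.com" { type master; file "db.example.com.eu"; };
    };
    ```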

    Anyone done this before or tried? The actual caching part of CDNs and etc. I'm thinking I could always just deal with via squid later, but for now the main guts is what I'm concerned with.

    Thanks.
    Elsdon

  2. #2
    Join Date
    Jun 2007
    Location
    Tokyo, Japan
    Posts
    336
    I'm interested in this as well.
    The hardest part is the data and SQL replication (lag) in my opinion.
    Yudai Yamagishi

  3. #3
    Join Date
    Mar 2008
    Location
    Savage, MN
    Posts
    220
    This is hard, and often requires a bunch of custom development, as there really isn't that much software that will do this all for you.

    Circular replication works, but only if your applications are highly tuned to ensure there are never conflicts.

    The way Akamai and such do the file replication is basically that if the file doesn't exist yet on one node, it sends the request off to the other node(s).. again, custom software development.

    I'd love to hear if you figure out a way to get this going!

    Just to note - I have yet to work with a CDN that provides a database solution (well, Akamai can write to a text file on NetStorage, but that doesn't quite count).. so if you can actually solve that problem in a way that scales and is seamless to end users, you've got a winner.

  4. #4
    Join Date
    Sep 2007
    Posts
    52
    DB replication isn't so bad one-way.. But the real issue only arises when you want to update a DB bi-directionally.. I wonder how places like Gmail do it.. or even Hotmail.. I presume they're not all accessing one 'master' db for writes, and then all the geographical locations for 'reads'.. seems kind of counterproductive to me.

    Custom development I'm not so much afraid of.. Being as that is my trade, it's within the scope of this project to write something that is both robust and dynamic. I'm just looking for more high-level ideas for now..

    Yeah, circular replication.. Integrity would be an issue because which db would have priority? Let's say two 'clients' are trying to write to the same row at the same time.. Well, no.. that really wouldn't work either.. The easiest way would have to be NTP to sync all the system clocks to UTC, then timestamp every db transaction.. push them all in chronologically.. There would be some lag, but I think that is unavoidable.. I think anything in the milliseconds is acceptable, but once we're talking like 10-20s the time to update might become a factor. Interesting.
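    The timestamp idea above can be sketched in a few lines. This is a toy illustration only (the node names and statements are made up): each node timestamps its transactions with NTP-synced clocks, and every replica applies the merged stream in (timestamp, node_id) order, so even when two writes carry the same timestamp every replica breaks the tie the same way and converges on one sequence.

    ```python
    def merge_transactions(*node_logs):
        """Merge per-node transaction logs into one deterministic order.

        Each log entry is (timestamp, node_id, statement).  Sorting by
        timestamp first and node_id second breaks ties identically on
        every replica.
        """
        merged = [entry for log in node_logs for entry in log]
        merged.sort(key=lambda e: (e[0], e[1]))
        return [e[2] for e in merged]

    # Two nodes write concurrently; the second pair collides on timestamp.
    tokyo = [(1000.001, "tokyo", "UPDATE a"), (1000.005, "tokyo", "UPDATE c")]
    dallas = [(1000.003, "dallas", "UPDATE b"), (1000.005, "dallas", "UPDATE d")]

    print(merge_transactions(tokyo, dallas))
    # → ['UPDATE a', 'UPDATE b', 'UPDATE d', 'UPDATE c']
    ```

    Note this only gives a consistent ordering; it doesn't resolve the semantic conflict of two clients overwriting the same row, it just makes every replica pick the same winner.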

  5. #5
    Join Date
    Dec 2007
    Posts
    64
    Quote Originally Posted by elsdon View Post
    DB replication isn't so bad one-way.. But the real issue only arrives at when you want to update a DB bi-directionally.. I wonder how places like gmail do it.. or even hotmail.. I presume they're not all accessing one 'master' db for writes, and then all the geographical locations for 'reads'.. seems kind of counterproductive to me.

    Custom development I'm not so much afraid of.. Being as that is my trade, it's within the scope of this project to write something that is both robust and dynamic. I'm just looking for more high-level ideas for now..

    Yeah, circulation replication.. Integrity would be an issue because which db would have priority? Let's say if two 'clients' are trying to write to the same row at the same time.. Well, no.. that really wouldn't work either.. The easiest way would have to be ntp to sync all the system clocks to UTC, then timestamp every db transaction.. push them all in chronologically.. There would be some lag, but I think that is unavoidable.. I think anything in the milliseconds is acceptable, but once we're talking like 10-20s the time to update might become a factor. Interesting.
    Not really, mysql offers an auto_increment_offset that you can use on all of your master nodes. As long as your tables use an integer primary key w/ auto_increment you'll have no conflicts if two users try to insert the same data into two different masters at the same time. It's basically the same thing you were suggesting with the timestamps, but at the database level instead of the application level.

    I use that with a master-master cluster from Dallas <--> Chicago and I've yet to have any problems.
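    For reference, those settings live in my.cnf on each master. A sketch for a two-master setup like the one described (the values and file path are just the common convention):

    ```
    # /etc/my.cnf on master 1 (e.g. Dallas)
    [mysqld]
    auto_increment_increment = 2   # step size = number of masters
    auto_increment_offset    = 1   # this server hands out 1, 3, 5, ...

    # /etc/my.cnf on master 2 (e.g. Chicago)
    [mysqld]
    auto_increment_increment = 2
    auto_increment_offset    = 2   # this server hands out 2, 4, 6, ...
    ```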

  6. #6
    Join Date
    Sep 2007
    Posts
    52
    Hrm.. But do you have to design your application/tables around auto_increment_offset? I'd like something transparent to the application layer.. something where the underlying application code doesn't need to be changed.

    That concept is the tough part.. If I was just writing one application to do this, it would be easy to self-manage all of it. But something where any random 'hosting client' let's say, could come in, upload their website, setup their db and have it run and replicate without them having to do anything extra on their side.

  7. #7
    Join Date
    Dec 2007
    Posts
    64
    Quote Originally Posted by elsdon View Post
    Hrm.. But do you have to design your application/tables to auto_increment_offset? I'd like something transparent to the application layer.. something where underlying application code doesn't need to be changed.

    That concept is the tough part.. If I was just writing one application to do this, it would be easy to self-manage all of it. But something where any random 'hosting client' let's say, could come in, upload their website, setup their db and have it run and replicate without them having to do anything extra on their side.
    It's on the database level, no reconfiguring for the application. There are actually two variables you need to change. auto_increment_offset and auto_increment_increment.

    auto_increment_increment changes how much the local mysql will increment by to determine the next value. With a large enough increment value and correct offsets for each server it's possible to guarantee non-conflicting writes to the same table assuming no other unique constraints apply.
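    A toy model of how the two variables interact, to make the non-conflict guarantee concrete (this only approximates MySQL's actual behavior; it is an illustration, not the server's real algorithm): each master only ever generates ids congruent to its offset modulo the increment, so the id streams from different masters are disjoint.

    ```python
    def next_id(last_id, offset, increment):
        """Smallest value > last_id with value % increment == offset % increment.

        Roughly mirrors how MySQL picks the next auto_increment value
        under auto_increment_offset / auto_increment_increment.
        """
        value = last_id + 1
        while value % increment != offset % increment:
            value += 1
        return value

    # Two masters, increment = 2: server A (offset 1) yields odd ids,
    # server B (offset 2) yields even ids -- never a collision.
    ids_a, ids_b, last_a, last_b = [], [], 0, 0
    for _ in range(3):
        last_a = next_id(last_a, 1, 2)
        last_b = next_id(last_b, 2, 2)
        ids_a.append(last_a)
        ids_b.append(last_b)

    print(ids_a, ids_b)
    # → [1, 3, 5] [2, 4, 6]
    ```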

  8. #8
    Join Date
    Sep 2007
    Posts
    52
    Quote Originally Posted by mchristen85 View Post
    It's on the database level, no reconfiguring for the application. There are actually two variables you need to change. auto_increment_offset and auto_increment_increment.

    auto_increment_increment changes how much the local mysql will increment by to determine the next value. With a large enough increment value and correct offsets for each server it's possible to guarantee non-conflicting writes to the same table assuming no other unique constraints apply.
    Hrm.. but it would only work if each table has a primary key.. which I assume any non-reference table would.. I will look into it. Thanks!

    I guess file-level sync is pretty much all I have left to think about.. and rsync does have some 'real-time'-like modes of operation..
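    For the file side, a common near-real-time trick is to trigger rsync from inotify events. A rough sketch, assuming the inotify-tools package is installed; the hostname and paths are made up:

    ```
    # Push /home to a mirror whenever inotify reports a change.
    # Coarse-grained: any event triggers a full (but incremental) rsync pass.
    while inotifywait -r -e modify,create,delete,move /home; do
        rsync -az --delete /home/ mirror1.example.com:/home/
    done
    ```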

  9. #9
    Join Date
    Jun 2005
    Posts
    2,752
    Quote Originally Posted by elsdon View Post
    Closest 'mirror' of the user's site will be served via that hostname.
    Not exactly closest to the user's site, but closest to the (ISP) caching nameserver that the user's computer is configured to use to resolve DNS queries.


    Quote Originally Posted by elsdon View Post
    Anyone done this before or tried? The actual caching part of CDNs and etc. I'm thinking I could always just deal with via squid later, but for now the main guts is what I'm concerned with.
    We are using GeoIP DNS and VPS/dedicated servers around the world to cache static content. The main issues are: 1) VPSes are a lot less stable than dedicated servers; 2) cheap VPSes may deliver low throughput at times -- a "distant" cache server can end up performing better than a "near" one.

    To handle type-1 issues and mitigate type-2 issues we use several monitoring servers which provide "smart" failover, swapping out edge network nodes when they detect failures or low performance.
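    That failover logic might look something like the following toy sketch. Everything here is hypothetical (the node names, latency budget, and probe function are stand-ins for a real HTTP/TCP health check): keep only the edges whose probe succeeds within the latency budget, and feed the survivors back into DNS rotation.

    ```python
    def healthy_edges(edges, probe, max_latency=0.5):
        """Keep only edges whose probe succeeds within max_latency seconds."""
        keep = []
        for edge in edges:
            ok, latency = probe(edge)
            if ok and latency <= max_latency:
                keep.append(edge)
        return keep

    # Example with canned probe results: one node down, one too slow.
    status = {"tokyo": (True, 0.05), "dallas": (False, 0.0), "lisbon": (True, 2.0)}
    print(healthy_edges(status, lambda e: status[e]))
    # → ['tokyo']
    ```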

    Regarding the database problem: as we are software/SaaS developers and the CDN is "proprietary", built to run our applications, we run monitored regional Web application servers using partitioned tables where applicable, and most of the time Web services + stored procedures. The systems are designed from scratch to use the CDN architecture/infrastructure. There is no way to use any automatic database sync, and MySQL is out of the question (sorry, MySQL fans).

    IMO a CDN running generic applications without changes is a great challenge. Wish you good luck!
    Last edited by dotHostel; 05-12-2009 at 06:16 PM.

  10. #10
    Join Date
    Mar 2008
    Location
    Savage, MN
    Posts
    220
    Quote Originally Posted by dotHostel View Post
    We are using geoIP DNS and VPS/dedicated servers around the world to cache static content.
    Just curious - which geoIP implementation are you using?

  11. #11
    Join Date
    Jun 2005
    Posts
    2,752
    Quote Originally Posted by natecarlson View Post
    Just curious - which geoIP implementation are you using?
    Initially we used the patched version of BIND and MaxMind's free country IP data, which works fine. But after a while we started to get the IP data directly from ARIN. We have been using GeoIP DNS for over 2 years.

  12. #12
    Join Date
    Sep 2007
    Posts
    52
    Hrm... Directly from ARIN, weird. I never knew they had a city/country-level breakdown like MaxMind..

  13. #13
    Join Date
    Jun 2005
    Posts
    2,752
    Quote Originally Posted by elsdon View Post
    Hrm... Directly from ARIN, weird. I never knew they had a city/country-level breakdown like MaxMind..
    They don't. You can just get the blocks of IPs allocated to each country. We have our own routines to process and search these blocks. We couldn't base our entire business on secondary data.
    Last edited by dotHostel; 05-13-2009 at 05:29 AM.

  14. #14
    Join Date
    Jun 2005
    Posts
    2,752
    ftp://ftp.arin.net/pub/stats/

    The IP data is updated daily, sometimes more than once a day.
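    Those files use the RIRs' pipe-separated delegated-stats format, which is straightforward to process. A sketch of a country-level parser (the sample records below are illustrative, not real allocations; for IPv4 records the fifth field is the number of addresses in the block):

    ```python
    def parse_ipv4_records(lines):
        """Yield (country_code, first_ip, address_count) for IPv4 records."""
        for line in lines:
            if line.startswith("#"):          # comment lines
                continue
            fields = line.strip().split("|")
            # registry|cc|type|start|value|date|status[|opaque-id]
            if len(fields) >= 7 and fields[2] == "ipv4":
                _registry, cc, _, start, value, _, status = fields[:7]
                if status in ("allocated", "assigned"):
                    yield cc, start, int(value)

    sample = [
        "arin|US|ipv4|23.0.0.0|1048576|20101005|allocated",
        "arin|CA|ipv4|24.48.0.0|65536|19950301|assigned",
        "arin|US|asn|701|5|19900131|assigned",   # ASN record, skipped
    ]
    print(list(parse_ipv4_records(sample)))
    # → [('US', '23.0.0.0', 1048576), ('CA', '24.48.0.0', 65536)]
    ```

    To answer GeoIP lookups you'd still need to expand each (first_ip, count) pair into a searchable range table, e.g. sorted integer ranges with binary search.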

  15. #15
    Join Date
    Apr 2009
    Posts
    30
    Depending on the kind of DB load you have, you may be able to just set up a replication slave and spread your read queries over the two, sending the write queries to the master.
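    That read/write split can live in a thin routing layer in front of the connections. A toy sketch, where the "connections" are just stand-in strings rather than real DB handles:

    ```python
    def route(query, master, replicas):
        """Pick a connection: replicas for SELECTs, master for everything else."""
        if query.lstrip().upper().startswith("SELECT"):
            # naive round-robin over the read replicas
            conn = replicas[route.counter % len(replicas)]
            route.counter += 1
            return conn
        return master

    route.counter = 0

    print(route("SELECT * FROM users", "master", ["replica1", "replica2"]))
    # → replica1
    print(route("UPDATE users SET x=1", "master", ["replica1", "replica2"]))
    # → master
    ```

    A real setup also has to cope with replication lag (a read right after a write may not see it on the replica), which is why read-mostly workloads suit this best.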

    You can also do mysql clustering, but that is very memory intensive and isn't easy.

    If you wanted to use DRBD but don't have a private network segment, then make one using OpenVPN.
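    A minimal point-to-point tunnel for that replication link could use OpenVPN's static-key mode. A sketch only; the hostname and tunnel addresses are made up, and the key comes from running `openvpn --genkey --secret static.key` on one node:

    ```
    # node A config (listens)
    dev tun
    ifconfig 10.8.0.1 10.8.0.2
    secret static.key

    # node B config (connects)
    dev tun
    remote node-a.example.com
    ifconfig 10.8.0.2 10.8.0.1
    secret static.key
    ```

    DRBD would then replicate over the 10.8.0.x addresses. Bear in mind DRBD is latency-sensitive, so a WAN tunnel generally means asynchronous (protocol A) replication.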
    Argenta Hosting, LLC
    Business Grade VPS -- World Class Support Services -- Zero Over-subscription, guaranteed!
    On-site Consultations available in Central Arkansas
    12+ years of enterprise Linux experience

  16. #16
    Join Date
    Sep 2005
    Location
    Canada
    Posts
    646
    If your database usage is mostly reads rather than writes, a master-slave relationship works best. That's what Google/Yahoo do for search, for example: millions of reads, relatively few writes that need to be replicated between databases.

    If you only have two database servers, a multi-master setup can work fine (using the auto_increment offsets in MySQL). Be sure you know how to resync them when a problem occurs, which will happen eventually.

    The other files can be synced with any number of tools, rsync works well depending on the delay that's acceptable to you.
    VPSVille.com
    Toronto, London, Dallas, Los Angeles
    Quality VPS hosting on Premium bandwidth

  17. #17
    Join Date
    Jun 2005
    Posts
    2,752
    All right. But the OP's proposal is to offer a CDN service that runs any application without changes.

    Quote Originally Posted by elsdon View Post
    That concept is the tough part.. If I was just writing one application to do this, it would be easy to self-manage all of it. But something where any random 'hosting client' let's say, could come in, upload their website, setup their db and have it run and replicate without them having to do anything extra on their side.

  18. #18
    Join Date
    Sep 2007
    Posts
    52
    Yeah. There are a lot of half-way solutions that I 'could' implement, but since we're in the conceptual part of the design I'm striving for an 'optimal' solution off the bat, and not a semi-hack that sort of almost works in the majority of cases.

    Google/Yahoo search I'm aware is read-based.. but what I'm wondering about is Gmail/Yahoo Mail.. Are those distributed applications via CDN? I wonder if they're just huge centralized applications, with 'file attachments' located on CDNs.

    Please keep em coming guys. I wonder if we'll be able to devise something truly devious here.

  19. #19
    Join Date
    Apr 2009
    Posts
    30
    Here's a crazy outside-the-box idea.. Create a parity scheme and distribute data stripes w/ parity over several master servers.. use MySQL to house the stripe and parity data as blobs. The data itself is in a file-based DB format, like Db2.

    I suppose you'd have to have some sort of controller to route DB requests.. hm.

    Probably wouldn't work, but it is creative. :-)
    Last edited by ttedford; 05-13-2009 at 11:30 AM. Reason: Correction
