Setting up a CDN (sort of) with Multiple VPS

  #1  
Old 05-12-2009, 02:11 AM
elsdon elsdon is offline
Junior Guru Wannabe
 
Join Date: Sep 2007
Posts: 52

Setting up a CDN (sort of) with Multiple VPS


Hi there,

As a personal project I've recently begun thinking about setting up a CDN (sort of) with my various servers around the world. Basically, what it will do is this: from one 'management' point or portal (you can assume one physical server), each user can update/create their webpage there..

From that main node, it will have to replicate in REAL TIME:
a) userdir (public_html etc.)
b) mysql db's (circular mysql replication?)
c) Any updates to their apache.conf, php.ini etc..

Now I've been looking into several options.. drbd seems to be one that a lot of people suggest for the actual data sync, but I'm wary of implementing something like that outside of a local private LAN. I've also looked into rsync/unison which may work.. I'm just throwing this out there to see what people have tried and are using.

DNS will be handled by a GeoIP/Maxmind bind patch, and the closest server will be returned by my DNS servers. The closest 'mirror' of the user's site will then be served via that hostname.

Anyone done this before or tried? As for the actual caching part of CDNs, I'm thinking I could always deal with that via squid later; for now the main guts are what I'm concerned with.

Thanks.
Elsdon



  #2  
Old 05-12-2009, 09:14 AM
YYamagishi YYamagishi is offline
Web Hosting Guru
 
Join Date: Jun 2007
Location: Tokyo, Japan
Posts: 333
I'm interested in this as well.
The hardest part is the data and SQL replication (lag) in my opinion.

__________________
Yudai Yamagishi


  #3  
Old 05-12-2009, 09:47 AM
natecarlson natecarlson is offline
Junior Guru
 
Join Date: Mar 2008
Location: Savage, MN
Posts: 217
This is hard, and often requires a bunch of custom development, as there really isn't that much software that will do this all for you.

Circular replication works, but only if your applications are highly tuned to ensure there are never conflicts.

The way Akamai and such do the file replication is basically that if the file doesn't exist yet on one node, it sends the request off to the other node(s).. again, custom software development.

I'd love to hear if you figure out a way to get this going!

Just to note - I have yet to work with a CDN that provides a database solution (well, Akamai can write to a text file on NetStorage, but that doesn't quite count).. so if you can actually solve that problem in a way that scales and is seamless to end users, you've got a winner.

__________________
-nc

  #4  
Old 05-12-2009, 10:06 AM
elsdon elsdon is offline
Junior Guru Wannabe
 
Join Date: Sep 2007
Posts: 52
DB replication isn't so bad one-way.. But the real issue only arises when you want to update a DB bi-directionally.. I wonder how places like Gmail do it.. or even Hotmail.. I presume they're not all accessing one 'master' db for writes, and then all the geographical locations for 'reads'.. seems kind of counterproductive to me.

Custom development I'm not so much afraid of.. Being as that is my trade, it's within the scope of this project to write something that is both robust and dynamic. I'm just looking for more high-level ideas for now..

Yeah, circular replication.. Integrity would be an issue, because which db would have priority? Let's say two 'clients' are trying to write to the same row at the same time.. Well, no.. that really wouldn't work either.. The easiest way would be to use NTP to sync all the system clocks to UTC, then timestamp every db transaction and push them all in chronologically.. There would be some lag, but I think that is unavoidable.. I think anything in the milliseconds is acceptable, but once we're talking 10-20s the time to update might become a factor. Interesting.
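The timestamp-ordering idea above can be sketched in a few lines. This is a toy last-write-wins merge, not a real replication engine, and every name in it is hypothetical:

```python
# Sketch of the chronological-replay idea: each node timestamps its writes
# (clocks synced via NTP), and a merger replays all transactions in timestamp
# order so the latest write to a row wins everywhere. Toy model only.

def merge_transaction_logs(*logs):
    """Merge per-node logs of (utc_timestamp, row_id, value) into final row state."""
    merged = sorted((txn for log in logs for txn in log), key=lambda t: t[0])
    state = {}
    for ts, row_id, value in merged:
        state[row_id] = value  # a later timestamp overwrites an earlier one
    return state

node_a = [(1.0, "row1", "a-first"), (3.0, "row2", "a-late")]
node_b = [(2.0, "row1", "b-second")]

print(merge_transaction_logs(node_a, node_b))
# row1 ends up as "b-second" (ts 2.0 beats 1.0); row2 stays "a-late"
```

The weak spot, as the post notes, is clock skew: last-write-wins is only as trustworthy as the NTP sync between nodes.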

  #5  
Old 05-12-2009, 01:14 PM
mchristen85 mchristen85 is offline
Junior Guru Wannabe
 
Join Date: Dec 2007
Posts: 63
Quote:
Originally Posted by elsdon View Post
DB replication isn't so bad one-way.. But the real issue only arises when you want to update a DB bi-directionally.. I wonder how places like Gmail do it.. or even Hotmail.. I presume they're not all accessing one 'master' db for writes, and then all the geographical locations for 'reads'.. seems kind of counterproductive to me.

Custom development I'm not so much afraid of.. Being as that is my trade, it's within the scope of this project to write something that is both robust and dynamic. I'm just looking for more high-level ideas for now..

Yeah, circular replication.. Integrity would be an issue, because which db would have priority? Let's say two 'clients' are trying to write to the same row at the same time.. Well, no.. that really wouldn't work either.. The easiest way would be to use NTP to sync all the system clocks to UTC, then timestamp every db transaction and push them all in chronologically.. There would be some lag, but I think that is unavoidable.. I think anything in the milliseconds is acceptable, but once we're talking 10-20s the time to update might become a factor. Interesting.
Not really, mysql offers an auto_increment_offset setting that you can use on all of your master nodes. As long as your tables use an integer primary key w/ auto_increment, you'll have no conflicts if two users try to insert the same data into two different masters at the same time. It's basically the same thing you were suggesting with the timestamps, but at the database level instead of the application level.

I use that with a master-master cluster from Dallas <--> Chicago and I've yet to have any problems.
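For reference, the relevant my.cnf settings for a two-master setup like the one described would look roughly as follows. auto_increment_increment and auto_increment_offset are the actual MySQL variables; the server names and ids are placeholders:

```ini
# Master 1 (e.g. Dallas) -- /etc/my.cnf
[mysqld]
server-id                = 1
log-bin                  = mysql-bin
auto_increment_increment = 2   # total number of masters
auto_increment_offset    = 1   # this server hands out ids 1, 3, 5, ...

# Master 2 (e.g. Chicago) -- /etc/my.cnf
[mysqld]
server-id                = 2
log-bin                  = mysql-bin
auto_increment_increment = 2
auto_increment_offset    = 2   # this server hands out ids 2, 4, 6, ...
```

Each master is then configured as a replication slave of the other in the usual way; the offset/increment pair only solves auto-increment collisions, not conflicting updates to the same existing row.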

  #6  
Old 05-12-2009, 01:53 PM
elsdon elsdon is offline
Junior Guru Wannabe
 
Join Date: Sep 2007
Posts: 52
Hrm.. But do you have to design your application/tables around auto_increment_offset? I'd like something transparent to the application layer.. something where the underlying application code doesn't need to be changed.

That concept is the tough part.. If I was just writing one application to do this, it would be easy to self-manage all of it. But I need something where any random 'hosting client', let's say, could come in, upload their website, set up their db, and have it run and replicate without having to do anything extra on their side.

  #7  
Old 05-12-2009, 02:47 PM
mchristen85 mchristen85 is offline
Junior Guru Wannabe
 
Join Date: Dec 2007
Posts: 63
Quote:
Originally Posted by elsdon View Post
Hrm.. But do you have to design your application/tables around auto_increment_offset? I'd like something transparent to the application layer.. something where the underlying application code doesn't need to be changed.

That concept is the tough part.. If I was just writing one application to do this, it would be easy to self-manage all of it. But I need something where any random 'hosting client', let's say, could come in, upload their website, set up their db, and have it run and replicate without having to do anything extra on their side.
It's on the database level, no reconfiguring for the application. There are actually two variables you need to change. auto_increment_offset and auto_increment_increment.

auto_increment_increment changes how much the local mysql will increment by to determine the next value. With a large enough increment value and correct offsets for each server it's possible to guarantee non-conflicting writes to the same table assuming no other unique constraints apply.
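A quick way to convince yourself the offsets can't collide is to model the id sequences directly. This is just arithmetic, not a MySQL client:

```python
# Models how auto_increment_increment / auto_increment_offset keep two masters
# from ever handing out the same auto-increment id.

def id_sequence(offset, increment, count):
    """First `count` auto-increment values a server with these settings assigns."""
    return [offset + i * increment for i in range(count)]

dallas  = id_sequence(offset=1, increment=2, count=5)  # [1, 3, 5, 7, 9]
chicago = id_sequence(offset=2, increment=2, count=5)  # [2, 4, 6, 8, 10]

# The two sequences are disjoint, so concurrent inserts on both masters
# can never produce the same primary key.
assert not set(dallas) & set(chicago)
```

Setting the increment higher than the current number of masters (say 5 with only 2 servers) leaves gaps in the id space but lets you add masters later without reconfiguring.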

  #8  
Old 05-12-2009, 03:16 PM
elsdon elsdon is offline
Junior Guru Wannabe
 
Join Date: Sep 2007
Posts: 52
Quote:
Originally Posted by mchristen85 View Post
It's on the database level, no reconfiguring for the application. There are actually two variables you need to change. auto_increment_offset and auto_increment_increment.

auto_increment_increment changes how much the local mysql will increment by to determine the next value. With a large enough increment value and correct offsets for each server it's possible to guarantee non-conflicting writes to the same table assuming no other unique constraints apply.
Hrm.. but it would only work if each table has a primary key.. which I assume any non-reference table would.. I will look into it. Thanks!

I guess file level sync is pretty much all I have left to think about.. and rsync (run on a tight schedule, or triggered by inotify) can get close to 'real-time' operation..

  #9  
Old 05-12-2009, 06:06 PM
dotHostel dotHostel is offline
Web Hosting Master
 
Join Date: Jun 2005
Posts: 2,569
Quote:
Originally Posted by elsdon View Post
Closest 'mirror' of the user's site will be served via that hostname.
Not exactly closest to the user, but closest to the (ISP) caching nameserver that the user's computer is configured to use to resolve DNS queries.


Quote:
Originally Posted by elsdon View Post
Anyone done this before or tried? The actual caching part of CDNs and etc. I'm thinking I could always just deal with via squid later, but for now the main guts is what I'm concerned with.
We are using geoIP DNS and VPS/dedicated servers around the world to cache static content. The main issues are: 1) VPSes are a lot less stable than dedicated servers; 2) cheap VPSes may at times deliver low throughput -- a "distant" cache server may perform better than a "near" one.

To fix type-1 issues and to mitigate type-2 issues, we use several monitoring servers which provide "smart" failover, swapping out edge network nodes when they detect failures or low performance.

Regarding the database problem: as we are software/SaaS developers and the CDN is "proprietary", built to run our applications, we run monitored regional Web application servers using partitioned tables where applicable, and most of the time Web services + stored procedures. The systems are designed from the ground up to use the CDN architecture/infrastructure. There is no way to use any automatic database sync, and using MySQL is out of the question (sorry MySQL fans).

IMO a CDN running generic applications without changes is a great challenge. Wish you good luck!


Last edited by dotHostel; 05-12-2009 at 06:16 PM.
  #10  
Old 05-12-2009, 08:25 PM
natecarlson natecarlson is offline
Junior Guru
 
Join Date: Mar 2008
Location: Savage, MN
Posts: 217
Quote:
Originally Posted by dotHostel View Post
We are using geoIP DNS and VPS/dedicated servers around the world to cache static content.
Just curious - which geoIP implementation are you using?

__________________
-nc

  #11  
Old 05-13-2009, 05:19 AM
dotHostel dotHostel is offline
Web Hosting Master
 
Join Date: Jun 2005
Posts: 2,569
Quote:
Originally Posted by natecarlson View Post
Just curious - which geoIP implementation are you using?
Initially we used the patched version of Bind with Maxmind's free country IP data. It works fine, but after a while we started to get the IP data directly from ARIN. We have been using geoIP DNS for over 2 years.

  #12  
Old 05-13-2009, 05:20 AM
elsdon elsdon is offline
Junior Guru Wannabe
 
Join Date: Sep 2007
Posts: 52
Hrm... Directly from ARIN, weird. I never knew they had a city/country level breakdown like MaxMind's..

  #13  
Old 05-13-2009, 05:23 AM
dotHostel dotHostel is offline
Web Hosting Master
 
Join Date: Jun 2005
Posts: 2,569
Quote:
Originally Posted by elsdon View Post
Hrm... Directly from ARIN, weird. I never knew they had a city/country level breakdown like MaxMind's..
They don't. You can just get the blocks of IPs allocated to each country. We have our own routines to process these blocks and to search them. We couldn't base our entire business on secondary data.


Last edited by dotHostel; 05-13-2009 at 05:29 AM.
  #14  
Old 05-13-2009, 05:25 AM
dotHostel dotHostel is offline
Web Hosting Master
 
Join Date: Jun 2005
Posts: 2,569
ftp://ftp.arin.net/pub/stats/

The IP data is updated daily, sometimes more than once a day.
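For anyone wanting to try this: the delegated-stats files at that URL are pipe-delimited, with data lines shaped roughly like registry|cc|type|start|value|date|status (for ipv4 records, value is the number of addresses in the block). A minimal parser sketch, with a made-up sample line:

```python
# Sketch of parsing one data line from an RIR delegated-stats file such as
# those under ftp://ftp.arin.net/pub/stats/. Header and summary lines would
# need to be skipped in a real run; the sample line below is illustrative.

def parse_delegated_line(line):
    """Split a delegated-stats data line into a dict of its fields."""
    registry, cc, kind, start, value, date, status = line.strip().split("|")[:7]
    return {"registry": registry, "country": cc, "type": kind,
            "start": start, "count": int(value), "date": date, "status": status}

sample = "arin|US|ipv4|192.0.2.0|256|20090101|allocated"
print(parse_delegated_line(sample)["country"])  # US
```

Mapping an arbitrary IP to its block then becomes a sorted-range lookup over the parsed records, which is presumably what dotHostel's "own routines" do.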

  #15  
Old 05-13-2009, 09:40 AM
ttedford ttedford is offline
Junior Guru Wannabe
 
Join Date: Apr 2009
Posts: 30
Depending on the kind of DB load you have, you may be able to just set up a replication slave, spread your read queries across the two, and send the write queries to the master.
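The read/write split can be as simple as inspecting the first keyword of each query. A toy router sketch (connection handling omitted; "master" and "slave" are just labels standing in for real connections):

```python
# Sketch of master/slave query routing: SELECTs may use a local read replica,
# everything else must hit the master so replication stays one-way.

def route_query(sql):
    """Pick a backend for a query: reads can use the slave, writes go to the master."""
    verb = sql.lstrip().split(None, 1)[0].upper()
    return "slave" if verb == "SELECT" else "master"

print(route_query("SELECT * FROM users"))          # slave
print(route_query("UPDATE users SET name = 'x'"))  # master
```

A real router also has to pin reads-after-writes to the master (or tolerate replication lag), which is the catch with this approach.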

You can also do mysql clustering, but that is very memory intensive and isn't easy.

If you wanted to use DRBD but don't have a private network segment, then make one using OpenVPN.
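If you go the OpenVPN route, a minimal point-to-point static-key tunnel is enough for a two-node DRBD link. A sketch of one end's config (hostname and key path are placeholders; the peer swaps the ifconfig addresses):

```conf
# /etc/openvpn/drbd-link.conf -- node A (node B uses "ifconfig 10.8.0.2 10.8.0.1")
dev tun
remote mirror.example.com        # placeholder: the other DRBD node
ifconfig 10.8.0.1 10.8.0.2       # local / remote tunnel addresses
secret /etc/openvpn/static.key   # pre-shared key, generated with openvpn --genkey
```

DRBD then replicates over the 10.8.0.x addresses; just note the tunnel adds latency, which matters for DRBD's synchronous protocols.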

__________________
Argenta Hosting, LLC
Business Grade VPS -- World Class Support Services -- Zero Over-subscription, guaranteed!
On-site Consultations available in Central Arkansas
12+ years of enterprise Linux experience
