Virtual Servers in FreeBSD 4.5

gabeosx
01-27-2002, 04:59 PM
Anyone know what software can be used to create virtual servers (a la FreeVSD) in FreeBSD? Everyone says that it's integrated in the OS, but I was wondering how to use it.

ScottD
01-27-2002, 10:12 PM
Look at jail(8), I believe this is what you are looking for. Works great, but be prepared for some problems. I have not yet been able to get MySQL to work within a jail on FreeBSD 4.4, but I think this is changed in 4.5. Good luck.

ScottD
01-28-2002, 01:28 PM
Update: The MySQL trouble I was having was my own bad. I am building a system that automatically deploys virtual jail environments and I failed to use --preserve when tarring the distribution. This left /tmp non-world-writable, which caused MySQL to choke when trying to create /tmp/mysql.sock as user mysql. Whew!

FreeBSD jails seem to be very good so far, but they do lack a controllable CPU/Memory meter. For disk you can use vnodes via vnconfig (4.x-RELEASE) or mdconfig (5.x-CURRENT). I believe 5.0 will introduce some better restrictions and access control via TrustedBSD. We shall see!

Scott

davidb
01-28-2002, 02:38 PM
FreeVSD does not work in FreeBSD. If you want to use FreeBSD, I would recommend waiting for HostGUI or getting Plesk.

ScottD
01-28-2002, 03:26 PM
Neither Plesk nor HostGUI will create virtual private servers similar to those of FreeVSD. The jail(8) system command will do so, as was asked in the original question regarding "Everyone says that it's integrated in the OS."

T_E_O
01-29-2002, 09:27 AM
Originally posted by DizixCom:
Update: The MySQL trouble I was having was my own bad. I am building a system that automatically deploys virtual jail environments and I failed to use --preserve when tarring the distribution. This left /tmp non-world-writable, which caused MySQL to choke when trying to create /tmp/mysql.sock as user mysql. Whew!
FreeBSD jails seem to be very good so far, but they do lack a controllable CPU/Memory meter. For disk you can use vnodes via vnconfig (4.x-RELEASE) or mdconfig (5.x-CURRENT). I believe 5.0 will introduce some better restrictions and access control via TrustedBSD. We shall see!

Scott

Hi Scott,

I'm working on something similar. I've encountered the same mysql problem and also found the same solution :) I'll have a look at vnodes / vnconfig, thanks for the tip :) If I think of something that could be useful to you, I'll PM you :)

jks
01-29-2002, 10:15 AM
Originally posted by T_E_O:
Hi Scott, I'm working on something similar. I've encountered the same mysql problem and also found the same solution :) I'll have a look at vnodes / vnconfig, thanks for the tip :) If I think of something that could be useful to you, I'll PM you :)

Another solution is to use a proxy application that forwards the data normally sent through the mysql.sock socket over TCP/IP to MySQL on the non-jailed server (or a server in another jail).

T_E_O
01-29-2002, 10:39 AM
Originally posted by jks:
Another solution is to use a proxy application that forwards the data normally sent through the mysql.sock socket over TCP/IP to MySQL on the non-jailed server (or a server in another jail).

Could you name me a util to do this? I'm not sure if I want one big MySQL server or many small ones. One big server is probably easier for backup purposes, but I already see how customers will fight for a database named 'forum' or 'polls' :D

jks
01-29-2002, 10:45 AM
Originally posted by T_E_O:
Could you name me a util to do this? I'm not sure if I want one big MySQL server or many small ones. One big server is probably easier for backup purposes, but I already see how customers will fight for a database named 'forum' or 'polls' :D

Hmm, I know a Linux util named "sock" that does this (find it on Freshmeat). I think I just did a POE Perl script for it, when I needed it.
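For readers who want to try the socket-to-TCP idea, one possible stand-in for the "sock" utility is socat. This is an assumption on the editor's part, not the tool jks used. The sketch below only prints the command so it can be reviewed first; the function name is made up, the socket path is the one from the thread, and the host and port are placeholders.

```shell
#!/bin/sh
# sock_proxy_cmd SOCKET HOST PORT (hypothetical helper): print a socat
# command that accepts connections on a Unix socket and forwards each
# one to HOST:PORT over TCP. Nothing is executed here.
sock_proxy_cmd() {
    echo "socat UNIX-LISTEN:$1,fork TCP:$2:$3"
}
```

Calling `sock_proxy_cmd /tmp/mysql.sock db.example.net 3306` prints a single socat command line. Clients that can be pointed at a TCP host directly (for example `mysql -h HOST`) would not need a proxy at all.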
A single big MySQL server is to be preferred over many smaller ones. This is mostly due to increased performance, as caches, etc. can be shared by the users.

ScottD
01-29-2002, 11:14 AM
TEO,

I PM'd you but it seems to have magically disappeared. I would be interested in sharing ideas with you on the obstacles you encounter and such. I am prototyping all of my automation tasks right now using shell scripts, and so far I've been able to:

1. build the distribution set (make world/make distribution)
2. create a vnode file system image with the distribution set
3. reuse the vnode image to create many jails
4. back up the jail (simply copy the image!)
5. start and stop the jail from outside
6. remove the jail.

My goal is to automate this, writing the scripts in Python and later building a Zope interface to make it totally web configurable. Beyond that, I'd also like to 'cluster' this so one point of control will run several servers. My prototype is pretty much done, I'm really just spending time researching various other methods of rolling out the distribution (hard links, union file systems, etc).

Let me know if you run into anything or if you've already encountered things I've not yet! I know bandwidth usage monitoring is one thing I have not even tried to figure out, but it will be a must.

Scott

ScottD
01-29-2002, 11:18 AM
For what it's worth, in a jailed environment you really wouldn't have control over the user installing MySQL themselves, so you might as well just give it to them. I have thought long and hard about ways to restrict these things, but they can install Oracle as long as they have the disk space, so why bother building the base installation with something that isn't really standard?

In theory, I agree, it's a great idea. I would definitely prefer to have a separate MySQL server serving the private nodes, but it just isn't enforceable. On the other hand it might make for a really good managed private node. I hadn't thought of it like that yet.
For those who don't really need root access, but do need the specialization earned from a jailed environment.

Jks, thank you for adding yet another possible tool to my ever growing arsenal! Eventually maybe I'll actually figure out exactly where I want to go :)

Scott

Anatole
01-29-2002, 12:57 PM
We also run virtual private servers (VPS) in a jailed FreeBSD 4.4 environment. Do you know how to enable quotas inside a VPS, so that our users could create virtual servers and use quotas for each of them?

Backup of a VPS is another problem. If you "(simply copy the image!)" you will have a lot of disk space wasted on the distributions of apache, mysql, sendmail or any other software your VPSes are using. Any ideas how to optimize it?

Anatole
01-29-2002, 12:59 PM
By the way: did you try to put some kind of control panel inside the VPS? We give Webmin to VPS users, but it is not for novices, so a lot of time is wasted supporting it. Will Plesk run in a VPS environment?

T_E_O
01-29-2002, 02:04 PM
Hi Scott,

I did get your PM, but this information might be useful for others too.

Until today I had not been looking into vnconfig, as I was unaware of its existence. I was working on letting the jails share a base installation using hard links and chflagging them to schg. As you know, flags cannot be removed from within a jail, so people from different jails cannot modify each other's libraries and/or binaries. But installing a new version of a program that is installed in the jail by default would be impossible this way, as they cannot even remove these hardlinked files.

Then I got the idea of putting those hardlinks all in /skel and then symlinking all files in the usr, sbin, bin etc. directories to those hardlinked files in /skel, but replacing binaries and such with your own files would still be a hassle, as you'd have to remove the symlink first and most installation scripts are not designed to do that. So that's not a good option either....
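A rough sketch of the hard-link sharing approach described above, for anyone who wants to experiment with it. The function name and the idea of passing the base and jail directories as arguments are made up for illustration, and hard links only work when both trees live on the same filesystem. Setting the schg flag is FreeBSD-specific and needs root, so it appears only as a comment.

```shell
#!/bin/sh
# share_base BASE JAIL (hypothetical helper): recreate BASE's directory
# tree under JAIL and hard-link every regular file, so all jails share
# a single on-disk copy of the base installation.
share_base() {
    base=$1
    jail=$2
    # Directories cannot be hard-linked, so create them first.
    (cd "$base" && find . -type d) | while IFS= read -r d; do
        mkdir -p "$jail/$d"
    done
    # Hard-link each regular file into the jail tree.
    (cd "$base" && find . -type f) | while IFS= read -r f; do
        ln "$base/$f" "$jail/$f"
    done
    # On FreeBSD, as root, the shared files could then be made immutable:
    #   chflags -R schg "$base"
}
```

Because the files are the same inodes, a change made through one tree is visible in every jail, which is exactly why the immutable flag matters here and also why in-jail upgrades become awkward.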
You're talking about reusing a vnode image to create many jails. Do you mean you're just making a copy for every jail or are you using one image for all jails?

It's true that you cannot force a user to use your central mysql database, but I can explain to them that it is much safer because that database is replicated by the backup server and things like that. So I think I'll be able to convince them to do that, but of course they can do whatever they like.

I have created a daemon that constantly monitors all jails. For every jail that should be up and running it checks if there is a process running in it. If there isn't, it starts the jail. It also takes care of shutting down jails that should be down or are unknown. It shuts down a jail by just killing all the processes that are running in that jail. Do you have a 'friendlier' way to do this? :) The daemon I created is written in C and gets its data from a mysql server. At the moment I'm creating a PHP frontend to this database for easy administration.

Traffic monitoring is easy in my opinion. I use ipfw to do this. For every IP assigned to a jail I make two rules like these:

    ipfw add 1000 allow all from any to ip/32
    ipfw add 1000 allow all from ip/32 to any

If you run 'ipfw -a list' it will show the traffic used, in bytes, in the third column. You can reset the counters with 'ipfw zero'.

I hope we can exchange some more ideas / suggestions.

Best regards,
Hans Allis

ScottD
01-29-2002, 02:22 PM
On quotas, I have not tried enabling them inside a jail but I doubt they would work without matching uid/gid with an account on the outside. That would be a real pain to maintain. Further, I am not sure that would even work inside a vnode file system. Interesting thought to apply some research to though. Remember, a jail isn't a true virtual environment. It's just an enhanced chroot.

On backups, disk space is real cheap and good backups are invaluable.
I really like the idea of copying a single file into place, mounting it, and voila -- back in business. Also, bzipping the image will cut it down quite a bit and you can then store several backups on off-site tapes / vaults.

On control panels, I have not tried anything beyond Webmin so far. I am really not sure how they'll work. You can only have one IP in a jail with no exceptions, so you are definitely limited to name based hosting only, and never more than one SSL certificate per jail. That is until TrustedBSD is available, I think.

I think as time goes on, better things will begin trickling out of the FreeBSD world for managing and running jailed environments. For now, I think they are best used for single sites or simple name based hosting.

Scott

T_E_O
01-29-2002, 02:36 PM
Originally posted by Anatole:
By the way: did you try to put some kind of control panel inside the VPS? We give Webmin to VPS users, but it is not for novices, so a lot of time is wasted supporting it. Will Plesk run in a VPS environment?

I'm trying to install Plesk in a VPS at this very moment just to answer your question. I'll be using a custom-made control panel when my hosting server goes into production.

ScottD
01-29-2002, 02:43 PM
Hans,

For reusing the vnode images, I mean simply copying the image over and over again. I really don't see a good way to make a shared installation unless it is restricted like you say, with chflags schg. I am pretty sure 'make world' actually sets these flags automatically.

I am really beginning to like the idea of providing a separate MySQL instance and defaulting mysql clients to point to that instance. It is making more and more sense the more I look at it, especially for managed solutions. I think a lot of potential customers might actually be intimidated by having to manage all of the software themselves, so they'd be best with a remote mysql instance.
And, to take that further, you can offer other services like PostgreSQL quite easily, giving them many options without having to install anything.

Your daemon sounds pretty cool! I haven't gone as far as building watchdogs yet, but here is essentially what I do to bring down a jail:

    # first execute /etc/rc.shutdown inside the jail
    $jail_cmd /bin/sh /etc/rc.shutdown

    # now find all remaining processes and kill them;
    # a jailed process has the jail hostname at the end of /proc/<pid>/status
    find /proc -name "status" -print | while read stat; do
        if grep "${domain_name}$" "$stat" >/dev/null 2>&1; then
            # /proc/<pid>/status -> field 3 is the pid
            p=`echo "$stat" | cut -f3 -d'/'`
            kill -TERM "$p"
            kill -KILL "$p"
        fi
    done

It seems to work pretty well, but occasionally leaves a couple of httpd's. I haven't spent a lot of time perfecting this process yet. As I said, I'm still prototyping everything.

Thanks a bunch for the bandwidth monitoring! This will save me a ton of research time!

Scott

<edit> I just re-read this. Please bear in mind that the above quote is typed in from memory, and makes use of existing variables. When creating a new jail I automatically create a 'site_env.sh' which can be sourced into any environment with a few variables defined: domain_name and jail_cmd are among them.

2nd edit, sheesh just slap me! The -d'' should read -d'/', and now does. </edit>

T_E_O
01-29-2002, 03:06 PM
Too bad: Plesk won't run inside a jail :( Plesk needs the shmget() system call, which is disabled inside a jail because it could compromise security :(

T_E_O
01-29-2002, 03:36 PM
Hi Scott,

At the moment I don't see a way to share standard files either :( This would really be a great way to save disk space, but it's just not gonna be that way :(

Can you tell me if the rc.shutdown script does what it should do? I think the way you're doing this won't work, because it will execute in a parallel jail with the same hostname and not in that same jail. So in my opinion it won't kill any processes inside that jail :(

Your idea about also adding one central postgresql server or something sounds good...
But I'd have to learn how those work :D

I think I'm gonna let my clients choose between a managed and an unmanaged jail. The unmanaged jail meaning that they can only use the control panel to change passwords and a few basic things. The managed jail would allow a user to add and remove domain names, change e-mail settings and stuff like that. With an unmanaged server, people will get the root password of the server and as long as the server is managed, they won't get it :) This ensures the integrity of the files created and used by the control panel and saves me from handling support requests like "hey, I have a managed server, but I broke the control panel. Can you fix this for me ?"

Also, I will configure all programs to have their data in a /data directory in the jail. That way I can easily just make backups of those files to tape, because I only have 6 gigabytes of tape-space available. The backup server will keep in sync with the entire contents of the primary server and only files under /data are put on tape :)

What do you think of this ?

Best regards,
Hans

ScottD
01-29-2002, 03:52 PM
Here is an excerpt of running /etc/rc.shutdown with ps -aux before and after:

    ssh root@jail01 "ps -aux"
    USER    PID %CPU %MEM   VSZ  RSS  TT  STAT STARTED      TIME COMMAND
    root   1631  1.0  0.4  2312 1940  ??  SJ    3:35AM   0:00.05 sshd: root@notty (
    root   1534  0.0  0.1  1048  700  ??  IsJ   3:24AM   0:00.00 /usr/sbin/inetd -w
    root   1536  0.0  0.1  1004  724  ??  IsJ   3:24AM   0:00.01 /usr/sbin/cron
    root   1538  0.0  0.3  2224 1696  ??  SsJ   3:24AM   0:00.30 /usr/sbin/sshd
    root   1560  0.0  0.6  3632 2948  ??  SsJ   3:24AM   0:00.05 /usr/local/sbin/ht
    root   1563  0.0  0.1   644  452  p0  IJ    3:24AM   0:00.01 /bin/sh /usr/local
    mysql  1576  0.0  1.0 26980 4936  p0  SJ    3:24AM   0:00.06 /usr/local/libexec
    www    1577  0.0  0.6  4032 3104  ??  IJ    3:24AM   0:00.01 /usr/local/sbin/ht
    www    1578  0.0  0.6  3648 2944  ??  IJ    3:24AM   0:00.00 /usr/local/sbin/ht
    www    1579  0.0  0.6  3648 2944  ??  IJ    3:24AM   0:00.00 /usr/local/sbin/ht
    www    1580  0.0  0.6  3648 2944  ??  IJ    3:24AM   0:00.00 /usr/local/sbin/ht
    www    1581  0.0  0.6  3648 2944  ??  IJ    3:24AM   0:00.00 /usr/local/sbin/ht
    www    1599  0.0  0.6  3660 2956  ??  IJ    3:25AM   0:00.00 /usr/local/sbin/ht
    root   1632  0.0  0.2  1096  776  ??  SsJ   3:35AM   0:00.01 csh -c ps -aux
    root   1527  0.0  0.1   948  664  ??  SsJ   3:24AM   0:00.01 /usr/sbin/syslogd
    root   1633  0.0  0.1   404  244  ??  RJ    3:35AM   0:00.00 ps -aux
    bash-2.05a# ssh root@jail01 "/bin/sh /etc/rc.shutdown"
    Shutting down daemon processes: mysqldstty: stdin isn't a terminal apache. Saving firewall state tables:.
    bash-2.05a# ssh root@jail01 "ps -aux"
    USER    PID %CPU %MEM   VSZ  RSS  TT  STAT STARTED      TIME COMMAND
    root   1660  3.0  0.4  2312 1940  ??  SJ    3:36AM   0:00.05 sshd: root@notty (s
    root   1534  0.0  0.1  1048  700  ??  IsJ   3:24AM   0:00.00 /usr/sbin/inetd -wW
    root   1536  0.0  0.1  1004  724  ??  IsJ   3:24AM   0:00.01 /usr/sbin/cron
    root   1527  0.0  0.1   948  664  ??  SsJ   3:24AM   0:00.01 /usr/sbin/syslogd -
    root   1661  0.0  0.2  1096  776  ??  SsJ   3:36AM   0:00.01 csh -c ps -aux
    root   1662  0.0  0.1   396  244  ??  RJ    3:36AM   0:00.00 ps -aux
    root   1538  0.0  0.3  2224 1696  ??  SsJ   3:24AM   0:00.30 /usr/sbin/sshd

I am not sure it works if you just use the jail(8) command, but since the jail is already running you can run it with ssh nice and easy, and that actually logs into the "virtual node" before executing. Works like a charm! Uh, so far at least. :D

I am with you in thinking about managed and unmanaged solutions. I think the managed solution is great for a lot of people who need to keep their stuff more secure -- no php snooping or other silly troubles. The unmanaged solution provides great services for the more technically savvy, and likely at a discounted rate.

The idea of a /data directory is also very good. It may be difficult to force some apps to use it, but most should work just fine. Personally, I'm not as concerned about disk -- perhaps I should be! :crap:

Scott

T_E_O
01-29-2002, 04:07 PM
Oh I see. That explains it to me. Running it through ssh should indeed work.
Maybe I should do that too, it's a good idea :)

I'm not sure if I'll offer the unmanaged jail at a lower price, because it allows customers to install extra programs which use up cpu and memory... but I'm not sure yet :)

Having everything put in a /data directory is not as hard as it seems. Now that I think of it, the passwd files and the like might cause some trouble... maybe I'll just hardlink those in a /data/config directory so that they'll be backed up too :) Apache and proftpd can be configured to have their config files in the /data dir pretty easily. I'm not exactly sure how I did it, but it was pretty easy.

I'm concerned about disk space because I only have limited space available on the NAS server of the colocation facility. I'd have to pay 10 dollars or so per extra gigabyte.

Hans

Anatole
01-29-2002, 04:19 PM
And how about quotas inside jails?

ScottD
01-29-2002, 04:23 PM
For quotas: Like I mentioned in a previous post, it would be very difficult without having the same UIDs/GIDs defined on the host as those in the jail. For example, if you have user joe in group joe, then you would have to create joe in both the jail and the host, and the UID and GID for joe would have to match. I don't see another way right now, but maybe there is something simple that I'm missing.

If you are just trying to limit disk usage by a single jail, then the vnode file system (man vnconfig) will solve it. You can also growfs a vnode to allocate more space if requested.

For vnodes I am having trouble suppressing each mounted vnode from showing up in 'mount', even when exec'd from within the jail. While this isn't really a problem, it may be considered too revealing.
Scott

T_E_O
01-29-2002, 04:48 PM
I'm not sure about the quotas as I don't think I will use those :)

I'm trying to figure out if I can hide the mounts from processes inside a jail :) I'm looking at /usr/src/sys/kern/vfs_syscalls.c right now to see if I can hide them

T_E_O
01-29-2002, 05:23 PM
Okay, I have modified the kernel, compiled it and the server is now rebooting... will edit this post when it gets back up ;)

hehe, it doesn't show those vn* mounts from within a jail anymore, but instead it shows a rule like this: " on ()" for every mounted vnode. So I'll have to look at this a little closer :)

edit2: rebooting again :)

edit3: Ok, this is great, check it out. First the output from running 'mount' on the host machine:

    [teo@VC2]:~$ mount
    /dev/twed0s1a on / (ufs, local, soft-updates)
    /dev/twed0s1f on /usr (ufs, local, soft-updates)
    /dev/twed0s1e on /var (ufs, local, soft-updates)
    procfs on /proc (procfs, local)
    /dev/vn0c on /usr/realities/test1 (ufs, local)

Now, I'll show you the output of 'mount' when logged into a jail:

    [teo@VS202]:~$ mount
    /dev/twed0s1a on / (ufs, local, soft-updates)
    /dev/twed0s1f on /usr (ufs, local, soft-updates)
    /dev/twed0s1e on /var (ufs, local, soft-updates)
    procfs on /proc (procfs, local)

This is really great. I'm now even considering faking the entire mount output... :) I could even make it look as though I'm running a machine with some kind of expensive scsi raid controller :D Maybe that would be wrong, but I can however show ONLY the mount of the root partition to jail-users and maybe the procfs mount, but nothing more :)

Okay, I'll let you know tomorrow what I was able to do :)

T_E_O
01-29-2002, 06:24 PM
Okay, I made some nice progress already :) My jails are gonna be named 'realities' and the entire service will be named 'second reality' (I am not affiliated with secondreality.com !).
Now look here what a nice output 'mount' gives when run from inside a jail:

    [teo@VS204]:~$ mount
    /dev/2ndrlty on / (ufs, local, soft-updates)
    procfs on /proc (procfs, local)
    [teo@VS204]:~$

All else is hidden... :D :D

Scott, if you want the code changes I made in the kernel, let me know. After all it was your idea to hide them :)

ScottD
01-29-2002, 06:32 PM
Wow, nice! That was quick. :D I think I'll take the challenge and attempt to do this myself, though a couple of hints would be nice. I was looking at getvfsent.c and wasn't certain that was the right place. Where did you make the changes?

Scott

dside443
01-29-2002, 06:45 PM
what about networking? does each user get its own network device to secure it? and memory usage? are processes killed or memory quota'd?

ScottD
01-29-2002, 06:52 PM
jail does not provide any virtualization of memory or cpu. It isn't a true virtual environment, just an enhanced chroot. For networking, a jail is assigned one IP and cannot have any more than that. The IP is an alias on the host NIC. Raw sockets are prohibited in a jail.

Scott

ScottD
01-29-2002, 06:56 PM
I found the code for limiting the mount results in the jail. The getfsstat() function in vfs_syscalls.c, should be cake from here!

Scott

T_E_O
01-29-2002, 06:58 PM
Originally posted by DizixCom:
Wow, nice! That was quick. :D I think I'll take the challenge and attempt to do this myself, though a couple of hints would be nice. I was looking at getvfsent.c and wasn't certain that was the right place. Where did you make the changes? Scott

hehe :) I modified vfs_syscalls.c in the kernel source. I don't think you should modify the entire C library and I don't think it would work either, as one could just call getfsstat directly from the kernel, I think... but I'm not that much of a kernel hacker :) Anyway, I changed the getfsstat function. You should check if p->p_prison is NULL or not. That tells you if the process calling the function is jail()ed or not...
Oh btw, I modified the kernel once before to enforce limits on memory usage. When calling the setrlimit function or something like that it would also check if the process is jail()ed or not and deny or allow the call accordingly. Now I could really enforce these limits on a jail and even the root inside the jail could not raise those limits :)

T_E_O
01-29-2002, 06:59 PM
Originally posted by DizixCom:
I found the code for limiting the mount results in the jail. The getfsstat() function in vfs_syscalls.c, should be cake from here! Scott

Hehe.. you're pretty quick too :D

T_E_O
01-29-2002, 07:07 PM
Okay, I'm gonna get some sleep. Good luck with the code, Scott.

Hans

ScottD
01-29-2002, 07:10 PM
This is some pretty good stuff. I'll have to recap it all and post it later on.

You've done the memory cap and the mount hiding, now I wonder how difficult it would be to cap CPU as well, or at the very least cap the niceness allowed within a jail.

Then figuring out how to make these sysctl's... Sounds fun! :D

DigitalXWeb
01-29-2002, 08:09 PM
Wow!! Great work guys.. You have actually solved a few problems I have run into trying to get this to work properly.. I will be testing out your methods on a test box shortly. Keep up the great work!!

T_E_O
01-30-2002, 03:43 AM
Capping the niceness... we should be able to get that working :) But I'm working on the idea of sysctl's first. I think I've already got the first one working (jail.fake_mounts) but I'm waiting for the box to come back up :)

DigitalXWeb: it's nice to hear that, thanks! If you've got any suggestions, please share them with us :)

T_E_O
01-30-2002, 03:45 AM
Hehe, look what I've got:

    [root@VC2]:/home/teo# sysctl -a|grep fake_mo
    jail.fake_mounts: 0
    [root@VC2]:/home/teo#

:D Now all I've gotta do is let vfs_syscalls.c look at it before deciding if it'll hide the mounts or not...
piece o'cake :D

oh btw, I found something about the quotas inside jails:

    /* XXX PRISON: could be per prison flag */
    static int prison_quotas;
    #if 0
    SYSCTL_INT(_kern_prison, OID_AUTO, quotas, CTLFLAG_RW, &prison_quotas, 0, "");
    #endif

looks like that's work in progress.. :)

T_E_O
01-30-2002, 04:36 AM
Scott, I guess somebody already worked out the idea of capping the niceness:

    [root@VC2-Control]:/home/teo# renice -5 182
    renice: 182: setpriority: Permission denied
    [root@VC2-Control]:/home/teo# renice 10 182
    182: old priority -4, new priority 10
    [root@VC2-Control]:/home/teo# renice -4 182
    renice: 182: setpriority: Permission denied

Saves us some work :D

T_E_O
01-30-2002, 05:07 AM
By the way, Scott, are you familiar with DUMMYNET? I'm considering capping every jail to 25Mbps by giving all of them their own 'pipe', so that no one will be able to eat up all available bandwidth. The box itself will be on a 100Mbps connection. Does this make sense?

ScottD
01-30-2002, 08:45 AM
Teo! You are on a roll...

I had attempted to make the change to vfs_syscalls.c and everything seemed okay at first. Basically, I managed to create a hung process trying to unmount a filesystem and I haven't got my server back up yet (it needs a hard boot). Serves me right for not experimenting on the box I have in house.

So, what do we have now:

1. Bandwidth monitoring
2. Vnodes
3. Hiding mounts
4. Capped niceness

Potential madness:

1. Capped memory
2. Quotas
3. Possible dummynet

Interesting idea, the DUMMYNET. It would be really cool to be able to assign a DUMMYNET device which could then be aliased with multiple IPs.

I'm off to deal with work stuff and hopefully will be back working on this a bit later, from the machine I have here this time.

Scott

T_E_O
01-30-2002, 10:22 AM
Hi Scott,

I hung my server too yesterday, but luckily I have it very close to me, about 10 meters :)

I'll give you a list of what's working, what's not and what has to be done.

Bandwidth monitoring: Done.
Not very hard to do using ipfw.

Vnodes: Done. And using quotas on a vnode is no problem.

Hiding mounts: 80% done. Can switch this on and off with a sysctl variable. The device name of the root filesystem can be renamed, but the new name is also visible on the host system, which isn't very nice. I'll work on creating a fake mount to show when a jailed process asks for a list of mounts. I'll also add a sysctl variable to set the name of the device of that fake root mount.

Capped niceness: Done. It seems this was already done by the FreeBSD development team.

Capped memory: 99% done. I have modified my kernel to disallow jailed processes from raising the resource limits. So I think you can easily limit an entire jail to a maximum of, let's say, 100 megs of memory. The fixation of the resource limits is also controllable via a sysctl variable. I'll have to do some checking to make sure that the memory limit is for the total of all the processes in a jail and not for every single process.

Quotas: Work in progress. Using quotas on a vnode is no problem, but using them from within a jail seems to be non-standard. I found some variables in the kernel that are related to this subject, but I haven't had the time yet to figure that out. Also I'm not sure if I'll have a use for this. The size of the entire jail is capped by using the vnodes and I don't think customers will have a use for limiting the size per user.

Dummynet: Done. Dummynet is a standard feature that can be enabled when compiling the kernel. But we could have a little discussion on how to use it and maybe get some great ideas from each other. Scott, are you familiar with dummynet? You can create so-called tunnels. For every tunnel you can set bandwidth, delay and packet loss. Then you can use ipfw to pass specific traffic through this tunnel.
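To make the idea concrete, here is a small sketch that prints the ipfw commands for one jail instead of running them. In ipfw's own syntax these objects are called pipes. The function name, pipe number, IP address and bandwidth value are placeholders chosen for illustration; the output should be reviewed before being fed to sh on a kernel built with dummynet support.

```shell
#!/bin/sh
# jail_pipe_cmds PIPE IP BW (hypothetical helper): print the ipfw commands
# that configure one dummynet pipe capped at BW and send all traffic to
# and from a single jail IP through it. Nothing is executed here.
jail_pipe_cmds() {
    pipe=$1
    ip=$2
    bw=$3
    echo "ipfw pipe $pipe config bw $bw"
    echo "ipfw add pipe $pipe ip from any to $ip"
    echo "ipfw add pipe $pipe ip from $ip to any"
}
```

For example, `jail_pipe_cmds 1 192.0.2.10 15Mbit/s` prints three commands. With a single pipe, inbound and outbound traffic share the same cap; using two pipes, one per direction, would limit each direction separately.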
So you could send all traffic from and to your server through one tunnel, or you could send all http traffic originating from your server through one tunnel and all other traffic through another tunnel. Or you could make a tunnel for every single jail and send all traffic from and to that jail through its own tunnel. There are a lot of options available with dummynet, but at the moment I think I'm gonna give every jail a tunnel and cap every tunnel at say 15Mbps. That way no single jail can use all available bandwidth for itself :) Btw the name 'tunnel' is a little misleading because that term is also used for connecting to a network over another network or making an ipx/spx connection over tcp/ip and things like that. Dummynet tunnels have nothing to do with those tunnels ;)

The SysVipc issue

Until I started playing with jails I had no clue as to what this was. I'd seen the name a few times, but I had nothing to do with it. From within a jail, sysvipc access is denied by default because it could be a security risk. However, programs such as plesk and postgresql require sysvipc access and are thus unable to run inside a jail. SysVipc access can be allowed by simply changing a sysctl variable. I have tried this and I've now got plesk running like a charm, but I'm not sure of the security issues involved. It would be cool if we could enable this access without any risks so that almost any freebsd software will run in a jail :D Can anyone tell me some more about this?

Oh and Scott, have you heard of 'jailng'? I ran into this this afternoon and it seems quite interesting. It talks about giving each jail its own sysvipc space and being able to insert processes into the jail from the host system, which is not possible with normal jails at the moment.
But from what I read it's still far from ready for production systems, so I guess we won't be able to use it :(

Regards,
Hans

ScottD
01-30-2002, 11:01 AM
I just did a reinstall of FreeBSD 4.4 on a machine I have available in house, so I'll be using it for testing from now on. Short walk to reboot while picking up a Mt. Dew. :D

This Dummynet thing sounds very interesting. I have some reading to do on this one. What system calls are made to create the tunnels? I'll do a google scan real quick and try and learn.

Quotas are probably not much of a concern for me personally either, but it might be good to know for future reference. I'll leave it as a low priority.

SysV IPC is interprocess communication, most notably shared memory (shm). SHM is a really efficient way for processes to communicate locally. On 32 bit machines you can move 32 bits in one operation (movsd) using (I think) the same number of processor cycles as an 8 bit move (movsb). Instead of using TCP/IP to localhost, using shared memory essentially removes any communications bottlenecks. I think MySQL uses TCP exclusively and could gain a lot by using shared memory. You may be able to set MySQL up that way now, I've never really tried.

I think the security hole in jails with SysV IPC enabled is that they can talk to other processes via the shared memory, and they are root so they have full permissions. This provides a mechanism for jails to cross over into other jails or into the host itself.

JailNG looks very promising. I've tried to visit Mr. Watson's page on the subject but it always comes back 404 - Not found on server. I am pretty sure that it has been entirely integrated into TrustedBSD, which I think will be the ultimate BSD hosting platform -- sometime in 2003 :)

On my plate right now, in order of priority:

1. Upgrade local server to 4.5-RELEASE.
2. Implement mount hiding with sysctl.
3. Study Dummynet

From there, who knows! Eventually we should probably share some of our code and perfect these things.
I wonder if any of the FreeBSD folks would be interested in reviewing for an actual patch. There may be some overlap with what they are doing in 5.0.

Scott

T_E_O
01-30-2002, 01:51 PM
Hehe, modifying the kernel is indeed easier on a box that's physically very close to you :D

I don't know which system calls to use to set up tunnels, coz I've always done that through ipfw.

Thanks for your explanation on the shared memory stuff. I'm gonna try to find out if I can keep a jailed process from accessing memory that is in use by a process in another jail... That might take some time ;)

Oh and I've made these two sysctls:
jail.fakeroot_enabled
jail.fakeroot_devname

So you can turn the fake replies on and off and you can configure which fake device to show as the device that the root filesystem is mounted from. So, if enabled, 'mount' only shows a procfs mount on /proc and a /dev/fake (or whatever value you set the sysctl to) mount on /. All other proc mounts are also hidden because otherwise it would show you a proc mount for every jail :)

I couldn't find any downloads on jailNG either :( The only thing I'd really like to borrow from it is the separate Sys V memory :)

Too bad TrustedBSD is not production ready as it'll be a really nice hosting OS indeed :) I'm not sure if our code is usable for the new FreeBSD as I think they'll rewrite a large part of the jail code... :( But of course it's worth trying :)

Hans

T_E_O
01-30-2002, 04:07 PM
Hey Scott, listen up :)

I think I made some good progress making IPC usage secure for use with jails. For every memory segment that is registered I also store the p->p_prison pointer :D When a process tries to access a segment, I check whether the prison address stored with the memory segment registration is the same as the one stored in the struct of the calling process. And then I deny or allow the access accordingly.
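The check described above can be illustrated outside the kernel. What follows is a minimal user-space model in Python, not the actual sysv_shm.c change: the names `Prison`, `Process`, `ShmRegistry` and `may_attach` are hypothetical, and the rule that host processes (no prison) cannot reach a jail's segments is an assumption of the sketch rather than something stated in the thread.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Prison:
    """Stand-in for the kernel's per-jail structure (hypothetical model)."""
    hostname: str


@dataclass
class Process:
    pid: int
    prison: Optional[Prison] = None  # None models a process running on the host


@dataclass
class ShmRegistry:
    # Maps a segment key to the prison of the process that created the segment.
    owners: Dict[int, Optional[Prison]] = field(default_factory=dict)

    def create(self, proc: Process, key: int) -> None:
        """Register a segment and remember which jail created it."""
        if key in self.owners:
            raise FileExistsError(key)
        self.owners[key] = proc.prison

    def may_attach(self, proc: Process, key: int) -> bool:
        """Allow access only if the caller's prison is the one stored with the segment.

        The identity test ('is') mirrors comparing two prison pointers.
        """
        if key not in self.owners:
            raise KeyError(key)
        return self.owners[key] is proc.prison
```

In this model two processes in the same jail share a segment freely, while a process in another jail, or on the host, is refused.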
I really believe I'm getting somewhere now and I'll be able to run postgresql and plesk inside jails :D :D :D

I'll share the code with you, if you like and if it's finished ;) (I've been modifying sysv_shm.c)

Hans

ScottD
01-30-2002, 04:23 PM
Oh my, you are just humming along with this stuff! I haven't had much time today to get back to work but tomorrow hopefully will be different. We need to figure out a place to share this code, which seems to be all yours right now anyways! In any case, the IPC stuff is probably one of the most important parts for success in this! Excepting a completely virtualized network device allowing multiple IPs within the jail, I'm not sure what could be more beneficial. Keep it up man, I only hope I can actually contribute something before you beat me to it. :D :D

My local machine is doing a build world right now on the 4.5-STABLE source tree where I'll start working the code a bit.

A couple more ideas to toss out there. The prison pointer can be used to uniquely identify a jail, so what is stopping us from enhancing 'ps' to spit out PIDs based on a jail 'id'? This certainly beats the trick of looking in /proc/${pid}/status for the IP. Sure, an IP uniquely identifies a jail as well, but what if we were able to virtualize a network interface so the jails could have more than one IP?

This stuff is starting to look really cool!

Scott

T_E_O
01-30-2002, 04:45 PM
Hehe.. well, I'm modifying the kernel at a pretty high speed, but the problem is that I am not sure if the code I produce is really acceptable :) Even though it's working perfectly it might have some disadvantages that I'm unaware of. It would be nice to be able to share the code indeed as you might be better at finding security risks in my code than me.. :D

I don't think I'll be any good at the network idea you came up with. Having multiple IPs available would be fantastic, but I don't have any clue on how to implement such a thing.
The address of the prison pointer sure is nice, but it's not a practical number to work with. We should try to let the kernel give each started jail a unique id somewhere between 0 and 65000, just like it does for processes. We could even call it a JID :D It'd be fun to issue commands like 'killjail 41' :)

Maybe you could give this and/or the networking a try. I'll be looking into the Sys V stuff a little deeper. I think memory sharing is working now, but I haven't even looked at the messages yet :)

T_E_O
01-30-2002, 05:19 PM
Okay, the semaphores are done too. I'll have a look at the messaging stuff tomorrow :) All the Sys V IPC functions are still enabled and disabled with the jail.sysvipc_allowed sysctl.

I think I'm gonna get some sleep now, because I'll really have to do some work tomorrow on my normal job :D I've been spending too much time thinking about jail features we could implement :)

Hans

ScottD
01-30-2002, 05:27 PM
Haha, sleep well! Hopefully I'll be able to fill in for you tomorrow and come up with some equally hair raising patches! I am now looking into net/if.c and there is only one place where jails are even mentioned. I'll focus on this area for now.

Scott

DigitalXWeb
01-30-2002, 11:14 PM
Geesh, you guys are really rolling along with this.. Can you give me an update either here or by PM as to what you are currently working on and what is currently working? I don't have much time during the week to help out with this but I do have some time on the weekends.

If you are concerned about security issues with the modifications: if we can get the code posted somewhere, I have a person who is able to look through it and test everything out fairly quickly and let us know if there are any security issues that may need to be addressed.

Keep up the great work!! Hopefully this will also work in 4.5 with little or no modification.
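The "JID" idea raised a few posts up (a small per-jail number handed out by the kernel, instead of the raw prison pointer) could look roughly like this. It is a user-space sketch in Python under stated assumptions: `JidAllocator` and its methods are hypothetical names, IDs run from 1 to 65000 rather than starting at 0, and released IDs are reused only after the counter wraps around, which is one common way to allocate process IDs.

```python
class JidAllocator:
    """Hands out small integer jail IDs (hypothetical model of the 'JID' idea)."""

    def __init__(self, max_jid: int = 65000) -> None:
        self.max_jid = max_jid
        self.next_jid = 1
        self.in_use = {}  # jid -> jail name

    def allocate(self, name: str) -> int:
        """Return the next free ID, scanning forward and wrapping around."""
        if len(self.in_use) >= self.max_jid:
            raise RuntimeError("no free jail IDs")
        while self.next_jid in self.in_use:
            self.next_jid = self.next_jid % self.max_jid + 1
        jid = self.next_jid
        self.in_use[jid] = name
        self.next_jid = jid % self.max_jid + 1
        return jid

    def release(self, jid: int) -> None:
        """Free an ID when its jail goes away."""
        del self.in_use[jid]
```

A command such as 'killjail 41' would then only need to look up ID 41 in a table like `in_use` to find the jail it refers to.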

T_E_O
01-31-2002, 04:14 AM
Hi Scott and DigitalXWeb,

I found out that the fake root device is not perfect yet, because for example plesk looks straight through it :) I'll have to figure out what system calls plesk uses to do this, because even programs such as 'df' report only the faked root mount and nothing else...

I've come up with some (in my opinion) great stuff: When a jail is started, a prison structure is made. In this prison structure we could store a unique ID number, just as Scott mentioned some posts ago. But it should also be possible to keep track of which processes are in that jail, and we should be able to insert processes into the jail. Imagine a system call named 'jail_join' which you just pass the ID of the jail. It'd be really nice to just run a command such as:

jailrun /bin/sh /etc/rc.shutdown

to make a jail shut down. And after that you could just check to see if the prison structure is gone. And if it's not, you could just get a list of the processes in it and kill them... :D

To prevent a race condition (processes spawning inside the jail faster than we can kill them) we could add an element called 'locked' or something and modify the kernel so that it will not allow any more processes to be spawned inside that jail when this element is set.

I guess we're getting somewhere near the jailNG code if we could get this to work ;)

Brian: Thanks for your offer.
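The 'locked' element proposed above closes the race by ordering two steps: first forbid new processes, then kill whatever is left. A minimal Python model of that ordering follows; the `Jail` class, its `fork`/`exit` methods and the `shutdown` helper are hypothetical stand-ins for kernel behaviour, and removing a PID from the set stands in for delivering a kill signal.

```python
class Jail:
    """Toy model of a prison structure that tracks its processes."""

    def __init__(self, jid: int) -> None:
        self.jid = jid
        self.locked = False
        self.pids = set()

    def fork(self, new_pid: int) -> None:
        """Refuse to create processes once the jail is locked."""
        if self.locked:
            raise PermissionError("jail is locked; no new processes allowed")
        self.pids.add(new_pid)

    def exit(self, pid: int) -> None:
        self.pids.discard(pid)


def shutdown(jail: Jail) -> list:
    """Lock the jail first, then remove every remaining process.

    Because fork() fails after the lock is set, the process list can no
    longer grow while it is being walked.
    """
    jail.locked = True
    killed = sorted(jail.pids)
    for pid in killed:
        jail.exit(pid)  # stands in for sending SIGKILL
    return killed
```

Setting the flag before reading the process list is the whole point: with the steps reversed, a process forked between the read and the lock would be missed.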
I'll try to keep track of what and where I modify so that he won't have to check the entire kernel source :D

I'm developing on a 4.5-PRERELEASE box at the moment, so I think it won't be much trouble to get it running on 4.5-RELEASE :)

T_E_O
01-31-2002, 05:27 AM
Looks like I'll have to modify all the statfs system calls too :D I'll work on that today and maybe I'll have some time left to check out the other things I've been talking about in my previous post :)

T_E_O
01-31-2002, 07:28 AM
Ok, I bent and twisted my kernel a little more and plesk's falling for it :) I'll give you a rundown on what it does and what it doesn't:

From outside the jail, everything appears as it really is.

From inside the jail: getfsstat (used by mount and df to obtain a list of all mounted filesystems) shows only the mount of /proc and the statistics of the REAL root filesystem instead of the vnode that the jail's files are in. The problem with this is that inside the jail it will look like there's not much space left, because the real root filesystem is almost always pretty small. So a run of df -h would show for example:

Filesystem          Size  Used Avail Capacity  Mounted on
/dev/secondreality  194M   72M  106M    41%    /
procfs              4.0K  4.0K    0B   100%    /proc

Which is not a pretty sight :) I think I will be able to work around this by:
- Finding the path of the jail in the prison structure
- Finding the mount that this path belongs to with use of fstat()
- Reporting that mount as if it is mounted from the fake root device and as if it is mounted on /

Statfs cooperates nicely when run from inside a jail, but it reports the real mountpoint. So it would for example show /dev/fake as the root fs device, which is nice. But it would also show that /dev/fake is mounted on /usr/jail/123.123.123.123 :( But I can change that fairly easily :D

Ok, I'll let you know how this goes :)

ScottD
01-31-2002, 02:58 PM
Long day so far! I am just now able to start hacking away, server is rebuilt with 4.5-RELEASE and ready to roll.
Once again, Teo, your progress is incredibly fast! I was thinking about ways to plug any holes and it came to mind to start trussing some key processes from within the jail (truss mount), which should show us all of the system calls. You can also attach to running processes (truss -p pid). Naturally this will not show everything, but it should help a bit.

I have not yet really looked into the networking code too much and it may be a bigger bite than I can chew, but I'll not know until I really look.

I'm wondering if we should consider taking this offline from WHT as well. I am not sure too many people are really interested in reading all of our gibberish. I can set up a server to hold code and a forum in no time at all. If so, I can also post a final message here indicating where other interested people can go to participate or lurk.

Off I go! I'll now attempt to mess with the net code a bit.

Scott

DigitalXWeb
01-31-2002, 03:12 PM
Moving this off of WHT may be a good idea, since it seems to be a small group interested in it at this time. Let me know where you move it if that is done. I would like to help out where I can, just need the time to do so :). At the rate you both are going, by the time the weekend gets here you will have most of the things already figured out ;). Keep up the great work. I am also building a 4.5 box so I can use it as a test box..

T_E_O
01-31-2002, 03:54 PM
Hi Scott and Brian,

I'm trying to fix the getfsstat system call.. I've already fixed it so that the path of the jail is stored in the prison structure, so I can find out the path of the jail when a call to getfsstat is made. The problem is figuring out which mounted filesystem this path is on. I don't think I can just make a call to statfs, because it assumes that the path is in userspace memory.
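One way to think about "which mounted filesystem is this path on" is a longest-prefix match of the jail path against the list of mount points. The sketch below does that in user space with Python and then builds the view a jailed process would see. It is only a model: the kernel would work from vnodes and mount structures rather than strings, the function names `mount_for_path` and `jail_view` are hypothetical, and the example paths used with it are made up.

```python
import posixpath


def mount_for_path(path, mount_points):
    """Return the longest mount point that contains 'path', or None."""
    path = posixpath.normpath(path)
    best = None
    for mp in mount_points:
        mp = posixpath.normpath(mp)
        inside = mp == "/" or path == mp or path.startswith(mp + "/")
        if inside and (best is None or len(mp) > len(best)):
            best = mp
    return best


def jail_view(jail_root, mounts, fake_dev="/dev/fake"):
    """Build the mount list as seen from inside a jail.

    'mounts' is a list of (device, mount_point) pairs from the host.
    Each result is (device, mount point shown in the jail, real mount point).
    The filesystem holding the jail root is reported as 'fake_dev' on '/',
    mounts below the jail root have the root prefix stripped, and
    everything else is hidden.
    """
    root = posixpath.normpath(jail_root)
    backing = mount_for_path(root, [mp for _, mp in mounts])
    view = [(fake_dev, "/", backing)]
    for dev, mp in mounts:
        mp = posixpath.normpath(mp)
        if mp.startswith(root + "/"):
            view.append((dev, mp[len(root):], mp))
    return view
```

Reporting statistics from the `backing` mount instead of the host's real root filesystem is what would avoid the misleading free-space numbers mentioned earlier.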
If you have any suggestions on this, they're very welcome :D

I'll try working on the JID idea this evening (it's 8:40 pm at the moment) and see if I can get that working :)

The truss utility might be very useful indeed. I'll have a look at it to see how I can use it. Thanks for the great idea!

Moving off this forum might indeed be useful, but the downside of this is that my post count on WHT won't increase as fast as it's been doing during the last few days :dgrin: If it's no problem for you, Scott, it'd be nice if you'd set up a forum for this and maybe some place to dump code to.

Brian, I guess there will be more than enough coding to do when the weekend starts :D

Hans

ScottD
01-31-2002, 04:42 PM
Okay, all set. Browse to www.makefile.com. There is only one link to get to the forums. I'll have to figure something out as far as file storage, but you can add attachments to posts.

Scott

allera
10-13-2002, 10:32 PM
Have you guys progressed any further on this? I tried www.makefile.com but got a forbidden error. I was following this thread back in January but forgot all about it. I've got a bit of spare time on my hands and wanted to help out with the project -- if there is anything left to help out on by now.

Also, are there any improvements in the latest FreeBSD code (or potential future improvements) regarding jail()?

Thanks!
Alex

ScottD
10-14-2002, 09:45 PM
T_E_O had made most of the progress, but I haven't heard from him in quite some time. Last I heard he was up and running successfully with several mods to the FreeBSD kernel that dealt with the IPC and file system hiding issues. Who knows, by now he's probably written a whole virtualized networking layer and figured out how to implement Ensim's fairsched on top of it!

With the right talent, these things could be done but I'm personally not up to the task. My skills just don't take me into the world of hard-core kernel hacking required to make something like this work nicely.
I don't think it is hard to do necessarily, but it does take some serious caution and intense testing to make sure your environments are secure.

Regarding Makefile.com, one day I'll develop the domain into something useful. I once had a dream of building a glorious developers portal, but was sidetracked by so many things, and now there are so many good ones out there it seems pointless to reinvent something that I have no unique idea for. So right now it just makes a pretty good vanity email address.

allera
10-14-2002, 10:13 PM
Originally posted by DizixCom
My skills just don't take me into the world of hard-core kernel hacking required to make something like this work nicely. I don't think it is hard to do necessarily, but it does take some serious caution and intense testing to make sure your environments are secure.

I feel the same way. I'll see if I can find T_E_O and find out how he progressed with it.

T_E_O
10-16-2002, 02:38 AM
Hey guys, nice to see this old thread come alive again :) I am indeed up and running and I could show you guys a demo of it sometime this week. I'm not that much of a hardcore programmer and I don't think I would be able to change the scheduling code inside the kernel, but it's nice to hear that you believe I could :D

DD-SNC
10-16-2002, 02:57 AM
Wow. This is a really good thread. I just wanted to say that 4.7 has some new features to help with the memory sharing..

mortimer
10-16-2002, 03:11 AM
Check out Virtuozzo for BSD - http://virtuozzo.com. It does support FreeBSD 4.5 AFAIK.

allera
10-16-2002, 09:14 AM
Originally posted by T_E_O
I am indeed up and running and I could show you guys a demo of it sometime this week.

Definitely. I'd like to poke around and see what features you've added to jail. :)

ScottD
10-16-2002, 09:33 AM
Hello T_E_O! Glad to hear that things are running well for you. A demo would be really cool, especially of the admin tools.
I remember you had mentioned that you had the ability to start and stop processes in jails among other things, but I removed the bloody forums database that had our old conversations. I know I had that backed up somewhere, hrm.

Here is a link to the fairsched stuff. It's designed for Linux, but I imagine the principles are pretty similar and it may be adaptable to FreeBSD. I'm not certain.

Link: http://fairsched.sourceforge.net

T_E_O
10-16-2002, 11:28 AM
Hi Scott,

It's nice to be back here :) I've got the ability to start a process and then 'inject' it into a running jail :D That's a must-have if you want to be able to give support to your clients on issues like configuration and/or scripting problems. I can easily launch a new shell into a jail so I don't have to have everyone's root password.

Furthermore I can freeze a jail, which means that it's impossible to fork new processes inside the jail. This is very nice if you want to shut down a jail, because it is impossible for a process to keep forking and forking faster than you can kill them :)

And of course it's possible to get a list of running jails from the kernel with the corresponding hostname and IP address.

And.. and... and... :) Oh well, I'll show you guys some of the possibilities sometime this week :)

mind21_98
02-08-2003, 09:55 PM
I'm not sure if anyone is interested, but I just had this idea recently. I found a patch called mijail (from http://garage.freebsd.pl/) which seems to implement multiple IP support for jails.
I ported it to 5.0-RELEASE and added a few sysctls and things:

pacific# sysctl -a | grep jail
jail.jails.test_freebsd_org.max_ram: 0
jail.jails.test_freebsd_org.max_cpu: 0
jail.jails.test_freebsd_org.max_procs: 0
jail.jails.test_freebsd_org.procs_used: 7
jail.jails.test_freebsd_org.ram_used: 3407872
jail.jails.test_freebsd_org.cpu_used: 0
security.jail.set_hostname_allowed: 1
security.jail.socket_unixiproute_only: 1
security.jail.sysvipc_allowed: 0
pacific#

ram_used is updated every time a new process is forked and every time a process exits. The limiting functions seem to work great, considering that they haven't been tested too well.

I believe I may have also stepped on all of you with regards to the other features, since I also have mountpoint hiding:

pacific# df
Filesystem   1K-blocks    Used   Avail Capacity  Mounted on
/dev/da0s1a     257838   85038  152174    36%    /
devfs                1       1       0   100%    /dev
/dev/da0s1e     257838       8  237204     0%    /tmp
/dev/da0s1f    2753428 1500778 1032376    59%    /usr
/dev/da0s1d     257838   61112  176100    26%    /var
/dev/md0        302190  145620  132396    52%    /usr/jail/0
devfs                1       1       0   100%    /usr/jail/0/dev
procfs               4       4       0   100%    /usr/jail/0/proc
pacific# ssh mooneer@10.0.0.3
Password:
[...]
%df
Filesystem   1K-blocks    Used   Avail Capacity  Mounted on
/dev/md0        302190  145620  132396    52%    /
devfs                1       1       0   100%    /dev
procfs               4       4       0   100%    /proc
%

devfs does seem to perform some checks if it's mounted in a jail, since the system didn't screw itself by me overwriting /proc/kmem in that particular jail.

I'm considering fixing the SysV functions next, since those will be needed for control panels to work properly. After that, I'll add some sysctls that will let me add and remove IP addresses in real time without having to take down the jail first. Finally, I'll write some system calls that automatically kill all processes in a particular jail.

If I don't end up losing interest first, I may even write a Web interface for it. :)

ScottD
02-09-2003, 12:36 PM
Hey mind, it seems that you've been doing some good work.
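The per-jail counters shown in the sysctl listing above (procs_used and ram_used, with max_procs and max_ram as limits) suggest accounting done at fork and exit time. Here is a small Python model of that bookkeeping. It is a sketch under assumptions, not the patch itself: the `JailAccounting` class is hypothetical, and treating a maximum of 0 as "no limit" is a guess based on the zero values in the listing.

```python
class JailAccounting:
    """Toy per-jail counters updated on fork and exit."""

    def __init__(self, max_procs: int = 0, max_ram: int = 0) -> None:
        # A limit of 0 is treated as "unlimited" (assumption).
        self.max_procs = max_procs
        self.max_ram = max_ram
        self.procs_used = 0
        self.ram_used = 0

    def fork(self, ram_bytes: int) -> None:
        """Charge a new process to the jail, or refuse if a limit would be exceeded."""
        if self.max_procs and self.procs_used + 1 > self.max_procs:
            raise PermissionError("process limit reached")
        if self.max_ram and self.ram_used + ram_bytes > self.max_ram:
            raise MemoryError("RAM limit reached")
        self.procs_used += 1
        self.ram_used += ram_bytes

    def exit(self, ram_bytes: int) -> None:
        """Give back what the exiting process was charged."""
        self.procs_used -= 1
        self.ram_used -= ram_bytes
```

Checking both limits before touching either counter keeps the two values consistent when a fork is refused.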
Any plans to share some of your kernel mods? How are you doing the mounting? Are you using a union file system or vnconfig? Curious minds want to know!

mind21_98
02-09-2003, 01:55 PM
Originally posted by DizixCom
Hey mind, it seems that you've been doing some good work. Any plans to share some of your kernel mods? How are you doing the mounting? Are you using a union file system or vnconfig? Curious minds want to know!

Unionfs, according to the man page, is *very* unstable. Didn't stop me from trying it for 5 minutes though. I'm currently using mdconfig (replaced vnconfig in 5.0). While messing around, I noticed I could use the "disklabel" command on mdconfig created disks. I was able to use it to create "partitions" for the VPS. This would come in handy later on in case a client needs a custom partitioning scheme. :)

I want to share the mods, but I'm not sure how that'd work if later on I decide to sell licenses to the Web-based application I described earlier.

T_E_O
02-09-2003, 04:47 PM
mind21_98: nice work. Thanks a lot for the link to the multiple-ip-patch. If I'd known that it's so simple to implement this I would've put some more effort in it last year :)

Okay, I hope I'm not gonna regret this, but I'll just share the kernel adjustments I made so far. I'm aware that my code is not always as neat as it could be (*cough* second half of jailshow.c *cough*), but it works perfectly. I can't remember exactly if the normal jail() system call will still function if you add these modifications to the kernel, but I believe I paid some attention to that, so it prolly will.

If you spot any possible security issues in this code, please inform me about it at teo@t-e-o.demon.nl.
Oh yeah, it patches just fine against the RELENG_4 cvs branch dated 02/08/2003, but you should make a little adjustment to kern/syscalls.master (remove the UNIMPL lines for 364 through 366) and run "sh kern/makesyscalls.sh".

I forgot to tell, these modifications implement:
- Mount partitioning
- Freezing / unfreezing of jails (frozen means you can't fork() in the jail)
- Listing of all jails with IPs and paths
- Inserting processes into the jail
- Usage of the Sys V IPC functions SHM and SEM

mind21_98
02-12-2003, 12:29 AM
Update: I looked around a bit in the FreeBSD scheduler, and found it may be possible to edit scheduling priorities based on certain jails. I wrote a quick code snippet that takes nice values into account as well as a special jail sysctl. :) This is what I get when I boot up the modified kernel and start top on both the jail and the host system:

1001 1323 1320   0  96  0  2132 1384 select S+   p2    0:00.04 top
1001 1332 1329   0 129  0  2100 1344 select S+J  p4    0:00.03 top

(fifth column is the priority field)

In this case, I set the sysctl value to 50 (it ranges from 5 to 100). The bottom line is the top that was running in the jail. I'm not sure if this has any actual bearing on the amount of CPU time a certain process uses though. On 5.0-RELEASE though, it appears that the actual priority values according to ps go from -64 to 191 or so, making it more difficult to determine if it's actually working.

mind21_98
03-04-2003, 02:03 AM
All right. I have a patch now, with a working CPU/RAM limiter. If anyone's interested, it's at http://msalem.translator.cx/dist/jail_seperation.v6.patch.
Features:
- Mountpoint hiding
- Quota support
- Number of Processes/CPU/RAM limits
- "Hiding" jails from non-root non-jail users (including jail filesystems)
- Multiple IP address support (with INADDR_ANY support - no need to explicitly specify IP addresses in configuration files)
- Segmented SYSV support
- Hot addition/deletion of IP addresses while jail is running

This patch is designed for FreeBSD 5.0. If anyone has any feature suggestions or bugs, please let me know.