Web Hosting Talk







New approach to bandwidth monitoring?


Jake29
04-01-2002, 09:50 AM
Okay- this is admittedly a half-baked proto-idea, and I'm not sure if it is even feasible, but I would like to hear other people's thoughts:

What I have learned about bandwidth monitoring is that the only elegant and comprehensive solution is to do bandwidth monitoring via IP addresses using a tool like bandmin, ntop, etc. So in a "reseller environment", each reseller gets a real, global IP address, so that the server owner can bill them for all the traffic of their site and all their virtual hosts. The reseller, however, is unable to bill their own customers with the same accuracy, and at best can tally a set of log files for a more-or-less close approximation.

What I am wondering is whether there would be a way to set up a "virtual LAN", so that each virtual host would have a LAN IP, i.e. 10.0.0.1, 10.0.0.2, etc. You can do something like this for Apache using mod_proxy and mod_rewrite, but of course, that would only be HTTP traffic, and that's covered by log files. Is there some weird DNS chaining that could be done, so that:

Global DNS: example.com ==> public ip 204.x.x.x
Local DNS @ 204.x.x.x: example.com ==> local ip 10.x.x.x?

Now the bandwidth monitor can record all traffic for each 10.x.x.x, and each virtual host's traffic can be billed and reported accurately!

If anyone has any idea of what I am talking about, and has any suggestions on how it might be implemented, I would greatly appreciate your thoughts.

half-baked, proto-Jake29

jstout
04-01-2002, 03:16 PM
I think it could be done but you would run into problems with NAT when you started pushing decent amounts of bandwidth.

Starhost
04-01-2002, 03:52 PM
Don't do that! It looks nice at first, but when you have loads of servers there, your router won't handle it!

What I suggest is to monitor incoming and outgoing traffic for every user, and then add up the totals of all users (for a vhost) to get the correct bandwidth :-)

Or just use webalizer and process the amount of data traffic by running a simple PHP script :-) And just add, for example, 10% (for email / ftp traffic).

Good luck.
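
A rough sketch of the log-plus-percentage approach, written in Python rather than PHP. It assumes access log lines in Apache's common format with the virtual host name prepended as the first field (for example via `%v` in a custom LogFormat). The sample lines, host names and the `usage_with_surcharge` helper are made up for illustration, and the 10% figure is just the example from the post.

```python
import re
from collections import defaultdict

# Assumed line layout: <vhost> <client> - - [date] "request" <status> <bytes>
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (\d+|-)')

def bytes_per_vhost(lines):
    """Sum the logged response bytes for each virtual host."""
    totals = defaultdict(int)
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed layout
        vhost, size = match.groups()
        if size != "-":  # "-" means no response body was sent
            totals[vhost] += int(size)
    return dict(totals)

def usage_with_surcharge(totals, surcharge=0.10):
    """Add a flat percentage to cover traffic the web logs never see."""
    return {vhost: round(size * (1 + surcharge)) for vhost, size in totals.items()}

# Hypothetical sample data, not taken from a real server.
sample = [
    'example.com 192.0.2.1 - - [01/Apr/2002:09:50:00 +0000] "GET / HTTP/1.0" 200 5000',
    'example.com 192.0.2.2 - - [01/Apr/2002:09:51:00 +0000] "GET /a.gif HTTP/1.0" 304 -',
    'example.org 192.0.2.3 - - [01/Apr/2002:09:52:00 +0000] "GET / HTTP/1.0" 200 2000',
]
totals = bytes_per_vhost(sample)
billed = usage_with_surcharge(totals)
```

The surcharge is a guess by construction, so a customer whose traffic is mostly mail or FTP will be undercounted.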

priyadi
04-01-2002, 11:02 PM
Originally posted by Starhost
Or just use webalizer and process the amount of data traffic by running a simple PHP script :-) And just add, for example, 10% (for email / ftp traffic).

Won't work. I've seen users who use email very extensively, and hardly use any web traffic at all.

Starhost
04-01-2002, 11:17 PM
Then how would you suggest keeping track of email activity for bandwidth monitoring? Because I really don't know a way to handle that.

webx
04-01-2002, 11:43 PM
Any qmail logs?

The sendmail logs do record the size of each email, so by processing them you can find the traffic for each domain.
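
As a minimal sketch of that idea in Python: the code below sums the `size=` field of sendmail-style `from=` entries per sender domain. The log line shape is an assumption (real logs vary with sendmail version and configuration), and the sample lines and the `mail_bytes_per_domain` name are invented for illustration.

```python
import re
from collections import defaultdict

# Assumed shape of a sendmail "from=" entry: from=<user@domain>, size=NNN, ...
FROM_RE = re.compile(r'from=<?[^@\s>]+@([^>,\s]+)>?, size=(\d+)')

def mail_bytes_per_domain(lines):
    """Sum message sizes per sender domain from 'from=' log entries."""
    totals = defaultdict(int)
    for line in lines:
        match = FROM_RE.search(line)
        if match:
            domain, size = match.groups()
            totals[domain.lower()] += int(size)
    return dict(totals)

# Hypothetical sample lines, not copied from a real server.
sample = [
    "Apr  1 23:43:01 host sendmail[1234]: g31NhAB01234: from=<joe@example.com>, size=2048, class=0, nrcpts=1",
    "Apr  1 23:43:02 host sendmail[1234]: g31NhAB01234: to=<ann@example.org>, delay=00:00:01, stat=Sent",
    "Apr  1 23:44:10 host sendmail[1240]: g31NiAB01240: from=<Sue@Example.com>, size=1000, class=0, nrcpts=2",
]
mail_totals = mail_bytes_per_domain(sample)
```

This only attributes mail to the sender's domain. Billing inbound mail to the recipient's domain would mean matching the `to=` entries to the `from=` entry through the shared queue ID, and a message with several recipients may cross the wire more than once.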

Starhost
04-02-2002, 04:20 AM
I use Exim, and haven't found anything (yet) to process its logs to calculate bandwidth. I'll investigate that tonight. But wouldn't it be easier to just calculate www+ftp and then add a percentage? Mail isn't used that often. Furthermore, I don't let my customers send mail through my server (not from a mail client).

priyadi
04-02-2002, 09:40 AM
A few things to note for those who want to calculate bandwidth on HTTP traffic.

- The web server access_log doesn't record the size of HTTP requests. Most HTTP requests are small and therefore negligible; however, things like file uploads can be very significant.

- It doesn't take HTTP headers into account, for either requests or responses.

- nph CGI scripts don't generate correct byte counts in access_log.

The world is not perfect :(

Starhost
04-02-2002, 09:50 AM
May I ask you how you would calculate bandwidth?

(while using the namebased host option in apache)

kindred
04-02-2002, 10:21 AM
Depending on how "virtual" your customers are, it's pretty simple to set up a perl module to watch packets in and out of the system.

That would then be fine for web/ftp where the customer has their own IP, and for mail you'd have to make sure they make the MX records point to their IP again and not your main server.

Same goes for any service, as long as you can get them to use their own IP (which could also be enforced with user id's in an ACL, which for example ipfw can do).

I played around with PCAP the other day, and managed to make pretty little graphs of every IP alias on my interface (in bits/sec, à la MRTG/Cricket) using RRDTool.

Lots of options I think, just many ways to do it - some more accurate than others.

Andy.
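
To illustrate the per-IP idea: once a capture tool hands you packets, the accounting itself is just a sum of packet lengths keyed by whichever endpoint is one of your local aliases. The Python sketch below assumes each packet has already been reduced to a `(source, destination, length)` tuple. The capture step is left out, and the addresses and the `account_per_ip` name are hypothetical.

```python
def account_per_ip(packets, local_ips):
    """Return {local_ip: {"in": bytes, "out": bytes}} for the given packets."""
    usage = {ip: {"in": 0, "out": 0} for ip in local_ips}
    for src, dst, length in packets:
        if src in usage:
            usage[src]["out"] += length  # traffic leaving a local alias
        if dst in usage:
            usage[dst]["in"] += length   # traffic arriving at a local alias
    return usage

# Hypothetical packets: (source IP, destination IP, length in bytes).
packets = [
    ("198.51.100.7", "192.0.2.10", 400),   # request to customer A
    ("192.0.2.10", "198.51.100.7", 1500),  # response from customer A
    ("192.0.2.11", "198.51.100.9", 900),   # response from customer B
    ("198.51.100.9", "203.0.113.5", 60),   # unrelated traffic, ignored
]
usage = account_per_ip(packets, ["192.0.2.10", "192.0.2.11"])
```

Feeding the per-IP totals into something like RRDTool at a fixed interval would give the rate graphs described above.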

Jake29
04-02-2002, 10:26 AM
Vexing, isn't it?!?!? When you start to dig a little, basing bandwidth on log files is extremely deficient. There have been a lot of complaints about cpanel's inaccuracies, and they are one reason I've been avoiding going that route. What I've been learning is that these aren't really so much deficiencies in cpanel as they are industry-wide systemic flaws in the way virtual hosting is done.

Jake

Starhost
04-02-2002, 10:38 AM
kindred,

You are talking about per-IP accounting. That would mean that EVERY customer on my server would need their own IP. And with the current situation, RIPE isn't providing that many IPs for this kind of stuff.

So the way you describe it, I wouldn't call it virtual hosting, but IP-based hosting.

kindred
04-02-2002, 11:00 AM
Originally posted by Starhost
kindred,

You are talking about per-IP accounting. That would mean that EVERY customer on my server would need their own IP. And with the current situation, RIPE isn't providing that many IPs for this kind of stuff.

So the way you describe it, I wouldn't call it virtual hosting, but IP-based hosting.

If you can show a need for the IP, then you should be able to get them.

Other than that, sounds like log parsing is the only other method.

Sorry for the mixup with words, from the original post I assumed there was the ability to account per IP.

webx
04-02-2002, 02:13 PM
OK, let me dig into it further.

One thing is sure: with name-based hosting, it is inefficient and very difficult to make accurate BW usage graphs. So many logs, and so much usage that is never logged.

But with IP-based hosting, how would you include the usage of sendmail/qmail, FTP, file uploads thru HTTP, mails generated by scripts, and if SSH is provided: any FTP/HTTP downloads on the server itself? (wget, lynx, ftp)

My guess is that not everything can be included. So this comes down to a well-calculated approximation.

And this would be for resellers on one box. For big companies offering dedicated servers, it's pretty easy to watch just one IP and bill the customer :(

Starhost
04-02-2002, 03:04 PM
On FreeBSD (for example) there are standard scripts for ipfw that calculate bandwidth per IP. And when everybody's site is running on its own IP, then ALL bandwidth generated by that host will be counted.

So all email has to be delivered to the same IP the website is using. That isn't that hard to calculate. The only thing there is still a need for is a way to calculate bandwidth with name-based hosting.

jakis
04-02-2002, 03:19 PM
I think vhost people are not that hopeless, but we need extra, non-standard settings. If your current software doesn't suit the job, use something else; don't get stuck with it. An FTP server can generate logs in CLF format, and an MTA like Postfix has log analyzers developed by third parties. In the end you need some programming effort to pull it all together.

Jake29
04-02-2002, 03:29 PM
Starhost:

Okay- so back to the original question. What would it take to set up some proxy so that each host could use a 10.x.x.x address? Then you would get the comprehensive auditing of an ip, but not consume a global ip address?

Jakis:

That still doesn't take care of things like output generated by CGI scripts, etc. Not to mention it gets convoluted when you start dealing with multiple POP accounts per domain etc.


PS- I tested, and you can bind more than 256 IPs per NIC on 2.4 kernels in Linux. I was afraid this was a problem, but it is not.

Jake29

webx
04-02-2002, 03:51 PM
I have a shared account on FreeBSD with SSH. I download big files using wget, lynx or ftp. All these downloads do not use my IP for web site. They use the main IP of the server.

:confused:

priyadi
04-03-2002, 12:21 AM
Originally posted by masood
I have a shared account on FreeBSD with SSH. I download big files using wget, lynx or ftp. All these downloads do not use my IP for web site. They use the main IP of the server.

:confused:

Linux is able to measure data transfer for each user ID (instead of IP address). Just program your bandwidth monitoring scripts to count data transfer initiated from inside the server by looking at the owner of the connection, not at its local IP address. I'm sure there has got to be something similar in FreeBSD.

bacid
04-03-2002, 02:26 AM
trafcount works under FreeBSD and it calculates bandwidth usage by user.

I believe it is hosted on SourceForge.

good luck

jakis
04-03-2002, 03:58 AM
My users run PHP/Perl scripts that fetch files from other servers. They cannot fetch a very big file because of the execution time limit, but the scripts fetch files on every damn visit, which results in bandwidth being charged to the main IP without anything being recorded in the log files. Most PHP/Perl runs without ExecCGI, so every file is downloaded under the same UID as the web server.

Starhost
04-03-2002, 04:11 AM
So the way to calculate the bandwidth is to monitor it for every user on the server and then add up the totals of all users running on the same account.

That gives you the data traffic used by that website. I'll see if I can develop a little script for it using PHP and FreeBSD. Wish me luck :-)

Jake29
04-03-2002, 05:04 AM
Priyadi:

PLEASE PLEASE PLEASE give me a pointer to something that tracks connections by user id on linux. Anything at all!!! I have searched all over google, etc to find anything on linux that does this. Even a technique to get that information to help get me started would be SOOOOOOO appreciated!!! Please please please help me out, just a little.

If I could get that, then real accurate accounting becomes almost.... possible :stickout

Jake

Jake29
04-03-2002, 06:00 AM
This looks great, but unfortunately, won't run on a modern linux kernel.

http://www2.empnet.com/ipacct/

Jake

Starhost
04-03-2002, 06:47 AM
Originally posted by Jake29
Starhost:

Okay- so back to the original question. What would it take to set up some proxy so that each host could use a 10.x.x.x address? Then you would get the comprehensive auditing of an ip, but not consume a global ip address?


You would just need to set up one LAN and then let all the traffic go to a default gateway. And that gateway is connected to the gateway of your ISP.

At least this is how it should be, I think. Then there is a problem: this will generate an enormous number of NAT requests, and I think the default gateway of your LAN won't be able to handle it. Therefore it will slow down, and that isn't a good thing. So I think that this isn't a very good solution. I must admit, I also thought of it.


I looked at trafcount for my server and I don't think it is a good fit, because my web server runs as httpd. So all HTTP traffic will be counted for the user httpd and not for the owner of the files that are transferred. So I guess the best thing you could do to calculate bandwidth is to use the logs of FTP and WWW and add a certain amount for all other traffic that you didn't handle. So just add 10%, for example. That is what I think is best. If you people have another idea about this, please let me know, so we can discuss it :-).

jakis
04-03-2002, 06:59 AM
Good idea. I should calculate major traffic like HTTP from its own log file, and other services that have no transfer log, like POP3, could be calculated on a per-user basis. One problem persists: what if a user runs HTTP scripts that fetch files from remote servers, like nph-proxy.cgi? How can I include this activity? Can I set up a local proxy to count this traffic?

priyadi
04-03-2002, 10:01 AM
Originally posted by Jake29
Priyadi:

PLEASE PLEASE PLEASE give me a pointer to something that tracks connections by user id on linux. Anything at all!!! I have searched all over google, etc to find anything on linux that does this. Even a technique to get that information to help get me started would be SOOOOOOO appreciated!!! Please please please help me out, just a little.

If I could get that, then real accurate accounting becomes almost.... possible :stickout

Jake

Sorry, I don't know any specific program that does that. But I know it is possible. And it is only possible for daemons that run as the user; for instance, it won't monitor per-user bandwidth usage for Apache, as Apache runs as a single userid.

Take a look at iptables man page on the option --uid-owner.
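
A minimal sketch of how that option could be used for accounting, under a few assumptions: a rule with the owner match and no target only increments its packet and byte counters, and those counters can be read back from the verbose listing. The Python below builds such a rule and parses a listing. The helper names, the UIDs and the sample listing are made up, and the exact wording of the owner match in the listing differs between iptables versions.

```python
import re

def counting_rule(uid):
    """Build an iptables command for a rule with no target: it only counts."""
    return ["iptables", "-A", "OUTPUT", "-m", "owner", "--uid-owner", str(uid)]

# Assumed row shape of "iptables -L OUTPUT -v -n -x" for such rules.
ROW_RE = re.compile(r'^\s*(\d+)\s+(\d+)\s+.*owner UID match (\d+)', re.IGNORECASE)

def bytes_per_uid(listing):
    """Map each UID found in the listing to its accumulated byte counter."""
    totals = {}
    for line in listing.splitlines():
        match = ROW_RE.match(line)
        if match:
            _packets, byte_count, uid = match.groups()
            totals[int(uid)] = totals.get(int(uid), 0) + int(byte_count)
    return totals

# Hypothetical listing for illustration; not captured from a real machine.
sample_listing = """\
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
    pkts      bytes target     prot opt in     out     source               destination
      12     34567            all  --  *      *       0.0.0.0/0            0.0.0.0/0            owner UID match 1001
       3       900            all  --  *      *       0.0.0.0/0            0.0.0.0/0            owner UID match 1002
"""
uid_totals = bytes_per_uid(sample_listing)
rule = counting_rule(1001)
```

The owner match applies to locally generated packets, so this counts what each user's processes send out. As noted above, anything served by Apache still lands on Apache's own UID.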

Starhost
04-03-2002, 10:17 AM
I don't think there is a real method to calculate the bandwidth which is being used through CGI scripts. What I would do is add a standard extra percentage to an account to cover such traffic.

Ahmad
04-03-2002, 12:42 PM
Originally posted by priyadi
A few thing to note for those who want to calculate bandwidth on HTTP traffic.

- The web server access_log doesn't record the size of HTTP requests. Most HTTP requests are small and therefore negligible; however, things like file uploads can be very significant.


As this is bandwidth to the server, it is downstream, so it might not be of any importance. The other points, however, are very true.

The numbers in log files are those sent in the HTTP response, and they can easily be tampered with in any CGI script. This might be exploited by experienced users who want to push large amounts of bandwidth (a proxy server or a downloads station): they create a script that pushes all the large files and reports 1 kilobyte for each, for example. For the visitors of the site, it will work perfectly, except that they will get strange progress bars for the downloads. The progress bar will reach 100% very soon, while the download will actually continue and take a longer time, so the user will not get an indication of how much of the file is left to download.

EDIT: Somehow, I posted this reply without reading the rest of the thread, sorry for that.

When I read the rest of the thread, I still find people asking about situations that are actually all about downstream (getting files from SSH, fetching files from other server by PHP and CGI scripts, .. etc).

Does that really matter?

webx
04-03-2002, 02:05 PM
Originally posted by Ahmad


As this is bandwidth to the server, it is downstream, so it might not be of any importance.

... situations that are actually all about downstream...

Does that really matter?

Yes. The "big brother" providing the dedicated server charges based on what passes through the LAN card. So it does matter.

jakis
04-03-2002, 02:23 PM
Users who write simple scripts that retrieve web pages/files from remote servers on every call are a true pain. Such scripts don't check whether the remote file has been modified or not. I have a user like this: webalizer reported 5GB/mo usage, but after moving his sites off the server, the bandwidth dropped by more than 10GB/mo.
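
For what it's worth, the missing check is a standard HTTP conditional GET: send an If-Modified-Since header with the timestamp of the cached copy, and a cooperating remote server answers 304 Not Modified with no body. A minimal Python sketch follows (the scripts in question would be PHP or Perl, so this only shows the idea). The URL and the function names are hypothetical, and it assumes the remote server honours the header.

```python
import urllib.error
import urllib.request
from email.utils import formatdate

def conditional_request(url, cached_mtime):
    """Build a GET that asks the remote server to skip the body if unchanged."""
    headers = {"If-Modified-Since": formatdate(cached_mtime, usegmt=True)}
    return urllib.request.Request(url, headers=headers)

def fetch_if_changed(url, cached_mtime):
    """Return the new body, or None when the server answers 304 Not Modified."""
    try:
        with urllib.request.urlopen(conditional_request(url, cached_mtime)) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:
            return None  # cached copy is still current; hardly any bytes moved
        raise

# Building the request does not touch the network; the URL is a placeholder.
req = conditional_request("http://example.com/feed.xml", 0)
```

A script using `fetch_if_changed` would keep serving its cached copy whenever it gets `None`, so repeat visits cost only a few hundred bytes of headers instead of the whole file.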

Starhost
04-03-2002, 02:47 PM
If you have such a user, I would just email him, explain what is going on, and say that it should be changed. Furthermore, only very experienced clients know how to use a PHP/CGI/Perl script to fetch really big files without them showing up in the HTTP logs.

webx
04-03-2002, 02:53 PM
Starhost,

You can only do that (asking customer what's going on?) if you have put something like that in your TOS as not allowed things.

Starhost
04-03-2002, 03:07 PM
Every TOS should state that when an account uses too many resources, you can report it.

Furthermore, how should you handle it?

webx
04-03-2002, 03:39 PM
Originally posted by Starhost

Furthermore, how should you handle it?

This is what we are trying to find out :confused: some technical solution to complete BW monitoring.

Starhost
04-03-2002, 04:03 PM
You will never be able to measure the actual BW a client transfers. At least that is what I'm thinking. Therefore I'm just processing the logs I have and adding a certain percentage on top.

los
04-03-2002, 05:55 PM
You can also create VLANs on your switches, then create a community for each VLAN and run a tool like MRTG, or this one I found that reads the MRTG community logs from the switch or router.

http://bjorn.swift.st/traffic/

just my 2 cents...

-carlos

Ahmad
04-04-2002, 02:18 AM
I see. I had the impression that it is only downstream you are charged for.

Thanks

priyadi
04-04-2002, 08:49 AM
Originally posted by Ahmad
I see. I had the impression that it is only downstream you are charged for.

Thanks

We are charged for bandwidth, both incoming and outgoing. Incoming bandwidth is not significant at the moment, however with things like web services, RDF, P2P, etc, it is going to be used more in the future.

It is only a matter of time before a user creates his own web directory by downloading the entire database from dmoz.org

...not that I'm giving them ideas :)