Check what's new in the Revision History at the end of this document. Thank you for (proof)reading this. If you have anything to tell me about it, don't hesitate to write to me: Goesta.Smekal@chello.at.
Good point! ;-) I do not claim to be the first, best or most comprehensive
author covering this topic. As Neo says in the movie 'The Matrix': 'I am
just another guy.' In fact, other people wrote about this earlier. Just
check the Related Pages. I also don't provide a 'turnkey solution'; there
are others for that. (The term comes from the computing stone age, when
computers got delivered by trucks. As far as I know it was an advertising
term for a system that was ready to use at the turn of the (power) key.)
What this paper covers are the fundamentals of traffic metering and one approach to storing the collected data for later processing.
This article is intended for an interested audience who like to poke around in the systems they use. If you are looking for a './configure && make && make install' approach, you are in the wrong place, so go now and find another solution.
Imagine the following situation: You are in control of a small (or maybe not that small) LAN connected to the Net via a leased line or a flat rate ISP. Your online time doesn't matter, but the transferred data does. They charge you for every byte above some limit.
To be able to control this situation, you need IP Traffic Accounting.
In other words, some system counting every bit that moves up or
down your leased line.
I was in such a situation, when we upgraded our Internet connection from
ADSL to a 2MBit leased line at my employer recently. Being responsible
for the whole IT system I wanted to know exactly how much data traverses
the line and what services are used most.
I will now describe the techniques I used to achieve this goal, the software I chose and the scripts I wrote.
                      +---------------+
------LAN------> eth0 | Linux Gateway | eth1 <------DMZ------
   office PCs         +---------------+        public access
                            eth2                  servers
                              ^
                              |
                              |
                      +--------------+
                      | ISP's router | --> leased line --> the Net
                      +--------------+
There is one ethernet segment hosting the office PCs. I call it the 'LAN'.
Another one forms a 'demilitarized zone' (DMZ), which means these hosts
are accessible from the Net via official IP addresses and hostnames, but
protected by the packet filtering gateway. Finally, that gateway is connected
to the Net via the ISP's router.
The gateway has three network interfaces, but only one of them (eth2 in the example) is passed by all traffic to and from the Net.
If you lack full control over your gateway, your accounting will not be accurate, since you have to tweak the gateway's configuration. So get root now ;-)
Alternatively, you can just put another box in front of your 'black box'-gateway
which will just pass all traffic on ...
In addition you need Perl, some DBMS (I use PostgreSQL for reasons explained in a later version of this doc) and a way to transfer data from the gateway to the DBMS host (ncftp might serve well). Apache and some nifty HTML forms, as well as gnuplot for graphs, are optional.
The data gets collected at regular intervals (by a script launched from
cron) and written to a file. Every night another script extracts the relevant
figures (Perl was designed to do exactly such things), puts them together
in a tabular ASCII file and transfers it to the host running the database.
There the file is read in by another Perl script, which stuffs the traffic
count into an SQL DB and (optionally) uses gnuplot to draw a nice daily
graph of the bandwidth utilization.
Finally, Perl and CGI help to create interactive web pages showing statistics about arbitrary periods.
To collect the byte count of various IP traffic types I create a new chain
called Accounting. This chain contains a line for every kind
of packet I want to count, but no targets, so the packet just traverses
the chain untouched.
Put the following lines somewhere at the beginning of your netfilter configuration file:
iptables -N Accounting
iptables -A Accounting -o eth2                              # upstream traffic
iptables -A Accounting -i eth2                              # downstream traffic
iptables -A Accounting -p tcp -m multiport --ports www       # HTTP requests and responses
iptables -A Accounting -p tcp -m multiport --ports pop3,smtp # e-Mail
Remember, in our example, eth2 is the interface connected
to the router. Replace it with your 'external' interface name.
The first two '-A' lines just count any packet going out that interface and in from it. Sum them up and you have got the overall traffic, regardless of protocol or direction.
In the last two lines we get somewhat specific. The multiport keyword matches any ports mentioned afterwards, be it source or destination port. So we count requests and responses in the same line.
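To illustrate the summing step, here is a minimal sketch that adds up the byte counters of the first two rules. It assumes the rule order shown above (upstream first, downstream second). The function names total_bytes and sample_listing are made up for this example, and the listing inside sample_listing is invented sample data in the layout that 'iptables -L Accounting -vxn' prints.

```shell
#!/bin/bash
# Sketch: add up the byte counters of the first two rules in the Accounting
# chain (upstream and downstream) to get the overall traffic.

# Reads 'iptables -L Accounting -vxn' style output on stdin, prints the sum.
total_bytes() {
    awk '
        # rule lines start with two numbers: packets, then bytes
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ {
            if (++rule <= 2) total += $2
        }
        END { printf "%.0f\n", total }
    '
}

# Made-up sample listing, for illustration only (not real measurements).
sample_listing() {
    cat <<'EOF'
Chain Accounting (2 references)
    pkts      bytes target     prot opt in     out     source               destination
    1200   150000            all  --  *      eth2    0.0.0.0/0            0.0.0.0/0
    1500   900000            all  --  eth2   *       0.0.0.0/0            0.0.0.0/0
     800   700000            tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            multiport ports 80
     100    50000            tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            multiport ports 110,25
EOF
}

sample_listing | total_bytes
```

On the gateway itself you would feed it the live counters instead, for example with 'iptables -L Accounting -vxn | total_bytes' (as root, and without -Z so the counters are not reset).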
If you want an ISP-like setup, counting traffic per host or subnet, just add something like:
iptables -A Accounting -s 192.168.1.10
if 192.168.1.10 is the host you need to watch. With subnets
it works alike: give the network address together with its netmask,
for example 192.168.1.0/24.
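Keep in mind that -s only matches packets sent by that address. A small sketch, assuming you want both directions for a single host and for a whole subnet (the /24 network is an assumed example, replace it with your own):

```shell
# Hypothetical addresses: replace them with your own host and subnet.
iptables -A Accounting -s 192.168.1.10      # packets sent by the host
iptables -A Accounting -d 192.168.1.10      # packets sent to the host
iptables -A Accounting -s 192.168.1.0/24    # packets sent by the whole subnet
iptables -A Accounting -d 192.168.1.0/24    # packets sent to the whole subnet
```

Every extra rule adds one more counter line to each snapshot, so the scripts further down would need matching columns.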
To activate the chain, the following two rules have to sit at the top of the FORWARD chain. Since -A appends, place them before any other FORWARD rule in your configuration:
iptables -A FORWARD -i eth2 -j Accounting
iptables -A FORWARD -o eth2 -j Accounting
If any rule remains before those, packets matching it don't get counted.
And if anybody knows how to put this into one line, let me know. It would
be a lot more beautiful.
To get our job done we need to store the data counted. This is done in three stages. First, at regular intervals, the iptables counters are appended to a file:
#!/bin/bash
date >> /root/scripts/Accounting.dat
iptables -L -Z Accounting -vxn >> /root/scripts/Accounting.dat
This is run hourly by cron. Feel free to change the interval to your needs.
I like the hourly count because I can watch the peaks during daytime.
The second line writes a timestamp to the file while the last one reads
data from iptables, resetting chain counts at the same moment. -vxn
means 'verbose' (bytecount), 'exact numbers' (not 125k but 125130)
and 'no name resolution' (don't care about hostnames).
After this is done we can continue with more sophisticated processing:
We now have created a file containing a number of data lines for every snapshot. This was easy to get, but is hard to process. I will show you a script that converts it into tabular form and puts the resulting file to the host running the SQL database.
This is done for security purposes, because I don't feel too well with a DBMS running on the firewall.
#!/usr/bin/perl
#
# Tabelizer
#
# v0.2 30.07.2002 (c) Goesta Smekal
#
# tabelize IPTABLES Output
system ('mv /root/scripts/Accounting.tbl /root/scripts/Accounting.tbl-bak');
open (OUTFILE,">/root/scripts/Accounting.tbl") || die "Output file not writeable: $!\n";
open (INFILE,"</root/scripts/Accounting.dat") || die "Input file not readable: $!\n";
print OUTFILE "Time\tDay\tNDay\tMonth\tYear\tupstream\tdownstream\tweb\tmail\n";
while (<INFILE>) { # read the snapshot file line by line
    # timestamp line written by 'date', e.g. "Tue Jul 30 23:00:01 CEST 2002";
    # \S+ accepts any time zone name, not only CEST
    if (/^(\w{3})\s+(\w{3})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})\s+\S+\s+(\d{4})/)
    {
        $Day=$1;
        $Month=$2;
        $Nday=$3;
        $Time=$4;
        $Year=$5;
        print OUTFILE "\n$Time\t$Day\t$Nday\t$Month\t$Year\t";
        next;
    }
    # counter line from iptables: packet count first, then byte count
    elsif (/^\s*(\d+)\s+(\d+)\s/)
    {
        $Packets=$1;
        $Bytes=$2;
        print OUTFILE "$Bytes\t";
        next;
    }
    else {next} # chain header and column titles
}
print OUTFILE "\n";
close (OUTFILE);
close (INFILE);
system ('mv /root/scripts/Accounting.dat /root/scripts/Accounting.dat-bak');
# system() returns 0 on success, so test for that explicitly
system ('ncftpput -E -u username -p **** db.host accounting /root/scripts/Accounting.tbl') == 0 or die "Cannot put file to destination host: exit status $?\n";
I will explain the details later. For now, just look at the last line:
Replace username with a valid username on the destination
host and **** with the corresponding password. db.host
should contain the name of the host the file is transferred to.
I run this script every day at 23:05 from cron too. That way my database
gets updated daily and the worst loss after a system crash is one day's
traffic count, which can usually be interpolated from existing data with
little error.
Let's assume you have got an SQL database containing a table for your accounting
data. I use PostgreSQL (the reasons
I chose this would be enough to start a flame war on MySQL vs. PostgreSQL, so
I don't mention them ;-) ). My table definition reads:
          Table "accounting"
  Attribute   |     Type     | Modifier
--------------+--------------+----------
 A_Time       | time         | not null
 A_LDay       | character(3) | not null
 A_NDay       | smallint     | not null
 A_Month      | character(3) | not null
 A_Year       | smallint     | not null
 A_upstream   | integer      |
 A_downstream | integer      |
 A_web        | integer      |
 A_mail       | integer      |
To put my data in there I use the Perl module Pg which can be installed easily from CPAN. (details later, I just try to satisfy the impatient)
#!/usr/bin/perl
#
# Accounting-SQL
#
# stuffs IP Accounting Data into PostgreSQL DB
#
use Pg;
open (INFILE,"</intranet/htdocs/accounting/Accounting.tbl") || die "Input file not readable: $!\n";
open (OUTFILE,">/tmp/plottemp") || die "Cannot write to Outputfile: $!\n";
$pghost = 'localhost';
$pgport = 5432;
$dbname = 'syswatch';
$login = 'syswatch';
$pwd = '*****';
$conn = Pg::setdbLogin($pghost, $pgport, $pgoptions, $pgtty, $dbname, $login, $pwd);
( PGRES_CONNECTION_OK eq $conn->status )
and print "Pg::connectdb ........... ok\n"
or die "Pg::connectdb ........... not ok: ", $conn->errorMessage ,"\n";
while (<INFILE>)
{
if (/^Time/)
{ next } # first line, only for Humans !
elsif (/(\d{2}:\d{2}:\d{2})\t(\w+)\t(\d+)\t(\w+)\t(\d{4})\t(\d+)\t(\d+)\t(\d+)\t(\d+)/) # time, date fields, then the four counters
{
$sum=($6+$7)/1048576; # division converts to MB (optional)
$up=$6/1048576; $down=$7/1048576; $web=$8/1048576; $mail=$9/1048576;
$time=$1;
unless ($day)
{ # store first day found for filename
# so we don't get confused by morning
$day=$3; $lday=$2;
$month=$4; $year=$5;
}
print OUTFILE "$time\t$sum\t$up\t$down\t$web\t$mail\n";
$result = $conn->exec("INSERT INTO accounting VALUES ('$1', '$2', $3, '$4', $5, $6, $7, $8, $9)");
}
}
close (INFILE);
close (OUTFILE);
open (FILE,"</root/Scripts/gnuplot-options") || die "Gnuplot control file inaccessible: $!\n";
open (NEWFILE,">/tmp/tempfile") || die "Tempfile not accessible: $!\n";
while (<FILE>)
{
s/^set title.+/set title "IP Traffic in MB, am $lday, $day\.$month $year"/;
s/Tue/Die/; s/Wed/Mit/; s/Thu/Don/; s/Fri/Fre/; s/Sat/Sam/; s/Sun/Son/;
print NEWFILE $_;
}
close (FILE);
close (NEWFILE);
system ('gnuplot /tmp/tempfile');
system ("mv /tmp/plot.png /intranet/htdocs/accounting/IP-Traffic-$year-$month-$day-$lday.png");
system ('rm /tmp/plottemp /tmp/tempfile');
system ('mv /intranet/htdocs/accounting/Accounting.tbl /intranet/htdocs/accounting/raw-data');
Well, this script is plain ugly! I am just happy it works. Beautify it if you like (split would ease things up a lot); I just have not spent the time yet. I promise it will be commented and streamlined in the HOWTO.
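To show what splitting on tabs instead of one long regex could look like, here is a rough sketch. It is written in shell with awk rather than Perl, it is not the script I run, and the function name table_to_mb is made up for this example. It assumes the column order written by the Tabelizer (time, day, day of month, month, year, then the four byte counters) and rounds to two decimals, which the Perl script does not do.

```shell
#!/bin/bash
# Sketch: split one tab-separated line of Accounting.tbl into fields and
# convert the four byte counters to MB.

# Reads Accounting.tbl style lines on stdin; prints time, total, up, down,
# web and mail (in MB, two decimals), tab-separated.
table_to_mb() {
    awk -F '\t' '
        $1 == "Time" || NF < 9 { next }    # skip header and incomplete lines
        {
            mb = 1048576
            printf "%s\t%.2f\t%.2f\t%.2f\t%.2f\t%.2f\n",
                $1, ($6 + $7) / mb, $6 / mb, $7 / mb, $8 / mb, $9 / mb
        }
    '
}

# Made-up sample line, for illustration only (not real measurements).
printf '23:00:01\tTue\t30\tJul\t2002\t1048576\t3145728\t2097152\t524288\t\n' | table_to_mb
```

The output has the same column layout as the /tmp/plottemp file that gnuplot reads. In Perl, split(/\t/, $_) would give you the same fields in a list.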
As a side effect, you may have noticed, the Accounting-SQL script also creates a daily data graph using gnuplot. This is optional of course, but produces nifty results. All you need for that is gnuplot and an options file like this:
set term png small color
set xdata time
set timefmt "%H:%M:%S"
set output "/tmp/plot.png"
set title xxx changed by script xxx
set data style linespoints
set format x "%H:%M"
plot "/tmp/plottemp" using 1:2 title "total",\
"/tmp/plottemp" using 1:3 title "up",\
"/tmp/plottemp" using 1:4 title "down",\
"/tmp/plottemp" using 1:5 title "web",\
"/tmp/plottemp" using 1:6 title "mail"
I will continue after some reasonable amount of sleep ... stand by !