IP Traffic Accounting - mini HOWTO

DRAFT

This is the first draft of a work in progress. I am writing a document describing how to set up a couple of scripts that monitor the amount of IP data flowing to and from your network.

Copyright 2002, Goesta Smekal
latest changes: Aug. 25th 2002

Check what's new in the Revision History at the end of this document. Thank you for (proof)reading this. If you have anything to tell me about it, don't hesitate to write to me: Goesta.Smekal@chello.at.

Introduction

Why did I do this?

Good point! ;-) I do not claim to be the first/best/most comprehensive/... author covering this topic. As Neo says in the movie 'The Matrix': 'I am just another guy.' In fact, other people have written about this before; just check the Related Pages. I also don't provide a 'turnkey solution' (a term from the computing stone age, when computers were delivered by truck; as far as I know it was an advertising term for a system that was ready to use at the turn of the (power) key). Others provide those.

What this paper covers are the fundamental basics of traffic metering and one approach to storing the collected data for later processing.

This article is intended for an interested audience who like to poke around in the systems they use. If you are looking for a './configure && make && make install' approach, you are in the wrong place, so go now and find another solution.

Technical considerations (bad title)

Imagine the following situation: You are in control of a small (or maybe not that small) LAN connected to the Net via a leased line or a flat-rate ISP. Your online time doesn't matter, but the transferred data does: your ISP charges you for every byte above some limit.

To be able to control this situation, you need IP Traffic Accounting, in other words, a system that counts every bit moving up or down your leased line.

I was in exactly this situation when my employer recently upgraded our Internet connection from ADSL to a 2 MBit leased line. Being responsible for the whole IT system, I wanted to know exactly how much data traverses the line and which services are used most.

I will now describe the techniques I used to achieve this goal, the software I chose and the scripts I wrote.

Prerequisites

First of all, it is essential that you have full control over the one point all your data has to pass through. Generally this is your Internet gateway. In this document I assume a setup like this:

Some ASCII art:
                     +---------------+
------LAN------>eth0 | Linux Gateway | eth1<------DMZ------
   office PCs        +---------------+       public access
                           eth2                 servers
                            ^
                            |
                            |
                     +--------------+
                     | ISP's router | -> leased line --> the Net
                     +--------------+

There is one Ethernet segment hosting the office PCs, which I call the 'LAN'. Another one forms a 'demilitarized zone' (DMZ), which means that these hosts are accessible from the Net via official IP addresses and hostnames but are protected by the packet-filtering gateway. Finally, the gateway is connected to the Net via the ISP's router.

The gateway has three network interfaces, but only one of them (eth2 in the example) is passed by all traffic to and from the Net.

If you lack full control over your gateway, you cannot set up accurate accounting, since you have to tweak the gateway's configuration. So get root now ;-)

Alternatively, you can put another Linux box in front of your 'black box' gateway; it just passes all traffic on and does the counting along the way.

Furthermore, you need Perl, a DBMS (I use PostgreSQL, for reasons explained in a later version of this document) and a way to transfer data from the gateway to the DBMS host (ncftp might serve well). Apache and some nifty HTML forms, as well as gnuplot for graphs, are optional.

Basic Principles

To measure the data actually transferred, we will use Linux's IPTables. You should be using them for packet filtering and NAT on the gateway anyway. A special chain will be created to count the various types of data. I count the overall upstream and downstream traffic as well as web and mail traffic. Other services are forbidden to the citizens of my LAN (a draconian approach, I know).

The data is collected at regular intervals (by a script launched from cron) and written to a file. Every night another script extracts the relevant figures (Perl was designed to do exactly this kind of thing), puts them together in a tabular ASCII file and transfers it to the host running the database.

There the file is read in by another Perl script, which stuffs the traffic counts into an SQL database and (optionally) uses gnuplot to draw a nice daily graph of the bandwidth utilization.

Finally, Perl and CGI help create interactive web pages showing statistics for arbitrary periods.

The Scripts

Collecting Traffic


For the time being, I assume you are familiar with IPTables. There is plenty of documentation at the project homepage (netfilter.org), so I will concentrate on the essentials.

To collect the byte counts of various types of IP traffic, I create a new chain called Accounting. This chain contains a rule for every kind of packet I want to count, but none of these rules has a target, so every packet traverses the whole chain untouched and then returns to the chain it came from.

Put the following lines somewhere at the beginning of your netfilter configuration file:

iptables -N Accounting
iptables -A Accounting -o eth2 # upstream traffic
iptables -A Accounting -i eth2 # downstream traffic
iptables -A Accounting -p tcp -m multiport --ports www # HTTP requests and responses
iptables -A Accounting -p tcp -m multiport --ports pop3,smtp # e-Mail

Remember, in our example, eth2 is the interface connected to the router. Replace it with your 'external' interface name.

The first two '-A' lines count every packet leaving through that interface and every packet coming in from it, respectively. Sum them up and you have the overall traffic, regardless of protocol or direction.

In the last two lines we get somewhat more specific. The multiport match with '--ports' matches any of the listed ports, be it the source or the destination port, so requests and responses are counted by the same rule.
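To make the counting behaviour concrete, here is a small Python model of the Accounting chain. It is a sketch, not netfilter: the packet representation (a dict with the hypothetical keys `in`, `out`, `proto`, `sport`, `dport` and `length`) and the port numbers standing in for the service names are assumptions. It shows that every matching rule adds to its own counters while the packet moves on untouched, and that the multiport rule counts both the request and the response.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# Hypothetical packet model: a dict with the keys "in", "out", "proto",
# "sport", "dport" and "length" (in bytes). Real netfilter sees much more.
Packet = Dict[str, object]


@dataclass
class Rule:
    """An accounting rule without a target: it only counts what it matches."""
    name: str
    match: Callable[[Packet], bool]
    packets: int = 0
    byte_count: int = 0


def multiport(ports: Set[int]) -> Callable[[Packet], bool]:
    """Like '-p tcp -m multiport --ports ...': source OR destination port."""
    return lambda p: p["proto"] == "tcp" and (p["sport"] in ports or p["dport"] in ports)


# The four Accounting rules; 80, 110 and 25 stand in for www, pop3 and smtp.
accounting: List[Rule] = [
    Rule("upstream", lambda p: p["out"] == "eth2"),
    Rule("downstream", lambda p: p["in"] == "eth2"),
    Rule("web", multiport({80})),
    Rule("mail", multiport({110, 25})),
]


def traverse(packet: Packet) -> None:
    """No rule has a target, so the packet passes every rule in turn."""
    for rule in accounting:
        if rule.match(packet):
            rule.packets += 1
            rule.byte_count += packet["length"]


# A web response coming in from the Net and a web request going out to it.
traverse({"in": "eth2", "out": "eth0", "proto": "tcp", "sport": 80, "dport": 1025, "length": 1500})
traverse({"in": "eth0", "out": "eth2", "proto": "tcp", "sport": 1025, "dport": 80, "length": 400})

for rule in accounting:
    print(f"{rule.name:10} {rule.packets} packets, {rule.byte_count} bytes")
```

Upstream and downstream add up to the 1900 bytes that crossed eth2, while the web counter overlaps with both of them; the service counters therefore tell you which share of the total a service accounts for and must not be added to it.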

If you want an ISP-like setup that counts traffic from individual hosts or subnets, just add something like:

iptables -A Accounting -s 192.168.1.10

Here 192.168.1.10 is the host you need to watch. Subnets work alike, using the address/mask notation, e.g. '-s 192.168.1.0/24'.
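A subnet is written in address/prefix notation, for example `iptables -A Accounting -s 192.168.1.0/24`. As a quick way to check which source addresses such a rule would count, the following sketch uses only Python's standard `ipaddress` module; the addresses are examples, not part of the setup described here.

```python
import ipaddress

# Example network as it might be given to 'iptables -A Accounting -s ...'.
subnet = ipaddress.ip_network("192.168.1.0/24")

for host in ("192.168.1.10", "192.168.1.254", "192.168.2.10"):
    status = "matches" if ipaddress.ip_address(host) in subnet else "does not match"
    print(f"{host} {status} -s {subnet}")

# A /24 covers 256 addresses, from .0 to .255.
print(subnet.num_addresses, "addresses, from", subnet[0], "to", subnet[-1])
```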

To activate the chain, the following rules have to be the first ones in the FORWARD chain, so add them before any other FORWARD rule in your configuration:

iptables -A FORWARD -i eth2 -j Accounting
iptables -A FORWARD -o eth2 -j Accounting

If any rule with a terminating target such as ACCEPT or DROP comes before them, packets matching that rule never reach the Accounting chain and don't get counted. And if anybody knows how to put this into one line, let me know. It would be a lot more beautiful.
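Why the jumps have to come first can be shown with a simplified first-match model of the FORWARD chain in Python. This is only a sketch under stated assumptions: packets are dicts with the hypothetical keys `in`, `out` and `state`, and the ACCEPT rule for established connections is merely an example of a rule commonly found in FORWARD. A terminating target ends the walk, so a reply accepted before the jumps is never counted.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical packet model: a dict with "in", "out" and "state" keys.
Packet = Dict[str, str]
# (match, target) where target is "ACCEPT", "DROP" or the chain "Accounting".
ForwardRule = Tuple[Callable[[Packet], bool], str]


def forward(packet: Packet, rules: List[ForwardRule]) -> Tuple[Optional[str], bool]:
    """First-match walk of FORWARD; returns (verdict, counted)."""
    counted = False
    for match, target in rules:
        if not match(packet):
            continue
        if target == "Accounting":
            counted = True  # the packet is counted, then returns to FORWARD
            continue
        return target, counted  # ACCEPT or DROP ends the traversal
    return None, counted  # fell off the end: the chain policy decides


def from_net(p: Packet) -> bool:
    return p["in"] == "eth2"


def to_net(p: Packet) -> bool:
    return p["out"] == "eth2"


def established(p: Packet) -> bool:
    # Stands for a typical '-m state --state ESTABLISHED -j ACCEPT' rule.
    return p["state"] == "ESTABLISHED"


jumps_first = [(from_net, "Accounting"), (to_net, "Accounting"), (established, "ACCEPT")]
accept_first = [(established, "ACCEPT"), (from_net, "Accounting"), (to_net, "Accounting")]

reply = {"in": "eth2", "out": "eth0", "state": "ESTABLISHED"}
print("jumps first: ", forward(reply, jumps_first))
print("ACCEPT first:", forward(reply, accept_first))
```

In both orders the reply is accepted, but only when the jumps come first does it show up in the counters.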

Storing the Count

To get our job done, we need to store the collected counts. This is done in three stages. First, at regular intervals, the IPTables counters are appended to a file:

#!/bin/bash
date >> /root/scripts/Accounting.dat
iptables -L -Z Accounting -vxn >> /root/scripts/Accounting.dat

This is run hourly by cron. Feel free to adjust the interval to your needs; I like the hourly count because it shows me the peaks during the day.

The second line writes a timestamp to the file, while the last one reads the counters from iptables and resets them in the same call. -vxn means 'verbose' (show packet and byte counters), 'exact' (125130 instead of 125K) and 'numeric' (no name resolution; we don't care about hostnames).
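To see what the hourly script actually appends, here is an illustrative snapshot (made-up numbers; the exact column layout may differ between iptables versions) together with a minimal Python sketch that extracts the timestamp and the exact byte counter of each rule. The names `parse_snapshot`, `DATE_RE` and `COUNTER_RE` are hypothetical; the time-zone field is matched loosely so that both CEST and CET (winter time) stamps are recognized.

```python
import re
from typing import List, Tuple

# Illustrative snapshot as appended by the hourly script. The numbers are
# made up, and the exact column layout may differ between iptables versions.
SNAPSHOT = """\
Sun Aug 25 14:00:01 CEST 2002
Chain Accounting (2 references)
    pkts      bytes target     prot opt in     out     source               destination
    4711   512345            all  --  *      eth2    0.0.0.0/0            0.0.0.0/0
    5230  6234567            all  --  eth2   *       0.0.0.0/0            0.0.0.0/0
    3100  4100000            tcp  --  *      *       0.0.0.0/0            0.0.0.0/0           multiport ports 80
     420   350000            tcp  --  *      *       0.0.0.0/0            0.0.0.0/0           multiport ports 110,25
"""

# 'date' output: weekday, month, day, time, any time zone, year.
DATE_RE = re.compile(r"^(\w{3})\s+(\w{3})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})\s+\S+\s+(\d{4})")
# Rule lines start with the packet and byte counters (exact thanks to -x).
COUNTER_RE = re.compile(r"^\s*(\d+)\s+(\d+)\s")


def parse_snapshot(text: str) -> Tuple[Tuple[str, ...], List[int]]:
    """Return the timestamp fields and the byte counter of every rule."""
    stamp: Tuple[str, ...] = ()
    byte_counts: List[int] = []
    for line in text.splitlines():
        date = DATE_RE.match(line)
        if date:
            stamp = date.groups()
            continue
        counters = COUNTER_RE.match(line)
        if counters:
            byte_counts.append(int(counters.group(2)))
    return stamp, byte_counts


stamp, byte_counts = parse_snapshot(SNAPSHOT)
print(stamp)
print(byte_counts)
```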
After this is done, we can continue with more sophisticated processing:

Transferring the results

We have now created a file containing several lines for every snapshot. This was easy to get, but is hard to process. I will show you a script that converts it into tabular form and transfers the resulting file to the host running the SQL database.

This is done for security reasons, because I don't feel comfortable with a DBMS running on the firewall.

#! /usr/bin/perl
#
# Tabelizer
#
# v0.2 30.07.2002 (c) Goesta Smekal
#
# tabelize IPTABLES Output

system ('mv /root/scripts/Accounting.tbl /root/scripts/Accounting.tbl-bak');

open (OUTFILE,">/root/scripts/Accounting.tbl") || die "Output file not writeable: $!\n";
open (INFILE,"</root/scripts/Accounting.dat") || die "Input file not readable: $!\n";

print OUTFILE "Time\tWeekday\tDay\tMonth\tYear\tupstream\tdownstream\tweb\tmail\n";

while (<INFILE>) { # read Accounting.dat line by line

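# match the timestamp written by 'date', e.g. "Sun Aug 25 14:00:01 CEST 2002";
# any time zone abbreviation is accepted, so CET (winter time) works as well
if (/(^\w{3})\s+(\w{3})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})\s+\S+\s+(\d{4})/)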
{
$Day=$1;
$Month=$2;
$Nday=$3;
$Time=$4;
$Year=$5;
print OUTFILE "\n$Time\t$Day\t$Nday\t$Month\t$Year\t";
next;
}

elsif (/\s+(\d+)\s+(\d+)\s/)
{
$Packets=$1;
$Bytes=$2;

print OUTFILE "$Bytes\t";
next;
}
else {next}

}

print OUTFILE "\n";
close (OUTFILE);
close (INFILE);

system ('mv /root/scripts/Accounting.dat /root/scripts/Accounting.dat-bak');
# system() returns the command's exit status, which is 0 on success
system ('ncftpput -E -u username -p **** db.host accounting /root/scripts/Accounting.tbl') == 0 or die "Cannot put file to destination host (status $?)\n";

I will explain the details later. Just a note on the last line:

Replace username with a valid username on the destination host and **** with the corresponding password. db.host is the name of the host the file is transferred to, and accounting is the target directory on that host.

I run this script every day at 23:05 from cron too. That way my database gets updated daily and the worst loss after a system crash is one day's traffic count, which can usually be interpolated from existing data with little error.

Feeding the database

Let's assume you have an SQL database containing a table for your accounting data. I use PostgreSQL (the reasons I chose it would be enough to start a flame war on MySQL vs. PostgreSQL, so I won't mention them ;-) ). My table definition reads:

           Table "accounting"
  Attribute   |     Type     | Modifier
--------------+--------------+----------
 A_Time       | time         | not null
 A_LDay       | character(3) | not null
 A_NDay       | smallint     | not null
 A_Month      | character(3) | not null
 A_Year       | smallint     | not null
 A_upstream   | integer      |
 A_downstream | integer      |
 A_web        | integer      |
 A_mail       | integer      |
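A quick worked check shows whether the integer columns are large enough. PostgreSQL's integer is a signed 4-byte type with a maximum of 2147483647. The sketch below assumes that the 2 MBit line means 2,000,000 bit/s per direction and computes the most bytes a single counter can collect per interval at full load.

```python
# PostgreSQL's integer type is a signed 4-byte value.
INT4_MAX = 2**31 - 1  # 2147483647

# Assumption: "2 MBit" taken as 2,000,000 bit/s in each direction.
LINE_BITS_PER_SECOND = 2_000_000


def max_bytes(interval_seconds: int) -> int:
    """Most bytes one counter can collect in one interval at full line speed."""
    return LINE_BITS_PER_SECOND // 8 * interval_seconds


for label, seconds in (("hourly", 3600), ("daily", 86400)):
    worst = max_bytes(seconds)
    print(f"{label:6}: at most {worst:,} bytes, fits into integer: {worst <= INT4_MAX}")
```

Hourly snapshots (at most 900,000,000 bytes) therefore fit comfortably even at full load, but if you switch to a daily interval, a bigint column would be needed.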

To put my data in there, I use the Perl module Pg, which can easily be installed from CPAN (details later; I am just trying to satisfy the impatient).

#!/usr/bin/perl

#
# Accounting-SQL
#
# stuffs IP Accounting Data into PostgreSQL DB
#

use Pg;

open (INFILE,"</intranet/htdocs/accounting/Accounting.tbl") || die "Input file not readable: $!\n";
open (OUTFILE,">/tmp/plottemp") || die "Cannot write to Outputfile: $!\n";

$pghost = 'localhost';
$pgport = 5432;
$dbname = 'syswatch';
$login = 'syswatch';
$pwd = '*****';

$conn = Pg::setdbLogin($pghost, $pgport, $pgoptions, $pgtty, $dbname, $login, $pwd);

( PGRES_CONNECTION_OK eq $conn->status )
and print "Pg::connectdb ........... ok\n"
or die "Pg::connectdb ........... not ok: ", $conn->errorMessage ,"\n";

while (<INFILE>)
{

if (/^Time/)
{ next } # first line, only for Humans !
elsif (/(\d{2}[:]\d{2}[:]\d{2})\t(\w+)\t(\d+)\t(\w+)\t(\d{4})\t(\d+)\t(\d+)\t(\d+)\t(\d+)/) # time, weekday, day, month, year, then the up, down, web and mail byte counts
{
$sum=($6+$7)/1048576; # division converts to MB (optional)
$up=$6/1048576; $down=$7/1048576; $web=$8/1048576; $mail=$9/1048576;
$time=$1;
unless ($day)
{ # store first day found for filename
# so we don't get confused by morning
$day=$3; $lday=$2;
$month=$4; $year=$5;
}
print OUTFILE "$time\t$sum\t$up\t$down\t$web\t$mail\n";
$result = $conn->exec("INSERT INTO accounting VALUES ('$1', '$2', $3, '$4', $5, $6, $7, $8, $9)");
}
}

close (INFILE);
close (OUTFILE);

open (FILE,"</root/Scripts/gnuplot-options") || die "Gnuplot control file inaccessible: $!\n";
open (NEWFILE,">/tmp/tempfile") || die "Tempfile not accessible: $!\n";

while (<FILE>)
{
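# the graph title is German ("am" = "on"); the next line turns the
# English weekday abbreviations from 'date' into German ones
s/^set title.+/set title "IP Traffic in MB, am $lday, $day\.$month $year"/;
s/Tue/Die/; s/Wed/Mit/; s/Thu/Don/; s/Fri/Fre/; s/Sat/Sam/; s/Sun/Son/;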
print NEWFILE $_;
}

close (FILE);
close (NEWFILE);

system ('gnuplot /tmp/tempfile');
system ("mv /tmp/plot.png /intranet/htdocs/accounting/IP-Traffic-$year-$month-$day-$lday.png");
system ('rm /tmp/plottemp /tmp/tempfile');
system ('mv /intranet/htdocs/accounting/Accounting.tbl /intranet/htdocs/accounting/raw-data');

Well, this script is plain ugly! I am just happy it works. Beautify it if you like (split would simplify things a lot); I just haven't spent the time yet. I promise it will be commented and streamlined in the HOWTO.
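Following the hint that split would ease things up, here is a minimal sketch of the same per-line processing in Python (a separate illustration, not the Perl script rewritten): each line of Accounting.tbl is split on tabs, converted to MB for the plot file and turned into a tuple for the INSERT. The helper names `parse_row`, `plot_line` and `insert_params` are hypothetical, and the sketch rounds the MB values to two decimals, which the Perl script does not.

```python
from typing import Dict, Optional, Tuple

MB = 1048576  # bytes per MB, the same divisor as in the Perl script

# Column names for one line of Accounting.tbl; the first four counters are
# assumed to be upstream, downstream, web and mail, as in the Tabelizer.
FIELDS = ["time", "weekday", "day", "month", "year", "up", "down", "web", "mail"]


def parse_row(line: str) -> Optional[Dict[str, str]]:
    """Split one line on tabs; return None for the header and blank lines."""
    cells = line.rstrip("\t\n").split("\t")
    if len(cells) < len(FIELDS) or cells[0] == "Time":
        return None
    return dict(zip(FIELDS, cells))  # extra counters, if any, are ignored


def plot_line(row: Dict[str, str]) -> str:
    """Gnuplot data line: time, total, up, down, web and mail in MB."""
    up, down, web, mail = (int(row[k]) / MB for k in ("up", "down", "web", "mail"))
    values = (up + down, up, down, web, mail)
    return "\t".join([row["time"]] + [f"{v:.2f}" for v in values])


def insert_params(row: Dict[str, str]) -> Tuple:
    """Values for 'INSERT INTO accounting VALUES (...)' in column order."""
    return (row["time"], row["weekday"], int(row["day"]), row["month"], int(row["year"]),
            int(row["up"]), int(row["down"]), int(row["web"]), int(row["mail"]))


row = parse_row("14:00:01\tSun\t25\tAug\t2002\t1048576\t3145728\t2097152\t524288\t\n")
print(plot_line(row))
print(insert_params(row))
```

Handing such a tuple to a database driver's parameter substitution, rather than interpolating the values into the SQL string, leaves the quoting to the driver.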

As you may have noticed, the script also creates a daily graph of the data using gnuplot as a side effect. This is optional, of course, but produces nifty results. All you need for it is gnuplot and an options file like this:

set term png small color
set xdata time
set timefmt "%H:%M:%S"
set output "/tmp/plot.png"
set title xxx changed by script xxx
set data style linespoints
set format x "%H:%M"

plot "/tmp/plottemp" using 1:2 title "total",\
"/tmp/plottemp" using 1:3 title "up",\
"/tmp/plottemp" using 1:4 title "down",\
"/tmp/plottemp" using 1:5 title "web",\
"/tmp/plottemp" using 1:6 title "mail"

Some nice queries

To simplify processing, I created some views in the database that show daily and monthly sums. The monthly sums are used to check our ISP's figures, the daily ones to see whether strange things go on at weekends, for instance.
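The views themselves are not shown here yet, so the following is only a sketch of what daily and monthly sums could look like. It uses SQLite through Python's built-in sqlite3 module as a stand-in for PostgreSQL, a few made-up rows, and the hypothetical view names `daily_sums` and `monthly_sums`. The CREATE VIEW statements are plain SQL and should carry over to PostgreSQL, but treat that as an assumption.

```python
import sqlite3

# SQLite stands in for PostgreSQL; the view names daily_sums and
# monthly_sums are hypothetical. The table follows the layout shown above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounting (
    A_Time TEXT NOT NULL, A_LDay TEXT NOT NULL, A_NDay INTEGER NOT NULL,
    A_Month TEXT NOT NULL, A_Year INTEGER NOT NULL,
    A_upstream INTEGER, A_downstream INTEGER, A_web INTEGER, A_mail INTEGER
);
CREATE VIEW daily_sums AS
    SELECT A_Year, A_Month, A_NDay, A_LDay,
           SUM(A_upstream) AS up, SUM(A_downstream) AS down,
           SUM(A_web) AS web, SUM(A_mail) AS mail
    FROM accounting GROUP BY A_Year, A_Month, A_NDay, A_LDay;
CREATE VIEW monthly_sums AS
    SELECT A_Year, A_Month,
           SUM(A_upstream) AS up, SUM(A_downstream) AS down,
           SUM(A_web) AS web, SUM(A_mail) AS mail
    FROM accounting GROUP BY A_Year, A_Month;
""")

# A few made-up hourly rows (bytes) to exercise the views.
rows = [
    ("13:00:01", "Sat", 24, "Aug", 2002, 100, 900, 500, 50),
    ("14:00:01", "Sat", 24, "Aug", 2002, 200, 800, 400, 60),
    ("14:00:01", "Sun", 25, "Aug", 2002, 300, 700, 300, 70),
]
conn.executemany("INSERT INTO accounting VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)

for day in conn.execute("SELECT * FROM daily_sums ORDER BY A_NDay"):
    print(day)
print(conn.execute("SELECT * FROM monthly_sums").fetchall())
```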

I will continue after some reasonable amount of sleep ... stand by!

Examples

Related Pages

Listed here are some web pages I found while researching the topic. Some contain basic information, others offer a complete solution. Choose:
LWN: Patch: IP Traffic Accounting with NetFilter + ULOG - an article on the topic
X/OS Experts in Open Systems BV - some nice basics, including the flow of packets with 'ipfwadm'; slightly outdated ;-)
ipac Linux 2.0 and 2.2 ip accounting package - a complete solution, similar to what I describe here
iam - iptables accounting monster - another complete approach
Accounting and Measurement of Internet Traffic - background material
IP Accounting (for Linux-2.0) - a chapter from the Net-HOWTO, also covering ipfwadm

Revision History

25th Aug 2002:
added 'Revision History' section
added 'Related Pages' section
wrote some more introductory text

20th Aug 2002:
added actual code

17th Aug 2002:
first publication