PNetFlow – NetFlow network statistics aggregator from PCAP in C++
A typical environment: To explain the design decisions made during the development of this tool, we first describe the environment it was intended to work in. Initially we considered capturing network traffic and saving it in pcap format for the analyzer, but we realized that the pcap library can easily handle packets arriving on standard input, so we revised our original approach to make the analyzer run in real time. The University network generates a constant flow of traffic; we have observed anything from 20 Mbit/s during breaks to as much as 700 Mbit/s at peak times. Handling this volume required a careful design that lets us analyze the traffic without "dropping" packets and ensures the tool does not run out of memory in the middle of a capture.
Even though the pcap library is able to handle packets from standard input, we discovered that if the analyzer cannot finish the necessary calculations before the next packet arrives, that packet is lost. To overcome this we implemented separate POSIX threads.
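The text does not say how the work is divided between the threads. As a minimal sketch, assuming one thread only captures and hands packets to a second thread that does the analysis, the hand-off could look like the following. It uses std::thread (built on POSIX threads on Linux) rather than raw pthread calls, and the names Packet, PacketQueue and analyzeInBackground are hypothetical, not taken from the PNetFlow source.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical packet record: a timestamp plus a copy of the captured bytes.
struct Packet {
    double timestamp;
    std::vector<std::uint8_t> bytes;
};

// Hand-off queue between the capture thread and the analysis thread.
class PacketQueue {
public:
    // Capture side: store the packet and return immediately.
    void push(Packet p) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!closed_ && "push after close");
            queue_.push_back(std::move(p));
        }
        ready_.notify_one();
    }

    // Called when the capture ends so the consumer can drain and stop.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Analysis side: blocks until a packet is available.
    // Returns false once the queue is closed and empty.
    bool pop(Packet& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !queue_.empty(); });
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<Packet> queue_;
    bool closed_ = false;
};

// Runs a stand-in "analysis" (summing packet sizes) on a second thread.
std::size_t analyzeInBackground(const std::vector<Packet>& captured) {
    PacketQueue queue;
    std::size_t totalBytes = 0;
    std::thread analyzer([&] {
        Packet p;
        while (queue.pop(p)) totalBytes += p.bytes.size();
    });
    for (const Packet& p : captured) queue.push(p);  // capture side
    queue.close();
    analyzer.join();  // totalBytes is safe to read after the join
    return totalBytes;
}
```

The queue in this sketch is unbounded, so a slow analyzer would trade packet loss for memory growth; a real-time tool would likely cap the queue length as well.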
To achieve the best performance in analyzing the network traffic we implemented a multi-layered binary search tree structure to hold the information about network flows. The goal was to generate statistics in a Cisco NetFlow-like fashion. A Cisco NetFlow flow is defined by a seven-tuple key, and we tried to get as close to that as possible. The seven keys are:
Source IP Address
Destination IP Address
Source port (for TCP and UDP)
Destination port (for TCP and UDP)
IP Protocol
Type of Service
Ingress interface
The original NetFlow information can be found at: [1]
We have implemented six of these; the ingress interface has been left out. Each key represents one layer of trees in the structure. The original binary search tree has been modified to hold a third subtree (besides left and right) that represents a sub-level where a different key is used. Once the bottom layer of trees has been reached, a function is called that calculates the statistics, either initializing a new flow record or updating the one that was found (storeInTree). Insertions into all other layers are done by a simpler function that just inserts the key into the tree without updating the storage structure (storeInTreeSimple).
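The layered structure described above can be sketched as follows. This is an illustration under assumptions, not the PNetFlow code: it uses a plain unbalanced binary search tree per layer, all keys are widened to 32-bit integers, and the names Node, FlowStats, findOrInsert and recordPacket are hypothetical. findOrInsert plays the role the text gives to storeInTreeSimple, and the final step of recordPacket the role of storeInTree.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Per-flow record kept only at the bottom layer (fields are illustrative).
struct FlowStats {
    std::uint64_t packets = 0;
    std::uint64_t bytes = 0;
};

// A binary search tree node with a third link: the root of the next layer,
// which is keyed by the next field of the flow key.
struct Node {
    std::uint32_t key;
    std::unique_ptr<Node> left, right;
    std::unique_ptr<Node> sub;         // next layer (unused at the bottom layer)
    std::unique_ptr<FlowStats> stats;  // set only at the bottom layer
    explicit Node(std::uint32_t k) : key(k) {}
};

// Find or insert `key` in the layer rooted at `root`.
Node& findOrInsert(std::unique_ptr<Node>& root, std::uint32_t key) {
    std::unique_ptr<Node>* cur = &root;
    while (*cur && (*cur)->key != key)
        cur = key < (*cur)->key ? &(*cur)->left : &(*cur)->right;
    if (!*cur) *cur = std::make_unique<Node>(key);
    return **cur;
}

// Walk one layer per key field, then create or update the record at the
// node reached in the last layer.
FlowStats& recordPacket(std::unique_ptr<Node>& top,
                        const std::vector<std::uint32_t>& keys,
                        std::uint64_t packetBytes) {
    assert(!keys.empty());
    std::unique_ptr<Node>* layer = &top;
    Node* node = nullptr;
    for (std::uint32_t k : keys) {
        node = &findOrInsert(*layer, k);  // upper layers: key insertion only
        layer = &node->sub;
    }
    if (!node->stats) node->stats = std::make_unique<FlowStats>();  // new flow
    node->stats->packets += 1;                                      // update
    node->stats->bytes += packetBytes;
    return *node->stats;
}
```

Calling recordPacket with the six implemented keys in a fixed order (for example source IP, destination IP, source port, destination port, protocol, ToS) yields one FlowStats record per distinct flow.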
The order of the keys does not affect the end result, but the structure can also compute other kinds of statistics besides our NetFlow-like ones. For example, if we were interested in all the traffic between two IP addresses, we would simply call the function responsible for initializing and updating the flow statistics at the destination IP layer.
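To illustrate the two-address example, here is a minimal sketch in which the record is attached at the destination IP layer instead of the bottom layer. It uses nested std::map containers as a stand-in for the tree layers, and PairStats, PairTable and recordPair are hypothetical names. Note that it counts one direction only (source to destination).

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Statistics for all traffic from one source IP to one destination IP.
struct PairStats {
    std::uint64_t packets = 0;
    std::uint64_t bytes = 0;
};

// Layer 1 is keyed by source IP, layer 2 by destination IP.
// The record lives at layer 2; ports, protocol and ToS are ignored.
using PairTable = std::map<std::uint32_t, std::map<std::uint32_t, PairStats>>;

void recordPair(PairTable& table, std::uint32_t srcIp, std::uint32_t dstIp,
                std::uint64_t packetBytes) {
    assert(packetBytes > 0);                 // a captured packet is never empty
    PairStats& stats = table[srcIp][dstIp];  // created on first use
    stats.packets += 1;
    stats.bytes += packetBytes;
}
```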
We originally experimented with red-black trees, but we observed that the code queries the trees more often than it inserts into them, so we decided to use splay trees instead. The original splay tree has been modified so that it can also tell whether a node has just been inserted or was already in the tree. This was needed to determine whether the storage structure has to be initialized or updated after the key has been inserted into the tree.
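The modification described above can be sketched as an insert operation that reports whether the key was new. The sketch below assumes a standard top-down splay and integer keys; SplayNode, InsertResult, insertOrFind and destroy are hypothetical names, and the actual PNetFlow implementation may differ.

```cpp
#include <cassert>
#include <cstdint>

struct SplayNode {
    std::uint32_t key;
    SplayNode* left = nullptr;
    SplayNode* right = nullptr;
    explicit SplayNode(std::uint32_t k) : key(k) {}
};

// Top-down splay: moves the node with `key` (or the last node on the search
// path) to the root and returns the new root. Requires a non-empty tree.
SplayNode* splay(SplayNode* t, std::uint32_t key) {
    assert(t != nullptr);
    SplayNode header(0);
    SplayNode* l = &header;
    SplayNode* r = &header;
    for (;;) {
        if (key < t->key) {
            if (!t->left) break;
            if (key < t->left->key) {  // rotate right
                SplayNode* y = t->left;
                t->left = y->right;
                y->right = t;
                t = y;
                if (!t->left) break;
            }
            r->left = t;               // link into the right tree
            r = t;
            t = t->left;
        } else if (key > t->key) {
            if (!t->right) break;
            if (key > t->right->key) { // rotate left
                SplayNode* y = t->right;
                t->right = y->left;
                y->left = t;
                t = y;
                if (!t->right) break;
            }
            l->right = t;              // link into the left tree
            l = t;
            t = t->right;
        } else {
            break;
        }
    }
    l->right = t->left;                // reassemble
    r->left = t->right;
    t->left = header.right;
    t->right = header.left;
    return t;
}

struct InsertResult {
    SplayNode* root;  // new root; it always holds `key`
    bool inserted;    // true: key was new, so the flow record must be initialized
};

InsertResult insertOrFind(SplayNode* root, std::uint32_t key) {
    if (!root) return {new SplayNode(key), true};
    root = splay(root, key);
    if (root->key == key) return {root, false};  // already present: update
    SplayNode* n = new SplayNode(key);
    if (key < root->key) {
        n->left = root->left;
        n->right = root;
        root->left = nullptr;
    } else {
        n->right = root->right;
        n->left = root;
        root->right = nullptr;
    }
    return {n, true};
}

void destroy(SplayNode* t) {
    if (!t) return;
    destroy(t->left);
    destroy(t->right);
    delete t;
}
```

The caller checks the inserted flag to decide between initializing a new record and updating an existing one, and keeps the returned root for the next call.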
The feature extractor is also designed to make it easy to choose where the results are saved. For our project we used MySQL as the data repository for finished flows, but any other database engine could be used to store the results or to do further work on the flow data. Additional information that we have not covered can be extracted from the flows by extending the structure that holds the flow information and adding the calculations to the storeInTree function.
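One way to keep the storage back end interchangeable is a small sink interface. The sketch below is an assumption about how this could be arranged, not the PNetFlow design: FinishedFlow, FlowSink, CsvSink, exportFlow and toCsvLine are hypothetical names, and a stream-based CSV sink stands in for the MySQL back end so the example stays self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Illustrative subset of a finished flow record.
struct FinishedFlow {
    std::string srcIp;
    std::string dstIp;
    std::uint16_t srcPort;
    std::uint16_t dstPort;
    std::uint64_t packets;
};

// Anything that can store a finished flow: a database, a file, a socket.
class FlowSink {
public:
    virtual ~FlowSink() = default;
    virtual void save(const FinishedFlow& flow) = 0;
};

// Stand-in back end that writes one CSV line per flow.
class CsvSink : public FlowSink {
public:
    explicit CsvSink(std::ostream& out) : out_(out) {}
    void save(const FinishedFlow& f) override {
        out_ << f.srcIp << ',' << f.srcPort << ',' << f.dstIp << ','
             << f.dstPort << ',' << f.packets << '\n';
    }
private:
    std::ostream& out_;
};

// The extractor only sees the interface, never the concrete back end.
void exportFlow(FlowSink& sink, const FinishedFlow& flow) {
    assert(flow.packets > 0);  // a finished flow has at least one packet
    sink.save(flow);
}

// Convenience helper: render a single flow as a CSV line.
std::string toCsvLine(const FinishedFlow& flow) {
    std::ostringstream out;
    CsvSink sink(out);
    exportFlow(sink, flow);
    return out.str();
}
```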
The software has been tested on Ubuntu 9.04 and up. The following packages are required to compile:
libpcap-dev libmysqlclient-dev libmysql++-dev
Source code can be accessed at:
svn co https://pnetflow.svn.sourceforge.net/svnroot/pnetflow pnetflow
The project should compile simply by running make.
PNetFlow can process packets coming from standard input (-), captured live from a local device (live eth0), or read from a previously captured pcap file.
PNetFlow calculates the following statistics for each flow:
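The three input modes can be told apart from the arguments alone. The sketch below assumes the argument forms shown in parentheses above ("-", "live eth0", or a file name); parseSource, Source and SourceKind are hypothetical names, and PNetFlow's real argument handling may differ. The comments name the libpcap call that would typically open each kind of source.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class SourceKind { StandardInput, LiveDevice, CaptureFile };

struct Source {
    SourceKind kind;
    std::string name;  // "-", a device name, or a file path
};

// Classify the source arguments (program name already stripped).
Source parseSource(const std::vector<std::string>& args) {
    assert(!args.empty());
    if (args[0] == "-") {
        // pcap_open_offline("-", errbuf) reads a capture from standard input.
        return {SourceKind::StandardInput, "-"};
    }
    if (args[0] == "live" && args.size() >= 2) {
        // pcap_open_live(device, snaplen, promisc, to_ms, errbuf)
        return {SourceKind::LiveDevice, args[1]};
    }
    // pcap_open_offline(path, errbuf) reads a saved capture file.
    return {SourceKind::CaptureFile, args[0]};
}
```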
Flow time-span
Anonymized / normal source IP
Anonymized / normal destination IP
Source port
Destination port
ToS
Packet count
Inter-packet time average
Standard deviation of inter-packet time
Average packet size
Standard deviation of packet size
# of SYN packets (for TCP)
# of ACK packets (for TCP)
# of FIN packets (for TCP)
# of RST packets (for TCP)
Inter-ACK time average (for TCP)
Standard deviation of Inter-ACK time (for TCP)
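Averages and standard deviations such as those listed above can be maintained per packet without storing the individual samples, which matters when memory is limited. The sketch below uses Welford's online algorithm and the population standard deviation; both are assumptions, since the text does not say how PNetFlow computes these values, and RunningStats and FlowTiming are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Running mean and standard deviation in constant memory (Welford's method).
class RunningStats {
public:
    void add(double x) {
        assert(std::isfinite(x));
        ++n_;
        const double delta = x - mean_;
        mean_ += delta / static_cast<double>(n_);
        m2_ += delta * (x - mean_);  // uses the already updated mean
    }
    std::uint64_t count() const { return n_; }
    double mean() const { return mean_; }
    // Population standard deviation; 0 when there are no samples.
    double stddev() const {
        return n_ > 0 ? std::sqrt(m2_ / static_cast<double>(n_)) : 0.0;
    }

private:
    std::uint64_t n_ = 0;
    double mean_ = 0.0;
    double m2_ = 0.0;  // sum of squared deviations from the mean
};

// Per-flow timing: feeds the gap between consecutive packets into the stats.
struct FlowTiming {
    RunningStats interPacket;
    double lastTimestamp = 0.0;
    bool seenPacket = false;

    void onPacket(double timestamp) {
        if (seenPacket) interPacket.add(timestamp - lastTimestamp);
        lastTimestamp = timestamp;
        seenPacket = true;
    }
};
```

The same RunningStats type could serve packet sizes and inter-ACK times, with one instance per statistic in the flow record.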