How to print flow information to CSV with nfdump

Once you have created flow files with nfpcapd, you can use nfdump (man page: https://manpages.ubuntu.com/manpages/jammy/en/man1/nfdump.1.html) to write the flows to a CSV file in a custom format:

nfdump -R  out/ -o  "fmt: %sa, %da, %ts, %tsr, %td, %pr, %sp, %dp, %pkt, %byt"  > out.csv

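fmt: tells nfdump to use a custom output format. %sa is the source address, %da the destination address, %ts the start time (date first seen), %tsr the start time as a raw Unix timestamp, %td the duration, %pr the protocol, %sp and %dp the source and destination ports, %pkt the number of packets and %byt the number of bytes. A full list of these format tags is given here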
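
If you want to run the export from a Python script instead of the shell, the command can be assembled as an argument list. This is a minimal sketch: the function names build_nfdump_cmd and export_flows are hypothetical, it assumes nfdump is on your PATH, and it uses only the -R and -o options from the command above.

```python
import shlex
import subprocess

# Format tags from the command above: source/destination address, start time,
# start time (raw), duration, protocol, source/destination port, packets, bytes.
FIELDS = ["%sa", "%da", "%ts", "%tsr", "%td", "%pr", "%sp", "%dp", "%pkt", "%byt"]


def build_nfdump_cmd(flow_dir, fields=FIELDS):
    """Return the nfdump argument list for a comma-separated custom format."""
    fmt = "fmt: " + ", ".join(fields)
    # -R reads the flow files in flow_dir, -o sets the output format
    return ["nfdump", "-R", flow_dir, "-o", fmt]


def export_flows(flow_dir, csv_path):
    """Run nfdump and redirect its output to csv_path (needs nfdump on PATH)."""
    with open(csv_path, "w") as out:
        # no shell involved, so the spaces and % signs need no quoting
        subprocess.run(build_nfdump_cmd(flow_dir), stdout=out, check=True)


if __name__ == "__main__":
    # print the equivalent shell command instead of running it
    print(shlex.join(build_nfdump_cmd("out/")))
```

Passing a list to subprocess.run avoids quoting problems with the format string, because no shell parses it.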

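nfdump scales large byte counts in some cases, e.g. it writes 1.2 M (megabytes) instead of the exact number of bytes. The Python snippet below converts these values back to bytes. Depending on your nfdump version, the -N option (print plain numbers) may avoid the scaling altogether. (To read the CSV file with pandas, I first edited the column names by hand to remove all whitespace, and I deleted the lines at the bottom of the file where nfdump writes its summary.)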
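
The manual clean-up in the parentheses above can also be scripted. The sketch below removes all whitespace from the column names and drops everything from the start of the summary block on, leaving the values untouched. The names clean_nfdump_text and clean_nfdump_csv are hypothetical, and the code assumes that nfdump's summary begins with a line starting with "Summary"; check this against your own output.

```python
def clean_nfdump_text(text):
    """Return nfdump fmt output with a whitespace-free header and no summary.

    ASSUMPTION: nfdump's trailing summary begins with a line starting with
    "Summary"; that line and everything after it are dropped.
    """
    lines = text.splitlines()
    if not lines:
        return ""
    # Header: remove all whitespace inside each column name,
    # e.g. " Src IP Addr" -> "SrcIPAddr".
    header = ",".join("".join(name.split()) for name in lines[0].split(","))
    body = []
    for line in lines[1:]:
        if line.startswith("Summary"):
            break  # stop at the summary block
        if line.strip():  # skip blank lines
            body.append(line)
    return "\n".join([header] + body) + "\n"


def clean_nfdump_csv(src, dst):
    """Read nfdump output from src and write the cleaned CSV to dst."""
    with open(src) as f:
        cleaned = clean_nfdump_text(f.read())
    with open(dst, "w") as f:
        f.write(cleaned)
```

For example, clean_nfdump_csv('out.csv', 'flows.csv') produces a file you can pass as fn to the snippet below. Depending on your format tags, you may still need to rename some columns to the names that snippet expects (for example Datefirstseenunix).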

The snippet is also available as a gist: https://gist.github.com/sallos-cyber/a410c8986eec29b14e2c9d039cab5d56#file-preprocess_flows-py

import pandas as pd

# Returns the cleaned DataFrame; you can use the output to plot a timeline.
def callPreprocessData(fn='/home/someusr/Downloads/flows.csv'):
    data = pd.read_csv(fn)
    print(data.head(3))
    # drop rows without a start time before converting the other columns
    data.dropna(subset=['Datefirstseen'], inplace=True)

    data['Bytes'] = data['Bytes'].fillna(0).astype(str)
    # nfdump writes bytes as an integer, but sometimes scales large values to
    # megabytes (e.g. " 1.2 M"). Find those entries and convert them back to
    # bytes. This assumes 1 M = 1024 * 1024 bytes; your nfdump version may
    # scale by 1000 instead. The result is approximate because nfdump rounds.
    is_mb = data['Bytes'].str.contains('M')
    data.loc[is_mb, 'Bytes'] = data.loc[is_mb, 'Bytes'].apply(
        lambda x: float(x.replace('M', '').strip()) * 1024 * 1024)
    # go through float so that values such as "1500.0" also convert
    data['Bytes'] = data['Bytes'].astype(float).astype(int)

    data['Datefirstseen'] = pd.to_datetime(data['Datefirstseen'])
    data = data.set_index('Datefirstseen')

    data['Duration'] = data['Duration'].astype(int)  # truncates to whole seconds
    data['DstPt'] = data['DstPt'].astype(int)
    data['Datefirstseenunix'] = data['Datefirstseenunix'].astype(int)

    # strip leading/trailing whitespace from the text columns
    for col in ['SrcIPAddr', 'DstIPAddr', 'Proto']:
        data[col] = data[col].astype(str).str.strip()

    print('I am now saving the file')
    print(data.head(3))

    data.to_csv('flows_processed.csv')
    return data
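
The snippet above only handles values scaled to M. If your nfdump output also contains other suffixes such as G, the conversion to int will fail on them. A more general parser could look like the sketch below. parse_scaled is a hypothetical helper, and its default base of 1024 mirrors the snippet above rather than anything confirmed about nfdump, which may scale by 1000; verify against a flow whose exact byte count you know.

```python
import re

# Power of the base for each unit suffix (no suffix -> plain number).
# ASSUMPTION: K/M/G/T are the suffixes that may appear in your output.
SUFFIXES = {"K": 1, "M": 2, "G": 3, "T": 4}

_SCALED = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s*([KMGT]?)\s*$")


def parse_scaled(value, base=1024):
    """Convert values like '1500', ' 1.2 M' or '3.4G' to an integer byte count."""
    m = _SCALED.match(str(value))
    if m is None:
        raise ValueError(f"unrecognised byte value: {value!r}")
    number, suffix = m.groups()
    exponent = SUFFIXES.get(suffix, 0)
    return int(round(float(number) * base ** exponent))
```

With this helper, data['Bytes'] = data['Bytes'].fillna(0).map(parse_scaled) could replace the Bytes conversion lines in the snippet. Either way the result is approximate, because the scaled value (e.g. 1.2 M) has already been rounded by nfdump.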
