Packet Analysis with SteelScript-NetShark

A view is the central concept on NetShark appliances for reporting and analysis. Simply put, a view consists of a packet source, optional filters that limit which packets are analyzed, and a set of metrics to extract, along with rules for how to organize those metrics. More details on views can be found here.

This document explains how to use the SteelScript-NetShark package to write a Python script to interact with views on NetShark Appliances.


Setup & Imports


As a first step, set up a Python virtual environment with the SteelScript-NetShark package installed, as detailed here. All the code snippets and the Python script must be run in that virtual environment.


The code in this document is broken into snippets so you can follow along in your own Python shell. At the end of this post you can download a single file that combines all of these snippets into one script. Run the script in a shell session as shown below.

> python <script_name>.py <netshark_hostname> <username> <password>

You can also copy and paste each code snippet into a Python session and run it by pressing "Enter". Snippets prefixed with '>>>' show interactive commands and their responses, which give more insight into the data. For those, copy and paste only what follows '>>>' on the same line and run it in a Python shell.


The following modules need to be imported.


import pprint

from datetime import timedelta

from steelscript.netshark.core import NetShark
from steelscript.netshark.core.types import Value, Key
from steelscript.netshark.core.filters import NetSharkFilter, TimeFilter
from steelscript.common import UserAuth


Then we need to create a NetShark object with a hostname, username and password.


netshark = NetShark(<hostname>, auth=UserAuth(<username>, <password>))


Create View


There are multiple ways to specify a source of packets. In this tutorial we will use the first running capture job on the NetShark appliance. You can also create your own capture jobs by logging in to the NetShark appliance.


jobs = netshark.get_capture_jobs()
source = None
for j in jobs:
    if j.get_status()['state'] == 'RUNNING':
        source = j
        # Stop at the first running job; without this the last one would win
        break
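The same selection can be written as a small helper that returns as soon as it finds a match. The sketch below is only an illustration: `FakeJob` is a hypothetical stand-in that mimics the `get_status()` call used above, so the logic can be tried without an appliance.

```python
class FakeJob(object):
    """Hypothetical stand-in for a capture job; it only mimics get_status()."""

    def __init__(self, name, state):
        self.name = name
        self._state = state

    def get_status(self):
        return {'state': self._state}


def first_running(jobs):
    """Return the first job whose state is RUNNING, or None if there is none."""
    for job in jobs:
        if job.get_status()['state'] == 'RUNNING':
            return job
    return None


# Made-up jobs for illustration only
fake_jobs = [
    FakeJob('a', 'STOPPED'),
    FakeJob('b', 'RUNNING'),
    FakeJob('c', 'RUNNING'),
]
picked = first_running(fake_jobs)
print(picked.name)
```

Returning None when nothing is running lets the caller decide whether to stop with an error or fall back to another packet source.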


In addition to the packet source, a list of NetShark native Key/Value columns is required to specify what information is extracted from the packets and presented in this view. The set of "Key" columns defines how the "Value" column data is aggregated. As indicated below, a single Key column of "server_port" means that one row is generated for each unique server port seen.

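columns = [
    # Field names are assumed from the description above; confirm them with
    # the "steel netshark fields" command
    Key(netshark.columns.tcp.server_port),
    Value(netshark.columns.generic.bytes),
]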


To find out which columns are available, run the command below at your shell prompt.

> steel netshark fields <netshark_hostname> -u <username> -p <password>
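To make the Key/Value idea concrete, here is a conceptual sketch in plain Python of how a single key column groups a value column. It is not how NetShark computes views internally; the packet records and the `aggregate` helper are made up for illustration.

```python
from collections import defaultdict

# Made-up packet records as (server_port, bytes) pairs, for illustration only
sample_packets = [(443, 1500), (80, 400), (443, 600), (22, 90), (80, 100)]


def aggregate(records):
    """Sum the value (bytes) for each unique key (server port)."""
    totals = defaultdict(int)
    for port, nbytes in records:
        totals[port] += nbytes
    # One [key, value] row per unique key, ordered by port number
    return sorted([port, total] for port, total in totals.items())


rows_demo = aggregate(sample_packets)
print(rows_demo)
```

Five packets collapse into three rows, one per unique server port, which is the shape of the rows a view with one Key column and one Value column returns.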

As the last step before creating a view, an optional list of filters can be supplied to limit which packets from the source are processed by this view.


timefilter = TimeFilter.parse_range('last 30 seconds')
filters = [timefilter]


Details about the kinds of strings that can be parsed into time filters can be found here.


Now we can create the view as below.


view = netshark.create_view(source, columns, filters)


Note that the above operation can take a long time to finish if the volume of packets is large. To reduce the wait, narrow the traffic by setting tighter filters.


The time info can be obtained as:

ti = view.get_timeinfo()


Most times reported by NetShark are Unix epoch timestamps in nanoseconds since Jan 1, 1970. Use the command below to verify that the data was fetched from a 30-second range.


>>> (ti['end'] - ti['start']) / 1000000000
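Nanosecond epoch values are hard to read at a glance, so it can help to convert them to datetime objects. The helper below is a minimal sketch using only the standard library; `ns_to_datetime` and the sample timestamps are illustrative names and values, not part of SteelScript.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def ns_to_datetime(ns):
    """Convert nanoseconds since the Unix epoch to a naive UTC datetime."""
    # timedelta has microsecond resolution, so digits below 1 microsecond are dropped
    return EPOCH + timedelta(microseconds=ns // 1000)


# Illustrative values standing in for ti['start'] and ti['end']
start_ns = 1457031988181044000
end_ns = start_ns + 30 * 10**9

start = ns_to_datetime(start_ns)
end = ns_to_datetime(end_ns)
print(start, end, (end - start).total_seconds())
```

Subtracting the two datetimes gives a timedelta, so the same 30-second check can be done with total_seconds() instead of dividing by hand.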



Basic Data Analysis


The retrieved data consists of some metadata plus the values, returned as multiple (port, bytes) rows that correspond to the requested key and value columns.


data = view.get_data(aggregated=True)
packets = data[0]['p']
t = data[0]['t']
rows = data[0]['vals']


Print the statistics of the data as follows:


>>> print('Counted %d packets starting at %s, yielding %d rows' % (packets, str(t), len(rows)))
Counted 337473 packets starting at 2016-03-03 19:06:28.181044+00:00, yielding 54 rows


A single row looks like this:


>>> rows[0]
[443, 1550743]


Sort the rows in descending order by the bytes column (index 1 of each row).


>>> rows.sort(key=lambda row: row[1], reverse=True)
>>> pprint.pprint(rows[:10])
[[5432, 42661178],
 [41017, 12386574],
 [5901, 1706624],
 [443, 1550743],
 [22, 895698],
 [2537, 516436],
 [8080, 492873],
 [80, 109116],
 [25, 106327],
 [3978, 105304]]
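A natural follow-up is to express each port as a share of the total. The sketch below reuses the ten rows printed above as sample input; `top_talkers` is a hypothetical helper name, and the percentages are relative to these ten rows only, not to all 54 rows in the view.

```python
# The ten rows shown above, copied here so the example is self-contained
sample_rows = [
    [5432, 42661178], [41017, 12386574], [5901, 1706624], [443, 1550743],
    [22, 895698], [2537, 516436], [8080, 492873], [80, 109116],
    [25, 106327], [3978, 105304],
]


def top_talkers(rows, n=3):
    """Return (port, bytes, percent_of_total) for the n largest rows."""
    total = sum(nbytes for _, nbytes in rows)
    if total == 0:
        return []
    ranked = sorted(rows, key=lambda row: row[1], reverse=True)
    return [(port, nbytes, 100.0 * nbytes / total) for port, nbytes in ranked[:n]]


for port, nbytes, pct in top_talkers(sample_rows):
    print("port %5d: %10d bytes (%.1f%%)" % (port, nbytes, pct))
```

Using sorted() instead of sort() leaves the input list untouched, which is convenient if the rows are reused later.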


Advanced Data Analysis


Assume we want to find out the total bytes on port 80 during the last 30 seconds.


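def find_port(data, port):
    """Print the bytes seen on `port` in each sample, then the overall total."""
    total = 0
    for sample in data:
        # Each row in sample['vals'] is [server_port, bytes]
        for v in sample['vals']:
            if v[0] == port:
                print("%s: %s" % (str(sample['t']), v[1]))
                total = total + v[1]
    print("Total: %d" % total)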


>>> find_port(data, 80)
2016-03-03 19:06:28.181044+00:00: 109116
Total: 109116


We can also retrieve the data in 10-second intervals instead of as a single aggregate.


data_10sec = view.get_data(aggregated=False, delta=timedelta(seconds=10))


>>> print("Data points: %d" % len(data_10sec))
Data points: 3


The bytes on port 80 for each of those 3 intervals are shown below.


>>> find_port(data_10sec, 80)
2016-03-03 19:06:28.181044+00:00: 35550
2016-03-03 19:06:38.181044+00:00: 51967
2016-03-03 19:06:48.181044+00:00: 21599
Total: 109116
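If you want the per-interval values as data rather than printed text, a variant of the lookup can return them as a list. This is a sketch under stated assumptions: `port_series` is a hypothetical helper, and the sample list mimics the 't' and 'vals' layout used in this post, with shortened time strings standing in for the real timestamps.

```python
def port_series(samples, port):
    """Return a list of (timestamp, bytes) pairs for one port, one per sample."""
    series = []
    for sample in samples:
        for key, nbytes in sample['vals']:
            if key == port:
                series.append((sample['t'], nbytes))
    return series


# Port 80 values taken from the output above; 't' is simplified to a string
samples_demo = [
    {'t': '19:06:28', 'vals': [[80, 35550]]},
    {'t': '19:06:38', 'vals': [[80, 51967]]},
    {'t': '19:06:48', 'vals': [[80, 21599]]},
]

series = port_series(samples_demo, 80)
interval_total = sum(nbytes for _, nbytes in series)
print(series)
print(interval_total)
```

The three interval values add up to 109116, the same figure the aggregated view reported for port 80, which is a quick consistency check between the two ways of fetching the data.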





Hopefully you now feel confident creating views on NetShark appliances and analyzing their data with the SteelScript-NetShark package. For more examples, please visit the SteelScript documentation here. It is time to create your own script!