Latest reply: Mar 31, 2016 6:39 AM by jchessman

Using NetProfiler to Monitor DHCP Address Distribution

jchessman

NetProfiler can provide many tools beyond just network monitoring. One useful tool is the ability to see how many network addresses are currently in use on specific subnets. With some straightforward SteelScript and Python work it is simple to put together a script that will allow you to leverage NetProfiler's network-wide visibility to see how many addresses are in use during specific time periods.

 

Here is a basic script (discussion to follow):

 

#!/bin/python
"""
IP Management script
Read a list of subnets from a config file and query NetProfiler to see if there is
traffic on different IP's. Alert when the percentage of used IP's rises above a fixed threshold

Configuration is read from a file config.xml (controlled via a variable below currently).
The format for this XML file is:
    <config>
        <netprofiler></netprofiler>
        <username></username>
        <password></password>
        <threshold></threshold>
        <duration></duration>
        <resolution></resolution>
        <subnet></subnet>
    </config>

    netprofiler is the IP address or DNS name of the NetProfiler to query
    username is the username to use when accessing the device
    password is the base64-encoded password (base64.b64encode("<pwd>"))
    threshold is the percent above which a subnet should be considered full (and an alert generated)
    duration is how far back to look in minutes (i.e. if you are running this script every hour duration would be 60)
    resolution is the data resolution to try and use (default is auto)
    subnet are one or more entries with subnets to monitor:
        these can be either CIDR blocks (192.168.1.0/24) or ranges (192.168.1.0-192.168.1.55)
        you can have as many subnet blocks as you need
     
Currently there is no external alerting (the script generates console output) and no validation that IP's returned
by NetProfiler are actually used but you could add both of those relatively easily.

"""
# Imports
import base64
import xmltodict
import xml.etree.ElementTree as ET
import time
import netaddr
import requests
from requests.packages.urllib3.exceptions import InsecureRequestWarning
from steelscript.netprofiler.core import NetProfiler
from steelscript.common.service import UserAuth
from steelscript.netprofiler.core.filters import TimeFilter
from steelscript.netprofiler.core.filters import TrafficFilter
from steelscript.netprofiler.core.report import TrafficSummaryReport

def calcTotalIPs(subnet):
    # Calculate how many IP's are in the subnet
    subPart=int(subnet.split('/')[1])
    return totIPS[subPart]

# Variables
config_file='config.xml'
headers = {'Accept': 'application/json', 'Content-Type': 'application/json; charset=UTF-8' }
timeout = 5
# Map each prefix length to the total number of addresses in a block of that size. This assumes the block starts on its network boundary (e.g. 192.168.1.0 through 192.168.1.255 for a /24)
totIPS = {1:2147483648, 2:1073741824, 3:536870912, 4:268435456, 5:134217728, 6:67108864, 7:33554432, 8:16777216, 9:8388608, 10:4194304, 11:2097152, 12:1048576, 13:524288, 14:262144, 15:131072, 16:65536, 17:32768, 18:16384, 19:8192, 20:4096, 21:2048, 22:1024, 23:512, 24:256, 25:128, 26:64, 27:32, 28:16, 29:8, 30:4, 31:2, 32:1}

# Sshhh
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

# First read the config
print "Reading config file"
try:
    config = ET.parse(config_file)
except:
    print "No config file " + config_file + " found."
    exit(1)

# Parse the returned XML file
settings = config.getroot()

# General settings, passwords, etc.
password = base64.b64decode(settings.find('password').text) #password = base64.b64encode("")
username = settings.find('username').text
netProfiler = settings.find('netprofiler').text
threshold = int(settings.find('threshold').text)
duration = int(settings.find('duration').text)
resolution = str(settings.find('resolution').text)
auth = UserAuth(username, password)

# Get the list of subnets and put them in a list.
subnets = list()
for elem in settings.findall('subnet'):
   subnets.append(elem.text)

# Work out the query window in UNIX epoch seconds (duration is in minutes)
endTime = int(time.time())
startTime = endTime - (duration * 60)

# Create a NetProfiler object
print "Connecting to NetProfiler " + str(netProfiler)
p = NetProfiler(netProfiler, auth=auth)
columns = [p.columns.key.host_ip,
           p.columns.value.avg_bytes,
           p.columns.value.total_conns_active,
           p.columns.value.network_rtt]
sort_column = p.columns.value.avg_bytes
timeRange = "last " + str(duration) + "m"
timefilter = TimeFilter.parse_range(timeRange)

# For each subnet
for x in subnets:
    totalIPs = 0
    flip = 0
    if x.find('-') == -1:
        # Subnet not a range
        print "Processing subnet " + str(x)
        trafficExpression = 'host ' + x
        totalIPs = int(calcTotalIPs(str(x)))
    else:
        # Range not subnet
        print "Processing range " + str(x)
        ips = x.split('-')
        cidrs = netaddr.iprange_to_cidrs(ips[0], ips[1])
        for y in cidrs:
            print "Processing subnet " + str(y)
            totalIPs = totalIPs + int(calcTotalIPs(str(y)))
            if flip:
                trafficExpression += ' or host ' + str(y)
            else:
                trafficExpression = 'host ' + str(y)
                flip = 1

    trafficFilter = TrafficFilter(trafficExpression)
    # initialize a new report, and run it
    print "Querying NetProfiler..."
    report = TrafficSummaryReport(p)
    report.run('hos', columns, timefilter=timefilter, trafficexpr=trafficFilter, resolution=resolution, sort_col=sort_column)

    # grab the data, and legend (it should be what we passed in for most cases)
    data = report.get_data()

    # once we have what we need, delete the report from the NetProfiler
    report.delete()

    # We don't do any validation that the hosts are "real". You could run through the entire returned list and look at the
    # number of active connections or average bytes (both included columns) to see if they indicate a real host or not.
    # Here we just assume they all are and count the length of the list

    usedIPs = int(len(data))

    if int(totalIPs) > 0:
        percentUsed = float(float(usedIPs)/float(totalIPs) * 100)
    else:
        percentUsed = 0
     
    print "Using " + str(usedIPs) + " (" + str(int(percentUsed)) + "%) out of " + str(totalIPs) + " IP's in " + str(x)

    # Generate your alerts here
    if percentUsed > int(threshold):
        # We are using more IP's than we want - generate an alert
        print str(x) + " is using more than " + str(threshold) + "% of its IP space"
    elif totalIPs == 0:
        print "No IP's available in this subnet or range, check your settings"
     




 

So what are we doing here? A bunch of things happen and I will go through the code in chunks to try and describe them.

 

First we have to import a bunch of libraries that we use to perform various operations:

# Imports
import base64
import xmltodict
import xml.etree.ElementTree as ET
import time
import netaddr
import requests
from requests.packages.urllib3.exceptions import InsecureRequestWarning
from steelscript.netprofiler.core import NetProfiler
from steelscript.common.service import UserAuth
from steelscript.netprofiler.core.filters import TimeFilter
from steelscript.netprofiler.core.filters import TrafficFilter
from steelscript.netprofiler.core.report import TrafficSummaryReport



 

These libraries give us tools beyond Python's built-in functions. Some come from the standard library, such as time (import time); others are third-party, such as netaddr for manipulating network addresses (import netaddr) and the SteelScript libraries for easily querying the NetProfiler.

 

Next we have a small function that is used within the program to calculate how many IP's are in the passed subnet.

def calcTotalIPs(subnet):
    # Calculate how many IP's are in the subnet
    subPart=int(subnet.split('/')[1])
    return totIPS[subPart]


 

This function is used further down in the script to figure out how many IP's are available in a given subnet - information that is then used to calculate the percentage of the subnet that is in use.
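
The lookup table works, but the same count also follows directly from the prefix length: an IPv4 prefix of length n leaves 32 - n host bits, so the block holds 2 ** (32 - n) addresses. Here is a minimal sketch of that idea, assuming IPv4 only and the same "a.b.c.d/n" input format. The name count_addresses is made up for this example and is not part of the script above. Like the table, it counts every address in the block, including the network and broadcast addresses.

```python
def count_addresses(subnet):
    """Return the number of IPv4 addresses in a CIDR block such as '192.168.1.0/24'."""
    prefix_len = int(subnet.split('/')[1])
    if not 0 <= prefix_len <= 32:
        raise ValueError("prefix length out of range: %d" % prefix_len)
    # 32 - prefix_len host bits remain, so the block size is a power of two
    return 2 ** (32 - prefix_len)


print(count_addresses('192.168.1.0/24'))
```

Computing the value also avoids a KeyError on entries the table does not list, such as /0.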

 

Next we define some variables that are used throughout the script:

# Variables
config_file='config.xml'
headers = {'Accept': 'application/json', 'Content-Type': 'application/json; charset=UTF-8' }
timeout = 5
# Map each prefix length to the total number of addresses in a block of that size. This assumes the block starts on its network boundary (e.g. 192.168.1.0 through 192.168.1.255 for a /24)
totIPS = {1:2147483648, 2:1073741824, 3:536870912, 4:268435456, 5:134217728, 6:67108864, 7:33554432, 8:16777216, 9:8388608, 10:4194304, 11:2097152, 12:1048576, 13:524288, 14:262144, 15:131072, 16:65536, 17:32768, 18:16384, 19:8192, 20:4096, 21:2048, 22:1024, 23:512, 24:256, 25:128, 26:64, 27:32, 28:16, 29:8, 30:4, 31:2, 32:1}

# Sshhh
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)


 

The first three statements simply define some variables that are used elsewhere. Some of these (specifically config_file and timeout) may make sense as command line options at some point, though given how simple this script is, leaving them here makes almost as much sense. The fourth statement defines a Python dictionary whose key is the prefix length (/3, /18, /26) and whose value is the number of IP addresses in a block of that size. The fifth statement tells the requests library not to print a warning when an HTTPS request is made without verifying the server's certificate.
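
If you did want config_file and timeout as command line options, the standard argparse module is one way to do it. This is only a sketch: the flag names --config and --timeout and the helper parse_args are my own choices for illustration, and the defaults simply mirror the values hard-coded above.

```python
import argparse


def parse_args(argv=None):
    """Parse command line options; argv=None means use sys.argv."""
    parser = argparse.ArgumentParser(
        description="Report how full each configured subnet is")
    # Hypothetical flag names; defaults match the hard-coded variables
    parser.add_argument('--config', default='config.xml',
                        help="path to the XML config file")
    parser.add_argument('--timeout', type=int, default=5,
                        help="timeout in seconds")
    return parser.parse_args(argv)


args = parse_args([])
print("config=%s timeout=%d" % (args.config, args.timeout))
```

Passing an explicit list (as in parse_args([])) keeps the function easy to try out without touching the real command line.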

 

Next we read in and parse the configuration file:

# First read the config
print "Reading config file"
try:
    config = ET.parse(config_file)
except:
    print "No config file " + config_file + " found."
    exit(1)

# Parse the returned XML file
settings = config.getroot()

# General settings, passwords, etc.
password = base64.b64decode(settings.find('password').text) #password = base64.b64encode("")
username = settings.find('username').text
netProfiler = settings.find('netprofiler').text
threshold = int(settings.find('threshold').text)
duration = int(settings.find('duration').text)
resolution = str(settings.find('resolution').text)
auth = UserAuth(username, password)

# Get the list of subnets and put them in a list.
subnets = list()
for elem in settings.findall('subnet'):
   subnets.append(elem.text)




 

Three things are done here:

  1. We try (and hopefully succeed) to read the config file specified in config_file and put the parsed data in the variable config
  2. We extract the specific variables (username, password, etc.) and build the authentication variable we will need when connecting to the NetProfiler
  3. We loop through the subnet entries and create a list (subnets) of each range or CIDR block
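
The three steps above can be tried out without a NetProfiler by parsing the XML from a string. The sketch below is an illustration rather than the script's actual code: load_settings is a made-up helper, the host name in SAMPLE is a placeholder, and I have assumed it is reasonable to fail early when a required element is missing and to fall back to "auto" when no resolution is given (the docstring lists auto as the default). It is written so that it also runs on Python 3, where b64decode returns bytes.

```python
import base64
import xml.etree.ElementTree as ET

SAMPLE = """<config>
    <netprofiler>netprofiler.example.com</netprofiler>
    <username>admin</username>
    <password>YWRtaW4=</password>
    <threshold>20</threshold>
    <duration>60</duration>
    <resolution>auto</resolution>
    <subnet>10.38.14.0/24</subnet>
    <subnet>10.38.129.1-10.38.129.128</subnet>
</config>"""


def load_settings(xml_text):
    """Parse the config XML and return the settings as a dictionary."""
    root = ET.fromstring(xml_text)
    required = ('netprofiler', 'username', 'password', 'threshold', 'duration')
    # findtext returns None for a missing element and '' for an empty one
    missing = [tag for tag in required if not root.findtext(tag)]
    if missing:
        raise ValueError("missing config entries: " + ", ".join(missing))
    return {
        'netprofiler': root.findtext('netprofiler').strip(),
        'username': root.findtext('username').strip(),
        'password': base64.b64decode(
            root.findtext('password').strip()).decode('utf-8'),
        'threshold': int(root.findtext('threshold')),
        'duration': int(root.findtext('duration')),
        'resolution': (root.findtext('resolution') or 'auto').strip(),
        'subnets': [e.text.strip() for e in root.findall('subnet') if e.text],
    }


settings = load_settings(SAMPLE)
print(settings['subnets'])
```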

 

Now that we are done with the preliminaries we are ready to actually start talking to the NetProfiler.

 

First some housekeeping - namely figuring out the time (in UNIX epoch time) for the query. We do this by creating a variable with the current time as the end time and subtract the correct value (specified number of minutes from the config file multiplied by 60):

endTime = int(time.time())
startTime = endTime - (duration * 60)
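
As a quick check of the arithmetic, here is the same calculation wrapped in a small function with a fixed "now" so the result is repeatable. The helper name query_window and its now parameter are made up for this example. Note that in the script as posted the query itself relies on the "last Nm" time filter built further down, so these two values are informational.

```python
import time


def query_window(duration_minutes, now=None):
    """Return (start, end) in UNIX epoch seconds for the last N minutes."""
    end_time = int(time.time()) if now is None else int(now)
    # The config gives minutes; epoch time is in seconds
    start_time = end_time - duration_minutes * 60
    return start_time, end_time


start, end = query_window(60, now=1459420740)
print("window is %d seconds long" % (end - start))
```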

 

Next we need to build the NetProfiler object and open a connection:

print "Connecting to NetProfiler " + str(netProfiler)
p = NetProfiler(netProfiler, auth=auth)

 

And then set up the information for the query:

columns = [p.columns.key.host_ip,
           p.columns.value.avg_bytes,
           p.columns.value.total_conns_active,
           p.columns.value.network_rtt]
sort_column = p.columns.value.avg_bytes
timeRange = "last " + str(duration) + "m"
timefilter = TimeFilter.parse_range(timeRange)

 

We set up four different variables:

  1. columns is a list of the columns we want data for. In this case we are using named references to the columns, though you could also have used the column numbers.
  2. sort_column is the column to sort the data on and, just like the columns variable, is a named reference
  3. timeRange is simply a string with the duration we care about
  4. timefilter is the timeRange string after being run through the SteelScript code that converts it into an object the NetProfiler can understand

Next we build the trafficFilter we are going to use to make sure we get data on only the subnets we want as well as create some other variables:

for x in subnets:
    totalIPs = 0
    flip = 0
    if x.find('-') == -1:
        # Subnet not a range
        print "Processing subnet " + str(x)
        trafficExpression = 'host ' + x
        totalIPs = int(calcTotalIPs(str(x)))
    else:
        # Range not subnet
        print "Processing range " + str(x)
        ips = x.split('-')
        cidrs = netaddr.iprange_to_cidrs(ips[0], ips[1])
        for y in cidrs:
            print "Processing subnet " + str(y)
            totalIPs = totalIPs + int(calcTotalIPs(str(y)))
            if flip:
                trafficExpression += ' or host ' + str(y)
            else:
                trafficExpression = 'host ' + str(y)
                flip = 1

    trafficFilter = TrafficFilter(trafficExpression)

 

We have to do this for each subnet, so we use a for loop. We want to know whether we have a range or a CIDR block, and since a range uses a hyphen to separate its two ends we just look for one in the entry. If there is no hyphen (find returns -1) then we have a subnet. Here we build a simple traffic expression (since it is a single subnet) consisting of the keyword host and the CIDR block. We also use the calcTotalIPs function defined earlier to figure out how many IP's are in the subnet.

If we have a range it is a little more complex. First we split the range into two sides in an array. We then use the iprange_to_cidrs function in the netaddr library (imported earlier) to convert the range to a list of CIDR blocks. Finally we go through each CIDR block and create the traffic expression as well as add up all the possible hosts in all the CIDR blocks. The iprange_to_cidrs function returns the minimum number of CIDRs needed to match the range meaning that a range could consist of a single CIDR or multiple CIDRs depending on what makes the most sense to the library.

Finally we use the SteelScript function TrafficFilter to convert the trafficExpression variable we just created into something a NetProfiler can understand (similar to what we did above with the time range).
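
To see what the range branch produces without installing netaddr, the sketch below uses the standard library's ipaddress.summarize_address_range as a stand-in for netaddr.iprange_to_cidrs. That is a substitution on my part, not what the script does, and it needs Python 3 (or the ipaddress backport). The helper name range_to_expression is made up for this example.

```python
import ipaddress


def range_to_expression(ip_range):
    """Turn 'first-last' into a 'host A or host B ...' expression and an address count."""
    first, last = [part.strip() for part in ip_range.split('-')]
    # Smallest list of CIDR blocks that exactly covers first..last
    cidrs = list(ipaddress.summarize_address_range(
        ipaddress.IPv4Address(first), ipaddress.IPv4Address(last)))
    expression = ' or '.join('host ' + str(cidr) for cidr in cidrs)
    total = sum(cidr.num_addresses for cidr in cidrs)
    return expression, total


expression, total = range_to_expression('10.38.129.1-10.38.129.128')
print(expression)
print(total)
```

A range that does not start and end on block boundaries, like the one here, breaks down into several small CIDR blocks, which is why the traffic expression can end up with a number of "or host" terms.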

    print "Querying NetProfiler..."
    report = TrafficSummaryReport(p)
    report.run('hos', columns, timefilter=timefilter, trafficexpr=trafficFilter, resolution=resolution, sort_col=sort_column)

    # grab the data, and legend (it should be what we passed in for most cases)
    data = report.get_data()

    # once we have what we need, delete the report from the NetProfiler
    report.delete()

 

First we tell the script to build a traffic summary report from the NetProfiler object and then to run that report. The report.run command has a few options including:

  • The centricity of the report - is it run from the host perspective or the network interface perspective - in this case we use host
  • The columns we want (as defined earlier)
  • The time range we want (as defined earlier)
  • The traffic expression we want (as defined earlier)
  • The data resolution to use (as defined in the config file)
  • The sort order for the returned data (as defined earlier)

After telling the report to run we wait for it to complete and then get the data returned from the query.

Finally we clean up after ourselves by deleting the report from the NetProfiler.
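
One thing to be aware of: if run or get_data raises an exception, the delete call is never reached and the report is left behind. A try/finally block is one way around that. The sketch below uses a stand-in class, StubReport, so that it runs without a NetProfiler. StubReport and fetch_rows are made up for this example; only the run, get_data and delete method names come from the script above.

```python
class StubReport(object):
    """Stand-in for TrafficSummaryReport so the sketch runs offline."""

    def __init__(self, rows):
        self.rows = rows
        self.deleted = False

    def run(self, *args, **kwargs):
        pass  # a real report would query the NetProfiler here

    def get_data(self):
        return self.rows

    def delete(self):
        self.deleted = True


def fetch_rows(report, *run_args, **run_kwargs):
    """Run the report and return its rows, always deleting it afterwards."""
    try:
        report.run(*run_args, **run_kwargs)
        return report.get_data()
    finally:
        # Runs whether or not run/get_data raised
        report.delete()


report = StubReport([['10.38.14.5', 1200, 3, 0.02]])
rows = fetch_rows(report, 'hos')
print("%d row(s), deleted=%s" % (len(rows), report.deleted))
```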

 

    usedIPs = int(len(data))

    if int(totalIPs) > 0:
        percentUsed = float(float(usedIPs)/float(totalIPs) * 100)
    else:
        percentUsed = 0

    print "Using " + str(usedIPs) + " (" + str(int(percentUsed)) + "%) out of " + str(totalIPs) + " IP's in " + str(x)

    # Generate your alerts here
    if percentUsed > int(threshold):
        # We are using more IP's than we want - generate an alert
        print str(x) + " is using more than " + str(threshold) + "% of its IP space"
    elif totalIPs == 0:
        print "No IP's available in this subnet or range, check your settings"

 

The final section of this is to go through the returned data, print some basic statistical information, and figure out if the returned percentage of used IP's exceeds the threshold specified in the config file. If it does then we print a message to that effect.
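
The percentage and threshold check can be pulled out into two small functions, which makes the arithmetic easy to test on its own. This is a sketch with made-up helper names (percent_used, is_over_threshold), not a change to the script's behavior: as in the script, an empty subnet reports 0% and the alert fires only when usage is strictly above the threshold.

```python
def percent_used(used_ips, total_ips):
    """Return the percentage of addresses in use, or 0.0 for an empty block."""
    if total_ips <= 0:
        return 0.0
    return 100.0 * used_ips / total_ips


def is_over_threshold(used_ips, total_ips, threshold):
    """True when usage is strictly above the threshold percentage."""
    return percent_used(used_ips, total_ips) > threshold


# 64 hosts seen in a /24 (256 addresses) with a 20% threshold
print("%.1f%% used, alert=%s" % (percent_used(64, 256),
                                 is_over_threshold(64, 256, 20)))
```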

 

Here is a sample config file:

<config>
    <netprofiler>eng-profiler.lab.nbttech.com</netprofiler>
    <username>admin</username>
    <password>YWRtaW4=</password>
    <threshold>20</threshold>
    <duration>60</duration>
    <resolution>auto</resolution>
    <subnet>10.38.14.1/24</subnet>
    <subnet>10.38.129.1-10.38.129.128</subnet>
</config>


 

Now let's look at the config file.

The config file consists of seven types of statements of interest within the <config> block. Most of these are self-explanatory:

  • <netprofiler> is the DNS name or IP address of the NetProfiler to be queried.
  • <username> is the username to use when querying that NetProfiler
  • <password> is the base64 encoding of the password for the above username
    • import base64
    • password = base64.b64encode("<insert password here>")
  • <threshold> is the percentage of available addresses within the subnet that are used above which messages or alerts should be generated
  • <duration> is how far back to look (i.e. how often are you running the script) in minutes
  • <resolution> is the data resolution to use
  • <subnet> is one or more entries describing the DHCP subnets in use; these can be in either CIDR format (10.38.14.1/24) or an address range (10.38.129.1-10.38.129.128)
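
To generate the value for the <password> entry, something like the following works. The one-liner shown in the list above is Python 2 style; on Python 3 b64encode wants bytes, so this sketch encodes and decodes explicitly and runs on both. The helper names are made up for this example. Keep in mind that base64 is an encoding, not encryption: anyone who can read the config file can recover the password, so protect the file accordingly.

```python
import base64


def encode_password(plain):
    """Return the base64 text to paste into the <password> element."""
    return base64.b64encode(plain.encode('utf-8')).decode('ascii')


def decode_password(encoded):
    """Reverse of encode_password, as the script does when reading the config."""
    return base64.b64decode(encoded).decode('utf-8')


print(encode_password('admin'))
```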

 

A few notes on this script:

  • There is no validation of the returned data. This means that while unlikely it is possible there are IP's being included in the result set that did not see sufficient traffic. The NetProfiler works very hard to keep that from happening but it is a possibility, especially depending on the sources of data (garbage in, garbage out)
  • There is no alerting in the script except via the console messages. Fortunately Python does provide extensive libraries for many types of alerting (e-mail, SNMP, syslog, etc.) and updating the script to reflect that should not be a problem.
  • Each CIDR for a subnet must be specified on its own line. If you have a DHCP zone with multiple CIDRs the script will need updating to handle those properly.
  • As always this is provided as-is and I will try and update this if or when I find errors or make any enhancements.
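
As a starting point for the first two notes, here is a rough sketch of what validation and alerting might look like. It makes some assumptions you should check against your own data: that each returned row is a list in the same order as the requested columns (host IP, average bytes, active connections, RTT), and that "more than zero active connections" is a reasonable test for a real host. The names active_hosts and alert_if_full are made up for this example. Alerts go through the standard logging module, so you can attach handlers such as logging.handlers.SysLogHandler or SMTPHandler later without changing the check itself.

```python
import logging


def active_hosts(rows, conns_index=2):
    """Keep only rows whose active-connection count is above zero."""
    kept = []
    for row in rows:
        try:
            if float(row[conns_index]) > 0:
                kept.append(row)
        except (TypeError, ValueError, IndexError):
            # Missing or non-numeric value: treat the host as not validated
            continue
    return kept


def alert_if_full(logger, subnet, used, total, threshold):
    """Log a warning and return True when usage is above the threshold."""
    percent = 100.0 * used / total if total else 0.0
    if percent > threshold:
        logger.warning("%s is using %.1f%% of its IP space (threshold %d%%)",
                       subnet, percent, threshold)
        return True
    return False


logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("ip_usage")

# Made-up rows for illustration only
rows = [['10.38.14.5', 1200, 3, 0.02],
        ['10.38.14.9', 0, 0, ''],
        ['10.38.14.12', 50, '1', 0.1]]
hosts = active_hosts(rows)
alert_if_full(log, '10.38.14.0/24', len(hosts), 256, 20)
```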