Monitoring SRX Chassis Cluster

Just finishing off a few things at work this week. We’ve got a few sites around the place where we have HA internet powered by two Juniper SRX100’s. The Two SRX100’s operate in a Chassis Cluster and peer with our ISP using BGP across both active/passive devices.

Below is a little Nagios check script that I wrote to hook into our in-house Nagios monitoring platform. It makes sure the chassis cluster has not failed over operating in a degraded state, and makes sure that there are two BGP peers connected.

NOTE: I was aiming for simplicity in this setup, if you’ve got a bigger environment or require instant notifications you might wish to set up snmp traps to get instant notifications.


# Bash script to check the status of a SRX cluster.
#  Works by SSHing into cluster to check "show chassis cluster status" command and SNMP walking to make sure BGP peers
#  are both in a connected state

STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

clusterAddress=$1
privateKey=$2
clusterStatus=`ssh nagios@$clusterAddress -i $privateKey "show chassis cluster status"`

declare -i primaryCount
declare -i secondaryCount
declare -i failoverCount
declare -i activeBgpPeers

activeBgpPeers=`snmpwalk -Os -c public -v 1 $clusterAddress .1.3.6.1.2.1.15.3.1.2 | grep "INTEGER: 6" | wc -l`
primaryCount=`echo "$clusterStatus" | grep primary | wc -l`
secondaryCount=`echo "$clusterStatus" | grep secondary | wc -l`
failoverCount=`echo "$clusterStatus" | grep "Failover count: 0" | wc -l`

if [ $primaryCount -ne 2 ]
then
  echo "No two primary redundancy groups"
  echo "$clusterStatus"
  exit $STATE_CRITICAL
fi

if [ $secondaryCount -ne 2 ]
then
  echo "No two secondary redundancy groups"
  echo "$clusterStatus"
  exit $STATE_CRITICAL
fi

if [ $failoverCount -ne 2 ]
then
  echo "SRX has fallen over on a redundancy group"
  echo "$clusterStatus"
  exit $STATE_WARNING
fi

if [ $activeBgpPeers -ne 2 ]
then
  echo "NOT 2 Active BGP Peers"
  exit $STATE_CRITICAL
fi

echo "OK, 2 peers.  OK: Chassis Cluster status OK"
echo "$clusterStatus"
exit $STATE_OK