Hello,
This guide shows how to install and configure FastNetMon for use with MikroTik and, as a bonus, how to integrate it with Slack and Grafana. Slack is used to get reports about DDoS attacks, and Grafana gives you a really great reporting tool that lets you check PPS and throughput both as a whole and per IP address.
Many thanks to Pavel Odintsov for the tool and for helping out with the questions I had.
Background
Those of you who work in data center and ISP environments are probably used to worrying about being the target of a DDoS attack, or about being used as a tool for one. Attacks are getting increasingly dangerous and complex. There are ways to mitigate such attacks, namely:
- Firewalling IPs
- Absorbing the attack by having enough bandwidth
- Redirecting traffic to scrubbing centers (like Incapsula or Psychz)
- etc
To detect attacks there's a neat solution called FastNetMon. It comes in two versions: a free "do-it-yourself" version and, recently, a paid one. We're going to use the free version here. In short, FastNetMon can:
- Detect whether there's an attack in place
- Tell the direction of said attack (incoming or outgoing)
- Tell what IPs are involved
- Report metrics about the attack, such as PPS, Mbps, type, protocols and an extract of the attack flow data
- Integrate with other tools to take different actions when there's an attack. For instance, it can signal devices to firewall an IP, or announce a prefix via BGP that in turn black-holes the traffic or triggers a scrubbing center, etc. There's even a plugin that talks to RouterOS via the API and programs the routers to do something when an attack is detected
- Log data into InfluxDB, which in turn can be used with Grafana to create really nice traffic reports
- Detect attacks either by using flow protocols (such as IPFIX, sFlow, Netflow) or by directly inspecting mirrored traffic
- Detect a lot of attack vectors, like different types of amplification attacks
This guide first covers the installation of all the components (FastNetMon, InfluxDB and Grafana) and then the corresponding configuration. At the end we'll talk about the issues we've found with MikroTik's implementation of Netflow (since this is the detection method we'll use).
Required Hardware
We currently have this service in production using a VPS (with ESXi as the hypervisor) with the following characteristics:
- 8 GB of RAM
- 500 GB HDD
- 2 vCPUs (Xeon E5450)
- Ubuntu Server 16.04 LTS 64-bit
- A single vNIC 1 Gbps
As for the routers, we only use CCR1036s, but this works on any MikroTik device. Our routers are basically idling in terms of CPU and RAM; even on the one with the highest load, enabling Netflow v9 increased CPU usage by at most 5 - 7%. Take this into account: Netflow is a resource-heavy protocol, and the resources it uses depend a lot on the traffic load the device has.
Sending the flow data to the FastNetMon server (again, using Netflow v9) consumes around 1 Mbps of bandwidth with our traffic profile. The more traffic/flows the router has to process, the bigger this number will be, so also take this into account when deciding where to place the server and what resources to reserve.
Note: it's important that you guarantee the needed bandwidth towards the FastNetMon machine. Netflow is a stateless protocol, meaning that flow data lost in transit is not retransmitted, so FastNetMon will simply never see it.
1.- Installing the components and configuring the basics
1.1.- FastNetMon
This is actually quite easy. I've done it a lot of times and never had a single issue. The full guide is here: https://github.com/pavel-odintsov/fastn ... INSTALL.md
In short all you gotta do is:
- download the automatic installer using wget:
wget https://raw.githubusercontent.com/pavel-odintsov/fastnetmon/master/src/fastnetmon_install.pl -O fastnetmon_install.pl
- execute it:
sudo perl fastnetmon_install.pl
Once it's done there are two main files we'll be concerned with:
- Networks list, located at /etc/networks_list (create it if it doesn't already exist). Here we need to list all of our networks in CIDR format, one per line. If our networks were 1.1.1.0/24 and 1.2.1.0/24, the file would look like this:
1.1.1.0/24
1.2.1.0/24
- fastnetmon.conf, located at /etc/fastnetmon.conf. This is the main configuration file; we'll talk more about it later
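Since the networks list is just one prefix per line, a quick syntax check before starting the service can catch typos. This is only a minimal sketch: check_networks_list is a name I made up (it is not part of FastNetMon), and it only validates IPv4 CIDR syntax and ranges, not whether the prefixes are really yours.

```shell
# check_networks_list: hypothetical helper, NOT part of FastNetMon.
# Reads prefixes on stdin (one per line) and prints every line that is not
# a syntactically valid IPv4 CIDR (a.b.c.d/len). Returns 1 if any line is bad.
check_networks_list() {
    awk '
        NF == 0 { next }    # ignore blank lines
        {
            ok = 0
            # exactly one field, exactly one "/", numeric prefix length 0..32
            if (NF == 1 && split($1, parts, "/") == 2 && parts[2] ~ /^[0-9]+$/ && parts[2] + 0 <= 32) {
                # exactly four numeric octets, each 0..255
                if (split(parts[1], oct, ".") == 4) {
                    ok = 1
                    for (i = 1; i <= 4; i++)
                        if (oct[i] !~ /^[0-9]+$/ || oct[i] + 0 > 255) ok = 0
                }
            }
            if (!ok) { print "invalid line " NR ": " $0; bad = 1 }
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Typical use (assumes the file exists on your server):
#   check_networks_list < /etc/networks_list && echo "networks_list looks fine"
```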
1.2.- InfluxDB
I followed this guide: https://diyprojects.io/influxdb-tutoria ... Z9vxlHyi03 which is quite easy. All you have to do is:
curl -sL https://repos.influxdata.com/influxdb.key | sudo apt-key add -
source /etc/lsb-release
echo "deb https://repos.influxdata.com/${DISTRIB_ID,,} ${DISTRIB_CODENAME} stable" | sudo tee /etc/apt/sources.list.d/influxdb.list
sudo apt-get update
sudo apt-get install influxdb
Then edit the InfluxDB configuration file (/etc/influxdb/influxdb.conf on a default package install) and make the [[graphite]] section look like this:
[[graphite]]
# Determines whether the graphite endpoint is enabled.
enabled = true
database = "graphite"
retention-policy = ""
bind-address = "127.0.0.1:2003"
protocol = "tcp"
consistency-level = "one"
# These next lines control how batching works. You should have this enabled
# otherwise you could get dropped metrics or poor performance. Batching
# will buffer points in memory if you have many coming in.
# Flush if this many points get buffered
batch-size = 5000
# number of batches that may be pending in memory
batch-pending = 10
# Flush at least this often even if we haven't hit buffer limit
batch-timeout = "1s"
# UDP Read buffer size, 0 means OS default. UDP listener will fail if set above OS max.
# udp-read-buffer = 0
### This string joins multiple matching 'measurement' values providing more control over the final measurement name.
separator = "."
### Default tags that will be added to all metrics. These can be overridden at the template level
### or by tags extracted from metric
# tags = ["region=us-east", "zone=1c"]
### Each template line requires a template pattern. It can have an optional
### filter before the template and separated by spaces. It can also have optional extra
### tags following the template. Multiple tags should be separated by commas and no spaces
### similar to the line protocol format. There can be only one default template.
# templates = [
# "*.app env.service.resource.measurement",
# # Default template
# "server.*",
# ]
templates = [
"fastnetmon.hosts.* app.measurement.cidr.direction.function.resource",
"fastnetmon.networks.* app.measurement.cidr.direction.resource",
"fastnetmon.total.* app.measurement.direction.resource"
]
Once the configuration is saved, start the service:
sudo service influxdb start
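To make the three templates in the [[graphite]] section less magical: each dot-separated word in a template is matched against the same position in the graphite metric name. The word "measurement" picks the InfluxDB measurement and every other word becomes a tag. The sketch below only imitates that mapping for illustration. map_graphite_name is a hypothetical helper of mine, the metric name in the example is an assumed one (check what your own FastNetMon actually sends), and real InfluxDB handles more cases than this.

```shell
# map_graphite_name: hypothetical illustration, NOT an InfluxDB tool.
# $1 = template (dot-separated keywords), $2 = graphite metric name.
# Prints "measurement=<x>" for the "measurement" keyword, "tag <k>=<v>" otherwise.
map_graphite_name() {
    awk -v tpl="$1" -v name="$2" 'BEGIN {
        nt = split(tpl, t, ".")
        nn = split(name, n, ".")
        # simplification: this sketch requires both to have the same length
        if (nt != nn) { print "template and name have different lengths"; exit 1 }
        for (i = 1; i <= nt; i++) {
            if (t[i] == "measurement") print "measurement=" n[i]
            else print "tag " t[i] "=" n[i]
        }
    }'
}

# Assumed metric name, matched against the "fastnetmon.total.*" template:
map_graphite_name "app.measurement.direction.resource" "fastnetmon.total.incoming.pps"
# prints:
#   tag app=fastnetmon
#   measurement=total
#   tag direction=incoming
#   tag resource=pps
```

This is also why the dashboards can filter per direction or per host: those parts of the name end up as tags.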
1.3.- Grafana
The guide I followed is this one: http://docs.grafana.org/installation/debian/ (I used the APT repository).
First open the file /etc/apt/sources.list (with vim, vi or nano) and paste the following at the end:
deb https://packagecloud.io/grafana/stable/debian/ jessie main
Then add the Package Cloud key, update the package lists and install Grafana:
curl https://packagecloud.io/gpg.key | sudo apt-key add -
sudo apt-get update
sudo apt-get install grafana
Finally, start the server and make it start at boot:
sudo service grafana-server start
sudo update-rc.d grafana-server defaults
2.- Configuring the Router
First you need to determine which routers should send traffic reports to FastNetMon. Depending on your network topology, these should be the ones that aggregate the most traffic (that is, have the most visibility of the network); in our case these were our edge routers. Then you need to define which interfaces are to be inspected.
In our case the interfaces were sfp1 and sfp2, so we just did the following:
#activating netflow and setting up the interfaces
/ip traffic-flow
set active-flow-timeout=1m cache-entries=8M enabled=yes interfaces="sfp1,sfp2"
#adding the flow-target (FastNetMon)
/ip traffic-flow target
add dst-address=FastNetMon-IP port=1234
3.- Configuring services
3.1.- FastNetMon
We need to check the fastnetmon.conf file. The values we are concerned with are:
process_incoming_traffic = on
process_outgoing_traffic = on
threshold_pps = 25000
threshold_mbps = 500
threshold_flows = 3500
netflow = on
average_calculation_time = 60
netflow_port = 1234
netflow_host = 0.0.0.0
graphite = on
graphite_host = 127.0.0.1
graphite_port = 2003
graphite_prefix = fastnetmon
monitor_local_ip_addresses = on
notify_script_path = /usr/local/bin/notify_about_attack.sh
enable_connection_tracking = on
The threshold values depend a lot on your traffic profile, so you need to tweak them; for us, the values shown here work.
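When tweaking the thresholds it helps to replay a few per-host numbers from your graphs against the limits you're about to set. The sketch below does just that. exceeds_threshold is a hypothetical helper, it hard-codes the three values from the excerpt above, and it is only a plain "greater than" comparison. FastNetMon's real detection logic is its own and is not reproduced here.

```shell
# exceeds_threshold: hypothetical helper for reasoning about threshold values.
# $1 = observed pps, $2 = observed mbps, $3 = observed flows per second.
# Returns 0 (true) when at least one value is above its limit, 1 otherwise.
exceeds_threshold() {
    # same numbers as in the fastnetmon.conf excerpt; adjust to your own
    threshold_pps=25000
    threshold_mbps=500
    threshold_flows=3500
    [ "$1" -gt "$threshold_pps" ] || [ "$2" -gt "$threshold_mbps" ] || [ "$3" -gt "$threshold_flows" ]
}

# Example: 25363 pps, 15 mbps and 0 flows/s crosses the pps limit only
if exceeds_threshold 25363 15 0; then
    echo "would be above the configured thresholds"
fi
```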
There are a lot of configuration parameters in this file, but some of them don't apply to our setup because they're for when FastNetMon inspects mirrored traffic, so read the conf file carefully. There's also a section related to exaBGP and goBGP which I won't touch just yet, for reasons I'll explain later. Basically those sections let you interface FastNetMon with BGP to automate blackholing prefixes or redirecting traffic to a scrubbing center.
The notify script is used to take custom actions when an attack is detected, when the details of the attack are gathered, and when the attack has stopped. It's a simple bash script, but because of how it works the possibilities are huge. The example here only notifies our Slack channel when an attack starts or stops, along with its details. Save it at the path set in notify_script_path and make it executable.
#!/usr/bin/env bash
#
# Hello, lovely FastNetMon customer! I'm really happy to see you here!
# Pavel Odintsov, author
#
#
# Instructions:
#
# Copy this script to /usr/local/bin/
# Edit /etc/fastnetmon.conf and set:
# notify_script_path = /usr/local/bin/notify_with_slack.sh
#
# Add your email address to email_notify.
#
# Add your Slack incoming webhook to slack_url.
# slack_url="https://hooks.slack.com/services/TXXXXXXXX/BXXXXXXXXX/LXXXXXXXXX"
#
# Notes:
# hostname lookup requires the dig command.
# Debian: apt-get install dnsutils
# Redhat: yum install bind-utils
#
# For ban and attack_details actions we will receive attack details to stdin
# if option notify_script_pass_details enabled in FastNetMon's configuration file
#
# If you do not need this details, please set option notify_script_pass_details to "no".
#
# Please do not remove the following command if you have notify_script_pass_details enabled, because
# FastNetMon will crash in this case (it expect read of data from script side).
#
if [ "$4" = "ban" ] || [ "$4" = "attack_details" ]; then
    fastnetmon_output=$(</dev/stdin)
fi
# This script will get following params:
# $1 client_ip_as_string
# $2 data_direction
# $3 pps_as_string
# $4 action (ban, unban or attack_details)
# Target hostname
hostname=$(dig -x "${1}" +short)
# Email:
email_notify="root,please_fix_this_email@domain.ru"
# Slack:
slack_url="YOUR SLACK URL GOES HERE"
slack_title="FastNetMon Alert!"
slack_text="IP: ${1}\nHostname: ${hostname}\nAttack: ${2}\nPPS: ${3}\nAction: ${4}\n\n${fastnetmon_output}"
slack_action=${4}
function slackalert () {
    # $slack_url is quoted so the test does not break when the value contains spaces
    if [ ! -z "$slack_url" ] && [ "$slack_action" = "ban" ]; then
        local slack_color="danger"
    elif [ ! -z "$slack_url" ] && [ "$slack_action" = "attack_details" ]; then
        local slack_color="warning"
    elif [ ! -z "$slack_url" ] && [ "$slack_action" = "unban" ]; then
        local slack_color="good"
    else
        return 0
    fi
    local slack_payload="{\"attachments\": [ { \"title\": \"${slack_title}\", \"text\": \"${slack_text}\", \"color\": \"${slack_color}\" } ] }"
    curl --connect-timeout 30 --max-time 60 -s -S -X POST -H 'Content-type: application/json' --data "${slack_payload}" "${slack_url}"
}
if [ "$4" = "unban" ]; then
    # Slack Alert:
    slackalert
    # Unban actions if used
    exit 0
fi
if [ "$4" = "ban" ]; then
    # Email Alert:
    echo "${fastnetmon_output}" | mail -s "FastNetMon Alert: IP $1 blocked because of $2 attack with power $3 pps" "$email_notify";
    # Slack Alert:
    slackalert
    # You can add ban code here!
    # iptables -A INPUT -s $1 -j DROP
    # iptables -A INPUT -d $1 -j DROP
    exit 0
fi
if [ "$4" = "attack_details" ]; then
    # Email Alert:
    echo "${fastnetmon_output}" | mail -s "FastNetMon Analysis: IP $1 blocked because of $2 attack with power $3 pps" "$email_notify";
    # Slack Alert:
    slackalert
    exit 0
fi
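One thing to keep in mind with the script above: ${fastnetmon_output} goes into the JSON payload as-is. Raw newlines, double quotes and backslashes are not valid inside a JSON string, so depending on how strict the receiving side is, a payload containing them could be rejected. Below is a minimal sketch of an escaping step. json_escape is my own hypothetical helper, not part of the upstream script, and it only handles backslashes, double quotes and newlines (no tabs or other control characters).

```shell
# json_escape: hypothetical helper, NOT part of the upstream notify script.
# Reads text on stdin and prints it on one line, with backslashes and double
# quotes escaped and line breaks turned into the two characters \n.
json_escape() {
    sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' |
        awk 'NR > 1 { printf "%s", "\\n" } { printf "%s", $0 }'
}

# Possible use inside the notify script (an assumption, adapt as needed):
#   fastnetmon_output=$(printf '%s' "$fastnetmon_output" | json_escape)
# Apply it only to the attack details, before slack_text is built, so the
# \n sequences already written in slack_text are not escaped a second time.
```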
Here's an example of the reports you get:
FastNetMon Alert!
IP: 192.168.1.230
Hostname:
Attack: outgoing
PPS: 25363
Action: attack_details
IP: 192.168.1.230
Attack type: udp_flood
Initial attack power: 25363 packets per second
Peak attack power: 26611 packets per second
Attack direction: outgoing
Attack protocol: udp
Total incoming traffic: 5 mbps
Total outgoing traffic: 15 mbps
Total incoming pps: 13681 packets per second
Total outgoing pps: 25363 packets per second
Total incoming flows: 1 flows per second
Total outgoing flows: 0 flows per second
Average incoming traffic: 5 mbps
Average outgoing traffic: 15 mbps
Average incoming pps: 13681 packets per second
Average outgoing pps: 25363 packets per second
Average incoming flows: 1 flows per second
Average outgoing flows: 0 flows per second
Incoming ip fragmented traffic: 0 mbps
Outgoing ip fragmented traffic: 0 mbps
Incoming ip fragmented pps: 0 packets per second
Outgoing ip fragmented pps: 0 packets per second
Incoming tcp traffic: 0 mbps
Outgoing tcp traffic: 0 mbps
Incoming tcp pps: 60 packets per second
Outgoing tcp pps: 84 packets per second
Incoming syn tcp traffic: 0 mbps
Outgoing syn tcp traffic: 0 mbps
Incoming syn tcp pps: 60 packets per second
Outgoing syn tcp pps: 84 packets per second
Incoming udp traffic: 5 mbps
Outgoing udp traffic: 15 mbps
Incoming udp pps: 13586 packets per second
Outgoing udp pps: 25241 packets per second
Incoming icmp traffic: 0 mbps
Outgoing icmp traffic: 0 mbps
Incoming icmp pps: 0 packets per second
Outgoing icmp pps: 0 packets per second
Average packet size for incoming traffic: 50.2 bytes
Average packet size for outgoing traffic: 82.0 bytes
2017-08-24 15:53:16.000000 192.168.1.230:62790 > 178.32.213.22:80 protocol: tcp flags: syn frag: 0 packets: 6 size: 1026 bytes ttl: 0 sample ratio: 1
2017-08-24 15:53:16.000000 159.192.222.24:62455 > 192.168.1.230:34030 protocol: udp frag: 0 packets: 4055 size: 188665 bytes ttl: 0 sample ratio: 1
2017-08-24 15:53:16.000000 171.4.51.23:50213 > 192.168.1.230:34522 protocol: udp frag: 0 packets: 69 size: 3077 bytes ttl: 0 sample ratio: 1
2017-08-24 15:53:16.000000 192.168.1.230:34522 > 171.4.51.23:50213 protocol: udp frag: 0 packets: 69 size: 4722 bytes ttl: 0 sample ratio: 1
2017-08-24 15:53:16.000000 192.168.1.230:34522 > 49.228.235.65:61322 protocol: udp frag: 0 packets: 134 size: 9191 bytes ttl: 0 sample ratio: 1
2017-08-24 15:53:16.000000 192.168.1.230:62791 > 178.32.213.22:80 protocol: tcp flags: syn frag: 0 packets: 6 size: 534 bytes ttl: 0 sample ratio: 1
2017-08-24 15:53:16.000000 192.168.1.230:62792 > 178.32.213.22:80 protocol: tcp flags: syn frag: 0 packets: 6 size: 558 bytes ttl: 0 sample ratio: 1
2017-08-24 15:53:16.000000 192.168.1.230:34042 > 1.196.216.91:60593 protocol: udp frag: 0 packets: 2664 size: 206563 bytes ttl: 0 sample ratio: 1
2017-08-24 15:53:17.000000 178.32.213.22:80 > 192.168.1.230:62758 protocol: tcp flags: syn,ack frag: 0 packets: 4 size: 392 bytes ttl: 0 sample ratio: 1
2017-08-24 15:53:17.000000 178.32.213.22:80 > 192.168.1.230:62770 protocol: tcp flags: syn,ack frag: 0 packets: 4 size: 392 bytes ttl: 0 sample ratio: 1
2017-08-24 15:53:17.000000 178.32.213.22:80 > 192.168.1.230:62787 protocol: tcp flags: syn,ack frag: 0 packets: 4 size: 406 bytes ttl: 0 sample ratio: 1
2017-08-24 15:53:17.000000 192.168.1.230:62793 > 178.32.213.22:80 protocol: tcp flags: syn frag: 0 packets: 7 size: 1425 bytes ttl: 0 sample ratio: 1
2017-08-24 15:53:17.000000 101.109.58.2:27667 > 192.168.1.230:34046 protocol: udp frag: 0 packets: 2804 size: 147814 bytes ttl: 0 sample ratio: 1
Finally, start FastNetMon:
sudo service fastnetmon start
You can then check that it's receiving traffic with the bundled client:
/opt/fastnetmon/fastnetmon_client
3.2.- InfluxDB
There's nothing extra that needs to be done here.
3.3.- Grafana
Go ahead and log into your Grafana server (http://FastNetMonIP:3000, the default user and password are both admin). You can follow the tutorial, which walks you through the process of adding a new data source and dashboards.
First select "Add Data Source" and then, assuming you installed Grafana on the same machine as FastNetMon and InfluxDB, fill in the following information:
Name: FastNetMon
Type: InfluxDB
Url: http://localhost:8086
Access: proxy
InfluxDB Details
Database: graphite
Next, import the dashboards:
- Press "New Dashboard"
- You'll get to a dashboard creation page. In the upper left corner of this page there's the phrase "New Dashboard"; press it
- This opens a new set of options; press "Import Dashboard" on the right
- Here, one by one, you either "upload .json file" or copy/paste the contents of the two .json files attached to this post (inside the FastNet Dashes.rar file)
- Either way you'll be asked for the data source (just below "name"); select the data source you've just created
- Repeat the process for the remaining .json file
Here are the Grafana files on GitHub, so you can also download them from there:
Total transit traffic dashboard
Total transit traffic per host
You should now have both dashboards installed and fully operational.
"Total Transit Traffic" shows you all the traffic as an aggregate, that is, all the traffic collected by all the routers.
"Total Transit Traffic per host" shows the traffic stats for a specific IP address, as collected by all the routers. There's a small "issue" here: you can't write the "Host's IP address" using dots, it has to be underscores, so 192.168.1.1 becomes 192_168_1_1. This is because of how the addresses are stored in InfluxDB.
Preview of Dashboard #1 (total transit traffic):
Preview of Dashboard #2 (total transit traffic per host):
Note: I didn't make the dashboards. The original ones can be found here: https://grafana.com/dashboards/1605. However, they didn't work for me since the naming used for the data tables in InfluxDB didn't match, so I updated the versions uploaded here to work with the latest version of FastNetMon. If someone has any other idea for a better dashboard, please share.
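Regarding the underscore format the per-host dashboard expects: if you look up hosts often, a one-liner saves some typing. A tiny sketch (ip_to_dashboard_key is just a name I picked, nothing in Grafana or FastNetMon is called that):

```shell
# ip_to_dashboard_key: hypothetical helper.
# Converts a dotted IPv4 address into the underscore form used in the
# per-host dashboard, e.g. 192.168.1.1 -> 192_168_1_1.
ip_to_dashboard_key() {
    printf '%s\n' "$1" | tr '.' '_'
}

ip_to_dashboard_key 192.168.1.1    # prints 192_168_1_1
```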
4.- Additional notes, reasons to not be using FastNetMon integration with BGP
First off, there seems to be an issue with Netflow in versions of RouterOS prior to 6.38. It was related to the order of the counters, and as a result FastNetMon didn't get reliable information. The v6.38 changelog shows:
*) traffic-flow - fixed flow sequence counter and length;
For us FastNetMon has been quite reliable once we started tweaking its configuration file, but there's still an issue with Netflow and MikroTik: from time to time the router somehow sends wrong information, and FastNetMon starts screaming that all your IPs are being DDoSed and are also DDoSers, with an attack power of over tens of gigabits and millions of PPS. This seems to have been reported to MikroTik but is not yet fixed (we always use the latest BFO images).
Here's a thread about the issue: https://github.com/pavel-odintsov/fastnetmon/issues/620
We have the capacity to redirect traffic to a scrubbing center, and we know this can be automated using FastNetMon, either through the notify script or the integration with exaBGP. However, because of this issue we are afraid of doing it right now, so we use the "manual" approach: we get a notification in our Slack channel and someone checks whether it makes sense. If it does, we activate our defenses; if it doesn't, we just ignore it. This has happened to us around 3 times this month.
Here's an example:
IP: 192.168.3.3
Hostname:
Attack: outgoing
PPS: 9284913
Action: attack_details
IP: 192.168.3.3
Attack type: udp_flood
Initial attack power: 9284913 packets per second
Peak attack power: 9284913 packets per second
Attack direction: outgoing
Attack protocol: udp
Total incoming traffic: 2176 mbps
Total outgoing traffic: 6722 mbps
Total incoming pps: 5587609 packets per second
Total outgoing pps: 9284913 packets per second
Total incoming flows: 1152 flows per second
Total outgoing flows: 1113 flows per second
Average incoming traffic: 2176 mbps
Average outgoing traffic: 6722 mbps
Average incoming pps: 5587609 packets per second
Average outgoing pps: 9284913 packets per second
Average incoming flows: 1152 flows per second
Average outgoing flows: 1113 flows per second
Incoming ip fragmented traffic: 0 mbps
Outgoing ip fragmented traffic: 0 mbps
Incoming ip fragmented pps: 0 packets per second
Outgoing ip fragmented pps: 0 packets per second
Incoming tcp traffic: 99 mbps
Outgoing tcp traffic: 659 mbps
Incoming tcp pps: 72172 packets per second
Outgoing tcp pps: 105773 packets per second
Incoming syn tcp traffic: 99 mbps
Outgoing syn tcp traffic: 658 mbps
Incoming syn tcp pps: 72099 packets per second
Outgoing syn tcp pps: 105699 packets per second
Incoming udp traffic: 2076 mbps
Outgoing udp traffic: 6063 mbps
Incoming udp pps: 5514902 packets per second
Outgoing udp pps: 9178771 packets per second
Incoming icmp traffic: 0 mbps
Outgoing icmp traffic: 0 mbps
Incoming icmp pps: 431 packets per second
Outgoing icmp pps: 349 packets per second
Average packet size for incoming traffic: 51.1 bytes
Average packet size for outgoing traffic: 94.9 bytes
Those numbers make no sense in our environment. For starters, we don't have (nearly) the amount of bandwidth reported in the attack, and it's physically impossible as well, since that specific host had only one GigE NIC.
Hopefully this will work for everyone, and if not, well, we can always share knowledge over here. I'm also really interested in fixing the issue I just mentioned!
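The same reasoning can be written down as a sanity check, which could be useful before letting any automated action fire. This is only a sketch under my own assumptions: is_plausible is a hypothetical helper (not a FastNetMon feature), you have to supply the host's link speed yourself, and the pps bound uses the usual Ethernet line-rate figure of roughly 1.488 million minimum-size packets per second per Gbps, rounded down to 1488 pps per Mbps.

```shell
# is_plausible: hypothetical sanity check, NOT a FastNetMon feature.
# $1 = reported traffic in mbps, $2 = reported pps, $3 = host link speed in mbps.
# Returns 0 when both numbers could fit on the link, 1 when they cannot be real.
is_plausible() {
    # upper bound on pps for this link with minimum-size Ethernet frames
    max_pps=$(( $3 * 1488 ))
    [ "$1" -le "$3" ] && [ "$2" -le "$max_pps" ]
}

# The bogus report above: 6722 mbps and 9284913 pps from a host on 1 GigE
if ! is_plausible 6722 9284913 1000; then
    echo "report exceeds what the link can carry, probably bad flow data"
fi
```

The first report in this guide (15 mbps, 25363 pps) passes the same check, so a real attack of that size would not be filtered out.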