Here is a script that I found on the forums here and modified to reset what I’m calling a “blackhole” condition, a term I came across in my research above and borrowed because it sounded cool. I have not confirmed that my condition is the same one described elsewhere under that name.
The email feature from the source script is simply commented out here, and I have not cleaned the script up since roughly modifying it.
Usage: add your interface names on the first line of the script. Don’t modify the ="0" part. Schedule it to run as desired. I have mine set to run 3 and 8 min after boot, though I think I stagger the times on a redundant, cross-connected switch so the two are not fighting each other. I also let it run every 15 or 20 min (even though I don’t really think I need that), and it doesn’t seem to be having any unwanted effects.
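For the scheduling part, here is a minimal sketch of what the scheduler entries might look like. It assumes the script has been saved under /system script with the name ifacecheck; that name, the entry names, and the exact delays are placeholders I made up, so adjust them to your setup.

```
/system scheduler
#assumption: the script is stored as "ifacecheck" under /system script
#run once about 3 min after boot, and again about 8 min after boot
add name=ifacecheck-boot-3m start-time=startup on-event=":delay 180s; /system script run ifacecheck"
add name=ifacecheck-boot-8m start-time=startup on-event=":delay 480s; /system script run ifacecheck"
#optional periodic run every 15 min
add name=ifacecheck-periodic interval=15m on-event="/system script run ifacecheck"
```

On a redundant pair, using different delays on each switch is probably the easiest way to keep the two from resetting the same link at the same moment.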
The script will:
(1) first check whether each interface SHOULD be working (status showing “link-ok”) and, if so, set its entry in ifaceList to 1.
(2) In the same pass, it looks for any such interface with under 2 pps RX or TX and marks it with 911 in ifaceList (911 being the emergency telephone number in the US). My error condition results in one side of a link flapping between 0 and 1 pps, and it never seems to be more than that.
(3) reset the marked interfaces (disable, wait, re-enable).
(4) check whether that fixed them and, if not, go back to (3) and repeat a couple of times.
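To look by hand at the two values the script keys on, something like this can be pasted into a terminal. It is only a sketch: ether1 is a placeholder interface name, and the outer braces are there so the :local variables stay in scope from one line to the next.

```
{
  #ether1 is a placeholder; substitute one of your own interface names
  :local link [/interface ethernet monitor ether1 once as-value];
  :put ("status: " . ($link->"status"));
  :local traffic [/interface monitor-traffic ether1 once as-value];
  :put ("pps tx/rx: " . ($traffic->"tx-packets-per-second") . "/" . ($traffic->"rx-packets-per-second"));
}
```

If the status reads link-ok while the pps stays stuck at 0 or 1, that matches the condition the script resets.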
:local ifaceList {"host1"="0"; "host2"="0"; "host3"="0"; "switch1"="0"};
:local iname;
:local monitor;
:local logMsg "ifacecheck: [info] [Port PPS Check] (tx/rx): ";
:local speedRX;
:local speedTX;
:local targetSpeed 2;
:local cycleNumber 3;
:local downtime 3;
:local sleepBetween 5;
:local problems 0;
:local actions 0;
:local trying false;
#Define variables for sending email
#:local mailServerName "PUT_YOUR_MAIL_SERVER_NAME_HERE";
#:local mailServerIp [:resolve $mailServerName];
#:local mailServerPort PUT_YOUR_MAIL_SERVER_PORT_HERE;
#:local mailFrom "PUT_MAIL_FROM_HERE";
#:local mailTo "PUT_MAIL_TO_HERE";
#:local mailSubject "WRITE_YOUR_MAIL_SUBJECT_HERE";
#:local mailUser "PUT_YOUR_MAIL_USER_HERE";
#:local mailPass "PUT_YOUR_MAIL_PASSWORD_HERE";
#:local mailBody "PUT_MAIL_BODY_HERE";
#define sendMail function
#:global sendMail do={
#/tool e-mail send server=$mailServerIp port=$mailServerPort from=$mailFrom to=$mailTo subject=$mailSubject body=$mailBody user=$mailUser password=$mailPass;
#}
#loop counter for the remediation rounds; declared here so it is still in scope for the :while loop further down
:local loopCounter 0;
#INITIAL CHECK ALL INTERFACES
:foreach ethName,failCount in=$ifaceList do={
  #verify iface is enabled/showing OK before running checks
  :set $monitor [/interface ethernet monitor $ethName as-value once];
  :if (($monitor->"status") = "link-ok") do={
    :set ($ifaceList->$ethName) "1";
  }
  :set $monitor [/interface monitor-traffic $ethName as-value once];
  :set $speedRX ($monitor->"rx-packets-per-second");
  :set $speedTX ($monitor->"tx-packets-per-second");
  #CHECK THE INTERFACE
  :if ((($speedRX < $targetSpeed) || ($speedTX < $targetSpeed)) && (($ifaceList->$ethName) = "1")) do={
    :log warning "ifacecheck: <41> critical Port $ethName.supafab.com current pps (tx/rx): $speedTX/$speedRX, target pps > $targetSpeed";
    :set ($ifaceList->$ethName) "911";
    :set $problems ($problems + 1);
  }
  :set $logMsg ($logMsg . "$ethName.supafab.com: $speedTX/$speedRX; ");
  #end of single interface check
}
:log info $logMsg;
#REMEDIATION FOR FAILED (if any)
:while (($loopCounter < ($cycleNumber - 1)) && ($problems > 0)) do={
  :set $loopCounter ($loopCounter + 1);
  :if (!$trying) do={
    :log error "ifacecheck: Starting interface reset procedure >>> ";
  }
  #DISABLE ANY BAD
  :foreach ethName,failCount in=$ifaceList do={
    #:log debug "ifacecheck: the array value for $ethName is ";
    #:log debug ($ifaceList->$ethName);
    :if (($ifaceList->$ethName) = "911") do={
      :log warning "ifacecheck: [critical] disabling bad interface $ethName.supafab.com - try $loopCounter";
      /interface ethernet disable $ethName;
      :set $actions ($actions + 1);
    }
  }
  #end disable
  #WAIT
  #:log debug "ifacecheck: [debug] pausing for $downtime before re-enable";
  :delay $downtime;
  #RE-ENABLE
  :foreach ethName,failCount in=$ifaceList do={
    :if (($ifaceList->$ethName) = "911") do={
      :log warning "ifacecheck: warning - re-enabling interface $ethName.supafab.com - end of try $loopCounter";
      /interface ethernet enable $ethName;
    }
  }
  #end re-enable
  :delay ($downtime * 3);
  #RECHECK
  :set $logMsg "ifacecheck: [info] [Port PPS Check] (tx/rx): ";
  #run through ifaces again
  :foreach ethName,ethStatus in=$ifaceList do={
    #if this interface was reset then do stuff
    :if ($ethStatus = "911") do={
      :set $monitor [/interface monitor-traffic $ethName as-value once];
      :set $speedRX ($monitor->"rx-packets-per-second");
      :set $speedTX ($monitor->"tx-packets-per-second");
      :log debug "ifacecheck: recheck $ethName.supafab.com RX: $speedRX pps TX: $speedTX pps";
      :set $logMsg ($logMsg . "$ethName.supafab.com: $speedTX/$speedRX; ");
      #DID WE FIX IT? (>= so that it mirrors the "under $targetSpeed" failure test in the initial check)
      :if (($speedRX >= $targetSpeed) && ($speedTX >= $targetSpeed)) do={
        :log info "ifacecheck: Interface target pps $targetSpeed restored for $ethName.supafab.com";
        :set ($ifaceList->$ethName) "1";
        :set $problems ($problems - 1);
      } else={
        :log warning "ifacecheck: [critical] $ethName.supafab.com is still bad in check loop $loopCounter of $cycleNumber";
      }
      #end of did we fix it if else
    }
    #end of if interface was reset
  }
  #end of foreach recheck loop
  :set $trying true;
  :log info $logMsg;
  :log debug "ifacecheck: end of remediation round $loopCounter";
  :delay $sleepBetween;
}
#end big remediation loop
#EMAIL NOTIFICATION
:if ($actions > 0) do={
  :log info "ifacecheck: [info] Trying to send email alert - residual problems: $problems; reset attempts: $actions";
  #sendMail is commented out above, so the call is commented out here as well
  #$sendMail;
} else={
  :log debug "ifacecheck: debug - woo hooo we checked and there were no blackhole interfaces!";
}