I have had an RB5009 for a few weeks.
Eth1 is connected to the ISP ONT (2.5 Gbps, limited to 1 Gbps) with VLAN, PPPoE, etc., and it works just fine.
Then, as a backup, there is a K5160 USB LTE stick in the USB port.
The distance metric of the LTE is greater than the metric of the fiber, so, if the fiber is OK, everything passes on the fiber.
In case of fault the traffic goes through the LTE.
I know that this “switching” could be improved.
As soon as the USB LTE stick is inserted and the APN configured, I get my public IP on the LTE interface.
After 14400 seconds (I can see it in the logs) without traffic on the LTE (I mean ZERO bytes transferred), the LTE interface loses the IP.
That is exactly 4 hours. It seems the mobile operator resets connections after 4 hours without data.
Using this stick on Windows or in 2 other routers (ADB and Sercomm), after 4 hours, once the connection is reset,
all of them re-establish it, but not the MikroTik RB5009.
MikroTik support says that the RB5009 is OK, looking at the logs.
Now, I have in mind 3 solutions
A) a script that periodically checks whether the LTE has an IP and, if not, disables and re-enables the LTE
B) a script that disables and re-enables the LTE every 3 hours, regardless of whether the LTE has an IP
C) since the connection is reset when there has been no traffic in the last 4 hours… some way to send something through the LTE
I cannot say whether some pinging is enough to keep the link up, but you could set up a script that just pings the DNS server the LTE provider gives you (I presume there is one) once every hour or so.
This could be a netwatch script or a scheduled one.
If you use another DNS normally you could set a route through the LTE for that specific DNS IP only and it will be pinged no matter which “main” 0.0.0.0 gateway route is active.
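A minimal sketch of that idea, assuming the LTE interface is named lte1 and using 1.0.0.1 purely as an example target (the scheduler name is hypothetical too):

```routeros
# Host route in main so this one address always leaves via LTE,
# whichever default route is currently active (lte1 is an assumed name)
/ip/route/add dst-address=1.0.0.1/32 gateway=lte1 comment="keepalive target via LTE"

# Send a few pings once an hour
/system/scheduler/add name=lte-keepalive interval=1h on-event="/ping 1.0.0.1 count=5"
```

Whether an hourly ping is actually enough to stop the operator from dropping the session is something only a test can tell.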
The “right” way should be “A” in your list, but the script needs to be carefully written. You cannot rely on the fact that disabling the LTE, waiting a few seconds, and re-enabling it is guaranteed to bring up the interface. You need to add a (sensible) delay before the next attempt and some counter, otherwise you could find yourself with the LTE interface flapping in an endless loop.
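As a rough sketch of option “A” with a delay and an attempt counter (the interface name lte1, the three-attempt limit and the delays are all assumptions to be tuned, not tested values):

```routeros
# Script body, e.g. saved in /system/script under a name such as lte-watchdog
:local iface "lte1"
:local maxTries 3
:local tries 0

# Keep trying only while the interface has no address and attempts remain
:while (([:len [/ip/address/find interface=$iface]] = 0) && ($tries < $maxTries)) do={
    :set tries ($tries + 1)
    :log warning ("lte-watchdog: no IP on " . $iface . ", reset attempt " . $tries)
    /interface/lte/disable [find name=$iface]
    :delay 10s
    /interface/lte/enable [find name=$iface]
    # Give the modem time to register and get an address before re-checking
    :delay 60s
}

:if ([:len [/ip/address/find interface=$iface]] = 0) do={
    :log error ("lte-watchdog: " . $iface . " still has no IP, giving up until the next run")
}
```

It could then be run from the scheduler, for example with /system/scheduler/add name=lte-watchdog interval=15m on-event=lte-watchdog. The interval should stay comfortably longer than the worst-case run time of the script, so that two runs never overlap.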
IMO, it’s a bug if it doesn’t come back when other OSes do recover…
But forcing a ping out the LTE while fiber is the active route in main requires a routing table. To create a new, separate routing table that only goes over LTE, it’s a couple of lines:
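Something along these lines, as a sketch for RouterOS v7 (the table name lte-only and the interface name lte1 are assumptions):

```routeros
# New routing table, added to the FIB so routes in it are actually usable
/routing/table/add name=lte-only fib

# The only route in that table is a default route out of the LTE interface
/ip/route/add dst-address=0.0.0.0/0 gateway=lte1 routing-table=lte-only
```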
This is not used unless some rule or mangle causes traffic to NOT use main. To use rules… you can add a /routing/rule to send a specific destination IP to the new routing table, created above, that contains only LTE. If we assume you want to use Cloudflare’s secondary DNS as the host to check and route via LTE (even if fiber is active), it’s a routing rule plus a netwatch that pings the same IP used in the /routing/rule.
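As a sketch, assuming an LTE-only routing table already exists under the hypothetical name lte-only, and using 1.0.0.1 as the check host:

```routeros
# Anything destined to 1.0.0.1 is looked up in the LTE-only table
/routing/rule/add dst-address=1.0.0.1/32 action=lookup table=lte-only

# Netwatch pings the same address once a minute, which also generates LTE traffic
/tool/netwatch/add host=1.0.0.1 interval=1m
```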
@Amm0
Excuse me, but wouldn’t a simple rule in “main” pointing to that specific IP do?
Having an added routing table and a routing rule isn’t over-complicating it?
Sure, that works too. But you don’t have any further ability to limit it to just the router (which I don’t show above, but /routing/rule lets you exclude LAN IPs from using LTE for the destination 1.0.0.1 – a main route for 1.0.0.1 applies to all src-addresses).
A separate routing table keeps things clean IMO. And /routing/rule requires a table to use, which I think is cleaner than messing with the main routing table directly… I presume the idea is that even if fiber is active a continuous ping runs over LTE, which keeps the LTE carrier from dropping the APN session.
Certainly a mangle rule without a routing table works too, but that’s actually more complex, since LTE likely has a dynamic address and mangle cannot deal with interface routes (e.g. gateway=lte1), so you have to know the gateway IP address for mangle.
/ip/firewall/mangle/add chain=postrouting action=route route-dst=<lte_network_address>
If the OP’s config is roughly default, it’s 4 lines of config… And the OP is then set up to add other rules that “override” main to force connections via LTE with additional /routing/rule entries…
It’s totally possible the carrier separately forces a drop… so the ping may not help.
Other than marginally increasing data usage… in general, it’s likely best to keep a continuous ping going even if the LTE interface recovers correctly. This ensures the tower keeps at least some “resources” allocated to keep the ping going.
I.e., other than data usage costs, a ping doesn’t hurt you and is likely advantageous. It’s other future users on the same LTE tower that may be affected: the tower will reject new admissions at some point if congested… but if a ping was going you’d already have a connection.
I am sure that for an experienced member your proposed approach is easy and clean; I was only doubting that it is easy for a newcomer.
We don’t know how complex the OP’s routing table is. If it consists of just two gateways, one through the ISP ONT and one through the LTE (or, depending on how the failover is implemented, another two for recursive routes), adding another route does not seem that bad to me. Of course, if there are many routes in main, a separate routing table would be much cleaner.
First thing would be anyway to check if pinging periodically is enough to keep the LTE alive.
In any case, the critical point that I think needs to be taken into account is what to do when, for whatever reason, the main link is down AND the LTE is not responding.
If the timeout is 4 hours there is (unless I am missing something) no need to ping every n seconds, a set of a few pings (let’s say ten) every hour or so should do.
The idea of periodically disabling/re-enabling the LTE interface (besides needing some way to check that it actually worked) should be conditional, i.e. it should not happen if there was traffic in the last three hours. Maybe the script could check some counter on the LTE interface and not run if it is not 0 (or only run if it is 0).
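One way to sketch that condition is to remember the interface byte counter between runs and only cycle the interface when it has not moved. This is an untested illustration; the names lte1 and lteLastRx are assumptions, and it does not handle the interface being absent:

```routeros
# Script body, meant to be run from the scheduler every 3 hours
:global lteLastRx
:local iface "lte1"
:local rx [/interface/get [find name=$iface] rx-byte]

# Reset only when a previous sample exists and no bytes arrived since then
:if (([:typeof $lteLastRx] = "num") && ($rx = $lteLastRx)) do={
    :log info ("lte-idle-reset: no traffic since last run, cycling " . $iface)
    /interface/lte/disable [find name=$iface]
    :delay 10s
    /interface/lte/enable [find name=$iface]
    :delay 60s
    # Re-read the counter so the next run compares against the post-reset value
    :set rx [/interface/get [find name=$iface] rx-byte]
}

:set lteLastRx $rx
```

After a reboot the global variable is empty, so the first run only records a sample and does nothing else.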
Many, many thanks for your very precious suggestions.
I personally have years of experience in IT but I am really new to MikroTik.
I will read all your answers and will come back.
One additional point:
As I said, this LTE stick is a K5160, some kind of Huawei 3372. Forget for a while how it works on Windows… but… the old routers (the ADB and Sercomm) are both Linux-based and both equipped with a USB port for the backup LTE stick.
A blinking red LED on the stick means: 4G network available, not connected.
A solid red LED means: connected to 4G.
On the ADB and Sercomm, if the primary WAN (fiber) is UP, the stick is NOT connected, the LED is blinking and of course there is no IP on that port.
As soon as you unplug the fiber, within 10 seconds the routers connect to 4G, the LED becomes solid red and the IP is there.
Maybe I need a script on the MikroTik that enables the LTE stick only when needed, to avoid having it always on with no traffic…
Am I the only one with such problem?
Why can’t the RB5009 reconnect by itself?
By default, there is no check on the distance=1 /ip/route (e.g. fiber). So simply unplugging is not going to cause a failover immediately.
If fiber is a static route, you should add check-gateway=ping on the 0.0.0.0 default route in /ip/route. If the fiber uses a DHCP client to get the WAN IP, then it’s more complex and needs a script on the /ip/dhcp-client that sets check-gateway=ping when a DHCP address is acquired. In either case, connections are cached, so it takes a bit for things to time out and reconnect too.
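For the static-route case only, a sketch could look like this. The gateway 203.0.113.1 is a placeholder from the documentation address range, not a real value, and a default route added dynamically (e.g. by a PPPoE or DHCP client) cannot simply be edited this way:

```routeros
# Static default route over fiber with gateway monitoring;
# the route goes inactive when the gateway stops answering pings
/ip/route/add dst-address=0.0.0.0/0 gateway=203.0.113.1 distance=1 check-gateway=ping
```

With this in place, the LTE default route with the higher distance takes over once the fiber route is marked inactive.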
On the modem disconnect… I’d collect some additional logs by adding /system/logging add topics=lte,!packet,!raw. Then, in a netwatch “on-down” script, disable/enable the modem. And if you add /system/sup-output to the same netwatch on-down script, it will generate the “supout.rif” needed for MikroTik support (which will include the additional logging from topics=lte,!packet,!raw) – this should give support more clues as to why your modem isn’t trying to reconnect.
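Spelled out as commands, the logging and support-file part might look like the sketch below. The file name lte-down.rif is hypothetical, and whether a netwatch script has sufficient permissions to create the support file is an assumption worth checking:

```routeros
# Extra LTE logging, minus the very verbose packet/raw topics
/system/logging/add topics=lte,!packet,!raw

# Line to put at the start of the netwatch on-down script,
# so the state is captured before the modem is touched
/system/sup-output name=lte-down.rif
```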
I’d also try the latest 7.15rc if you haven’t – often these LTE problems do get resolved in new versions. It’s also worth checking that you’ve updated the firmware to match RouterOS; this is done with upgrade in /system/routerboard (you may want to enable auto-upgrade in /system/routerboard/settings, which will upgrade the firmware automatically after a reboot).
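For reference, the firmware check and upgrade from the terminal would be roughly as follows (a sketch; the upgrade and reboot commands ask for confirmation when typed interactively):

```routeros
# Compare current-firmware with upgrade-firmware
/system/routerboard/print

# Stage the new firmware; it is applied on the next reboot
/system/routerboard/upgrade
/system/reboot

# Optional: stage firmware automatically after each RouterOS upgrade
/system/routerboard/settings/set auto-upgrade=yes
```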
good suggestion! thanks!
I received the RB5009 with ROS 7.8 and then upgraded to 7.14, but I was not aware of the need to upgrade the firmware, so the firmware is still 7.8 while ROS is 7.14.
Anyway, I already provided tons of logs to MikroTik support and their position is that the LTE stick is faulty.
Question: with the LTE stick plugged into the Windows PC, I can very simply use the CONNECT \ DISCONNECT function. The same goes for the other two old ADB and Sercomm routers from their GUI.
On the RB5009 the stick automatically connects when plugged in, but… support says there is no way to force a CONNECT or DISCONNECT, so I have to enable and disable it.
Are you aware of any way \ command to connect \ disconnect the LTE?
There are two ways. One is to disable and enable the LTE interface (via “/interface/lte lte1”). The other is power cycling the USB (via “/system/routerboard/usb/power-reset”).
Using a netwatch script is the way to do this (e.g. when a ping fails). See https://help.mikrotik.com/docs/display/ROS/Netwatch .
In WinBox, scripts can be attached to any /tool/netwatch in the “Down”/etc. tabs.
In most cases, disabling/enabling the LTE interface will fix the problem. So you can add something like this to the “Down” script for a netwatch:
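A sketch of such a netwatch, assuming the interface is lte1 and the watched host is 1.0.0.1 routed via LTE (the 5 second pause is a guess):

```routeros
# When the host stops answering, bounce the LTE interface
/tool/netwatch/add host=1.0.0.1 interval=1m comment="LTE check" \
    down-script="/interface/lte/disable lte1; :delay 5s; /interface/lte/enable lte1"
```

Note that netwatch runs the down script on the transition from up to down, not on every failed probe.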
You can add multiple netwatches if you want to get more sophisticated. Each netwatch has an interval, so you can have a 2nd netwatch that does a power-reset of the USB to reboot the modem, but with a longer interval=. E.g. if you set the interval for the 1st netwatch to 1m (00:01:00), have the 2nd use 2m (00:02:00). The power-reset one uses:
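As a sketch, with the same assumed host and a hypothetical 5 second power-off duration:

```routeros
# Second, slower netwatch: power-cycle the USB port to reboot the modem
/tool/netwatch/add host=1.0.0.1 interval=2m comment="LTE USB power reset" \
    down-script="/system/routerboard/usb/power-reset duration=5s"
```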
Looking around on the board, there are several scripts revolving around the two possible approaches Amm0 explained, triggered either by Netwatch or running at fixed time intervals via scheduler, with varying levels of complexity.
And there are a few reports of ISPs that do force this disconnection every 4 h, so you are not alone.
Some scripts go even further, and if connection is not re-established proceed to reboot the router.
Since in your case the LTE is only used for failover, I wouldn’t be too aggressive with the frequency of the check.
If the basic ping of netwatch is enough to keep the connection alive, when run every - say - 5 minutes or so, the “down” script won’t ever run, but if anything else happens, it will reset the interface, which should take only a few seconds.
Coincidentally I migrated a site from a hEX (RB750Gr3) to a RB5009 with LTE backup using Huawei stick, and now I also have a problem…
It may however be a different issue. The link was up for over a week, including some provider disconnects, however then suddenly the link disconnected, the LTE1 device disappeared, and the USB info does not show the stick anymore.
I tried a USB power reset but it does not solve it.
I will have to go to the site to investigate further, e.g. plug the stick into a laptop, plug another USB device into the router, etc., to see what broke.
Maybe the USB port does not work anymore, maybe it got overloaded by the power consumption of the stick?
Yesterday night I configured the netwatch and the static route through the LTE as suggested
It worked well.
Great! Thanks
Unfortunately, this evening the LTE was down again regardless of the netwatch.
There is another problem… one that needs a different solution. I already mentioned this problem and also reported it to support,
and it is the same problem described here below.
This evening there was no LTE interface in the list!!!
The stick is inserted, of course, in the USB port, and the red LED blinks (meaning: 4G available, not connected).
Of course I cannot enable \ disable something that according to ROS does not exist.
Trying a USB power reset… no effect.
I had to reboot the RB5009.
As I said, I reported this to support with several logs and the answer is: your LTE stick is faulty.
I do not agree, since it works fine in the other 2 Linux-based routers.
Moreover, have a look at the path:
a) start from a freshly booted RB5009, with the LTE stick plugged
b) after hours, or 1 day, or 2 days… it is not predictable… the LTE disappears from the interface list
c) USB power reset, no effect
d) unplug the LTE and plug it in again, NO EFFECT !!!
f) ok let’s do this now:…
g) unplug the LTE
h) plug in a stupid USB pendrive: the RB5009 does NOT see anything attached to it
i) reboot the RB5009… everything is OK again
In my opinion, steps d) and h) are proof that there is something strange in the RB5009.
Yeah, something is fishy with the RB5009 and/or USB, seemingly with Huaweis. There have been a few posts.
About the only thing a user can do is try the stable and beta/rc and/or even older V7 releases to see if those fix it. Specifically, 7.15rc has a fix to always leave the LTE interface around, so that’s worth trying.
*) lte - make interface persistent (unused interface configs can be removed, allow to export and examine current configuration without the device present);
As noted above, this includes the RouterOS version, the /system/routerboard firmware, and the LTE firmware. If it’s an external stick as here, you might want to hunt for any upgrades, but most of the Huaweis here are older, so they are likely already at the latest. Perhaps even a netinstall might help if there is something related to past upgrades that’s messing this up.
But these older modems that use ECM mode (not the newer MBIM protocol) don’t provide a lot of feedback on what’s causing the failures, since they appear more as a USB dongle than newer modems that have an API.
Thanks.
Will try the release you suggested.
Right now there is the 7.14.3 stable.
I can say that with this release things are different from the previous one: not better, but different.
This LTE card is an MBIM one.
Anyway… I cannot insist that this stick must work, even if I think that ROS also has its bugs… since once the RB5009 can no longer recognize the stick, if I plug in any other device… it is the very same.
What can I do? What can I use to add failover on LTE?
Another suggested stick?
an external LTE modem? Maybe from MikroTik?
Any idea is appreciated, but… something working, please
You’ve probably already done this, but I’d make sure the APN is right - those logs look like it’s not getting an IP address, which could mean some specific APN setup may be required.
If you’re in Europe, another option would be the hAPaxLite6-LTE: https://mikrotik.com/product/hap_ax_lite_lte6#fndtn-specifications . Not the fastest modem on the market, but good value and seemingly good reports. That also gives you a backup router and/or perhaps a management Wi-Fi network for the routers. Since it connects to the RB5009 via ethernet, you can also put something like a hAPLite in a perhaps better location for signal than a USB stick modem in the RB5009 USB port.
Yes yes… the APN is correct. I have several of such SIM card inserted in alarm systems, boilers, etc.
I know the hAPaxLite6-LTE… I had a look at it… I just need to know whether this unit will suffer from the same problems I have on the RB5009.
Last night it rebooted by itself for no reason. The log says
“router rebooted without proper shutdown, probably power outage” but there was no power outage at all. I have a UPS, and a Cisco PoE AP connected to the RB5009, and they did not experience any power issue.
The RB5009 is very, very nice, but… Mmmmmmmhhhhh, some doubts in my mind.