Setup: RB450Gx4 router.
Two WAN IPs; the primary is 1 Gbit fibre.
In between: a D-Link DGS-1100-24 switch.
Everything is on a UPS.
I have two users, on two separate VLANs, who play poker online.
They describe short outages, long enough to lose packets. They see slowdowns, and on some sites this is enough to boot them off the site, while others are more forgiving (better servers).
They noticed that it happens at the same time (on both VLANs, each using different servers), which points to a COMMON source.
Since this seems sporadic/random, I am trying to work out what would cause intermittent delays like these that are more prevalent during the day than at night.
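One way to put the "same time" observation on firmer ground is to have each player note outage start/end times and count how many of them line up. The sketch below assumes a hypothetical log format (each outage as a `(start, end)` pair in epoch seconds, recorded on each player's own machine) and allows a small `tolerance` for clock skew between the two machines; none of this comes from the thread itself.

```python
from typing import List, Tuple

# Hypothetical log format: each outage is (start, end) in seconds since epoch,
# noted by each player (or a ping logger) on their own machine.
Window = Tuple[float, float]


def windows_coincide(a: Window, b: Window, tolerance: float) -> bool:
    """True if the two windows overlap once each end is extended by `tolerance` seconds."""
    a_start, a_end = a
    b_start, b_end = b
    return a_start <= b_end + tolerance and b_start <= a_end + tolerance


def coinciding_outages(a: List[Window], b: List[Window],
                       tolerance: float = 1.0) -> List[Tuple[Window, Window]]:
    """Return every (a_window, b_window) pair that coincides."""
    return [(wa, wb) for wa in a for wb in b if windows_coincide(wa, wb, tolerance)]


def coincidence_ratio(a: List[Window], b: List[Window], tolerance: float = 1.0) -> float:
    """Fraction of user A's outages that line up with at least one of user B's."""
    if not a:
        return 0.0
    hits = sum(1 for wa in a if any(windows_coincide(wa, wb, tolerance) for wb in b))
    return hits / len(a)
```

A ratio close to 1 supports the common-source theory; a ratio close to 0 would suggest the two users are seeing unrelated problems.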
The first thing I am going to do is put them on separate WAN IPs to eliminate the primary ISP as the culprit.
If they both still experience the issue simultaneously, then it's not the ISP, and that leaves the router and the switch (and the UPS).
Are switches prone to such intermittent traffic interruptions?
Can using the router as the DNS server (cache on the router) cause such issues?
Can a UPS cause such issues?
Any other ideas?
Testing so far shows that I am getting easily 30% or more packet loss to the sites in question (not good).
Both ISPs show 5-10% packet loss to each ISP's gateway (bad).
Traffic flow on one of the ISPs has significant gaps, simultaneously in both TX and RX, but the router's logs do not show the connection going down (very bad).
CPU load is always around 0-1%.
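The "gaps in both TX and RX at the same time" pattern can be picked out of exported traffic samples automatically rather than by eye. A minimal sketch, assuming per-interval throughput samples (e.g. one per second, in bit/s) have been exported from the router's traffic graphs into two lists; the `threshold_bps` and `min_samples` values are hypothetical and would need tuning to the actual traffic level.

```python
from typing import List, Optional, Tuple


def find_traffic_gaps(tx_bps: List[float], rx_bps: List[float],
                      threshold_bps: float = 1_000.0,
                      min_samples: int = 2) -> List[Tuple[int, int]]:
    """Return (first_index, last_index) runs where TX and RX are BOTH below threshold.

    Runs shorter than `min_samples` samples are ignored as noise.
    """
    if len(tx_bps) != len(rx_bps):
        raise ValueError("tx and rx series must have the same length")
    gaps: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, (tx, rx) in enumerate(zip(tx_bps, rx_bps)):
        idle = tx < threshold_bps and rx < threshold_bps
        if idle and start is None:
            start = i                      # a gap begins
        elif not idle and start is not None:
            if i - start >= min_samples:   # gap ended; keep it if long enough
                gaps.append((start, i - 1))
            start = None
    # Close a gap that runs to the end of the series.
    if start is not None and len(tx_bps) - start >= min_samples:
        gaps.append((start, len(tx_bps) - 1))
    return gaps
```

Requiring both directions to drop together separates a stalled link from an ordinary lull in download traffic. Comparing the gap timestamps with the router log then shows whether the link really stayed "up" during each gap.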
On the bad ISP connection, I will try a different cable to the modem and a different Ethernet port on the router.
Overall, I will try a replacement router when I can, so I can quickly run a traceroute to the gateway, obviously looking for the 0% packet loss that should be expected.
That would be a good first step.
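When comparing the old and replacement routers, it helps to report both the overall loss percentage and the longest run of consecutive lost probes, since a burst of losses is what knocks a real-time session off a server. A small sketch, assuming the ping or traceroute results for one hop have been collected as a list in send order, with `None` for each probe that got no reply (a hypothetical format, not the output of any particular tool):

```python
from typing import List, Optional

# One entry per probe, in the order sent: round-trip time in ms, or None if no reply.
Probes = List[Optional[float]]


def loss_percent(probes: Probes) -> float:
    """Percentage of probes that got no reply."""
    if not probes:
        raise ValueError("no probes recorded")
    lost = sum(1 for rtt in probes if rtt is None)
    return 100.0 * lost / len(probes)


def longest_loss_burst(probes: Probes) -> int:
    """Longest run of consecutive unanswered probes."""
    longest = current = 0
    for rtt in probes:
        current = current + 1 if rtt is None else 0
        longest = max(longest, current)
    return longest
```

For example, `[1.2, None, None, 1.1, None]` gives 60% loss with a longest burst of 2; a healthy gateway hop should give 0% and 0.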
I am afraid it may not be that easy. Ethernet flow control packets are usually processed by the hardware itself at a very low level, so it is a challenge even to capture them, let alone process them using switch rules. Whether it is possible at all may depend on the particular switch chip.
The manual talks about the 8327 switch chip (used on the OP's RB450Gx4) using flow control towards the CPU (so a slow speed negotiated on one port can affect sending packets via other ports), but it doesn't mention whether the chip itself handles Ethernet flow control frames arriving from outside (in which case an X-off received from one uplink could also affect transmission through the other).
Hmm. It's just an Ethernet frame.
In the first link above there is an image showing a Wireshark screenshot of an Ethernet “Pause” frame. It's about that very frame.
Since Wireshark is a PC application, one could wonder how it was able to capture that frame…
tx-flow-control (on | off | auto; Default: off) When set to on, the port will generate pause frames to the upstream device to temporarily stop packet transmission. Pause frames are only generated when one of the router's output interfaces is congested and packets cannot be transmitted anymore. auto is the same as on, except that when auto-negotiation=yes, the flow control status is resolved by taking into account what the other end advertises.
rx-flow-control (on | off | auto; Default: off) When set to on, the port will process received pause frames and suspend transmission if required. auto is the same as on, except that when auto-negotiation=yes, the flow control status is resolved by taking into account what the other end advertises.
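"Resolved by taking into account what the other end advertises" refers to the standard auto-negotiation of pause capability: each side advertises a PAUSE bit and an ASM_DIR bit, and IEEE 802.3 Table 28B-3 decides which directions end up enabled. The sketch below implements that table. Two things are assumptions: that RouterOS follows the standard table for `auto`, and the `advert_for` helper, which uses the tx/rx-to-bits convention Linux documents for `linkmode_set_pause` (RouterOS may map its settings differently).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PauseAdvert:
    pause: bool    # PAUSE bit: symmetric flow control
    asm_dir: bool  # ASM_DIR bit: asymmetric flow control


def advert_for(tx: bool, rx: bool) -> PauseAdvert:
    """Bits advertised for the wanted tx/rx pause (Linux linkmode_set_pause convention)."""
    return PauseAdvert(pause=rx, asm_dir=rx != tx)


def resolve(local: PauseAdvert, partner: PauseAdvert) -> Tuple[bool, bool]:
    """Return (local_sends_pause, local_honours_pause) per IEEE 802.3 Table 28B-3."""
    if local.pause and partner.pause:
        return True, True
    if local.asm_dir and partner.asm_dir:
        if local.pause:
            return False, True   # we obey the partner's pauses but never send any
        if partner.pause:
            return True, False   # we may send pauses; the partner obeys them
    return False, False
```

Note the last case: if the other end (here, the ISP's modem or ONT) advertises neither bit, `auto` resolves to no flow control in either direction. That would be one possible reason why switching to `auto` makes no measurable difference.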
The fact that Wireshark can dissect a pause frame says nothing about how easy or difficult it is to actually capture one using Wireshark/dumpcap/tcpdump. The key is whether the network card passes pause frames up to the software layer rather than handling them internally. https://osqa-ask.wireshark.org/questions/56214/best-nic-for-detecting-pause-frames
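If a NIC does pass pause frames up to software, recognising one in a capture is simple: it is sent to the reserved multicast address 01:80:C2:00:00:01 with EtherType 0x8808 (MAC Control), opcode 0x0001, followed by a 16-bit `pause_time` counted in quanta of 512 bit times. The sketch below parses raw frame bytes (starting at the destination MAC). For simplicity it accepts only the multicast destination and ignores other MAC Control opcodes such as priority flow control. It also converts `pause_time` into how long the sender is actually stopped.

```python
import struct
from typing import Optional

PAUSE_DST = bytes.fromhex("0180c2000001")  # reserved multicast DA for 802.3x PAUSE
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001
BITS_PER_QUANTUM = 512  # one pause quantum = 512 bit times


def parse_pause_frame(frame: bytes) -> Optional[int]:
    """Return pause_time (in quanta) if `frame` is a PAUSE frame, else None.

    `frame` starts at the destination MAC; the preamble is not included.
    """
    if len(frame) < 18:
        return None
    if frame[0:6] != PAUSE_DST:
        return None
    ethertype, opcode, pause_time = struct.unpack("!HHH", frame[12:18])
    if ethertype != MAC_CONTROL_ETHERTYPE or opcode != PAUSE_OPCODE:
        return None
    return pause_time  # 0 means "resume now" (X-on)


def pause_duration_us(pause_time: int, link_bps: float) -> float:
    """How long the receiving port stops transmitting, in microseconds."""
    return pause_time * BITS_PER_QUANTUM / link_bps * 1e6
```

For scale: at 1 Gbit/s, one maximum pause (`pause_time` 0xFFFF) stops the sender for about 33.6 ms, and at 100 Mbit/s for about 335 ms. A stream of such frames could plausibly produce short stalls of the kind described, if pause frames were in play at all.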
Here's a screenshot. At least for the WAN port, “Tx Flow Control” and “Rx Flow Control” should be set to “Auto” or “Yes”.
On my device I've set them all to Auto.
Auto-negotiation is enabled by default.
Of course, such packets (in and out) must not be blocked by a firewall rule, meaning they have to be accepted…
Both are off on mine. I changed both to auto on my VLAN Bell connection, and there was no change in packet loss to the ISP's gateway.
After running for about 1.5 hours, both were sitting at about 50%.
Then I think iperf is your best friend…
I think I would get rid of the VLAN and use pure IP routing instead, and also use a MikroTik switch running RouterOS instead of the D-Link switch, since with ROS you have much more control.
Sorry, I can't help any further; I just tried. Maybe @sindy has some more ideas.
Much appreciated. Right now I have just connected another router (a spare hEX) to the Bell fibre connection, so I am checking for any differences. By the way, my connection on the hEX is via a VLAN with bridge VLAN filtering.
The good news is that on the hEX, after 10 minutes, there was not a single failure on traceroute to the Bell gateway, whereas on the RB450G it was around 50% on Bell and 20% on Eastlink.
Here is a WinMTR report to Google from the hEX via Bell fibre:
I was doing about 350-450 Mbps and up to 500,000 pps (~80% UDP and 20% TCP) on a Linux router with an old kernel (2.6…) on 200-euro hardware, with 0% packet loss.
Don't waste time with non-working MikroTik hardware. Just replace it (maybe with another cheap vendor) and get rid of the problems.
A netinstall was advised to rule out some nand corruption etc…
In any case, the hEX is up and running and I am seeing 0-0.1% loss on traceroute to the gateways of each ISP, so I am somewhat relieved.
Sad to have to junk the RB… Looking back, it was always quirky, but I never had enough proof… Out of warranty too.
Hi CZFAN, I did exhaustive testing with someone far more knowledgeable than me, so don't worry, I wasn't bumbling around like a complete fool.
Suffice to say, even if a miracle occurred and the 450G were resurrected from insidious hell, its LEDs would stay off, as I am setting off on a new voyage with a CCR1009.
Yes, I have graduated to the next level. Perhaps Normis will send me a MikroTik T-shirt.
I just completed a netinstall and will post the traceroute to the ISP gateway and the WinMTR to google.ca results, as well as the gap image from earlier testing, which shows one ISP with no gap and the other with a huge gap.
No change! Originally I was on 6.46.6 firmware, now on 6.45.9 firmware; no difference!
Not sure if I missed anything, but I have not seen any evidence in this thread that indicates a problem with the 450. Changing things from the defaults (flow control, etc.) is going to make your environment more complicated and more prone to problems.
You are welcome to throw money at it and buy a CCR1009, but I don't think there is any guarantee that it will solve the problem.
So let's start with the basics, work from the RB450 outwards, and provide:
a diagram / more info on how this connects to the ISP network
Hi CZFAN, I have been testing a netinstalled 450 using a backup config and a default config, with interesting results.
Suffice to say, it may be something in the config; I will confirm tomorrow.