 
swapnilsonawane81090
just joined
Topic Author
Posts: 24
Joined: Sat Feb 08, 2020 5:33 am

RX Drops on SFP+

Tue Mar 08, 2022 4:42 pm

Facing this issue on v6, and the same on v7.
Now upgraded to v7.1.3 but still facing RX drops on the interface.

This is a Dell 620
16 GB RAM, 2 Xeon processors (24 cores), 2.5 GHz
2 SFP+ Ports
RouterOS is on SSD
Multi Cpu = Checked
x86-64 = Checked
Connected using DAC cable
Already cleaned the 10G LAN card, tried moving it to another slot, and also tried replacing it with a newer one, but no luck.

Please help us stop this packet-dropping issue.
Last edited by swapnilsonawane81090 on Wed Mar 09, 2022 3:42 am, edited 1 time in total.
 
Zacharias
Forum Guru
Posts: 3459
Joined: Tue Dec 12, 2017 12:58 am
Location: Greece

Re: RX Drops on SFP+

Tue Mar 08, 2022 9:12 pm

How are those SFP ports connected?
Using a DAC or AOC cable?
 
TomjNorthIdaho
Forum Guru
Posts: 1518
Joined: Mon Oct 04, 2010 11:25 pm
Location: North Idaho

Re: RX Drops on SFP+

Tue Mar 08, 2022 11:15 pm

If you are using fiber, check your optical receive signal strength (on both sides), then verify that the SFP modules are within spec for that received signal strength. Also check the distance your SFPs are rated for.
 
Zacharias
Forum Guru
Posts: 3459
Joined: Tue Dec 12, 2017 12:58 am
Location: Greece

Re: RX Drops on SFP+

Tue Mar 08, 2022 11:24 pm

If it is fiber, also check that you are using the correct SFP modules for the fiber type...
 
swapnilsonawane81090
just joined
Topic Author
Posts: 24
Joined: Sat Feb 08, 2020 5:33 am

Re: RX Drops on SFP+

Wed Mar 09, 2022 3:41 am

Connected via DAC cable.
 
swapnilsonawane81090
just joined
Topic Author
Posts: 24
Joined: Sat Feb 08, 2020 5:33 am

Re: RX Drops on SFP+

Wed Mar 09, 2022 6:24 pm

Also, there are TX drops on PPPoE users:
every time a user logs in, 2 packets are dropped on TX.
 
r00t
Long time Member
Posts: 674
Joined: Tue Nov 28, 2017 2:14 am

Re: RX Drops on SFP+

Wed Mar 09, 2022 7:51 pm

If it's really directly related to users logging in, then it looks like a software issue.
Probably IO blocking is causing an issue in ROS... I can imagine something like writing to flash on user login somehow stalling all IO... but that's just speculation.
But if it's really related to traffic, there is zero chance this is a hardware issue (hardware packet loss would be more or less random).
Try to collect as much info from your router as possible and contact MikroTik support... I think that's your only hope.
 
swapnilsonawane81090
just joined
Topic Author
Posts: 24
Joined: Sat Feb 08, 2020 5:33 am

Re: RX Drops on SFP+

Thu Mar 10, 2022 4:18 pm

Has anyone faced this issue in the latest ROS?
 
swapnilsonawane81090
just joined
Topic Author
Posts: 24
Joined: Sat Feb 08, 2020 5:33 am

Re: RX Drops on SFP+

Wed Mar 16, 2022 3:42 pm

MikroTik is totally irresponsible.
 
msatter
Forum Guru
Posts: 2929
Joined: Tue Feb 18, 2014 12:56 am
Location: Netherlands / Nīderlande

Re: RX Drops on SFP+

Wed Mar 16, 2022 7:05 pm

SFP(+) ports drop a lot under v6 as well. There are many topics about that and we just live with it.
Screenshot_20220316_180257.jpg
 
swapnilsonawane81090
just joined
Topic Author
Posts: 24
Joined: Sat Feb 08, 2020 5:33 am

Re: RX Drops on SFP+

Thu Apr 21, 2022 4:43 am

Please help us stop this packet-dropping issue.
@mikrotik please reply
 
chechito
Forum Guru
Posts: 3074
Joined: Sun Aug 24, 2014 3:14 am
Location: Bogota Colombia

Re: RX Drops on SFP+

Thu Apr 21, 2022 4:56 am

Maybe try changing the interface queue to a large pfifo queue type.
 
NickOlsen
Member Candidate
Posts: 131
Joined: Wed Feb 13, 2008 9:30 pm

Re: RX Drops on SFP+

Mon Apr 25, 2022 6:37 pm

It's very likely that you are seeing buffer overflow. The packet comes in on that interface, tries to enter the queue to be picked up by the kernel, and cannot because the queue is full (so it must be dropped). Multiple hardware queues help balance this across multiple cores. Make sure your Ethernet card has multiple hardware queues (System > Resources > IRQ). If it doesn't, get a different NIC that does.
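A minimal sketch of the idea behind multiple hardware queues (receive-side scaling): the NIC hashes each flow to one of several RX queues, each normally serviced by its own core, so no single queue has to absorb all the traffic. This is an illustration under stated assumptions, not how any particular driver works: real NICs typically use a keyed Toeplitz hash, whereas CRC32 from Python's `zlib` stands in for it here, and the flows and queue count are hypothetical.

```python
import zlib
from collections import Counter

def rx_queue_for_flow(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                      num_queues: int) -> int:
    """Map a flow's 4-tuple to one RX queue.

    Simplification: real multi-queue NICs typically use a keyed Toeplitz hash;
    CRC32 is used here only to show the principle that all packets of one flow
    go to the same queue while different flows spread across queues.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % num_queues

def queue_load(flows, num_queues: int) -> Counter:
    """Count how many flows land in each RX queue (assumed: one queue per core)."""
    return Counter(rx_queue_for_flow(*flow, num_queues) for flow in flows)

if __name__ == "__main__":
    # Hypothetical client flows toward one upstream address.
    flows = [(f"10.0.{i // 250}.{i % 250 + 1}", "192.0.2.1", 40000 + i, 443)
             for i in range(1000)]
    print("1 queue: ", dict(queue_load(flows, 1)))
    print("8 queues:", dict(sorted(queue_load(flows, 8).items())))
```

With a single queue, every flow lands on queue 0 (one core does all the work); with eight queues, the same flows are spread out, which is what lets several cores drain the NIC in parallel.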

Assuming it does have multiple hardware queues, you can also add a bigger buffer in the OS. I'd recommend setting the interface to "Multi-Queue-Ethernet-Default" under Queue > Interface Queue.

Then raise the queue size under Queue Type in steps of ~100 packets until the drops stop. Do note, however, that the tradeoff is latency for some packets. When the buffer gets really large (because the CPU can't pick the packets up fast enough), the packets sitting in the buffer incur higher latency. But you may not notice it, or even be able to measure it decently, without serious lab equipment.
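The trade-off described above can be seen in a toy discrete-time model of a tail-drop FIFO, which is roughly what a pfifo-style queue with a packet limit does. All numbers (arrival pattern, drain rate per tick, queue limits) are hypothetical, chosen only to make the effect visible; they are not measurements from this router.

```python
from dataclasses import dataclass

@dataclass
class Result:
    limit: int               # queue limit in packets (like a pfifo limit)
    dropped: int             # packets tail-dropped because the queue was full
    max_delay_ticks: float   # worst-case wait for a packet at the tail of the queue

def simulate_tail_drop(arrivals, service_per_tick: int, limit: int) -> Result:
    """Toy discrete-time FIFO with tail drop.

    arrivals[t] packets arrive in tick t; the CPU drains at most
    service_per_tick packets per tick; arrivals that do not fit are dropped.
    """
    backlog = dropped = 0
    max_delay = 0.0
    for n in arrivals:
        accepted = min(n, limit - backlog)
        dropped += n - accepted
        backlog += accepted
        # The last packet in the queue waits about backlog / service rate ticks.
        max_delay = max(max_delay, backlog / service_per_tick)
        backlog = max(0, backlog - service_per_tick)
    return Result(limit, dropped, max_delay)

if __name__ == "__main__":
    # Hypothetical bursty load: the average equals the drain rate (100/tick),
    # but 10-tick bursts at 150/tick alternate with 10 quiet ticks at 50/tick.
    arrivals = ([150] * 10 + [50] * 10) * 2
    for limit in (100, 300, 600):
        print(simulate_tail_drop(arrivals, service_per_tick=100, limit=limit))
```

With these made-up numbers, raising the limit from 100 to 300 to 600 packets reduces drops from 1000 to 600 to 0, while the worst-case queueing delay grows from 1 to 3 to 6 ticks.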

This is one of the downfalls of MT and ultimately is why we're moving to big iron soon. MT fits certain needs. But not all.
 
swapnilsonawane81090
just joined
Topic Author
Posts: 24
Joined: Sat Feb 08, 2020 5:33 am

Re: RX Drops on SFP+

Sat Apr 30, 2022 6:14 am

Changed the interface queue to "Multi-Queue-Ethernet-Default", but still facing drops.
 
chechito
Forum Guru
Posts: 3074
Joined: Sun Aug 24, 2014 3:14 am
Location: Bogota Colombia

Re: RX Drops on SFP+

Sat Apr 30, 2022 5:22 pm

What percentage of the total received frames are rx-drops?

Are the rx-drops permanent/periodic, or only when there is a surge in traffic?
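One way to answer both questions is to read the interface counters twice and compare the deltas: the percentage tells you how serious the loss is, and repeating the sample over time shows whether drops are steady or tied to surges. A small sketch; the counter values are hypothetical, and whether your rx-packet counter already includes dropped frames is an assumption you should check, hence the `drops_counted_in_rx` flag.

```python
def drop_percentage(rx_packets: int, rx_drops: int,
                    drops_counted_in_rx: bool = False) -> float:
    """rx-drop as a percentage of all frames received on the port.

    Assumption: by default the rx-packet counter is taken NOT to include
    dropped frames, so the total is rx_packets + rx_drops. Set
    drops_counted_in_rx=True if your counter already includes them.
    """
    total = rx_packets if drops_counted_in_rx else rx_packets + rx_drops
    return 0.0 if total == 0 else 100.0 * rx_drops / total

def interval_drop_percentage(before: tuple, after: tuple, **kwargs) -> float:
    """Drop percentage between two (rx_packets, rx_drops) counter snapshots.

    Sampling at regular intervals shows whether drops are steady or only
    appear during traffic surges.
    """
    return drop_percentage(after[0] - before[0], after[1] - before[1], **kwargs)

if __name__ == "__main__":
    # Hypothetical counter values, not taken from this thread.
    print(drop_percentage(999_000, 1_000))
    print(interval_drop_percentage((1_000_000, 50), (1_999_000, 1_050)))
```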
 
swapnilsonawane81090
just joined
Topic Author
Posts: 24
Joined: Sat Feb 08, 2020 5:33 am

Re: RX Drops on SFP+

Sat Apr 30, 2022 6:20 pm

No.
They are random, and now we are also seeing RX errors increasing by 2 or 3 every 3 to 4 seconds.
 
z4ki
just joined
Posts: 4
Joined: Thu Dec 01, 2016 8:03 pm
Location: Bangladesh

Re: RX Drops on SFP+

Sat Apr 01, 2023 10:22 am

Hello, did you manage to solve this?
I'm facing a similar issue.
 
PortalNET
Member Candidate
Posts: 151
Joined: Sun Apr 02, 2017 7:24 pm

Re: RX Drops on SFP+

Thu Jun 20, 2024 6:44 pm

Hiya..

Same boat here. The only difference in our testing: if we put a switch in between, the RX errors stop, in my case.

We made every hardware tweak possible and swapped in different hardware to find the best working scenario, but we keep getting RX-ERRORS if we plug the two x86 servers directly into each other.

tried DAC cable
tried Intel SFP+ modules, 850nm multimode
tried Intel SFP+ modules, 1310nm 10km single-mode
tried Mellanox NIC
tried BIOS tweaks (disabled SR-IOV)

We always keep getting RX errors on the interfaces. As soon as we placed a Huawei S6730 switch in the link before the BGP server, the error count stopped.

Zero errors.

We tried the same by placing a CRS317 switch between the BGP and PPPoE servers, and the RX-ERRORS also stopped, which indicated that this is some kind of software issue in x86 RouterOS.

And yes, we did check the Huawei and CRS switch logs, and we don't see any TX/RX errors in the stats window for the ports where the machines connect, which is the oddest part. As soon as we connect both machines directly with the SFP cables/modules, it starts counting RX-ERRORS.


EDIT: After further testing, we realized that the RX-ERRORS stopped once we enabled TX and RX flow control on both servers' interfaces connected to each other. We've been running tests for the last 30 minutes and no RX-ERROR has shown up yet. But I'm not sure it's wise to leave flow control enabled, as it could interfere with latency and other aspects?
 
changeip
Forum Guru
Posts: 3833
Joined: Fri May 28, 2004 5:22 pm

Re: RX Drops on SFP+

Thu Jun 20, 2024 9:07 pm

Run torch or the packet sniffer and see if they go away while running. RX drops are also caused by unknown VLANs coming into an interface. If you run the packet sniffer or torch, all VLANs are allowed into the kernel and it won't drop them. It's just a way to test whether that's the issue.
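The behaviour described above can be written down as a small model, which also shows why running the sniffer works as a test. This is a hypothetical sketch of the claim as stated in this post, not RouterOS code: frames tagged with a VLAN ID that has no matching VLAN interface are counted as drops, unless a capture is running.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set

@dataclass
class Frame:
    vlan_id: Optional[int]  # None means an untagged frame

def count_rx_drops(frames: Iterable[Frame], configured_vlans: Set[int],
                   capture_running: bool = False) -> int:
    """Hypothetical model of the behaviour described above.

    A tagged frame whose VLAN ID has no matching VLAN interface is dropped
    (and counted as rx-drop), unless a capture such as torch or the packet
    sniffer is running, in which case every VLAN is passed up to the kernel.
    """
    if capture_running:
        return 0
    return sum(1 for f in frames
               if f.vlan_id is not None and f.vlan_id not in configured_vlans)

if __name__ == "__main__":
    frames = [Frame(None), Frame(100), Frame(100), Frame(200), Frame(300)]
    print(count_rx_drops(frames, {100}))                        # sniffer off
    print(count_rx_drops(frames, {100}, capture_running=True))  # sniffer on
```

If the drop counter stops climbing while torch or the sniffer is running, unknown VLAN tags are a likely source; if it keeps climbing, the cause is probably elsewhere.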
 
PortalNET
Member Candidate
Posts: 151
Joined: Sun Apr 02, 2017 7:24 pm

Re: RX Drops on SFP+

Thu Jun 20, 2024 10:46 pm

Thanks, I will do some further testing, as we do have one VLAN running on the interface. But again, why does it work without errors with a switch between the two servers?
 
changeip
Forum Guru
Posts: 3833
Joined: Fri May 28, 2004 5:22 pm

Re: RX Drops on SFP+

Fri Jun 21, 2024 12:06 am

Is it a managed switch, and is it stripping all unknown VLANs?
 
PortalNET
Member Candidate
Posts: 151
Joined: Sun Apr 02, 2017 7:24 pm

Re: RX Drops on SFP+

Fri Jun 21, 2024 12:47 am

It's a CRS317 switch between the BGP and PPPoE x86 servers. This way I get no RX-ERRORs on either side.

If I remove the CRS317 MikroTik switch and connect the BGP and PPPoE x86 servers directly with fiber or DAC, using Intel, Mellanox, or MikroTik modules, I always get RX-ERRORs.

The MikroTik CRS317 is running in SwOS mode.

Another interface on the BGP server is connected to a Huawei S6730 at the carrier ISP that supplies our smaller ISP's internet link, and again, on this port we have no RX-ERROR or TX-ERROR.

What we started testing this morning was TX and RX flow control enabled on both servers' interfaces, with the cable connected directly. So far, for the last 10 hours, we have had no RX-ERROR, TX-ERROR, or any other kind of queue drops. We are still testing to see whether latency increases to our main services (gaming, etc.), because in the past we tested TX and RX flow control on another device, and a few days after enabling it, traffic started behaving erratically, with high latency on some services.

Now the question: should we keep flow control enabled on the interfaces or not? What are the pros and cons of flow control?
 
mkx
Forum Guru
Posts: 12297
Joined: Thu Mar 03, 2016 10:23 pm

Re: RX Drops on SFP+

Fri Jun 21, 2024 8:43 am

PortalNET wrote: "... but not sure if its wise to leave Flow control enabled as it could interfere with latency and other aspects also ?"

In theory, flow control kicks in when the receiver's buffers get nearly full.
  a) With flow control enabled, the receiving device asks the sender to pause transmission for a while. This makes it possible for the receiver to gracefully flush its buffer by sending the buffered frames onward. OTOH, this makes the sending device buffer frames in the meantime. Alas, that device can ask its own sending peer to pause as well, and this cascade can go all the way up to the original sender if needed. With the additional delay incurred, the TCP stack is likely to throttle down its TX speed.
  b) With flow control disabled, the sending device keeps sending frames and the receiver has to drop them. The TCP stack suffers a lot: speeds tend to crash to very low values when packets are lost... but in the mid term this also helps with the L2 buffer overflow.

Now, option a) causes delay jitter, while option b) causes dropped packets. The former hurts gaming performance and, to a lesser extent, "normal" TCP streams. The latter hurts everybody (including gamers). UDP as a protocol doesn't care about packet loss; it's up to the application to handle it. Some apps don't care about lost packets, some experience issues. For apps that have big issues with packet loss, it's better to use TCP in the first place.

IMO option a) is way better.

Option a) has a nasty side effect: consider a situation with a fast upstream port and a few slower access ports. One slow access port, struggling with transmissions (and filling up its buffer), will stall the faster upstream port, and that will hurt the throughput of the other (non-congested) access ports. The reason is that flow control cannot signal which downstream port is having issues... and upstream switches don't care either (even though in theory they could, if pause frames included a list of destination MAC addresses for which a pause is needed).
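A toy model of options a) and b) on a single point-to-point link. It is a sketch under simplifying assumptions: PAUSE is treated as taking effect instantly (real pause frames have reaction time and rely on buffer headroom), time is in abstract ticks, all numbers are hypothetical, and the cascade and head-of-line effects described above are not modelled.

```python
def simulate_link(offered, link_rate: int, drain_rate: int, rx_buffer: int,
                  flow_control: bool):
    """Toy model of one point-to-point link.

    offered[t]: frames handed to the sender in tick t.
    link_rate:  frames the wire can carry per tick.
    drain_rate: frames the receiver can process/forward per tick.
    rx_buffer:  receiver buffer size in frames.
    Returns (frames dropped at the receiver, peak frames held back at the sender).
    """
    sender_q = rx_q = dropped = max_sender_q = 0
    for n in offered:
        sender_q += n
        send = min(sender_q, link_rate)
        if flow_control:
            # Idealized PAUSE: assumed to take effect instantly, so the sender
            # never transmits more than the receiver buffer can hold.
            send = min(send, rx_buffer - rx_q)
        sender_q -= send
        accepted = min(send, rx_buffer - rx_q)
        dropped += send - accepted          # overflow is lost without flow control
        rx_q = max(0, rx_q + accepted - drain_rate)
        max_sender_q = max(max_sender_q, sender_q)
    return dropped, max_sender_q

if __name__ == "__main__":
    # Hypothetical numbers: a 10-tick burst faster than the receiver can drain.
    offered = [15] * 10 + [0] * 20
    print("flow control off:", simulate_link(offered, 20, 10, 20, flow_control=False))
    print("flow control on: ", simulate_link(offered, 20, 10, 20, flow_control=True))
```

With these numbers, the same 40 excess frames are either lost at the receiver (flow control off, option b) or held back at the sender, where they add queueing delay (flow control on, option a).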
 
PortalNET
Member Candidate
Posts: 151
Joined: Sun Apr 02, 2017 7:24 pm

Re: RX Drops on SFP+

Sun Jun 23, 2024 7:56 pm

Hiya @mkx,

First of all, thanks for your input. Surely, if it doesn't work, at least we will find out how not to approach things, hehe.

OK, so both x86 servers are Dell R620s.

Both have 160 GB of DDR3 RAM.
Two E5-2697 v2 CPUs (in MikroTik we see 48 cores being used, but the actual physical cores are only 24c/48t), so MikroTik counts threads as cores.

Our average traffic is 3.5 Gbps, up to 5 Gbps at peak times. Under System/Resources/Hardware we have enabled Multi-CPU and allow x86_64.

The peak on a single core is around 10%, and overall CPU shows 3% to 6% at peak traffic of 5 Gbps.

In our setup, both servers have a Dell Intel X710-DA4 mezzanine card with 4 SFP+ 10G ports, plus a Mellanox ConnectX-4 MCX24241 dual-port SFP+ 10G card on the upper riser, and a Mellanox ConnectX-3 Pro MCX354ECAT dual-port 40G card on the other x16 riser.

The BNG/BRAS PPPoE server and the BGP server run exactly the same hardware.

At the moment, only some ports are enabled on both ends. Our ISP carrier's internet link from the operator is a 10G link on a 10G SFP+ port of the Mellanox mcx2424.

On the BGP side, on the Intel X710-DA4, we have our VMware hosts on one port and our DNS servers on another, connected directly with Intel 850nm modules to our other Dell servers running ERP, DNS, monitoring, etc. All running fine without any errors at all: zero RX errors.

On our BNG/BRAS/PPPoE server side, 3 ports of the Intel X710-DA4 card are connected to 3 different fiber OLTs in our datacenter, again with short 3 m 850nm multimode cables and Intel SFP+ 850nm (330 m) modules. The OLTs have TX and RX flow control disabled, the Ethernet interfaces on the x86 server also have flow control disabled, and we have 0 RX errors. Port 4, which connects the BGP server directly to the PPPoE server, again uses a 3 m 850nm multimode cable with Intel SFP+ 850nm multimode modules, and this is the only Ethernet interface where we do get errors, on RX only. We have now increased the mq-pfifo limit up to 50000 packets, and we still get about 6000 errors every 24 hours.

We have the same MTU on both interfaces and the same VLAN MTU on both servers. We even tried swapping ports on the Intel X710, and even tried the Mellanox interfaces on both servers, and still got errors. We swapped out modules, tried different vendors, etc., and we still get errors, just RX-ERROR. Funnily enough, the BGP-side interface normally has around 1k more RX errors than the PPPoE-side interface.


I will try removing the VLAN on both machines to test direct routing without a VLAN between the two interfaces, even though the other interfaces connected to the OLTs and other servers have VLANs set up the same way and show no errors.

The only things that seem to work are enabling RX/TX flow control on both sides, or placing a switch between the two interfaces, which is even stranger, as the CRS317 in SwOS mode does not have TX/RX flow control enabled; it's switched off in the config.

One other thing that confused me was the IRQ list: it shows eth0, eth1, eth2, eth3. When I disabled ether1 and ether3 on the Intel X710-DA4 card, i40e eth0 and i40e eth2 disappeared, leaving just i40e eth1 and i40e eth3. As for the Mellanox card, where our main link arrives, it's not displayed in the IRQ list, but we have no errors on that one.

I then understood that i40e eth0 = ether1, i40e eth1 = ether2, i40e eth2 = ether3, and i40e eth3 = ether4 in the interface list. The IRQ list apparently splits traffic from ether1 on i40e-eth0 across the CPU cores, as shown in the pictures below. The Mellanox ports are also getting traffic split across the CPUs, even though it states that the only active CPU is 0 in the CPU count.
pic2.png
pic1.png

On the Mellanox card, the IRQ list displays various CPUs as active, as in these pictures:
mellanox por3.png
mellanox por1.png
mellanox por2.png
 
ID
newbie
Posts: 34
Joined: Tue Dec 26, 2006 10:36 pm

Re: RX Drops on SFP+

Mon Jun 24, 2024 12:03 am

Hard to say what the cause is, but did you try connecting the Intel card to the Mellanox card between the servers to see how it behaves?
 
PortalNET
Member Candidate
Posts: 151
Joined: Sun Apr 02, 2017 7:24 pm

Re: RX Drops on SFP+

Tue Jun 25, 2024 6:08 pm

Hi..

I haven't done that test yet.

But we do have other ports of the Intel card connected directly to the OLT fiber devices, with no errors on those ports.

We have VLANs on those ports too, and no errors on the VLAN interfaces.

Even the Intel interface connected directly to the BGP server has a VLAN set up, and there are no TX/RX errors on the VLAN; we see the errors only on the physical interface: around 4k packets every 24 hours, which gives an average of about 2.78 lost packets per minute. (But we do get an odd couple of packets added after a couple of hours; sometimes it stays around 4 hours with no increase in the count, then all of a sudden it jumps by around 100 packets.) Average traffic through the interface is 3.5 Gbps download and 250 Mbps upload; it's the internet WAN gateway interface that connects the PPPoE router to the BGP router. We also disabled auto-negotiation and fixed it at 10G SR/LR, but still get errors. The MTU on the interface is 1580, although the L2 MTU always stays at 0 and we cannot change it.
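The per-minute figure can be checked, and related to traffic volume, with a little arithmetic. A sketch: the 1000-byte average packet size is an assumption (not stated in the thread), so the resulting ratio is only an order-of-magnitude estimate; the same functions apply to the ~6000 per 24 h and ~20k per day counts mentioned elsewhere in the thread.

```python
SECONDS_PER_DAY = 86_400

def per_minute(count: float, hours: float) -> float:
    """Average events per minute for a counter that grew by `count` over `hours`."""
    return count / (hours * 60)

def error_ratio(errors_per_day: float, avg_bps: float, avg_packet_bytes: float) -> float:
    """Fraction of packets hit by errors, for an assumed average packet size."""
    packets_per_day = avg_bps / 8 / avg_packet_bytes * SECONDS_PER_DAY
    return errors_per_day / packets_per_day

if __name__ == "__main__":
    print(f"{per_minute(4_000, 24):.2f} errors/min")   # ~4k per 24 h, from this post
    # 3.5 Gbps is the average download rate quoted; 1000-byte packets is an assumption.
    print(f"{error_ratio(4_000, 3.5e9, 1_000):.1e} of packets")
```

At 3.5 Gbps and the assumed packet size, ~4k errors per day works out to roughly one packet in ten million.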
 
PortalNET
Member Candidate
Posts: 151
Joined: Sun Apr 02, 2017 7:24 pm

Re: RX Drops on SFP+

Tue Jun 25, 2024 6:17 pm

Hiya

As an update: while googling, I found a Ukrainian site where people reported the same issue with Intel NICs on x86 RouterOS 6.4xx.

I will try it out later, even though on his side the issue was related only to the VLAN port.
teste3.png
test2.png
test1.png
 
PortalNET
Member Candidate
Posts: 151
Joined: Sun Apr 02, 2017 7:24 pm

Re: RX Drops on SFP+

Wed Jul 17, 2024 5:24 pm

changeip wrote: "Run torch or packet sniffer and see if they go away while running. The RX Drops are also caused by unknown vlans coming into an interface. If you run packet sniffer or torch then all vlans are allowed in the kernel and it wont drop them. Just a way to test if thats the issue."
OK, I did the test. With torch running, it still counts RX errors, but more slowly; with the packet sniffer running, it counts RX-ERRORS a lot faster.


Now the strange part: on the CPE client side, under a PPPoE connection, I have a few windows open running ping -t xxxx.xxx.x to different places...

and not a single packet lost after hours of running. So although it counts around 20k RX-ERRORS per day, we don't see any ping loss through the PPPoE servers. Strange.
