I currently have three devices in my setup: a Windows 10 PC with a Mellanox ConnectX-3 10GbE NIC, a Proxmox server with another Mellanox ConnectX-3 NIC, and a CRS310-1G-5S-4S+ switch. The Proxmox server runs a virtualized pfSense installation that provides DHCP, with its LAN network connected to the bridge that uses the Mellanox NIC. The switch is set up with SFP+1 as a VLAN 20 access port going to the Windows PC and SFP+4 as a trunk port going to the Proxmox server.
It seems to be set up correctly, as the Windows PC gets an address from the IP range associated with VLAN 20 and can access any other device on that subnet just fine. However, if I run iperf between the Proxmox server and the Windows PC, I’m maxing out at around 300 Mbit/s, although it jumps around like crazy. Both the Windows PC and the Proxmox server connect to the switch with a 10 Gbit link, and the 10G LEDs on the switch are lit.
I believe the issue is the switch, because if I connect the Windows PC directly to the Proxmox server and manually set the VLAN tag to 20, I get an address from the same subnet as before and can access devices like before, but now iperf gives a pretty consistent 5.5 Gbit/s (which is still slower than expected, but fast enough). Below are the commands I used to configure the switch:
I don’t know of any config files or “view layout” commands that I could paste here to give more details. I barely managed to get VLANs working at all, but if there is anything I could add to give more information, I’ll be happy to append it.
For the life of me I cannot tell if this post hasn’t been approved by the mods yet (it’s been two days now and I still see approve and disapprove buttons), if I simply didn’t provide enough info, or if this is a freak problem and no one has any ideas for troubleshooting.
It is easy to tell if other people can see your post. Just log out of the forum, and if you can see your post when you are not logged in, then other people can see it too.
I don’t have the equipment you have, so I made no comment. But since you are wondering whether your post was muted and want any feedback you can get, I will describe what I would do if I were in your shoes.
Is that really your complete config? Do you have more than a single bridge configured? Do the bridge ports show up as HW offload?
If you have another pair of 10G SFP+ modules (or a DAC), you could do a better comparison by using the following config on the switch. Note that sfp2 is removed from VLAN 20, just to see whether that is affecting performance. Is there anything connected to sfp2? This config also makes VLAN 20 tagged on both sfp-sfpplus1 and sfp-sfpplus4, which will allow you to use the exact same config on the PC as when you used a single link directly between Proxmox and the PC (with the PC configured to use tagged VLAN 20).
Yeah the post was sometimes showing as unavailable like it had been deleted, but sometimes showing as normal, and I could sometimes see it listed in the forum and sometimes not.
Thank you for at least letting me confirm that this is still here.
Yep, that’s the whole thing. I started out with a much more complex configuration, but started removing parts and eventually did a full factory reset, after which I removed all interfaces but ether1 from the default bridge, then ran the commands I showed. I have the default bridge on ether1 only because I didn’t want to lose the management interface. I’m not sure about the HW offload and will have to check that, but I don’t think traffic was running through the CPU, because the CPU usage would barely change when running iperf through the switch. I assume that with the small CPU on the switch, if the bridge was going through it, 300 Mbit/s would use a noticeable amount of CPU power.
I will definitely try this and report back, thank you for the suggestion.
If you look at the CRS310-1G-5S-4S+IN block diagram everything should be done by the 98DX226S SoC with integrated CPU and line rate switch ASIC. That includes tagging/untagging on vlan 20.
There is always the issue of compatibility between different vendors’ chipsets. For example, some people had issues with the 2.5G port on the RB5009, while others did not, depending on what device was connected.
Any time there is a big rate change, e.g. from 10G to 1G, the 1G output port can become a bottleneck.
If the tagging/untagging of vlan 20 is being done by the switch ASIC, CPU usage should be low, and that’s what I understood you to say.
In the OP you stated “if I run iperf between the proxmox server and the windows pc, I’m maxing out at around 300mbit/s, although it jumps around like crazy.”
To cover all the configuration bases, what SFP+ modules or DAC cables are being used?
I reset the switch again and tried the slightly modified configuration suggested. This time I was getting around 600 Mbit/s instead of 300, which on a 10G link is not much better.
For the connections, I don’t have a DAC cable that I can try, but I have tried several combinations of fiber cables with FS SFP-10GSR-85 and CISCO SFP-10G-SR modules. I’ve tried mixing them in different ways, matching them on the same cable (FS and FS on each end of the same cable), and going all CISCO. Unfortunately I only have three of the FS modules, so I can’t test all FS, but with the 3 FS modules and 5 CISCO modules I have, I’ve tried every combination of those. I also tried using one of the 10G RJ45 NICs on the proxmox server and using a CAT6A cable with a MikroTik S+RJ10 transceiver. All combinations had roughly the same results.
I think I have enough parts lying around to build another machine that I can put another NIC into for more testing. I’ll try to do that this weekend.
I tried resetting it yet again, not touching the configuration at all, and manually setting up untagged IPs on the proxmox server and windows pc, so no vlans involved at all. With this setup I’m able to get about 2-2.5gbit/s with iperf, which is still way below what I’d expect, but I mean the connectx-3s aren’t exactly brand new, so maybe they’re doing something funny. Either way that means that introducing VLANs is what’s causing the slowdown.
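To keep the results so far side by side, here is a small sketch that expresses each iperf figure reported in this thread as a share of the 10 Gbit/s link rate. The scenario labels and the names `measurements`, `percent_of_link`, and `summarize` are made up for illustration. The numbers are the ones posted above, with the 2-2.5 Gbit/s result kept as a range.

```python
LINK_RATE_GBIT = 10.0  # nominal link speed of both NICs and the switch ports

# (scenario, low Gbit/s, high Gbit/s), taken from the posts in this thread
measurements = [
    ("switch, VLAN 20 access port to PC", 0.3, 0.3),
    ("switch, VLAN 20 tagged on both ports", 0.6, 0.6),
    ("switch, default config, no VLANs", 2.0, 2.5),
    ("direct link, VLAN 20 tagged", 5.5, 5.5),
]


def percent_of_link(gbit, link=LINK_RATE_GBIT):
    """Return throughput as a percentage of the link rate, rounded to 0.1."""
    return round(100.0 * gbit / link, 1)


def summarize(rows):
    """Convert each (name, low, high) row into percentages of the link rate."""
    return [(name, percent_of_link(low), percent_of_link(high))
            for name, low, high in rows]


for name, low_pct, high_pct in summarize(measurements):
    if low_pct == high_pct:
        print(f"{name}: {low_pct}% of link rate")
    else:
        print(f"{name}: {low_pct}-{high_pct}% of link rate")
```

Laid out this way, both VLAN configurations through the switch sit well below the no-VLAN run through the same switch, which matches the conclusion in the post above.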
What happens if you switch it into UDP mode? That lets you dial in a target bandwidth, -u -b. Without TCP’s retransmission help, it will fall apart and start losing packets at some point; where?
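One way to run that UDP test systematically is to step through a list of target rates and note the first one where loss appears. The sketch below is only an illustration and makes several assumptions: it uses iperf3 rather than the original iperf (for its `-J` JSON output), it assumes the UDP report carries the loss figure at `end.sum.lost_percent`, and the helper names and the 1% loss threshold are invented here. An iperf3 server must already be running on the other machine.

```python
import json
import subprocess


def build_command(server, rate_mbit, seconds=10):
    """Build an iperf3 client command for a UDP test at a fixed target rate."""
    return ["iperf3", "-c", server, "-u", "-b", f"{rate_mbit}M",
            "-t", str(seconds), "-J"]


def parse_loss(report_text):
    """Read the loss percentage from an iperf3 JSON report.

    Assumes the UDP summary lives at end.sum.lost_percent.
    """
    report = json.loads(report_text)
    return report["end"]["sum"]["lost_percent"]


def first_lossy_rate(results, max_loss_percent=1.0):
    """Return the lowest tested rate whose loss exceeds the threshold.

    results is a list of (rate_mbit, loss_percent) pairs; returns None
    if no tested rate exceeded the threshold.
    """
    for rate, loss in sorted(results):
        if loss > max_loss_percent:
            return rate
    return None


def sweep(server, rates_mbit):
    """Run one UDP test per target rate and collect (rate, loss) pairs."""
    results = []
    for rate in rates_mbit:
        completed = subprocess.run(build_command(server, rate),
                                   capture_output=True, text=True, check=True)
        results.append((rate, parse_loss(completed.stdout)))
    return results
```

For example, `first_lossy_rate(sweep("192.0.2.10", [100, 250, 500, 1000, 2000, 4000, 8000]))` would report the first target rate at which more than 1% of packets were lost, where 192.0.2.10 is a placeholder for the iperf3 server's address. Comparing that rate with and without the VLAN configuration on the switch should show whether the drops start at the switch.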
Thank you for the suggestion, I will come back to this at some point and try it out. For now I’ve bought a used Brocade switch that was easy to set up and simply worked the first time. It does use a lot more power than the MikroTik, so eventually, when I’ve had a break from troubleshooting the MikroTik for a while, I’ll come back to it.