Bridge -> Bond -> 2x Ethernet MTU Setting?

I'm trying to set up a bond with an increased MTU.

The reason behind the MTU change is just to reduce the load on my poor CPUs on the Proxmox server. On Proxmox I am using OVS for the bridge and the bond.

I would like to set my MTU to 9000, as recommended by Red Hat for Ceph storage.

# model = CRS310-8G+2S+
/interface bridge
add name=bridge1 vlan-filtering=yes
/interface ethernet
set [ find default-name=ether1 ] l2mtu=9092 mtu=3000
set [ find default-name=ether2 ] l2mtu=9092 mtu=3000
set [ find default-name=ether3 ] l2mtu=9092 mtu=3000
set [ find default-name=ether4 ] l2mtu=9092 mtu=3000
set [ find default-name=ether5 ] l2mtu=9092 mtu=3000
set [ find default-name=ether6 ] advertise=10M-baseT-half,10M-baseT-full,100M-baseT-half,1G-baseT-full,2.5G-baseT \
    l2mtu=9092 mtu=3000
/interface vlan
add interface=bridge1 name=vlan-10 vlan-id=10
add interface=bridge1 name=vlan-20 vlan-id=20
add interface=bridge1 name=vlan-99 vlan-id=99
/interface bonding
add mode=802.3ad mtu=3000 name=bonding10 slaves=ether1,ether2 transmit-hash-policy=layer-3-and-4
add mode=802.3ad mtu=3000 name=bonding11 slaves=ether3,ether4 transmit-hash-policy=layer-3-and-4
add mode=802.3ad mtu=3000 name=bonding12 slaves=ether5,ether6 transmit-hash-policy=layer-3-and-4
/interface list
add name=webfig
/interface bridge port
add bridge=bridge1 interface=ether8 pvid=99
add bridge=bridge1 interface=ether7 pvid=99
add bridge=bridge1 interface=bonding10 pvid=99
add bridge=bridge1 interface=bonding11 pvid=99
add bridge=bridge1 interface=bonding12 pvid=99

MTU is an L3 setting … which means at least these two things:

  1. switches (as L2 entities) don’t have much to do with it, they just have to be able to pass those jumbo frames (L2MTU has to be at least MTU + Ethernet overhead + VLAN overhead if VLANs are used; see the quick sketch after this list)
  2. the whole IP subnet has to use the same MTU … all devices and the router. It’s OK if the router doesn’t do fragmentation, but the ICMP packets have to be able to get back to the sender of the jumbo frames (otherwise path MTU discovery breaks), and the router has to be able to generate those; I’m not sure whether L3HW-offloaded routes can do it (if my fears are real, then the CRS is not a feasible router).
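
As a quick illustration of that calculation (the port name below is only an example, not taken from the export above): for 9000-byte IP packets carried in single-tagged VLAN frames, the L2MTU has to be at least 9000 + 14 (basic Ethernet header) + 4 (one VLAN tag) = 9018 bytes, so on the switch ports something along these lines should be sufficient:

/interface ethernet
# L2MTU >= L3 MTU + 14 byte Ethernet header + 4 bytes per VLAN tag
set [ find default-name=ether1 ] l2mtu=9018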

So using anything but the industry-standard MTU of 1500 bytes can be a real PITA.
If using the standard MTU overloads “your poor CPUs on the Proxmox Server”, then you won’t notice the difference anyway :wink:

Please clarify what issue you actually encounter - I can see you have set the (L3) MTU to 3000 on all your Ethernet interfaces as well as on the bonds themselves. Do you get an error when you try to set 9000, or do the packets not actually pass through when you set 9000? The L3 MTU must fit into the L2 MTU less the basic Ethernet header less the VLAN tags; the max-l2mtu of the CRS310-8G+2S+ is 10218 bytes, so an L3 MTU of 9000 bytes fits with a generous margin.
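
If it helps, you can also read that hardware limit straight from the device; assuming I remember the read-only property name correctly (max-l2mtu, with ether1 as an example port), something like this should print 10218 on a CRS310-8G+2S+:

:put [/interface get ether1 max-l2mtu]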

Plus, if you dedicate a VLAN to the “backend” Ceph traffic (the data synchronisation between the physical storages), only the Ceph machines themselves need to send IP traffic via that VLAN, so the L3 MTU on the switch is irrelevant for that VLAN; it is only necessary to set the L2 MTU on the ports to 9018 if I count properly (L3 MTU 9000 + 14 byte basic Ethernet header + 4 byte single VLAN tag). And to avoid the PITA that @mkx has mentioned, you can keep the (L3) MTU at the conservative 1500 for all the other VLANs.
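
A rough sketch of that idea in RouterOS terms, reusing names from your export purely for illustration (not a drop-in config): the bond member ports only get the extra L2 headroom, while the switch keeps its own IP address on the management VLAN with the conservative L3 MTU:

/interface ethernet
# only L2 headroom is needed on the ports carrying the Ceph VLAN
set [ find default-name=ether1 ] l2mtu=9018
set [ find default-name=ether2 ] l2mtu=9018
/interface vlan
# the switch's own IP stays on the management VLAN with the default 1500-byte L3 MTU
set vlan-99 mtu=1500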

Other than that, Ceph also recommends using a dedicated physical interface for the “backend” traffic where data is synchronized between storages, so that it doesn’t have to compete for bandwidth with the general networking traffic of the VMs. And if the storage is not colocated with the VMs, it is even better if the “frontend” Ceph traffic (the VMs’ access to the virtual disks) is also physically separated from the general network traffic of the VMs. So in my case, each physical host has 4 Ethernets: two of them connect it to the switches, and the two remaining ones are used to set up a ring of 3 members where IP routing is used for failover rather than any L2 mechanism, because the goal is that the direct path between any pair of hosts is used whenever it is available, whereas neither OVS nor Linux bridge supports MSTP or any flavor of mesh, and to use bonding, MLAG would have to work flawlessly plus some other limitations would kick in.

I struggle with the MTU calculation.

I got it working by setting the L2MTU to 9200; the actual MTU is 9000.

So I don’t have to pay attention to the bonding interface when calculating the L2MTU? If so, I would set it (on the Ethernet interfaces) to 9018, right?

Bonding does not add any headers to the frames; it is just a dispatcher that chooses which physical path to use for a particular frame, based on the contents of the existing headers. So its L2MTU is the same as that of its physical member interfaces (all of which must be identical), i.e. an L2MTU of 9018 should indeed be enough for VLAN frames carrying 9000-byte IP packets to pass through.
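
If you want to verify it end to end, a do-not-fragment ping of maximum size between two Proxmox hosts is a simple test; with an MTU of 9000, the largest ICMP payload that still fits is 9000 - 20 (IP header) - 8 (ICMP header) = 8972 bytes (the address below is just an example):

ping -M do -s 8972 10.0.10.2   # should pass if the whole path honours MTU 9000
ping -M do -s 8973 10.0.10.2   # should be rejected with "Message too long" when the local MTU is 9000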

Thanks to you guys it’s working without problems.

What keeps me wondering is that the actual MTU doesn’t seem to have any influence in terms of stopping anything.
Let me explain: as I was playing around I set the actual MTU to 9000 and my servers were configured similarly. After everything was working I couldn’t resist trying to break everything. So the MTU got configured to 3000 on the servers and the actual MTU was set to 2000, but it worked flawlessly (the L2MTU was of course 9018). Why didn’t it break my ping command ‘‘ping -M do -s 2972 10.0.10.1’’?

Sorry @:fox:, I don’t understand.

First, I cannot see how you set actual-mtu, as to me it is a read-only value.
Second, the MTU is basically an informative parameter: the interface informs the IP stack that it has to create packets no larger than that (and this eventually turns into information for the remote conversation peer at the application protocol level, e.g. into the TCP MSS). So if the actual MTU of an interface shows 2000, a ping -M do -s 2972 should indeed respond with ping: local error: Message too long, mtu=2000.
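
To illustrate the boundary with your own numbers and address: with an actual MTU of 2000, the largest payload that still fits is 2000 - 20 - 8 = 1972 bytes, so I would expect:

ping -M do -s 1972 10.0.10.1   # 2000-byte IP packet, should still be sent
ping -M do -s 1973 10.0.10.1   # 2001 bytes, should fail locally with "Message too long, mtu=2000"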

To which interface is the IP address attached, to the bonding? And on which interface have you set the 2000 and how exactly?