Dear all,
Twice a year, our club hosts an event for PC gaming enthusiasts. We provide the network infrastructure for 160 to 200 users over an entire weekend.
In our setup, we use a number of CSS326 switches running SwOS as access switches, two CRS326-24S+2Q+RM as aggregation switches (running SwOS at our latest event, now RouterOS) and a stacked Aruba 2930F as the core switch.
We use VLANs to segment the network by service (switch management, servers, Wi-Fi access points, devices, users, ...). Traffic between these networks is routed by the Aruba switch. However, due to the nature of our event, we decided to put all users together in a single VLAN with a /23 netmask. The advantage is that users "see" each other in their games' server lists and can host their own games on the LAN, because they all share the same broadcast domain. As a consequence, the users' VLAN (VLAN 11) carries a lot of broadcast traffic.
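To illustrate the sizing of the users' VLAN, here is a minimal sketch using Python's standard ipaddress module. The prefix 10.11.0.0/23 is a hypothetical example; only the /23 netmask comes from our setup.

```python
import ipaddress

# Hypothetical prefix: only the /23 netmask of VLAN 11 is real,
# the network address 10.11.0.0 is an assumed example.
user_net = ipaddress.ip_network("10.11.0.0/23")

total = user_net.num_addresses  # 2**(32 - 23) = 512 addresses
usable = total - 2              # minus the network and broadcast addresses
print(f"{user_net}: {total} addresses, {usable} usable hosts")

# All hosts in this range share one broadcast domain, so every frame
# sent to this address is delivered to every user port in VLAN 11.
print(f"broadcast address: {user_net.broadcast_address}")

peak_users = 200  # upper end of our expected attendance
print(f"headroom at {peak_users} users: {usable - peak_users} addresses")
```

A /23 therefore leaves enough room for 200 users plus the gateway, while keeping everyone in one broadcast domain, which is exactly why the broadcast volume in VLAN 11 is high.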
The CSS326 switches are configured with all copper ports as untagged members of the users' VLAN (VLAN 11). The two SFP+ ports on each switch are LACP-bonded and form the uplink to the CRS326 SFP+ aggregation switches, which in turn have a 2x10 Gbit/s LACP-bonded uplink to the Aruba switch.
With SwOS versions 2.4 through 2.13, we experienced a serious network outage as soon as a few users joined the network. The Aruba shut down the uplink interfaces because it detected a broadcast storm. The only way to restore the network was to break the LACP bond by disabling one of its member interfaces on every switch, so that each uplink ran over a single cable.
We experimented with turning flow control on and off on the CSS switches, but the results did not change. Since TX flow control is enabled by default on the CSS switches, we ended up leaving it on. Whenever the LACP bond was up and a few users were on the network, all of the switches eventually stopped working. We double-checked all of the cables but found no faults.
We discovered that downgrading SwOS to version 2.3 allowed us to use the LACP-bonded uplinks again. With this version, we were able to use the switches to their full potential without any further issues.
There seems to be a software bug, introduced in SwOS 2.4, that is still present in all later versions up to and including the current version 2.13. We believe that a broadcast storm occurs as soon as LACP bonding is used.
Could you tell us what the problem with the newer software versions might be? We are happy to help with further diagnosis and investigation to fix the issue; just let us know what data you need.
Cheers,
the IT team of SaarLAN e.V.