GRE tunnel and L3 hardware offloading feature on CRS317-1G-16S+

We are thinking about using a third-party DDoS prevention service that would forward our scrubbed inbound traffic to us via a GRE tunnel on a 10 Gbps port. No encryption, just plain GRE encapsulation. So now we’re looking for a low-cost option that would allow us to terminate this GRE tunnel and achieve (close to) 10 Gbps line speed.
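For reference, terminating such a tunnel on RouterOS would look roughly like this (a minimal sketch; the interface name and all addresses are made-up placeholders from documentation ranges, not the provider’s actual endpoints):

```
# GRE endpoint for scrubbed traffic from the DDoS provider.
# 203.0.113.1 = provider's GRE endpoint, 198.51.100.2 = our public address
# (both hypothetical).
/interface gre add name=gre-scrub local-address=198.51.100.2 \
    remote-address=203.0.113.1 mtu=1476
# Inner point-to-point addressing on the tunnel:
/ip address add address=10.255.0.2/30 interface=gre-scrub
```

The config is the same either way; the question is whether the decapsulation happens in the switch chip or on the CPU.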

Now, with RouterOS 7, the CRS317-1G-16S+ supports L3 hardware offloading and so can achieve close to line speed for simple routing. But what happens in the case of a GRE tunnel? Can the L3 hardware offload also “unwrap” the GRE packets in hardware? Or would the CPU need to do this, in which case hardware offloading could no longer be used, everything would have to be done in software, and the achievable throughput would fall far below line speed due to the severe performance penalty?
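As far as I can tell, enabling and verifying L3 hardware offloading on the CRS317 looks like this (a sketch based on the RouterOS 7 L3HW documentation; hardware-offloaded routes should show the `H` flag in the route list):

```
# Enable L3 hardware offloading on the switch chip:
/interface ethernet switch set 0 l3-hw-offloading=yes
# Check which routes are actually offloaded (look for the H flag):
/ip route print
```

The open question is whether a route whose traffic arrives inside a GRE tunnel can ever get that `H` flag, or whether the decapsulation forces everything onto the CPU.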

I seem to be unable to find an answer to this question in the RouterOS documentation. While there’s a list of features that are known to work, partially work, or not work with L3 hardware offloading, GRE tunnels don’t seem to be mentioned. The Marvell Prestera datasheets do mention a forwarding engine that can also handle GRE packets, but I don’t know whether this is applicable or whether MikroTik’s implementation takes advantage of it.

So can the CRS317-1G-16S+ terminate a GRE tunnel in hardware? And if not, is there any other similarly inexpensive switch model that can do it (at 10 Gbps port speed)?

Thanks a lot!

According to the Marvell Prestera documentation, the switch chip of the CRS317-1G-16S+ supports GRE tunneling. Unfortunately, the feature is not implemented in RouterOS 7 yet, meaning that tunneled traffic gets processed by the CPU (slowly). I have created a ticket to investigate and evaluate L3HW offloading of GRE tunnels. However, I wouldn’t expect this feature in the near future.

Thanks for your reply. That’s a pity, because it would be a very useful feature, especially since the hardware is actually capable of doing it. So I hope you’ll be able to add support for it soon.

I’m almost certain that GRE tunnel offloading will be implemented. However, I wouldn’t expect it soon. First, we need to stabilize RouterOS v7.1. And the next big feature is IPv6 hardware routing support. Only then can we evaluate GRE L3HW support and put it on the roadmap.

+=10000000000

And fast-track for IPv6 I hope …

GRE offloading would be great, but I find the real deal would be EoIP offloading since you can bridge it to easily deploy PTP links (or L2TP if the other end isn’t MikroTik).

VPLS would be ideal, but it has many limitations (chief among them that every node along the path must not only participate in MPLS but also be able to handle the encapsulation within its L2MTU).
EoIP ends up being way more practical while also being way easier to configure and troubleshoot.

Has it been implemented since?

Not yet

Are there any tunneling options available for CRS317-1G-16S+ that can be hardware offloaded? I have a similar need. VXLAN, EoIP, whatever…

Checking in on this. Obviously not implemented, as I don’t see anything in the changelogs. Is this on the distant-future to-do list, or something a little higher priority?

Unfortunately, I cannot give you any estimates at this moment.

The current main focus is on the core L3HW stabilization and QoS Hardware Offloading (the same team is working on both projects). After that, we will reevaluate the priorities.

Thanks for the update Raimonds.

I know there’s a lot of excitement around the potential for an EVPN service, and hardware-accelerated VXLAN is a bit of a prerequisite there. It’s not the GRE/GRE6 requested in this thread, but I imagine some of the same groundwork needs to be laid to enable both hardware GRE and hardware VXLAN. In the datacenter, VXLAN is for sure at the top of the list for hardware offload, probably the top of any “what’s next” list in that context.

I’m a bit mixed on which I’d prefer more, honestly. GRE/GRE6 is simpler for building basic pipes, but VXLAN seems lighter. Right now I can push quite a bit more over VXLAN on hAP ax2 hardware, 2-3x as much pretty easily: with VXLAN I can do ~650x650 Mbps while running btest onboard, vs. no more than half that over GRE.
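For comparison, the VXLAN side of that test is just something like this (a rough sketch; the names, VNI, and addresses are placeholders, not my actual setup):

```
# VXLAN interface between the two units (addresses hypothetical):
/interface vxlan add name=vxlan1 vni=100
/interface vxlan vteps add interface=vxlan1 remote-ip=10.0.0.2
# Bridge it so it looks like a plain L2 link:
/interface bridge port add bridge=bridge1 interface=vxlan1
```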

This really highlights my case, though. The hAP ax2 is a pretty nice piece of hardware with a fast chip. Software encapsulation just doesn’t cut it for modern service providers, from *ISPs to MSP/IT companies needing tunneled endpoints.

I’m a multi-tech operator and we also have a business-class fiber deployment. I would really love hardware VXLAN (over IPv6…).

Again, for me this is at the core of an IPv6-native network design. I’m getting into the weeds here, but one major problem with fiber deployments is that they tend to be long runs, with many miles exposed to pole-downs or backhoe hits. We want to do wire-speed IPv6, and instead of XGS-PON splits or Ethernet home runs we would route, which would let us push those packets down multiple paths and even wireless links without the pitfalls of RSTP. It still needs to look like a layer-2 service, so hardware-offloaded VXLAN is the key, or in a pinch, hardware GRE6. SPB would also be really welcome, but I’ve already veered too far off topic here.

thanks.

How much throughput would be feasible with a CHR on a high-clock-rate (5 GHz) 8-core CPU dedicated to this task?

Well, the request here is hardware offload, which would mean multi-gig speeds on supported hardware.

CHR will always be software. I’ve pushed a few gigs over GRE on a CHR on a 3 GHz 9th-gen Intel. You just have to keep in mind that each GRE tunnel gets pinned to a single CPU core, so more cores don’t really help with single-tunnel throughput.

Funny thing is that GRE endpoints on “plain” Linux are very fast. 10 gigs between two low-end Dell OptiPlex machines with i710 cards is easy.
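For anyone wanting to reproduce the plain-Linux case, each endpoint is just a couple of iproute2 commands (a sketch with made-up documentation-range addresses; run as root, and mirror local/remote on the far side):

```
# Kernel GRE tunnel between two hosts (addresses are placeholders):
ip tunnel add gre1 mode gre local 192.0.2.1 remote 192.0.2.2 ttl 64
ip addr add 10.10.10.1/30 dev gre1
ip link set gre1 up
# Account for the 24 bytes of GRE + outer-IP overhead:
ip link set gre1 mtu 1476
```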

Cambium’s cnWave E2E controller terminates GRE6 traffic from their nodes and can do 15 Gbps or so on a 10th-gen 2.4 GHz i5-7xxx in a VM with maybe 40% CPU load.

I’m here because I’m not able to get anywhere near 1G on hAP ax2 devices with their IPQ6010 quad-core CPU.

RB5009 to RB5009 can push about 2.5 Gbps aggregate over GRE between a pair on a 10G port.
VXLAN, IPv6-signaled, can pass about 2.7 Gbps “FDX” / 4.5 Gbps aggregate on the same hardware.
8.9 Gbps on the same test “direct”.

So we do have a reliably ~4 Gbps-capable option in VXLAN, but GRE is half that on the same hardware.