I'm trying to configure a Cloud Core Router running RouterOS 6.38.1 to act as a gateway for an OpenContrail SDN cloud in a lab environment, where buying a supported router is not feasible.
Contrail uses MPLS over GRE for encapsulating tenant traffic.
To create a gateway between physical networks and SDN virtual networks, a router is provided with VPNv4 routes via BGP.
It then directly transmits traffic to the "vrouters" - routing instances running on each compute host - using the same encapsulation.
I've managed to replicate the required settings but for one thing: The correct remote address for GRE links.
If I only configure the BGP/VPNv4 half of things, I get promising packets on the compute node when pinging a VM from an external computer:
- Ethernet -> MPLS -> IPv4 src: external IP, dst: VM IP -> ICMP
- Ethernet -> IPv4 src: Compute2, dst: Compute1 -> GRE -> MPLS -> IPv4 -> ICMP
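The BGP half is essentially just a VPNv4-capable iBGP session to each controller. Roughly, in RouterOS 6 syntax (a sketch — the AS number 64512 and router-id are illustrative, not taken from my actual setup):

```
# local BGP instance; Contrail typically runs everything in one AS
/routing bgp instance set default as=64512 router-id=10.1.4.250
# peer towards the first controller, carrying VPNv4 routes
/routing bgp peer add name=contrail1 remote-address=10.1.4.1 \
    remote-as=64512 address-families=vpnv4
```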
Following the manual on GRE tunnels, you have to provide two separate sets of IPs:
- those for the underlying, routed network, and
- those for within the tunnel.
Hence I added a secondary IP on the same interface on the compute node, and set this as the GRE tunnel target.
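On the RouterOS side, that stage looked roughly like this (a sketch from memory; GRECompute24 is the tunnel name from my routing table below, and 10.1.4.110 is a placeholder for the secondary IP, which I haven't listed here):

```
# GRE tunnel towards compute node 1; the remote end is the secondary
# IP added on the compute node's fabric interface (10.1.4.110 is a
# placeholder for that address)
/interface gre add name=GRECompute24 local-address=10.1.4.250 \
    remote-address=10.1.4.110
```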
I moved the router's interface that connects to the cloud's data network into a separate VRF, so there is no longer a direct route to the compute node,
and statically added routes in the main routing table for the controllers that communicate via BGP and for the secondary IP.
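The VRF and the static routes were set up along these lines (a sketch; "fabric" is the routing mark that also appears in the routing table below, and 10.1.4.110 is a placeholder for the secondary IP):

```
# put the data-network interface into its own VRF so the main table
# no longer has a direct route to the compute nodes
/ip route vrf add routing-mark=fabric interfaces=CloudFabricIF
# static /32 routes in the main table: the three BGP controllers ...
/ip route add dst-address=10.1.4.1/32 gateway=CloudFabricIF
/ip route add dst-address=10.1.4.2/32 gateway=CloudFabricIF
/ip route add dst-address=10.1.4.3/32 gateway=CloudFabricIF
# ... and the secondary IP on the compute node (placeholder address)
/ip route add dst-address=10.1.4.110/32 gateway=CloudFabricIF
```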
The protocol stack now fits my requirements:
- Ethernet -> IPv4 src: Router, dst: Compute1-secondary -> GRE -> MPLS -> IPv4 -> ICMP
However, the packets are addressed to the secondary IP instead of the address the vrouter actually expects as its tunnel endpoint.
I could try to fix this by employing NAT between the router and the compute nodes, but that seems rather hacky, especially since I cannot do a destination NAT for output packets on the same router (they never reach the prerouting chain). That would require an additional router just for NAT, since you cannot use iptables on Contrail compute nodes.
Another rather inelegant solution would be to MitM the BGP sessions and rewrite the VPNv4 next hops.
I therefore went ahead and used the same IP for both the tunnel's remote end and the route for the tunneled traffic.
Understandably, the tunnel now flaps between active and inactive: as soon as the tunnel becomes active, the GRE packets themselves are routed through the tunnel, which is obviously not possible.
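If the flapping is driven by the GRE keepalive (I'm not certain it is in 6.38), turning the keepalive off at least keeps the interface from bouncing — although it only hides the symptom, since the routing loop is still there:

```
# disable the keepalive so the tunnel interface stays "running" even
# while the remote end is unreachable through the looped route
/interface gre set GRECompute24 !keepalive
```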
To route the GRE packets correctly, I tried policy-based routing. Since the packets are generated by the router itself, they belong to the output chain and are not prerouted (according to the manual on packet flow).
I created a mangle rule in the output chain that matches GRE packets and sets the routing mark of the VRF where I've put the interface connecting to the cloud infrastructure.
In my understanding, this should:
- send GRE packets to the host directly, since the only entry in the infrastructure VRF is the dynamically created connected route on that interface, which matches the node's IP
- send every other packet into the GRE tunnel, since it won't be marked and the main routing table contains a static route pointing at the tunnel
Instead, the only traffic that makes it into the tunnel is the router's neighbour discovery:
- Ethernet -> IPv4 src: Router, dst: Compute1 -> GRE -> IPv4 src: Router, dst: broadcast -> UDP 5678
After that MNDP packet is sent, the tunnel goes down again, so I cannot test whether the MPLS payloads work.
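For completeness, the mangle rule from this attempt looks roughly like this (reconstructed, so treat it as a sketch):

```
# output chain: locally generated GRE (IP protocol 47) packets get the
# "fabric" routing mark and are therefore looked up in the VRF that
# holds CloudFabricIF instead of the main table
/ip firewall mangle add chain=output protocol=gre \
    action=mark-routing new-routing-mark=fabric
```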
Code:
     Dst. Address   Gateway                           Distance  Routing Mark  Pref. Source
AS   0.0.0.0/0      10.1.2.1 reachable ExternalIF     1         vm_ext
AS   0.0.0.0/0      10.1.0.1 reachable AdminIF        1
DAC  10.1.0.0/24    AdminIF reachable                 0                       10.1.0.250
DAb  10.1.1.13/32   10.1.4.10 reachable GRECompute24  200       vm_ext
DAb  10.1.1.14/32   10.1.4.11 recursive AdminIF       200       vm_ext
DAC  10.1.2.0/24    ExternalIF reachable              0         vm_ext        10.1.2.250
DAC  10.1.4.0/24    CloudFabricIF reachable           0         fabric        10.1.4.250
AS   10.1.4.1/32    CloudFabricIF reachable           1
AS   10.1.4.2/32    CloudFabricIF reachable           1
AS   10.1.4.3/32    CloudFabricIF reachable           1
DAC  10.1.4.10/32   GRECompute24 reachable            0                       10.1.4.250
10.1.0.0/24 being the admin network
10.1.1.0/24 being the vm network
10.1.2.0/24 being the external network that is supposed to have connectivity to the vm network
10.1.4.0/24 being the cloud infrastructure network, where the encapsulated traffic is transported
10.1.4.1-3 being the controller nodes acting as BGP peers,
10.1.4.10 being the compute node where the VM 10.1.1.13 runs
10.1.4.11 being the compute node where the VM 10.1.1.14 runs (no GRE configured yet)
10.1.x.250 each being the CloudCore router
Is there a way to accomplish this? I've thought about pointing the route for the compute node at a bridged loopback IP, hoping to make the distinction in a prerouting chain and to circumvent the reachability checks, but the route simply gets marked "recursive" and no connection is available at all.