We have a layer 2 private network supplied by our ISP. Each site is connected to this network via an ADSL modem, which is a layer 2 bridge device. No configuration / routing required, just each router behind the modem is connected to the ‘bridged’ network.
We have three sites in this network, each of which I am connecting via a Mikrotik RB750.
I have configured OSPF on the devices for the bridge network subnet and each branch subnet. OSPF works perfectly: the neighbours discover each other and the OSPF routes are installed.
Now, the spin. I have two of these ADSL modem connections at each site that we would like to use for redundancy. I have configured an additional IP on the secondary link at each site within the OSPF network. OSPF picks everything up correctly and I see the OSPF routes updating as required, but the local routers are obviously not happy with two ethernet ports in the one subnet, and the local routing table does not update to reflect what OSPF is seeing when one of the links goes down.
I have also tried a different subnet for the secondary links within the OSPF network, and that works fine. However, if the second link is down at one site and the first link is down at another site, those two sites cannot communicate directly because they are left on different subnets, which is not the final configuration I was after.
I have attached a diagram of what we have, and what I basically wanted to achieve. I was hoping someone could advise me of a way to do this, or to modify my current OSPF configuration to suit.
If I understand what’s going on correctly: the middle piece is your provider, who is transparently bridging together all of your DSL links at layer 2, and they are providing no layer-3 service in that bridge? The 192.168.0.0/24 is purely your construction? And your OSPF is only between your routerboards, and nothing in the center ring is running OSPF?
One way you could do it is to bridge the ether1 and ether2 of each router together (make sure you enable stp or rstp), then apply an IP address to the bridge (and not to the underlying ethernets), and run ospf on the bridge. Failover will then occur at layer-2 rather than layer-3. This may not be the way to go if you want to utilize the full bandwidth when both links are up (assuming there aren’t bandwidth limitations in the center bridge).
Another way is you could create vlans and simulate point-to-point or point-to-multipoint links. If you do it right you should be able to maintain optimal load balancing should any combination of the DSL links go down (assuming all your DSL links are the same capacity).
That is correct. They are doing no layer 3 for us, and the 192.168.0.0/24 is purely a subnet I created on those interfaces. OSPF is only running on the RB750s, with no OSPF running anywhere else.
Load balancing is not a requirement if it cannot happen; automatic fail-over would be the best scenario. If some traffic could be pushed via the secondary DSL link, and some via the primary (based on marking), that would be ideal, but not an exact requirement.
The DSL links are not all the same speeds, with higher quality links as primary, and slower links as backups.
In your VLAN suggestion with point to point, or point to multipoint, I would need to create individual links to each router, correct? I was looking ahead to future WANs / upgrades that will have a large number of sites, and thought that in this configuration it would be much messier and require a lot of work to add another site.
OSPF can load-balance if the costs are the same, but since that isn’t the case here, load-balancing would be a bit more difficult to set up.
I suggest for now you just try the bridging suggestion and see how it works:
On each routerboard, create a bridge with stp or rstp and put ether1 and ether2 into that bridge (from this point on, ignore ether1 and ether2). Put 192.168.0.x/24 on that bridge, add the bridge to the ospf interfaces, and let layer-2 handle the failing over. Use the priority/path cost settings in the bridge ports to ensure traffic defaults to the faster path. Hopefully the central bridge is stp or rstp aware, or there may be problems (but it might work anyway, as long as it’s the only piece that isn’t doing stp/rstp and it passes the BPDUs through).
You’ll have to read up on the bridge to see what options there are that might help.
If you’re going to be growing this to something much larger, that central bridge may become a nightmare, or it might be fine, depending on what else is going on and what you are doing ultimately. It appears you’re hiding everything behind the routerboards, which act as routers, and only bridging the routerboards themselves together; so it should be able to grow rather large and probably be fine, as long as it’s up to the bandwidth requirements.
The vlans would become unwieldy with very many more sites. You could mitigate that somewhat by making virtual islands of sites (or real islands, if your DSL provider would cooperate), but it’s probably not worth the trouble.
Also, when you’re testing, failover may take anywhere from a few seconds to upwards of a minute to occur at layer-2 (it should be quite fast if you use rstp rather than stp). If ospf then goes down as a result of that, there will be an additional delay while ospf comes back up. You can play with the ospf timers, but you’ll probably be able to use rstp, which will probably be fast enough for ospf to not notice most of the time.
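To put rough numbers on that, here is a small Python sketch comparing layer-2 reconvergence time with the OSPF dead interval. It uses the usual protocol defaults (802.1D max-age 20 s and forward-delay 15 s, OSPF hello 10 s and dead 40 s on broadcast networks); the RSTP figure is an assumption, not a measurement, and the survival check is only a rough, slightly conservative estimate.

```python
# Rough comparison of layer-2 reconvergence time against the OSPF dead interval.
# Timer values are protocol defaults, not measurements from this network.

def stp_worst_case(max_age=20, forward_delay=15):
    """Classic 802.1D STP: wait out max-age, then listening + learning."""
    return max_age + 2 * forward_delay

def ospf_survives(l2_outage_seconds, hello_interval=10, dead_interval=40):
    """Estimate whether the adjacency outlives the outage.

    Allows one extra hello interval for the first hello to arrive once
    the layer-2 path is forwarding again.
    """
    return l2_outage_seconds + hello_interval < dead_interval

RSTP_ASSUMED_SECONDS = 6  # assumption: roughly three missed 2-second BPDU hellos

for name, outage in (("stp", stp_worst_case()), ("rstp", RSTP_ASSUMED_SECONDS)):
    verdict = "OSPF stays up" if ospf_survives(outage) else "OSPF neighbour drops"
    print(f"{name}: {outage}s outage -> {verdict}")
```

With these defaults, classic stp can outlast the dead interval while rstp should not, which is why rstp is the one to try first.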
Actually, one thought: how will layer 2 handle the failover if sync drops on the modem while the physical connection on that interface to the modem is still up?
Have you tried this yet? I believe stp/rstp should be able to handle the loss of connection with the ethernets to the dsl modems still being up, but stp may take upwards of 2 minutes to notice/correct.
Another possibility is to get your layer-2 provider to actually provide you with two bridges, an A bridge and a B bridge. Then at every site one of your DSL modems and its corresponding router interface are in the A bridge, and the other DSL modem and ethernet port are in the B bridge, and you use 192.168.0.x/24 for all interfaces in the A bridge and 192.168.1.x/24 for all interfaces in the B bridge. In this case you would NOT bridge the interfaces together in the routerboards; you would rely on OSPF (possibly in conjunction with BFD if you want it to be transparent) to handle failover, and there won’t be any stp/rstp issues. Make sure the OSPF costs reflect the actual bandwidth of your interfaces. If you use nbma, you can specify a different cost for each remote site, to accurately reflect the end-to-end bandwidth to each specific site.
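As a sketch of how costs could be derived from bandwidth, the Python below uses the common cost = reference / bandwidth convention. The 100 Mbit/s reference, the link names and the DSL speeds are all assumptions for illustration, and treating the end-to-end bandwidth as the slower of the two ends is my simplification. The resulting numbers would still be entered on the routers by hand.

```python
# Derive OSPF costs from link bandwidth (cost = reference / bandwidth).
REFERENCE_KBPS = 100_000  # assumption: the common 100 Mbit/s reference bandwidth

def ospf_cost(bandwidth_kbps, reference_kbps=REFERENCE_KBPS):
    """Return an integer cost of at least 1; slower links get higher costs."""
    return max(1, round(reference_kbps / bandwidth_kbps))

def end_to_end_cost(local_kbps, remote_kbps):
    """Per-neighbour cost (e.g. for nbma): limited by the slower end."""
    return ospf_cost(min(local_kbps, remote_kbps))

# Hypothetical DSL speeds in kbit/s; substitute the real figures.
links = {"wan1-primary": 10_000, "wan2-backup": 2_000}
costs = {name: ospf_cost(kbps) for name, kbps in links.items()}
print(costs)
print(end_to_end_cost(10_000, 2_000))
```

The faster link ends up with the lower cost, so OSPF prefers it and only falls back to the slower link when the primary path disappears.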
I should have thought of this before, but I’m not used to thinking in Layer-2.
Logically, you would have every site connected to every other by two separate redundant networks, and it simplifies the configuration. It would be nice if you could get your layer-2 provider to actually provide the bridges on separate equipment on their end, but that depends on how big they are (if they even have the capability) and on getting the phone company (presumed DSL last mile provider) to create the PVCs appropriately. Or you could get two separate layer-2 providers.
Thanks for your assistance. We are going to use the two subnets, one on each link, without breaking up the large ‘bridge’. This will allow everything to function correctly; the only downside is that in the event of a failure on WAN1 at one branch and WAN2 at another, there will be an extra hop to get where it needs to go.
I suspect this would be a rare scenario, and if it does happen, communication is still possible regardless.
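The extra-hop case can be checked with a small Python model. This is only a sketch: the site names are made up, and it treats two routers as neighbours whenever they both have a live link in the same subnet, which is roughly what OSPF would see on the shared bridge.

```python
from collections import deque

def router_hops(link_up, src, dst):
    """Return the number of router-to-router hops from src to dst, or None.

    link_up maps site -> set of subnets ("wan1", "wan2") whose link is up.
    Two routers are neighbours when they share at least one live subnet.
    """
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        site, hops = queue.popleft()
        if site == dst:
            return hops
        for other, nets in link_up.items():
            if other not in seen and link_up[site] & nets:
                seen.add(other)
                queue.append((other, hops + 1))
    return None

all_up = {s: {"wan1", "wan2"} for s in ("site1", "site2", "site3")}
# WAN1 down at site1, WAN2 down at site2, site3 fully up.
crossed = {"site1": {"wan2"}, "site2": {"wan1"}, "site3": {"wan1", "wan2"}}

print(router_hops(all_up, "site1", "site2"))   # direct neighbours
print(router_hops(crossed, "site1", "site2"))  # transits site3
```

In the crossed-failure case the two sites only stay reachable because a third site still has both links up and can act as transit; with no such site the function returns None.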
Thanks for your help.
One other question: if you give WAN1 a lower cost than WAN2, and the routing table updates to suit so that WAN1 has priority, do you know whether you can still manually mark and route traffic over WAN2 while OSPF is in place?
I’m not an expert on this, but you could create a /30 network between each router on each port. It increases your route table size and complexity, but it would provide for each router being able to use either link to reach any router on either link.
They would be load balanced as all costs would be equal, but to prefer the bigger bandwidth link, using the smaller one as fail-over, you’d just assign a greater cost to the slower link.
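To get a feel for how the /30 approach grows, here is a Python sketch that hands out one /30 per port-to-port link. It assumes "each router on each port" means every port paired with every port on every other router, and the 10.255.0.0/24 block and ether1/ether2 names are just placeholders.

```python
import ipaddress
from itertools import combinations, product

def mesh_links(sites, ports=("ether1", "ether2")):
    """Every port on every router paired with every port on every other router."""
    return [((a, pa), (b, pb))
            for a, b in combinations(sites, 2)
            for pa, pb in product(ports, ports)]

def assign_slash30s(sites, block="10.255.0.0/24"):
    """Hand one /30 from a (hypothetical) block to each port-to-port link."""
    subnets = ipaddress.ip_network(block).subnets(new_prefix=30)
    return {link: next(subnets) for link in mesh_links(sites)}

plan = assign_slash30s(["site1", "site2", "site3"])
for link, net in plan.items():
    print(link, net)

# How the number of /30s grows with the number of sites.
for n in (3, 10, 20):
    print(n, "sites ->", len(mesh_links([f"site{i}" for i in range(n)])), "/30s")
```

Under that assumption the count is 4 * n * (n - 1) / 2, so it works for three sites but gets large quickly as sites are added.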
Hi, sorry about late reply. Since you still have the one bridge in the middle, you may want to create an “A” VLAN and a “B” VLAN for the two subnets, which will eliminate broadcast processing on the “wrong” subnet (assuming the bridge in the middle will carry VLANs).
Also, you could easily move the VLAN to the other WAN interface as a temporary workaround (without having to reconfigure anything else) should one of the WANs go down, to eliminate the extra hop in your one failure scenario.
You should be able to do the manual mark/route, though it may have problems if the link it’s depending on fails.
In any case, I assume you’ve done something by now, how’s it working?
p.s. Some OSPF implementations can do TOS based routing, but I don’t think RouterOS’s is one of them.
p.p.s. RE: /30s: Creating /30s between all the routers/ports is one alternative (as briefly discussed earlier, but we didn’t get into the details).