load balancing 3 lines in 2.28

I have 3 T-1 lines, each with a /27 (30 usable IPs) on it.

I have a gateway address for each of the T-1s.

All 90 IPs can be routed down any of the 3 T-1 lines.

All gateways have different IP addresses; in my case, all three gateways also share the same MAC address.

All three T-1 lines terminate on Cisco 2600 routers, which feed a Cisco switch, which in turn connects to the MT router.

The MT has two interface cards, one for the internal LAN and one for the Internet IP space. On the Internet side, the NIC has three public IPs, one pointing at each of the three T-1 lines. The internal (LAN) NIC has a single IP address.

Works well - all servers that are NAT’d (both source and destination) have no difficulty getting to the Internet, and users have no problems reaching the servers from the Internet…

Masqueraded or src-NAT’d only - this is where the setup runs into trouble. I used to masquerade, and now src-NAT, about 50 stations behind the MT to a 10-IP block, i.e. xxx.xxx.xxx.10 - xxx.xxx.xxx.20.

This works as long as the loading on the three T-1 lines is relatively low (30% or less). At higher loads, say 40% of capacity, the src-NAT’d-only stations seem to hang, or take a VERY long time to load web pages…

I can go to any of the fully NAT’d stations and this does not happen - though intermittently I will get a momentary ‘hang’, very momentary as in a second or two, and only at T-1 loading above 75% (I have to use a bandwidth tester to make the lines push this much data), and then it won’t ‘hang’ again. I expect this is from DNS taking a moment to get the request in and out before actually going to the IP address, because of the loading on all the T-1 lines…

The box is a 2.4 GHz CPU, 512 MB memory, 40 GB HD.

I see CPU loading on the MT peak at 25%; normally the MT ‘rides’ at about 3%, with peaks of 6-8% from time to time.

I have run the external mtr (Linux My Traceroute) program and see less than 1% packet loss at ANY load ratio on the T-1s, and on the three addresses on the MT router’s Internet NIC.

So - any of you smart folks out there have any ideas 'cause I am fresh out…

Thom

Are all the T-1 lines with the same company?
If they are, have you talked to them about bonding the T-1s so you only have to do bandwidth throttling on the MT?

How about bypassing those Ciscos and terminating the T-1s right into the MT?

It may not be a solution, but it would definitely make things more compact and fast. 2600s are slow as molasses IMO.

Sam

I forgot to add, for you sticklers on the manual: I have RTFM. I have been using MT for about two years and only recently (Apr 2005) went to three lines, which on the surface appeared to work very well at first…

All the T-1s are from the same company… Bonding is not an option at this time for these lines.

Also, I don’t believe that terminating the lines at the MT will resolve the issue either - it would make things neater in the server room, though…

I have used routing policy rules to impose some order on what gets routed where, as well as bandwidth queuing. These are stopgaps at best and don’t allow full use of the available bandwidth.
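For reference, a stopgap of this kind looks roughly like the following in 2.9-style mangle/route-mark syntax (the routing-mark name and the subnet split are illustrative, not my actual config):

```
# Pin half the src-NAT'd stations to the first line via a routing mark
# (subnet split and mark name are illustrative only)
/ip firewall mangle add chain=prerouting src-address=192.168.2.0/25 \
    action=mark-routing new-routing-mark=line1 passthrough=no
/ip route add dst-address=0.0.0.0/0 gateway=xxx.xxx.146.1 \
    routing-mark=line1
```

The problem is obvious: you have to guess the traffic split in advance, so one line ends up hot while another sits idle.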

I have seen mention of this working for other folks, but I have never seen a clear, definitive answer - only “I solved it,” or “thanks to so-and-so I got it working right,” etc. So if someone actually does have the answer, how about posting it, or sending it to me via email - thom@airwhidbey.com

If you want suggestions on how to clean up the firewall chains and bandwidth queueing to make it more efficient then post some configs for review. Without seeing how you have it setup it’s a shot in the dark to try to help determine the cause of the problem.

"I have source Masq’d (past) and now src NAT’s about 50 stations behind the MT and have src NAT’d them to a 10 IP block range, i.e. xxx.xxx.xxx.10 - xxx.xxx.xxx.20. " ← do you have 1-1 NAT or something else? If you only have 10 IPs and 50 users fighting for 1-1, it might be part of the problem.

Sam

No, it’s not 1-1 NAT; it is strictly src-NAT with a range of 10 IPs to choose from. I have the same issue when they are masqueraded.

As I said earlier, this setup worked very well with two lines, but with the third line the ‘round-robin’ outbound gateway is causing loading issues, more on one line than the others. My outbound requests appear to be getting lost, and/or returning packets can’t ‘find’ the initial connection…

Config is simple. Internal: 192.168.0.0/16
Internet NIC: xxx.xxx.146.98/24 gw xxx.xxx.146.1
xxx.xxx.94.18/30 gw xxx.xxx.94.17
xxx.xxx.80.110/30 gw xxx.xxx.80.109
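For anyone following along, the addressing looks roughly like this in RouterOS (the interface names and the router’s LAN address are assumed here, not my actual ones; public octets masked as above):

```
# Assumed interface names; LAN address within 192.168.0.0/16 is illustrative
/ip address add address=192.168.0.1/16 interface=lan
/ip address add address=xxx.xxx.146.98/24 interface=wan
/ip address add address=xxx.xxx.94.18/30 interface=wan
/ip address add address=xxx.xxx.80.110/30 interface=wan
```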

NAT - I have several 1-1 NATs (both SRC and DST) for servers behind the MT - there appears to be no problem there…

SRC NAT only (or masqueraded): 192.168.2.XXX to xxx.xxx.80.20 - .30. This is where the problem is…
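In 2.9-style syntax that rule looks something like the following (in earlier 2.x releases the same thing lives under /ip firewall src-nat instead; the /24 on the source subnet is my assumption):

```
# src-NAT the workstation subnet to the public address pool
/ip firewall nat add chain=srcnat src-address=192.168.2.0/24 \
    action=src-nat to-addresses=xxx.xxx.80.20-xxx.xxx.80.30
```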

Firewall - simple set only a few banned IP blocks and ports - same ones I have had for two years…

I can post a config if the above is insufficient…

Thom

galaxynet, I think policy routing won’t help you much, as you must very carefully separate different kinds of traffic and send them through the three lines. I can’t imagine how anyone could estimate that traffic perfectly and utilize all three lines at their maximum without overloading some of them or underloading others.

Question: did you try using multiple default gateways (ECMP)? I have never tried this personally, but the time is slowly coming. If you do something like /ip route add gateway=first-gateway,second-gateway,third-gateway, MikroTik should send your traffic round-robin, balanced equally across these lines. I don’t know how good this is / would be for you; at least you can try it. And probably report back to us :slight_smile:
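With the gateways from your earlier post, that would be something like this (untested by me; note that as far as I know the Linux route cache balances per source/destination pair, not per packet):

```
# Three-gateway ECMP default route
/ip route add dst-address=0.0.0.0/0 \
    gateway=xxx.xxx.146.1,xxx.xxx.94.17,xxx.xxx.80.109
```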

thnx, mp3turbo.

If you want “packet perfect” load balancing, you might want to replace the 2600s with a bigger unit with three serial interface cards (assumed) and let the Cisco do the load balancing, leaving shaping/NAT to RouterOS.
RouterOS is based on Linux; AFAIK fast-forwarding and forwarding are the same on Linux, which does not give you packet granularity when balancing via routing. Also remember that your latency will never be better than a single T-1’s, while a true 4.5 Mbit connection has the latency of a 4.5 Mbit connection. That will probably make it extremely difficult to reach efficiency above 75% (especially on a fast-forwarded link!).

Thanks guys - I see now that I ‘can’t get there from here’, so I will have to work with the telco provider to change how I route data in/out. This will be painful…

I understand about the latency - at the rate we’re growing here, we’ll be up to a T-3 in 6 months, which would probably resolve the separate-lines issue anyway - but we have to get to that point first!