I have a MikroTik that serves a wireless user community with static private addresses on the LAN side. I use PCQ queue trees and mangle to separate and throttle traffic into system (unlimited), standard (2MB/0.5MB) and premium (3MB/0.5MB) classes depending on each user’s contract, like so:
One of my users is hearing-impaired and uses a Sorenson videophone. One of these phones calls another by contacting specific ports at the recipient’s public IP address. Since my users don’t have public IP addresses, and since this person is the only subscriber with such a requirement, I have accommodated the device by arranging dst-nat rules such that any contact that comes into my router’s public address on any of those uncommon ports will be forwarded to her private address, like so:
This all works fine, too, as far as the device being able to run properly.
What doesn’t work fine is that this forwarding is apparently bypassing the traffic limitations in the queue. This device is running at full system speed.
What rules do I have to add to get this traffic back into the queueing mechanism?
Destination NAT happens before the forward chain, at the end of prerouting. You mark packets in the forward chain, so by the time you mark them, each packet already has the private IP address you’re basing your marks on. For return traffic, destination NAT is undone after the forward chain, at the end of postrouting, so the packet still has the private IP address when you do the marking.
Now, that doesn’t rule out other issues with your queuing strategy (you’re only showing download, and we don’t know the content of your address lists or that customer’s IP address), but it does rule out NAT as the cause of things not working.
That’s bad news. I spent some time last week figuring out why upload limitation was failing on this particular router (the upload queue tree was accidentally attached to the wrong interface), and by the time I left, all my experimentation showed the three classes of address were having their bandwidth limited just right. But yesterday at this customer’s place it was obvious that way too much bandwidth was available in both directions. Right now, the address list on this router has only one premium user in it, and it isn’t this customer. The only thing special about this customer was the forwarding rules, so I hoped I had found the problem.
The way I see it, you have a couple of choices. One is to modify the rule above to remove connection-state=new; this will potentially increase the CPU load, since the router now has to check every packet against that rule instead of just new connections. Another is to add a second mark-connection rule that uses dst-address-list=prem-subscribers in place of src-address-list=prem-subscribers.
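A minimal sketch of the second option, assuming the connection mark is named CM-prem and the address list prem-subscribers, as they are elsewhere in this thread. The comment text is only illustrative. Position matters: the rule has to sit above any catch-all connection-mark rule, otherwise the connection has already been claimed by the time this rule is evaluated.

```
/ip firewall mangle
# Mark new connections whose destination is a premium subscriber,
# i.e. connections opened from outside toward the subscriber.
add chain=forward action=mark-connection new-connection-mark=CM-prem \
    passthrough=yes connection-state=new \
    dst-address-list=prem-subscribers \
    comment="Premium connection mark (inbound)"
```

After adding it, the rule can be dragged into place in Winbox or repositioned with the move command from the CLI.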
This is a very good proposal. However, upon reviewing all my mangle rules (I didn’t provide them all initially) it seems to me that this angle may already happen to be covered:
0 ;;; System connection
chain=forward action=mark-connection new-connection-mark=CM-admin passthrough=yes
connection-state=new src-address=192.168.64.0/26
1 ;;; System packet
chain=forward action=mark-packet new-packet-mark=PM-admin passthrough=no
connection-mark=CM-admin
2 ;;; Premium connection mark
chain=forward action=mark-connection new-connection-mark=CM-prem passthrough=yes
connection-state=new src-address-list=prem-subscribers
3 ;;; Premium packet mark
chain=forward action=mark-packet new-packet-mark=PM-prem passthrough=no
connection-mark=CM-prem
4 ;;; Basic connection mark
chain=forward action=mark-connection new-connection-mark=CM-std passthrough=yes
connection-state=new
5 ;;; Basic packet mark
chain=forward action=mark-packet new-packet-mark=PM-std passthrough=no
connection-mark=CM-std
The rules are cascaded so that anything that isn’t a special connection defaults to a basic connection. So this user’s service should be subject to the basic connection bandwidth rules regardless of whether the connection originates internally or externally, shouldn’t it? Or is there something I am misreading?
That last rule should cover it, so I’m not sure why you’re seeing that behavior. Try moving it higher in the list to see if it starts to work, or group your 3 connection-mark rules together to rule out one of the packet-mark rules getting in the way. Is that the extent of the mangle rules? If you have more, especially mark-connection rules, there is a chance that the connection mark is getting overwritten.
One thing you can do to determine whether the customer’s connections are being marked properly is to look in the connection tracking table for entries with the customer’s address and see which connection mark, if any, they carry.
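From the CLI, that check might look like the sketch below. The address 192.168.65.20 is a made-up stand-in for the customer’s private IP, and the regex match on the address field assumes a RouterOS version whose print command accepts where filters with the ~ operator.

```
# Connections opened by the customer (private address is the source).
/ip firewall connection print detail where src-address~"^192.168.65.20:"

# Connections forwarded in by dst-nat: the original destination is the
# router's public address, so the private address appears on the reply side.
/ip firewall connection print detail where reply-src-address~"^192.168.65.20:"
```

The detail output includes a connection-mark field for each entry; an entry with no mark, or with an unexpected one, points at the mangle rules rather than the queues. The trailing colon in the pattern assumes address:port entries, so portless entries such as ICMP will not match.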
Are you just routing, or are you running other services on the router like Hotspot?
I’m just routing, no hotspots. That’s the entirety of my mangle rules.
I took your advice and enabled tracking on that address, and it looks like I will have to wait until someone phones into that customer because there’s nothing showing right now. (I’m really not that familiar with Winbox, since I’m a Mac user and usually just use the router’s CLI.)
I tried manually telnetting into that port from a laptop in my central site, and the connection showed up with the appropriate connection mark on it.
I just realized I’ve been barking up entirely the wrong tree. The speed problem has nothing to do with this special device, because it wasn’t measured on that device – I measured it with my own laptop, from the customer site, using a commercial speed test website (speakeasy.net). So there is something very basic wrong here.
Here are all the queue types and trees (except for the default Mikrotik ones) I have set up. These ought to work, right? This user isn’t just getting premium service, she’s getting system level (unlimited) service, and yet her address is nowhere in the reserved system range.
Set values for max-limit and limit-at. They are required for priorities, which you also appear to be using, and at least one must be set for PCQ to work. Otherwise, how does the router know how to shape people down when total bandwidth is being exceeded? If you don’t have any real values to set, set all queues to your maximum WAN bandwidth for max-limit, and half of it for limit-at.
Thanks, I did this. That takes care of the fairly-shared degraded bandwidth issue, which I’m certain I was having, but is nowhere near as visible and so wasn’t at the forefront of my list of issues today. Limit-at and max-limit don’t seem to aid me in throttling any single connection below an artificial ceiling.
Max-Limit is the maximum that queue can consume at any given time. In total, this should be approximately 90% of your total available bandwidth. You can try playing around with other values, but if you set it too high the QoS scheme never kicks in and doesn’t work. This is because once an internet link reaches 85-90% of capacity, things start to break down on that link because it is saturated.
Limit-at is supposed to be the guaranteed rate you want to offer that queue when a link is saturated. This isn’t strictly needed, but it does help to define it.
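To relate these two settings to the per-user ceilings asked about earlier: in a PCQ setup the cap on each individual subscriber comes from pcq-rate on the queue type, while max-limit and limit-at bound the class as a whole. The sketch below assumes a 10M downlink and made-up object names (pcq-down-std, std-down); substitute your own.

```
/queue type
# Per-subscriber ceiling for the standard class (download side).
set [find name="pcq-down-std"] pcq-rate=2M

/queue tree
# Aggregate bounds for the whole class: about 90% of the assumed 10M link
# as the ceiling, and roughly half of that as the guaranteed rate.
set [find name="std-down"] max-limit=9M limit-at=4500k
```

If pcq-rate is left at 0, PCQ only divides the available rate evenly among active users and imposes no fixed per-user ceiling.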
I believe we have determined the reason for this failure.
In this neighborhood, because of terrain problems that block our central AP from many users, we installed a fast and dirty relay tower currently composed of one Engenius bridge and another Engenius AP, run through a dumb switch. In order to evade the “L2 bridging” problem, these units apparently perform a trick whereby they replace the customer’s IP with their own when speaking to the central AP. While this makes the traffic magically “work,” this would entirely defeat bandwidth queuing limitations since the traffic now looks like it is coming from an unlimited administrative radio. This hypothesis is further supported by noticing that all the customers where bandwidth limitation has been failing are customers using this relay AP instead of the central AP.
The solution involves completely reorganizing our IP layout, at which point we can replace our dumb relays with MikroTiks in Stationboxes and do things properly. Thanks to everyone who helped.
Before you run off and re-ip your whole system, I think I might see part of the problem. You are applying PCQ to the interface queue on the same router that is performing NAT. The interface queue occurs after NAT has been performed, so the PCQ filter sees all upload traffic as coming from a single IP (your public NAT IP), and therefore can’t restrict it properly.
The fix for this is to apply your queue trees to Global-out. The problem with this is that both directions pass through global-out, so you need to differentiate between upload and download data on your own. You will need to double up your packet marking rules and create one to mark the upload packets in a connection, and another to mark the download packets.
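As a rough sketch of what that doubling looks like for the standard class only: the interface names wan and lan, the -up/-down mark suffixes, and the queue type and tree names are all assumptions, not taken from the original configuration. The global-out parent is the one named in this thread; other RouterOS versions may expose different global parents.

```
/ip firewall mangle
# One connection mark per class, as before.
add chain=forward action=mark-connection new-connection-mark=CM-std \
    passthrough=yes connection-state=new
# Two packet marks per class, split by the direction of travel.
add chain=forward action=mark-packet new-packet-mark=PM-std-up \
    passthrough=no connection-mark=CM-std in-interface=lan
add chain=forward action=mark-packet new-packet-mark=PM-std-down \
    passthrough=no connection-mark=CM-std in-interface=wan

/queue type
# Upload is classified per source address, download per destination address.
add name=pcq-up-std kind=pcq pcq-rate=512k pcq-classifier=src-address
add name=pcq-down-std kind=pcq pcq-rate=2M pcq-classifier=dst-address

/queue tree
add name=std-up parent=global-out packet-mark=PM-std-up queue=pcq-up-std
add name=std-down parent=global-out packet-mark=PM-std-down queue=pcq-down-std
```

The premium and system classes would follow the same pattern, with their own connection marks placed above the catch-all standard rule.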
This should also take care of your issue with DNAT traffic not being picked up properly.
Let us know if that clears up some of your issues and if there is anything that still doesn’t work quite right after making the changes.
An intriguing speculation, but I don’t see how that jibes with the observation that the queueing limits are working on subscribers who aren’t using the relay boxes, and failing only on those who are. But I’ll certainly keep it in mind if the other work fails to produce the correct results.
The whole system has to be re-IPed for entirely separate reasons, so no great loss there.