and here’s how the queue normally works:
sometimes (I think some computers are flooding it with a large number of packets) the number of queued packets increases dramatically:
during this time, every user that passes through the queue sees high RTT and packet loss, even with very low traffic
is this a known problem in 5.4? were there any fixes regarding PCQ in later versions?..
I’ll preface this by saying I’m far from a PCQ expert, but my initial thought would be your queue is WAY too large. During your trouble times, you have approximately 15k or 18k packets queued up, but you specifically allow for up to 64k packets in the queue before discarding. Technically your queue appears to be working as programmed.
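If you want to double-check what the queue type is actually configured with, the current pcq-limit and pcq-total-limit values can be listed with something along these lines (just an illustration; the where filter is optional):

/queue type print where kind=pcq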
Intuitively, we would think that a bigger buffer is always better. Packet loss is the enemy, so the more data we can queue up, instead of dropping, the better, right? Unfortunately TCP doesn’t work that way.
Large queue lengths have a very adverse effect on latency and overall throughput. TCP has built-in congestion-avoidance mechanisms which are designed with the understanding that all routers along the way simply drop packets when overloaded. If you buffer everything instead of dropping, TCP keeps sending data and never backs off, which causes even more congestion. In your screenshots you had as much as 2MB worth of data queued up. I’m not sure what your link speed is, but 2MB in queue for a router seems like a lot.
If you figure you have 2MB worth of packets queued up and a 10Mbit/s line, then it would take over 1600ms to clear the queue. That means the last packet in that queue has a 1600ms delay added to it before you even send it. Now imagine every router along the way doing the same thing: each one adds another 10-20ms of delay due to excessive queue length, and pretty soon the delay is enough to cause the TCP packets to time out.
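Spelled out, the arithmetic behind that 1600 ms figure (using the same 2 MB backlog and 10 Mbit/s link as above) is:

\[ t \;=\; \frac{2\,\text{MB} \times 8\ \text{bit/byte}}{10\,\text{Mbit/s}} \;=\; \frac{16\,\text{Mbit}}{10\,\text{Mbit/s}} \;=\; 1.6\,\text{s} \;\approx\; 1600\,\text{ms} \]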
This problem is collectively known as Buffer Bloat. Jim Gettys has done a very nice job of documenting its effects on his blog: http://gettys.wordpress.com/bufferbloat-faq/
In the systems I have worked with, I typically pick a pcq-limit of around 20-75 (20 works fine for smaller offices / homes, 75 for a lot of “bursty” traffic), and then a pcq-total-limit of pcq-limit × max users × 80% (the queue can hold 80% of the maximum expected users, all at the full pcq-limit). I usually try to keep pcq-total-limit under 10000 to avoid the bufferbloat problem. Janis talks a bit about this in his QoS presentation (pages 26 & 27): http://mum.mikrotik.com/presentations/US08/janism.pdf
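Just as a sketch of that rule of thumb (the queue name, classifier and the assumed ~250 concurrent users are made-up example values, not a recommendation for your network):

/queue type
# pcq-total-limit = pcq-limit x max users x 0.8 = 50 x 250 x 0.8 = 10000
add kind=pcq name=pcq-down-example pcq-classifier=dst-address pcq-rate=0 pcq-limit=50 pcq-total-limit=10000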
As I said before, I’m not incredibly familiar with the finer workings of PCQ, nor do I have much carrier-level QoS experience, but this would be my best educated guess. Turn down your PCQ-limit and PCQ-total-limit and the latency and timeout issues should clear up. Best of luck.
Thanks for the nice post, I’ll have to study Buffer Bloat more deeply a bit later…
as for my problem, PCQ is designed to divide bandwidth equally (according to its description), so it should work like many PFIFO queues of ‘limit’ size. During trouble times, I see that sub-queues with almost no traffic are… kind of ‘congested’?.. while other PCQ queues work normally. Also, the number of queued packets increased from 15000 to 18000 in real time, in about 15-20 seconds. I have only two screenshots, but in reality it just kept increasing and increasing…
to me, it looks like a bug in the PCQ implementation, because I don’t believe that all users started flooding packets at the same time. btw, I have Max-Limit set to 40M for that queue, so even 2MiB of queued data should not be a problem
I have just made a test with ROS v5.6:
/queue type
add kind=pcq name=pcq-upload pcq-classifier=src-address pcq-rate=128k pcq-limit=50 pcq-total-limit=200
/queue tree
add name=queue1 packet-mark=flood-pkt parent=global-out queue=pcq-upload
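# with these settings each source address gets a sub-queue of at most pcq-limit=50 packets,
# and the whole PCQ holds at most pcq-total-limit=200 packets across all sub-queues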
then I generated 10000 packets and sent them to that queue:
Is it possible to provide the server spec and the approximate bandwidth / packet rate?
Actually I’m afraid of x86 network adapter bottlenecks, IRQ balancing, and so on…
Maybe I can use your experience to choose an x86 server for QoS.
it’s a SuperMicro with 2x1G and 2x10G Intel NICs (we use one 10G port with VLANs) and a Xeon E5-2609 v2 @ 2.50GHz CPU
in the evenings it passes about 2000 Mbps / 180 kpps with ~55-60% CPU load (queues, NAT, IPoE)