Too many bugs

I’ve been testing RouterOS for about a month and there are just too many bugs.

  1. Ping

Sending ICMP packets from the Mikrotik to a host on the same network always reports packet loss, but pinging from the host reports none.
Same result on Mikrotik 4.10, 4.11 and 5.0rc1. The ICMP rate limit sysctl was set to 0.

From Mikrotik to a server

ping count=100 interval=10ms xxx.xxx.xxx.xxx
[omitted]
100 packets transmitted, 88 packets received, 12% packet loss
round-trip min/avg/max = 0/1.0/8 ms

From the server to Mikrotik

ping -q -A -c1000 xxx.xxx.xxx.xxx

PING xxx.xxx.xxx.xxx (xxx.xxx.xxx.xxx) 56(84) bytes of data.

--- xxx.xxx.xxx.xxx ping statistics ---
1000 packets transmitted, 1000 received, 0% packet loss, time 1007ms
rtt min/avg/max/mdev = 0.285/0.627/6.103/0.258 ms, pipe 2, ipg/ewma 1.008/0.687 ms


2. Incorrect millisecond conversion
Creating a mangle rule in Winbox with a “Dst. Limit” criterion and expire 5000ms shows up as 50s, not 5s, when exported from the CLI.
Same result on Mikrotik 4.10, 4.11, 5.0rc1.
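For reference, the arithmetic behind this report as a minimal Python sketch. The function name is made up for illustration and is not part of RouterOS or Winbox; the 50 is simply the value seen in the export.

```python
def ms_to_seconds(ms: int) -> float:
    """Convert a millisecond value to seconds."""
    return ms / 1000


expected = ms_to_seconds(5000)  # what a 5000 ms expire should mean, in seconds
exported = 50                   # seconds shown in the CLI export, per the report above

# Ratio between what was exported and what was entered.
print(expected, exported / expected)
```

So the exported value is off by a factor of ten, which looks more like a unit mix-up somewhere than a random error.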


3. PPPoE - incorrect radius accounting counters
When a PPPoE user connects and disconnects frequently, the previous session’s Acct-Input-Octets/Acct-Output-Octets values are often not reset and get carried over to the next session. Raw accounting packets can be provided if needed. Tested on 4.10 and 4.11.


4. Mangle chain
Mangle rules do not work at all under version 5.0rc1.


Is anyone else seeing any of the above issues?

Lastly, is there a RAW table in RouterOS? I could not find it in the wiki or in any of the documentation.
Not all IPs require conntrack, and under some circumstances, such as a DDoS attack, setting NOTRACK for certain IPs can ease CPU utilisation.

Don’t you think you’re being a bit harsh ?

Finding 4 bugs (there are certainly more) in something as complex as ROS is quite a small number really.

  1. Ping is a very blunt tool, and if it doesn’t work perfectly, i personally would not put it at the top of my ‘to fix’ list. Having said that, ICMP echo/reply is very easy to code, so i can’t imagine what’s wrong there.

  2. Incorrect millisecond conversion - as you know about this, it’s easy to workaround, and would also not be a Top Priority to fix.

  3. PPPoE - incorrect radius accounting counters - that is worth reporting as it’s a more important thing to fix. No point having Accounts if the data is wrong.

  4. Mangle chain … 5.0rc1 - Release Candidate - now you’ve reported it, hopefully it will get fixed.

Did you report all of these findings before posting to the forum ?

I was more frustrated with the ping issue than with the radius accounting issue. How many network admins assume ping could be reporting an inaccurate result? Ping is the first tool that I use for any network issue. Except in some firewalled environments, it’s still a very useful diagnostic tool, and packet loss usually indicates an issue such as congestion, a duplex mismatch etc. If such a basic tool goes wrong, even a basic network setup can take substantial time.

Incorrect conversion - I was not sure whether Winbox or CLI was wrong because netfilter hashlimit module accepts millisecond value, not second.

I’ve just reported millisecond conversion and radius accounting issue.

Any idea for RAW table?

Regarding the ping issue, I noticed you used a 10ms interval, which in the case of RouterOS means any replies that take longer than 10ms will be counted as loss (and will also not show up in the max column, which I notice was 8, so it’s possible that 12% of the packets had a longer than 10ms response, rather than actually being lost).

Try 20ms or 30ms and see what happens. You can look at both the loss and the maximum response time, then go from there to figure out whether the host just isn’t always responding promptly or whether it’s RouterOS. Some devices do not consider responding to ICMP echoes to be a top priority.
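To illustrate the effect, here is a rough Python sketch of the counting rule as I understand it. This is not RouterOS code; the RTT values are invented, and treating a reply that arrives exactly at the interval as received is my own assumption.

```python
def count_ping_results(rtts_ms, interval_ms):
    """Count replies the way described above: a reply slower than the
    send interval is treated as lost, even though it did arrive."""
    received = [r for r in rtts_ms if r <= interval_ms]
    lost = len(rtts_ms) - len(received)
    max_seen = max(received) if received else None
    return len(received), lost, max_seen


# Hypothetical RTTs in ms. Every reply arrives; two are just slow.
rtts = [1, 1, 2, 8, 12, 1, 15, 1, 1, 3]

print(count_ping_results(rtts, 10))  # slow replies reported as loss, max stays at 8
print(count_ping_results(rtts, 20))  # same replies, no loss, max now shows 15
```

With the 10ms interval the two slow replies vanish from both the received count and the max column, which would match a max of 8 alongside 12% loss.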

I haven’t noticed issues 2, 3, and 4. I would consider 2 minor; 3 is major and I will have to investigate that one to see if we’re seeing it; 4 would be major for those relying on it, but we don’t use mangle much here.

We’re also not having conntrack related issues.

Yes, it seems ICMP replies with an RTT longer than the interval are discarded. 20ms shows no packet loss. Thanks!! :slight_smile:

Here are the details for the radius issue. Basically, the previous session’s in/out octet counters are carried over to the next session, so each time this user gets disconnected for some reason, the recorded usage doubles, plus the new usage. This continues until a manual disconnection from the CLI. Has anyone had a similar experience?


Wed Sep 15 10:24:57 2010

Acct-Session-Time = 23
Acct-Input-Octets = 859413130
Acct-Input-Gigawords = 0
Acct-Input-Packets = 2789375
Acct-Output-Octets = 1640199704
Acct-Output-Gigawords = 0
Acct-Output-Packets = 3421348
Acct-Status-Type = Stop
Acct-Terminate-Cause = User-Request
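As a sanity check on the Stop record above, a small Python sketch. The only thing taken as given is the RFC 2869 definition that the Gigawords attribute counts how many times the 32-bit octet counter has wrapped; the function name is just for illustration.

```python
def total_octets(octets: int, gigawords: int) -> int:
    """Combine the 32-bit octet counter with its wrap count (RFC 2869)."""
    return gigawords * 2**32 + octets


session_time = 23  # Acct-Session-Time in seconds
in_bytes = total_octets(859413130, 0)
out_bytes = total_octets(1640199704, 0)

# Average rate the session would have needed if the counters were genuine.
in_mbit = in_bytes * 8 / session_time / 1e6
out_mbit = out_bytes * 8 / session_time / 1e6
print(round(in_mbit), "Mbit/s in,", round(out_mbit), "Mbit/s out")
```

That works out to roughly 300 Mbit/s in and 570 Mbit/s out sustained for 23 seconds, which is not believable for this user, so the counters cannot belong to this session alone.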



+---------------------+---------------------+-----------------+-----------------+------------------+
| AcctStartTime       | AcctStopTime        | AcctSessionTime | AcctInputOctets | AcctOutputOctets |
+---------------------+---------------------+-----------------+-----------------+------------------+
| 2010-09-10 15:05:01 | 2010-09-10 16:50:21 |            6321 |       108820555 |         36598477 |
| 2010-09-10 16:50:21 | 2010-09-10 16:51:05 |              44 |       108837673 |         36725816 |
| 2010-09-10 16:51:05 | 2010-09-10 17:32:39 |            2493 |       125510165 |        102159293 |
| 2010-09-10 17:32:39 | 2010-09-10 17:33:55 |              76 |       126694585 |        109152007 |
| 2010-09-10 17:33:55 | 2010-09-10 18:54:29 |            4834 |       189536885 |        180852063 |
| 2010-09-10 18:54:29 | 2010-09-10 19:06:12 |             703 |       197507554 |        183972443 |
| 2010-09-10 19:06:12 | 2010-09-10 20:24:37 |            4705 |       247332500 |        221474604 |
| 2010-09-10 20:24:37 | 2010-09-10 20:33:55 |             558 |       250784020 |        224538605 |
| 2010-09-10 20:33:55 | 2010-09-10 22:22:19 |            6504 |       306139799 |        300129415 |
| 2010-09-10 22:22:19 | 2010-09-10 22:23:29 |              70 |       306140608 |        300134055 |
| 2010-09-10 22:23:29 | 2010-09-10 22:30:53 |             444 |       306705188 |        300884914 |
| 2010-09-10 22:30:53 | 2010-09-10 22:42:02 |             669 |       310963501 |        301676859 |
| 2010-09-10 22:42:02 | 2010-09-11 10:19:23 |           41841 |       477801410 |        573071565 |
| 2010-09-11 10:19:23 | 2010-09-11 10:20:18 |              55 |       477801845 |        573072090 |
| 2010-09-11 10:20:18 | 2010-09-11 10:20:50 |              32 |       477802280 |        573072615 |
| 2010-09-11 10:20:50 | 2010-09-11 10:22:28 |              98 |       477802976 |        573073611 |
| 2010-09-11 10:22:28 | 2010-09-11 14:28:43 |           14775 |       487630032 |        648852312 |
| 2010-09-11 14:28:43 | 2010-09-11 14:31:06 |             143 |       487634569 |        648856222 |
| 2010-09-11 14:31:06 | 2010-09-11 17:30:01 |           10735 |       490406922 |        663470864 |
| 2010-09-11 17:30:01 | 2010-09-11 21:26:03 |           14161 |       494535348 |        682785057 |
| 2010-09-11 21:26:03 | 2010-09-11 21:31:35 |             333 |       494556012 |        682805647 |
| 2010-09-11 21:31:35 | 2010-09-12 00:17:46 |            9970 |       502058766 |        722623341 |
| 2010-09-12 00:17:46 | 2010-09-12 00:19:03 |              77 |       502059537 |        722624947 |
| 2010-09-12 00:19:03 | 2010-09-12 01:18:51 |            3588 |       536482253 |        747873766 |
| 2010-09-12 01:18:51 | 2010-09-12 01:54:54 |            2163 |       536524557 |        747953383 |
| 2010-09-12 01:54:54 | 2010-09-12 12:46:49 |           39115 |       542087084 |        797432874 |
| 2010-09-12 12:46:49 | 2010-09-12 12:48:07 |              78 |       542098061 |        797441035 |
| 2010-09-12 12:48:07 | 2010-09-12 17:34:02 |           17154 |       549801682 |        828062355 |
| 2010-09-12 17:34:02 | 2010-09-12 17:37:14 |             192 |       549810427 |        828071599 |
| 2010-09-12 17:37:14 | 2010-09-12 17:43:29 |             375 |       549992295 |        828554779 |
| 2010-09-12 17:43:29 | 2010-09-12 17:49:14 |             345 |       550104309 |        828828895 |
| 2010-09-12 17:49:14 | 2010-09-12 18:02:32 |             798 |       551090445 |        836915385 |
| 2010-09-12 18:02:32 | 2010-09-12 18:03:49 |              77 |       551182988 |        837215802 |
| 2010-09-12 18:03:49 | 2010-09-12 18:06:41 |             172 |       551688412 |        840473887 |
| 2010-09-12 18:06:41 | 2010-09-12 23:37:18 |           19837 |       728830414 |       1167514439 |
| 2010-09-12 23:37:18 | 2010-09-13 00:13:29 |            2171 |       728853334 |       1167558326 |
| 2010-09-13 00:13:29 | 2010-09-13 12:24:24 |           43855 |       769556713 |       1229949070 |
| 2010-09-13 12:24:24 | 2010-09-13 12:25:30 |              66 |       769557148 |       1229949659 |
| 2010-09-13 12:25:30 | 2010-09-13 16:01:03 |           12932 |       770234465 |       1231713817 |
| 2010-09-13 16:01:03 | 2010-09-13 20:23:43 |           15760 |       783928275 |       1419777334 |
| 2010-09-13 20:23:43 | 2010-09-13 20:27:25 |             222 |       784310166 |       1421331934 |
| 2010-09-13 20:27:25 | 2010-09-13 22:09:48 |            6142 |       790906601 |       1444854769 |
| 2010-09-13 22:09:48 | 2010-09-13 22:16:14 |             386 |       790971761 |       1445142001 |
| 2010-09-13 22:16:14 | 2010-09-13 22:16:39 |              25 |       790972138 |       1445142194 |
| 2010-09-13 22:16:39 | 2010-09-13 22:17:45 |              66 |       791008433 |       1445342071 |
| 2010-09-13 22:17:45 | 2010-09-13 22:22:09 |             264 |       791019538 |       1445354026 |
| 2010-09-13 22:22:09 | 2010-09-13 22:26:25 |             256 |       791028906 |       1445366541 |
| 2010-09-13 22:26:25 | 2010-09-14 11:40:19 |           47634 |       824059762 |       1502636857 |
| 2010-09-14 11:40:19 | 2010-09-14 11:47:19 |             419 |       824896953 |       1503296569 |
| 2010-09-14 11:47:19 | 2010-09-14 17:29:17 |           20518 |       834409926 |       1520520969 |
| 2010-09-14 17:29:17 | 2010-09-14 21:36:57 |           14860 |       855808304 |       1612130964 |
| 2010-09-14 21:36:58 | 2010-09-14 21:37:26 |              29 |       855808679 |       1612133447 |
| 2010-09-14 21:37:27 | 2010-09-14 21:38:52 |              85 |       855815482 |       1612140240 |
| 2010-09-14 21:38:52 | 2010-09-14 22:43:38 |            3886 |       857977383 |       1629235466 |
| 2010-09-14 22:43:38 | 2010-09-14 22:59:21 |             943 |       857985648 |       1629250201 |
| 2010-09-14 22:59:21 | 2010-09-15 10:23:22 |           41040 |       859412051 |       1640197485 |
| 2010-09-15 10:23:22 | 2010-09-15 10:24:34 |              72 |       859412822 |       1640199436 |
| 2010-09-15 10:24:34 | 2010-09-15 10:24:57 |              23 |       859413130 |       1640199704 |
| 2010-09-15 10:24:57 | 2010-09-15 10:27:49 |             172 |       859415346 |       1640203132 |
| 2010-09-15 10:27:49 | 2010-09-15 10:29:10 |              81 |       859555402 |       1640696219 |
| 2010-09-15 10:29:10 | 2010-09-15 12:14:31 |            6320 |       901537705 |       1679197095 |
| 2010-09-15 12:14:31 | 2010-09-15 13:39:54 |            5124 |       901592442 |       1679289763 |
+---------------------+---------------------+-----------------+-----------------+------------------+
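A rough Python sketch of how the real per-session usage could be recovered from a table like this. It assumes that whenever both counters are at or above the previous session’s final values they were carried over, which is a heuristic of mine and could misjudge a genuinely heavy session; the first row is taken at face value even though it may itself be carried over.

```python
# (AcctSessionTime, AcctInputOctets, AcctOutputOctets) for the first four rows above
rows = [
    (6321, 108820555, 36598477),
    (44, 108837673, 36725816),
    (2493, 125510165, 102159293),
    (76, 126694585, 109152007),
]


def per_session_usage(rows):
    """Return (seconds, in_bytes, out_bytes, carried) per session.

    If both counters are >= the previous session's final counters,
    assume they were carried over and report only the difference.
    """
    usage = []
    prev_in = prev_out = 0
    for secs, acct_in, acct_out in rows:
        carried = prev_in > 0 and acct_in >= prev_in and acct_out >= prev_out
        if carried:
            usage.append((secs, acct_in - prev_in, acct_out - prev_out, True))
        else:
            usage.append((secs, acct_in, acct_out, False))
        prev_in, prev_out = acct_in, acct_out
    return usage


for entry in per_session_usage(rows):
    print(entry)
```

On these rows the 44-second session comes out at about 17 KB in and 127 KB out, instead of the 108 MB and 36 MB that were recorded for it.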

DOH ! Sorry to miss such an obvious thing like the RTT = 10ms.

The Radius accounting records might not be any problem at all, depending on how the server side maths are done.

The accounting is done based on the ‘username’ in the database, and also on the difference between the last Accounting record and the one that has just arrived, unless an Accounting Stop and then an Accounting Start request are made.

So, if the session stops at say 1,000,000 bytes, the client disconnects, reconnects, and the byte counter is still at 1,000,000 bytes for the next accounting record, then difference will be Zero - no extra bytes registered.

If this happened 50 times, the difference would still be zero ‘new’ bytes.

The next proper accounting record might be 1,100,000 which would mean 100,000 ‘new’ bytes to account for.
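In Python terms, the scheme I am describing would look something like the sketch below. I have not checked that Usermanager or any particular radius server works this way, and the handling of a counter that goes backwards is my own guess.

```python
def new_bytes(previous_total: int, current_total: int) -> int:
    """Bytes to add for this record under the difference scheme above."""
    if current_total < previous_total:
        # Counter went backwards: assume it was reset, so all of it is new.
        return current_total
    return current_total - previous_total


# The example above: the counter sticks at 1,000,000 across reconnects,
# then the next proper record arrives at 1,100,000.
records = [1_000_000, 1_000_000, 1_000_000, 1_100_000]

billed = 0
last = 0
for total in records:
    billed += new_bytes(last, total)
    last = total

print(billed)
```

The repeated 1,000,000 records add nothing, so the total ends up at 1,100,000 however many times the stale counter is reported.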

I might be wrong, but it seems less of an issue if this is how something like Usermanager does it.

Now that I think about it, it would be more of a disaster if the Accounting Stop and Accounting Start were missed and the byte counters were zero - that would signify either a rollover of Acct-Gigawords + Acct-Bytes or an ‘info total garbage’ situation.

I just took a quick look at some accounting records for PPPoE and they seem to be starting over from 0 for each connection just fine, from a 5.0beta6 router.

Are these from PPPoE connections or some other type? Also, it appears you’re using freeradius. Double-check to make sure it’s not configured to accumulate instead of recording absolute values.

You could packet-capture, or enable radius logging to see what the router is actually sending.

Re-reading the previous, is this only happening sometimes (when PPPoE user disconnects/reconnects rapidly?)

Yes, PPPoE only. No, it’s not configured to accumulate. As posted earlier, below is a radius accounting packet sent by the mikrotik router, it clearly shows that the radius server is not the problem here.

Wed Sep 15 10:24:57 2010

Acct-Session-Time = 23
Acct-Input-Octets = 859413130
Acct-Input-Gigawords = 0
Acct-Input-Packets = 2789375
Acct-Output-Octets = 1640199704
Acct-Output-Gigawords = 0
Acct-Output-Packets = 3421348
Acct-Status-Type = Stop
Acct-Terminate-Cause = User-Request

It only happens to a few sessions per day, usually with a gap of less than 5 seconds between sessions. I’ve tried to replicate it, but without success. Changes to the Interim-Update interval and one-session-per-host made no difference.

It seems under some specific circumstances, pppoe sessions do not get disconnected properly and the counters get carried to the next session.

Well done to You for identifying a Serious problem.

It is Bad that the Problem is there.

It is Excellent that you have provided a lot of data, because now the programmers can find the problem and fix it.

Most of the time people just say ‘X does not work’.

What you have done is to provide the information needed to fix it.

Well done.

it’s Community forum. please write to support@mikrotik.com, to developers =)

we’ll be glad to see here the result, we can’t fix it by ourselves

It’s logged and got an auto reply from Mikrotik support on 1/10/10 but no further response yet =(

Finding the problem in the coding is hard.

Much harder when there is more than 1 programmer on it, cos everyone has a different style and way of thinking.

Give them some time, because each problem you found was in a different area of code.

I sure as shi’ite couldn’t find, then fix (100% sure) all of those in 1 month.

Doh ! then they gotta be tested before being released.

More than 1 month basically.

Once again, you’re brilliant in that you have Hard Data to help pinpoint where the code is failing.
Without that, it is almost impossible to find where the problem is.

Apparently they can’t either… Which leaves me with a choice of using the openvpn server, which clobbers itself, ipsec/l2tp, which has non-functional nat-t, or sstp, which leaks memory like no other, forcing me to reboot the router every few days or risk it running out of memory. An SSH server that crashes all the time; idle timeouts that don’t function right, as any traffic destined to go through the interface is counted as traffic even though the connection is actually down and not passing traffic; an RB1100 with several ports that won’t work on 4 different devices that I have tried; responses from support saying that it is fixed in 5.0rc1 when the supouts I send demonstrating the problem clearly identify themselves as 5.0rc1.

crossposting to other topics will not help at all. please write to support with link to this thread, short and accurate description of the problem, and supout.rif file

Apparently they cant either…

That’s not quite fair.

They haven’t yet fixed the bugs is more like it.

Maybe it’s a sad case of success - they’ve got so many different platforms and have added so many features that it is getting too hard to keep up.

OverMoaning about it on their forum got me permanently banned from Ubiquiti’s site, and then they fixed what i was moaning about (6 months later).

Your Frustration comes thru loud and clear - MT would be wise to take that seriously.

However, you’ll not get anywhere with any manufacturer by publicly pointing out bugs and weaknesses - better to contact them directly.

I think its been going on a year now w/ no fix.

Works for the big boys when you do full disclosure on bugs and exploits

I think its been going on a year now w/ no fix.

Ah. That’s a big Oooops then.

bugs and exploits

Bugs maybe. Exploits certainly.

I think its been going on a year now w/ no fix.

I don’t believe that you have an unresolved ticket with support about this for a year. If so, let me know the ticket number and I will check why nobody has solved it for you.

I have had no answer for 2 days on this one: #2010101366000154, which is the same as: #2010092966000323 (30/09/10).

You told me to explain it by mail, but I think it’s quick and easy to reproduce on any routerboard.