"No such item" when taking any action under /ip firewall connection in rOS7

Hello,

This is a general issue I've noticed in rOS7 regardless of the specific version, including the latest, 7.23. rOS6 sometimes has similar issues, but it's generally more robust.

We have multiple systems routing services, usually with ~200k+ connections.
Sometimes some of these, like SIP connections, stall, and I remove them instead of waiting for them to time out, using something like /ip/firewall/connection remove [find src-address=10.20.1.20 src-port=5060]. Every single time, this fails halfway through with a "no such item" error.
My "solution" is to run it a bunch of times, until it finally cleans up all the related connections.
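To see why the command gives up halfway, here is a toy Python model. It is not RouterOS code, and `ConnTable`, `NoSuchItem` and the one-second cost per removal are hypothetical assumptions. `find` returns a static snapshot of IDs, and entries keep timing out while the removal runs. The first missing ID aborts the whole batch, which is why re-running it eventually cleans everything up.

```python
# Toy model, not RouterOS code; all names and timings are hypothetical.

class NoSuchItem(Exception):
    """Stands in for the "no such item" error."""


class ConnTable:
    """Maps connection ID -> remaining timeout in seconds."""

    def __init__(self, entries):
        self.entries = dict(entries)

    def find(self, pred=lambda timeout: True):
        # Like [find]: a static snapshot of the IDs matching right now.
        return [i for i, t in self.entries.items() if pred(t)]

    def tick(self, seconds=1):
        # Time passes; entries whose timeout runs out disappear on their own.
        for i in list(self.entries):
            self.entries[i] -= seconds
            if self.entries[i] <= 0:
                del self.entries[i]

    def remove(self, i):
        if i not in self.entries:
            raise NoSuchItem(f"no such item ({i})")
        del self.entries[i]


def remove_batch(table, ids):
    # Like "remove [find ...]": one removal per ID, aborting on the first error.
    for i in ids:
        table.remove(i)
        table.tick()  # assume each removal takes one second


def run_until_clean(table, max_runs=10):
    # The "run it a bunch of times" workaround: retry the whole batch.
    for run in range(1, max_runs + 1):
        try:
            remove_batch(table, table.find())
            return run
        except NoSuchItem:
            pass
    return None


table = ConnTable({1: 100, 2: 1, 3: 100, 4: 100})
try:
    remove_batch(table, table.find())
except NoSuchItem as e:
    print(f"{e}; left behind: {sorted(table.entries)}")  # no such item (2); left behind: [3, 4]
```

Connection 2 expires while connection 1 is being removed. The batch stops at 2 and leaves 3 and 4 behind, even though both were still alive.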

Another issue: in rOS6, if the connection table was especially big and causing problems, I could run a loop:

local continue true
while ($continue) do={
  do {
    /ip/firewall/connection remove [find src-address=10.20.1.20 src-port=5060]
    set continue false
  } on-error={}
}

This fails in rOS7 with an "interrupted" message, along with the usual "no such item".

I appreciate that going through 200k connections may be intensive, but I can't flush the whole connection tracking table every time I need to fix 5 connections for a specific host.

Has anyone else noticed this and is there a fix?

This seems to be a robust approach:

@infabo Thanks...

This command doesn't work the way many people think…

Even with an explanation:

Adapted:

/ip fire conn
:foreach idc in=[find where timeout>60 and src-address=10.20.1.20 and src-port=5060] do={
 remove [find where .id=$idc]
}

200k? Increase the "60" seconds to whatever time the machine needs to get through them...

Hi rextended,

I know that [find] first builds a list of connections matching the filter, so if some of them expire while the command is running, it fails.
However, the issue here is that it fails while printing too:

[sin3vil@MASTER] > ip firewall/connection/print where reply-src-address=10.222.3.124     
Flags: E - expected; S - seen-reply; A - assured; C - confirmed; D - dying; F - fasttrack; H - hw-offload; s - srcnat; d - dstnat 
 #           PROTOCOL SRC-ADDRESS     SRC-PORT DST-ADDRESS     DST-PORT TCP-STATE   TIMEOUT       ORIG-RATE   REPL-RATE ORIG-PACKETS REPL-PACKETS      ORIG-BYTES      REPL-BYTES
217234  SAC      udp      192.168.189.132     5060 10.222.3.124        5060             59m44s             0bps        0bps          441          289         238 824         144 309
217235    C      udp      10.99.0.173        26210 10.222.3.124        5060             40m37s             0bps        0bps           22            0          18 194               0
217236    C      udp      10.99.0.173        47870 10.222.3.124        5060             20m9s              0bps        0bps           33            0          27 280               0
217237    C      udp      10.99.0.173        38882 10.222.3.124        5060             14m49s             0bps        0bps           33            0          27 269               0
217238    C      udp      10.99.0.173        36123 10.222.3.124        5060             30m44s             0bps        0bps           33            0          27 280               0
217239  SAC      udp      10.99.0.173        22168 10.222.3.124        5060             12m3s              0bps        0bps          872          504         413 687         268 579
217240  SAC      udp      10.99.1.243        20118 10.222.3.124        5060             59m45s             0bps        0bps          501           15          42 147           8 393
no such item (4)

There's also this weird issue. My tracking setup looks like this:

                   enabled: auto  
               active-ipv4: yes   
               active-ipv6: no    
      tcp-syn-sent-timeout: 5s    
  tcp-syn-received-timeout: 5s    
   tcp-established-timeout: 1d    
      tcp-fin-wait-timeout: 10s   
    tcp-close-wait-timeout: 10s   
      tcp-last-ack-timeout: 10s   
     tcp-time-wait-timeout: 10s   
         tcp-close-timeout: 10s   
   tcp-max-retrans-timeout: 5m    
       tcp-unacked-timeout: 5m    
        loose-tcp-tracking: yes   
      liberal-tcp-tracking: no    
               udp-timeout: 30s   
        udp-stream-timeout: 3m    
              icmp-timeout: 10s   
           generic-timeout: 10m   
               max-entries: 843776
             total-entries: 543931
         total-ip4-entries: 543931
         total-ip6-entries: 0

Yet I get some UDP connections with a 60m timeout, like the ones above.

BTW, why timeout>60? Theoretically, stalled connections will have a timeout value < 60.

If some connections are closed (removed) between the start and the end of the command, it errors out and stops, leaving some connections open.

Because those close automatically within 60 seconds anyway...
and I care about closing the ones that would otherwise linger...
So if the script finishes within a minute,
it doesn't throw errors, because none of the selected connections can time out while it runs; the rest close on their own...

tcp-established-timeout=1d... 1 day???
With over 16000 actual users (multiple datacenters, not all connecting from a single location...),
no one has had a single problem with the default changed to:

/ip firewall connection tracking set tcp-established-timeout=30m

I don't know of anything my customers need that leaves a TCP connection hanging for nothing, without passing even a single keepalive packet, for a whole day...

Thanks for pointing that out. This machine generally serves UDP connections, so I haven't actually checked TCP settings.

@rextended what about do { } on-error={} failing in this context? Any theories?

I don't use them; to me, relying on them is a programming failure.
I use them only when I'm really forced to (like with :resolve).

The proper method (and why it's needed), together with other methods, is discussed in this thread:

Yeah, the surely compatible version, as discussed in the thread,

foreach var in=$someArray do={ do { $something } on-error={ $somethingElse } }

does not work here, at least in rOS7. on-error doesn't seem capable of eating the error and continuing the loop.

What you wrote matches neither anserk's syntax nor mine.

As discussed there, only the first (my) version is officially documented.

(I would expect both to work.)

You said "discussed in this thread", so I assumed you meant all the posts.
This form is the one that yearly7100 says works in both rOS6 and rOS7 and isn't specific to for and foreach.

The for/foreach-specific error handling doesn't really apply here, as the find itself seems to fail:

[sin3vil@MASTER] > local items [/ip firewall connection find src-address=10.222.3.124 src-port=5060];put "items collected!";foreach item in=$items do={ /ip firewall connection remove $item } on-error={}

interrupted
no such item (4)

The loop itself doesn't handle it either:

[sin3vil@MASTER] > for i from=0 to=5 do={ /ip firewall connection remove [find src-address=10.222.3.124 src-port=5060] } on-error={}
interrupted
no such item (4)

And :onerror also seems to fail:

[sin3vil@MASTER] > for i from=0 to=5 do={ /ip firewall connection remove [find src-address=10.222.3.124 src-port=5060] } on-error={}    

interrupted
no such item (4)

I think you missed the whole point: find gives you a static result set of internal IDs. In the meantime, any of the connections in that result set can time out, meaning they vanish while you're looping over it. That's why where timeout>60 is in the sample. And if removing/iterating over your connections (200k+) takes longer than 60s, just increase the value so it covers the processing time.
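The `timeout>60` idea can be checked with a toy Python model. It is not RouterOS code, and the table layout, the one-second cost per removal and the `remove_filtered` helper are hypothetical. Suppose entries disappear only by timing out. Then selecting only entries whose timeout exceeds the batch's processing time means none of them can vanish mid-run. A threshold that is too low for the size of the selection still fails.

```python
# Toy model, not RouterOS code; all names and timings are hypothetical.

class NoSuchItem(Exception):
    pass


def remove_filtered(entries, threshold, seconds_per_remove=1):
    """Snapshot the IDs with timeout > threshold, then remove them one by one.

    entries maps connection ID -> remaining timeout in seconds. Time advances
    after every removal and expired entries vanish. The first missing ID
    aborts the batch. Returns the number of removed entries.
    """
    ids = [i for i, t in entries.items() if t > threshold]  # static snapshot
    removed = 0
    for i in ids:
        if i not in entries:
            raise NoSuchItem(f"no such item ({i})")
        del entries[i]
        removed += 1
        for j in list(entries):  # time passes while we work
            entries[j] -= seconds_per_remove
            if entries[j] <= 0:
                del entries[j]
    return removed


def make_table():
    # 200 connections with timeouts 200s, 199s, ..., 1s (ID 1 has the longest).
    return {i: 201 - i for i in range(1, 201)}
```

With `threshold=60`, 140 entries are selected, and processing them takes longer than some of them have left to live, so `remove_filtered(make_table(), 60)` raises `no such item (101)`. With `threshold=150`, the 50 selected entries are removed cleanly.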

thanks @infabo

The whole point is:

My script is the closest thing to the real solution (fixing RouterOS).

Either it works or it doesn't (and if not, the reason must be stated); nothing is perfect,
but any proposed solution must really be a valid alternative.

I explained the steps, because I always try to prevent errors instead of relying on the shitty on-error-resume-next way of programming,
which I'm forced to use only in some particular cases (the classic :resolve that doesn't find the domain[1]) when I'm not programming for myself.


  1. Still stops the script on 7.23

I'm sorry, but I genuinely don't get the difficulty.

> foreach i in={10000; 30000} do= { /ip firewall connection remove $i } 
no such item

> foreach i in={10000; 30000} do= { :onerror e in= { /ip firewall connection remove $i } do= { :put "Skipped $i" } }             
Skipped 10000
Skipped 30000

> foreach i in={10000; 30000} do= { /ip firewall connection remove $i } on-error= { :put "Skipped $i" }          
Skipped 10000
Skipped 30000

These are from 7.20.8. I also tested it on 7.22beta5. Same result.

One with find, just for you:

> foreach i in= [ /ip firewall connection find protocol=udp ] do= { /ip firewall connection remove $i } on-error= { :put "Skipped $i" }

In this case, this is what we're left with: if the connection no longer exists, the error has to be swallowed.

If you happen to remember, we've also suggested to simply have an "ignore non-existent" mode for this stuff. It could even be the default. But it seems as though that's not meant to be.

The problem with the timeout-based approach is that connections can end in ways other than a normal timeout. TCP connections may be closed by an RST or a FIN. Any connection may also be removed when an interface goes down and masquerade flushes it (at least in newer versions).

Even if it's not to your taste, currently catching the errors is the proper thing to do within the confines of the scripting system.
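Here is a toy Python model of that point. It is not RouterOS code, and `run_batch`, `kill_at`, `victim` and the timeouts are hypothetical. Every selected connection has plenty of timeout left, but one is torn down mid-batch by something else, such as an RST or a masquerade flush. The timeout-filtered batch still aborts. Catching the error per item, as the `on-error` / `:onerror` examples earlier in the thread do, skips that connection and removes the rest.

```python
# Toy model, not RouterOS code; all names and timings are hypothetical.

class NoSuchItem(Exception):
    pass


def remove(entries, i):
    if i not in entries:
        raise NoSuchItem(f"no such item ({i})")
    del entries[i]


def run_batch(entries, ids, kill_at, victim, swallow_errors):
    """Remove ids in order. Right after the removal at position kill_at, the
    connection `victim` disappears for a reason other than a timeout.
    Returns the IDs that were skipped."""
    skipped = []
    for pos, i in enumerate(ids):
        try:
            remove(entries, i)
        except NoSuchItem:
            if not swallow_errors:
                raise  # plain loop: the first error stops everything
            skipped.append(i)  # per-item error handling: note it, move on
        if pos == kill_at:
            entries.pop(victim, None)  # e.g. an RST arrives for this connection
    return skipped


conns = {1: 900, 2: 900, 3: 900, 4: 900}            # all far from timing out
snapshot = [i for i, t in conns.items() if t > 60]  # the timeout>60 filter
print(run_batch(conns, snapshot, kill_at=0, victim=3, swallow_errors=True))  # [3]
```

With `swallow_errors=False`, the same run raises on connection 3 and leaves connection 4 in place, even though the timeout filter selected it.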

A simple version without on-error-resume-next:

:foreach i in={10000; 30000} do={ /ip firewall connection remove [find where .id=$i] }

As you can see, there is no valid reason to use on-error-resume-next.

If find doesn't find $i, there's nothing to remove, so no error...


P.S.: About the examples... the same not-found IDs twice?
Skipped 10000
Skipped 30000
It's not realistic at all... In fact, you had already deleted them (from 10001 to 29999), so how could it find the same numbers again?...

However,
since I gain nothing from making people think differently,
and so as not to arouse any more antipathy than I already do,
I've said (and written) my piece, and I'll end it here.

But this depends on whether MikroTik's devs explicitly implemented a special optimization for [find where .id=$i]. They would need a special code path that recognizes that the condition is just .id=xxx and only does a hash lookup to confirm that the record exists.

But if they implemented [find] generically, with the same code path used for other parameter checks, then your code devolves into O(n²) (quadratic) time complexity, because [find] would iterate over all conntrack entries and compare the conditions. This sub-iteration is done for each iteration of the outer foreach.

If we assume that remove <single-id> is O(1) (hopefully the connection table is indexed/hashed by ID), then @lurker888's solution only takes O(n).
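A rough Python count makes the difference concrete. It does not describe RouterOS internals. It assumes, as the paragraphs above do, that a generic `find` compares every entry and that removal by ID is a single hash lookup. The function names are hypothetical.

```python
# Toy cost model, not RouterOS internals; all names are hypothetical.

def generic_find_by_id(entries, target, counter):
    # Generic [find where .id=...]: test the condition on every entry.
    matches = []
    for i in entries:
        counter["comparisons"] += 1
        if i == target:
            matches.append(i)
    return matches


def find_then_remove_all(entries):
    # foreach id in snapshot: remove [find where .id=$id]
    counter = {"comparisons": 0}
    for target in list(entries):
        for i in generic_find_by_id(entries, target, counter):
            del entries[i]
    return counter["comparisons"]


def remove_all_direct(entries):
    # foreach id in snapshot: remove $id, ignoring IDs that are already gone
    lookups = 0
    for target in list(entries):
        lookups += 1  # one hash lookup per removal
        entries.pop(target, None)
    return lookups


n = 1000
print(find_then_remove_all({i: 30 for i in range(n)}))  # prints 500500 (n*(n+1)/2)
print(remove_all_direct({i: 30 for i in range(n)}))     # prints 1000 (n)
```

The table shrinks as entries are removed, so the generic scan costs n + (n-1) + ... + 1 comparisons, which is still quadratic.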

The additional problem is that [ find .id=... ] returns either an empty list (which is ok) or the id itself. Then, when we try to delete it, either the connection is still present - or it has just been removed between this second find and the removal, in which case we get... an error.
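The same race fits in a few lines of Python. This is a toy model, not RouterOS code. The `between` hook is a hypothetical stand-in for the connection being torn down in the gap between the second `find` and the `remove`. Checking first only narrows the window. In this model, only the version that handles the missing-item error cannot fail.

```python
# Toy model, not RouterOS code; all names are hypothetical.

class NoSuchItem(Exception):
    pass


def remove(entries, i):
    if i not in entries:
        raise NoSuchItem(f"no such item ({i})")
    del entries[i]


def check_then_remove(entries, i, between=lambda: None):
    # Like: remove [find where .id=$i]
    if i in entries:        # the second find: is it still there?
        between()           # ...the connection may vanish right here...
        remove(entries, i)  # ...so this can still raise


def remove_or_skip(entries, i, between=lambda: None):
    # Like: remove $i wrapped in on-error / :onerror
    between()
    try:
        remove(entries, i)
        return True
    except NoSuchItem:
        return False
```

For example, with `e = {7: 30}`, calling `check_then_remove(e, 7, between=lambda: e.pop(7))` raises `NoSuchItem`. With a fresh `e`, `remove_or_skip` under the same interleaving just returns `False`.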