This is a general issue I've noticed in rOS7, regardless of specific version, including the latest 7.23. rOS6 sometimes has similar issues, but it's generally more robust.
We have multiple systems routing services that usually carry 200k+ connections.
Sometimes some of these, like SIP connections, stall and I remove them instead of waiting for them to time out, using something like /ip/firewall/connection remove [find src-address=10.20.1.20 src-port=5060]. Every single time, this fails partway through with a "No such item" error.
My "solution" is to run it a bunch of times, at which point it finally cleans up all related connections.
Another issue is that, in rOS6, if the connection table was especially big and causing issues, I could do a loop:
:local continue true
:while ($continue) do={
    :do {
        # retry until one full pass completes without an error
        /ip/firewall/connection remove [find src-address=10.20.1.20 src-port=5060]
        :set continue false
    } on-error={}
}
This fails in rOS7 with an "interrupted" message, along with the usual "no such item".
I appreciate the fact that going through 200k connections is possibly intensive, but I can't just be flushing the whole connection tracking table whenever I need to fix 5 connections for a specific host.
I know that [find] will first build a list of connections matching the find filter, then, if some expire while the command is running, it will fail.
However the issue here is that it fails while printing too :
[sin3vil@MASTER] > ip firewall/connection/print where reply-src-address=10.222.3.124
Flags: E - expected; S - seen-reply; A - assured; C - confirmed; D - dying; F - fasttrack; H - hw-offload; s - srcnat; d - dstnat
# PROTOCOL SRC-ADDRESS SRC-PORT DST-ADDRESS DST-PORT TCP-STATE TIMEOUT ORIG-RATE REPL-RATE ORIG-PACKETS REPL-PACKETS ORIG-BYTES REPL-BYTES
217234 SAC udp 192.168.189.132 5060 10.222.3.124 5060 59m44s 0bps 0bps 441 289 238 824 144 309
217235 C udp 10.99.0.173 26210 10.222.3.124 5060 40m37s 0bps 0bps 22 0 18 194 0
217236 C udp 10.99.0.173 47870 10.222.3.124 5060 20m9s 0bps 0bps 33 0 27 280 0
217237 C udp 10.99.0.173 38882 10.222.3.124 5060 14m49s 0bps 0bps 33 0 27 269 0
217238 C udp 10.99.0.173 36123 10.222.3.124 5060 30m44s 0bps 0bps 33 0 27 280 0
217239 SAC udp 10.99.0.173 22168 10.222.3.124 5060 12m3s 0bps 0bps 872 504 413 687 268 579
217240 SAC udp 10.99.1.243 20118 10.222.3.124 5060 59m45s 0bps 0bps 501 15 42 147 8 393
no such item (4)
There's also this weird issue, where while my tracking setup is like this :
If some connections are closed (removed) between the start and the end of the command, it throws an error and stops, leaving some connections open.
Because those close automatically in 60 seconds...
and I care about closing the older ones...
So if the script finishes within a minute,
it doesn't throw errors because the ones after 60 seconds close automatically...
tcp-established-timeout=1d.... 1 Day???
With over 16000 actual users (multiple datacenters, not all originating from one single location...),
no one has had a single problem with the defaults modified with:
I don't know of anything that my customers need that leaves the TCP connection hanging for nothing, without even passing a keepalive packet for a whole day...
You said discussed in this thread, so I assumed you meant all posts.
This form is the one that yearly7100 says works in both rOS6 and rOS7 and isn't specific to for and foreach.
The for/foreach specific error handling is not really applicable here as the find itself seems to fail :
[sin3vil@MASTER] > local items [/ip firewall connection find src-address=10.222.3.124 src-port=5060];put "items collected!";foreach item in=$items do={ /ip firewall connection remove $item } on-error={}
interrupted
no such item (4)
The loop itself also doesn't handle it :
[sin3vil@MASTER] > for i from=0 to=5 do={ /ip firewall connection remove [find src-address=10.222.3.124 src-port=5060] } on-error={}
interrupted
no such item (4)
And :onerror also seems to fail:
[sin3vil@MASTER] > for i from=0 to=5 do={ /ip firewall connection remove [find src-address=10.222.3.124 src-port=5060] } on-error={}
interrupted
no such item (4)
I think you missed the whole point: find gives you a static result set of internal ids. In the meantime, any of the connections in your result set could time out, meaning they vanish while you're looping over the result set. That's why where timeout>60 is in the sample. And if removing/iterating your connections (200k+) takes longer than 60s, just increase that value to cover the processing time.
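To make that concrete, here is a minimal sketch of the timeout-filtered removal, reusing the host and port from the first post. The threshold of 60 is taken from the sample mentioned above and treated as seconds; it is an illustrative margin, not a recommended value, and the filter only skips entries that are about to expire on their own.

```
# Sketch only: remove the stalled SIP entries for one host, skipping any
# entry with less than the margin left (those will expire by themselves).
# Assumption: 60 is read as seconds, as in the sample discussed above.
/ip firewall connection remove [find where src-address=10.20.1.20 src-port=5060 timeout>60]
```

If a full pass over the table takes longer than the margin, the margin would have to be raised accordingly.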
My script is closer to the solution (fix RouterOS).
Either it works or it doesn't (and it must be specified why); nothing is perfect,
but any proposed solution must really be a valid alternative.
I explained the steps, as I always try to prevent errors rather than rely on the shitty on-error-resume-next way of programming,
which in some particular cases (the classic :resolve that doesn't find the domain[1]) I'm forced to use when I'm not programming for myself.
In this case, this is what we're left with: if the connection is no more, the error has to be swallowed.
If you happen to remember, we've also suggested to simply have an "ignore non-existent" mode for this stuff. It could even be the default. But it seems as though that's not meant to be.
The problem with the timeout-based approach is that connections can end in ways other than normal timeout. TCP connections may be RST-d or FIN-ed. All connections may be removed by interface down and masquerade (at least in newer versions.)
Even if it's not to your taste, currently catching the errors is the proper thing to do within the confines of the scripting system.
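A minimal sketch of what catching the errors per item could look like, using the address from the transcripts above. The variable names (ids, removed, skipped) are made up for illustration. Whether on-error also catches the "interrupted" failure that the transcripts show during find itself is not confirmed here, so the find is wrapped separately and the sketch simply reports when it fails.

```
# Sketch only: collect ids once, then remove them one at a time and
# swallow the error for any entry that vanished in the meantime.
:local ids [:toarray ""]
:local removed 0
:local skipped 0

:do {
    :set ids [/ip firewall connection find where src-address=10.222.3.124 src-port=5060]
} on-error={
    # Assumption: a failed find leaves ids empty; run the script again to retry
    :put "find failed, nothing collected"
}

:foreach id in=$ids do={
    :do {
        /ip firewall connection remove $id
        :set removed ($removed + 1)
    } on-error={
        # entry expired or was removed between find and remove
        :set skipped ($skipped + 1)
    }
}
:put "removed=$removed skipped=$skipped"
```

The point of the per-item :do block is that one vanished entry no longer aborts the removal of the remaining ones.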
:foreach i in={10000; 30000} do={ /ip firewall connection remove [find where .id=$i] }
As you can see, there is no valid reason to use on-error-resume-next.
No find (the $i), no error...
P.S.: When giving examples... the same not-found index two times?
Skipped 10000
Skipped 30000
It's not realistic at all... In fact, you had already deleted them (from 10001 to 29999), so how could it find the same numbers again?...
However,
since I have no gain from making people think differently,
and so as not to arouse further antipathy beyond that which I already arouse,
I said (and wrote) mine, I'll end this here.
But this depends on whether MikroTik's devs explicitly implement a special optimization for [find where .id=$i]. They would need a special code path that sees that the condition is just .id=xxx and only does a hash lookup to confirm the existence of the record.
But if they implement [find] in a generic way, with the same code path used for other parameter checks, then your code devolves into O(n²) (quadratic) time complexity, because [find] would iterate over all conntrack entries and compare the conditions. This sub-iteration is done for each iteration of the outer foreach.
If we assume that remove <single-id> is O(1) (hopefully the connection table is indexed / hashed by id), then @lurker888's solution only takes O(n).
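Spelled out as a rough count, under the same assumptions (a generic [find] that scans the whole table, and an O(1) remove by id). The symbols N for the total number of conntrack entries and m for the number of matched ids are notation introduced here, not from the posts above:

```latex
\begin{align*}
T_{\text{find per id}} &= m \cdot O(N) = O(mN), \quad \text{which is } O(N^2) \text{ when } m \propto N \\
T_{\text{single find}} &= O(N) + m \cdot O(1) = O(N + m) = O(N) \quad \text{since } m \le N
\end{align*}
```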
The additional problem is that [ find .id=... ] returns either an empty list (which is ok) or the id itself. Then, when we try to delete it, either the connection is still present - or it has just been removed between this second find and the removal, in which case we get... an error.