> /tool/profile cpu=all duration=10s
Columns: NAME, CPU, USAGE
NAME          CPU   USAGE
snmp            0    0.5%
ethernet        0    2.5%
console         0      0%
firewall        0      5%
networking      0   11.5%
logging         0      0%
management      0   12.5%
wireless        0      3%
encrypting      0      5%
routing         0   21.5%
ssl             0      1%
profiling       0      0%
bridging        0      0%
unclassified    0      4%
cpu0                66.5%
snmp            1      0%
ethernet        1      5%
firewall        1     10%
networking      1   30.5%
management      1    1.5%
wireless        1    3.5%
encrypting      1   14.5%
routing         1     13%
ssl             1      2%
bridging        1    3.5%
unclassified    1      8%
cpu1                91.5%
One of the culprits was a 10ms keepalive time on BGP sessions. For some reason crossfig, or whatever it's called, decided that keepalive=1s in 6.x means keepalive=10ms in 7.x (which is impossible to set by hand). Not only that: the routing engine decided to obey, and happily spammed keepalives 100 times per second. Setting it back to 1s and restarting the sessions reduced CPU usage by 20%.

Do you have fast-track enabled?
Also, there is no route-cache on rOS v7.x.x like rOS v6 had.
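The keepalive fix mentioned earlier in the thread would look roughly like this in the v7 CLI. Treat the paths and the disable/enable step as assumptions, not verified syntax; check the converted values on your own box first:

```routeros
# Inspect what the 6.x -> 7.x conversion produced for each peer
/routing/bgp/connection print detail

# Put the timers back to sane values (hold time conventionally 3x keepalive)
/routing/bgp/connection set [find] keepalive-time=1s hold-time=3s

# New timers are negotiated at session establishment, so bounce the
# connection (assumed here via disable/enable) to apply them
/routing/bgp/connection disable [find]
/routing/bgp/connection enable [find]
```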
To be frank, I have no idea... this info was what I got from support when I opened a ticket for high CPU usage on my RB4011, before the fasttrack issue was fixed.

BTW, if there's no route-cache on v7, then what does "/ip settings set route-cache=yes" do?
Maybe there wasn't, but now there is? Who knows.
Thanks for clearing this up. Please figure out why keepalive-time=1s gets converted to 10ms when upgrading from 6 to 7. I've seen this a few times already, but didn't investigate until now.

The route-cache setting is doing nothing; it will be removed in the future.
1 second is an unreasonably low keepalive time. You would normally not set the keepalive time, but rather set the hold time, and the keepalive time will be 1/3 of that.
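The hold-time/keepalive relationship described above is easy to state in code. This is a sketch of the RFC 4271 convention, not RouterOS internals:

```python
def keepalive_from_hold(hold_time_s: float) -> float:
    """BGP convention: keepalives go out at one third of the hold time,
    so up to two consecutive keepalives can be lost before the peer
    declares the session dead."""
    return hold_time_s / 3.0

print(keepalive_from_hold(3.0))   # minimum hold time -> 1.0 s keepalive
print(keepalive_from_hold(90.0))  # a common default  -> 30.0 s keepalive
```

This is also why a 1s keepalive only appears when someone has already pushed the hold time down to its 3s floor.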
The only thing that's unreasonable is converting 1000 milliseconds to 10 milliseconds when upgrading from 6.49.6 to 7.2.1. When BFD arrives one day, I'll use it to get response times faster than 3 seconds; until then I guess I'll have to live with a 3-second lag until redundancy kicks in after the usual fiber vs. rat or fiber vs. runaway excavator incident.
Indeed, 3s is the lowest hold time and it would result in a 1s keepalive time, but in cases where you want fast BGP response to link-down I think it is better to use BFD.
(unfortunately BFD does not yet work in v7 but it is "promised" to arrive soon)
Increasing the frequency of keepalives does not make BGP converge faster; the hold time is what controls that. The only reason to set very frequent keepalives is when the latency or packet drop on the working link is so high that you need to send 10 keepalives to make sure at least one of them reaches the destination within 10 seconds.

That would not work, as BGP runs over TCP, not UDP. Keepalives are inserted above TCP, and it is the TCP retry mechanism that governs sending them to the other side. A well-implemented TCP would not even try to send the newly added data before the (re)transmission timers for the existing data kicked in (or an ACK is received).
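The convergence point above can be illustrated with a toy hold-timer check (a sketch of RFC 4271 hold-timer semantics, not router code): no matter how often keepalives were flowing before the failure, the peer is only declared down once the hold time expires.

```python
def peer_is_down(last_msg_at: float, now: float, hold_time: float) -> bool:
    """BGP hold-timer logic: the session drops only when hold_time passes
    with no KEEPALIVE or UPDATE received, regardless of how frequently
    keepalives were being sent."""
    return (now - last_msg_at) >= hold_time

# Link fails at t=0 with a 3 s hold time. Even if keepalives were sent
# 100 times per second before the failure, detection still takes 3 s:
print(peer_is_down(last_msg_at=0.0, now=2.9, hold_time=3.0))  # False
print(peer_is_down(last_msg_at=0.0, now=3.0, hold_time=3.0))  # True
```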