So, I can also confirm that the BB images floating around cause routers to crash and reboot. For me, on a test 450G, it happens approximately every 4 minutes. It turns out that it is actually causing a kernel panic on the host router. This is not 100% clear when testing on a recent version of RouterOS (6.28 in this case), because it looks like MikroTik developers have suppressed kernel panic output to the console in newer versions (I'm not sure why they think that helps anybody?!?). However, when I downgraded my test router to 6.9, the crashes still happened at the same intervals, and with 6.9 I actually got a kernel panic report on the serial console whenever it happened. After the crash, the watchdog (if enabled) would cause the router to reboot.
I agree with you that a MetaROUTER guest must not crash the host. However, according to autosupout.rif, it happens:
I can confirm the very same on RouterOS 6.28, running hecke's BB image as a MetaROUTER guest.
Even Hecke confirmed it's not stable and needs more patching to make it functional.
Right, but hecke was referring to the crashes that were happening only within the guest because of OpenWRT kernel bugs, not crashes that were happening on the host. Until now nobody else had even brought up host crashes in previous discussions about this image (this is totally new information; all previous discussions about BB instability referred to the guest crashing whenever it received data over the virtual ethernet interface), and he likely cannot do anything about them, since fixing this kind of crash is MikroTik's responsibility. I don't know why you would expect that any changes that hecke might make to the BB image would fix the router crashes and reboots. It's certainly possible that he or somebody else might find a way to work around the RouterOS bug by making changes in the BB kernel, but it is a RouterOS bug! If Windows running on top of VMware crashed, and that caused VMware itself to also crash (taking all other running guests down with it), you would not blame the VMware crash on Windows, even if Windows itself was known to be buggy. The host should never crash as a result of something that the guest does, legally or illegally.
Even if the host router were not crashing and rebooting all the time with the BB image, I'm not sure why anybody would want to use this image, since the guest itself is constantly kernel panicking whenever it receives network data from the host.
Now, it turns out that the kernel panic occurring on the host is also happening within exactly the same function (skb_put) as when the BB guest's kernel panics, and it is panicking for a very similar reason (the packet length received by the guest is some ridiculously astronomical value), which is very interesting. It does suggest that if the bug in the guest's kernel is fixed, it might very well also stabilize the host. But that doesn't mean that there isn't still a bug in the host that needs to be fixed.
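To illustrate why an absurd length ends up as a panic inside skb_put, here is a rough Python model of the bounds check that the mainline Linux skb_put() performs: advance the tail pointer, and panic if it has moved past the end of the buffer. This is only a toy sketch. The class and exception names are made up for illustration, and it says nothing about what MikroTik's patched kernel or its net-back module actually do before calling skb_put. The 1856-byte buffer and 64-byte headroom are assumptions derived from the head/data/end values in the sample report.

```python
class SkbOverPanic(Exception):
    """Stands in for the kernel's skb_over_panic() (hypothetical name)."""


class SkBuff:
    """Toy model of a linear sk_buff: offsets into one fixed-size buffer."""

    def __init__(self, size, headroom=0):
        self.head = 0          # start of the allocated buffer
        self.data = headroom   # start of the packet data
        self.tail = headroom   # end of the packet data written so far
        self.end = size        # end of the allocated buffer
        self.len = 0           # bytes of packet data

    def put(self, length):
        """Reserve `length` bytes at the tail, mirroring skb_put()."""
        old_tail = self.tail
        self.tail += length
        self.len += length
        if self.tail > self.end:
            # The real kernel prints the skb_over_panic line and hits BUG() here.
            raise SkbOverPanic(
                f"len:{self.len} put:{length} "
                f"tail:{self.tail:#x} end:{self.end:#x}"
            )
        return old_tail


if __name__ == "__main__":
    skb = SkBuff(size=1856, headroom=64)
    skb.put(1500)                      # a normal Ethernet-sized frame fits
    try:
        SkBuff(size=1856, headroom=64).put(202688)   # length from the report
    except SkbOverPanic as exc:
        print("panic:", exc)
```

The point of the model is that skb_put trusts the length it is given; whoever computes that length (here, presumably the code moving packets between guest and host) is where the real bug lives.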
Here is a sample panic report from the host kernel:
skb_over_panic: text:ceea47bc len:202688 put:202688 head:cefaa800 data:cefaa840 tail:0xcefdc000 end:0xcefaaf40 dev:<NULL>
Kernel bug detected[#1]:
$ 0 : 00000000 00000000 00000080 c03c0000
$ 4 : 00000001 c03c1418 00000001 c0266a54
$ 8 : 0000000a 00000000 0000000a 20646576
$12 : 00000000 00000001 00000000 00000000
$16 : cefa47a8 ced98d80 cee15c00 00000001
$20 : 000000d0 0000000d 00000080 c03c0000
$24 : 00000002 c0266330
$28 : c03b2000 c03b3d00 00000009 c02a59f0
Hi : 00000000
Lo : 19d60000
epc : c02a59f0 skb_put+0x78/0x84
ra : c02a59f0 skb_put+0x78/0x84
Status: 1000d203 KERNEL EXL IE
Cause : 10000024
PrId : 0001800a (MIPS 24Kc)
Process swapper (pid: 0, threadinfo=c03b2000, task=c03b5c10, tls=00000000)
Stack : ced98e40 ceea47bc 000317c0 000317c0 cefaa800 cefaa840 cefdc000 cefaaf40
c03958fc c03c0000 00000009 ceea47bc ced98e40 6cb2ecc0 00000000 00000001
cefa47a8 0000012c c03d0000 c03d0000 c0420000 00000080 c03c0000 c03d0000
c03d7e88 c02afa74 c040e018 00000fbf 00000000 c010eaa0 ffffe03f c0383e9c
00000001 00000001 c0410000 c0410000 c040e024 00000100 c040e018 c03c14a0
[<ceea47bc>] vm_release_queue+0x7a8/0x1468 [net-back@0xceea4000]
Code: 2484595c 0c0ce9ee 01202821 <0200000d> 03e00008 00000000 27bdffc0 afb20020 afb1001c
---[ end trace dd086ffc4df0a0d9 ]---
Kernel panic - not syncing: Fatal exception in interrupt
panicSaver: dumping panic to flash
flash: erase f
flash: prg f
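For anybody who wants to sanity-check the numbers in the first line of that report, here is a small Python sketch that parses the skb_over_panic line and works out how far past the buffer the write would have gone. The function names are my own, and the field meanings (head/data/tail/end as buffer pointers, len/put in decimal bytes) are assumed from the mainline Linux skb_over_panic format.

```python
import re

REPORT = ("skb_over_panic: text:ceea47bc len:202688 put:202688 "
          "head:cefaa800 data:cefaa840 tail:0xcefdc000 end:0xcefaaf40 dev:<NULL>")


def parse_skb_over_panic(line):
    """Return the numeric fields of an skb_over_panic line as ints."""
    fields = dict(re.findall(r"(\w+):(\S+)", line))
    out = {k: int(fields[k]) for k in ("len", "put")}          # decimal byte counts
    out.update({k: int(fields[k], 16)                          # hex, with or without 0x
                for k in ("text", "head", "data", "tail", "end")})
    return out


def summarize(r):
    """Derive buffer size, free room, and overrun from the parsed pointers."""
    return {
        "buffer_size": r["end"] - r["head"],
        "room_after_data": r["end"] - r["data"],
        "overrun": r["tail"] - r["end"],
        "tail_matches_len": r["tail"] == r["data"] + r["len"],
    }


if __name__ == "__main__":
    print(summarize(parse_skb_over_panic(REPORT)))
```

For this report it works out to a 1856-byte buffer with 1792 bytes of room after data, into which something tried to put 202688 bytes, overrunning the end by 200896 bytes. The 000317c0 values in the stack dump are that same 202688 in hex.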
Somebody (I suppose this will end up being me...*sigh*) will need to open a bug report with MikroTik with information on how to reproduce, autosupouts created after the crash, and a copy of the BB image that can be used to reproduce this.
I'm also not sure why MikroTik stopped providing new patches and modifications, since it can only make their product better and even more widely used.
I would guess because OpenWRT is a constantly moving target that changes kernel versions every few weeks or so. MikroTik tends to stick with one kernel and fine-tune it over the course of an entire release (RouterOS 6.x has used a heavily-patched Linux 3.3.5 kernel for the entirety of the 6.x series). MikroTik trying to keep up with OpenWRT changes would be nuts; I don't blame them one bit for not constantly updating the MR guest patches for OWRT.
Ideally, if OWRT wants MetaROUTER to be a supported platform, then somebody on the OpenWRT development team needs to be the one to keep the MR release working and up-to-date. That's how all of the other officially-supported OWRT platforms work. Do you see any of the other router manufacturers (Linksys, Netgear, etc.) actively contributing to OpenWRT development for their particular products? Yeah, didn't think so.
By the way, anyone got Liquid's AA with asterisk gui running? It's missing 3 dependency packages and I'm wondering if they can be compiled on our own machines (not sure if they [packages] need some patching as well for AA or we can just patch using standard mips subversion)?
I tried to address this earlier in this post. The problem is that liquidcz only took the asterisk-gui package from my Kamikaze patch, and did not port over my modified version of the asterisk18 package included in the same patch sets. My asterisk-gui package is not compatible with the stock asterisk18 package from any version of OpenWRT. He will need to port my asterisk18 packages over to his AA buildroot and rebuild the Asterisk packages for his repository. (It looks like -force-depends won't work in this case because there is just too much missing/different between stock OpenWRT asterisk18 and mine.)
Alternatively, you (or anybody else) can take a basic AA buildroot, make whatever changes to it (if any) that liquidcz made to his, copy over my asterisk18 and asterisk-gui packages from my Kamikaze patches that I publish freely, and build your own copy of AA with my Asterisk changes baked in.