x86 Machine - RouterOS 4.5 Crashing

Hi All,

I have an x86 machine with RouterOS v4.5 installed from the ISO image on the MikroTik website. I purchased a Level 6 license and have it installed.

Specs On The Machine:
SuperMicro Motherboard #X7SLM-L
Pentium D 945 Dual Core 3.4GHz 800MHz 2x2MB
Intel 945GC + ICH7R Chipset
2Gig of Ram
32Gig Solid State Sata II Drive ( Primary )
500Gig Sata II Drive ( Cache Drive )
2 x Realtek RTL8111C-GR Gigabit Ethernet ports.

The problem is that the machine randomly locks up, and nothing is written to the log when this happens. The only remedy I have is to reboot the router by switching off the power and turning it back on. I have tried v3.3, v4.5 & v4.10, and the same thing happens on all three versions.

The machine runs fine on the bench and only locks up when I put it in service.

If anyone has any ideas I would love to hear them…

Thanks
Dean

Are you able to log into the device locally when it locks up, or is it completely locked up? If you can log into it when it happens, you can make a supout file and send it to MikroTik support and have them look at it.

Is there any specific set of circumstances when it happens? Since it’s only happening when installed and not on the test bench, I’m guessing it’s under a certain amount of load. Have you run memtest on the RAM to see if there are any issues with it? Have you checked out the CPU to make sure that’s good as well?

You can also try adding sources to the log file and writing them to disk temporarily to see if anything shows up, since if it’s writing to memory it is wiped out upon a reboot.

Unfortunately the machine is locked up tight. Once it has crashed it must be rebooted.

Prior to installing RouterOS I plugged in a drive with XP on it and put the machine through all the standard tests: CPU, memory, and hard drives (as secondaries). I also upgraded the firmware for all the devices. So far I have not been able to identify any set of events that causes the problem. From a fresh boot the system will run anywhere from 2 hours up to 22 hours (the longest run). I have had the system crash in the middle of the day (heaviest load) and at 2 in the morning (almost no load).

I have set the sources to both disk and remote for the log files, nothing is written to them at the time of the crash, the system just goes down.

Needless to say, VERY frustrating!

Thanks For The Response…
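
Since nothing reaches the logs, one thing that might help is recording the exact time of each lockup from a second machine, so it can be compared against traffic graphs or other events. Below is a minimal Python sketch of that idea, not a tested monitoring tool. It assumes the router answers TCP on some management port (8291, the usual Winbox port, is only an assumption here, as is the address 192.168.88.1). The names is_reachable and watch are made up for this example.

```python
import socket
import time
from datetime import datetime


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def watch(host, port, interval=5.0, log_path="router_heartbeat.log",
          max_checks=None):
    """Probe host:port every `interval` seconds and record UP/DOWN changes.

    max_checks=None runs until interrupted; log_path=None skips the file.
    Returns the list of recorded transition lines.
    """
    transitions = []
    last_state = None
    checks = 0
    while max_checks is None or checks < max_checks:
        state = is_reachable(host, port)
        if state != last_state:
            stamp = datetime.now().isoformat(timespec="seconds")
            line = f"{stamp} {host}:{port} {'UP' if state else 'DOWN'}"
            transitions.append(line)
            if log_path is not None:
                with open(log_path, "a") as fh:
                    fh.write(line + "\n")
            last_state = state
        checks += 1
        if max_checks is None or checks < max_checks:
            time.sleep(interval)
    return transitions


if __name__ == "__main__":
    # Hypothetical address and port; replace with the router's real ones.
    watch("192.168.88.1", 8291)
```

The first DOWN line after a run of UP gives the lockup time to within the probe interval plus the connection timeout.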

can you find autosupout.rif after power cycling the device? which packages are enabled? do you have lots of things going on on the ROS device?

in my case, my RB1000 locked up regularly at irregular intervals and rarely produced autosupout.rif, but it eventually produced one.

Well if it’s not locking up at all on your test bench, it could be something environmental to where you have it installed. Have you tried plugging it into a different power outlet or a dedicated circuit, or possibly putting it on a battery backup device?

I doubt it’s RouterOS, seeing as the problem exists for you across several versions. However, you could try a fresh Netinstall and reformat the flash drive with it. I have never used it myself, so I’m not sure how it works.

You could also try downloading a Linux tool like memtest86+ and booting to that directly from the disc, or something like Ultimate Boot CD that has a couple of test tools built into it. The XP tests generally aren’t the best. You’ll want to let it run for several passes on the RAM, and the same thing for the CPU tests, you’ll want to let them run for at least 8 hours, possibly overnight.

Hi,

Sorry for the delay in replying,

As far as packages on the router, all packages are installed and enabled.

I should have mentioned that when the router is in service it is located in a temperature regulated, power conditioned rack. The rack itself is powered through an APC 5 kVA battery backup.

I did download the Ultimate Boot CD you suggested and will run that tonight, see what happens.

Thanks for the suggestion.

Dean

you do NOT need ALL packages =) remove/disable unnecessary ones

I went through the packages and uninstalled everything I didn’t need.

These are the ones left in the machine:
advanced-tools
calea
dhcp
multicast
ntp
ppp
routerboard
routing
security
system

Unfortunately, it did not correct the issue, it ran for about 3 hours prior to locking up, AGAIN! Grrrrrrr

Thanks for the tip though,
Dean

I guess you don’t use the standard power supply method (wall socket - wire - PSU). Dig in that direction. There are testers (avometers?) which remember the lowest/highest voltage; I suggest you use one.

I downloaded the Ultimate Boot CD recommended by Feklar,

After testing everything, the Ultimate Boot CD found nothing wrong with the system; all tests passed. At this point I am beginning to believe that the problem lies with the Realtek RTL8111C-GR Gigabit Ethernet ports. What I need is a way to test these under load on the bench.

Any suggestions on the best method for this?

Thanks
Dean
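
One possible way to load the ports on the bench is to push sustained TCP traffic between two PCs, one plugged into each Ethernet port, so that everything is routed through the box. Below is a minimal Python sketch of that idea. It is not a calibrated benchmark, and the names start_sink and blast are made up for this example. RouterOS also has its own bandwidth test tool, which may be the more convenient option.

```python
import socket
import threading
import time


def start_sink(host="127.0.0.1", port=0):
    """Start a TCP server that reads and discards everything sent to it.

    Returns (server_socket, port). port=0 lets the OS pick a free port.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)

    def drain():
        while True:
            try:
                conn, _ = server.accept()
            except OSError:
                return  # server socket was closed
            with conn:
                while conn.recv(65536):
                    pass  # throw the data away

    threading.Thread(target=drain, daemon=True).start()
    return server, server.getsockname()[1]


def blast(host, port, seconds=5.0, chunk_size=65536):
    """Send zero-filled chunks for `seconds`; return (bytes_sent, mbit_per_s)."""
    payload = bytes(chunk_size)
    sent = 0
    start = time.monotonic()
    with socket.create_connection((host, port)) as sock:
        while time.monotonic() - start < seconds:
            sock.sendall(payload)
            sent += chunk_size
    elapsed = time.monotonic() - start
    return sent, sent * 8 / elapsed / 1e6


if __name__ == "__main__":
    # Loopback self-check only. For a real test, run start_sink("0.0.0.0", ...)
    # on one PC and blast() on the other, with the router in between.
    server, port = start_sink()
    total, rate = blast("127.0.0.1", port, seconds=2.0)
    print(f"sent {total} bytes, about {rate:.0f} Mbit/s")
    server.close()
```

Running blast() for hours rather than seconds, in both directions, would be closer to the in-service load than a short burst.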

More Notes On The Issue:

I am pretty sure this issue has to do with RouterOS / Realtek RTL8111C-GR Gigabit Ethernet ports

Here is why, I used Auto Negotiate for all tests and changed only the OS in the x86 machine, used same configuration.
3.30 - Will not connect at 1Gbps, connects only at 100Mbps Full Duplex
4.5 - Connects at 1Gbps Full Duplex
4.6 (*) - Will not connect at 1Gbps, connects only at 100Mbps Half Duplex
4.10 - Same As 4.6
5.0 Beta 3 (Just For Fun) - Same As 4.6

*Note: It tries to connect at 1Gbps, but the Ethernet link goes up and down, cycling every 5 seconds. I have to set the Ethernet port statically to 100Mbps/Full Duplex to connect, but then it links only at Half Duplex.

Any ideas?

Once again, acquire an electric tester. There could be something wrong with your cables. Of course you will need a lot of logic to find the problem. :sunglasses:
The 4.5 version could be just coincidence.

P.S. Length of cables? Type? device on the other end? Draw a picture!

Checked all cables with a Fluke tester, all passed. The cables being used are Cat6 cables, 3 feet long. Since the unit is plugged into an APC Battery Backup and none of the other devices plugged into it have problems, I know it’s not the power.

The connections are as follows: Cisco 2821 Router - x86 RouterOS - Cisco SRW-2016 Gig Switch

Is everything powered from the UPS, or are some devices powered straight from the “wall socket”?
I remember some situations where the problem was devices using different phases and someone nearby stealing electricity :slight_smile:
Such a problem will not be discovered by any tester, whether $600 or $16.

P.S. Why Cisco->RouterOS->switch? Who’s masquerading? Who is who? Is the Cisco the ISP and RouterOS the client’s router?

Yes, everything is powered by the UPS. It is currently running at 35% capacity and the voltages through the rack system are constant, 118 Volts.

The Cisco 2821 is a Fiber Channel hand off to Ethernet. The x86 will take the place of the RB-1000 that is currently serving as the core router for the wireless network.

The RB-1000 works perfectly, it is just getting overworked.

PentiumD? uhm… turn off hyperthreading in BIOS? :sunglasses:

Intel Pentium D 945 has no HT

I think the problem is in DNS…

Nope, not DNS. If it was, the machine would not function from the start.

And this machine uses the same DNS as the RB-1000 that works perfectly.

I didn’t mean the machine’s DNS