RB532A + hotspot + User-manager = crash

Just over a month ago I had a problem with an RB532A as a CPE which was running User-Manager and hotspot for the user. It had become almost totally unresponsive but before it crashed altogether I saw that it had tens of thousands of log and session entries despite having been up for only five weeks, and was showing zero free hard-drive space.

I replaced it with a brand new 532A with the same configuration except that I turned all U-M logging off. The original 532 had to be ‘saved’ by a boot interrupt from the serial port and a reinstallation of RouterOS by netinstall.

I’ve received a call from the user that the same thing has happened with the replacement board - hotspot no longer responds and ‘system resources’ is showing zero space on the hard-drive.

User-manager logging was turned off and I installed winagain’s ‘clear userman log scripts’ which was last run only last Saturday.

The user has now lost his hotspot - and the income from it - until I can get back out to him tomorrow. Hopefully I’ll still be able to get access to the Routerboard as it is still functioning as a router, and even more hopefully rescue it.

I’m not sure what version of RouterOS it’s running, tho’ it’s likely to be either 2.9.46 or 2.9.51. Trying to do a sup-out is likely to crash it completely if it’s out of memory.

I’ll check back here before I do anything. Can anyone suggest:

  1. what might be chewing up the hdd memory?

  2. what I ought to check out if I can still get access to it?

  3. how can I free up memory in a hurry if there’re no obviously bloated files to delete?

  4. how can I see what’s taking up the memory?

  5. anything else?

Will report back further tomorrow.

Thanks

OK, the fault here seems to have been a silly one of mine but as I’ve learned a lesson from it and it’s the kind of thing that could happen to anyone I’ll report it here on a ‘here be dragons’ basis.

About a month ago an RB532 crashed because log and session reporting in a hotspot/User-manager set-up completely filled the hard-drive, to the point where the router was unable to respond.

So I replaced it with another 532 which I re-configured, stopped all logging, taught the user how to do some housekeeping by removing ancient session records, and also moved a small amount of PPPoE RADIUS work the User-Manager had been doing to another board, all to lighten the load.

Unfortunately I left one of the network users’ CPEs still pointing at it for RADIUS AAA.

So for around three weeks that CPE had been trying to log on to the ‘wrong’ User-manager which, of course, was rejecting it and logging the authentication attempt and failure. The result was a 43MB file called log-txt which filled the hdd to capacity and brought the router to a standstill.
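For a sense of scale, the growth rate implied by those figures can be worked out roughly. This is only a back-of-envelope sketch: it assumes ‘around three weeks’ means 21 days and that the file grew steadily, and the function name is just an illustration, not anything from RouterOS or User-Manager.

```python
def log_growth_per_day(total_mb: float, days: float) -> float:
    """Average log growth in MB per day, assuming steady growth."""
    if days <= 0:
        raise ValueError("days must be positive")
    return total_mb / days


# Assumption: "around three weeks" is taken as 21 days.
rate = log_growth_per_day(43, 21)
print(f"about {rate:.1f} MB per day")
```

At roughly 2MB a day from a single misdirected CPE, a board with only a few tens of MB of storage does not last long.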

Deleting the log-txt file freed up the hdd and the board is working normally again. However although I deleted the 43MB log-txt file, userman still shows there are 167,151 log entries for the event - and I haven’t been able to find any way of clearing these entries except by ‘removing’ them via the web-page either 20 or 1,000 at a time, the latter taking well over a minute to effect. As I don’t have 167+ minutes to spend in front of a keyboard just removing these entries I’ve left them there.

Is there any other way of getting rid of them?
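The arithmetic behind that estimate, as a small sketch. It assumes exactly one minute per page of 1,000, which is optimistic since each page actually took well over a minute; the function name is hypothetical and nothing here talks to User-Manager itself.

```python
import math


def manual_removal_estimate(entries: int, per_page: int, minutes_per_page: float):
    """Return (pages to click through, total minutes) for page-by-page removal."""
    pages = math.ceil(entries / per_page)
    return pages, pages * minutes_per_page


# 1,000 entries per page, assumed 1 minute each (a lower bound).
pages, minutes = manual_removal_estimate(167_151, 1000, 1.0)
print(pages, "pages,", minutes / 60, "hours at best")

# 20 entries per page: only the page count is meaningful here,
# as the time per small page was not measured.
small_pages, _ = manual_removal_estimate(167_151, 20, 1.0)
print(small_pages, "pages at 20 per page")
```

So even in the best case it is 168 pages and close to three hours of clicking.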

And the log.txt file was back again this morning, like a fat and unwelcome visitor hogging all the bar.

I removed a few thousand log entries manually a page at a time but to get rid of them all that way is going to take hours.

Winagain’s script (see http://forum.mikrotik.com/t/cant-remove-userman-logs/18531/1) doesn’t work, perhaps because I’ve deleted the customer whom the user creating the accounting failure log entries was under, and winagain’s script only works for users beneath the logged on customer. Perhaps.

Can anyone suggest why and when the log.txt file is rewritten and if I can stop it, or any other way of deleting these entries en masse?

Failing that it looks as if I’ll have to take the router down, reset it and reconfigure it as winagain did to get rid of his memory-hogging log.

I guess you need to disable that script and ask the author how you can fix it.

Do you think it’s the script?

As far as I can tell the script does its job when I run it as admin, in that it removes all the log entries relating to admin’s users. But it doesn’t remove ALL the log entries - those relating to the user ‘Billy’ remain because (I’m guessing) ‘Billy’ is a user of customer ‘manager’. So scripts run by customer ‘admin’ can only affect log - and session? - entries of customer admin but not those of customer ‘Billy’.

Unfortunately customer ‘Billy’ no longer exists as I removed him when I shifted the RADIUS function elsewhere. I guess I could recreate him and see if running the script as Billy deletes ‘Billy’s’ log entries but I won’t be back there for a few days to try it. And of course User-Manager might not recognise the new Billy as the old one.

Even admin in this case - the subscriber - didn’t create user Billy and so isn’t his parent, as the router Billy was created on was replaced with the present one.

Anyone know if running ‘reset-configuration’ under the system menu clears the hdd of non-system files? I could always restore a back-up of the configuration, but otherwise it looks more and more like it’s going to be a three-hour job removing the log entries manually.

reset-configuration erases all files.
Try disabling the script that is generating the .txt file and see whether that helps.

Thank you Sergejs.