User Manager for 30K Subscribers

Can anyone share their experience of using User-manager with a large number of subscribers. 30-40 thousand subscribers.

I know this sounds like a terrible idea. But unfortunately it is the only solution accepted by the ISP management software vendor.

User manager is just radius server, how can this be only solution?



There are other requirements and matters that were not mentioned.
One of those is that the ERP/CRM needs to be responsible to fill up this data.
They are not familiar with any kind of Radius.
But, by some weird reason, they are familiar with MikroTik API.
So, they accepted to fill up the data via Mikrotik API.

We are not in the mood to deal with this Radius scenario.

But the question IS NOT that!

Have anyone som experience with more than 20K users in user-manager, with 5-10 minutes interim update.

Have you tried daloRADIUS

Yes! But manny years ago.
I took a look on it… I’m not sure, but looks like the projet has no updates since 2021.
That’s correct?

@fischerdouglas; I’m not sure I get the question.

Are you asking about: 1) The total number of users in the database, 2) The number of concurrent users connecting to a RAS server, or 3) whether all those hosts are connecting to a single RADIUS server at the same time?

For the number of users in the database, that’s mainly a storage capacity issue, and 30-40 thousand subscribers shouldn’t be a problem. When it comes to concurrent connections and transaction rates, the built-in RoS User-Manager has some limitations because it’s based on SQLite. You’ll need to run performance tests to determine the actual limits. For software like FreeRADIUS or Microsoft NPS, it really depends on your system’s resources and configuration.

The actual scenario is 32-33K ISP subscribers running on +/- 40 Mikrotik gateways of several flavors (1072, 1036, 4011, 3011, 2216).
The ERP/CRM provision the users directly on the gateway (an horrendous methodologyI know!).
We are going to deploy hardware based BNGs, that do not support having the users provisioned directly on it(and even it supports, it is an stupidity.)

So the idea is requiring the ERP/CRM to provision users on user-manager of a CHR with good CPU/Memory/IOPS-Disk, using the same API syntax.
And then point the BNGs to query user manager to authenticate the ISP subscribers.

I known it is a ugly solution, but is what can be done right now.
ISP do not want to change ERP.
ERP do not give access to data(via DB or API).

It works o small scale. 2K subscribers tested.

The hope is the if ERP/CRM guys sees it working as it should be (not with that shit of writing the users directly on the device and doing trough radius) they consider to adopt radius embedded on theyr system.

So, what I only need to know is if someone has already tested user-manager with 30-40K ISP subscribers active, and sending interim update every 5 minutes.

Depending on how fast the storage on your CHR is, it might not work. Or if the router has SSD storage it might cause excessive wear. With 40K subscribers and 5-minute interim update interval, which is 300s, we are looking at about 135 updates/second. User manager is using a SQLite file to store everything (including sessions). If you read about SQLite update/insert performance you’ll find out that it only fast at that when using transactions and batching many updates in a single transaction, because when updating each transaction will lock the database file and will flush the storage device. With interim updates however, they are all independent updates that arrive at arbitrary times and will certainly not batched together by User Manager, which means each update is a separate new transaction and must be independently flushed to the storage.

From SQLite FAQ https://sqlite.org/faq.html#q19 with spinning disks only about 60 transactions per second are possible, so SSD is a must. However, each transaction is supposed to be fully committed to the persistent storage which means blocks on the NAND will be written for each transaction.

I think you’re missing the point, it’s about running a separate RADIUS server. Plus, SQLite isn’t really made for this kind of workload anyway.

whip up a fake API endpoint that mimics RouterOS for the endpoints they want to use

Simply save the data correctly to a MySQL or similar database

Configure FreeRADIUS (or multiple for load balance/redundancy), and point all the routers at that for RADIUS Authentication.

This is how I would do it within these limitations.

Please re-read the thread. OP explicitly asked about using User Manager in a CHR installation, due to MikroTik API usage. If User Manager is to be used, then so is the underlying SQLite DB that it relies on.

You’re absolutely right, so I’ve clarified the answer!

That said, it’s a pity that RoS User-Manager doesn’t offer the same configuration options as for example FreeRadius where you can set it up to use an external database server.

Seems like ship sailed on alternatives. I don’t have direct experience with UM… But my thought be CHR be required for sure. After all, It’s just auth requests, not traffic. So how ofter subscribers re-auth’ing?

I also don’t think there a lot of complex DB operations either. Perhaps few SELECT and UPDATE happen per auth. Plus benefits of SQLite is you can copy the file to backup up, and/or perhaps use another CHR with same file to do analytics on the recent copy. I am more familiar with Microsoft’s Active Directory - which is kinda similar to RADIUS. And that uses a file-based “JET” database (which striped down/optimized 1990s Access DB file) & it still works with millions of records (and every Outlook email potentially making their own queries - so way more than any RADIUS).

It think it’s really the CPU, and with CHR & that’s up to you. And, do right things in VM platform on the network card (i.e. not use an emulated card) too. But CHR is orders of magnitude better at the “server” tasks, than any of the network-centric CPU in hardware is what may save you here. i.e. throw more hardware at the problem.

And this does seem testable with some of freeradius client tools to use against User Manager. Certainly there is a point where SQLite might be the bottleneck for sure, but that kinda depends more on exactly how UM uses it.

Yeah, the VM should be pretty easy to scale up with standard measures, and depending on how MT implements SQLite caching, you might even be able to use it as an in-memory database if you add a lot of RAM. But since SQLite is single-threaded, how it handles command queuing with a bunch of concurrent read and CRUD operations without losing a transaction (to ensure ACID compliance) is tough to say without some actual performance testing.

The NTDS database (ESE aka “JET”), used by AD, Exchange, and NPS, is actually pretty solid when it comes to transactional performance in a multi-user environment. But it doesn’t use SQL since it’s more of a hierarchical database manager, like for mapping LDAP trees and also aligned with NoSQL like schema-less, key/value storage, and semi-structured data etc.

And for performance testing there are a lot of good tools out there like RadPerf etc: https://www.serverwatch.com/guides/radius-server-test-tools/

Even then, if Mikrotik’s implementation write immediately, one can always cheat with a CHR. Just put enough memory on the host to it be used as a buffer. I don’t know about Windows (no experience there), but Linux has an almost pathological urge in buffering writes to disk - and caching reads too. Give enough memory to the CHR host, and it will all be cached - even if the host sends a write to its virtualized disk.

There is one catch: this will not work if we use RAW as the disk format - that one is controlled by the guest system. But basically anything else? Yeah, heavily cached to read AND write, if there is memory available.

That might be the case, but it really depends on how the developers have set up the SQLite settings, like journal_mode, cache_size, temp_store, synchronous, and how they handle client busy timeouts, etc. And of course, the maximum number of concurrent transactions. If the underlying file system for CHR is fully memory-cached and SQLite can’t perform a proper write-through, there’s a risk of data loss if something crashes or goes wrong.

CPU, Memory and SSD are not a problem!
We can do some KIWI (Kill It With Iron) resolution.

I have never used, but I understood that with rose-storage I can fake a disk on RAM, so we could use it toavoid issues on write needs.

My worries are now just on single thread of SQLite.

I will try to elaborate some benchmarking on a test environment.

Just be aware you will lose ALL writes if you are unable to sync a RAM drive to permanent storage. My recommendation is to use at least some kind of delayed-write filesystem. I also think you should consult an experienced system architect to help you design these types of solutions.


Yes, it’s probably wise to conduct a realistic performance test with production like volumes to ensure functionality under expected maximum load. It’s unfortunate to implement a solution with the premise “we thought it would work” when reaching full capacity and everything goes to hell. One only gets burned once to avoid repeating those mistakes. (been there, done that)

Not really. That part (the host CHR buffer write) is completely transparent to the client. Now, about the speed and lock table on the client, You are completely right: if the CPU can’t keep up, there is nothing much else we can do.

Also, I think your point of “test first, promise later” is 100% spot on. Time and again we got burnt by this. No amount of “in theory it should” is replacement for “we REALLY did it, and went well”.

If you can throw hardware, I won’t worry too much. Especially about SQLite, for several reasons:

  • UM does cache things (i.e. the “Sent from Cache” stat).

  • And the users/etc AFAIK are stored in RouterOS config, not the database. From the schema, it looks like mosts tables (“user”, “group”, etc.) are just “pointers” with the .id matching /user-manager/… config - except “session” which is a real table:
    sqlite> .schema

CREATE TABLE IF NOT EXISTS ‘config’ (id INTEGER PRIMARY KEY, msg BLOB);
CREATE TABLE IF NOT EXISTS ‘radiusserver’ (id INTEGER PRIMARY KEY, msg BLOB);
CREATE TABLE IF NOT EXISTS ‘router’ (id INTEGER PRIMARY KEY, msg BLOB);
CREATE TABLE IF NOT EXISTS ‘attribute’ (id INTEGER PRIMARY KEY, msg BLOB);
CREATE TABLE IF NOT EXISTS ‘group’ (id INTEGER PRIMARY KEY, msg BLOB);
CREATE TABLE IF NOT EXISTS ‘user’ (id INTEGER PRIMARY KEY, msg BLOB);
CREATE TABLE IF NOT EXISTS ‘limitation’ (id INTEGER PRIMARY KEY, msg BLOB);
CREATE TABLE IF NOT EXISTS ‘profile’ (id INTEGER PRIMARY KEY, msg BLOB);
CREATE TABLE IF NOT EXISTS ‘profilelimit’ (id INTEGER PRIMARY KEY, msg BLOB);
CREATE TABLE IF NOT EXISTS ‘userprofile’ (id INTEGER PRIMARY KEY, msg BLOB);
CREATE TABLE IF NOT EXISTS ‘usercounters’ (id INTEGER PRIMARY KEY, msg BLOB);
CREATE TABLE IF NOT EXISTS ‘session’ (id INTEGER PRIMARY KEY , “8000001” INTEGER, “8000003” INTEGER, “20000007” TEXT, “20000004” TEXT, “20000008” TEXT, “2000001c” TEXT, “2000001b” TEXT, “8000005” INTEGER, “8000006” INTEGER, “18000019” BLOB, “8000009” INTEGER, “1800001a” BLOB, “800000e” INTEGER, “800000b” INTEGER, “f” INTEGER, “800000c” INTEGER, “800000d” INTEGER, “8000011” INTEGER, “8000015” INTEGER, “10000016” INTEGER, “10000017” INTEGER, “20000066” TEXT);
CREATE TABLE IF NOT EXISTS ‘payment’ (id INTEGER PRIMARY KEY, msg BLOB);

with the BLOB being same 4 bytes for ALL users in table “user” (but the RouterOS .id == DB id, 100K of SAME value) - which I’m guessing mean “user data in config” (i.e. memory)

sqlite3 um5.sqlite ‘SELECT msg from “user” where id=48333’ | hexdump

324d 0a06

  • Mikrotik uses a -wal file with SQLite - which make total sense since the only table is one that see a lot of writes — session. And that exactly what SQLite has the “Write-Ahead Log” (thus the -wal) is for write-heavy things. All totally reasonable design. Except, it does mean the SQLite is only useful for extracting session data (i.e. copy file, run sqlite3 CLI to “output .csv” from “select * from session”.

  • I loaded 100K users. Users loaded in ~2 minutes from a :foreach — which be faster via API, since it doesn’t have to parse the script. I didn’t any do load tests in this 10 minute experiment… But resulting SQLite is 6Mb - now with session obviously would get MUCH bigger.

  • Since I was more curious what Winbox4 would do with 100K UM users here, than SQLite… And can report Winbox4 has no issues… The dialogs all work. You’ll you notice in dropdown with UM users do takes 10-20 seconds to populate & winbox is not quite as snappy if the UM view with users is loaded (since it polling for changes). Config take longer to exports, etc. - kinda similar to have large routing tables - everything work, but admin UI isn’t as snappy with a large config loaded. And this was on a RB1100AHx4, which is far from even a laptop in performance.

  • Now… I wouldn’t use ROSE. Just mount one of the VM local disks to store the DB.

  • As pointed out, it ain’t hard to test RADIUS - plenty of tools. You can try a laptop with CHR VM to see if it breaks with some stress test & you’d know any real server be fine. Now… since it’s easy to test, I’m pretty sure Mikroitk does load testing on it. It’s the more multi-vendor interop where Mikroitk isn’t as good. Efficient code they generally do better.