✨ Steering AI to use new manual.mikrotik.com

Amm0 · June 4, 2026, 3:04pm

Under development as a "Useful article". The top post holds the recommendations, the 2nd post is background research from other models, and the third post is the background/"grounding" for the answer here.

MikroTik is working on a new documentation site that, I believe, should help AI agents like ChatGPT, Claude, and "friends" to better respond to RouterOS inquiries. See:

Since MikroTik added "AI support" to the doc engine behind it, I prompted several agents on what they recommend for telling future AI agents how to use it, and compared what the web-based LLMs suggested. Here is the combined recommendation from Claude Opus after reviewing the previous agents' work:

Current Recommendations (from Opus 4.8)

RouterOS is one of the topics LLMs are worst at — not because they've seen
too little, but because they've seen too much of the wrong material: years
of wiki.mikrotik.com (mostly v6) and the retired Confluence help site, plus a
decade of forum snippets. So they confidently hand you v6 syntax for a v7 box,
or invent properties that never existed.

MikroTik's new manual fixes the supply side of this: every page is now
published in machine-readable form specifically so assistants can read it. The
job left to us is the demand side — telling the LLM to actually go read it.
Here's how, from lightest to heaviest.

First, know what your tool can do. A prompt only helps if the tool can
fetch a URL. If it can't, the prompt just makes it more careful — it can't
read a page it can't reach.

Your tool	Fetches live URLs?	Use
ChatGPT (browsing on), Claude.ai, Gemini, Perplexity, Copilot	Yes	A or B
Claude Code/Desktop, Cursor, Codex, custom agents	Yes — best fit	A + D
Offline / local models (Ollama, LM Studio), no-browse chat	No	C

A — Paste once into "custom instructions" / system prompt (heavy users)

You are helping with MikroTik RouterOS. Assume RouterOS v7 unless I say otherwise.

Your training data is unreliable for RouterOS: it is dominated by the old
wiki.mikrotik.com (mostly v6) and the retired Confluence help site, so you tend
to mix v6 and v7 syntax and invent commands or properties. Do not trust your
memory for exact syntax.

The current authoritative source is MikroTik's new manual, which publishes a
machine-readable copy of every page:

  - https://manual.mikrotik.com/llms.txt       index of every page (start here)
  - https://manual.mikrotik.com/llms-full.txt  the whole manual in one file
  - add ".md" to any docs URL                  clean Markdown of that page
       e.g. https://manual.mikrotik.com/docs/introduction.md
  - https://manual.mikrotik.com/docs/cli-reference/   command / menu / argument
       reference, increasingly generated from RouterOS itself

For every RouterOS question:
  1. Fetch /llms.txt and pick the relevant page(s).
  2. Fetch the ".md" version of those page(s) for the actual content.
  3. For exact command paths, property names, argument types and enum values,
     check the CLI Reference and match what it defines — do not invent flags.
  4. Cite the page(s) you used.
  5. If you could not fetch the docs, say so and answer from memory only with a
     warning — never present unverified syntax as fact.

B — Per-question snippet (when you can't set a system prompt)

Before answering, fetch https://manual.mikrotik.com/llms.txt, find the page that
matches my question, and read its ".md" version (append .md to the URL). For
exact command/property/enum syntax check https://manual.mikrotik.com/docs/cli-reference/
and don't invent properties. Assume RouterOS v7, cite the page you used, and if
you can't reach the docs say so instead of guessing — your RouterOS training data
is mostly old v6 wiki content.

My question: <your question here>

C — No web access (offline, local model, or privacy-sensitive)

The prompt can't help a model that can't browse — feed it the docs yourself:

# whole manual as one file (large) — paste the relevant part as context
curl -L https://manual.mikrotik.com/llms-full.txt -o routeros-manual.txt

# or just the page you need, as clean Markdown
curl -L https://manual.mikrotik.com/docs/<path>.md

Paste the relevant section into the chat before your question. The same
llms-full.txt is also what you'd index for a local RAG setup.
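
For picking "the relevant part" out of the downloaded file, a minimal sketch follows. It assumes (not verified against the live file) that each page in llms-full.txt starts with a top-level Markdown heading like `# Title`. The function names and the sample text are made up for illustration.

```python
FENCE = "`" * 3  # Markdown code-fence marker, built here to keep this example self-contained


def split_pages(full_text: str) -> list[tuple[str, str]]:
    """Split a combined manual into (title, body) chunks at top-level '# ' headings.

    Assumption: every page begins with a line like '# Page Title'.
    Lines starting with '# ' inside code fences (script comments) are not headings.
    """
    pages = []
    title, lines = None, []
    in_fence = False
    for line in full_text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        if line.startswith("# ") and not in_fence:
            if title is not None:
                pages.append((title, "\n".join(lines).strip()))
            title, lines = line[2:].strip(), []
        elif title is not None:
            lines.append(line)
    if title is not None:
        pages.append((title, "\n".join(lines).strip()))
    return pages


def pick_pages(pages, keyword: str):
    """Return the pages whose title or body mentions the keyword (case-insensitive)."""
    kw = keyword.lower()
    return [(t, b) for t, b in pages if kw in t.lower() or kw in b.lower()]


# Made-up sample standing in for a slice of routeros-manual.txt
SAMPLE = "\n".join([
    "# VRRP",
    "Virtual Router Redundancy Protocol.",
    FENCE,
    "# a comment inside a code fence, not a heading",
    "/interface/vrrp/add name=vrrp1 interface=ether1",
    FENCE,
    "# Bridge",
    "Bridging and VLAN filtering.",
])

for title, body in pick_pages(split_pages(SAMPLE), "vlan"):
    print(title)
```

With the real file you would read `routeros-manual.txt` instead of `SAMPLE` and paste the selected bodies into the chat.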

D — Agentic / MCP tools (where this really pays off)

Tools that fetch on a loop — Claude Code/Desktop, Cursor, Codex, Perplexity,
or any custom agent — can run the whole llms.txt → .md → cli-reference
workflow themselves, no re-pasting. Option A as the system prompt + any generic
"fetch"/web tool is usually enough.

If you'd rather not have the model crawl HTML at all, there's a ready-made
RouterOS-docs MCP I maintain, rosetta (https://github.com/tikoci/rosetta),
that exposes RouterOS docs, the full command tree across many versions,
changelogs, and hardware specs as MCP search tools:

bunx @tikoci/rosetta --setup     # local stdio MCP for Claude Code/Desktop, Cursor, Codex…

It even installs on the router via /app (RouterOS 7.22+), so any AI assistant
on your LAN can query it. (Honest note: rosetta still indexes the prior doc export
and is mid-migration to manual.mikrotik.com — but the command tree and version
data come straight from the router.)

One trick: verify it actually read the docs

Ask: "Which manual.mikrotik.com page did you read, and quote its first line."
If it can't name a real page or quote it, it answered from memory — treat the
syntax as unverified.
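
If you have the page's .md yourself, the same check can be scripted. A minimal sketch, assuming the page's first non-empty line is its title or heading (no front matter block at the top); the function names are hypothetical, and the comparison ignores case, extra whitespace, and leading `#` heading markers.

```python
def first_content_line(markdown: str) -> str:
    """Return the first non-empty line, without leading '#' heading markers."""
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped:
            return stripped.lstrip("#").strip()
    return ""


def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def quote_matches_first_line(markdown: str, quoted: str) -> bool:
    """True if the model's quoted line equals the page's first content line."""
    expected = normalize(first_content_line(markdown))
    got = normalize(quoted.strip().strip('"').lstrip("#"))
    return bool(expected) and expected == got


# Made-up page text standing in for a downloaded .md file
page = "\n# Introduction\n\nThis manual describes RouterOS.\n"
print(quote_matches_first_line(page, '"# Introduction"'))
print(quote_matches_first_line(page, "RouterOS overview"))
```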

TL;DR

The single highest-value instruction in all of the above is "check the CLI Reference and don't invent properties."

Most RouterOS hallucinations are at the property/path/enum level, and that's exactly what the CLI Reference pins down.

Amm0 · June 4, 2026, 3:04pm

Original Background Research - loosely incorporated into above

The following are the results from querying various AI agents for "their thoughts" on good prompts for RouterOS that use the new doc site https://manual.mikrotik.com

Prompt

i'm writing a post for forum.mikrotik.com about how to better prompt a LLM like you to use MikroTik's new doc site for RouterOS. can you suggest what prompt I should suggest that steer an LLM to checking MikroTik new documentation site that use Docusaurus with an llm.txt and .md renderings. push back if NOT helpful but my believe is training is light on MikroTik RouterOS config/troubleshooting. the new doc site has a explaination here: Introduction | RouterOS Manual and there is also a new CLI reference that knows the RouterOS command schema: CLI Reference | RouterOS Manual

Claude Sonnet 4.6 (high thinking)

Good instinct, and your premise is correct — I'll validate it before diving in. Let me explain the actual issue, then give you a well-crafted prompt to share.

Pushback / validation first

You're right that training is light on RouterOS, but the more precise problem is stale + conflicting training data. LLMs almost certainly have:

Heavy coverage of the old wiki.mikrotik.com (no longer updated, predominantly v6 syntax)
Some coverage of help.mikrotik.com/docs (the Confluence-era site)
Zero or negligible coverage of the new manual.mikrotik.com (Docusaurus site, clearly post-training)

The result is that LLMs confidently produce v6 syntax for v7 users, because the old wiki was voluminous, well-indexed, and widely mirrored. It's not just light training — it's actively misleading training. This is worth calling out in your forum post, because it explains why the fix works.

The good news: the whole MikroTik manual is published in plain formats specifically so it can be read by retrieval pipelines, assistants, and other automated tools — not only in a browser. The recommended pattern is to read /llms.txt first to find the relevant pages, then fetch the individual .md pages needed for detail. When the complete corpus is needed in one piece, /llms-full.txt works instead.

The CLI Reference section is generated automatically from the RouterOS system itself rather than written by hand — the menus, commands, and argument types are extracted from the software, so the reference tracks what the system actually exposes. This makes it uniquely valuable for grounding LLMs in exact current syntax.

Key points worth making in your forum post

The three MikroTik doc sites (wiki.mikrotik.com, help.mikrotik.com/docs, manual.mikrotik.com) coexist and LLMs get confused between them — explicitly telling the LLM which one is current matters
The .md URL trick works on every page of the new site and gives much cleaner input than the rendered HTML
The CLI Reference being machine-generated is a significant quality signal — it's not a human trying to document a command, it's the schema of the software itself
LLMs with web search (ChatGPT browsing, Claude.ai, Perplexity) can execute this workflow automatically; users without it should use llms-full.txt as a paste-in context block

Gemini (Flash Extended)

Your belief is 100% correct—there is absolutely no reason to push back. Standard LLM training datasets are notoriously unreliable when it comes to MikroTik RouterOS. LLMs frequently hallucinate script syntax, conflate obsolete RouterOS v6 commands with the newer v7 architecture (especially regarding the routing engine, bridge VLAN filtering, and the newer wifi wave2/WiFi 7 parameters), and miss strict data-type constraints.

MikroTik’s brand-new addition of machine-readable endpoints (/llms.txt, /llms-full.txt, and raw Markdown pages) is a massive win for the community. Because web-enabled AI models (like Gemini, ChatGPT, or Claude) can browse in real time, forcing them to pull from these endpoints completely bypasses their outdated or shallow training data.

Here is a breakdown of how to structure your forum post, along with the exact prompt templates you can suggest to the community.

Key Technical Points to Highlight in Your Forum Post

When explaining to the forum why this works and how to use it, you should highlight a few critical facts about MikroTik's new setup:

The Power of the llms.txt Standard: Explain that /llms.txt and /llms-full.txt are part of a rapidly adopting standard specifically designed to give AI models a clean, noise-free site map of Markdown files. It prevents the model from getting lost in standard HTML UI elements (sidebars, footers, scripts) and provides direct links to the information it needs.
The Auto-Generated CLI Reference: Emphasize that the new cli-reference section isn't written by hand; it is compiled directly from the RouterOS software schema itself. Telling the LLM to check this specific path means it will look up actual exposed binaries and arguments, drastically reducing the hallucination of non-existent flags.
Strict Parameter Matching: Mention that the new docs explicitly define variable types (like iface_enum or ipPrefix). Instructing the LLM to observe these argument definitions prevents it from generating malformed strings in scripts.
Keeping Pace with Rapid v7 Development: With features like Wi-Fi 7 Multi-Link Operation (MLO), container configurations, and SOCKS user authentication changing frequently, an LLM's static training cutoff is instantly useless. Real-time browsing of the new docs fixes this.

ChatGPT (web/"free account", model unknown)

I think your instinct is correct, but I'd frame it a little differently.

The problem is not necessarily that an LLM was trained on too little RouterOS. The bigger problem is that RouterOS changes rapidly, MikroTik's old wiki/help sites are still heavily indexed, and many models have seen years of forum posts, old wiki pages, and RouterOS v6-era examples. Without guidance, an LLM may mix:

old wiki content,
Confluence-era help.mikrotik.com pages,
forum snippets,
RouterOS v6 syntax,
RouterOS v7 syntax,
and even completely invented CLI properties.

The new Docusaurus-based documentation is valuable because it provides a canonical, machine-readable source, including Markdown renderings and CLI schema documentation. The CLI reference is especially interesting because it gives an AI a structured source for command paths, arguments, types, enums, and property descriptions. (manual.mikrotik.com)

For a forum post, I'd recommend something like:

Background and Grounding on Recommendation for Steering AI to use new manual.mikrotik.com (from Opus 4.8 xhigh)

Some background on why the prompts above are shaped the way they are. I had
Claude (Opus) cross-check the three agent answers in post #2 against the live
site and correct what they got wrong — notes below.

The real problem isn't "too little training" — it's stale, conflicting training

LLMs haven't seen too little RouterOS; they've seen too much of the wrong
RouterOS. Their data is dominated by:

wiki.mikrotik.com — voluminous, heavily mirrored, mostly v6, no longer updated
the retired Confluence help.mikrotik.com site
~20 years of forum posts mixing v6 and v7

The new Docusaurus site (manual.mikrotik.com) is almost certainly after most
models' training cutoff. So the failure mode isn't "I don't know" — it's
confident v6-for-v7 answers and invented properties. That's why the prompts lead
with "don't trust your memory" and name the old sites explicitly: you're
overriding bad priors, not filling a blank.

What the new site changed (and why it matters to an LLM)

MikroTik now emits a machine-readable copy of the docs on every build
(via the docusaurus-plugin-llms plugin):

/llms.txt — an index of all ~558 pages as [Title](….md): description
links, following the llmstxt.org convention. It's a
clean, noise-free map so the model finds the right page without wading through
HTML nav/sidebars/scripts.
per-page .md — append .md to any docs URL for the raw Markdown
(e.g. …/docs/introduction.md). Cheaper to read and far less ambiguous than
rendered HTML.
/llms-full.txt — the entire manual in one file, for bulk ingest / RAG / offline.
robots.txt now explicitly welcomes the AI crawlers (ClaudeBot, GPTBot,
PerplexityBot, Google-Extended, CCBot…) — so over time this content should
even reach future training runs, not just live browsing.
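
A minimal sketch of how a script or agent might consume that index, assuming entries have the `[Title](….md): description` shape described above. The sample index, the `hypothetical/vrrp` path, and the function names are made up, and dropping a trailing slash before appending `.md` is an assumption, not something confirmed against the site. Nothing here touches the network.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Matches "[Title](url): description"; the description part is optional
LINK_RE = re.compile(r"\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?")


def parse_index(llms_txt: str) -> list[dict]:
    """Extract title/url/description entries from llms.txt-style text."""
    entries = []
    for line in llms_txt.splitlines():
        m = LINK_RE.search(line)
        if m:
            entries.append({"title": m["title"], "url": m["url"],
                            "desc": (m["desc"] or "").strip()})
    return entries


def find_pages(entries: list[dict], question: str) -> list[dict]:
    """Rank entries by how many question words appear in the title or description."""
    words = {w for w in re.findall(r"[a-z0-9]+", question.lower()) if len(w) > 2}
    scored = []
    for entry in entries:
        text = (entry["title"] + " " + entry["desc"]).lower()
        score = len(words & set(re.findall(r"[a-z0-9]+", text)))
        if score:
            scored.append((score, entry))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored]


def to_markdown_url(docs_url: str) -> str:
    """Turn a docs page URL into its '.md' form (drops query, fragment, trailing slash)."""
    parts = urlsplit(docs_url)
    path = parts.path.rstrip("/")
    if not path.endswith(".md"):
        path += ".md"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


# Made-up index; only the introduction URL comes from the post above
SAMPLE_INDEX = """# RouterOS Manual

- [Introduction](https://manual.mikrotik.com/docs/introduction.md): What the manual covers
- [VRRP](https://manual.mikrotik.com/docs/hypothetical/vrrp.md): Virtual Router Redundancy Protocol configuration
"""

for entry in find_pages(parse_index(SAMPLE_INDEX), "How do I configure VRRP?"):
    print(entry["title"], entry["url"])
```

Keyword overlap is a deliberately crude ranking; an LLM reading the index would pick pages by meaning, but the parse-then-fetch order is the same.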

The CLI Reference is the highest-value target

Most RouterOS LLM mistakes are property-level: wrong menu path, wrong property
name, a v6-only property, an enum typo, or a flag that simply doesn't exist. The CLI Reference documents exact menus, properties, argument types and enums — the
structured shape of the commands — which is precisely what models get wrong from
memory. Hence the prompt line "check the CLI Reference and don't invent properties."
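
To make "don't invent properties" concrete, here is a sketch of checking a proposed command against a property list. `KNOWN_PROPERTIES` is a tiny hand-written stand-in, not data pulled from the CLI Reference; in practice you would fill it from the reference page for that menu. The function name is hypothetical, and `shlex` only approximates RouterOS quoting for simple cases.

```python
import shlex

# Hand-written stand-in for what the CLI Reference would define for one menu
KNOWN_PROPERTIES = {
    "/interface/bridge": {"name", "vlan-filtering", "protocol-mode", "comment", "disabled"},
}


def unknown_properties(menu: str, arguments: str) -> list[str]:
    """Return property names used in 'arguments' that the reference data does not list."""
    allowed = KNOWN_PROPERTIES.get(menu)
    if allowed is None:
        raise KeyError(f"no reference data loaded for {menu}")
    unknown = []
    for token in shlex.split(arguments):
        if "=" not in token:
            continue  # skip the verb ("add", "set") and positional values
        key = token.split("=", 1)[0]
        if key not in allowed:
            unknown.append(key)
    return unknown


# "vlan-aware" is a plausible-sounding property that the stand-in list does not contain
print(unknown_properties("/interface/bridge", "add name=br1 vlan-aware=yes"))
```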

One honest correction to the post-#2 answers: a couple of agents called the CLI
Reference "machine-generated." MikroTik's own words are more measured — auto-updating
the command reference from RouterOS source is "partially started in the CLI
reference section" (@normis). So: increasingly authoritative, not yet fully generated. Worth checking, not yet blindly trusting for the newest features.

Honest limits (why post #1 leads with a capability table)

A prompt cannot make a non-browsing model fetch a page. For plain chat
windows and local models it only induces caution. That's what Option C (paste-in / llms-full.txt) is for.
Web-browsing and agentic tools (ChatGPT browsing, Claude.ai/Code, Cursor,
Perplexity, Copilot) are where the endpoints genuinely shine — they can run llms.txt → .md → cli-reference on their own.
Always have the model cite and quote the page it read. If it can't, it didn't read it.

Small corrections to the agent answers in post #2

It's /llms.txt (plural) and /llms-full.txt — not "llm.txt". The file and the standard are both plural.
Don't hard-code specific argument-type names in your prompt (one agent guessed iface_enum / ipPrefix). The real page defines its own set (address, bool, enum, ipAddr, ipv6Prefix, …). Telling the model to "match the types and enums the CLI Reference defines" is safer than naming them.

Net: the supply side is now excellent. These prompts are just about reliably pointing the model at it — and knowing when your tool can't follow.

normis · June 5, 2026, 9:00am

Good job on the MCP server, looks awesome! Let us know if you need other data or in some better format

infabo · June 5, 2026, 11:11am

@Amm0 Do you know https://www.skills.sh/?

Amm0 · June 5, 2026, 11:33am

Do you know https://www.skills.sh/?

No. But I keep some SKILL.md files here: https://github.com/tikoci/routeros-skills which are published to https://skillstore.io/skills?q=routeros&tools=claude and, I believe, you would use skills.sh to install them using the skills.sh scheme:

npx skills add https://github.com/tikoci/routeros-skills

The skills are far from perfect, and not all well-structured... but they capture a few repeated AI mistakes I've seen. And I have tried to make sure they're "grounded" by having agents use https://github.com/tikoci/quickchr to validate things against a "real" router... so likely not "wrong" but may be incomplete.

Amm0 · June 6, 2026, 8:21pm

Thanks! Although I do believe that prompting an LLM agent to use Docusaurus llms.txt/*.md would already do a LOT, with no MCP required. So the new manual.mikrotik.com site should go a long way, since Docusaurus seems to have WAY better architecture than Confluence for ongoing improvements. Rosetta MCP's main special ability beyond "steering to Docusaurus" is now just its indexing/search, which would be hard for manual.mikrotik.com to match, as well as having all the products/changelogs/videos linked in one search ("tool call").

Using Rosetta as /app: https://github.com/tikoci/rosetta#install-on-mikrotik-app

And it's worth commending MikroTik on the new custom /app YAML that makes it trivial to install the Rosetta MCP (and a bunch of other minor improvements, like multiline strings in scripting, so cut-and-pasting an /app/add just works). So outside the package/device-mode hell dance, it's just 3 steps, see https://github.com/tikoci/rosetta .

This is important since adding an MCP server varies a lot between tools. But the most "universal" method is to give AI agents like Claude Code, Copilot, Cursor, etc. a URL to an MCP server (like Rosetta). Since /app will wrap Rosetta in HTTPS automatically, it's also secure without any work on my part. Before /app, using the "raw" /container system, Rosetta setup would take something like 10-20 steps to explain, plus another 10-20 to set up certificates for HTTPS. With /app it's 5 commands, 1 reboot, and 1 power cycle at worst.

Further, the new cmd= support allows you to use the Rosetta /app as a TUI from the RouterOS Terminal. Perhaps surprisingly... all the ANSI/termcap stuff works through it. For background, Rosetta also supports a "TUI mode" that does NOT need AI, since it just exposes searching of the SQLite DB that stores all the docs/changelogs/schemas/specs/etc. Its UX is entirely based on the SAME methods that an AI agent uses to navigate the data (although an agent gets JSON, while the TUI gets ANSI). And this can be easily accessed using one command (assuming the Rosetta /app is installed):

/container/shell app-rosetta cmd="/app/rosetta browse"

That gets you a search interface to the RouterOS docs from the WinBox/SSH/etc. terminal.

Basically I did not run into any issues in packaging Rosetta as a /app. So good work on /app, too!

@rodions was already very helpful by publishing the HTML export of Confluence, which is the basis of the rosetta MCP. But since you ask...

1. My #1 request would be a new `matrix.json` that expands the existing product `matrix.csv` that y'all have long maintained on www.mikrotik.com.

Today, rosetta's "ETL pipeline" has to scrape www.mikrotik.com for the product data CSV, parse EACH product page's HTML to find the ethernet and IPSec test results, and also locate the block diagram URL in the HTML content.

It would help if a process on your www backend generated a machine-readable JSON file like matrix.json (or whatever name) with ALL of the product details from matrix.csv, PLUS the test results as a JSON array of objects under each product, along with a link to the PNG with the block diagram (and perhaps a link to the PDF brochure for completeness, although rosetta does not use the brochure).

Alternative, and trivial, work: the new website lost the direct download link to the matrix.csv, so rosetta has to emulate your LiveWire framework to fetch it (since the CSV is generated in client-side JS on the new website). Just a static URL like http://mt.lv/products/matrix.csv would help me. And one line mentioning the product matrix CSV in your new llms.txt on manual.mikrotik.com would give an AI agent a clue where it could resolve a product name into its CPU/switch-chip/#port/PoE-abilities/etc. Even without the rosetta MCP, that would likely produce better answers if a prompt/instruction steered the model toward the llms.txt.
e.g.: The product data allows AI prompts that mention a device to be resolved to its specs. For example, a user prompting an LLM with the rosetta MCP installed can ask things like "Does the RB5009 support L3HW?", which looks up the product from SQL (which then returns the matrix.csv data, "test results", and even the "block diagram" [since some LLMs can "read" an image]) and queries the docs for L3HW. This is important since the docs reference the switch chip, but a user is likely to use a router name/model. Allowing an AI to "reason over" the L3HW docs page together with the router spec is way more likely to get the right answer, and to answer a likely follow-on question: "What is the cheapest device with L3HW that has at least 8 ports with 1Gb port speed?"

2. Downloadable "Captions"/subtitles and "Chapters" `.vtt` files for the "YouTube" videos

Rosetta actually also pulls in the @MikroTik YouTube channel, so the MCP knows the full collection of videos. Since it indexes the captions and "surfaces" the chapters to MCP clients, the "main" MCP function routeros_search() will also return any video results, so an agent can offer a video, e.g. if the task is more of a question than a request for configuration.

Today the transcripts/captions/subtitles from YouTube are collected by Rosetta's build using the yt-dlp scraper against the @MikroTik channel to find your videos, and are then added to the same full-text index (FTS5), so a "topic search" will return videos that mention VRRP (with ordering determined by BM25 ranking).
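
The FTS5/BM25 lookup described above can be illustrated with Python's built-in sqlite3 module. This is a sketch, not Rosetta's actual schema: the table name, column names, and sample transcripts are made up, and it assumes the local SQLite build includes the FTS5 extension.

```python
import sqlite3

# Made-up transcript snippets: (video title, transcript text)
ROWS = [
    ("VRRP basics", "vrrp gives two routers one virtual address and vrrp priority decides the master"),
    ("7.20 changelog", "this release also fixes a vrrp issue along with many bridge wifi and container changes"),
    ("WireGuard setup", "create a wireguard interface and add peers"),
    ("Bridge VLAN filtering", "enable vlan filtering on the bridge and tag ports"),
    ("Containers", "run a container from an image stored on disk"),
]


def build_index(rows):
    """Create an in-memory FTS5 table and load the transcripts into it."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE VIRTUAL TABLE transcripts USING fts5(title, body)")
    db.executemany("INSERT INTO transcripts(title, body) VALUES (?, ?)", rows)
    return db


def search(db, query: str, limit: int = 5) -> list[str]:
    """Return matching video titles, best match first.

    bm25() returns lower values for better matches, so ascending order ranks best first.
    """
    cur = db.execute(
        "SELECT title FROM transcripts WHERE transcripts MATCH ? "
        "ORDER BY bm25(transcripts) LIMIT ?",
        (query, limit),
    )
    return [row[0] for row in cur]


print(search(build_index(ROWS), "vrrp"))
```

The changelog video still shows up for "vrrp" even though VRRP is not in its title, which is the behavior the transcripts are indexed for.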

But scraping YouTube is very fragile, so it's a manual process, since yt-dlp needs a cookie etc. to run. I would have used MikroTik's TikTube site, since the underlying PeerTube seems to have a nice REST API that Rosetta could use to fetch caption and subtitle VTT files from GitHub CI. That would be WAY cleaner than yt-dlp schemes that require me to run them on a desktop (so Rosetta may be a few videos behind YouTube).

The issue is that the YouTube catalog is NOT synced to TikTube... so only YouTube has the full set of videos. Now, I don't specifically care about TikTube, although to me it makes sense for them to be in sync... But if you had the captions/chapters/subtitles in VTT form someplace else that is available for download... that would work equally as well as PeerTube's REST API.

How video works in the MCP server... The Rosetta non-AI terminal interface ("TUI") emulates an MCP client (i.e. how an AI agent would call the Rosetta MCP server), so a human can "see" the same view as an agent. A search on "VRRP" (for example) would find the one VRRP video from routeros_search() with a pointer to it, and it breaks the response up by chapters, so it has a time-based URL to sections within the video. This all works since the transcripts are fully indexed, so it can also find related videos like the "7.20 changelog" videos that mention VRRP changes. e.g.

[Screenshot of the Rosetta TUI search results]
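
The chapter handling can be sketched as follows. This assumes chapters arrive with WebVTT-style `HH:MM:SS.mmm` start times and uses YouTube's `t=<seconds>s` URL parameter; the video ID and chapter list are placeholders, and this is not Rosetta's actual code.

```python
def vtt_to_seconds(timestamp: str) -> int:
    """Convert 'HH:MM:SS.mmm' or 'MM:SS.mmm' to whole seconds (the fraction is dropped)."""
    clock = timestamp.split(".")[0]
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def chapter_links(video_id: str, chapters: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Map (start_time, title) chapters to (title, time-based URL) pairs."""
    return [
        (title, f"https://www.youtube.com/watch?v={video_id}&t={vtt_to_seconds(start)}s")
        for start, title in chapters
    ]


# Placeholder video ID and chapters
for title, url in chapter_links("VIDEO_ID", [("00:00:00.000", "Intro"),
                                             ("00:02:03.500", "VRRP changes")]):
    print(title, url)
```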

3. Minor, and mentioned in the FEEDBACK thread, but manual.mikrotik.com should avoid any URL needing URL encoding, since URL encoding is avoidable in Docusaurus by setting an id in the front matter.

While Rosetta should not care about URL encoding when it eventually pulls from the new manual.mikrotik.com... by avoiding URL encoding, you prevent an entire class of bugs/issues for any programmatic/agentic use of the doc pages, since it's always possible that something re-encodes or early-decodes special chars, given that a URL may pass through many functions/processes (and if even one thing f*cks up the encoding along the way, the URL becomes worthless downstream).
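
A short illustration of that bug class using Python's standard `urllib.parse`. The two page names are invented for the example; the point is only that a segment with spaces changes every time it is encoded, while a kebab-case segment does not.

```python
from urllib.parse import quote, unquote


def encode_once(segment: str) -> str:
    """Percent-encode one URL path segment."""
    return quote(segment, safe="")


def survives_double_encoding(segment: str) -> bool:
    """Simulate a pipeline that encodes twice but decodes once, then compare to the original."""
    twice = encode_once(encode_once(segment))
    return unquote(twice) == segment


title_style = "Bridge VLAN Table"   # invented page name that needs encoding
kebab_style = "bridge-vlan-table"   # invented kebab-case equivalent

print(encode_once(title_style))                 # space becomes %20
print(encode_once(encode_once(title_style)))    # the % itself becomes %25
print(survives_double_encoding(title_style))
print(survives_double_encoding(kebab_style))
```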

4. Docs and schemas for `/app` YAML

Despite writing an article on custom /app YAML, publishing a "JSON schema" for /app YAML, storing per-version catalogs of the "built-in" /app catalog, and even having a web-based /app YAML editor:

Yet, Rosetta MCP itself actually knows nothing about /app YAML. Part of this is lack of time, and part is that, as a first principle, Rosetta ONLY sources MikroTik-generated content WITHOUT personal opinions/coloring. So I'm waiting for y'all to update the docs and perhaps publish an official schema for the /app YAML.

Side note: the /app YAML Editor supports Google's experimental WebMCP interface for /app YAML, but WebMCP is a completely different protocol from the "classic" MCP server supported by Rosetta.

normis · June 8, 2026, 2:06pm

Did you know we have a product API? Why not use that? It's in your mikrotik account in the profile info page, bottom
we don't upload / write captions always, so harder to do
We did already fix the urlencoding for docusaurus, all URLs should be clean kebab-case.

Amm0 · June 8, 2026, 3:14pm

Nope, didn't know that. But I don't see it in my account. Maybe it's something that has to be enabled, like Branding Kit?

Possible I don't understand "profile info page, bottom"... but even "account information" subpage does not have any API thing, other than "Account authorization key" (now IDK what that does or where to use it either...)

But does the "Product API" have the "ethernet/IPSec test results"? IMO the test results are really helpful when planning/purchasing a device (i.e. Rosetta MCP provides the 512 byte results on a device lookup, with the full results for all packet size available on 2nd tool call).

Currently, my code has to scrape the www site to find test results, and that's the weakest part, since there are some irregularities in how test results are presented. Having them as CSV or JSON would allow tools to use them more directly.

And, side note, perhaps the "mikrotik account" (which I presume is www.mikrotik.com/user) needs a doc page

It's actually the chapters that aren't hard to do... But I think new videos have been pretty good about using them (although have not "audited" that).

But it's often true that the videos have some factoid or "color" that is actually not in the docs, which is why the transcripts are useful.

Perhaps a doc page that lists your video catalog? Maybe some Docusaurus plugin that generates it... even just the list and chapters. And there are likely some videos that are NOT referenced in the docs under their topic. Basically there is no "YouTube SDK" for third parties, so pulling video metadata from Google requires scraping, but as the account owner you can export the list/etc.

Usually it's not hard to keep up on y'all's improvements, but that was quick work.

normis · June 9, 2026, 4:12am

Try these:

API service for all product parameters: https://api.mikrotik.com/parameters?apiKey=03e64c40-2f1a-44bd-b03f-f2bcf8530d53-976715c0-9518-4e4e-906e-5ad78bfa8fbc
API service for one product: https://api.mikrotik.com/parameters?apiKey=03e64c40-2f1a-44bd-b03f-f2bcf8530d53-976715c0-9518-4e4e-906e-5ad78bfa8fbc&product=344

Amm0 · June 9, 2026, 6:22pm

Thanks, I'll take a look at those more. One question: is the numerical id in product=344 idempotent ("number is always same for the same product"), or is it merely an index into the last query of all records?

And... still does not have the "ethernet test results" in any machine-readable format... which I do think is very useful for comparing products.

normis · June 10, 2026, 11:55am

it's always the same, but better get the ID from the big list anyway

Amm0 · June 17, 2026, 1:53pm

Tried another quick cross-model test using my Rosetta MCP in Claude Code (with an Anthropic subscription) as well as Gemini (set to "Pro" on the free plan) and ChatGPT (using its default free site), based on the 7.24beta2 release note:

The obvious question: which actual product models does that list of switch chips resolve to?

So I prompted Claude Opus 4.8, which has a ton of pre-loaded RouterOS stuff, including my Rosetta MCP doc tool. And then tried "steering" the Gemini and ChatGPT websites (logged in, but on the "free plan" for both).

Claude Code answer...

Cross-checked. I pulled the full devices.csv catalog and matched it against
the doc-derived mapping. Here's how they line up:

CSV cross-check vs. doc mapping

┌───────────┬──────────────────────────────┬──────────────┬──────────────┐
│    Chip   │                              │     In       │              │
│  (release │       Model(s) from doc      │ devices.csv? │ CSV cpu field│
│    note)  │                              │              │              │
├───────────┼──────────────────────────────┼──────────────┼──────────────┤
│           │                              │              │ 98DX8208 ✅  │
│ 98DX8208  │ CRS309-1G-8S+IN              │ ✅           │ direct chip  │
│           │                              │              │ match        │
├───────────┼──────────────────────────────┼──────────────┼──────────────┤
│           │                              │              │ 98DX8216 ✅  │
│ 98DX8216  │ CRS317-1G-16S+RM             │ ✅           │ direct chip  │
│           │                              │              │ match        │
├───────────┼──────────────────────────────┼──────────────┼──────────────┤
│ 98DX8212  │ CRS312-4C+8XG-RM             │ ✅           │ QCA9531      │
│           │                              │              │ (mgmt CPU)   │
├───────────┼──────────────────────────────┼──────────────┼──────────────┤
│ 98DX8332  │ CRS326-24S+2Q+RM ·           │ ✅ ✅        │ QCA9531      │
│           │ CRS326-4C+20G+2Q+RM          │              │ (both)       │
├───────────┼──────────────────────────────┼──────────────┼──────────────┤
│ 98DX3257  │ CRS354-48G-4S+2Q+RM ·        │ ✅ ✅        │ QCA9531      │
│           │ CRS354-48P-4S+2Q+RM          │              │ (both)       │
├───────────┼──────────────────────────────┼──────────────┼──────────────┤
│           │ CRS504-4XQ-IN ·              │              │ QCA9531      │
│ 98DX4310  │ CRS504-4XQ-OUT ·             │ ✅ ✅ ✅ ✅  │ (CRS5xx) /   │
│           │ CRS510-8XS-2XQ-IN ·          │              │ AL73400      │
│           │ RDS2216-2XG-4S+4XS-2XQ       │              │ (RDS2216)    │
├───────────┼──────────────────────────────┼──────────────┼──────────────┤
│ 98DX8525  │ CCR2216-1G-12XS-2XQ ·        │ ✅ ✅        │ AL73400 /    │
│           │ CRS518-16XS-2XQ-RM           │              │ QCA9531      │
├───────────┼──────────────────────────────┼──────────────┼──────────────┤
│ 98CX8410  │ CRS520-4XS-16XQ-RM           │ ✅           │ AL52400      │
├───────────┼──────────────────────────────┼──────────────┼──────────────┤
│ 98DX3255  │ CCR2116-12G-4S+              │ ✅           │ AL73400      │
└───────────┴──────────────────────────────┴──────────────┴──────────────┘

Result: the answer aligns

All 14 product SKUs from the doc-based mapping exist as real products in
devices.csv — every model name checks out.

The one nuance worth calling out: the CSV's cpu column only stores the Marvell
switch ASIC for the cases where the switch chip is the main processor — the
older ARM 32-bit CRS3xx boxes (CRS309→98DX8208, CRS317→98DX8216). For every
other product, the cpu field records the management/control CPU (QCA9531,
AL73400, AL52400), and the switch ASIC isn't stored at all. That's exactly
why:

A direct CSV/device_lookup query on the chip ID only ever resolves 2 of 9
chips (8208, 8216).
The other 7 chips return nothing in a chip-string search — not because the
products are missing, but because the catalog doesn't carry the switch-ASIC
field for them.

So the two data sources are consistent: the L3HW doc table is the
authoritative chip→model source (it's the only place the switch ASIC is
recorded for QCA9531/AL-managed boxes), and the devices.csv confirms every
resolved model is a genuine SKU. No contradictions, nothing to revise in the
earlier answer.

One small data-quality note for the rosetta dataset itself: if you wanted
chip-ID lookups to work directly for all these switches, the devices table
would need a separate switch_chip column distinct from cpu — right now those
two facts are conflated into one field.
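
A minimal sketch of that suggestion, assuming a hypothetical `devices` table (the table and column names are illustrative, not the actual rosetta schema). The three rows use only the chip/CPU pairings quoted in this thread:

```python
import sqlite3

# Hypothetical schema: the management CPU and the switch ASIC are separate columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE devices (model TEXT PRIMARY KEY, cpu TEXT, switch_chip TEXT)"
)
conn.executemany(
    "INSERT INTO devices VALUES (?, ?, ?)",
    [
        # Switch ASIC is also the main processor, so both columns match.
        ("CRS309-1G-8S+IN", "98DX8208", "98DX8208"),
        # Separate management CPU plus a switch ASIC.
        ("CRS312-4C+8XG-RM", "QCA9531", "98DX8212"),
        ("CCR2116-12G-4S+", "AL73400", "98DX3255"),
    ],
)


def models_for_chip(chip: str) -> list[str]:
    """Return every model whose switch ASIC matches, regardless of its CPU."""
    rows = conn.execute(
        "SELECT model FROM devices WHERE switch_chip = ? ORDER BY model", (chip,)
    )
    return [row[0] for row in rows]
```

With the two facts split, `models_for_chip("98DX8212")` resolves the CRS312 even though its `cpu` value is QCA9531.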

ChatGPT combined the results in a quick summary:

Short answer: Claude's answer is the strongest, Gemini's is nearly correct, and my answer was functionally equivalent to Claude's for the original question.

The key distinction is what question was being answered.

If the question is:

"What specific MikroTik products correspond to the switch ASICs listed in the 7.24beta2 release note?"

Then the canonical answer is:

ASIC	Products
98DX8208	CRS309-1G-8S+IN
98DX8216	CRS317-1G-16S+RM
98DX8212	CRS312-4C+8XG-RM
98DX8332	CRS326-24S+2Q+RM, CRS326-4C+20G+2Q+RM
98DX3257	CRS354-48G-4S+2Q+RM, CRS354-48P-4S+2Q+RM
98DX4310	CRS504-4XQ-IN, CRS504-4XQ-OUT, CRS510-8XS-2XQ-IN, RDS2216-2XG-4S+4XS-2XQ
98DX8525	CCR2216-1G-12XS-2XQ, CRS518-16XS-2XQ-RM
98DX3255	CCR2116-12G-4S+
98CX8410	CRS520-4XS-16XQ-RM

Claude, your Rosetta MCP, and my answer all agree on that table.

Where Gemini is slightly weaker

Gemini says:

"98DX3257 – CRS354-48G-4S+2Q+RM (and potentially its PoE variant CRS354-48P-4S+2Q+RM)"

The word "potentially" is unnecessary.

The MikroTik L3HW documentation explicitly associates the 98DX3257 family with both CRS354 variants. So Claude's answer is more precise here.

Gemini also omitted:

CRS504-4XQ-IN
CRS518-16XS-2XQ-RM
RDS2216-2XG-4S+4XS-2XQ

Those omissions matter because the release note names ASICs, not product families.

Where Claude goes beyond the original question

Claude's table includes a second level of validation:

cross-checking against devices.csv
noting when the ASIC appears in the CPU column versus being a companion switch ASIC
identifying management CPUs (QCA9531, AL73400, etc.)

That's useful metadata, but it isn't actually required to answer the release note.

For example:

CRS309 uses the 98DX8208 as the main SoC.
CRS312 uses a QCA9531 management CPU plus a 98DX8212 switch ASIC.

That distinction explains why some MikroTik inventories can be confusing, especially when you're mining devices.csv.

One thing Claude caught that many models miss

The release note says:

added HW offloaded VRF support on ... switches

Yet the list contains:

CCR2116 (98DX3255)
CCR2216 (98DX8525)

Those are routers, not switches.

The only way the release note makes sense is if MikroTik is referring to the Prestera switch ASICs that exist inside those CCR platforms, not the product category itself.

That's exactly how I interpreted it as well.

If I were ranking them

For RouterOS/MikroTik accuracy specifically:

Claude + Rosetta MCP — best answer because it validated against MikroTik-specific structured data and exposed the chip/CPU nuance.
My answer — correct mapping, but lacked the extra validation layer.
Gemini — mostly correct but omitted several products and expressed uncertainty where the docs appear definitive.

What I find more interesting is not the final table—which all three systems mostly agree on—but why Claude got there. The Rosetta MCP appears to have access to a normalized device database (devices.csv) plus documentation lookup, which is exactly the sort of grounding that prevents the classic RouterOS failure mode you've complained about before: mixing switch ASICs, management CPUs, and RouterBOARD product names from old wiki-era training data.

For MikroTik questions, a model with access to:

product.csv / devices.csv
manual.mikrotik.com .md
llms.txt

will almost always beat a foundation model relying on pretraining alone, because the hard part isn't networking knowledge—it's the ever-changing MikroTik hardware matrix.

sirbryan · June 17, 2026, 3:38pm

It would be cool to see a tool where you could plug in the release notes for a particular version (or family of versions, like all of 7.21.x) and have it spit out what actual devices are affected by each release note line item--an expansion of what you've demonstrated here.

Similar to what your exercise was, but with the ability to filter out what's pertinent to either 1) your hardware or 2) your hardware that's running a particular version, or, with even more detail, 3) which of your hardware running a specific version and configuration.

A user could upload their configs (or a supout) to a model, and it would peruse it, compare release notes of the previous version or two, and any follow-up versions, then point out what releases have fixes for their specific hardware/software/configuration.

To then scale that out to a system that monitors your devices, and any time a release comes out, it hashes through all the data and makes recommendations (i.e. 7.24.5 would be best on your 2116's because of X, and 7.21.5 on all of your CRS3xx's because of Y--except for this one, which should move to 7.22.x...).

Or is that what you're working towards?
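
The first step of that idea, matching one release-note line against your own inventory, could be sketched roughly like this. The ASIC table is copied from the mapping earlier in the thread; the function name, the regex, and the sample note text are assumptions for illustration only:

```python
import re

# ASIC -> products, copied from the table earlier in this thread.
ASIC_PRODUCTS = {
    "98DX8208": ["CRS309-1G-8S+IN"],
    "98DX8216": ["CRS317-1G-16S+RM"],
    "98DX8212": ["CRS312-4C+8XG-RM"],
    "98DX8332": ["CRS326-24S+2Q+RM", "CRS326-4C+20G+2Q+RM"],
    "98DX3257": ["CRS354-48G-4S+2Q+RM", "CRS354-48P-4S+2Q+RM"],
    "98DX4310": ["CRS504-4XQ-IN", "CRS504-4XQ-OUT", "CRS510-8XS-2XQ-IN",
                 "RDS2216-2XG-4S+4XS-2XQ"],
    "98DX8525": ["CCR2216-1G-12XS-2XQ", "CRS518-16XS-2XQ-RM"],
    "98DX3255": ["CCR2116-12G-4S+"],
    "98CX8410": ["CRS520-4XS-16XQ-RM"],
}

# Assumed pattern for chip IDs such as 98DX3255 or 98CX8410.
CHIP_RE = re.compile(r"\b98[A-Z]{2}\d{4}\b")


def affected_devices(note_line: str, inventory: list[str]) -> list[str]:
    """Return the inventory models whose switch ASIC is named in the note line."""
    chips = CHIP_RE.findall(note_line)
    hits = {model for chip in chips for model in ASIC_PRODUCTS.get(chip, [])}
    return sorted(model for model in inventory if model in hits)


# Made-up note text, only to exercise the function.
print(affected_devices(
    "placeholder note naming 98DX3255 and 98DX8525",
    ["CCR2116-12G-4S+", "CRS317-1G-16S+RM", "CRS518-16XS-2XQ-RM"],
))
```

Filtering by running version or by configuration would need more inputs (an export or supout), which this sketch does not attempt.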

Amm0 · June 17, 2026, 7:44pm

If you want an agent to know more about actual devices, there is a "hidden" page with a CSV of all products:
https://mikrotik.com/products/matrix
This is a rich table most LLMs can consume, and any agent has a much better chance to "reason" through the new .md docs mentioned in the top post when it has the row from the product CSV. E.g. it knows your model's # of ports, switch chip, arch, etc. So when the AI is reading manual.mikrotik.com, it has YOUR device's CPU/arch/etc. in the current context, which is going to prevent guessing from training.
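
As a rough sketch of "put your device's row in the context": the helper below pulls one row out of CSV text and formats it for pasting into a prompt. The column names (`Product code`, `CPU`, `Switch chip`) and the sample row layout are assumptions; check the header of the CSV you actually download.

```python
import csv
import io


def device_context(csv_text: str, model: str, key: str = "Product code") -> str:
    """Return 'column: value' lines for one model, ready to paste into a prompt."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row.get(key, "").strip().lower() == model.strip().lower():
            # Skip empty cells so the context stays compact.
            return "\n".join(f"{col}: {val}" for col, val in row.items() if val)
    raise KeyError(f"model not found in CSV: {model}")


# Tiny stand-in for the real file; the header names here are hypothetical.
SAMPLE = "Product code,CPU,Switch chip\nCCR2116-12G-4S+,AL73400,98DX3255\n"

print(device_context(SAMPLE, "ccr2116-12g-4s+"))
```

The match is case-insensitive on purpose, since model names get typed in many ways.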

My Rosetta MCP makes this more efficient and more automatic, since the product CSV is a database table... but I can assure everyone that an agent knowing the specs for your model is going to dramatically improve results (however it gets them).

I commented on manual.mikrotik.com and suggested that a Docusaurus plugin that resolves between switch/CPU chips and models and their abilities (PTP, HWQoS, PoE, "beeper", etc.) would be useful, so the doc pages have the mapping someplace already... The release note above is the proof point of why. Although the release notes are not indexed by Docusaurus today, the same principle applies.

Well, you highlighted a weakness in my AI toolchain: nothing knows about a supout.rif. But that's actually a good idea for my tools to know about, since it's information dense, so an agent can more easily check all the various bits (e.g. the logs and other Linux-like stats shown in other sections). I'd never thought about it.

Rosetta MCP has all of the restraml schema tools' data built in, indexed by RouterOS version. So it has a fair chance of using the command/attribute data to verify a config file, on top of all the other potential sources to check. For example, not only release notes, docs, matrix.csv, etc., but Rosetta even indexes the YouTube video transcripts, like the "What's new in 7.2x" series of videos. I.e. Rosetta is a "one stop shop" for RouterOS information, all indexed (and AI agents are pretty good at constructing clever queries based on your prompt), and it allows for "progressive discovery" (e.g. the first result returns pointers to various sources for the search string).

Data Source	Coverage
Documentation pages	317 pages (~515K words) from the retired help.mikrotik.com Confluence export
Property definitions	4,860 with types, defaults, descriptions
Command tree	5,114 commands, 551 dirs, 34K arguments
Version history	46 RouterOS versions tracked (7.9–7.23.1)
Hardware products	144 devices — specs, pricing, block diagrams
Performance benchmarks	2,874 tests across 125 devices (ethernet + IPSec)
YouTube transcripts	518 videos, ~1,890 chapter-level segments
Callout blocks	1,034 warnings, notes, and tips

And depending on your AI tools, there is often some scheduled-task feature that could, say, check the MikroTik site daily, and then review some directory of your config files against any new release notes and/or documentation updates.

And the latter goes to @fischerdouglas's commentary in the new manual FEEDBACK thread about changes to the docs being an important piece of metadata: if you do run it on a schedule and the agent has already done some comprehensive review of config/monitoring-data/etc., it only needs to confirm things that were updated. So the /console/inspect/"CLI Reference" knows the command tree... but the "prose" in the docs related to those changes is often equally important.

At this point, I've been more interested in developing AI-enabled RouterOS TOOLS that could underpin "higher level" things like UI or site-specific tooling or whatever. Since I started cataloging the RouterOS schema a couple of years back, I've largely been trying to wrap that all up (and more) for AI. Which has necessitated some thinking about "what an agent needs to work with RouterOS"....

Rosetta MCP is really just a SQLite DB with all of MikroTik's "stuff", and the MCP just wires up SQL queries to it for an AI. Or you can use `bunx @tikoci/rosetta browse` to get a non-AI TUI which also does searches. Since it has an AI-friendly interface, you avoid agents having to parse something like manual.mikrotik.com/llms.txt, which helps a lot over nothing, but "digging deeper" is harder using web retrieval.
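
To make the "SQLite plus wired-up queries" idea concrete, here is a minimal sketch of the pattern. It is not Rosetta's code: the `properties` table, the `search_properties` tool name, and the placeholder row are all hypothetical.

```python
import sqlite3

# Hypothetical table standing in for one slice of the database.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE properties (path TEXT, name TEXT, type TEXT, description TEXT)"
)
db.execute(
    "INSERT INTO properties VALUES "
    "('/example/menu', 'example-prop', 'string', 'Placeholder row for the sketch')"
)

# Each "tool" is just a name mapped to a fixed, parameterized, read-only query.
TOOLS = {
    "search_properties": (
        "SELECT path, name, type FROM properties "
        "WHERE name LIKE ? OR description LIKE ? LIMIT 20"
    ),
}


def call_tool(name: str, term: str) -> list[dict]:
    """Run the named query with the search term bound as a parameter."""
    like = f"%{term}%"
    cursor = db.execute(TOOLS[name], (like, like))
    return [dict(zip(("path", "name", "type"), row)) for row in cursor]


print(call_tool("search_properties", "example"))
```

Because the agent only picks a tool name and a search term, it never writes SQL itself and the database stays read-only from its point of view.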

But mainly I've been working on the other parts of my RouterOS AI tools "trilogy", which are two CLI tools that let a human or code-based agent easily interact with RouterOS, both real routers and "fake" ones for testing.

`quickchr` — Fastest path to MikroTik RouterOS CHR instance for testing and scripts

So one tool is quickchr, which allows an agent (or human) to easily bring up a CHR at a specific version & fully provisioned (device-mode, packages, trial license enabled) in 30s to 1m. I have a lot of unit/integration tests in various projects that are based on quickchr, since it works on a GitHub runner (which in turn is used to build the schema used by Rosetta MCP) & quickchr also runs locally (any Mac, Windows, or Linux with QEMU stuff installed).

But essentially, a human runs `bunx @tikoci/quickchr setup`, which brings up a wizard for bringing up a CHR at a given version with the chosen provisioning.


Or an AI agent can construct a more complex CLI with args to bring up what it wants to test (assuming you prompted it to use quickchr to validate something). Overall, quickchr is pretty solid at this point.

The networking scheme is a bit "funky" at first, since out of the box it prefers QEMU user-mode networks to avoid needing sudo or root. This also ensures it's "easy" to start, without needing to provide an LLM creds or construct complex networking setups, etc. E.g. if you want to test a command or config, you may not need a high-powered CHR to just know something "works". See the networking doc, quickchr/docs/networking.md at main · tikoci/quickchr · GitHub, on the more esoteric networking schemes employed. There is TUN/TAP support as well as first-class Mac support, e.g. it can use the native Hypervisor framework (via QEMU) to access a bridged or "shared" network. This requires sudo once to set up, but then can be used without root when an agent/human wants to use quickchr.

And there are some example usages here: quickchr/examples at main · tikoci/quickchr · GitHub

`centrs` - Multi-protocol (REST, native API, mac-telnet, SSH) CLI and TypeScript interface to RouterOS, with rich help and validation

This one is "functional" but still a WIP, since I'm still working through the exact interface, code validation, and reverse-engineering things like mac-telnet or btest. There are quite a few concepts bundled into centrs, but essentially it's a way an agent (or human) can actually "interact" (read or write) with a real RouterOS device.

For example, it can use any protocol, including L2 mac-telnet, to access a router. To enable more seamless and safer authentication, centrs can use a .CDB file (WinBox's password storage), either its own or theoretically one shared with WinBox (if you grant permissions for that). And before any command is issued, it's checked against :parse and/or /console/inspect so that there is "feedback" before the command is issued, in the form of an envelope with errors/warnings/tips (see centrs/docs/errors at main · tikoci/centrs · GitHub).

The current state of centrs is tracked here: centrs/docs/MATRIX.md at main · tikoci/centrs · GitHub, with the full concept explained in centrs/docs/CONSTITUTION.md at main · tikoci/centrs · GitHub. That includes a future as an "MCP Server" (like Rosetta) to avoid needing to use the CLI (but noting that agents are often just as good at using a CLI, if available, as an MCP server). I'm also considering packaging it as a /app to allow it to work with a single router. The MCP server is more complex, since centrs via HTTPS needs to implement an OAuth2 front-end to RouterOS auth so it can be done safely. Since Rosetta never interacts with a router (and is really a front-end to a read-only database), it avoids all the thorny authentication issues. Since centrs is designed to be "safe" (by at least some rational definition) while still writing to RouterOS, it's a bit trickier than Rosetta...

But the basic workflow is that an agent can "do homework" using `bunx @tikoci/rosetta`, "try it out" using `bunx @tikoci/quickchr`, and then "safely perform the action" on a real device using `bunx @tikoci/centrs`, with a lot of "common stumbling blocks" for AI agents + RouterOS hopefully avoided.

Amm0 · June 18, 2026, 5:32pm

LOL. So after my post above, I had some updates to centrs to "push". The build failed, strangely with a DNS error. My DNS works, but from the forum there were more widespread troubles with *.mikrotik.com that seemingly affected GitHub's "Runners" (containers that build/test code in GitHub's [MS/Azure] cloud). That matters because in the "AI trilogy of tools", quickchr is needed to test centrs (but was getting DNS failures downloading images at the current stable/long-term), and rosetta needs to download the doc archive from box.mikrotik.com.

Some might find it interesting to look at how the "clunker" handled yesterday's DNS outage:

centrs spotted the DNS issue since integration tests failed (all code is well tested, and reviewed by 2 other "code review" agents, before being pushed to main). Since centrs's immediate goal was getting the unit/integration tests to pass, Claude Code offered to hack /etc/hosts, since it thought it was an IPv6-on-GitHub issue (it wasn't, but the runner did not have IPv6, so not a bad guess). So I "referred" the matter to quickchr.
quickchr, after prompting, took a deeper look at the problem, since its directives essentially steer towards being "always grounded & robust/rich in error handling", especially since it's used in other projects as the test harness for RouterOS in GitHub Actions runners. So the centrs failures were a "bug" in quickchr. That then triggered its instructions to "own" being the "CHR expert", so why IPv6 & GitHub runners were not working became its problem, since it should know about CHR-related topics. So its analysis led to this "memory" that guided the eventual fix:

---
name: chr-ci-ipv6-dualstack
description: "Why CHR-integration on GitHub Actions fails at quickchr resolveVersion — runner DNS returns ESERVFAIL for *.mikrotik.com; FIXED in quickchr via public DNS, not centrs"
metadata: 
  node_type: memory
  type: project
  originSessionId: 18a92444-2298-4743-9fdd-0d554dec0b19
---

**Root cause (confirmed by on-runner probe 2026-06-16, supersedes the earlier
IPv6 theory):** GitHub-hosted `ubuntu-latest` runners' **system DNS resolver
returns `ESERVFAIL` for `upgrade.mikrotik.com` / `download.mikrotik.com`** —
slowly (2–26 s) — via *both* getaddrinfo and c-ares-over-resolv.conf. So a plain
`fetch` times out resolving, or (when the stub hands back only the unreachable
AAAA) fails with Bun's `errno: 0` ConnectionRefused / FailedToOpenSocket. A
**direct query to a public resolver (1.1.1.1 / 8.8.8.8) answers in ~10 ms**.
The "IPv6 happy-eyeballs no-egress" framing was a **red herring** — the fault is
DNS at the runner's stub, not IPv6 egress. This also explains why `getent
ahostsv4` returned empty on the runner (same broken stub).

This killed quickchr's `resolveVersion()` before any CHR booted, failing every
CHR test in the QA `chr-integration` job. Surfaced 2026-06-16 by the first
manual `workflow_dispatch` QA runs that forced CHR-on-GHA to reach MikroTik
(push/PR runs gate the job off, so it had never resolved a version before).

**This is a quickchr bug, NOT a centrs bug — do not work around it in centrs.**
Fixed in quickchr `fetchResilient()` (`src/lib/net.ts`, tikoci/quickchr#1):
resolves the A record via `dns.Resolver.setServers([1.1.1.1,8.8.8.8,1.0.0.1])`
(public DNS, 3 s timeout), connects to the IPv4 literal with `Host` + TLS SNI
preserved, and falls back to a normal fetch when public DNS is blocked. Verified
green end-to-end on the real runner (quickchr's Integration job downloads CHR +
boots via QEMU). centrs needs **no** workflow change — just bump
`@tikoci/quickchr` once the fix is released (>0.4.0).

**Rejected workaround (don't repeat):** an `/etc/hosts` IPv4 pin via
`getent ahostsv4 <host>` in qa.yaml — wrong layer (belongs in quickchr) **and**
non-functional (hit the same broken stub resolver → empty). centrs PR #53 was
closed unmerged.

**How to apply:** if a sister project's CHR-in-CI dies at `resolveVersion`
(slow `ESERVFAIL` or `errno: 0`), route it to quickchr — review/extend
`fetchResilient` there, don't patch the consumer's workflow. See
[[june-gloom-phase2-ci-techdebt]].

quickchr rejected the centrs hack and instead planned fallback code to resolve *.mikrotik.com using 1.1.1.1, 8.8.8.8, or 1.0.0.1, so that future DNS issues try an alternative resolution and are more IPv6 vs IPv4 aware. This "PR" shows how the system works to implement the fix: Claude Opus coded the fix to quickchr, Copilot (using GPT 5.5) reviewed the code changes, and a third-party CodeRabbit code review agent (free for OSS) provided a "third eye" on the changes...

github.com/tikoci/quickchr

fix(net): IPv4 fallback for downloads on IPv6-less egress (CI runners) (#1)

main ← fix/resilient-fetch-ipv4

opened 10:17PM - 16 Jun 26 UTC

mobileskyfi

+315 -10

## Problem

A sister project (centrs) hit repeated CI failures where every CHR test died at quickchr startup (`resolveVersion`, `errno: 0`). The original "IPv6 no-egress / happy-eyeballs" theory turned out to be a **red herring**. An on-runner DNS probe (GitHub `ubuntu-latest`, 2026-06-16) found the real fault: **the runner's system resolver returns `ESERVFAIL` for `*.mikrotik.com`**, slowly, via every system path:

| Resolver | Result |
|---|---|
| `dns.lookup({family:4})` (getaddrinfo) | ❌ `ESERVFAIL` ~9 s |
| `dns.lookup({all})` (getaddrinfo) | ❌ `ESERVFAIL`/`ETIMEOUT` 22–26 s |
| `dns.resolve4` (c-ares → resolv.conf) | ❌ `ESERVFAIL` 2–22 s |
| normal `fetch` | ❌ times out |
| **`Resolver([1.1.1.1, 8.8.8.8]).resolve4`** | ✅ **~10 ms** → `159.148.147.251` |

So a plain `fetch` times out resolving, or (when the stub hands back only the unreachable AAAA) fails with Bun's `errno: 0`. The only thing that works is querying a **public resolver directly**. This also explains the centrs `getent ahostsv4` → empty result (same broken stub).

## Fix

New `fetchResilient()` (`src/lib/net.ts`), used at the three external call sites (`versions.ts`, `images.ts`, `packages.ts`; `rest.ts`/local 127.0.0.1 untouched):

1. Resolve the A record by querying **public DNS directly**: `dns.Resolver` with `setServers([1.1.1.1, 8.8.8.8, 1.0.0.1])`, bounded by a 3 s timeout, bypassing the runner's broken `resolv.conf` stub.
2. Connect to the **IPv4 literal** with the `Host` header and **TLS SNI** preserved so certificate validation still passes.
3. **Fall back** to a normal `fetch` (system resolver) when public DNS is unavailable (e.g. a network that blocks public resolvers), so it stays correct off CI.
4. HTTP responses and aborts/timeouts pass through unchanged, never retried.

## Verified

- **CI on the real runner is green end-to-end**, including the **Integration job** that downloads CHR images and boots them via QEMU (the exact path that was failing): 11/11 + 6/6 + 7/7 + 6/6, 0 fail.
- Unit suite **555 pass / 0 fail** (Linux + Windows); `biome`+`tsc` clean; `cspell` clean.
- `test/unit/net.test.ts` (9 cases): URL rewriting, connection-failure classification, IPv4 happy path (asserts Host + SNI), 5xx pass-through, no-public-DNS-answer fallback, IPv4-socket-failure fallback, abort pass-through. Download suites mock the public-DNS lookup to stay network-free.

## Docs

DESIGN.md decision #9 records the measured root cause and a **sister-project routing rule**: a CI-only `ESERVFAIL`/`errno: 0` download failure is a quickchr concern, so don't work around it in the consumer's workflow.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
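
For readers who want to see the shape of that technique, here is a rough Python sketch of the same idea (the real fix is TypeScript in quickchr's `src/lib/net.ts`; this is not that code). It asks a public resolver directly over UDP for the A record, then connects to the IPv4 literal while keeping the hostname for TLS SNI. All function names are made up for the sketch. The 3 s timeout here applies per server, and the fallback only covers the "no DNS answer" case.

```python
import random
import socket
import ssl
import struct

PUBLIC_DNS = ("1.1.1.1", "8.8.8.8", "1.0.0.1")


def build_query(host: str, qid: int) -> bytes:
    """Build a DNS query for one A record (recursion desired)."""
    header = struct.pack(">HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(part)]) + part.encode("ascii") for part in host.split(".")
    ) + b"\x00"
    return header + qname + struct.pack(">HH", 1, 1)  # QTYPE=A, QCLASS=IN


def skip_name(msg: bytes, pos: int) -> int:
    """Return the offset just past the (possibly compressed) name at pos."""
    while True:
        length = msg[pos]
        if length & 0xC0 == 0xC0:  # compression pointer: 2 bytes, ends the name
            return pos + 2
        if length == 0:
            return pos + 1
        pos += 1 + length


def parse_a_records(msg: bytes) -> list[str]:
    """Extract IPv4 addresses from the answer section of a DNS reply."""
    _qid, _flags, qdcount, ancount, _ns, _ar = struct.unpack(">HHHHHH", msg[:12])
    pos = 12
    for _ in range(qdcount):
        pos = skip_name(msg, pos) + 4  # skip QTYPE + QCLASS
    ips = []
    for _ in range(ancount):
        pos = skip_name(msg, pos)
        rtype, rclass, _ttl, rdlen = struct.unpack(">HHIH", msg[pos:pos + 10])
        pos += 10
        if rtype == 1 and rclass == 1 and rdlen == 4:
            ips.append(socket.inet_ntoa(msg[pos:pos + 4]))
        pos += rdlen  # non-A records (e.g. CNAME) are skipped
    return ips


def resolve_public(host: str, timeout: float = 3.0):
    """Ask each public resolver in turn; return the first IPv4 or None."""
    for server in PUBLIC_DNS:
        qid = random.randrange(0x10000)
        try:
            with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
                sock.settimeout(timeout)
                sock.sendto(build_query(host, qid), (server, 53))
                reply, _addr = sock.recvfrom(512)
        except OSError:  # includes timeouts and blocked public DNS
            continue
        if reply[:2] == struct.pack(">H", qid):
            ips = parse_a_records(reply)
            if ips:
                return ips[0]
    return None


def open_tls(host: str, port: int = 443, timeout: float = 10.0) -> ssl.SSLSocket:
    """Connect to the IPv4 literal if resolved, else let the system resolve."""
    target = resolve_public(host) or host  # fallback: system resolver
    raw = socket.create_connection((target, port), timeout=timeout)
    # server_hostname keeps SNI and certificate checks tied to the real name.
    return ssl.create_default_context().wrap_socket(raw, server_hostname=host)
```

A caller speaking HTTP over the returned socket would still send `Host: <hostname>` itself, which is the other half of step 2 in the PR description.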

rosetta does not use quickchr; rather, it needs to download the HTML pages from box.mikrotik.com, which was failing yesterday. Since it was pretty clear the issue was on the MikroTik side, I figured it would be fixed. I ignored the first failure (which kinda proved it was not a quickchr issue): Release · tikoci/rosetta@990ca32 · GitHub so I could re-test once reports from the forum showed it was likely fixed, which it was this morning PDT: Release · tikoci/rosetta@990ca32 · GitHub

Amm0 · June 24, 2026, 4:36pm

I have another project, GitHub - tikoci/bench-routeros-tools: Benchmark RouterOS agent-support strategies across MCPs, skills, and retrieval. · GitHub, where I've been testing various models and tools against "problematic for AI" RouterOS tasks. It's a bit of a mess if you try to read the docs, but there are actually a few different tests performed, and importantly I'm still trying to structure them into proper benchmarks. The difficult part is identifying potential tests that can be "scored".

I recently added the "steering to manual.mikrotik.com" approach to see how it compares to "baseline" (no prompting, no tools), using Rosetta MCP, and my routeros-skills. So tikoci/bench-routeros-tools now adds using the prompt in the top post as test cases. I have run it only on Claude Haiku (their "cheap" model).

The raw data is stored in the GitHub repo, so more analysis can be done on past runs in the future. I'll re-run on Sonnet and/or Opus, but that burns credits/hits limits, so I plan to review the current set to tweak tests first. But I had Opus (high-end model) review the results of the tests, quoted in part below (and see CORPUS.md linked below to decode the "short-hand names" for the test cases).

Claude Opus's Report on Haiku-based test

The sharpest cross-cut is vendordoc-steer ("Steering AI to use new manual.mikrotik.com"): it has the best fabrication discipline of any column (31/33 syntax-valid — reading the real manual kills invented args) and it almost always actually fetched the page — yet it still misses all three device-truth traps (blackhole, unreachable, dhcp-default). So reading the vendor manual ≠ knowing device truth: the page doesn't make the bare-flag-vs-type=, the deleted-unreachable, or the disabled=yes default legible enough to change behavior.

Net: at Haiku the ranking is rosetta > steer > baseline > skills, but the headline isn't the ranking — it's that the four real discriminators (two route traps + dhcp default + wg) all point at the same gap, and only the validate → run device tier closes the part that grounding can't. Honest-matrix bonus: the nine false route-unreachable "perfect"s are gone, so the per-column totals above are now real.

HELP WANTED — Looking for suggestions for tests where AI "gets it wrong"...

Please comment below with what prompt you tried, the results, and what you'd expected. I'll try to add more cases to my benchmarks. The current benchmarks have the following test cases:

github.com/tikoci/bench-routeros-tools

docs/CORPUS.md

main

# Task Corpus — what the benchmark actually tests

The benchmark is driven by a single task corpus, [`tasks/corpus.yaml`](../tasks/corpus.yaml):
**49 RouterOS v7 tasks**, each a natural-language intent plus the device-true
"gold" command(s) and metadata (target menu areas, safety class, state
dependency, version scope, optional decoys/forbidden commands).

Two consumers read it:

- **Structural metrics** (model-free) run over **all 49** — token cost, tool-selection
  ambiguity, retrieval coverage, command-syntax validity, the deterministic scorer.
- **The live ladder** (`harness/live/run_live_ladder.py`) sends a balanced
  **11-task subset** through real `claude -p` generations, scored by the same
  scorer and CHR `/console/inspect` validator. See
  [`REPORT_LIVE.md`](REPORT_LIVE.md).

> Scope is RouterOS **v7** (validated on CHR 7.22.1 / 7.23.1). The corpus
> intentionally mixes "everyone should get this" baseline config with a small set
> of **config traps** that discriminate the augmentation strategies.


These can be run against a variety of models/harnesses. But the most recent tests are the "live results" series using `claude -p`, with options to control how the context gets loaded so as not to pollute it with any local learns/skills/~CLAUDE/etc.
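
To give a feel for what a "scoreable" task looks like, here is a small sketch of a task record and a deterministic scorer. It is not the project's scorer, and the field names and the sample task are illustrative, not taken from `tasks/corpus.yaml`. The blackhole example just mirrors the bare-flag-vs-`type=` trap mentioned in the report above.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Task:
    intent: str                                          # natural-language request
    gold: list[str]                                      # device-true command(s)
    forbidden: list[str] = field(default_factory=list)   # regexes that fail the answer


def normalize(cmd: str) -> str:
    """Collapse whitespace so formatting differences do not affect scoring."""
    return " ".join(cmd.strip().split())


def score(task: Task, answer: str) -> str:
    """Return 'forbidden', 'pass', or 'fail' for a model's answer."""
    lines = [normalize(line) for line in answer.splitlines() if line.strip()]
    if any(re.search(pat, line) for pat in task.forbidden for line in lines):
        return "forbidden"
    gold = {normalize(cmd) for cmd in task.gold}
    return "pass" if gold <= set(lines) else "fail"


# Illustrative task only; not copied from the real corpus.
task = Task(
    intent="Add a blackhole route for 192.0.2.0/24",
    gold=["/ip/route/add dst-address=192.0.2.0/24 blackhole"],
    forbidden=[r"type=blackhole"],
)

print(score(task, "/ip/route/add  dst-address=192.0.2.0/24 blackhole"))
```

Exact-match scoring like this is strict (argument order matters), which is one reason a device-side validator is still useful alongside it.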

Larsa · June 24, 2026, 5:34pm

Wow, a NETLLMBENCH-like benchmark for MikroTik/RouterOS!

I really like the “knows the docs” vs “knows the device truth” distinction. That is exactly where LLMs tend to get a bit too confident with RouterOS: the answer can look right, cite the manual, and still fall over the moment it hits an actual device.

The route traps, DHCP default behavior, and WireGuard cases sound like great discriminators. And yes, finding tests that are realistic, repeatable, and objectively scoreable is probably the hard part. That is also closely related to the main problem with fine-tuning: getting proper training/evaluation data, with cases where there is a clear expected result instead of just plausible-looking answers.

I think these are basically the same challenges everyone has to tackle when building the whole ecosystem around an effective model. Getting a working RAG setup is only one part of it; the biggest challenge of all is still the training and evaluation data.

For RouterOS, some kind of real or emulated device validation feels almost unavoidable. Otherwise it is too easy to benchmark “plausible-looking answers” instead of operational correctness.

Very cool project indeed!

Amm0 · June 24, 2026, 11:07pm

Currently the "benchmarks" are a WIP, and designed more to answer some questions and identify/quantify things with some structure, with a strong focus on the effects of tools. Also, I'm limited by my various LLM subscriptions' time-based limits (since I don't want to give Anthropic/MS/friends a blank check for direct SDK usage). I.e. my initial need was to guide future tool development and understand things at a more general level, rather than to create strict/durable apples-to-apples comparisons.

Basically bench-routeros-tools is more of a "QA/usability engineer" for my rosetta MCP at this point, e.g. checking whether the various tools are "useful".

I'll gather more data on the simple "Steering to manual" approach discussed in my top post, but the current results show it is "better than nothing" but not as good as having some additional tools/CLIs/skills wired into your agent harness. The "steering" has only been checked against a low-cost model, Claude Haiku... so there is no data in the benchmarks on whether it helps a high-end model like Opus or GPT 5.5 Pro.

Note that the "skill tests" in the benchmarks use my @tikoci/routeros-skills, but those do not actually incorporate any prose that helps with any of the "traps"... so the fact that they get "low marks" from bench-routeros-tools is not surprising, and they would likely be at the top IF they actually had some SKILL.md content about these "known difficulties". I.e. they know more about REST, CHR, command structure, etc. than about configuration tasks.

A related paper, A Network Arena for Benchmarking AI Agents on Network Troubleshooting, highlights a more practical use case than validating topologies, and focuses more on processing support tickets. In particular, it has a table of the tests it's using that might be useful to incorporate into my RouterOS benchmarks, see https://arxiv.org/html/2512.16381v1#A1.T3

It will take some work to actually map those into the RouterOS config needed (both initial and failing case), and new scripts are needed to validate success/failure of them... but quickchr should be able to model the needed topology.

I have quickchr, which is a sophisticated CLI to bring up a provisioned CHR that's "ready-to-use" by an agent. It's certainly useful to humans, but a lot of the design went into making it easy for agents. Not requiring root and a fixed control plane using port mapping are the core things, with the networking schemes designed around not requiring sudo (using sockets, "user-mode" networking, and L2 via "socket connect"). Plus, with routeros-skills loaded, an agent adds an understanding of TZSP, REST API, and command structure, so it can then "do stuff" with the quickchr test router.

Basically an agent does not have to know much about setting up CHR RouterOS, since that's all taken care of by quickchr, including trial licensing, device-mode, extra-packages, extra disks.

NETLLMBENCH uses https://www.kathara.org :

Finally, in the emulation phase, once the LLM under Test’s
outputs pass format verification, they are input into a custom-
built emulator, developed using Kathará [14], which replicates
a networking environment to practically test these outputs.
This emulation phase is crucial for verifying the practical
applicability of the LLM under Test’s responses.

And quickchr essentially does the same for RouterOS:

Currently, the benchmarks were checked against a "real" CHR as part of the development and ad hoc evaluation of the results, but a CHR is not used during benchmarking. The thought is that "more complex" challenges (like fully provisioning VLAN-enabled bridging) could be checked using quickchr as part of the benchmark, and validated using ping/etc. afterwards to see if "whatever" the agent did "worked" (vs today, where it's checking whether one-line commands are correct).

While Kathará (used by NETLLMBENCH) is similar to quickchr... Compare Kathará's man page vs quickchr MANUAL.md. So there are some differences too:

quickchr uses QEMU directly while Kathará uses docker. IMO, QEMU is a better approach to this for a few reasons:

#1 being that RouterOS in "docker" still requires QEMU, so this skips the middleman. And thus an agent has access to all the "raw" QEMU machine things, like the "QEMU Console" that allows inspecting any HW-like thing, including even running gdb, which allows "debugging" (at assembly level) of RouterOS.
QEMU has not changed much in decades, so any LLM knows how to use it, while "docker" has more variation on a lot of fronts.
QEMU qcow disks can use "snapshots" as part of any tests to roll back/forward changes during testing, with quickchr providing direct CLI access to them.
quickchr and QEMU are "more cross platform" than docker, since the QEMU CLI is the same on Windows, Mac, and Linux, and all support "socket mode" and SLiRP "user mode" networking. And for "advanced" cases... quickchr uses its "custom" generic mapping for a "shared network" and "bridge network" so the quickchr CLI is "portable" (e.g. the actual QEMU CLI does vary by platform, like if you're using TAP vs vmnet... BUT if quickchr's generic mapping is used, and the OS is set up with the same named bridge/shared network, the same CLI will work).
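
On the snapshot point, a rough sketch of what rollback looks like with plain `qemu-img` (this is the stock QEMU tool, not quickchr's own CLI, which is not shown here). The helper names and the image file name are made up, and it assumes the VM is stopped while the image is modified:

```python
import subprocess

# qemu-img snapshot flags: -c create, -a apply (revert), -d delete, -l list.
_FLAGS = {"create": "-c", "apply": "-a", "delete": "-d"}


def snapshot_cmd(action: str, image: str, name: str = "") -> list[str]:
    """Build the qemu-img argv for an internal qcow2 snapshot operation."""
    if action == "list":
        return ["qemu-img", "snapshot", "-l", image]
    if not name:
        raise ValueError("a snapshot name is required for " + action)
    return ["qemu-img", "snapshot", _FLAGS[action], name, image]


def run_snapshot(action: str, image: str, name: str = "") -> str:
    """Run the command (needs qemu-img installed) and return its stdout."""
    result = subprocess.run(
        snapshot_cmd(action, image, name),
        check=True, capture_output=True, text=True,
    )
    return result.stdout


# Typical test flow: snapshot, let the agent change things, then roll back.
print(snapshot_cmd("create", "chr.qcow2", "before-test"))
print(snapshot_cmd("apply", "chr.qcow2", "before-test"))
```

Keeping the argv builder separate from the runner makes it easy to log or review exactly what would be executed before touching the disk image.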

Kathará has some support for "orchestration", while quickchr leaves it up to the agent/human to design a test script. My thought was that at some point automating GNS3/similar may be a better approach, so quickchr focuses more on providing simple but "deep access" to a single CHR. So instead of CLI/API support for multi-router setups, quickchr has some examples. Like the matrica example, which takes a starting config, loads it on long-term, then checks to see if it changes between any of the channels up to development, with the idea being that a future agent can "copy the example" to construct some multi-router orchestration.

That's why I have the "AI trilogy of tools" described in the post above: rosetta <-> centrs <-> quickchr.

Although the broader question is how could RouterOS become more "AI friendly" to begin with, instead of adapting tools/prompts/skills/MCPs around the current particulars.