Team work: :parse with special (non-Latin) parameters

Sertik · December 23, 2022, 7:00am

Imagine a MikroTik Telegram parser that can perform functions on behalf of a user in a chat. That is, we send something like myFunc par1 par2 … parN in the chat, the parser passes it to the MikroTik, and the MikroTik executes it.

[:parse ":global $funcName; [\$$funcName $parameters]"]

To do this, I build a string for :parse in which I pass the name of the function to execute in $funcName and its parameters in $parameters. The function's parameters can of course vary (positional and named), but only string parameters (:type "str") are passed.
All of this works well, but only if the parameters are written in Latin characters. As soon as I try to pass parameters in a national language (for example, Russian), the construction stops working. I tried re-encoding the parameters to UTF-8, but it doesn't help.

How do I pass parameters in a national language in $parameters? Or is it impossible?

I ask Rextended to help me.

rextended · December 23, 2022, 8:17am

RouterOS accepts only 7-bit characters:
NUL@ SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI
DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC[ FS\ GS] RS^ US_
SP ! " # $ % & ' ( ) * + , - . /
0 1 2 3 4 5 6 7 8 9 : ; < = > ?
@ A B C D E F G H I J K L M N O
P Q R S T U V W X Y Z [ \ ] ^ _
` a b c d e f g h i j k l m n o
p q r s t u v w x y z { | } ~ DEL
Note: not all control codes from NUL to US have an effect.

Any other character is unsupported.

Ç ü é â ä à å ç ê ë è ï î ì Ä Å
É æ Æ ô ö ò û ù ÿ Ö Ü ¢ £ ¥ ₧ ƒ
á í ó ú ñ Ñ ª º ¿ ⌐ ¬ ½ ¼ ¡ « »
░ ▒ ▓ │ ┤ ╡ ╢ ╖ ╕ ╣ ║ ╗ ╝ ╜ ╛ ┐
└ ┴ ┬ ├ ─ ┼ ╞ ╟ ╚ ╔ ╩ ╦ ╠ ═ ╬ ╧
╨ ╤ ╥ ╙ ╘ ╒ ╓ ╫ ╪ ┘ ┌ █ ▄ ▌ ▐ ▀
α ß Γ π Σ σ µ τ Φ Θ Ω δ ∞ φ ε ∩
≡ ± ≥ ≤ ⌠ ⌡ ÷ ≈ ° ∙ · √ ⁿ ² ■ NBSP
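To check on the sending side whether a string needs any treatment at all, a minimal Python sketch can flag every character outside the 7-bit range. The helper name `unsupported_chars` is hypothetical and not part of RouterOS.

```python
def unsupported_chars(text: str) -> list[str]:
    """Return the characters of `text` that fall outside 7-bit ASCII (0x00-0x7F)."""
    # Anything at or above 0x80 must be escaped or transliterated
    # before the string is handed to RouterOS.
    return [ch for ch in text if ord(ch) > 0x7F]


print(len(unsupported_chars("myFunc par1 par2")))  # 0: safe to send as-is
```

For example, `unsupported_chars("myFunc Привіт")` returns the six Cyrillic letters, so that message needs conversion first.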
The unsupported 8-bit characters can be represented as escaped HEX values, from \80 to \FF.

€ 0x81 ‚ ƒ „ … † ‡ ˆ ‰ Š ‹ Œ 0x8D Ž 0x8F
0x90 ‘ ’ “ ” • – — ˜ ™ š › œ 0x9D ž Ÿ
NBSP ¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ SHY ® ¯
° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿
À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï
Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß
à á â ã ä å æ ç è é ê ë ì í î ï
ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ
0x81, 0x8D, 0x8F, 0x90 and 0x9D are unused in CP1252.

Ђ Ѓ ‚ ѓ „ … † ‡ € ‰ Љ ‹ Њ Ќ Ћ Џ
ђ ‘ ’ “ ” • – — 0x98 ™ љ › њ ќ ћ џ
NBSP Ў ў Ј ¤ Ґ ¦ § Ё © Є « ¬ SHY ® Ї
° ± І і ґ µ ¶ · ё № є » ј Ѕ ѕ ї
А Б В Г Д Е Ж З И Й К Л М Н О П
Р С Т У Ф Х Ц Ч Ш Щ Ъ Ы Ь Э Ю Я
а б в г д е ж з и й к л м н о п
р с т у ф х ц ч ш щ ъ ы ь э ю я
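As a sketch of the \80 to \FF representation above, the following Python helper (hypothetical name `to_codepage_escapes`) encodes a string with a single-byte code page and writes each non-7-bit byte as \XX. It assumes the receiving side interprets those bytes with the same code page (CP1251 for the second table, CP1252 for the first); RouterOS itself only stores the bytes.

```python
def to_codepage_escapes(text: str, codepage: str = "cp1251") -> str:
    """Escape every non-7-bit character as \\XX using a single-byte code page."""
    out = []
    # encode() raises UnicodeEncodeError for characters missing from the code page
    for byte in text.encode(codepage):
        if byte < 0x80:
            out.append(chr(byte))        # 7-bit ASCII passes through unchanged
        else:
            out.append("\\%02X" % byte)  # \80 .. \FF, uppercase hex
    return "".join(out)


print(to_codepage_escapes("Привіт"))        # \CF\F0\E8\E2\B3\F2
print(to_codepage_escapes("Ç±", "cp1252"))  # \C7\B1
```

Note that the same byte means different characters in the two tables (\CF is П in CP1251 but Ï in CP1252), which is why the code page has to be agreed on.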

rextended · December 23, 2022, 8:31am

UTF-8 encodes each character as a sequence of one to four 8-bit bytes, which lets it represent all languages.

How do you send 8-bit characters so that RouterOS can use them?

You must transform each such character into its escaped HEX equivalent.
For example, ±, which exists in both Latin CP1252 and Cyrillic CP1251, must be converted from UTF-8,
and in UTF-8 that character uses two bytes: ± = "\C2\B1"
So the program that sends the string to RouterOS must convert every non-7-bit character from UTF-8 to escaped HEX sequences.
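A minimal Python sketch of that conversion step, assuming it runs in the program that sends the text (for example the Telegram bot), before RouterOS is involved. The function name `to_routeros_utf8` is hypothetical.

```python
def to_routeros_utf8(text: str) -> str:
    """Convert non-7-bit characters to escaped UTF-8 bytes, e.g. ± -> \\C2\\B1."""
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)  # 7-bit ASCII is sent as-is
        else:
            # one character becomes 2 to 4 UTF-8 bytes, each written as \XX
            out.append("".join("\\%02X" % b for b in ch.encode("utf-8")))
    return "".join(out)


print(to_routeros_utf8("±"))  # \C2\B1
```

Escaping per character (rather than per byte of the whole string) keeps plain ASCII readable in the output.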

To convert Привіт into a string that MikroTik can use:
П = \D0\9F
р = \D1\80
и = \D0\B8
в = \D0\B2
і = \D1\96
т = \D1\82
Привіт = "\D0\9F\D1\80\D0\B8\D0\B2\D1\96\D1\82"
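The per-letter table above can be reproduced with a few lines of Python; `utf8_breakdown` is a hypothetical helper for illustration only.

```python
def utf8_breakdown(word: str) -> list[tuple[str, str]]:
    """Pair each character with its UTF-8 bytes written as \\XX escapes."""
    # Cyrillic letters lie in U+0400..U+04FF, so each takes exactly two bytes
    return [(ch, "".join("\\%02X" % b for b in ch.encode("utf-8"))) for ch in word]


escaped = "".join(esc for _, esc in utf8_breakdown("Привіт"))
print(escaped)  # \D0\9F\D1\80\D0\B8\D0\B2\D1\96\D1\82
```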
But obviously the conversion must happen before RouterOS is involved in any way.

Or, more "simply" (and for all languages), use only 7-bit characters directly:
Привіт => Pryvit
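A sketch of the transliteration approach, assuming a lookup table on the sending side. The table below is deliberately partial and hypothetical (only enough letters for the example); a real one would follow a chosen romanization standard.

```python
# Partial, illustrative Ukrainian-to-Latin table (hypothetical; extend as needed)
UK_TO_LATIN = {
    "П": "P", "р": "r", "и": "y", "в": "v", "і": "i", "т": "t",
    "а": "a", "б": "b", "д": "d", "е": "e", "к": "k", "л": "l",
    "м": "m", "н": "n", "о": "o", "с": "s", "у": "u",
}


def transliterate(text: str, table: dict[str, str] = UK_TO_LATIN) -> str:
    """Map known letters to Latin; ASCII stays, unknown non-ASCII becomes '_'."""
    return "".join(table.get(ch, ch if ord(ch) < 0x80 else "_") for ch in text)


print(transliterate("Привіт"))  # Pryvit
```

Unlike HEX escaping, this loses information (the original text cannot be recovered), but the result is plain 7-bit text everywhere.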

The same applies to emoticons; in the end they are simply characters with a specific design.
"@anav 🍁" = "@anav \F0\9F\8D\81"
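To check such an escape sequence, a small Python sketch can turn \XX escapes back into bytes and decode them as UTF-8. The name `from_routeros_escapes` is hypothetical, and it only handles \XX hex escapes, not other RouterOS escapes such as \" or \\.

```python
import re

_HEX_ESCAPE = re.compile(r"\\([0-9A-Fa-f]{2})")


def from_routeros_escapes(s: str) -> str:
    """Decode \\XX escapes (UTF-8 bytes) mixed with plain 7-bit text."""
    out = bytearray()
    pos = 0
    for m in _HEX_ESCAPE.finditer(s):
        out += s[pos:m.start()].encode("ascii")  # literal 7-bit text
        out.append(int(m.group(1), 16))          # one escaped byte
        pos = m.end()
    out += s[pos:].encode("ascii")
    return out.decode("utf-8")


print(hex(ord(from_routeros_escapes(r"\F0\9F\8D\81"))))  # 0x1f341
```

The four bytes decode to the single code point U+1F341, the maple leaf emoji 🍁, so an emoticon is just a longer (four-byte) UTF-8 sequence.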

Sertik · December 23, 2022, 9:22am

Rex, thank you for your detailed answer. I understand everything you wrote.
To convert the parameters to UTF-8, I used this function. It does a great job when you need to send CP1251 text to Telegram, but it doesn't work when I try to use it for :parse.

    # Function for converting a CP1251 string to UTF-8
    # https://forummikrotik.ru/viewtopic.php?p=81457#p81457
:global FuncCP1251toUTF8
:if (!any $FuncCP1251toUTF8) do={:global FuncCP1251toUTF8 do={
        :local cp1251 [:toarray {"\20";"\01";"\02";"\03";"\04";"\05";"\06";"\07";"\08";"\09";"\0A";"\0B";"\0C";"\0D";"\0E";"\0F"; \
                                 "\10";"\11";"\12";"\13";"\14";"\15";"\16";"\17";"\18";"\19";"\1A";"\1B";"\1C";"\1D";"\1E";"\1F"; \
                                 "\21";"\22";"\23";"\24";"\25";"\26";"\27";"\28";"\29";"\2A";"\2B";"\2C";"\2D";"\2E";"\2F";"\3A"; \
                                 "\3B";"\3C";"\3D";"\3E";"\3F";"\40";"\5B";"\5C";"\5D";"\5E";"\5F";"\60";"\7B";"\7C";"\7D";"\7E"; \
                                 "\C0";"\C1";"\C2";"\C3";"\C4";"\C5";"\C6";"\C7";"\C8";"\C9";"\CA";"\CB";"\CC";"\CD";"\CE";"\CF"; \
                                 "\D0";"\D1";"\D2";"\D3";"\D4";"\D5";"\D6";"\D7";"\D8";"\D9";"\DA";"\DB";"\DC";"\DD";"\DE";"\DF"; \
                                 "\E0";"\E1";"\E2";"\E3";"\E4";"\E5";"\E6";"\E7";"\E8";"\E9";"\EA";"\EB";"\EC";"\ED";"\EE";"\EF"; \
                                 "\F0";"\F1";"\F2";"\F3";"\F4";"\F5";"\F6";"\F7";"\F8";"\F9";"\FA";"\FB";"\FC";"\FD";"\FE";"\FF"; \
                                 "\A8";"\B8";"\B9"}];
        :local utf8   [:toarray {"0020";"0020";"0020";"0020";"0020";"0020";"0020";"0020";"0020";"0020";"000A";"0020";"0020";"000D";"0020";"0020"; \
                                 "0020";"0020";"0020";"0020";"0020";"0020";"0020";"0020";"0020";"0020";"0020";"0020";"0020";"0020";"0020";"0020"; \
                                 "0021";"0022";"0023";"0024";"0025";"0026";"0027";"0028";"0029";"002A";"002B";"002C";"002D";"002E";"002F";"003A"; \
                                 "003B";"003C";"003D";"003E";"003F";"0040";"005B";"005C";"005D";"005E";"005F";"0060";"007B";"007C";"007D";"007E"; \
                                 "D090";"D091";"D092";"D093";"D094";"D095";"D096";"D097";"D098";"D099";"D09A";"D09B";"D09C";"D09D";"D09E";"D09F"; \
                                 "D0A0";"D0A1";"D0A2";"D0A3";"D0A4";"D0A5";"D0A6";"D0A7";"D0A8";"D0A9";"D0AA";"D0AB";"D0AC";"D0AD";"D0AE";"D0AF"; \
                                 "D0B0";"D0B1";"D0B2";"D0B3";"D0B4";"D0B5";"D0B6";"D0B7";"D0B8";"D0B9";"D0BA";"D0BB";"D0BC";"D0BD";"D0BE";"D0BF"; \
                                 "D180";"D181";"D182";"D183";"D184";"D185";"D186";"D187";"D188";"D189";"D18A";"D18B";"D18C";"D18D";"D18E";"D18F"; \
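                                 "D081";"D191";"E28496"}];
        # Ё (U+0401) is D0 81 in UTF-8; № (U+2116) needs three bytes: E2 84 96
        :local convStr ""; 
        :local code    "";
        :for i from=0 to=([:len $1]-1) do={
            :local symb [:pick $1 $i ($i+1)]; 
            :local idx  [:find $cp1251 $symb];
            :local key  ($utf8->$idx);
            :if ([:len $key] != 0) do={
                # emit every byte of the hex key as %XX (two bytes, or three for №)
                :set code "";
                :for j from=0 to=([:len $key]-1) step=2 do={:set code ($code . "%" . [:pick $key $j ($j+2)])}
                # 7-bit entries are stored as "00XX": drop the leading "%00"
                :if ([:pick $code 0 3] = "%00") do={:set code [:pick $code 3 6]}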
            } else={:set code ($symb)}; 
            :set $convStr ($convStr.$code);
        }
        :return ($convStr);
    }
}
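For comparison, here is a rough Python equivalent of what the function above produces, under the assumption that its goal is a percent-encoded UTF-8 string for a Telegram API URL. It is not an exact port (the RouterOS version also maps most control codes to spaces and escapes characters such as "." that Python's `quote` leaves alone); `cp1251_to_url_utf8` is a hypothetical name.

```python
from urllib.parse import quote


def cp1251_to_url_utf8(data: bytes) -> str:
    """CP1251 bytes -> UTF-8 percent-encoding (%D0%9F...), as used in URLs."""
    # decode the single-byte CP1251 text, then let quote() UTF-8-encode it
    # and percent-escape everything except unreserved ASCII characters
    return quote(data.decode("cp1251"), safe="")


print(cp1251_to_url_utf8(b"\xcf\xf0\xe8\xe2\xb3\xf2"))
```

The output is %D0%9F%D1%80%D0%B8%D0%B2%D1%96%D1%82: a URL encoding going from RouterOS towards Telegram. It is neither the direction nor the \XX escape syntax that a :parse string built from Telegram input would need.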

First, suppose I re-encode the parameter string by passing it through FuncCP1251toUTF8,

:global FuncCP1251toUTF8
:set parameters [$FuncCP1251toUTF8 $parameters]

and then pass it to :parse. Isn't that right?

[:parse ":global $funcName; [\$$funcName $parameters]"]

But parameters containing Russian text are still not passed.

rextended · December 23, 2022, 9:47am

Again?

Sertik · December 23, 2022, 9:57am

But obviously the conversion must happen before RouterOS is involved in any way.

So in fact it cannot be done, since Telegram cannot do this? So this is a limitation of Telegram, not of RouterOS?

rextended · December 23, 2022, 10:05am

Telegram natively uses UTF-8, so you need a converter from UTF-8 to the ASCII that RouterOS accepts.

The function FuncCP1251toUTF8 (really sloppily made) and also my version for CP1252 (not the best, but clearer)
http://forum.mikrotik.com/t/rextended-fragments-of-snippets/151033/1
are clearly named "to UTF8" because they convert the ASCII characters RouterOS holds, read as CP125x, to UTF-8 for use on Telegram & Co.

Conversely, when non-7-bit ASCII text (Cyrillic, emoticons & Co.) comes from Telegram, you receive UTF-8, which RouterOS is unable to understand directly.

Sertik · December 23, 2022, 10:17am

Conversely, when non-7-bit ASCII text (Cyrillic, emoticons & Co.) comes from Telegram, you receive UTF-8, which RouterOS is unable to understand directly.

Is it still possible to solve this problem somehow on RouterOS, or is it impossible?

rextended · December 23, 2022, 10:33am

Telegram sends "Привіт @Sertik, Привіт @anav 🍁" in UTF-8.

If nothing in between converts the string, RouterOS receives:
"ÐŸÑ€Ð¸Ð²Ñ–Ñ‚ @Sertik, ÐŸÑ€Ð¸Ð²Ñ–Ñ‚ @anav ðŸ"
RouterOS stops parsing at "Ð" because it does not understand it.
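The garbled string can be reproduced in Python by reading the UTF-8 bytes as if they were CP1252, which is roughly what a byte-oriented receiver displays. This is an illustration of the effect, not a claim about RouterOS internals; `as_cp1252_mojibake` is a hypothetical name.

```python
def as_cp1252_mojibake(text: str) -> str:
    """Show UTF-8 text as a CP1252 reader would see its bytes."""
    # bytes CP1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) become U+FFFD
    return text.encode("utf-8").decode("cp1252", errors="replace")


print(len(as_cp1252_mojibake("Привіт")))  # 12: two bytes per Cyrillic letter
```

Each Cyrillic letter turns into two Latin-1-looking characters (П becomes "ÐŸ"), which matches the string quoted above.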

Just as you use a function inside RouterOS to convert RouterOS text into Telegram's encoding,
you must write a function on the Telegram side to convert Telegram text into RouterOS's encoding before it reaches RouterOS.
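A sketch of such a Telegram-side function, assuming the bot receives a message like myFunc par1 par2 and delivers the resulting text to RouterOS as script source. Both function names are hypothetical. Each parameter becomes a RouterOS string literal containing only 7-bit characters: ", \, $ and ? are backslash-escaped (RouterOS string escapes), and non-7-bit characters become \XX UTF-8 bytes. How many times the text must be escaped depends on how often it is parsed again (for example when it is interpolated into the string given to :parse); that part is not verified here.

```python
def routeros_quote(param: str) -> str:
    """Wrap one parameter in a RouterOS string literal using only 7-bit output."""
    out = []
    for ch in param:
        if ch in '"\\$?':
            out.append("\\" + ch)  # characters with special meaning in RouterOS strings
        elif ord(ch) < 0x80:
            out.append(ch)
        else:
            out.append("".join("\\%02X" % b for b in ch.encode("utf-8")))
    return '"' + "".join(out) + '"'


def build_command(message: str) -> str:
    """Hypothetical parser step: 'myFunc par1 par2' -> 'myFunc "par1" "par2"'."""
    func_name, *params = message.split()
    return " ".join([func_name] + [routeros_quote(p) for p in params])


print(build_command("myFunc par1 Привіт"))
# myFunc "par1" "\D0\9F\D1\80\D0\B8\D0\B2\D1\96\D1\82"
```

Splitting on whitespace is the simplest choice; parameters that contain spaces would need a smarter tokenizer.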

Sertik · December 23, 2022, 10:36am

Got it, thanks. The verdict: impossible. You can close the topic. Rex, don't be annoyed with me for being slow to understand, and thank you very much!