Convert any text to UNICODE

Hello,

Is it possible to create in RouterOS a converter similar to this one?

https://r-1.ch/mikrotik-unicode-ssid-generator.php

I have found that once a text is converted to this escaped "Unicode" form (its UTF-8 bytes written as \XX escapes), it displays correctly on the terminal screen and Telegram also recognizes it correctly, which would be a great solution.

For example, we start from the text: hello my friend camión Ññ

In the web converter we have this result:
“\68\65\6C\6C\6F\20\6D\79\20\66\72\69\65\6E\64\20\63\61\6D\69\C3\B3\6E\20\C3\91\C3\B1”
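For anyone who wants to reproduce the web converter's output off-router, here is a minimal Python sketch. It assumes the converter simply writes each UTF-8 byte of the text as a RouterOS `\XX` escape, which matches the result shown above; the function name `to_routeros_escapes` is made up for this example.

```python
def to_routeros_escapes(text: str) -> str:
    """Return text as RouterOS-style \\XX escapes, one escape per UTF-8 byte."""
    return "".join(f"\\{byte:02X}" for byte in text.encode("utf-8"))


if __name__ == "__main__":
    # Same sample text as in the post above.
    print(to_routeros_escapes("hello my friend cami\u00f3n \u00d1\u00f1"))
```

The printed string can be pasted between double quotes in a RouterOS script, as in the test script below.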

If we test it in Telegram, the result is OK.




{
:local text "\68\65\6C\6C\6F\20\6D\79\20\66\72\69\65\6E\64\20\63\61\6D\69\C3\B3\6E\20\C3\91\C3\B1"          
:local MessageText $text
:local SendTelegramMessage [:parse [/system script get MyTGBotSendMessage source]]
$SendTelegramMessage MessageText=$MessageText
:put $text
}

And on the terminal screen it is also OK.




thanks in advance.

Well, the issue is that INPUT to the CLI/WinBox/etc. is ASCII only. WinBox’s terminal will also strip Unicode, so you won’t see anything there; the UTF-8 output would only work via SSH.

But in a script you could put the UTF-8 escape sequences into variables and use string interpolation, which should also work for text that goes out via fetch, like so:

:global tilden "\C3\B1"
:put "espa$(tilden)ola"

:global garaisi "\C4\AB"
:put "Labr$(garaisi)t"

You’d obviously have to define the set of Unicode character codes (in RouterOS’s byte notation, “\xx\yy\zz”) before using them, but that might work in some cases.

I suppose you could also use a function in the approach above, so that each letter could still be output as normal ASCII. Syntax is trickier with a function however:

# global flag to output UTF-8
:global "use-unicode" 1

# Tilde over lowercase N
:global tildan do={
   :global "use-unicode"
   :if ($"use-unicode" = 1) do={
       :return "\C3\B1"
    } else={
       :return "n"
    }
}

# output as unicode
:set "use-unicode" 1
:put "espa$([$tildan])ola"

# output as ascii
:set "use-unicode" 0
:put "espa$([$tildan])ola"

Writing a predefined text is fine, but the goal is to extract arbitrary text from a received SMS and forward it via Telegram with its Unicode characters intact, so it can be read in Telegram. I have tried to see how this converter works [ https://r-1.ch/mikrotik-unicode-ssid-generator.php ], but my programming knowledge is very limited. :frowning:

BR.

Ah, that’s slightly different: SMS uses UCS-2 encoding, not UTF-8. So it’s really not the same as the “emoji” code, which takes UTF-8.

Are you looking for a direct UCS-2/UTF-16 to UTF-8 conversion? That seems to be already covered by @rextended’s code above.

UTF-16/UCS-2 use two bytes to store the “popular” Unicode characters, the same format that Windows (and SMS) use internally. UNIX (and JSON), etc. generally favor UTF-8, which is the same as normal ASCII for the lower 128 values, but uses the byte values of the extended-ASCII range (0x80 and up) as markers and a variable number of bytes to store each Unicode character.

WRONG solution for Telegram; what MT ROS should do is parse JSON. The cheap hack of parsing text is a fragile approach.

There is no JSON involved here, although reading/writing JSON has long been missing; that is a different issue.

OP is starting with UCS-2 encoded SMS PDU – if only he was starting with JSON.

I’d think there must be some converter in the forum, but I didn’t find one right away. I know I ain’t writing one, since I suspect it’s trickier than it looks.

Perhaps an example shows the problem. We’ll go with the tilde ñ.
In CP1252/Latin-1 (“extended ASCII”) that is decimal 241; it’s one byte, as hex: F1 or as binary: 1111 0001
In GSM7, the default 7-bit SMS alphabet, only a small fixed table of characters exists. That table happens to include ñ, but not accented vowels such as ó, and a message with anything outside the table has to be sent as UCS-2.
In UCS-2, which GSM can optionally use, everything is two bytes, so tilde-lowercase-n is just, in hex: 00F1
In UTF-8, it’s also two bytes (other Unicode characters can take 3 or 4 bytes, while UCS-2 is always just two). But it’s the more confusing C3B1 in hex when encoded as UTF-8. Since UTF-8 supports all of Unicode, the byte values of the higher/extended ASCII range are re-used in the encoding: although ñ is part of extended ASCII, that range is “hijacked” and re-used to encode the full set of Unicode as multi-byte sequences.
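The byte values above can be checked off-router with a few lines of Python. This is only an illustrative sketch: `hex_bytes` is a hypothetical helper name, and UTF-16BE is used as a stand-in for UCS-2, which is valid for characters in the Basic Multilingual Plane such as ñ.

```python
def hex_bytes(text: str, encoding: str) -> str:
    """Encode text with the given codec and return the bytes as upper-case hex."""
    return text.encode(encoding).hex().upper()


if __name__ == "__main__":
    # "\u00f1" is the character n with tilde.
    for codec in ("latin-1", "cp1252", "utf-16-be", "utf-8"):
        print(f"{codec:10} {hex_bytes(chr(0xF1), codec)}")
```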

Wikipedia has a chart of the needed conversion logic:

from: https://en.wikipedia.org/wiki/UTF-8#Encoding

You’ll note your tilde case “ñ” is code point 0x00F1 (the same value as in Latin-1), and since that is ≥ 0x0080, UTF-8 multi-byte encoding kicks in. Only the lower 128 ASCII characters (0x00 to 0x7F) are unchanged by UTF-8, so the single-byte Latin-1/etc. codes above that are lost in UTF-8. GSM7, as used in GSM PDU messages, is a 7-bit alphabet of its own, so a character outside its table similarly triggers a different encoding, just two-byte UCS-2 instead.
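The range-based logic from that Wikipedia chart can be sketched in Python as follows. This is a rough illustration rather than a reference implementation; the function name `utf8_encode_code_point` is invented here, and surrogate values are rejected because they are not valid characters on their own.

```python
def utf8_encode_code_point(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 using plain bit arithmetic."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable Unicode code point")
    if cp <= 0x7F:        # 1 byte: 0xxxxxxx (plain ASCII)
        return bytes([cp])
    if cp <= 0x7FF:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:      # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


if __name__ == "__main__":
    print(utf8_encode_code_point(0x00F1).hex().upper())  # the tilde-n case
```

Since a UCS-2 value is the code point itself, feeding each 16-bit value through a function like this is all the "remapping" that is needed.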

Columbia has table of UCS2 values that might also be helpful:
http://www.columbia.edu/kermit/ucs2.html

So I’d think this is possible in RouterOS script, but with different logic than the @rextended one. Since in UCS-2 the two bytes are the same as the Unicode code point, it’s just a matter of remapping them to the multiple bytes used by UTF-8. UTF-8 is what’s required for JSON (and for display in SSH).

But the issue may be how you identify the encoding and extract the UCS-2 data from an SMS PDU in the first place; that’s the first problem to solve before you get to encoding as UTF-8 for use in HTTP stuff like Telegram, etc.

Logic:
UCS-2 has 65536 possible values (ignoring for now the invalid sequences), and each one is always 2 bytes.
UTF-8 does not have a fixed character length, and the Unicode code point is different from what is actually written inside the string.
For example, again the €uro sign:
€ is one character in CP1252 (Windows-1252) and in some other code pages, but not all… (but we assume UCS-2 is used, which has that symbol for sure)
€ (CP1252) = 0x80, is Unicode code point 0x20AC and is actually written as 0xE2 0x82 0xAC in a UTF-8 string.
But… 0x20 0xAC is also the UCS-2 encoding for the €uro…
I have already done both tables for the characters in CP1252.
Assuming we always have correct input values (input checks can be added later)
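What such a CP1252 lookup table contains can be reproduced off-router with Python's built-in codecs. This is only a sketch to check table entries against, not the RouterOS function itself; `cp1252_byte_to_utf8` is a name chosen for this example.

```python
def cp1252_byte_to_utf8(value: int) -> bytes:
    """Map one CP1252 byte value to the UTF-8 bytes of the same character.

    Python's cp1252 codec raises UnicodeDecodeError for the few byte
    values that CP1252 leaves undefined (for example 0x81).
    """
    return bytes([value]).decode("cp1252").encode("utf-8")


if __name__ == "__main__":
    for value in (0x41, 0x80, 0xF1):
        print(f"{value:02X} -> {cp1252_byte_to_utf8(value).hex().upper()}")
```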

Can someone test whether this works as expected?
I have not tested it because I do not have time today…

Based on the already existing tables:
http://forum.mikrotik.com/t/rextended-fragments-of-snippets/151033/1
In the future, a conversion function based on bit operations instead of tables can be done, when I have time


code removed, see
https://forum.mikrotik.com/viewtopic.php?p=983695#p983695

The string in the example is “hello my friend camión Ññ” converted to UCS-2
Code points: ó = 00 F3, Ñ = 00 D1, ñ = 00 F1

Result is the string for telegram:
%68%65%6C%6C%6F%20%6D%79%20%66%72%69%65%6E%64%20%63%61%6D%69%C3%B3%6E%20%C3%91%C3%B1

Can be decoded here:
https://www.urldecoder.org/
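The same check can be done locally instead of on a website. A small Python sketch, using only the standard library (`decode_query_text` is a hypothetical helper name):

```python
from urllib.parse import unquote


def decode_query_text(encoded: str) -> str:
    """Percent-decode a string and interpret the resulting bytes as UTF-8."""
    return unquote(encoded, encoding="utf-8", errors="strict")


if __name__ == "__main__":
    sample = ("%68%65%6C%6C%6F%20%6D%79%20%66%72%69%65%6E%64%20"
              "%63%61%6D%69%C3%B3%6E%20%C3%91%C3%B1")
    print(decode_query_text(sample))
```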

That does work… as designed, it obviously does not cover all of UCS-2.

I just hope @diamuxin isn’t Portuguese, Latvian, or anyone who needs cedillas, macrons, etc.; they’re not in the Latin-1 charset, so they are not converted by this code. @rextended knows this, but the “lookup table method” is way easier than doing the bit-math needed to convert UCS-2 to UTF-8…


While Windows CP1252 and ISO-8859-1, or more generally “Latin 1”, are largely the same, the euro sign € is an oddity since it’s in CP1252 but is not a character in ISO-8859-1, e.g. https://en.wikipedia.org/wiki/Windows-1252#Codepage_layout vs https://en.wikipedia.org/wiki/ISO/IEC_8859-1#Code_page_layout.
In theory, the SMS’s UCS-2 should have this encoded as 0x20AC, since that’s its Unicode code point, but who knows, it could be 0x0080 since most OSes accept both for €.


Since OP wanted Telegram, that’s totally right. But technically that’s “urlencoded” UTF-8, which is what you need if you use HTTP with query parameters (which I think the Telegram examples use to avoid needing JSON…).


100% agree. But JSON can have UTF-8 Unicode characters inside, so the problem here still wouldn’t just go away with that (i.e. it’s not JUST some “json2array” that’s missing in scripting). There is a long list of missing pieces, which includes basic encoding/decoding of the RFC-ish/Unicode formats too, as shown here!

But if someone did want “raw UTF-8”, which is what would be needed inside some JSON… using “\” instead of “%” in the loop that builds the result would get you UTF-8 as a byte stream (you’d likely also need to return [:parse [return $constr]] to re-interpolate the escape sequences).

Hello, what a nice surprise to read this progress on the recoding of characters valid for Telegram, thanks to both of you for your interest.

No, Amm0, don’t worry, I’m working with the Spanish language. I mainly wanted to make it compatible with “ñ” and acute accents like “á, é, í, ó, ú” (uppercase and lowercase, of course); the euro currency symbol (€) is not important because I hardly ever receive SMS with it.

I want to start testing what @rextended has kindly suggested, but I have a question: using the function $testUCS2toUTF8 directly on the SMS content does not convert anything. I suppose I first have to convert from normal text to UCS-2, right?

1.- Received SMS


<?xml version="1.0" encoding="UTF-8"?>
<response>
	<Count>1</Count>
	<Messages>
		<Message>
			<Smstat>0</Smstat>
			<Index>40000</Index>
			<Phone>+34XXXXXXXXX</Phone>
			<Content>campaña y acción.</Content>
			<Date>2023-02-11 17:37:29</Date>
			<Sca></Sca>
			<SaveType>0</SaveType>
			<Priority>0</Priority>
			<SmsType>1</SmsType>
		</Message>
	</Messages>
</response>

Test script, from > system/script/run test1


:global testUCS2toUTF8
:local content "campaña y acción."; # text extracted from test sms received
:put [$testUCS2toUTF8 $content]
Result on terminal screen: <empty>

I’d like to help; how should I test it?

You can’t just cut-and-paste the ñ into a string. You need to pull the raw SMS data from your modem via at-chat (or maybe /tool/sms supports raw bytes now, dunno); that is what’s in UCS-2 format. If it’s your modem that’s giving you the XML, that’s likely UTF-8 inside, which would just need parsing to pull out the message part (another complex parsing assignment, however) and then urlencoding (e.g. the %20%CD%ED format).

How are you getting that XML?

The message is extracted directly in XML from a Huawei USB modem using an API built into the device by that manufacturer.

Starting from this function (token parser library)


# TOKEN PARSER LIBRARY
# v1.0.0 pkt

# :put [($tokenParser->"getTag") source=$xml tag="SessionInfo"]

# ($tokenParser->"getBetween")
#  get delimited value
#    source - source string
#    fromTok - (optional) text AFTER this token (or from source beginning) will be returned
#    toTok - (optional) text BEFORE this token (or until source finish) will be returned
#    startPos - (optional) start position (default 0 = beginning)
#  returns an array with fields data and pos
# 
# ($tokenParser->"getTag")
#  get value for XML tag
#    source - xml string
#    tag - tag which value is to be returned
#    startPos - (optional, text index) start position (if not specified, it will search from the beginning of the string)
#  returns the tag content
# 
# ($tokenParser->"getTagDetailed")
#  get value and end position for XML tag
#    source - xml string
#    tag - tag which value is to be returned
#    startPos - (optional, text index) start position (if not specified, it will search from the beginning of the string)
#  returns an array with fields "data" (tag content) and "pos" (where tag ends)
# 
# ($tokenParser->"getTagList")
#  get a list with the content of each appearance of tag
#    source - xml string
#    tag - tag which value is to be returned
#  returns an array with tag contents
# 
# ($tokenParser->"forEachTag")
#  incremental parser that calls the callback with the content of each appearance of tag
#    source - xml string
#    tag - tag which value is to be returned
#    callback - callback (with param content) to be called for each appearance of tag
#    callbackArgs - callback will be called passing this value in param args


:global tokenParser ({})

:set ($tokenParser->"getBetween") do={ # get delimited value
  # source - source string
  # fromTok - (optional) text AFTER this token (or from source beginning) will be returned
  # toTok - (optional) text BEFORE this token (or until source finish) will be returned
  # startPos - (optional) start position (default 0 = beginning)
  
  # returns an array with fields data and pos
  # if fromTok and/or toTok are specified and neither of them appear in source, empty string "" will be returned as data

  # based on function getBetween by CuriousKiwi, modified by pkt

  :local posStart
  :if ([:len $startPos] = 0) do={
    :set posStart -1
  } else={
    :set posStart ($startPos-1)
  }

  :local found true
  :local data 

  :local resultStart
  :if ([:len $fromTok] > 0) do={
    :set resultStart [:find $source $fromTok $posStart]
    :if ([:len $resultStart] = 0) do={ # start token not found
      :set found false
      :set data ""
    }
    :set resultStart ($resultStart + [:len $fromTok])
  } else={
    :set resultStart 0
  }

  :local resultEnd
  :if ($found = true && [:len $toTok] > 0) do={
    :set resultEnd [:find $source $toTok ($resultStart-1)]
    :if ([:len $resultEnd] = 0) do={ # end token not found
      :set found false
      :set data ""
    }
  } else={
    :set resultEnd [:len $source]
  }

  :if ($found = true) do={ :set data [:pick $source $resultStart $resultEnd] }

  :return { data=$data; pos=$resultEnd }
}

:set ($tokenParser->"getTag") do={ # get value for XML tag
  # source - xml string
  # tag - tag which value is to be returned
  # startPos - (optional, text index) start position (if not specified, it will search from the beginning of the string)

  # returns the tag content

  :global tokenParser
  :return ([($tokenParser->"getBetween") source=$source fromTok=("<$tag>") toTok=("</$tag>") startPos=$startPos]->"data")
}

:set ($tokenParser->"getTagDetailed") do={ # get value and end position for XML tag
  # source - xml string
  # tag - tag which value is to be returned
  # startPos - (optional, text index) start position (if not specified, it will search from the beginning of the string)

  # returns an array with fields "data" (tag content) and "pos" (where tag ends)

  :global tokenParser
  :return [($tokenParser->"getBetween") source=$source fromTok=("<$tag>") toTok=("</$tag>") startPos=$startPos]
}

:set ($tokenParser->"getTagList") do={ # get a list with the content of each appearance of tag
  # source - xml string
  # tag - tag which value is to be returned

  # returns an array with tag contents

  :global tokenParser

  :local result ({})
  :local doneTags false
  :local startPos 0

  :do {
    :local tagContent [($tokenParser->"getTagDetailed") source=$source tag=$tag startPos=$startPos]

    :local content ($tagContent->"data")
    :if ($content != "") do={
      :set ($result->[:len $result]) $content

      # advance start pos to search for next tag
      :set startPos ($tagContent->"pos")
    } else={
      :set doneTags true
    }
  } while=($doneTags = false)

  :return $result
}

:set ($tokenParser->"forEachTag") do={ # incremental parser that calls the callback with the content of each appearance of tag
  # source - xml string
  # tag - tag which value is to be returned
  # callback - callback (with param content) to be called for each appearance of tag
  # callbackArgs - callback will be called passing this value in param args

  :global tokenParser

  :local doneTags false
  :local startPos 0

  :do {
    :local tagContent [($tokenParser->"getTagDetailed") source=$source tag=$tag startPos=$startPos]

    :local content ($tagContent->"data")
    :if ($content != "") do={
      [$callback tagContent=$content args=$callbackArgs]

      # advance start pos to search for next tag
      :set startPos ($tagContent->"pos")
    } else={
      :set doneTags true
    }
  } while=($doneTags = false)
}
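For readers who want to prototype the tag extraction off-router before porting it, here is a rough Python sketch of what getTagList does. It is an approximation, not a port: `get_tag_list` is a made-up name, and unlike the RouterOS version above it keeps empty tags instead of stopping at the first one.

```python
from typing import List


def get_tag_list(source: str, tag: str) -> List[str]:
    """Return the text between every <tag>...</tag> pair found in source."""
    open_tok, close_tok = f"<{tag}>", f"</{tag}>"
    results: List[str] = []
    pos = 0
    while True:
        start = source.find(open_tok, pos)
        if start == -1:
            break                      # no further opening tag
        start += len(open_tok)
        end = source.find(close_tok, start)
        if end == -1:
            break                      # unterminated tag: stop scanning
        results.append(source[start:end])
        pos = end + len(close_tok)     # continue after the closing tag
    return results


if __name__ == "__main__":
    print(get_tag_list("<Index>1</Index><Phone>x</Phone><Index>2</Index>", "Index"))
```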



Function to get a list of SMS messages


:global recvSMS do={
  :local lteIP "192.168.8.1"

  :global tokenParser

  # get SessionID and Token via LTE modem API
  :local urlSesTokInfo "http://$lteIP/api/webserver/SesTokInfo"
  :local api [/tool fetch $urlSesTokInfo output=user as-value]
  :local apiData  ($api->"data")

  # parse SessionID and Token from API session data 
  :local apiSessionID [($tokenParser->"getTag") source=$apiData tag="SesInfo"]
  :local apiToken [($tokenParser->"getTag") source=$apiData tag="TokInfo"]

  # header and data config
  :local apiHead "Content-Type:text/xml,Cookie: $apiSessionID,__RequestVerificationToken:$apiToken"
  :local recvData "<?xml version=\"1.0\" encoding=\"UTF-8\"?><request><PageIndex>1</PageIndex><ReadCount>20</ReadCount><BoxType>1</BoxType><SortType>0</SortType><Ascending>0</Ascending><UnreadPreferred>1</UnreadPreferred></request>"

  # recv SMS via LTE modem API with fetch
  :return [/tool fetch http-method=post http-header-field=$apiHead url="http://$lteIP/api/sms/sms-list" http-data=$recvData output=user as-value]
}

Script to extract the content of messages:


:global tokenParser
:global recvSMS
:local xmlSmsList ([$recvSMS]->"data")
:local smsList [($tokenParser->"getTagList") source=$xmlSmsList tag="Message"]
:local smsCount [:tonum [($tokenParser->"getTag") source=$xmlSmsList tag="Count"]]

:if ($smsCount > 0) do={

:foreach tagContent in=$smsList do={

  :local index [($tokenParser->"getTag") source=$tagContent tag="Index"]
  :local date [($tokenParser->"getTag") source=$tagContent tag="Date"]
  :local phone [($tokenParser->"getTag") source=$tagContent tag="Phone"]
  :local content [($tokenParser->"getTag") source=$tagContent tag="Content"]
  :local read ([($tokenParser->"getTag") source=$tagContent tag="Smstat"] = 1)

  :if ($content != "") do={
    :put "$index $read $date $phone $content"    
    /tool e-mail send to=user@mail.com subject="SMS $phone" body="$index $read $date $phone $content" 

   # Telegram Start
   :local MessageText "SMS $phone $content"
   :local SendTelegramMessage [:parse [/system script get MyTGBotSendMessage source]]
   $SendTelegramMessage MessageText=$MessageText
   # Telegram End
  }
}

}

Content Telegram Module “MyTGBotSendMessage”


:local BotToken "XXXXXXXXXX:XXXXXXXXXX-XXXXXXXXXXXXXXXXXXXXX";
:local ChatID "XXXXXXXXX";
:local parseMode "HTML";
:local SendText $MessageText;

/tool fetch url="https://api.telegram.org/bot$BotToken/sendMessage\?chat_id=$ChatID&parse_mode=$parseMode&text=$SendText" keep-result=no;

That is the process, I hope I have explained myself well.


BR.

Okay, well, that’s simpler than UCS-2 from SMS via AT. Your Huawei modem is doing you a favor here; most modems give you an SMS PDU that requires parsing before you even get to the UCS-2.

Your “:local content” variable should already have UTF-8 in it (going by the XML metadata <?xml version="1.0" encoding="UTF-8"?>). So you just need to use a different @rextended function on $content (e.g. [$fURLEncode $content]) before passing it along to Telegram:
http://forum.mikrotik.com/t/replace-characters-in-string-url-encode/76863/11

That gets the raw UTF-8 bytes into a urlencoded string (i.e. UTF-8 that’s %-encoded for use in the HTTP query string).
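To see what the expected end result looks like for the modem's XML, the two steps (pull out the Content, then percent-encode its UTF-8 bytes) can be sketched in Python. The XML below is a trimmed copy of the sample posted earlier, and `sms_contents_urlencoded` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Trimmed version of the modem response shown earlier, as UTF-8 bytes.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<response><Count>1</Count><Messages><Message>"
    "<Phone>+34XXXXXXXXX</Phone>"
    "<Content>campa\u00f1a y acci\u00f3n.</Content>"
    "</Message></Messages></response>"
).encode("utf-8")


def sms_contents_urlencoded(xml_bytes: bytes) -> list:
    """Return each <Content> text, percent-encoded as UTF-8 for a query string."""
    root = ET.fromstring(xml_bytes)
    return [quote(node.text or "", safe="") for node in root.iter("Content")]


if __name__ == "__main__":
    print(sms_contents_urlencoded(SAMPLE))
```

Whatever RouterOS function is used for the encoding step should produce the same string for the same message.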

RouterOS only lets you work with the bytes involved in Unicode; it doesn’t really have Unicode support for display/input in CLI/WinBox/SSH/etc.

In that case, I tried with the $fURLEncode function but it doesn’t work either.


# ------------------- fURLEncode ----------------------
#
:global fURLEncode do={
    :local Chars {" "="%20";"!"="%21";"#"="%23";"%"="%25";"&"="%26";"'"="%27";"("="%28";")"="%29";"*"="%2A";"+"="%2B";","="%2C";"/"="%2F";":"="%3A";";"="%3B";"<"="%3C";"="="%3D";">"="%3E";"@"="%40";"["="%5B";"]"="%5D";"^"="%5E";"`"="%60";"{"="%7B";"|"="%7C";"}"="%7D"}
    :set ($Chars->"\07") "%07"
    :set ($Chars->"\0A") "%0A"
    :set ($Chars->"\0D") "%0D"
    :set ($Chars->"\22") "%22"
    :set ($Chars->"\24") "%24"
    :set ($Chars->"\3F") "%3F"
    :set ($Chars->"\5C") "%5C"
    :local URLEncodeStr
    :local Char
    :local EncChar
    :for i from=0 to=([:len $1]-1) do={
        :set Char [:pick $1 $i]
        :set EncChar ($Chars->$Char)
        :if (any $EncChar) do={
            :set URLEncodeStr "$URLEncodeStr$EncChar"
        } else={
            :set URLEncodeStr "$URLEncodeStr$Char"
        }
    }
    :return $URLEncodeStr
}

I have modified the array to include two special characters “ñ” and “ó” but it does not work.

:local Chars {" "="%20";"!"="%21";"#"="%23";"%"="%25";"&"="%26";"'"="%27";"("="%28";")"="%29";"*"="%2A";"+"="%2B";","="%2C";"/"="%2F";":"="%3A";";"="%3B";"<"="%3C";"="="%3D";">"="%3E";"@"="%40";"["="%5B";"]"="%5D";"^"="%5E";"`"="%60";"{"="%7B";"|"="%7C";"}"="%7D";"ñ"="%C3%21";"ó"="%C3%23"}


Result:
campa%C3%21a%20y%20acci%C3%23n.
status: failed

I surrender :frowning:

Thank you in any case.

..

%C3%21 and %C3%23???
This is the correct urlencoded string…
campa%C3%B1a%20y%20acci%C3%B3n
ñ = \C3\B1 and ó = \C3\B3
%21 = ! and %23 = #, so C3 21 and C3 23 are not valid UTF-8 sequences
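Whether a byte pair is valid UTF-8 is easy to verify off-router. A small Python sketch (the helper name `is_valid_utf8` is made up): after a C3 lead byte, the second byte must be in the range 80 to BF, which 21 and 23 are not.

```python
def is_valid_utf8(hex_digits: str) -> bool:
    """Return True if the given hex string decodes as well-formed UTF-8."""
    try:
        bytes.fromhex(hex_digits).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for candidate in ("C3B1", "C3B3", "C321", "C323"):
        print(candidate, is_valid_utf8(candidate))
```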

I don’t remember the 2021 version (now deleted), but I already did a URL encoder for UTF-8 some days ago…
http://forum.mikrotik.com/t/rextended-fragments-of-snippets/151033/1

If you have “campaña y acción” you can use ASCIItoCP1252toURLencode directly

If you have the SMS value read directly via AT commands and converted to a UCS-2 string,
“campaña y acción” = \00c\00a\00m\00p\00a\00\F1\00a\00\20\00y\00\20\00a\00c\00c\00i\00\F3\00n
(obviously I have already written the standard letters “c-a-m-p…” as plain characters)
ñ = \00\F1 ó = \00\F3
At this point you convert the UCS-2 string with testUCS2toUTF8 and pass the result to UTF8toURLencode to obtain the URL/GET/POST string for fetch.
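The two-step pipeline (UCS-2 to UTF-8, then UTF-8 to percent-encoding) can be mirrored in Python to get reference output for testing. This is only a sketch under one assumption: the UCS-2 data is big-endian, as in the \00\F1 notation above, so Python's utf-16-be codec can stand in for UCS-2. The function name is invented.

```python
from urllib.parse import quote


def ucs2_to_query_text(ucs2: bytes) -> str:
    """Decode big-endian UCS-2 bytes and percent-encode the text as UTF-8."""
    text = ucs2.decode("utf-16-be")   # assumption: BMP-only, big-endian input
    return quote(text, safe="")       # every reserved byte becomes %XX


if __name__ == "__main__":
    sms_bytes = "campa\u00f1a y acci\u00f3n".encode("utf-16-be")
    print(ucs2_to_query_text(sms_bytes))
```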

I have already done that function: simply remove the % in the function and pass the result to:
hexstr2chrstr
http://forum.mikrotik.com/t/decode-ussd-on-wap-lte-kit/115644/4

Or convert each character “on the fly” with hex2chr
http://forum.mikrotik.com/t/how-to-convert-a-hex-value-to-a-char/97913/9

Or alter the table to give directly the characters instead of hex values…

from
:local CP1252toUTF8 {"00";"01";"02";.....................;"C3BD";"C3BE";"C3BF"}
to
:local CP1252toUTF8 {"\00";"\01";"\02";.....................;"\C3\BD";"\C3\BE";"\C3\BF"}

and from
        :local utf ($CP1252toUTF8->[:find $CP1252testEP [:pick $string $pos ($pos + 2)] -1])
        :local sym ""
        :if ([:len $utf] = 2) do={:set sym "%$[:pick $utf 0 2]" }
        :if ([:len $utf] = 4) do={:set sym "%$[:pick $utf 0 2]%$[:pick $utf 2 4]" }
        :if ([:len $utf] = 6) do={:set sym "%$[:pick $utf 0 2]%$[:pick $utf 2 4]%$[:pick $utf 4 6]" }
        :set constr "$constr$sym"
to
        :local utf ($CP1252toUTF8->[:find $CP1252testEP [:pick $string $pos ($pos + 2)] -1])
        :set constr "$constr$utf"

searchtag # rextended ucs2utf8

I have completed the function :slight_smile: :slight_smile: :slight_smile:

Without using tables, it converts all UCS-2 characters (Unicode 2-byte code points) to UTF-8…

:global UCS2toUTF8 do={
    :local numbyte2hex do={
        :local input [:tonum $1]
        :local hexchars "0123456789ABCDEF"
        :local convert [:pick $hexchars (($input >> 4) & 0xF)]
        :set convert ($convert.[:pick $hexchars ($input & 0xF)])
        :return $convert
    }

    :local charsString ""
    :for x from=0 to=15 step=1 do={ :for y from=0 to=15 step=1 do={
        :local tmpHex "$[:pick "0123456789ABCDEF" $x ($x+1)]$[:pick "0123456789ABCDEF" $y ($y+1)]"
        :set $charsString "$charsString$[[:parse "(\"\\$tmpHex\")"]]"
    } }

    :local chr2int do={:if (($1="") or ([:len $1] > 1) or ([:typeof $1] = "nothing")) do={:return -1}; :return [:find $2 $1 -1]}

    :local string $1
    :if (([:typeof $string] != "str") or ($string = "")) do={ :return "" }
    :local output ""

    :local lenstr [:len $string]
    :for pos from=0 to=($lenstr - 1) step=2 do={
       :local input (([$chr2int [:pick $string  $pos      ($pos + 1)] $charsString] * 0x100) + \
                     ([$chr2int [:pick $string ($pos + 1) ($pos + 2)] $charsString]        ))
        :local results [:toarray ""]
        :local utf   ""
        :if ($input > 0x7F) do={
            :if ($input > 0x7FF) do={
                :if ($input > 0xFFFF) do={
                    :if ($input > 0x10FFFF) do={
                        :error "UTF-8 do not have code point > of 0x10FFFF"
                    } else={
                        :error "UCS-2 do not have code point > of 0xFFFF"
# the following commented lines are not used on UCS-2
# but I have already prepared my script for future changes to work with all UNICODE code points from 0x000000 to 0x10FFFF as well...
#                        :set ($results->0) (0xF0 + ( $input >> 18        ))
#                        :set ($results->1) (0x80 + (($input >> 12) & 0x3F))
#                        :set ($results->2) (0x80 + (($input >>  6) & 0x3F))
#                        :set ($results->3) (0x80 + ( $input        & 0x3F))
                    }
                } else={
                    :set ($results->0) (0xE0 + ( $input >> 12        ))
                    :set ($results->1) (0x80 + (($input >>  6) & 0x3F))
                    :set ($results->2) (0x80 + ( $input        & 0x3F))
                }
            } else={
                :set ($results->0) (0xC0 + ($input >>    6))
                :set ($results->1) (0x80 + ($input  & 0x3F))
            }
        } else={
            :set ($results->0) $input
        }
        :foreach item in=$results do={
            :set utf "$utf%$[$numbyte2hex $item]"
        }
        :set output "$output$utf"
    }
    :return $output
}

{
:local ucsreadedfromsms "\00h\00e\00l\00l\00o\00\20\00m\00y\00\20\00f\00r\00i\00e\00n\00d\00\20\00c\00a\00m\00i\00\F3\00n\00\20\00\D1\00\F1"
:put [$UCS2toUTF8 $ucsreadedfromsms]
}

results:
%68%65%6C%6C%6F%20%6D%79%20%66%72%69%65%6E%64%20%63%61%6D%69%C3%B3%6E%20%C3%91%C3%B1
The string in the example is “hello my friend camión Ññ” converted to UCS-2
Code points: ó = 00 F3, Ñ = 00 D1, ñ = 00 F1

To test the result:
https://www.urldecoder.org/


EDIT: Reformatted, fixed for non CP1252 characters.

Considering that the SMS message is extracted from my modem in UTF-8 format (as I already mentioned in http://forum.mikrotik.com/t/convert-any-text-to-unicode/164329/1):

<?xml version="1.0" encoding="UTF-8"?>
<response>
	<Count>1</Count>
	<Messages>
		<Message>
			<Smstat>0</Smstat>
			<Index>40000</Index>
			<Phone>+34XXXXXXXXX</Phone>
			<Content>Google España G-126663 es tu código de verificación.</Content>
			<Date>2023-02-12 13:09:30</Date>
			<Sca></Sca>
			<SaveType>0</SaveType>
			<Priority>0</Priority>
			<SmsType>1</SmsType>
		</Message>
	</Messages>
</response>

I have tried the function $UTF8toURLencode


:global UTF8toURLencode do={
    :local ascii "\00\01\02\03\04\05\06\07\08\09\0A\0B\0C\0D\0E\0F\
                  \10\11\12\13\14\15\16\17\18\19\1A\1B\1C\1D\1E\1F\
                  \20\21\22\23\24\25\26\27\28\29\2A\2B\2C\2D\2E\2F\
                  \30\31\32\33\34\35\36\37\38\39\3A\3B\3C\3D\3E\3F\
                  \40\41\42\43\44\45\46\47\48\49\4A\4B\4C\4D\4E\4F\
                  \50\51\52\53\54\55\56\57\58\59\5A\5B\5C\5D\5E\5F\
                  \60\61\62\63\64\65\66\67\68\69\6A\6B\6C\6D\6E\6F\
                  \70\71\72\73\74\75\76\77\78\79\7A\7B\7C\7D\7E\7F\
                  \80\81\82\83\84\85\86\87\88\89\8A\8B\8C\8D\8E\8F\
                  \90\91\92\93\94\95\96\97\98\99\9A\9B\9C\9D\9E\9F\
                  \A0\A1\A2\A3\A4\A5\A6\A7\A8\A9\AA\AB\AC\AD\AE\AF\
                  \B0\B1\B2\B3\B4\B5\B6\B7\B8\B9\BA\BB\BC\BD\BE\BF\
                  \C0\C1\C2\C3\C4\C5\C6\C7\C8\C9\CA\CB\CC\CD\CE\CF\
                  \D0\D1\D2\D3\D4\D5\D6\D7\D8\D9\DA\DB\DC\DD\DE\DF\
                  \E0\E1\E2\E3\E4\E5\E6\E7\E8\E9\EA\EB\EC\ED\EE\EF\
                  \F0\F1\F2\F3\F4\F5\F6\F7\F8\F9\FA\FB\FC\FD\FE\FF"
    :local UTF8toURLe {"00";"01";"02";"03";"04";"05";"06";"07";"08";"09";"0A";"0B";"0C";"0D";"0E";"0F";
                       "10";"11";"12";"13";"14";"15";"16";"17";"18";"19";"1A";"1B";"1C";"1D";"1E";"1F";
                       "+";"21";"22";"23";"24";"25";"26";"27";"28";"29";"2A";"2B";"2C";"-";".";"2F";
                       "0";"1";"2";"3";"4";"5";"6";"7";"8";"9";"3A";"3B";"3C";"3D";"3E";"3F";
                       "40";"A";"B";"C";"D";"E";"F";"G";"H";"I";"J";"K";"L";"M";"N";"O";
                       "P";"Q";"R";"S";"T";"U";"V";"W";"X";"Y";"Z";"5B";"5C";"5D";"5E";"_";
                       "60";"a";"b";"c";"d";"e";"f";"g";"h";"i";"j";"k";"l";"m";"n";"o";
                       "p";"q";"r";"s";"t";"u";"v";"w";"x";"y";"z";"7B";"7C";"7D";"~";"7F";
                       "80";"81";"82";"83";"84";"85";"86";"87";"88";"89";"8A";"8B";"8C";"8D";"8E";"8F";
                       "90";"91";"92";"93";"94";"95";"96";"97";"98";"99";"9A";"9B";"9C";"9D";"9E";"9F";
                       "A0";"A1";"A2";"A3";"A4";"A5";"A6";"A7";"A8";"A9";"AA";"AB";"AC";"AD";"AE";"AF";
                       "B0";"B1";"B2";"B3";"B4";"B5";"B6";"B7";"B8";"B9";"BA";"BB";"BC";"BD";"BE";"BF";
                       "C0";"C1";"C2";"C3";"C4";"C5";"C6";"C7";"C8";"C9";"CA";"CB";"CC";"CD";"CE";"CF";
                       "D0";"D1";"D2";"D3";"D4";"D5";"D6";"D7";"D8";"D9";"DA";"DB";"DC";"DD";"DE";"DF";
                       "E0";"E1";"E2";"E3";"E4";"E5";"E6";"E7";"E8";"E9";"EA";"EB";"EC";"ED";"EE";"EF";
                       "F0";"F1";"F2";"F3";"F4";"F5";"F6";"F7";"F8";"F9";"FA";"FB";"FC";"FD";"FE";"FF"
                      }
    :local string $1
    :if (([:typeof $string] != "str") or ($string = "")) do={ :return "" }
    :local lenstr [:len $string]
    :local constr ""
    :for pos from=0 to=($lenstr - 1) do={
        :local urle ($UTF8toURLe->[:find $ascii [:pick $string $pos ($pos + 1)] -1])
        :local sym $urle
        :if ([:len $urle] = 2) do={:set sym "%$[:pick $urle 0 2]" }
        :set constr "$constr$sym"
    }
    :return $constr
}

And now I get what I need:


Result on screen:

:put "$index $read $phoneTG $date2dmy $contentTG"
40000 false %2B34XXXXXXXXX 12/02/2023 13:09:30 Google+Espa%C3%B1a+G-126663+es+tu+c%C3%B3digo+de+verificaci%C3%B3n.
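As a cross-check, Python's standard `quote_plus` with an empty safe list appears to follow the same rules as the table in $UTF8toURLencode (space becomes "+", letters, digits and "-", ".", "_", "~" stay as they are, every other byte becomes %XX). A minimal sketch, with `form_encode` as a hypothetical wrapper name:

```python
from urllib.parse import quote_plus


def form_encode(text: str) -> str:
    """Percent-encode text as UTF-8, writing spaces as '+'."""
    return quote_plus(text, safe="")


if __name__ == "__main__":
    print(form_encode("+34XXXXXXXXX"))
    print(form_encode("Google Espa\u00f1a G-126663 es tu c\u00f3digo de verificaci\u00f3n."))
```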

Result in Telegram:




Thank you very, very much for your patience.

BR.

Thank you, it was a pleasure; it also let me develop other useful functions.

MikroTik does not decode UCS-2 SMS, but if one has the patience (next step?.. :unamused: ) to extract the SMS PDU with AT commands
and extract the UCS-2 text message from the PDU, it is possible to forward that message to e-mail, Twitter, etc.