Convert any text to UNICODE

rextended · February 11, 2023, 11:38am

Logic:
UCS-2 has 65536 possible values (0x0000 to 0xFFFF; ignoring, for now, the invalid sequences), and every character is always 2 bytes.
UTF-8 does not have a fixed character length, and the Unicode code point is different from what is actually written inside the string.
For example, again the € (euro) sign:
€ is one character in CP1252 (Windows-1252) and in other code pages, but not all of them (here we assume UCS-2, which has that symbol for sure).
€ in CP1252 is 0x80; its Unicode code point is U+20AC (0x20 0xAC), and it is actually written as 0xE2 0x82 0xAC in a UTF-8 string.
But 0x20 0xAC is also the UCS-2 encoding of the euro sign.
I have already done both tables for the characters in CP1252.
This assumes the input value is always correct (input checking can be added later).
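
The € example above can be checked quickly outside RouterOS. This is only a minimal sketch in Python (not the removed RouterOS script); it assumes big-endian byte order for UCS-2 and uses Python's `utf-16-be` codec, which gives the same bytes as UCS-2 for characters in the 0x0000 to 0xFFFF range.

```python
# Sketch: the three representations of the euro sign discussed above.
# Assumption: UCS-2 is taken as big-endian; for code points up to 0xFFFF
# Python's "utf-16-be" codec produces the same two bytes as UCS-2.

cp1252_byte = b"\x80"                      # € as stored in CP1252
char = cp1252_byte.decode("cp1252")        # table lookup CP1252 -> Unicode

code_point = ord(char)                     # Unicode code point
ucs2_bytes = char.encode("utf-16-be")      # 2-byte UCS-2 form
utf8_bytes = char.encode("utf-8")          # what is actually written in a UTF-8 string

print(f"U+{code_point:04X}")               # U+20AC
print(ucs2_bytes.hex(" ").upper())         # 20 AC
print(utf8_bytes.hex(" ").upper())         # E2 82 AC
```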

Can someone test whether this works as expected?
I have not tested it because I do not have time today…

Based on the already existing tables:
http://forum.mikrotik.com/t/rextended-fragments-of-snippets/151033/1
In the future, a conversion function based on bit operations instead of tables can be done, when I have time.
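
As a rough idea of what a bit-based conversion could look like, here is a minimal Python sketch that turns one UCS-2 value into its UTF-8 bytes using only shifts and masks. The function name `ucs2_to_utf8` is hypothetical, it is not the RouterOS function from this thread, and rejecting the surrogate range 0xD800 to 0xDFFF is my assumption about what counts as invalid input.

```python
def ucs2_to_utf8(code_point: int) -> bytes:
    """Encode one UCS-2 value (0x0000..0xFFFF) as UTF-8 without lookup tables."""
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError("value does not fit in UCS-2")
    if 0xD800 <= code_point <= 0xDFFF:
        # Assumption: surrogate values are treated as invalid input.
        raise ValueError("surrogate values are not valid characters")

    if code_point < 0x80:
        # 1 byte: 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return bytes([0xE0 | (code_point >> 12),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


print(ucs2_to_utf8(0x20AC).hex(" ").upper())  # E2 82 AC
print(ucs2_to_utf8(0x00F3).hex(" ").upper())  # C3 B3
```

The CP1252 to UCS-2 step would still need a small table for 0x80 to 0x9F, because those bytes do not map to the same numeric code points.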

…
code removed, see
https://forum.mikrotik.com/viewtopic.php?p=983695#p983695
…

The string in the example is “hello my friend camión Ññ” converted to UCS-2.
Code points: ó = 00 F3, Ñ = 00 D1, ñ = 00 F1

The result is the URL-encoded (UTF-8) string for Telegram:
%68%65%6C%6C%6F%20%6D%79%20%66%72%69%65%6E%64%20%63%61%6D%69%C3%B3%6E%20%C3%91%C3%B1

It can be decoded here:
https://www.urldecoder.org/
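
The result above can also be reproduced and decoded locally. A minimal Python sketch, assuming every UTF-8 byte is percent-encoded (as in the string above); `percent_encode_all` is a hypothetical helper name, while `urllib.parse.unquote` is the standard library decoder.

```python
from urllib.parse import unquote


def percent_encode_all(text: str) -> str:
    """Percent-encode every UTF-8 byte of text, including plain ASCII letters."""
    return "".join(f"%{byte:02X}" for byte in text.encode("utf-8"))


encoded = percent_encode_all("hello my friend camión Ññ")
print(encoded)           # same string as the Telegram result above
print(unquote(encoded))  # hello my friend camión Ññ
```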