PDU Fields for incoming SMS

Hi all. I’m trying to figure out the PDU fields of an incoming SMS. There is a lot of information on the Internet, but it is disorganized and scattered. I have found several more or less informative diagrams, but none of them comes with a description that an ordinary person can understand.
In the topic header (first message), I attach the diagrams that I consider the most successful; they show how an SMS is divided into fields.
To make the thread easier to read, I will put the information for each field in a separate message. That also makes it more convenient for anyone who wants to add to or comment on a particular field value, and it should remain possible to add information to a specific message later.
So the next post will be about TP-SCA.
PDU string.jpg
PDU parts 2.jpg
filds pdu.jpg

If I understand correctly, the TP-SCA field consists of three parts: the length of the number, the type of number, and the actual number. There are a few things I don’t understand:

  1. If I understand correctly, TP-SCA can be absent, in which case the number is taken from the SIM card itself and the first pair of digits will be “00”. However, it’s unclear how the entire number is encrypted and where it is displayed, if at all.
  2. I read that the SCA number type can take several values, but I only found descriptions for 91 and 81. I also read that there are other types, such as:
    000 - unknown;
    001 - international;
    010 - national;
    011 - network-specific;
    100 - network subscriber type;
    101 - alphanumeric;
    110 - abbreviated;
    111 - reserved.
    However, I don’t know how to translate bits, and it doesn’t seem necessary to write a function to decode such a small piece of information. I would like to understand how these encrypted data are represented in human-readable form.
    Later, this template could be turned into an array from which the values are looked up.
  3. I’m also unsure about the possible length of the SCA number. So far, I’ve only encountered numbers with 11-12 characters. But perhaps someone knows of other lengths and what the first two digits, which are usually “07”, would look like in that case.
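
A minimal Python sketch of how the SCA could be split off the front of the PDU, assuming the usual layout (one length octet, one type-of-address octet, then the digits stored pairwise swapped). The name `parse_sca` and the sample PDU are only illustrative, not taken from any library:

```python
def parse_sca(pdu_hex: str):
    """Split the SMSC address (SCA) off the front of a hex PDU string.

    Returns (digits, toa, rest); toa is None when no SCA is present.
    """
    pdu_hex = pdu_hex.upper()
    length = int(pdu_hex[0:2], 16)   # number of octets that follow (TOA + digits)
    if length == 0:                  # "00": the PDU carries no SMSC number
        return "", None, pdu_hex[2:]
    toa = int(pdu_hex[2:4], 16)      # type-of-address octet, e.g. 0x91
    end = 2 + 2 * length             # every octet is two hex characters
    packed = pdu_hex[4:end]
    # Digits are stored pairwise swapped; a trailing F pads an odd digit count.
    digits = "".join(packed[i + 1] + packed[i] for i in range(0, len(packed), 2))
    return digits.rstrip("F"), toa, pdu_hex[end:]


print(parse_sca("07917283010010F5040B"))  # ('27381000015', 145, '040B')
```

With a 9 or 10 digit SMSC number the same code would see “06” as the first pair, since the length counts the type octet plus the packed digit octets.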

Now let’s talk about the TP-MTI & Co. field. First of all, it’s not entirely clear how to determine the SMS type. There seems to be an explanation: SMS-DELIVER, SMS-STATUS REPORT, SMS-SUBMIT REPORT, RESERVED. There’s even a translation from bits to regular numbers, but I don’t understand it yet. If someone can clarify, please explain which numbers are used in this context.
In addition, there are TP-MMS, TP-SRI, TP-UDHI, TP-RP fields. Where are their data located? How do they appear in the digital representation for an ordinary person?
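
As a hedged illustration: all of these flags live in the single octet that follows the SCA, and each one is a bit (or pair of bits) of that octet. The sketch below assumes the SMS-DELIVER bit layout (TP-MTI in bits 1-0, TP-MMS in bit 2, TP-SRI in bit 5, TP-UDHI in bit 6, TP-RP in bit 7); `decode_first_octet` is a made-up helper name:

```python
# Message types in the direction service centre -> phone.
MTI_NAMES = {0: "SMS-DELIVER", 1: "SMS-SUBMIT-REPORT",
             2: "SMS-STATUS-REPORT", 3: "reserved"}


def decode_first_octet(octet_hex: str) -> dict:
    """Decode the first TPDU octet, assuming the SMS-DELIVER bit layout."""
    value = int(octet_hex, 16)
    return {
        "TP-MTI": MTI_NAMES[value & 0b11],  # bits 1-0: message type
        "TP-MMS": (value >> 2) & 1,         # bit 2: 1 = no more messages waiting
        "TP-SRI": (value >> 5) & 1,         # bit 5: status report indication
        "TP-UDHI": (value >> 6) & 1,        # bit 6: user data starts with a header
        "TP-RP": (value >> 7) & 1,          # bit 7: reply path is set
    }


print(decode_first_octet("04"))
print(decode_first_octet("44"))
```

So “04” in the PDU text would be a plain SMS-DELIVER, and “44” the same with a User Data Header present, which is what a part of a multipart message typically looks like.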

Now let’s talk about the sender’s address field (TP-OA). Although it may resemble the SCA number at first glance, there are certain aspects that are not clear:

  1. Why is the same number length encoded as “07” for the SCA, but as “0B” for the sender’s address?
  2. What types of addresses can be used in this field?
  3. How can we distinguish between a numeric address and an alphanumeric address?
  4. And how can we determine the length of an alphanumeric address in terms of the number of characters allocated for it, rather than the number of bytes?
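
A rough Python sketch of these points, assuming the TP-OA length octet counts useful semi-octets (half-bytes, i.e. digits) rather than bytes, and that type-of-number bits 101 mark an alphanumeric sender packed as 7-bit characters. `parse_oa` and both sample fields are illustrative only, and the text branch assumes plain letters and digits, where the GSM 7-bit codes coincide with ASCII:

```python
def parse_oa(field_hex: str):
    """Decode a TP-OA field; returns (address, length in digits or characters)."""
    field_hex = field_hex.upper()
    semi_octets = int(field_hex[0:2], 16)   # useful half-bytes, not bytes
    toa = int(field_hex[2:4], 16)
    ton = (toa >> 4) & 0b111                # bits 6-4: type of number
    n_octets = (semi_octets + 1) // 2       # round up to whole octets
    value = field_hex[4:4 + 2 * n_octets]
    if ton == 0b101:                        # alphanumeric sender
        n_chars = semi_octets * 4 // 7      # 4 bits per semi-octet, 7 per char
        bits = int.from_bytes(bytes.fromhex(value), "little")
        septets = [(bits >> (7 * i)) & 0x7F for i in range(n_chars)]
        # Assumption: letters/digits only, so GSM 7-bit code == ASCII code.
        return "".join(chr(s) for s in septets), n_chars
    # Numeric sender: swap each pair and drop the padding F, if any.
    digits = "".join(value[i + 1] + value[i] for i in range(0, len(value), 2))
    return digits[:semi_octets], semi_octets


print(parse_oa("0B919701214365F7"))  # ('79101234567', 11)
print(parse_oa("07D049B7F90D"))      # ('Info', 4)
```

The same 11-digit number therefore gives “0B” here (11 digits) but “07” in the SCA (7 bytes), and the character count of a text sender follows from the semi-octet count rather than being stored directly.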

Is the TP-PID field ever different from “00”? What other values can it have, how are they encoded, and how many characters do they occupy?

TP-DCS field.

  1. It seems that GSM-7 encoding is “00” and UCS2 encoding is “08”. However, I have also seen “10” and “18”. What are these encodings?
  2. Are there any other types of encoding?
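
A small sketch of how the DCS octet could be read, assuming the "general data coding" group (top two bits 00): bit 5 is the compression flag, bit 4 says whether bits 1-0 carry a message class, and bits 3-2 select the alphabet. Under that reading “10” and “18” are simply “00” and “08” with class 0 (a "flash" message). `decode_dcs` is a hypothetical helper, and other coding groups are deliberately left out:

```python
ALPHABETS = {0b00: "GSM 7-bit", 0b01: "8-bit data", 0b10: "UCS2", 0b11: "reserved"}


def decode_dcs(dcs_hex: str):
    """Decode a TP-DCS octet of the general data coding group, else None."""
    dcs = int(dcs_hex, 16)
    if dcs >> 6 != 0b00:
        return None  # other coding groups are not handled in this sketch
    return {
        "compressed": bool(dcs & 0x20),                       # bit 5
        "alphabet": ALPHABETS[(dcs >> 2) & 0b11],             # bits 3-2
        "message_class": dcs & 0b11 if dcs & 0x10 else None,  # bits 1-0 if bit 4
    }


for sample in ("00", "08", "10", "18", "04"):
    print(sample, decode_dcs(sample))
```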

Regarding the TP-SCTS field, I think I understand it. It is encoded similarly to phone numbers. The only clarification I need is whether there is another time format, and if so, how it is encoded and indicated as a different time format.
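
For reference, a sketch of the decoding, assuming the usual 7-octet layout (year, month, day, hour, minute, second, time zone), each octet with its two digits swapped, and the time zone given in quarter hours with bit 3 of that octet as the minus sign. As far as I know, SMS-DELIVER uses only this one format. `decode_scts` is an illustrative name, and treating the two-digit year as 20xx is an assumption of the sketch:

```python
from datetime import datetime, timedelta, timezone


def decode_scts(scts_hex: str) -> datetime:
    """Decode a 14-character TP-SCTS string into an aware datetime."""
    octets = bytes.fromhex(scts_hex)

    def swapped(b: int) -> int:
        # Low nibble holds the tens digit, high nibble the units digit.
        return (b & 0x0F) * 10 + (b >> 4)

    year, month, day, hour, minute, second = (swapped(b) for b in octets[:6])
    tz = octets[6]
    quarters = (tz & 0x07) * 10 + (tz >> 4)   # offset in units of 15 minutes
    sign = -1 if tz & 0x08 else 1             # bit 3 set means negative offset
    offset = timezone(sign * timedelta(minutes=15 * quarters))
    # Assumption: two-digit years are in the 2000s.
    return datetime(2000 + year, month, day, hour, minute, second, tzinfo=offset)


print(decode_scts("42301251619580").isoformat())  # 2024-03-21T15:16:59+02:00
```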

See how the PDU Type is encoded: https://www.activexperts.com/serial-port-component/tutorials/smstechnical/

The TP-UDL field.

  1. Length of the user data, including the User Data Header if present.
  2. If the SMS is encoded with a 7-bit encoding, this field indicates the number of characters in the message.
  3. If the encoding is UCS2, then the field indicates the number of bytes in the message. Here, it is unclear how to count the characters. What exactly needs to be counted?
  4. Did I understand correctly that to determine the length of the message and whether it is multipart, I need to refer to the values in the TP-MTI field?
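
A hedged sketch of the counting, assuming TP-UDL is in septets for the 7-bit alphabet and in octets for UCS2 (two octets per character), and that a header is present only when the TP-UDHI bit of the first octet is set (that bit, not TP-MTI, is what marks a possible multipart message). `text_length` and the `"gsm7"`/`"ucs2"` labels are made up for this example:

```python
import math


def text_length(udl: int, alphabet: str, udhl=None) -> int:
    """Characters of message text for a given TP-UDL.

    udhl is the first octet of the user data (the header length) when
    TP-UDHI is set, otherwise None.
    """
    header_octets = 0 if udhl is None else udhl + 1   # length octet + header
    if alphabet == "gsm7":
        # UDL counts septets; the header is padded up to a septet boundary.
        return udl - math.ceil(header_octets * 8 / 7)
    if alphabet == "ucs2":
        # UDL counts octets; each character takes two of them.
        return (udl - header_octets) // 2
    raise ValueError("unsupported alphabet label: " + alphabet)


print(text_length(10, "gsm7"))       # 10
print(text_length(160, "gsm7", 5))   # 153
print(text_length(140, "ucs2", 5))   # 67
```

The last two lines match the familiar 153 and 67 character limits of a multipart segment with a 5-octet concatenation header.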

Only for sent messages, not for received ones.


You already have your answer, and nothing is encrypted at all:
http://forum.mikrotik.com/t/cant-turn-code-into-a-function/167087/1


0x91 and 0x81 do not exist as such (always use hex notation when the numbers are hex…);
they are the BIT patterns 10010001 and 10000001, and you must check what each bit means.


It is not encrypted, and you are forced to check it BIT by BIT, or you need one “if” for each of the 256 possibilities…

If you noticed, one of my screenshots is taken from there. Unfortunately, I do not understand all the information in that article, first of all how the bits are translated. That’s why I asked to be shown what the bit options look like when they are written as ordinary numbers in the PDU text.

For the SMSC, the leading “07” is the number of bytes that follow; for the sender, the length field is the number of digits in the number, not bytes…

There are technical reasons behind these choices, such as “shuffling” the digits of the number instead of converting the number +12356789 to hexadecimal BC8CB5, which would also take up less space; that is simply the way they chose.

These choices were based on the computing power of the networks and terminals of the time, and the basic GSM standard has stayed the same ever since…
So some things that are trivial now seem incomprehensible…


It’s obvious… 0x07 is 7 bytes and 0x0B is 11 characters, because 0x0B = 11…

How does it not exist? It's even mentioned in the link above. Here is a screenshot from the article.

Why do this if there are only a few such values? Perhaps even only one: so far I have not come across anything other than "07".


  1. The article is incomplete; for example, the alphanumeric value is missing. Haven't you noticed that some SMS come from an alphanumeric string?
  2. 07 covers numbers of 11 to 12 digits, which is the usual size for an SMS centre number, but if the SMS centre number has 9 or 10 digits, it is 06, and so on.


    You can't conclude that there are no others just because an article (which, among other things, if you haven't noticed, points to the complete rules)
    doesn't mention them, or because you don't know them.

10.5.4.7 (not 6, the article is also wrong)

Plausible values that are actually used in SMS (read the full guide to understand what they are)…
10010001 0x91 international format - ISDN / standard telephony
10100001 0xA1 national format - ISDN / standard telephony
10110001 0xB1 operator specific number - ISDN / standard telephony
11000001 0xC1 dedicated access short number - ISDN / standard telephony
10011000 0x98 international format - national only number
10101000 0xA8 national format - national only number
10111000 0xB8 operator specific number - national only number
11001000 0xC8 dedicated access short number - national only number
10011001 0x99 international format - mobile operator only number
10101001 0xA9 national format - mobile operator only number
10111001 0xB9 operator specific number - mobile operator only number
11001001 0xC9 dedicated access short number - mobile operator only number

0x81 is 10000001 = unknown type of number - ISDN / standard telephony


The total number of possible non-reserved combinations for that field is about 48 (approximate, not counted exactly).
So either every BIT must be checked, or you create an array of 48 or more values, which is nonsense…
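
To illustrate the bit-by-bit approach, here is a sketch that masks and shifts the type-of-address octet into its two parts and looks each part up in a small table, so neither 256 “if”s nor a table of every combination is needed. The type-of-number labels are the ones listed earlier in this thread; the numbering-plan table is deliberately partial, and `decode_toa` is only an illustrative name:

```python
# Bits 6-4: type of number (labels as listed earlier in the thread).
TON = {
    0b000: "unknown", 0b001: "international", 0b010: "national",
    0b011: "network-specific", 0b100: "network subscriber type",
    0b101: "alphanumeric", 0b110: "abbreviated", 0b111: "reserved",
}
# Bits 3-0: numbering plan (partial table, enough for the values shown above).
NPI = {0b0000: "unknown", 0b0001: "ISDN/telephony",
       0b1000: "national", 0b1001: "private"}


def decode_toa(toa: int):
    """Split a type-of-address octet into (type of number, numbering plan)."""
    ton = (toa >> 4) & 0b111   # drop the low nibble, keep three bits
    npi = toa & 0b1111         # keep only the low nibble
    return TON[ton], NPI.get(npi, "other/reserved")


print(decode_toa(0x91))  # ('international', 'ISDN/telephony')
print(decode_toa(0x81))  # ('unknown', 'ISDN/telephony')
```

Two small tables of 8 and 16 entries cover every combination, including values nobody has met yet in a real PDU.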

I hadn't looked at that thread before. After your post, I did. But apparently my Google translator is to blame again. It was about a number consisting of letters, i.e. like "InternetSMS" or "Freedom".


Unfortunately I don't understand when hex is used

Unfortunately, it's not obvious to me. First of all, it is not clear when bytes are used and when something else is used. The number is 11 characters long in both cases, and in both cases an "F" is added as the 12th character to make the count even. Yet for the SCA the length is shown as "07", and for the sender's number as "0B". Perhaps this is because I do not understand bytes, or perhaps Google translates in a way that fails to convey the meaning of what was said.

I don't understand which article you are talking about. Perhaps it describes working with text (alphanumeric) sender numbers.

In any case, even if we check every bit and then want to use this information somehow, we will need an array of values in order to decide what to do with it next. If a value is not in the array, we will have to take some default action so that code execution is not interrupted. If we assume that at most two options are used, 91 or 81, then we do not need to check the bits either, only write two conditions. So I don't see the point of checking bits while the script runs anyway. It is better to write down/remember the values once and work with them from then on.

That is, "07" is a length in bytes, not in characters? And "0B" counts characters, not bytes?
If so, am I right in thinking that 7 bytes means 7 pairs of characters? That is, one pair of characters is the type of number and 6 pairs are the number itself, i.e. 11-12 characters.

I understand not knowing the programming language, but you should at least know what a Byte is...

A Byte consists of 8 bits, ranging from 00000000 to 11111111, because a bit can only have 2 values, either 0 or 1,
so there are 256 possible combinations,
whose value ranges from 0 to 255 in decimal,
and in hexadecimal it is written with two digits, each one of the symbols, in increasing order of value, 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F
(conceptually it doesn't matter if uppercase or lowercase)

0x is written in front to avoid ambiguity when a value uses only symbols that could be mistaken for decimal (or even binary); for example, how much is "10"? or "91"?
So the possible values of a Byte, in hexadecimal, range from 00 to FF,
and that's why 0x07 represents 7 Bytes, which are obviously written as 14 characters, because every single Byte is represented by two.
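
For anyone who wants to check such conversions, a tiny Python sketch (the helper name `describe_octet` is made up for this example) that shows one pair of PDU characters as a decimal number and as 8 bits:

```python
def describe_octet(hex_pair: str):
    """Return (decimal value, 8-bit binary string) for two hex characters."""
    value = int(hex_pair, 16)           # "0B" -> 11, "91" -> 145
    return value, format(value, "08b")  # zero-padded to 8 bits


print(describe_octet("07"))  # (7, '00000111')
print(describe_octet("0B"))  # (11, '00001011')
print(describe_octet("91"))  # (145, '10010001')

# 7 bytes are written as 14 hex characters: two characters per byte.
seven_bytes = bytes.fromhex("917283010010F5")
print(len(seven_bytes), len(seven_bytes.hex()))  # 7 14
```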