Short version:
When the length marker of a word is the last byte in a packet, the first byte of the next packet erroneously contains the hex value 0x19, followed by the actual word contents. Verified with Wireshark (formerly Ethereal). Is this intended behavior?
Experienced on ROS 3.22 and 3.19
Long version:
My API pull seemed to fail at random when doing a /queue/simple/print for queue information or queue stats. I could run my PHP script any number of times and it would work perfectly, but every so often (at what seemed to be a random interval) it would fail, complaining of a corrupt word. So I added several debugging statements to my code and found that everything was working as planned. I then wondered whether my network card was bad and corrupting packets, so I tested the script on a box closer to the tik; same results.
I then pulled out Wireshark and started capturing the traffic to/from the tik. If the script ran fine, I reset the capture and ran the script again. When the script failed, it reported the byte offset in the session at which it failed (I kept a running byte counter). I then noticed a pattern in several packet dumps:
If the last byte in a packet was the length marker for a word, then the first byte of the next packet was 0x19, followed by the actual word contents.
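For reference, the read side of my script boils down to something like the sketch below. It is not my exact code; the read_exact() helper and the variable names are just for illustration, and login/command sending are left out.

<?php
// Simplified sketch of my read loop. $offset is the running byte
// counter I used to report where in the session a failure happened.
// Login and command sending are omitted; adjust the address as needed.
$sock   = fsockopen('192.168.88.1', 8728);
$offset = 0;

// Read exactly $len bytes from the socket, bumping the byte counter.
function read_exact($sock, $len, &$offset) {
    $buf = '';
    while (strlen($buf) < $len) {
        $chunk = fread($sock, $len - strlen($buf));
        if ($chunk === false || $chunk === '') {
            die("read failed at session offset $offset\n");
        }
        $buf    .= $chunk;
        $offset += strlen($chunk);
    }
    return $buf;
}

// Read one word: a single length byte followed by that many bytes of
// content (all the lengths in these replies are below 0x80, so one byte).
$len  = ord(read_exact($sock, 1, $offset));
$word = read_exact($sock, $len, $offset);

If an extra byte sneaks in right after the length marker, this loop reads the extra byte as part of the word, comes up one byte short of the real contents, and everything after that point is shifted.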
Example #1: .tag= word corrupted:
00002BB3 75 65 73 3d 30 12 2e 74 61 67 3d 34 39 63 63 36 ues=0..t ag=49cc6
00002BC3 61 33 35 64 63 61 33 61 00 03 21 72 65 08 3d 2e a35dca3a ..!re.=.
00002BD3 69 64 3d 2a 31 38 09 3d 63 6f 6d 6d 65 6e 74 3d id=*18.= comment=
The tag word begins as I would expect, with the length marker as the 6th byte of the first line:
Hex value 0x12, saying the word is 18 bytes long.
This is followed by the word value:
2e 74 61 67 3d 34 39 63 63 36 61 33 35 64 63 61 33 61
.tag=49cc6a35dca3a
All is well, and the output is as I expected. I verified this pattern several times in the packet dump (200+ queues).
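If anyone wants to double-check that decoding, PHP's pack() reproduces it directly; the hex string below is just the 18 content bytes from the dump with the spaces removed:

<?php
// Decode the 18 content bytes shown above.
echo pack('H*', '2e7461673d34396363366133356463613361');
// prints: .tag=49cc6a35dca3a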
The problem I noticed occurred at the following offset in the same connection, same reply:
00002D83 30 2f 30 13 3d 74 6f 74 61 6c 2d 70 63 71 2d 71 0/0.=tot al-pcq-q
00002D93 75 65 75 65 73 3d 30 12 ueues=0.
00002D9B 19 2e 74 61 67 3d 34 39 63 63 36 61 33 35 64 63 ..tag=49 cc6a35dc
00002DAB 61 33 61 00 03 21 72 65 08 3d 2e 69 64 3d 2a 31 a3a..!re .=.id=*1
As you can see in the 2nd line (offset 0x00002D93), there is the value 0x30, which is the ‘0’ character (total-pcq-queues=0). This is then followed by the value 0x12, the length marker saying the tag word is 18 bytes long. Then comes a packet break.
The first 18 bytes of the next packet are:
19 2e 74 61 67 3d 34 39 63 63 36 61 33 35 64 63 61 33
Which translates to:
0x19 .tag=49cc6a35dca3
The next byte after that word (recall, RouterOS told me this word is 18 bytes long) is 0x61, which would be read as a length marker saying the next word is 97 bytes long. This is, of course, incorrect.
The 0x61 value is the value ‘a’ that should be on the end of the tag. The next 0x00 byte indicates the end of the sentence.
The 0x19 byte (decimal 25, the ASCII "end of medium" character) seems to be an extra byte that throws off any attempt to extract the correct number of bytes from the socket stream. Is this value added on purpose by RouterOS?
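For what it's worth, this is how my script decodes length prefixes, a sketch based on my reading of the API docs (error handling stripped, and the 3-, 4- and 5-byte forms handled the same way in my real code). Note that both 0x19 and 0x61 are below 0x80, so they sit in the single-byte range; I don't think this is a multi-byte length prefix that I'm mis-parsing.

<?php
// Decode a word length prefix, per my reading of the API docs.
function read_length($sock) {
    $b = ord(fread($sock, 1));
    if ($b < 0x80) {                      // single-byte length (0x00-0x7F)
        return $b;
    }
    if (($b & 0xC0) === 0x80) {           // two-byte length
        return (($b & 0x3F) << 8) | ord(fread($sock, 1));
    }
    // 0xC0 / 0xE0 / 0xF0 prefixes (3-, 4- and 5-byte lengths) omitted here.
    die("length prefix 0x" . dechex($b) . " not handled in this sketch\n");
}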
Example #2: =.id= word corrupted
First a valid pattern:
000101B4 36 33 38 34 33 30 33 35 66 00 03 21 72 65 08 3d 63843035 f..!re.=
000101C4 2e 69 64 3d 2a 38 42 09 3d 63 6f 6d 6d 65 6e 74 .id=*8B. =comment
0x08 says the id word is 8 bytes long.
3d 2e 69 64 3d 2a 38 42
which translates to:
=.id=*8B
Output is as expected.
The problem I noticed occurred at the following offset in the same connection, same reply:
00010364 2e 74 61 67 3d 34 39 63 63 36 33 38 34 33 30 33 .tag=49c c6384303
00010374 35 66 00 03 21 72 65 08 5f..!re.
0001037C 19 3d 2e 69 64 3d 2a 38 43 09 3d 63 6f 6d 6d 65 .=.id=*8 C.=comme
The 0x08 byte is the last byte in the packet, indicating the next word (the id word) is 8 bytes long.
The next 8 bytes are:
19 3d 2e 69 64 3d 2a 38
Which translates to:
0x19=.id=*8
When the correct value should be:
3d 2e 69 64 3d 2a 38 43
Which translates to:
=.id=*8C
So I was wondering: is this intended behavior that I just missed in the API docs? If I did miss something, are there other variations on this behavior that I should be aware of?
Thanks in advance for any insight into this.