AWS IoT MQTT Test Client: Unicode emoji in message payload gets converted from UTF-8 bytes to \u encoding


I performed this test on the console's AWS IoT MQTT Test Client:

  1. I subscribed to the shadow/update/accepted topic for my thing
  2. I published a payload containing a Unicode house emoji 🏡, encoded in UTF-8 as F0 9F 8F A1, to the shadow/update topic for my thing
  3. But the payload on the shadow/update/accepted topic contained the emoji as a UTF-16 surrogate pair in \u escape format: "\uD83C\uDFE1"

My question is: why is AWS converting from UTF-8 to UTF-16?

My problem is that I have code that checks that the payload received on the shadow/update/accepted topic is the same as the payload sent on the shadow/update topic, and it reports an error if they differ.

NOTE: when testing this on the MQTT Test Client make sure when subscribing and publishing that the Additional Configuration / MQTT payload display option is set to "Display payloads as strings (more accurate)", and not "Auto-format JSON payloads (improves readability)". To see the raw data choose the option "Display raw payloads (displays binary data as hexadecimal values)".

In detail:

1 Subscribe to $aws/things/my_thing/shadow/update/accepted

2 Publish to $aws/things/my_thing/shadow/update

{"state":{"reported":{"name":"šŸ”"}}}

The raw data for this message shows the emoji encoded in UTF-8 f09f8fa1: 7b227374617465223a7b227265706f72746564223a7b226e616d65223a22f09f8fa1227d7d7d
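The hex dump above can be checked offline. This is a minimal Python sketch using only the standard library; the variable names are illustrative and not part of any AWS tooling.

```python
import json

# Raw payload as shown by the "Display raw payloads" option (copied from above).
raw_hex = "7b227374617465223a7b227265706f72746564223a7b226e616d65223a22f09f8fa1227d7d7d"
raw = bytes.fromhex(raw_hex)

# The payload decodes as UTF-8 and parses as JSON.
doc = json.loads(raw.decode("utf-8"))
name = doc["state"]["reported"]["name"]

print(name.encode("utf-8").hex())  # UTF-8 bytes of the emoji as hex
print(hex(ord(name)))              # Unicode code point of the emoji
```

This confirms that the published "name" value is a single code point, U+1F3E1, carried as the four UTF-8 bytes f0 9f 8f a1.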

3 Payload on $aws/things/my_thing/shadow/update/accepted topic shows the emoji encoded in UTF-16 \u format:

{"state":{"reported":{"name":"\uD83C\uDFE1"}},"metadata":{"reported":{"name":{"timestamp":1744211708}}},"version":1970,"timestamp":1744211708}

I expected the payload to be identical on both topics.
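For the comparison itself, note that the accepted payload also carries "metadata", "version" and "timestamp" fields, so a byte-for-byte match would fail even without the emoji. One possible approach, sketched here in Python under the assumption that only the "state" object needs to match, is to parse both payloads and compare the decoded values. In JSON, "\uD83C\uDFE1" and the literal emoji denote the same string. The function name `state_matches` is hypothetical.

```python
import json

# Payload sent to shadow/update (the emoji is written as a Python escape here).
sent = '{"state":{"reported":{"name":"\U0001F3E1"}}}'

# Payload received on shadow/update/accepted, copied from above.
accepted = (
    '{"state":{"reported":{"name":"\\uD83C\\uDFE1"}},'
    '"metadata":{"reported":{"name":{"timestamp":1744211708}}},'
    '"version":1970,"timestamp":1744211708}'
)

def state_matches(sent_payload: str, accepted_payload: str) -> bool:
    """Compare the parsed "state" objects instead of the raw payload bytes."""
    return json.loads(sent_payload)["state"] == json.loads(accepted_payload)["state"]

print(state_matches(sent, accepted))
```

The JSON parser combines the surrogate pair D83C DFE1 into the single code point U+1F3E1, so the two "state" objects compare equal even though the payload text differs.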

I tried publishing to the shadow/update topic with the MQTT v5 Payload Format Indicator set to "UTF-8" and to "Binary" and the results are the same. But I notice if I publish to shadow/update with the Payload Format Indicator set to a value then the message on the shadow/update/accepted topic does not have a Payload Format Indicator. I expected the Payload Format Indicator to be passed from publisher to subscriber.

Thanks for any help you can give.

asked a month ago · 50 views
1 Answer
  • Regarding the MQTT v5 Payload Format Indicator not being passed from publisher to subscriber on the shadow/update/accepted topic: this part is a known issue. Shadow has never supported MQTT 5.

  • Also, Shadow does convert to Unicode (UTF-16) when processing the topic messages. As a workaround, please use Unicode (UTF-16) encoding for your messages.
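
One reading of this workaround is to publish the emoji already written as JSON \u escapes, so that the outgoing payload is plain ASCII. A minimal Python sketch, assuming the standard `json` module is used to build the payload (the answer does not name a specific method):

```python
import json

# Shadow update document; the emoji is written as a Python escape for U+1F3E1.
update = {"state": {"reported": {"name": "\U0001F3E1"}}}

# ensure_ascii=True (the default) writes non-ASCII characters as \u escapes,
# using a UTF-16 surrogate pair for code points above U+FFFF.
payload = json.dumps(update, ensure_ascii=True, separators=(",", ":"))

print(payload)
print(payload.isascii())
```

Python emits the escapes in lowercase hex (\ud83c\udfe1), while the accepted payload shown in the question uses uppercase, so the raw text may still differ; comparing parsed JSON values remains the safer check.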

AWS
SUPPORT ENGINEER
answered 22 days ago
