
MQTT Bridge silently stops relaying all Pubsub → LocalMqtt messages under high shadow update load


Environment:

  • Greengrass Nucleus version: 2.16.1
  • Component: aws.greengrass.clientdevices.mqtt.Bridge
  • Broker: aws.greengrass.clientdevices.mqtt.Moquette (SSL, port 8883)

Description

Under high shadow update volume, the MQTT bridge enters a broken state in which all Pubsub → LocalMqtt mappings silently stop forwarding messages. The LocalMqtt → Pubsub direction continues to work normally throughout: devices keep talking to each other over local MQTT but do not receive messages relayed from Pubsub, such as shadow manager responses. The bridge remains in the RUNNING state with no errors logged and does not self-recover; a manual restart via the Greengrass CLI is required to restore operation.
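For reference, the manual restart workaround can be scripted roughly as follows. This is a minimal sketch, not a fix: the CLI path is an assumption (the default install root), the helper names are hypothetical, and the command typically needs root privileges on the core device.

```python
import subprocess

# Assumption: default Greengrass install root; adjust if installed elsewhere.
GREENGRASS_CLI = "/greengrass/v2/bin/greengrass-cli"
BRIDGE_COMPONENT = "aws.greengrass.clientdevices.mqtt.Bridge"


def build_restart_command(component=BRIDGE_COMPONENT, cli_path=GREENGRASS_CLI):
    """Return the argv list that restarts one component via the Greengrass CLI."""
    return [cli_path, "component", "restart", "--names", component]


def restart_bridge(dry_run=True):
    """Restart the bridge component; with dry_run=True only return the command."""
    cmd = build_restart_command()
    if not dry_run:
        # Usually has to run as root (for example under sudo) on the core device.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    print(" ".join(restart_bridge(dry_run=True)))
```

Calling `restart_bridge(dry_run=False)` actually runs the restart; the default only prints the command so it can be reviewed first.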

This is not strictly a boot-time issue. It has also been triggered mid-session during mass shadow updates (though this has only occurred a few times over many fleets). The most consistent reproduction path found so far is to make large desired shadow changes to many devices immediately before they connect.

This causes a burst of shadow delta and response traffic on the Pubsub side at the exact moment the bridge is also processing a flood of incoming LocalMqtt subscriptions from ~100 connecting devices, which appears to be sufficient to trigger the failure.

Reproduction Steps

  1. With devices offline, publish large desired state changes to the shadows of ~100 devices via AWS IoT Core (generating pending deltas for all devices).
  2. Start the Greengrass core with the MQTT bridge and Moquette broker configured.
  3. Have the ~100 client devices connect simultaneously and make shadow get requests on boot.
  4. The bridge receives a simultaneous burst of inbound LocalMqtt subscriptions and outbound Pubsub shadow delta/response messages.
  5. Observe that no Pubsub → LocalMqtt messages are delivered — shadow responses, delta notifications, and all other Pubsub → LocalMqtt routes go silent.
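Step 1 above can be prepared with something like the sketch below, which only builds the classic shadow update topics and payloads for the burst. The thing-name prefix and numbering follow the name seen in our logs, but the helper names, the `setting_N` keys, and the payload size are hypothetical; publishing the messages to AWS IoT Core is left to whichever client you already use.

```python
import json


def shadow_update_request(thing_name, desired):
    """Return (topic, payload) for a classic shadow desired-state update."""
    topic = f"$aws/things/{thing_name}/shadow/update"
    payload = json.dumps({"state": {"desired": desired}})
    return topic, payload


def build_burst(count=100, prefix="ESTTarget-", keys=50):
    """Build one large desired-state update per device.

    Assumptions: thing names are the prefix plus a 9-digit number, and
    'keys' placeholder settings are enough to count as a large change.
    """
    desired = {f"setting_{i}": i for i in range(keys)}
    return [
        shadow_update_request(f"{prefix}{n:09d}", desired)
        for n in range(1, count + 1)
    ]


if __name__ == "__main__":
    burst = build_burst()
    print(len(burst), "updates prepared; first topic:", burst[0][0])
```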

Logs

During the failure window, the ShadowManager continues processing updates successfully on the Pubsub/IPC side:

[INFO] com.aws.greengrass.shadowmanager.ipc.UpdateThingShadowRequestHandler: Successfully updated shadow. {thing name=ESTTarget-000000046, local-version=542}

However, there are no B->C PUBLISH entries in the Moquette message logger for any of the response or delta topics, confirming the bridge is receiving messages on the Pubsub side but not publishing them back to Moquette. The bridge logs no warnings or errors.
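The comparison described above can be automated along these lines. The regular expression is based only on the ShadowManager log line shown earlier; the set of topics published to Moquette is taken as an input because the exact format of the Moquette message logger lines is not reproduced here, and the function names are hypothetical.

```python
import re

# Matches the ShadowManager success line quoted above.
UPDATE_RE = re.compile(
    r"UpdateThingShadowRequestHandler: Successfully updated shadow\. "
    r"\{thing name=(?P<thing>[^,}]+), local-version=(?P<version>\d+)\}"
)


def shadow_updates(lines):
    """Yield (thing_name, local_version) for each successful shadow update."""
    for line in lines:
        match = UPDATE_RE.search(line)
        if match:
            yield match.group("thing"), int(match.group("version"))


def things_without_publish(log_lines, published_topics):
    """Return things that were updated but have no shadow topic published.

    published_topics: topics seen in broker-to-client PUBLISH entries,
    extracted separately from the Moquette message logger.
    """
    updated = {thing for thing, _ in shadow_updates(log_lines)}
    missing = [
        thing
        for thing in sorted(updated)
        if not any(
            topic.startswith(f"$aws/things/{thing}/shadow/")
            for topic in published_topics
        )
    ]
    return missing
```

An empty result for a time window means every updated thing had at least one shadow topic relayed; during the failure we would expect every updated thing to be listed.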

There are also multiple errors from io.moquette.broker.NewNettyMQTTHandler:

2026-03-16T15:09:32.513Z [ERROR] (nioEventLoopGroup-5-2) io.moquette.broker.NewNettyMQTTHandler: Unexpected exception while processing MQTT message. Closing Netty channel. CId=thing_name. {}
java.io.IOException: Connection reset by peer
        at java.base/sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at java.base/sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at java.base/sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:276)
        at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:233)
        at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:223)
        at java.base/sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:356)
        at io.netty.buffer.PooledByteBuf.setBytes(PooledByteBuf.java:255)
        at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1132)
        at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:357)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:151)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:829)

There are also several authorization errors at the peak of the burst, which do not occur under normal circumstances:

[ERROR] (Session Executor 1) io.moquette.broker.PostOffice: client is not authorized to publish on topic: OrionAcct000002/6/resultcof.

The system recently migrated to Greengrass v2 and functioned without issue on Greengrass v1.

Questions for AWS

  1. Is there a known race condition or internal queue/channel in the bridge that can become blocked or dropped when Pubsub message volume coincides with a high rate of LocalMqtt subscription registrations?
  2. Is there a configurable backpressure limit, thread pool size, or queue depth for the Pubsub → LocalMqtt forwarding path that could be tuned to prevent this?
  3. Was the Pubsub → LocalMqtt bridge architecture changed significantly between Greengrass v1 and v2 in a way that would explain the regression under this workload?
  • In step 3 of your reproduction steps, are the devices subscribing to a local topic to receive the shadow state on get/accepted and publishing to get?

    For the client devices to get the updated shadow state, the flow looks like this:

    1. Client devices connect to Moquette (LocalMqtt) and publish to $aws/things/{name}/shadow/get
    2. Bridge forwards that from LocalMqtt → Pubsub where the bridge also subscribes to the shadow topic.
    3. ShadowManager processes the request and publishes the response to Pubsub on $aws/things/{name}/shadow/get/accepted
    4. Bridge picks it up via PubSubClient callback and publishes to Moquette (Pubsub → LocalMqtt)
    5. Client device receives it on Moquette because it subscribed to $aws/things/{name}/shadow/get/accepted
  • In the logs you shared, the client devices are experiencing connection issues with the broker. Do you see similar issues with the MQTT bridge, especially when the messages are dropped?

    When the MQTT bridge reconnects to the broker, it recreates the required local MQTT subscriptions, which means that not all local Pubsub messages are sent over LocalMqtt.
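The five-step shadow get flow described above depends on the bridge having a mapping in each direction. A minimal sketch of such an `mqttTopicMapping` fragment is generated below; the mapping names are hypothetical, and the `+` wildcard on a Pubsub source assumes a bridge version that supports wildcards for local publish/subscribe (v2.2.0 or later).

```python
import json


def shadow_get_mappings(thing_filter="+"):
    """Build bridge topic mappings for the shadow get request/response flow.

    thing_filter: a thing name, or "+" to cover all things (assumption:
    wildcard support for Pubsub sources is available in your bridge version).
    """
    base = f"$aws/things/{thing_filter}/shadow/get"
    return {
        # Steps 1-2: device request goes from Moquette to local Pubsub.
        "ShadowGetRequest": {
            "topic": base,
            "source": "LocalMqtt",
            "target": "Pubsub",
        },
        # Steps 3-5: ShadowManager response goes from Pubsub back to Moquette.
        "ShadowGetAccepted": {
            "topic": f"{base}/accepted",
            "source": "Pubsub",
            "target": "LocalMqtt",
        },
        "ShadowGetRejected": {
            "topic": f"{base}/rejected",
            "source": "Pubsub",
            "target": "LocalMqtt",
        },
    }


if __name__ == "__main__":
    print(json.dumps({"mqttTopicMapping": shadow_get_mappings()}, indent=2))
```

It is the Pubsub → LocalMqtt entries here that go silent in the reported failure, while the LocalMqtt → Pubsub entry keeps working.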

asked 2 months ago · 57 views
1 Answer

Based on the available documentation for AWS IoT Greengrass v2, I can provide some context about the MQTT bridge component, though I don't have specific information about the race condition you're experiencing.

The MQTT bridge component relays MQTT messages between client devices, local Greengrass publish/subscribe, and AWS IoT Core. It's designed to handle bidirectional message flow, subscribing to messages on source topics and publishing them to destination topics. The component supports both MQTT 3.1.1 and MQTT 5 protocols.

Recent versions of the MQTT bridge have included several bug fixes and improvements. Version 2.2.1 specifically addressed issues that could result in the bridge failing to subscribe to MQTT topics. Version 2.1.1 included fixes for how the component handles configuration reset updates and reduced the frequency of MQTT client disconnects when certificates rotate. Version 2.2.0 added support for MQTT topic wildcards when specifying local publish/subscribe as the source message broker, which requires Greengrass nucleus v2.6.0 or later.

The bridge uses QoS 1 to publish and subscribe to AWS IoT Core, even when client devices use QoS 0 with the local MQTT broker, which can result in additional latency. The component shares the same log file as the Greengrass nucleus.

However, I don't have specific information in the available documentation about:

  • Known race conditions related to high shadow update loads
  • Configurable backpressure limits, thread pool sizes, or queue depths for the Pubsub → LocalMqtt forwarding path
  • Architectural changes between Greengrass v1 and v2 that would explain this specific behavior

For this type of operational issue involving potential bugs or undocumented behavior under high load conditions, I recommend opening a support case with AWS Support, as they would have access to internal engineering resources and can investigate the specific failure mode you're experiencing with your Greengrass nucleus version 2.16.1 and the MQTT bridge component.
Sources
AWS-provided components - AWS IoT Greengrass
MQTT bridge - AWS IoT Greengrass

answered 2 months ago
EXPERT
reviewed 2 months ago
