Environment:
- Greengrass Nucleus version: 2.16.1
- Component: aws.greengrass.clientdevices.mqtt.Bridge
- Broker: aws.greengrass.clientdevices.mqtt.Moquette (SSL, port 8883)
Description
Under high shadow update volume, the MQTT bridge enters a broken state in which all Pubsub → LocalMqtt mappings silently stop forwarding messages. The LocalMqtt → Pubsub direction continues to work normally throughout: devices keep talking to each other over local MQTT but do not receive messages relayed from Pubsub, such as shadow manager responses. The bridge remains in the RUNNING state with no errors logged and does not self-recover; a manual restart via greengrass-cli is required to restore operation.
This is not strictly a boot-time issue. It has also been triggered mid-session during mass shadow updates (though this has only occurred a few times over many fleets). The most consistent reproduction path found so far is to make large desired shadow changes to many devices immediately before they connect.
This causes a burst of shadow delta and response traffic on the Pubsub side at the exact moment the bridge is also processing a flood of incoming LocalMqtt subscriptions from ~100 connecting devices, which appears to be sufficient to trigger the failure.
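For reference, the manual restart mentioned above can be scripted. This is a minimal sketch, not a fix: it assumes the Greengrass CLI component is deployed and installed under the default root `/greengrass/v2` (an assumption; adjust the path for your install), and the helper names `restart_command` and `restart_bridge` are illustrative only.

```python
import subprocess

BRIDGE = "aws.greengrass.clientdevices.mqtt.Bridge"


def restart_command(component=BRIDGE, cli="/greengrass/v2/bin/greengrass-cli"):
    """Build the argv for restarting one component with the Greengrass CLI.

    The CLI path is an assumed default install location.
    """
    return ["sudo", cli, "component", "restart", "--names", component]


def restart_bridge():
    """Run the restart and raise CalledProcessError if the CLI exits non-zero."""
    subprocess.run(restart_command(), check=True)
```

Calling `restart_bridge()` on the core device is equivalent to running the same `component restart --names` command by hand.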
Reproduction Steps
- With devices offline, publish large desired state changes to the shadows of ~100 devices via AWS IoT Core (generating pending deltas for all devices).
- Start the Greengrass core with the MQTT bridge and Moquette broker configured.
- Have the ~100 client devices connect simultaneously and make shadow get requests on boot.
- The bridge receives a simultaneous burst of inbound LocalMqtt subscriptions and outbound Pubsub shadow delta/response messages.
- Observe that no Pubsub → LocalMqtt messages are delivered — shadow responses, delta notifications, and all other Pubsub → LocalMqtt routes go silent.
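The first reproduction step can be sketched as a small script. This is a sketch under stated assumptions, not the exact tooling used: the thing-name pattern is inferred from the single name visible in the logs (`ESTTarget-000000046`), the desired-state keys are made up, and `client` is expected to be a boto3 `iot-data` client (or anything exposing `update_thing_shadow`).

```python
import json


def thing_names(prefix, count):
    """Generate names like ESTTarget-000000001 (pattern assumed from the logs)."""
    return [f"{prefix}{i:09d}" for i in range(1, count + 1)]


def build_desired_payload(size=50):
    """Build a shadow update document with `size` hypothetical desired keys."""
    desired = {f"setting_{i}": i for i in range(size)}
    return json.dumps({"state": {"desired": desired}}).encode("utf-8")


def publish_pending_deltas(client, names, payload):
    """Update the classic shadow of each thing while the devices are offline."""
    for name in names:
        client.update_thing_shadow(thingName=name, payload=payload)
    return len(names)


def main():
    # boto3 is imported here so the helpers above stay usable without it.
    import boto3

    client = boto3.client("iot-data")
    names = thing_names("ESTTarget-", 100)
    count = publish_pending_deltas(client, names, build_desired_payload())
    print(f"queued desired-state changes for {count} things")
```

Running `main()` with valid AWS credentials before starting the core and the devices leaves a pending delta on every shadow, which matches the precondition for the remaining steps.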
Logs
During the failure window, the ShadowManager continues processing updates successfully on the Pubsub/IPC side:
[INFO] com.aws.greengrass.shadowmanager.ipc.UpdateThingShadowRequestHandler: Successfully updated shadow. {thing name=ESTTarget-000000046, local-version=542}
However, there are no B->C PUBLISH entries in the Moquette message logger for any of the response or delta topics, confirming the bridge is receiving messages on the Pubsub side but not publishing them back to Moquette. The bridge logs no warnings or errors.
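The comparison described above (shadow updates succeeding while no matching B->C PUBLISH appears) can be automated roughly as follows. This is a sketch: the ShadowManager pattern is taken from the log line quoted above, but the assumption that a Moquette message-logger line contains both the text `B->C PUBLISH` and the full topic string is unverified and may need adjusting to the real log format.

```python
import re

# Matches the ShadowManager line quoted in this report.
UPDATE_RE = re.compile(
    r"UpdateThingShadowRequestHandler: Successfully updated shadow\. "
    r"\{thing name=(?P<thing>[^,}]+), local-version=(?P<version>\d+)\}"
)


def updated_things(shadow_log_lines):
    """Map each thing name to the highest local-version seen in the log."""
    latest = {}
    for line in shadow_log_lines:
        match = UPDATE_RE.search(line)
        if match:
            thing = match.group("thing")
            latest[thing] = max(latest.get(thing, 0), int(match.group("version")))
    return latest


def things_without_publish(shadow_log_lines, moquette_log_lines):
    """Return things with successful updates but no B->C PUBLISH on a shadow topic.

    Assumes (unverified) that the topic appears verbatim in the Moquette line.
    """
    things = updated_things(shadow_log_lines)
    published = set()
    for line in moquette_log_lines:
        if "B->C PUBLISH" not in line:
            continue
        for thing in things:
            if f"$aws/things/{thing}/shadow/" in line:
                published.add(thing)
    return sorted(set(things) - published)
```

A non-empty result during the failure window, for things that are known to be connected and subscribed, points at the Pubsub → LocalMqtt path rather than at the ShadowManager.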
Multiple errors from io.moquette.broker.NewNettyMQTTHandler:
2026-03-16T15:09:32.513Z [ERROR] (nioEventLoopGroup-5-2) io.moquette.broker.NewNettyMQTTHandler: Unexpected exception while processing MQTT message. Closing Netty channel. CId=thing_name. {}
java.io.IOException: Connection reset by peer
at java.base/sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at java.base/sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at java.base/sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:276)
at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:233)
at java.base/sun.nio.ch.IOUtil.read(IOUtil.java:223)
at java.base/sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:356)
at io.netty.buffer.PooledByteBuf.setBytes(PooledByteBuf.java:255)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1132)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:357)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:151)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:829)
Several authorization errors also appear at the peak of the burst; these do not occur under normal circumstances:
[ERROR] (Session Executor 1) io.moquette.broker.PostOffice: client is not authorized to publish on topic: OrionAcct000002/6/resultcof.
The system recently migrated to Greengrass v2 and functioned without issue on Greengrass v1.
Questions for AWS
- Is there a known race condition or internal queue/channel in the bridge that can become blocked or dropped when Pubsub message volume coincides with a high rate of LocalMqtt subscription registrations?
- Is there a configurable backpressure limit, thread pool size, or queue depth for the Pubsub → LocalMqtt forwarding path that could be tuned to prevent this?
- Was the Pubsub → LocalMqtt bridge architecture changed significantly between Greengrass v1 and v2 in a way that would explain the regression under this workload?
In your issue reproduction steps, for step 3, are you subscribing to a local topic to receive the shadow state on `get/accepted` and publishing to `get`?
For the client devices to get the updated shadow state, the flow looks like this:
- `$aws/things/{name}/shadow/get` (LocalMqtt → Pubsub), where the bridge also subscribes to the shadow topic.
- `$aws/things/{name}/shadow/get/accepted` (Pubsub → LocalMqtt)
- `$aws/things/{name}/shadow/get/accepted`
In the logs you shared, the client devices are experiencing connection issues with the broker. Do you see similar issues with the MQTT bridge, especially when the messages are dropped?
When the MQTT bridge reconnects to the broker, it recreates the required local MQTT subscriptions, which means that not all local pubsub messages are sent over LocalMqtt.
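For context, the get flow in this reply depends on bridge topic mappings in both directions. The sketch below generates a configuration-merge fragment with the `mqttTopicMapping` shape the bridge component uses (`topic`, `source`, `target`). The mapping names and the single wildcard filter are assumptions for illustration; the reporter's actual mappings are not shown in this thread, and wildcard topics with a Pubsub source require a Nucleus version that supports them.

```python
import json

SHADOW_FILTER = "$aws/things/+/shadow/#"  # assumed filter covering all shadow topics


def shadow_bridge_mappings(topic=SHADOW_FILTER):
    """Return a bridge configuration fragment relaying shadow topics both ways."""
    return {
        "mqttTopicMapping": {
            # Device requests (get, update) travel from the local broker to Pubsub.
            "ShadowsLocalMqttToPubsub": {
                "topic": topic,
                "source": "LocalMqtt",
                "target": "Pubsub",
            },
            # Shadow manager responses and deltas travel back to the devices.
            "ShadowsPubsubToLocalMqtt": {
                "topic": topic,
                "source": "Pubsub",
                "target": "LocalMqtt",
            },
        }
    }


def as_merge_json():
    """Serialize the fragment for use as a deployment configuration merge update."""
    return json.dumps(shadow_bridge_mappings(), indent=2)
```

The second mapping is the direction that goes silent in the reported failure, so it is the one to check first when comparing a working and a broken core.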