StreamManager gets into broken state due to DBException$VolumeIOError


Greengrass Nucleus Version: 2.10.1

StreamManager Version: 2.1.5

The Stream Manager component enters a "Broken" state after an issue with MapDB. It appears to be related to a broken bit parity error. There are no obvious steps to reproduce other than normal usage. We would like to understand the root cause and how to resolve this.

Some relevant logs:

{"thread":"Copier","level":"INFO","eventType":"stdout","message":"2023 Sep 07 22:00:29,228 \u001B[32m[INFO]\u001B[m (main) com.amazonaws.internal.DefaultServiceEndpointBuilder: {iotsitewise, us-east-2} was not found in region metadata, trying to construct an endpoint using the standard pattern for this region: 'iotsitewise.us-east-2.amazonaws.com'.","contexts":{"scriptName":"services.aws.greengrass.StreamManager.lifecycle.startup.script","serviceName":"aws.greengrass.StreamManager","currentState":"STARTING"},"loggerName":"aws.greengrass.StreamManager","timestamp":1694124029229,"cause":null}
{"thread":"Copier","level":"INFO","eventType":"stdout","message":"2023 Sep 07 22:00:30,730 \u001B[1;31m[ERROR]\u001B[m (main) com.amazonaws.iot.greengrass.streammanager.StreamManagerService: StreamManagerService: Error initializing","contexts":{"scriptName":"services.aws.greengrass.StreamManager.lifecycle.startup.script","serviceName":"aws.greengrass.StreamManager","currentState":"STARTING"},"loggerName":"aws.greengrass.StreamManager","timestamp":1694124030789,"cause":null}
{"thread":"Copier","level":"INFO","eventType":"stdout","message":"org.mapdb.DBException$PointerChecksumBroken: Broken bit parity","contexts":{"scriptName":"services.aws.greengrass.StreamManager.lifecycle.startup.script","serviceName":"aws.greengrass.StreamManager","currentState":"STARTING"},"loggerName":"aws.greengrass.StreamManager","timestamp":1694124030790,"cause":null}
{"thread":"Copier","level":"INFO","eventType":"stdout","message":"at org.mapdb.DataIO.parity4Get(DataIO.java:476) ~[AWSGreengrassGreenlake-1.0-super.jar:?]","contexts":{"scriptName":"services.aws.greengrass.StreamManager.lifecycle.startup.script","serviceName":"aws.greengrass.StreamManager","currentState":"STARTING"},"loggerName":"aws.greengrass.StreamManager","timestamp":1694124030791,"cause":null}

and

{"thread":"Copier","level":"WARN","eventType":"stderr","message":"Exception in thread \"Thread-1\" org.mapdb.DBException$VolumeIOError","contexts":{"scriptName":"services.aws.greengrass.StreamManager.lifecycle.startup.script","serviceName":"aws.greengrass.StreamManager","currentState":"STOPPING"},"loggerName":"aws.greengrass.StreamManager","timestamp":1694124032528,"cause":null}
Asked 9 months ago, viewed 298 times
1 Answer
Accepted Answer

Corruption in this case is likely environmental. Depending on your system, there may be other options for reducing the risk of corruption, e.g. configuring data=journal mode if you're using the ext4 filesystem. Are you able to provide more details on which OS and filesystem you're running? What does the filesystem usage look like?
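
For reference, a minimal sketch of what enabling full data journaling could look like in /etc/fstab; the device, mount point, and other options below are placeholders, not your actual entry:

    # /etc/fstab -- placeholder root entry with full data journaling enabled
    /dev/root  /  ext4  defaults,data=journal  0  1

Note that the data= mode usually cannot be changed by remounting an already-mounted root filesystem; setting it as a default mount option with tune2fs -o journal_data <device> (or passing it via rootflags= on the kernel command line) and rebooting is one common approach, but please verify against your distribution's documentation.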

The current way to recover from this is to delete stream_manager_metadata_store from /greengrass/v2/work/aws.greengrass.StreamManager (or from STREAM_MANAGER_STORE_ROOT_DIR if that is configured), as you've likely done already. We're aware of this failure case and are looking at what we can do in Stream Manager itself to make this experience better.
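
For reference, a minimal recovery sketch, assuming Greengrass is installed as the systemd service named greengrass and Stream Manager uses the default store path (adjust the path if STREAM_MANAGER_STORE_ROOT_DIR is configured):

    # Stop the nucleus so Stream Manager releases its store files
    sudo systemctl stop greengrass
    # Remove the corrupted MapDB metadata store
    sudo rm -rf /greengrass/v2/work/aws.greengrass.StreamManager/stream_manager_metadata_store
    # Restart; Stream Manager rebuilds its metadata store on startup
    sudo systemctl start greengrass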

AWS
Answered 9 months ago
  • Thanks for the quick reply.

    We will look into using data=journal if it works on ext3 as well. With regard to system information, for your debugging purposes, here is what we are currently running:

    Distributor ID:	Debian
    Description:	Debian GNU/Linux 9.13 (stretch)
    Release:	9.13
    Codename:	stretch
    
    Filesystem     Type     1K-blocks    Used Available Use% Mounted on
    /dev/root      ext3       7232808 2092336   4766408  31% /
    
