StreamManager gets into broken state due to DBException$VolumeIOError

Greengrass Nucleus Version: 2.10.1

StreamManager Version: 2.1.5

The stream manager component enters a "Broken" state after an issue with MapDB. It appears to be related to a bad parity bit. There are no obvious steps to reproduce it other than normal usage. We would like to understand the root cause and how to resolve it.

Some relevant logs:

{"thread":"Copier","level":"INFO","eventType":"stdout","message":"2023 Sep 07 22:00:29,228 \u001B[32m[INFO]\u001B[m (main) com.amazonaws.internal.DefaultServiceEndpointBuilder: {iotsitewise, us-east-2} was not found in region metadata, trying to construct an endpoint using the standard pattern for this region: 'iotsitewise.us-east-2.amazonaws.com'.","contexts":{"scriptName":"services.aws.greengrass.StreamManager.lifecycle.startup.script","serviceName":"aws.greengrass.StreamManager","currentState":"STARTING"},"loggerName":"aws.greengrass.StreamManager","timestamp":1694124029229,"cause":null}
{"thread":"Copier","level":"INFO","eventType":"stdout","message":"2023 Sep 07 22:00:30,730 \u001B[1;31m[ERROR]\u001B[m (main) com.amazonaws.iot.greengrass.streammanager.StreamManagerService: StreamManagerService: Error initializing","contexts":{"scriptName":"services.aws.greengrass.StreamManager.lifecycle.startup.script","serviceName":"aws.greengrass.StreamManager","currentState":"STARTING"},"loggerName":"aws.greengrass.StreamManager","timestamp":1694124030789,"cause":null}
{"thread":"Copier","level":"INFO","eventType":"stdout","message":"org.mapdb.DBException$PointerChecksumBroken: Broken bit parity","contexts":{"scriptName":"services.aws.greengrass.StreamManager.lifecycle.startup.script","serviceName":"aws.greengrass.StreamManager","currentState":"STARTING"},"loggerName":"aws.greengrass.StreamManager","timestamp":1694124030790,"cause":null}
{"thread":"Copier","level":"INFO","eventType":"stdout","message":"at org.mapdb.DataIO.parity4Get(DataIO.java:476) ~[AWSGreengrassGreenlake-1.0-super.jar:?]","contexts":{"scriptName":"services.aws.greengrass.StreamManager.lifecycle.startup.script","serviceName":"aws.greengrass.StreamManager","currentState":"STARTING"},"loggerName":"aws.greengrass.StreamManager","timestamp":1694124030791,"cause":null}

and

{"thread":"Copier","level":"WARN","eventType":"stderr","message":"Exception in thread \"Thread-1\" org.mapdb.DBException$VolumeIOError","contexts":{"scriptName":"services.aws.greengrass.StreamManager.lifecycle.startup.script","serviceName":"aws.greengrass.StreamManager","currentState":"STOPPING"},"loggerName":"aws.greengrass.StreamManager","timestamp":1694124032528,"cause":null}
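For anyone triaging similar logs, the MapDB failures can be pulled out of the JSON log lines with a short script. This is only a sketch: the function name `find_mapdb_errors` is hypothetical, the sample records are shortened copies of the lines above, and it assumes one JSON object per line as shown here.

```python
import json


def find_mapdb_errors(lines):
    """Return (timestamp_ms, state, message) for records mentioning a MapDB exception."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log record
        if not isinstance(record, dict):
            continue
        message = record.get("message", "")
        if "org.mapdb.DBException" in message:
            state = record.get("contexts", {}).get("currentState")
            hits.append((record.get("timestamp"), state, message))
    return hits


# Shortened copies of two of the log lines quoted above (fields trimmed for brevity).
SAMPLE = [
    r'{"level":"INFO","message":"org.mapdb.DBException$PointerChecksumBroken: Broken bit parity","contexts":{"currentState":"STARTING"},"timestamp":1694124030790}',
    r'{"level":"WARN","message":"Exception in thread \"Thread-1\" org.mapdb.DBException$VolumeIOError","contexts":{"currentState":"STOPPING"},"timestamp":1694124032528}',
]

for timestamp, state, message in find_mapdb_errors(SAMPLE):
    print(timestamp, state, message)
```

Printing the component state next to each message makes the ordering visible: the checksum error is logged while the component is STARTING, and the VolumeIOError follows while it is STOPPING.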
asked 8 months ago, 289 views
1 Answer
Accepted Answer

Corruption in this case is likely environmental. Depending on your system, there may be other options for reducing the risk of corruption, e.g. configuring data=journal mode if you're using an ext4 filesystem. Are you able to provide any more details on which OS and filesystem you are using? What does the filesystem usage look like?
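One way to gather the filesystem details asked about here is to read the mount table on the device. The sketch below parses /proc/mounts-style text and reports the filesystem type and any data= journaling option for the mount that holds a given path. The helper names (`parse_mounts`, `mount_for`, `data_mode`) are hypothetical, and it assumes a Linux system; the kernel may not list a data= option when the filesystem default is in use.

```python
import os


def parse_mounts(text):
    """Parse /proc/mounts-style text into (mount_point, fs_type, options) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4:
            entries.append((fields[1], fields[2], fields[3].split(",")))
    return entries


def mount_for(path, entries):
    """Return the entry whose mount point is the longest prefix of path, or None."""
    path = os.path.abspath(path)
    best = None
    for entry in entries:
        mount_point = entry[0]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) > len(best[0]):
                best = entry
    return best


def data_mode(options):
    """Return the value of the data= mount option (e.g. 'journal'), or None if absent."""
    for option in options:
        if option.startswith("data="):
            return option.split("=", 1)[1]
    return None


# Usage on a device (Linux only):
#   with open("/proc/mounts") as f:
#       entry = mount_for("/greengrass/v2/work", parse_mounts(f.read()))
#   print(entry[1], data_mode(entry[2]))
```

Matching on the longest mount-point prefix matters when the Greengrass work directory sits on its own partition rather than on the root filesystem.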

The current way to recover from this is to delete stream_manager_metadata_store from /greengrass/v2/work/aws.greengrass.StreamManager (or from STREAM_MANAGER_STORE_ROOT_DIR if that's configured), as you've likely done already. We're aware of this failure case and are looking into what we can do in Stream Manager itself to make this experience better.
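The recovery step above could be scripted roughly as follows. Note that this sketch renames the store instead of deleting it, so the corrupted data remains available for later inspection; that is a variation on the advice above, not the documented procedure. The function name `quarantine_metadata_store` and the `.corrupt-` suffix are hypothetical. It assumes the Stream Manager component has been stopped first and that the script has permission to write to the work directory.

```python
import shutil
import time
from pathlib import Path

# Default location named in the answer; pass a different root if
# STREAM_MANAGER_STORE_ROOT_DIR is configured for the component.
DEFAULT_ROOT = Path("/greengrass/v2/work/aws.greengrass.StreamManager")
STORE_NAME = "stream_manager_metadata_store"


def quarantine_metadata_store(root=DEFAULT_ROOT, suffix=None):
    """Move the metadata store aside; return the new path, or None if it was absent."""
    store = Path(root) / STORE_NAME
    if not store.exists():
        return None
    # Hypothetical naming scheme: append a timestamp so repeated runs do not collide.
    suffix = suffix or time.strftime("%Y%m%dT%H%M%S")
    backup = store.with_name(f"{STORE_NAME}.corrupt-{suffix}")
    shutil.move(str(store), str(backup))  # works for a file or a directory
    return backup
```

After the store is moved aside, restarting the component should leave it without the old metadata store, which is the same end state as deleting it. The quarantined copy can be removed once it is no longer needed.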

AWS
answered 8 months ago
  • Thanks for the quick reply.

    We will look into using data=journal if it works on ext3 as well. For your debugging purposes, here is the system we are currently running on:

    Distributor ID:	Debian
    Description:	Debian GNU/Linux 9.13 (stretch)
    Release:	9.13
    Codename:	stretch
    
    Filesystem     Type     1K-blocks    Used Available Use% Mounted on
    /dev/root      ext3       7232808 2092336   4766408  31% /
    
