
Best practices for storing Zstd-compressed binary data as String vs Binary attribute in DynamoDB?


The AWS documentation recommends using a compression algorithm such as GZIP or LZO and storing the resulting binary output in a Binary attribute within the item.

I was wondering whether I can use the Zstd algorithm instead and store the result in a plain String attribute. In my benchmarks, Zstd was faster and compressed better. I plan to use a String attribute because, if the item still exceeds 400 KB after compressing the payload, I will store the payload in S3 and save the link in the DynamoDB table.

My table structure:

PK (String)
SK (String)
ref (String): possible values are inline/ref. ref means the data is stored in S3; inline means it is stored in compressed form in DynamoDB itself
payload (Binary/String): Binary if ref is inline, otherwise String.
Asked 2 months ago, 56 views
1 Answer
Accepted Answer

For storing Zstd-compressed data in DynamoDB, you should use the Binary attribute type rather than String, even though you're considering Zstd instead of GZIP or LZO.

Binary attributes are designed to store any binary data, including compressed text, and DynamoDB treats the value as raw bytes rather than as text. At the low-level API, applications must base64-encode binary values before sending them to DynamoDB; on receipt, DynamoDB decodes the data into an unsigned byte array and uses the decoded length as the size of the attribute. The AWS SDKs generally handle this encoding for you.
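To make the base64 step concrete, here is a minimal sketch of how a Binary value looks in the low-level DynamoDB JSON format, where binary attributes use the `B` type descriptor. The helper names `to_wire_binary` and `from_wire_binary` are made up for illustration; with an SDK you would normally pass raw bytes and never write this yourself.

```python
import base64


def to_wire_binary(data: bytes) -> dict:
    """Wrap raw bytes as a low-level DynamoDB attribute value of type B.

    Hypothetical helper: SDKs normally do this encoding internally.
    """
    return {"B": base64.b64encode(data).decode("ascii")}


def from_wire_binary(attr: dict) -> bytes:
    """Recover the raw bytes from a low-level B attribute value."""
    return base64.b64decode(attr["B"])


compressed = b"\x28\xb5\x2f\xfd"  # example bytes only, not a complete Zstd frame
wire = to_wire_binary(compressed)
print(wire)
print(from_wire_binary(wire) == compressed)
```

The base64 text exists only on the wire here; what is stored and counted is the decoded byte array.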

While you could technically store base64-encoded compressed data in a String attribute, this approach is less efficient because:

  1. It doesn't leverage the proper data type semantics
  2. Base64 encoding makes the stored value approximately 33% larger than the same bytes in a Binary attribute, and that extra size counts against the 400 KB item limit
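The 33% figure follows from base64 emitting 4 output characters for every 3 input bytes. A rough sketch of what that means near the limit, assuming 400 KB is taken as 400 × 1024 bytes and ignoring the bytes used by keys and attribute names (`base64_len` is a hypothetical helper):

```python
import base64

ITEM_LIMIT = 400 * 1024  # assumption: 400 KB read as 409,600 bytes


def base64_len(n: int) -> int:
    """Length of the padded base64 encoding of n bytes."""
    return 4 * ((n + 2) // 3)


raw_size = 330_000  # size of a hypothetical compressed payload, in bytes
encoded_size = len(base64.b64encode(bytes(raw_size)))

print("as Binary:", raw_size, "fits" if raw_size <= ITEM_LIMIT else "too large")
print("as base64 String:", encoded_size,
      "fits" if encoded_size <= ITEM_LIMIT else "too large")
```

A payload of this size would fit inline as Binary but would be pushed to S3 if stored as a base64 String.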

Regarding your table structure, your approach of using a "ref" attribute to indicate whether data is stored inline or in S3 is a good practice. When the compressed data exceeds DynamoDB's 400KB item size limit, storing it in S3 and keeping only the reference in DynamoDB is the recommended approach.

For your specific schema:

  • PK (String) and SK (String) are appropriate
  • ref (String) with values "inline"/"ref" is a good approach
  • payload should be Binary when ref="inline" (not String)
  • When ref="ref", your S3 location can be stored as a String
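The schema above could be sketched roughly as follows. This is only an illustration under stated assumptions: `zlib` from the standard library stands in for a Zstd binding so the snippet runs without extra packages, `fake_upload` and the bucket name are hypothetical placeholders for a real S3 upload, and the size estimate (UTF-8 length of attribute names plus value lengths) is an approximation of how DynamoDB counts item size.

```python
import os
import zlib
from typing import Callable

MAX_ITEM_BYTES = 400 * 1024  # assumption: 400 KB read as 409,600 bytes


def estimate_item_size(item: dict) -> int:
    """Approximate item size: attribute name bytes plus value bytes."""
    total = 0
    for name, value in item.items():
        total += len(name.encode("utf-8"))
        total += len(value) if isinstance(value, bytes) else len(value.encode("utf-8"))
    return total


def build_item(pk: str, sk: str, raw: bytes,
               compress: Callable[[bytes], bytes],
               upload: Callable[[str, bytes], str]) -> dict:
    """Return an inline item if it fits, otherwise an item pointing at S3."""
    compressed = compress(raw)
    item = {"PK": pk, "SK": sk, "ref": "inline", "payload": compressed}  # Binary
    if estimate_item_size(item) <= MAX_ITEM_BYTES:
        return item
    location = upload(f"{pk}/{sk}", compressed)
    return {"PK": pk, "SK": sk, "ref": "ref", "payload": location}  # String


def fake_upload(key: str, body: bytes) -> str:
    """Hypothetical stand-in for an S3 upload; returns the object location."""
    return f"s3://example-bucket/{key}"


small = build_item("user#1", "doc#1", b"hello" * 100, zlib.compress, fake_upload)
large = build_item("user#1", "doc#2", os.urandom(500 * 1024), zlib.compress, fake_upload)
print(small["ref"], type(small["payload"]).__name__)
print(large["ref"], large["payload"])
```

Passing the compressor in as a function means swapping zlib for a Zstd library changes only the call site, not the inline-versus-S3 decision.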

While Zstd may perform better in your benchmarks than GZIP or LZO, the choice of compression algorithm doesn't change the recommendation to use Binary attribute type for storing the compressed data in DynamoDB.
Sources
Best practices for storing large items and attributes in DynamoDB - Amazon DynamoDB
Supported data types and naming rules in Amazon DynamoDB - Amazon DynamoDB
DynamoDB low-level API - Amazon DynamoDB
Maximum Item Size in DynamoDB for Storing Base64 Images | AWS re:Post
DynamoDB overview - AWS Prescriptive Guidance

Answered 2 months ago
AWS
EXPERT
Reviewed 2 months ago
  • @MikeLim Thanks. One question.

    When ref="ref", your S3 location can be stored as a String

    Do you recommend storing the S3 path in the payload attribute itself in binary format, or should I create a separate attribute?
