
Best practices for storing Zstd-compressed binary data as String vs Binary attribute in DynamoDB?


The AWS documentation recommends using compression algorithms such as GZIP or LZO to produce binary output, which is then stored in a Binary attribute within the item.

I was wondering whether I can use the Zstd algorithm instead and store the result in a plain String attribute. In my benchmarks Zstd performed better: it was faster and compressed the data further. I plan to use a String attribute because, if the item still exceeds 400 KB after compressing the payload, I will store the payload in S3 and save the link in the DDB table.

My table structure:

PK (String)
SK (String)
ref (String): Possible values inline/ref. Ref means data is stored in S3, inline means it's stored in compressed format in DDB itself
payload (Binary/String): Binary if ref is inline, otherwise String.
Asked 2 months ago, viewed 53 times
1 Answer
Accepted Answer

For storing Zstd-compressed data in DynamoDB, you should use the Binary attribute type rather than String, even though you're considering Zstd instead of GZIP or LZO.

Binary type attributes can store any binary data, including compressed text. When you use the Binary type, DynamoDB handles the value as raw bytes rather than text. At the low-level API, binary values must be base64-encoded before they are sent to DynamoDB, and upon receipt DynamoDB decodes the data into an unsigned byte array. The AWS SDKs generally perform this encoding for you, so application code typically passes the raw bytes directly.
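As a rough illustration of the base64 step described above, the sketch below wraps compressed bytes in the `{"B": ...}` shape that the low-level DynamoDB JSON format uses for Binary values. The helper names `to_wire_attribute` and `from_wire_attribute` are hypothetical, and `zlib` is used only as a standard-library stand-in for a Zstd binding; the bytes produced by Zstd would be handled the same way.

```python
import base64
import json
import zlib  # stand-in compressor; Zstd output would be treated identically


def to_wire_attribute(raw: bytes) -> dict:
    """Wrap raw bytes as a low-level Binary attribute value ({"B": base64 text})."""
    return {"B": base64.b64encode(raw).decode("ascii")}


def from_wire_attribute(attr: dict) -> bytes:
    """Recover the raw bytes from a low-level Binary attribute value."""
    return base64.b64decode(attr["B"])


compressed = zlib.compress(b'{"hello": "world"}' * 100)
wire = to_wire_attribute(compressed)

# The request body carries base64 text, but the stored value is the raw bytes.
print(json.dumps({"payload": wire})[:60])
assert from_wire_attribute(wire) == compressed
```

When an SDK is used, this wrapping normally happens inside the client library, so the sketch is mainly useful for understanding what travels over the wire.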

While you could technically store base64-encoded compressed data in a String attribute, this approach is the weaker choice because:

  1. The attribute type no longer reflects that the value is binary data
  2. Base64 encoding increases the stored size by approximately 33% compared to storing the same bytes in a Binary attribute, which brings the item closer to the 400 KB limit
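The 33% figure follows from base64 emitting 4 output characters for every 3 input bytes. The sketch below checks this with random bytes standing in for compressed output; the 330,000-byte payload size is an arbitrary illustration, and the limit is assumed to be 400 * 1024 bytes, ignoring the space taken by keys and attribute names.

```python
import base64
import os

ITEM_LIMIT_BYTES = 400 * 1024  # assumption: 400 KB taken as 409,600 bytes

raw = os.urandom(330_000)        # stand-in for compressed (incompressible) output
encoded = base64.b64encode(raw)  # what a String attribute would have to hold

overhead = len(encoded) / len(raw)
print(f"raw={len(raw)} base64={len(encoded)} overhead={overhead:.3f}")
print("raw fits under limit:   ", len(raw) < ITEM_LIMIT_BYTES)
print("base64 fits under limit:", len(encoded) < ITEM_LIMIT_BYTES)
```

In this example the raw payload is below the assumed limit while its base64 form is above it, so the String encoding alone would force the item out to S3.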

Regarding your table structure, using a "ref" attribute to indicate whether data is stored inline or in S3 is a good practice. When the compressed data exceeds DynamoDB's 400 KB item size limit, storing it in S3 and keeping only the reference in DynamoDB is the recommended approach.

For your specific schema:

  • PK (String) and SK (String) are appropriate
  • ref (String) with values "inline"/"ref" is a good approach
  • payload should be Binary when ref="inline" (not String)
  • When ref="ref", your S3 location can be stored as a String
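The schema above could be sketched roughly as follows. This is a minimal illustration and not a definitive implementation: `build_item`, `HEADROOM_BYTES`, and the example S3 URI are hypothetical, `zlib` stands in for a Zstd binding, and the actual S3 upload and DynamoDB write are left to the caller.

```python
import os
import zlib  # stand-in for a Zstd binding

ITEM_LIMIT_BYTES = 400 * 1024  # assumption: 400 KB taken as 409,600 bytes
HEADROOM_BYTES = 4 * 1024      # assumed margin for keys and attribute names


def build_item(pk: str, sk: str, data: bytes, s3_uri: str):
    """Return (item, blob_to_upload); blob_to_upload is None when stored inline."""
    compressed = zlib.compress(data)
    if len(compressed) <= ITEM_LIMIT_BYTES - HEADROOM_BYTES:
        # Small enough: keep the compressed bytes in a Binary attribute.
        item = {"PK": pk, "SK": sk, "ref": "inline", "payload": compressed}
        return item, None
    # Too large: the caller uploads the blob to S3; the item keeps a String link.
    item = {"PK": pk, "SK": sk, "ref": "ref", "payload": s3_uri}
    return item, compressed


small_item, small_blob = build_item("user#1", "doc#1", b"hello " * 1000,
                                    "s3://example-bucket/user1/doc1")
large_item, large_blob = build_item("user#1", "doc#2", os.urandom(500_000),
                                    "s3://example-bucket/user1/doc2")
print(small_item["ref"], type(small_item["payload"]).__name__)
print(large_item["ref"], type(large_item["payload"]).__name__)
```

Checking the size after compression, with some headroom, keeps the inline-versus-S3 decision in one place; a reader would use "ref" to decide whether "payload" holds bytes to decompress or a location to fetch.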

While Zstd may perform better in your benchmarks than GZIP or LZO, the choice of compression algorithm doesn't change the recommendation to use Binary attribute type for storing the compressed data in DynamoDB.
Sources
Best practices for storing large items and attributes in DynamoDB - Amazon DynamoDB
Supported data types and naming rules in Amazon DynamoDB - Amazon DynamoDB
DynamoDB low-level API - Amazon DynamoDB
Maximum Item Size in DynamoDB for Storing Base64 Images | AWS re:Post
DynamoDB overview - AWS Prescriptive Guidance

Answered 2 months ago
AWS
EXPERT
Reviewed 2 months ago
  • @MikeLim Thanks. One question.

    When ref="ref", your S3 location can be stored as a String

    Do you recommend storing the S3 Path in the payload itself in binary format, or shall I create a separate attribute?
