
Best practices for storing Zstd-compressed binary data as String vs Binary attribute in DynamoDB?


The AWS documentation recommends using a compression algorithm such as GZIP or LZO and storing the resulting binary output in a Binary attribute on the item.

I was wondering whether I can use the Zstd algorithm instead and store the output in a plain String attribute. In my benchmarks, Zstd was faster and achieved a better compression ratio. I plan to use a String attribute because, if the item still exceeds 400 KB after compression, I will store the payload in S3 and save the link in the DynamoDB table.

My table structure:

PK (String)
SK (String)
ref (String): Possible values inline/ref. Ref means data is stored in S3, inline means it's stored in compressed format in DDB itself
payload (Binary/String): Binary if ref is inline, otherwise String.
Asked 2 months ago · 58 views
1 Answer
Accepted Answer

For storing Zstd-compressed data in DynamoDB, you should use the Binary attribute type rather than String, even though you're considering Zstd instead of GZIP or LZO.

Binary type attributes are designed to store any binary data, including compressed text. When you use the Binary type, DynamoDB treats the data as raw bytes rather than text. At the low-level API, your application must encode binary values in base64 before sending them to DynamoDB; on receipt, DynamoDB decodes the data into an unsigned byte array and uses that decoded length as the size of the attribute. The AWS SDKs typically perform this encoding for you.
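To make the low-level encoding step concrete, here is a minimal sketch of how a Binary attribute might look in a low-level `PutItem` request body. The helper names (`to_wire_binary`, `from_wire_binary`), the table name, and the key values are hypothetical, and the payload bytes are an arbitrary stand-in for real Zstd output. It only builds and parses JSON; it does not call DynamoDB.

```python
import base64
import json


def to_wire_binary(raw: bytes) -> dict:
    """Wrap raw bytes as a low-level Binary attribute value ({"B": <base64 text>})."""
    return {"B": base64.b64encode(raw).decode("ascii")}


def from_wire_binary(attr: dict) -> bytes:
    """Recover the raw bytes from a low-level Binary attribute value."""
    return base64.b64decode(attr["B"])


# Arbitrary stand-in bytes; in practice this would be the compressor's output.
compressed = bytes(range(256)) * 4

item = {
    "PK": {"S": "doc-1"},                   # hypothetical key values
    "SK": {"S": "v1"},
    "ref": {"S": "inline"},
    "payload": to_wire_binary(compressed),  # base64 only on the wire
}
body = json.dumps({"TableName": "ExampleTable", "Item": item})  # hypothetical table

# Round trip: the receiver decodes base64 back to the original bytes.
round_tripped = from_wire_binary(json.loads(body)["Item"]["payload"])
print(len(compressed), len(item["payload"]["B"]), round_tripped == compressed)
```

With an SDK such as boto3 you would normally pass the `bytes` object directly and let the SDK handle the base64 step.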

While you could technically store base64-encoded compressed data in a String attribute, this approach is less efficient because:

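  1. It does not use the data type intended for binary content, so DynamoDB treats the payload as text
  2. Base64 encoding increases the stored size by approximately 33% compared with storing the same bytes in a Binary attribute, which brings the item closer to the 400 KB limit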
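The size penalty in point 2 can be checked with a few lines of arithmetic. The sketch below uses random bytes as a stand-in for compressed output and assumes the 400 KB limit means 400 × 1024 bytes. It compares only the payload attribute and ignores the bytes used by attribute names and the other attributes.

```python
import base64
import os

ITEM_LIMIT_BYTES = 400 * 1024  # assumed byte value of the 400 KB item limit

# Random bytes stand in for 310 KiB of compressed output.
raw = os.urandom(310 * 1024)
encoded = base64.b64encode(raw)  # what a String attribute would have to hold

overhead = len(encoded) / len(raw) - 1
raw_fits = len(raw) <= ITEM_LIMIT_BYTES          # stored as Binary
encoded_fits = len(encoded) <= ITEM_LIMIT_BYTES  # stored as base64 String

print(f"raw={len(raw)} bytes, base64={len(encoded)} bytes, overhead={overhead:.1%}")
print(f"fits as Binary: {raw_fits}, fits as base64 String: {encoded_fits}")
```

In this example the same payload fits under the limit as a Binary attribute but not as a base64 String, so it would be pushed to S3 earlier than necessary.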

Regarding your table structure, your approach of using a "ref" attribute to indicate whether data is stored inline or in S3 is a good practice. When the compressed data exceeds DynamoDB's 400KB item size limit, storing it in S3 and keeping only the reference in DynamoDB is the recommended approach.

For your specific schema:

  • PK (String) and SK (String) are appropriate
  • ref (String) with values "inline"/"ref" is a good approach
  • payload should be Binary when ref="inline" (not String)
  • When ref="ref", your S3 location can be stored as a String
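The schema above could be sketched as a small function that decides between the inline and S3 layouts. This is an illustration under stated assumptions, not a definitive implementation: `build_item`, the bucket name, the S3 key layout, and the headroom reserved for keys and attribute names are all hypothetical, and `zlib.compress` is used only as a standard-library stand-in for a Zstd compressor. The function builds the item and returns any bytes that would need uploading; it does not call DynamoDB or S3.

```python
import os
import zlib
from typing import Callable, Optional, Tuple

MAX_ITEM_BYTES = 400 * 1024  # assumed byte value of the 400 KB item limit
HEADROOM_BYTES = 4 * 1024    # hypothetical allowance for keys and attribute names
BUCKET = "example-bucket"    # hypothetical bucket name


def build_item(
    pk: str,
    sk: str,
    data: bytes,
    compress: Callable[[bytes], bytes] = zlib.compress,  # stand-in for Zstd
) -> Tuple[dict, Optional[bytes]]:
    """Return (item, blob_to_upload); blob_to_upload is None when stored inline."""
    blob = compress(data)
    item = {"PK": pk, "SK": sk}
    if len(blob) <= MAX_ITEM_BYTES - HEADROOM_BYTES:
        item["ref"] = "inline"
        item["payload"] = blob  # bytes, so it maps to a Binary attribute
        return item, None
    item["ref"] = "ref"
    item["payload"] = f"s3://{BUCKET}/{pk}/{sk}"  # String pointer (hypothetical layout)
    return item, blob  # caller uploads this blob to S3


small_item, small_upload = build_item("doc-1", "v1", b"hello " * 1000)
# Random data barely compresses, so this one exceeds the inline budget.
large_item, large_upload = build_item("doc-2", "v1", os.urandom(500 * 1024))

print(small_item["ref"], type(small_item["payload"]).__name__)
print(large_item["ref"], large_item["payload"])
```

Assuming the `zstandard` package is installed, `zstandard.ZstdCompressor().compress` could be passed as the `compress` argument. The returned item uses native Python types, which is the shape a boto3 `Table.put_item(Item=...)` call would typically accept.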

While Zstd may perform better in your benchmarks than GZIP or LZO, the choice of compression algorithm doesn't change the recommendation to use Binary attribute type for storing the compressed data in DynamoDB.
Sources
Best practices for storing large items and attributes in DynamoDB - Amazon DynamoDB
Supported data types and naming rules in Amazon DynamoDB - Amazon DynamoDB
DynamoDB low-level API - Amazon DynamoDB
Maximum Item Size in DynamoDB for Storing Base64 Images | AWS re:Post
DynamoDB overview - AWS Prescriptive Guidance

Answered 2 months ago
AWS
Expert
Reviewed 2 months ago
  • @MikeLim Thanks. One question.

    When ref="ref", your S3 location can be stored as a String

    Do you recommend storing the S3 Path in the payload itself in binary format, or shall I create a separate attribute?
