
Best practices for storing Zstd-compressed binary data as String vs Binary attribute in DynamoDB?


The AWS documentation recommends compressing large values with an algorithm such as GZIP or LZO and storing the binary output in a Binary attribute within the item.

I was wondering whether I can use the Zstd algorithm instead and store the result in a plain String attribute. In my benchmarks, Zstd performed better: it was faster and achieved a better compression ratio. I plan to use a String attribute because, if the item still exceeds 400KB after compressing the payload, I will store the payload in S3 and save the link in the DDB table.

My table structure:

PK (String)
SK (String)
ref (String): Possible values inline/ref. Ref means data is stored in S3, inline means it's stored in compressed format in DDB itself
payload (Binary/String): Binary if ref is inline, otherwise String.
asked 2 months ago, 54 views
1 Answer
Accepted Answer

For storing Zstd-compressed data in DynamoDB, you should use the Binary attribute type rather than String, even though you're considering Zstd instead of GZIP or LZO.

Binary attributes are designed to store any binary data, including compressed text, and DynamoDB treats the value as raw bytes rather than text. At the low-level API, your application must base64-encode binary values before sending them to DynamoDB; on receipt, DynamoDB decodes the data into an unsigned byte array and uses the decoded length as the size of the attribute. The AWS SDKs typically perform this encoding for you when you pass raw bytes.
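To make the encoding step concrete, here is a minimal sketch of how a Binary value is represented in a low-level API request body. The helper names (to_wire_binary, from_wire_binary), the key values, and the sample bytes are hypothetical and stand in for real Zstd output; only the {"B": "<base64>"} and {"S": "<text>"} shapes come from the low-level API format.

```python
import base64
import json


def to_wire_binary(raw: bytes) -> dict:
    """Wrap raw bytes as a low-level Binary attribute value ({"B": base64 text})."""
    return {"B": base64.b64encode(raw).decode("ascii")}


def from_wire_binary(attr: dict) -> bytes:
    """Recover the raw bytes from a low-level Binary attribute value."""
    return base64.b64decode(attr["B"])


# Stand-in for compressed output; real code would pass the Zstd bytes here.
compressed = bytes(range(256))

# Item in low-level (typed JSON) form, following the schema in the question.
item = {
    "PK": {"S": "user#1"},          # hypothetical key values
    "SK": {"S": "doc#1"},
    "ref": {"S": "inline"},
    "payload": to_wire_binary(compressed),
}

body = json.dumps(item)  # this is what travels over the wire
round_tripped = from_wire_binary(json.loads(body)["payload"])
print(round_tripped == compressed)
```

The base64 text exists only in transit here. Once decoded, the attribute holds the original bytes, which is why the Binary type avoids the size penalty of keeping base64 text in a String.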

While you could technically store base64-encoded compressed data in a String attribute, this approach is less efficient because:

  1. It doesn't leverage the proper data type semantics
  2. Base64 encoding increases the size by approximately 33% compared to storing it as a Binary attribute
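The size increase in point 2 follows from base64 mapping every 3 input bytes to 4 output characters. A short check, using random bytes as a stand-in for compressed data (the 300,000-byte size is arbitrary):

```python
import base64
import os

# Random bytes behave like well-compressed data: no redundancy left to exploit.
raw = os.urandom(300_000)
encoded = base64.b64encode(raw)

overhead = len(encoded) / len(raw) - 1
print(f"{len(raw)} raw bytes -> {len(encoded)} base64 characters (+{overhead:.1%})")
# prints "300000 raw bytes -> 400000 base64 characters (+33.3%)"
```

A payload stored as base64 text in a String therefore reaches the item size limit sooner than the same payload stored as raw bytes in a Binary attribute.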

Regarding your table structure, your approach of using a "ref" attribute to indicate whether data is stored inline or in S3 is a good practice. When the compressed data exceeds DynamoDB's 400KB item size limit, storing it in S3 and keeping only the reference in DynamoDB is the recommended approach.

For your specific schema:

  • PK (String) and SK (String) are appropriate
  • ref (String) with values "inline"/"ref" is a good approach
  • payload should be Binary when ref="inline" (not String)
  • When ref="ref", your S3 location can be stored as a String
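The schema above can be sketched as a small helper that chooses between the inline and S3 forms. This is an illustration under stated assumptions, not a definitive implementation: build_item, estimate_item_bytes, SAFETY_MARGIN, and the example S3 URI are hypothetical, the size estimate is deliberately rough (attribute names plus values), and the limit is assumed to be 400 * 1024 bytes. The payload is expected to be already compressed with Zstd, and uploading to S3 is left to the caller.

```python
MAX_ITEM_BYTES = 400 * 1024  # assumption: the 400KB limit means 400 * 1024 bytes
SAFETY_MARGIN = 1024         # hypothetical headroom for attributes added later


def estimate_item_bytes(item: dict) -> int:
    """Rough item size: UTF-8 length of each attribute name plus its value."""
    total = 0
    for name, value in item.items():
        total += len(name.encode("utf-8"))
        if isinstance(value, bytes):
            total += len(value)                  # Binary: raw byte length
        else:
            total += len(value.encode("utf-8"))  # String: UTF-8 byte length
    return total


def build_item(pk: str, sk: str, compressed: bytes, s3_uri: str) -> dict:
    """Return an inline item if it fits, otherwise an item pointing at S3."""
    inline = {"PK": pk, "SK": sk, "ref": "inline", "payload": compressed}
    if estimate_item_bytes(inline) <= MAX_ITEM_BYTES - SAFETY_MARGIN:
        return inline
    # Too large: the caller is expected to upload `compressed` to s3_uri.
    return {"PK": pk, "SK": sk, "ref": "ref", "payload": s3_uri}


small = build_item("user#1", "doc#1", b"\x00" * 1_000, "s3://example-bucket/doc1")
large = build_item("user#1", "doc#2", b"\x00" * 500_000, "s3://example-bucket/doc2")
print(small["ref"], type(small["payload"]).__name__)
print(large["ref"], type(large["payload"]).__name__)
```

Keeping the payload as bytes in the inline case means an SDK that maps bytes to the Binary type (boto3 does this for bytes values) stores it without a base64 String detour.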

While Zstd may perform better in your benchmarks than GZIP or LZO, the choice of compression algorithm doesn't change the recommendation to use Binary attribute type for storing the compressed data in DynamoDB.
Sources
Best practices for storing large items and attributes in DynamoDB - Amazon DynamoDB
Supported data types and naming rules in Amazon DynamoDB - Amazon DynamoDB
DynamoDB low-level API - Amazon DynamoDB
Maximum Item Size in DynamoDB for Storing Base64 Images | AWS re:Post
DynamoDB overview - AWS Prescriptive Guidance

answered 2 months ago
AWS
EXPERT
reviewed 2 months ago
  • @MikeLim Thanks. One question.

    When ref="ref", your S3 location can be stored as a String

    Do you recommend storing the S3 Path in the payload itself in binary format, or shall I create a separate attribute?
