S3 Tables is not compacting data unless the writer is killed.


I’m using AWS S3 Tables with an Apache Iceberg writer that writes data every 5–15 minutes. Over time, the number of small files in the table grew beyond 75,000, but I noticed that automatic compaction did not trigger as I expected. Interestingly, when I manually killed the Iceberg writer and abandoned the run, compaction completed successfully within 1–2 hours afterward.

I have reviewed the S3 Tables documentation but couldn’t find clear details on:

  1. What specifically triggers automatic compaction in S3 Tables?
  2. Does an active writer session prevent or delay compaction?
  3. Are there ways to configure, tune, or manually trigger the compaction frequency?

Any insights or best practices for managing compaction behavior in S3 Tables would be appreciated.

asked 12 days ago

1 Answer

Thank you for your question about S3 Tables compaction behavior. Here is what can be said based on the available documentation.

  1. Automatic compaction in S3 Tables is designed to optimize storage efficiency and improve query performance. While the exact triggering mechanism isn't explicitly stated in the documentation, it's generally based on factors such as the number of small files and their sizes relative to a target file size.

  2. Your observation that compaction completed after you killed the Iceberg writer suggests that an active writer can affect the compaction process. S3 Tables is designed to perform compaction without interfering with concurrent operations, but it's possible that the system prioritizes ongoing write operations over compaction to ensure data consistency. Another plausible explanation, not confirmed by the documentation, is that Iceberg's optimistic concurrency causes compaction commits to conflict with frequent writer commits and retry.

  3. Regarding configuration and tuning of compaction:

    • You can configure the target file size for compaction at the table level. The default is 512 MB, and it can be set to any value between 64 MiB and 512 MiB.
    • Compaction can be disabled at the table level if needed, using the AWS CLI.
    • There isn't a documented way to manually trigger compaction in S3 Tables.
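The configuration options above can be sketched with boto3. This is a minimal sketch, not a definitive implementation: it assumes the `s3tables` client's `put_table_maintenance_configuration` call with the `icebergCompaction` type and a `targetFileSizeMB` setting, so check the request shape against the current boto3 documentation before relying on it. The helper names, ARN, namespace, and table name are hypothetical.

```python
from typing import Any, Dict, Optional

# Documented bounds for the compaction target file size.
MIN_TARGET_MB = 64
MAX_TARGET_MB = 512


def build_compaction_request(
    table_bucket_arn: str,
    namespace: str,
    name: str,
    enabled: bool = True,
    target_file_size_mb: Optional[int] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for put_table_maintenance_configuration.

    The request shape is an assumption based on the boto3 s3tables client.
    """
    value: Dict[str, Any] = {"status": "enabled" if enabled else "disabled"}
    if target_file_size_mb is not None:
        if not MIN_TARGET_MB <= target_file_size_mb <= MAX_TARGET_MB:
            raise ValueError(
                f"target_file_size_mb must be between {MIN_TARGET_MB} and {MAX_TARGET_MB}"
            )
        value["settings"] = {
            "icebergCompaction": {"targetFileSizeMB": target_file_size_mb}
        }
    return {
        "tableBucketARN": table_bucket_arn,
        "namespace": namespace,
        "name": name,
        "type": "icebergCompaction",
        "value": value,
    }


def apply_compaction_config(request: Dict[str, Any]) -> None:
    """Send the request to AWS. Requires boto3 and valid credentials."""
    import boto3  # imported lazily so the builder works without boto3 installed

    boto3.client("s3tables").put_table_maintenance_configuration(**request)
```

For example, `apply_compaction_config(build_compaction_request(arn, "my_namespace", "my_table", target_file_size_mb=128))` would lower the target file size to 128 MB, and passing `enabled=False` would disable compaction for that table.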

Best practices and recommendations:

  • Review your write patterns: Consider if you can batch your writes to create larger files less frequently, which might reduce the need for compaction.
  • Monitor your table statistics: Keep an eye on the number and size of files in your table to understand when compaction might be necessary.
  • Adjust the target file size: If your use case allows, you might experiment with different target file sizes to find an optimal balance between write performance and compaction needs.
  • Consider periodic maintenance windows: If possible, schedule periods where you pause write operations to allow for compaction to occur more readily.
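To make the monitoring suggestion concrete, here is a small sketch that summarizes a list of data file sizes. It assumes you have already collected the sizes yourself, for example from the `file_size_in_bytes` column of the Iceberg `files` metadata table. The function name and the 50% "small file" threshold are hypothetical choices for illustration, not values S3 Tables is known to use.

```python
from dataclasses import dataclass
from typing import Iterable

MB = 1024 * 1024


@dataclass
class FileSizeSummary:
    total_files: int
    small_files: int       # files below the chosen fraction of the target size
    total_bytes: int
    ideal_file_count: int  # file count if all data were packed at the target size


def summarize_file_sizes(
    sizes_in_bytes: Iterable[int],
    target_file_size_mb: int = 512,
    small_fraction: float = 0.5,  # hypothetical threshold, tune to taste
) -> FileSizeSummary:
    """Count small files and estimate how many files a fully compacted table would have."""
    sizes = list(sizes_in_bytes)
    target = target_file_size_mb * MB
    threshold = target * small_fraction
    small = sum(1 for size in sizes if size < threshold)
    total = sum(sizes)
    ideal = -(-total // target) if total else 0  # ceiling division
    return FileSizeSummary(len(sizes), small, total, ideal)
```

A large gap between `total_files` and `ideal_file_count` that keeps growing over several hours is a reasonable signal that compaction is not keeping up with the writer.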

The behavior you're seeing might not be the intended functionality of S3 Tables. If it continues to be a significant issue for your workload, I recommend contacting AWS Support to report it. They may be able to provide more specific guidance or investigate whether an underlying issue is affecting compaction in your case.
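Before contacting support, it may help to record what the service itself reports about the compaction job. The sketch below assumes the boto3 `s3tables` client exposes `get_table_maintenance_job_status` and that its response nests per-job details under `status["icebergCompaction"]` with `status`, `lastRunTimestamp`, and `failureMessage` fields. Treat that response shape as an assumption and verify it against the boto3 documentation. The helper names are hypothetical.

```python
from typing import Any, Dict


def describe_compaction_status(response: Dict[str, Any]) -> str:
    """Turn a job-status response (assumed shape) into a one-line summary."""
    job = response.get("status", {}).get("icebergCompaction")
    if job is None:
        return "no compaction job reported"
    parts = [f"status={job.get('status', 'unknown')}"]
    if job.get("lastRunTimestamp") is not None:
        parts.append(f"last_run={job['lastRunTimestamp']}")
    if job.get("failureMessage"):
        parts.append(f"failure={job['failureMessage']}")
    return ", ".join(parts)


def fetch_compaction_status(table_bucket_arn: str, namespace: str, name: str) -> str:
    """Query AWS for the table's maintenance job status. Requires boto3 and credentials."""
    import boto3  # imported lazily so the formatter works without boto3 installed

    client = boto3.client("s3tables")
    response = client.get_table_maintenance_job_status(
        tableBucketARN=table_bucket_arn, namespace=namespace, name=name
    )
    return describe_compaction_status(response)
```

Logging this summary periodically while the writer is running, and again after it stops, would give AWS Support a timeline showing whether compaction runs were attempted, failed, or never started.
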
Sources
S3 Tables maintenance - Amazon Simple Storage Service
How Amazon S3 Tables use compaction to improve query performance by up to 3 times | AWS Storage Blog
Tabular Data Storage At Scale - Amazon S3 Tables - AWS

answered 12 days ago