Right directory structure to optimize read throughput


We are trying to determine how to organize our S3 bucket to optimize read operations for this use case. A daily job writes a few million files (single-digit millions, each under 1 MB) to the bucket. Reads are spread throughout the day as parallel requests, each reading a different single file. We are deciding between three options:

  1. The write job creates a new directory for each daily run in the bucket, and distributes the files within that daily directory into 36 sub-directories hashed by the last character of the alphanumeric file name (26 letters + 10 digits); see the key-layout sketch after this list.
  2. The bucket contains 36 top-level directories, and each of those contains a new directory for each daily run.
  3. Manually create 36 partitions and find a way to randomly distribute the data across them (if that is at all possible).
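
To make the difference between options 1 and 2 concrete, here is a minimal sketch of how the object keys would look under each layout. The file names, dates, and helper functions are hypothetical; only the 36-way split on the last character of the file name comes from our actual scheme.

```python
import string
from pathlib import PurePosixPath

# 36 partition characters: 26 letters + 10 digits
# (assuming lowercase file names; adjust if names are mixed-case).
PARTITION_CHARS = string.ascii_lowercase + string.digits

def partition_char(file_name: str) -> str:
    """Select one of the 36 hash sub-directories from the last
    character of the file name (extension excluded)."""
    ch = PurePosixPath(file_name).stem[-1].lower()
    if ch not in PARTITION_CHARS:
        raise ValueError(f"unexpected trailing character: {ch!r}")
    return ch

def key_option_1(run_date: str, file_name: str) -> str:
    """Option 1: daily directory first, hash sub-directory second."""
    return f"{run_date}/{partition_char(file_name)}/{file_name}"

def key_option_2(run_date: str, file_name: str) -> str:
    """Option 2: hash directory first, daily sub-directory second."""
    return f"{partition_char(file_name)}/{run_date}/{file_name}"

# Hypothetical example file; date and name are illustrative only.
print(key_option_1("2023-05-01", "record-07a.dat"))  # 2023-05-01/a/record-07a.dat
print(key_option_2("2023-05-01", "record-07a.dat"))  # a/2023-05-01/record-07a.dat
```

Under option 1 every key for a given day starts with the single daily directory, while under option 2 a day's keys are spread across the 36 top-level directories.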

Which of these options will provide maximum throughput for randomly accessing each file? We are looking at the docs here: https://docs.aws.amazon.com/AmazonS3/latest/userguide/organizing-objects.html. Note that the read traffic on a particular day will only be interested in that single day's run. So option 1 would route all of a day's read requests under one daily directory, which we believe would limit throughput to the 5,500 GET requests per second per prefix mentioned in the best practices: https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html
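
For context on the read side, this is roughly what our access pattern looks like: many concurrent workers, each issuing a GET for a different object. A minimal sketch using boto3, assuming AWS credentials are configured; the bucket name and keys are hypothetical placeholders following the option 2 layout sketched above.

```python
import concurrent.futures

import boto3

s3 = boto3.client("s3")
BUCKET = "example-bucket"  # hypothetical bucket name

def fetch(key: str) -> bytes:
    """Read one small object; each request targets a different key."""
    resp = s3.get_object(Bucket=BUCKET, Key=key)
    return resp["Body"].read()

# Hypothetical keys for one day's run under the option 2 layout.
keys = [f"{c}/2023-05-01/record-07{c}.dat" for c in "abc"]

# Parallel single-file GETs, mirroring the read pattern described above.
with concurrent.futures.ThreadPoolExecutor(max_workers=32) as pool:
    for key, body in zip(keys, pool.map(fetch, keys)):
        print(key, len(body), "bytes")
```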

Which option will give us the highest throughput?
