Right directory structure to optimize read throughput


We are trying to determine how to organize our S3 bucket to optimize read operations for this use case. A daily job writes a few million files (single-digit millions, each under 1 MB) to the bucket. Reads are spread throughout the day as parallel requests, each fetching a different file. We are deciding between three options (a sketch of the resulting key layouts follows the list):

  1. The write job creates a new directory in the bucket for each daily run and distributes that day's files into 36 sub-directories, hashed by the last character of the alphanumeric file name (26 letters + 10 digits).
  2. The bucket contains 36 top-level directories, and each of them contains a new directory for each daily run.
  3. Manually create 36 partitions and find a way to randomly distribute the data across them (if that is possible at all).

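For clarity, here is a minimal sketch of the key layouts behind options 1 and 2. The date format and the names `run_date` / `file_name` are placeholders for illustration, not our real naming scheme:

```python
# Sketch of the two object-key layouts we are comparing.

def key_option_1(run_date: str, file_name: str) -> str:
    # Option 1: daily run directory first, then a 36-way hash sub-directory.
    # Every read for a given day shares the same leading "<run_date>/" prefix.
    shard = file_name[-1].lower()  # last character of the alphanumeric name
    return f"{run_date}/{shard}/{file_name}"

def key_option_2(run_date: str, file_name: str) -> str:
    # Option 2: 36 hash directories at the top level, daily run directory inside.
    # A day's reads are spread across 36 distinct leading prefixes.
    shard = file_name[-1].lower()
    return f"{shard}/{run_date}/{file_name}"

print(key_option_1("2024-01-15", "abc123"))  # -> 2024-01-15/3/abc123
print(key_option_2("2024-01-15", "abc123"))  # -> 3/2024-01-15/abc123
```
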
Which of these options will provide maximum throughput for randomly accessing each file? We are looking at the docs here: https://docs.aws.amazon.com/AmazonS3/latest/userguide/organizing-objects.html. Keep in mind that on any given day the read traffic is only interested in that day's run. So option 1 would route all of a day's read requests through a single directory, which would limit throughput to the 5,500 GET/HEAD requests per second per prefix mentioned in the best-practices guide: https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html
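
To put rough numbers on that concern, here is a back-of-envelope calculation. It assumes the documented 5,500 GET/HEAD requests per second baseline applies per hash directory, and that option 1 effectively behaves as a single prefix for a given day's reads (which is exactly what we are unsure about):

```python
# Documented baseline read rate per prefix (S3 performance best practices).
GET_PER_PREFIX = 5_500

# Assumption: option 1 funnels a day's reads through one prefix,
# option 2 spreads them across 36 hash prefixes.
option_1_ceiling = 1 * GET_PER_PREFIX    #   5,500 requests/second
option_2_ceiling = 36 * GET_PER_PREFIX   # 198,000 requests/second

print(f"Option 1: {option_1_ceiling:,} req/s, Option 2: {option_2_ceiling:,} req/s")
```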

Which option will give us the highest read throughput?

Asked 2 years ago · 94 views
No answers
