Right directory structure to optimize read throughput


We are trying to determine how to organize our S3 bucket to optimize read throughput for this use case. A daily job writes a few million files (single-digit millions, each under 1 MB) to the bucket. Reads are spread throughout the day, with parallel requests each fetching a different file. We are choosing between three options:

  1. The write job creates a new directory in the bucket for each daily run, and distributes that day's files into 36 sub-directories hashed by the last character of the alphanumeric file name (26 letters + 10 digits).
  2. The bucket contains 36 top-level directories, and each of those contains a new directory for each daily run.
  3. Manually create 36 partitions and find a way to randomly distribute the data across them (if that is even possible).
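To make options 1 and 2 concrete, here is a small sketch of how the object keys would differ (the date format, file names, and helper names are hypothetical, not from our actual job):

```python
CHARS = "abcdefghijklmnopqrstuvwxyz0123456789"  # 26 letters + 10 digits = 36 shards

def shard(name: str) -> str:
    """Shard key: last character of the alphanumeric base file name."""
    base = name.rsplit(".", 1)[0].lower()
    return base[-1]

def key_option_1(run_date: str, name: str) -> str:
    # Option 1: daily run directory first, then 36 hash sub-directories
    return f"{run_date}/{shard(name)}/{name}"

def key_option_2(run_date: str, name: str) -> str:
    # Option 2: 36 top-level directories, then a daily run directory in each
    return f"{shard(name)}/{run_date}/{name}"

print(key_option_1("2023-01-15", "report7k.json"))  # 2023-01-15/k/report7k.json
print(key_option_2("2023-01-15", "report7k.json"))  # k/2023-01-15/report7k.json
```

The difference that matters for S3 is which components appear at the start of the key, since request-rate limits apply per prefix.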

Which of these options will provide maximum throughput for randomly accessing each file? We are looking at the docs here: https://docs.aws.amazon.com/AmazonS3/latest/userguide/organizing-objects.html Note that the read traffic on a particular day is only interested in that single day's run. So option 1 would route all of a day's read requests under a single daily prefix, which would limit throughput to 5,500 GET/HEAD requests per second, as mentioned in the best practices: https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html
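For reference, if reads were spread evenly across all 36 shard prefixes (as in options 2 and 3, assuming each shard counts as a separate prefix for rate-limiting purposes), the aggregate GET limit from the best-practices figure would scale roughly as:

```python
per_prefix_gets = 5500  # GET/HEAD requests/sec per prefix (S3 best-practices figure)
prefixes = 36           # one shard per character: 26 letters + 10 digits

# Upper bound on aggregate read rate, assuming an even spread across shards
print(per_prefix_gets * prefixes)  # 198000
```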

Which of these options will give us the highest throughput?

Asked 2 years ago · 94 views
No answers
