Right directory structure to optimize read throughput

1

We are trying to determine how to organize our S3 bucket for optimizing Read operation for this use case. We have a daily job that will write few single digit million files (each file less than 1 MB) to the bucket. The read pattern would be spread throughout the day with parallel requests to read one different file per request. We are thinking to choose between 3 options

  1. Write job creates new directory for each daily run in the bucket. Distributes few files in the daily directory into 36 sub-directories hashed by last character in alphanumeric file name (26 characters + 10 digits)
  2. The bucket contain 36 directories and each directory contains new directory for each daily run.
  3. Manually create 36 partitions and find a way to randomly distribute data (if at all possible)

Which of these options will provide maximum throughput for randomly accessing each file? We are looking at the docs here https://docs.aws.amazon.com/AmazonS3/latest/userguide/organizing-objects.html Remember the read traffic on a particular day will be interested in reading data for a single day's run. So for option 1 will route all daily read requests to single directory which will limit throughput to 5500 as mentioned in the best practices https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html

Which options here will give us the highest throughput?

preguntada hace 2 años94 visualizaciones
No hay respuestas

No has iniciado sesión. Iniciar sesión para publicar una respuesta.

Una buena respuesta responde claramente a la pregunta, proporciona comentarios constructivos y fomenta el crecimiento profesional en la persona que hace la pregunta.

Pautas para responder preguntas