Streaming data from S3 to containers launched from AWS Batch


For processing genomics datasets in containers launched from AWS Batch, is it possible to stream the data directly from S3, using an approach similar to SageMaker pipe mode (or fast file mode)?

Also, are there any hardware limitations that I should be aware of? For example, according to this article, the maximum transfer speed between EC2 and S3 is 100 Gbps (via VPC endpoints or public IPs in the same region), whereas a 2018 article stated the bandwidth between EC2 and S3 as 25 Gbps.
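For a sense of scale, here is a back-of-the-envelope sketch of what those two figures would imply as a best case. The 1 TB dataset size and the function name are assumptions made only for illustration, and the result is a theoretical lower bound that ignores protocol overhead, instance network limits, and S3 request behaviour.

```python
def min_transfer_seconds(size_bytes: int, link_gbps: float) -> float:
    """Lower bound on transfer time at a given line rate (no overhead modelled)."""
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9)


dataset_bytes = 10**12  # hypothetical 1 TB dataset (assumption, not from the articles)

for gbps in (25, 100):
    seconds = min_transfer_seconds(dataset_bytes, gbps)
    print(f"{gbps} Gbps: at least {seconds:.0f} s")
```

Under these assumptions the script reports at least 320 s at 25 Gbps and at least 80 s at 100 Gbps; real throughput in a container is likely to be lower.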

1 Answer

Hello,

Thanks for reaching out to AWS re:Post. I understand that you want to stream data directly from S3. One way to stream data from S3 is with Amazon OpenSearch Service.

OpenSearch is well suited to real-time application monitoring, log analytics, and website search, and you can use it with S3 to stream data. The link below shows an example of this [1].

I also recommend looking at Amazon FSx for Lustre, which is well suited to genomics workflows. It can be linked to an S3 bucket, so you get a high-performance file system in front of your data while keeping the scalability and data availability of S3 [2]. AWS also provides a blog series on how to build a genomics batch workflow on AWS; for more detail, I recommend looking at this link [3].
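On the direct-streaming part of the question, a minimal sketch of reading an object incrementally inside a container might look like the following. It is an illustration under assumptions, not a recommended implementation: the helper names `iter_lines` and `count_fastq_reads` are hypothetical, the input is assumed to be uncompressed FASTQ with exactly four lines per record, and an in-memory buffer stands in for the S3 response so the example runs without AWS credentials.

```python
import io
from typing import BinaryIO, Iterator


def iter_lines(body: BinaryIO, chunk_size: int = 1024 * 1024) -> Iterator[bytes]:
    """Yield lines from a stream, buffering only one chunk plus a partial line."""
    pending = b""
    while True:
        chunk = body.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Everything before the last newline is complete; the tail may be partial.
        *lines, pending = pending.split(b"\n")
        yield from lines
    if pending:
        yield pending


def count_fastq_reads(body: BinaryIO, chunk_size: int = 1024 * 1024) -> int:
    """Count records, assuming uncompressed FASTQ with four lines per record."""
    return sum(1 for _ in iter_lines(body, chunk_size)) // 4


# Stand-in for an S3 object body; any object with a read(n) method works here.
sample = io.BytesIO(b"@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n")
print(count_fastq_reads(sample, chunk_size=5))
```

With boto3 installed and a hypothetical bucket and key, the `Body` returned by `get_object` exposes a `read(amt)` method and could be passed as `body` in place of the in-memory buffer, so the object is consumed in chunks rather than downloaded to disk first.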

Please contact us if you have any further questions, and feel free to open a support case to discuss the specifics of your resources.

[1] Streaming data example - https://docs.aws.amazon.com/solutions/latest/text-analysis-with-amazon-opensearch-service-and-amazon-comprehend/example-of-streaming-data-from-s3.html

[2] Amazon FSx - https://aws.amazon.com/blogs/storage/using-amazon-fsx-for-lustre-for-genomics-workflows-on-aws/

[3] Genomics Batch Workflows on AWS - https://aws.amazon.com/blogs/compute/building-high-throughput-genomics-batch-workflows-on-aws-introduction-part-1-of-4/

answered 2 years ago
