AWS has multiple options for this kind of workload. Prescribing a solution is hard without all the details about producers/consumers and other requirements, but I will try to shed some light on a few options.
S3 is well suited to be a data lake. You keep raw data there and process it elsewhere. Usually, ETL jobs will spin up, download data from S3, process it, and save the result in another datastore.
This second datastore is the data warehouse (DW), which holds data that has been processed and has some business value. From there it should be easier to run analytics jobs, because DW solutions (like Redshift) are usually optimized for that kind of thing.
As for speed, it depends on several factors:
- Is your data spread across multiple files that could be processed in parallel?
- Can you optimize the code?
- Are you hitting CPU/memory/IO limits?
- Is the download time (from S3) acceptable?
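To get a feel for the first and last questions, here is a minimal sketch that fetches several objects in parallel and records how long the download and the processing take for each one. It assumes boto3 is installed and credentials are configured; the bucket name, the keys, and the helper names (`make_s3_fetch`, `process_one`, `process_all`) are hypothetical and only there for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def make_s3_fetch(bucket):
    """Return a fetch(key) -> bytes function backed by boto3 (assumed installed)."""
    import boto3  # imported lazily so the rest of the sketch runs without it

    s3 = boto3.client("s3")

    def fetch(key):
        # Reads the whole object into memory; fine for a sketch, not for huge files.
        return s3.get_object(Bucket=bucket, Key=key)["Body"].read()

    return fetch


def timed(fn, arg):
    """Call fn(arg) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(arg)
    return result, time.perf_counter() - start


def process_one(key, fetch, transform):
    """Download one object, transform it, and report both timings."""
    data, download_s = timed(fetch, key)
    result, process_s = timed(transform, data)
    return {"key": key, "result": result,
            "download_s": download_s, "process_s": process_s}


def process_all(keys, fetch, transform, max_workers=8):
    """Run process_one over all keys on a thread pool, preserving key order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda k: process_one(k, fetch, transform), keys))
```

Usage would look something like `process_all(["raw/part-0.csv", "raw/part-1.csv"], make_s3_fetch("my-raw-bucket"), len)`, with `len` standing in for your real processing step. Comparing `download_s` with `process_s` gives a rough idea of whether you are limited by S3 transfer or by your own code. Threads mainly help with the download part; if the processing itself is CPU-heavy Python, a process pool or several instances is probably the better fit.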
Sorry for not having a more prescriptive answer, but I hope that helps you a little bit.