You could use AWS Storage Gateway as an Amazon S3 File Gateway. The File Gateway is deployed in your VPC on an EC2 instance and serves up an NFS mount in front of your S3 bucket.
You can copy data between S3 buckets using the AWS CLI: aws s3 cp s3://source-bucket/ s3://destination-bucket/ --recursive
but there will be a cost in terms of API requests, and possibly data transfer charges if the two buckets are in different regions.
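If you would rather do the copy from code than from the CLI, a minimal Python sketch could look like the one below. It assumes that `s3` is a boto3 S3 client (for example from `boto3.client("s3")`); the function name `copy_prefix` and the bucket names are hypothetical. Each object still costs a list/copy request, so the pricing caveat above applies here too.

```python
from typing import List


def copy_prefix(s3, src_bucket: str, dst_bucket: str, prefix: str = "") -> List[str]:
    """Copy every object under `prefix` from src_bucket to dst_bucket.

    `s3` is assumed to be a boto3 S3 client. Returns the keys that were copied.
    """
    copied = []
    # list_objects_v2 returns at most 1000 keys per call, so paginate.
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=src_bucket, Prefix=prefix):
        # "Contents" is absent when a page has no objects.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            # Managed copy: S3 copies bucket to bucket, nothing is downloaded locally.
            s3.copy({"Bucket": src_bucket, "Key": key}, dst_bucket, key)
            copied.append(key)
    return copied
```

Usage would be something like `copy_prefix(boto3.client("s3"), "source-bucket", "destination-bucket")`.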
Even if you do copy the data to another S3 bucket, I don't think that solves the problem that you're describing - you want "filesystem" access to the data.
You could copy the data to EFS or FSx for Lustre, but either of those is going to have a cost associated with it as well.
S3FS is useful, but you do need to be a little careful because it doesn't safely support multiple writers to the same bucket. Performance may also be an issue.
The best answer is to ensure that your code accesses S3 directly to grab the objects of interest and then manipulates them locally, rather than downloading the entire dataset to a location (be that EBS, your own S3 bucket, EFS, etc.). That will (most probably) involve code changes, but it has the lowest cost (of AWS services) and the fewest workarounds.
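As a rough illustration of that approach, here is a minimal Python sketch that lists the keys under a prefix and reads one archive straight into memory, without staging the whole dataset anywhere. It assumes boto3 is installed and that the objects are zip archives (you mentioned unzipping); the function names, bucket name and prefix are hypothetical, so adapt them to your data layout.

```python
import io
import zipfile
from typing import Iterator, Tuple


def make_client():
    """Create an S3 client. Assumes boto3 is installed and credentials are configured."""
    import boto3  # imported here so the helpers below work with any compatible client
    return boto3.client("s3")


def iter_keys(s3, bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield the key of every object under `prefix`, following pagination."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"]


def read_zip_members(s3, bucket: str, key: str) -> Iterator[Tuple[str, bytes]]:
    """Fetch one zip object into memory and yield (member name, bytes) pairs."""
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    with zipfile.ZipFile(io.BytesIO(body)) as zf:
        for name in zf.namelist():
            yield name, zf.read(name)
```

A training loop could then call `iter_keys(make_client(), "source-bucket", "train/")` and pull only the archives it needs for the current pass. This holds each archive fully in memory, so it only suits objects that fit in RAM; for repeated epochs you may still want to cache the extracted files on local disk.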
My ML code uses this data for model training and evaluation. For training cycles I need to load the data onto the GPU, process it, and be able to repeat the training cycles. I'd be re-pulling and unzipping this data multiple times unless I pull it down once and store it locally. Maybe I'll look into copying it over to EFS. Are download and data transfer rates the same for that as for transferring to a separate S3 bucket?
I haven't used these resources yet, but I'll give them a look. Any idea of the ballpark cost of running and managing these services? Thanks!
Here is an estimate from the AWS Pricing Calculator: https://calculator.aws/#/estimate?id=a9a8dec1621d947a67d7807fcd3111e6c5bdcee0. Most of the cost is the EC2 instance (m5.xlarge). I also assumed that the S3 bucket and the Storage Gateway were in the same region.