The issue you're experiencing is likely related to how Amazon S3 handles large file uploads, especially when streaming from stdin. Here are some insights and potential solutions to your problem:
- Multipart Upload: When uploading large files to S3, the AWS CLI automatically switches to multipart upload once the file exceeds its configured threshold (8 MB by default; AWS recommends multipart for objects over 100 MB). This splits the file into smaller parts for more efficient and reliable uploads.
- Stdin Limitations: When streaming from stdin, the AWS CLI cannot determine the total size of the data, which affects how it sizes and manages the multipart upload.
- Expected-Size Parameter: The --expected-size parameter tells the CLI roughly how much data to expect so it can choose a suitable part size, but it may not resolve every stdin upload issue, especially for very large streams.
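One reason the part size matters: S3 caps a multipart upload at 10,000 parts, so the chunk size determines the largest object a single upload can produce. A quick sanity check, assuming the AWS CLI's default 8 MB chunk size:

```shell
# S3 allows at most 10,000 parts per multipart upload,
# so the maximum object size = chunk size x 10,000.
CHUNK_MB=8        # AWS CLI default multipart_chunksize
MAX_PARTS=10000
MAX_GB=$(( CHUNK_MB * MAX_PARTS / 1024 ))
echo "max object size with ${CHUNK_MB} MB chunks: ~${MAX_GB} GB"
```

With the default 8 MB chunks that ceiling is roughly 78 GB, which is one reason the CLI needs --expected-size for large stdin streams: it uses the hint to pick a part size that keeps the upload under the 10,000-part limit.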
To address your specific situation:
- Increase Multipart Chunk Size: You mentioned that the --multipart-chunk-size-mb parameter isn't supported in your version; that flag actually belongs to s3cmd. In the AWS CLI the chunk size is set through configuration instead, e.g. aws configure set default.s3.multipart_chunksize 64MB.
- Use S3 Transfer Acceleration: If available in your Region, enabling S3 Transfer Acceleration can improve upload speeds for large files.
- Split the Upload: Instead of streaming the entire tar file, split it into smaller chunks and upload each separately. This gives you more control over the upload process.
- Use a Different Tool: Other tools such as s3cmd or the AWS SDKs might handle large stdin uploads more effectively.
- Temporary File Approach: While not ideal, your method of creating a temporary tar file and then uploading it works reliably. If other solutions fail, this may be the most practical approach, especially if you delete the temporary file immediately after the upload.
- Parallel Upload: If possible, use a transfer tool capable of performing parallel requests across multiple connections; this can significantly improve transfer speed and reliability for large files.
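The "split the upload" option above can be sketched as follows; the bucket name is a placeholder, and the aws commands are echoed rather than executed:

```shell
# Sketch: split a large file into fixed-size chunks, then upload each one.
WORK=$(mktemp -d)
printf '%02000d' 0 > "$WORK/data.bin"   # stand-in for a large tar archive

# Cut the file into 512-byte pieces (use e.g. -b 1G for real data).
split -b 512 "$WORK/data.bin" "$WORK/part."

# Each piece would be uploaded separately (commands echoed, not run here):
for part in "$WORK"/part.*; do
  echo "aws s3 cp \"$part\" \"s3://my-bucket/parts/$(basename "$part")\""
done
```

On the download side the pieces can be recombined with `cat part.* > data.bin`, at the cost of the object no longer being a single key in S3.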
To maximize S3 transfer performance, ensure that your EC2 instance and S3 bucket are in the same AWS Region, and that your EC2 instance has sufficient network throughput to handle the large file transfer.
If you continue to face issues, it might be worth reaching out to AWS support for more specific guidance tailored to your setup and AWS CLI version.
Sources
Maximizing Amazon S3 transfer throughput to Amazon EC2 instances | AWS re:Post
Upload large files to S3 | AWS re:Post
It turns out the error was mine: I failed to realise that du reports sizes in KiB, not bytes, so my calculated expected-size was off by a factor of 1024. Once I added the -b switch to du, correcting the calculated expected-size, the s3 cp worked as expected.
S3 seemingly ignores the expected-size parameter when copying from a file and gets the size from the file itself. But when using stdin it has no way of working out the size and has to rely on the expected-size provided. This explains the difference in behaviour that I experienced between copying from stdin and copying from a file.
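The fix above can be sketched like this (the bucket name is a placeholder, and the final pipeline is shown via echo rather than executed):

```shell
# du without -b reports 1 KiB blocks; -b gives the apparent size in bytes.
DATA_DIR=$(mktemp -d)
printf 'sample payload' > "$DATA_DIR/file.txt"

SIZE=$(du -sb "$DATA_DIR" | cut -f1)
echo "expected size: $SIZE bytes"

# Pass the byte count so the CLI can plan the multipart upload from stdin:
echo "tar cf - \"$DATA_DIR\" | aws s3 cp - s3://my-bucket/backup.tar --expected-size $SIZE"
```

Note that --expected-size only needs to be approximately right; it exists so the CLI can pick a part size that keeps the stream within S3's 10,000-part limit.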
