I want to improve the speed when I transfer data from my Amazon Elastic Compute Cloud (Amazon EC2) instance to my Amazon Simple Storage Service (Amazon S3) bucket.
Resolution
Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you're using the most recent AWS CLI version.
Use enhanced networking on the Amazon EC2 instance
Prerequisite: If you use the Elastic Network Adapter (ENA), then complete your instance's prerequisites for enhanced networking. If you use the Intel 82599 Virtual Function (VF) interface, then prepare your instance for enhanced networking.
Review your instance's metrics with Amazon CloudWatch. If your instance's NetworkPacketsOut value is lower than the value that you expect, then activate enhanced networking.
Use parallel workloads for the data transfer
To improve the time that it takes to complete the data transfer, split the transfer into multiple mutually exclusive operations. For example, if you use the AWS CLI, then run multiple cp, mv, or sync AWS CLI commands at the same time.
If you spread data across multiple prefixes, then run multiple AWS CLI operations that perform separate syncs at the same time. For instructions, see the Run multiple AWS CLI operations section of How do I improve data transfer performance when I use the AWS CLI sync command for Amazon S3?
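The following is a minimal sketch of running separate syncs at the same time, one per prefix. It assumes that the AWS CLI is installed and configured on the instance. The directory /data, the bucket name amzn-s3-demo-bucket, and the prefixes logs and images are hypothetical placeholders, and the helper function names are illustrative only.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_sync_commands(source_dir, bucket, prefixes):
    """Return one `aws s3 sync` command (as an argument list) per prefix."""
    commands = []
    for prefix in prefixes:
        commands.append([
            "aws", "s3", "sync",
            f"{source_dir}/{prefix}",      # local folder for this prefix
            f"s3://{bucket}/{prefix}",     # matching prefix in the bucket
        ])
    return commands


def run_in_parallel(commands, max_workers=4):
    """Run the commands at the same time and return their exit codes in order."""
    def run(command):
        # check=False: collect the exit code instead of raising on failure
        return subprocess.run(command, check=False).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, commands))


# Hypothetical usage (placeholder names; requires the AWS CLI and credentials):
# commands = build_sync_commands("/data", "amzn-s3-demo-bucket", ["logs", "images"])
# exit_codes = run_in_parallel(commands)
```

Because each command covers a different prefix, the operations stay mutually exclusive and don't copy the same object twice.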
Customize the upload configurations on the AWS CLI
To speed up the data transfer, customize the following AWS CLI configuration values for Amazon S3:
- The multipart_chunksize value sets the size of each part that the AWS CLI uploads in a multipart upload for an individual file. Tune this value so that the AWS CLI separates larger files into smaller parts that upload more quickly.
Note: A multipart upload requires you to upload a single file in no more than 10,000 distinct parts. Make sure that the chunk size that you set balances the part file size and the number of parts.
- Change the max_concurrent_requests value to increase the number of requests that the AWS CLI can send to Amazon S3 at a time. The default value is 10. The AWS CLI supports multithreading by default, so a higher value lets more requests run in parallel. A higher value alone might not improve transfer speeds. To see better transfer speeds overall, combine a higher max_concurrent_requests value with parallel workloads.
Note: Make sure that your machine has enough resources to support the maximum number of concurrent requests.
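To illustrate the balance between part size and part count, the following sketch estimates the smallest chunk size that keeps a file within the 10,000-part limit. It then builds the aws configure set commands that apply the values. The function names are hypothetical. The 8 MiB floor is an assumption based on the AWS CLI default multipart_chunksize, and the sketch assumes that the AWS CLI reads the MB suffix as 1024 * 1024 bytes.

```python
MAX_PARTS = 10_000        # part limit for a single multipart upload
MIB = 1024 * 1024
DEFAULT_CHUNK_MIB = 8     # assumed floor: AWS CLI default multipart_chunksize


def min_chunk_size_mib(file_size_bytes):
    """Smallest whole-MiB chunk size that keeps the upload within MAX_PARTS."""
    denominator = MAX_PARTS * MIB
    # Integer ceiling division avoids floating-point rounding on large sizes
    needed = (file_size_bytes + denominator - 1) // denominator
    return max(needed, DEFAULT_CHUNK_MIB)


def configure_commands(chunk_mib, concurrent_requests):
    """Build the AWS CLI commands that set both values in the default profile."""
    return [
        f"aws configure set default.s3.multipart_chunksize {chunk_mib}MB",
        f"aws configure set default.s3.max_concurrent_requests {concurrent_requests}",
    ]


# Example: a 1 TiB file with 20 concurrent requests (illustrative values)
for command in configure_commands(min_chunk_size_mib(1024**4), 20):
    print(command)
```

Treat the computed chunk size as a lower bound. A larger chunk size results in fewer parts, but each part takes longer to upload and to retry.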
Use a VPC endpoint for Amazon S3
If your instance is in the same AWS Region as your Amazon S3 bucket, then use an Amazon Virtual Private Cloud (Amazon VPC) gateway endpoint.
With a gateway endpoint, you can privately connect your VPC to Amazon S3 without an internet gateway, NAT device, or VPN connection. Instances in the VPC don't require public IP addresses to communicate with your Amazon S3 bucket. When you use an Amazon VPC endpoint, the data traffic between your VPC and Amazon S3 stays on the AWS network.
Note: Amazon VPC endpoints for Amazon S3 don't support requests across different AWS Regions.
Use Amazon S3 Transfer Acceleration between geographically distant Regions
Data transfer speeds can increase when your instance and the Amazon S3 bucket are in geographically close Regions. If the instance and the bucket are in geographically distant Regions, then activate Amazon S3 Transfer Acceleration. Before you activate Amazon S3 Transfer Acceleration, review Amazon S3 pricing.
To determine if Transfer Acceleration improves data transfer speeds for your instance, use the Amazon S3 Transfer Acceleration speed comparison tool.
Note: When you use S3 Transfer Acceleration, you can't use the CopyObject API call across Regions.
Upgrade your Amazon EC2 instance type
High Amazon EC2 instance CPU utilization can impact data transfer speeds. You can change your instance to an instance type that provides greater memory and network performance.
Note: To ensure a reliable network connection between your Amazon EC2 instance and Amazon S3, choose an instance type with at least 10 Gbps of network connectivity.
Use chunked transfers
If you're transferring large files, then use multipart uploads to upload the files in parts and byte-range HTTP GET requests to download the files in parts.
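As a sketch of how a large download can be split into chunks, the following helper computes the HTTP Range header value for each part. The function name and the sizes in the example are hypothetical. The sketch doesn't send any requests. Each value would go in the Range header of a separate GET request for the same object.

```python
def byte_ranges(total_size, part_size):
    """Return HTTP Range header values that cover total_size bytes in order."""
    if total_size < 0 or part_size <= 0:
        raise ValueError("total_size must be >= 0 and part_size must be > 0")
    ranges = []
    for start in range(0, total_size, part_size):
        # Range end offsets are inclusive, so subtract 1 from the exclusive end
        end = min(start + part_size, total_size) - 1
        ranges.append(f"bytes={start}-{end}")
    return ranges


# Example: a 10-byte object fetched in 4-byte parts (illustrative sizes)
print(byte_ranges(10, 4))
```

The parts don't overlap, so you can request them at the same time and reassemble them in order after all of the requests complete.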
Related information
Best practices design patterns: optimizing Amazon S3 performance