What is the best way to transfer data from on-prem to S3 over the internet?


What is the best way to transfer data from on-prem to S3 over the internet? I have two options: SFTP or "aws s3 cp"/"aws s3 sync". I cannot use DataSync because it would require a public IP to deploy the DataSync agent. The initial data is around 400 GB, and once every week we will transfer around 100-150 GB.
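To put these volumes in perspective, a rough transfer-time estimate helps when comparing options. The helper below is a minimal sketch that assumes decimal units (1 GB = 8,000 megabits) and a steady link rate; the optional efficiency percentage is a crude allowance for protocol overhead and competing traffic. The function name and the link speeds in the usage line are hypothetical, not measurements from this thread.

```shell
# Rough transfer-time estimate for moving data over an internet link.
# Assumes decimal units (1 GB = 8,000 megabits) and a steady link rate.
# Usage: transfer_hours <gigabytes> <link_mbps> [efficiency_percent]
transfer_hours() {
  awk -v gb="$1" -v mbps="$2" -v eff="${3:-100}" 'BEGIN {
    usable = mbps * eff / 100           # megabits per second actually achieved
    seconds = (gb * 8000) / usable      # total megabits divided by usable rate
    printf "%.1f\n", seconds / 3600     # convert seconds to hours
  }'
}
```

For example, `transfer_hours 400 100` prints 8.9 (the initial 400 GB on a fully used 100 Mbit/s link), and `transfer_hours 150 100 80` prints 4.2 (a 150 GB weekly batch at 80% of that link).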

GB, asked a month ago (371 views)
4 Answers

Hello.

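If the transfer goes over the internet, I think "aws s3 cp" or "aws s3 sync" is the better choice.
These commands automatically use multipart upload for large files, splitting each file into parts that are uploaded in parallel, so large files upload at a good speed and a failed part can be retried without restarting the whole file.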
https://repost.aws/knowledge-center/s3-multipart-upload-cli
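A minimal sketch of that setup for the weekly job, assuming the classic AWS CLI transfer client. The multipart settings are real AWS CLI S3 configuration keys, but the values are illustrative starting points rather than tuned recommendations, and the source directory, bucket and function names are hypothetical.

```shell
# Hypothetical defaults; replace with your own export directory and bucket/prefix.
SRC_DIR="/data/export"
DEST_URI="s3://example-bucket/onprem/"

# One-time: tune multipart uploads for the classic AWS CLI transfer client.
# The values are illustrative starting points, not recommendations.
tune_multipart() {
  aws configure set default.s3.multipart_threshold 64MB    # files above this size use multipart upload
  aws configure set default.s3.multipart_chunksize 64MB    # size of each uploaded part
  aws configure set default.s3.max_concurrent_requests 20  # maximum requests in flight at once
}

# Each run uploads only files that are missing from the destination
# or differ from it in size or modification time.
weekly_sync() {
  aws s3 sync "${1:-$SRC_DIR}" "${2:-$DEST_URI}" --only-show-errors
}
```

Run `tune_multipart` once, then schedule `weekly_sync` (for example from cron). Because "sync" skips unchanged files, rerunning it after an interrupted transfer only sends what is still missing or changed.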

If you have a dedicated connection such as AWS Direct Connect instead of the internet, you can also use the following transfer method:
https://docs.aws.amazon.com/datasync/latest/userguide/s3-cross-account-transfer.html

EXPERT, answered a month ago
Reviewed by Steve_M (EXPERT) a month ago

Hello,

As already mentioned, "aws s3 cp" and "aws s3 sync" are the best way to transfer data to S3 over the internet. To speed up transfers further, you can switch the AWS CLI (v2) to the CRT (AWS Common Runtime) transfer client with the command below:

aws configure set default.s3.preferred_transfer_client crt

https://aws.amazon.com/ko/blogs/storage/improving-amazon-s3-throughput-for-the-aws-cli-and-boto3-with-the-aws-common-runtime/

S3 Transfer Acceleration (TA) can also be enabled on the bucket if the source location is far from the bucket's Region.
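A minimal sketch of turning Transfer Acceleration on and pointing the high-level "aws s3" commands at it. The bucket name and function names are hypothetical. Transfer Acceleration is billed per GB on top of normal charges, so it is worth checking whether it actually speeds up your route before leaving it on.

```shell
# Hypothetical bucket name; Transfer Acceleration requires a DNS-compliant name without dots.
BUCKET="example-bucket"

# One-time: enable Transfer Acceleration on the bucket.
enable_acceleration() {
  aws s3api put-bucket-accelerate-configuration \
    --bucket "${1:-$BUCKET}" --accelerate-configuration Status=Enabled
}

# Confirm the setting; prints "Enabled" once acceleration is on.
check_acceleration() {
  aws s3api get-bucket-accelerate-configuration --bucket "${1:-$BUCKET}" --query Status --output text
}

# Make the high-level "aws s3" commands (cp, sync) use the accelerate endpoint.
use_accelerate_endpoint() {
  aws configure set default.s3.use_accelerate_endpoint true
}
```

After `enable_acceleration` and `use_accelerate_endpoint`, the same "aws s3 sync" command as before goes through the accelerate endpoint without other changes.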

AWS, answered a month ago

As Riku mentioned, "aws s3 sync" or "aws s3 cp" works well over the internet.

Depending on your use case, you may also want to explore deploying Amazon S3 File Gateway on-premises.

It provides a file server interface that supports the NFS and SMB protocols. S3 File Gateway provides low-latency access to data through transparent local caching. An S3 File Gateway manages data transfer to and from AWS, buffers applications from network congestion, optimizes and streams data in parallel, and manages bandwidth consumption.

You deploy the gateway into your on-premises environment as a virtual machine (VM) running on VMware ESXi, Microsoft Hyper-V, or Linux Kernel-based Virtual Machine (KVM), or as a hardware appliance.

Refer to "How Amazon S3 File Gateway works" in the AWS Storage Gateway documentation for an overview.
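A minimal sketch of how a Linux client would use an NFS file share on the gateway, assuming the share is already created. The gateway IP, share path, mount point and function names are hypothetical; the "nolock,hard" mount options mirror the Linux mount command the Storage Gateway console typically suggests, but check the command shown for your own share.

```shell
# Hypothetical values: the gateway VM's private IP, the NFS share path
# (often the bucket name), and a local mount point.
GATEWAY_IP="10.0.0.50"
SHARE_PATH="example-bucket"
MOUNT_POINT="/mnt/s3share"

# Mount the File Gateway NFS share on a Linux client.
mount_share() {
  sudo mkdir -p "$MOUNT_POINT"
  sudo mount -t nfs -o nolock,hard "$GATEWAY_IP:/$SHARE_PATH" "$MOUNT_POINT"
}

# Copy new/changed files onto the share; the gateway caches them locally
# and uploads them to the S3 bucket in the background.
copy_weekly() {
  rsync -a "$1"/ "$MOUNT_POINT"/
}
```

With this approach the weekly job only writes to a local file share, and the gateway handles the upload to S3; the trade-off is running and sizing the gateway VM and its cache disk.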

Mike_L (AWS, EXPERT), answered a month ago
Reviewed by Steve_M (EXPERT) a month ago

Hello,

To use AWS DataSync, you don't need a public IP assigned to the agent. The DataSync agent VM deployed on-premises only needs network access to the DataSync public endpoints, so it can sit in a private network with a private IP and reach those endpoints through NAT or similar. If you have network connectivity from on-premises to AWS through Direct Connect or Site-to-Site VPN, you can also activate the agent with VPC endpoints and use it that way.

https://docs.aws.amazon.com/datasync/latest/userguide/choose-service-endpoint.html
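A minimal sketch of activating an agent that has only a private IP, assuming public service endpoints. It follows the command-line activation-key procedure described in the DataSync documentation, but the query parameters differ for VPC endpoint activation, so verify them against the docs for your endpoint type. The IP, Region, agent name and function names are hypothetical.

```shell
# Hypothetical values: the agent VM's private IP and the Region to activate in.
AGENT_IP="192.168.10.20"
REGION="us-east-1"

# Run from an on-prem host that can reach the agent over HTTP (port 80) on the
# private network; the agent answers with its activation key.
get_activation_key() {
  curl -s "http://$AGENT_IP/?gatewayType=SYNC&activationRegion=$REGION&no_redirect"
}

# Register the agent with DataSync using that key. The agent itself only needs
# outbound access to the DataSync endpoints (e.g. through NAT), not a public IP.
activate_agent() {
  aws datasync create-agent --activation-key "$1" --agent-name onprem-agent --region "$REGION"
}
```

Usage would be along the lines of `activate_agent "$(get_activation_key)"`, run from a machine inside the same network as the agent.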

You may consider DataSync after further evaluating your use case. Based on the details in the question, though, I would recommend using AWS CLI "s3 cp"/"s3 sync" as mentioned above.

psp, answered a month ago
