Should DataSync Agent be deployed in a separate VPC?


Context: I have an existing VPC running security-sensitive workloads. I would like to set up DataSync (ideally on an AWS EC2 instance) so that a task can synchronise data from my source S3 bucket to MS Azure Blob Storage.

My Research: For an AWS to MS Azure data transfer, I am finding it extremely difficult to find information on the Internet about where the DataSync Agent should be deployed. Can it be deployed on AWS? Or should it be deployed in the other cloud, as shown here: https://docs.aws.amazon.com/datasync/latest/userguide/how-datasync-transfer-works.html

I understand that the Agent needs to be able to read S3, and the simplest way to do this is to create a VPC endpoint, which results in four ENIs being deployed alongside the agent. Is that the only reason why it should connect to a VPC endpoint?

If it can be deployed in my AWS region, should I just create a dedicated public subnet in my existing VPC (the one where the existing workload is running)? Or, for isolation purposes, should I create another dedicated VPC with a public subnet in it?

Correct me if I'm wrong, but I am only considering public subnets to solve the problem of retrieving the agent's activation key over HTTP once it is deployed. I've read that AWS will subsequently close inbound port 80.

Thank you.

2 Answers
Accepted Answer

(Deleted my previous answer as it was misleading, sorry about that)

My Research: For an AWS to MS Azure data transfer, I am finding it extremely difficult to find information on the Internet about where the DataSync Agent should be deployed. Can it be deployed on AWS? Or should it be deployed in the other cloud, as shown here: https://docs.aws.amazon.com/datasync/latest/userguide/how-datasync-transfer-works.html

The agent can be deployed in either AWS or Azure, both can work. For deploying the agent in Azure you can follow this guide.

I understand that the Agent needs to be able to read S3, and the simplest way to do this is to create a VPC endpoint, which results in four ENIs being deployed alongside the agent. Is that the only reason why it should connect to a VPC endpoint?

You have several options for the agent to communicate with the DataSync service and S3 when the agent is running on EC2:

  • Use the public service endpoint by placing the agent in a public subnet and associating a public IP address with it.
  • Use the public service endpoint by placing the agent in a private subnet with a route through a NAT gateway in a public subnet.
  • Use a VPC endpoint. In this case the agent can run in either a public or a private subnet, because the VPC endpoint's IP address comes from the VPC CIDR and is reachable via the local route entry. Keep in mind, however, that the agent still requires internet access to reach Azure Blob, so a NAT gateway or a public IP directly on the EC2 instance is required either way.
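As a minimal sketch of the third option, creating an interface VPC endpoint for the DataSync service could look like the following. All IDs are placeholders, and the region in the service name is assumed to be eu-west-1 as elsewhere in this thread:

```shell
# Create an interface VPC endpoint for the DataSync service.
# The VPC, subnet, and security group IDs below are placeholders.
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0123456789abcdef0 \
  --vpc-endpoint-type Interface \
  --service-name com.amazonaws.eu-west-1.datasync \
  --subnet-ids subnet-0123456789abcdef0 \
  --security-group-ids sg-0123456789abcdef0
```

The security group attached to the endpoint must allow the agent to reach it on the ports the DataSync documentation lists for VPC endpoints.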

If it can be deployed in my AWS region, should I just create a dedicated public subnet in my existing VPC (the one where the existing workload is running)? Or, for isolation purposes, should I create another dedicated VPC with a public subnet in it?

Either can work, but if there is no real relation between the workloads in your existing VPC and this DataSync task, I suggest creating an additional VPC for separation of duties and to reduce your blast radius in case of an incident.

Correct me if I'm wrong, but I am only considering public subnets to solve the problem of retrieving the agent's activation key over HTTP once it is deployed. I've read that AWS will subsequently close inbound port 80.

Inbound access to the agent is required only for obtaining the activation key; after that it is no longer needed, and you can remove the port 80 inbound rule from the security group associated with the agent's EC2 instance. Alternatively, you can obtain the key manually and avoid opening an inbound port altogether.
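As a sketch of that manual flow: the activation key can be requested from the agent over HTTP and then passed to the CLI. The IP address and region below are placeholders, and the exact query parameters should be checked against the current DataSync documentation:

```shell
# Request an activation key from the agent over HTTP
# (run from a host that can reach the agent; IP and region are placeholders).
curl "http://198.51.100.10/?gatewayType=SYNC&activationRegion=eu-west-1&no_redirect"

# Use the returned key to activate the agent with the DataSync service.
aws datasync create-agent \
  --agent-name my-agent \
  --activation-key XXXXX-XXXXX-XXXXX-XXXXX-XXXXX \
  --region eu-west-1
```

Once the agent shows as ONLINE, the inbound HTTP rule can be removed from the security group.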

AWS
answered 9 days ago
EXPERT
reviewed 9 days ago
  • I've now been able to deploy my agent, activate it, then create a location and a task that defines AWS S3 as source and MS Azure Blob Storage as destination. I chose to deploy the agent in a separate VPC, in a public subnet, protected by a security group that only opens the ports specified in the documentation.

  • (continued, due to 600 char limit)

    HOWEVER (uppercase intentional): there was no port 80 server running on the DataSync agent deployed from AMI ami-07c1689aa7fe4b763, as returned by the recommended CLI command: aws ssm get-parameter --name /aws/service/datasync/ami --region eu-west-1

    Logging onto the agent through an SSM session revealed that no server was listening on port 80, only on 8080, as confirmed by curl. I logged in as admin (sudo su - admin) to access the AWS Diagnostic console, then manually activated the agent (option 0) and pasted the key into the AWS Console. No browser access, even on 8080, was possible.
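For reference, the checks described in this comment can be sketched roughly as follows; the region is a placeholder and the activation query string is an assumption to be verified against the DataSync documentation:

```shell
# From your workstation: look up the recommended DataSync agent AMI.
aws ssm get-parameter --name /aws/service/datasync/ami \
  --region eu-west-1 --query 'Parameter.Value' --output text

# Inside an SSM session on the agent instance: list listening TCP sockets
# to see whether anything is bound to port 80 or 8080.
sudo ss -tlnp | grep -E ':(80|8080)\b'

# Probe the local activation endpoint on 8080.
curl -s "http://localhost:8080/?gatewayType=SYNC&activationRegion=eu-west-1&no_redirect"
```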


A benefit of deploying the agent in Azure is compression on the path from the agent in the Azure cloud to AWS; this is dataset-dependent, but it can help reduce egress costs if the dataset is compressible.

When the agent is deployed on EC2, private (VPC) endpoint activation in the same Availability Zone as the DataSync endpoint is recommended. This avoids the agent's EC2 instance communicating with the DataSync service over the public interface, which incurs EC2 data transfer out (DTO) costs.
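A minimal sketch of activating an agent against a private VPC endpoint, assuming the endpoint already exists (all IDs and ARNs are placeholders):

```shell
# Activate an agent using a DataSync VPC (private) endpoint.
# Endpoint ID, subnet ARN, security group ARN, and account ID are placeholders.
aws datasync create-agent \
  --agent-name my-private-agent \
  --activation-key XXXXX-XXXXX-XXXXX-XXXXX-XXXXX \
  --vpc-endpoint-id vpce-0123456789abcdef0 \
  --subnet-arns arn:aws:ec2:eu-west-1:111122223333:subnet/subnet-0123456789abcdef0 \
  --security-group-arns arn:aws:ec2:eu-west-1:111122223333:security-group/sg-0123456789abcdef0 \
  --region eu-west-1
```

Choosing a subnet in the same Availability Zone as the endpoint keeps the agent-to-service traffic within the AZ, per the cost note above.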

Deployment information for the agent with Azure can also be found in the DataSync docs [1] and a relevant blog post [2].

[1] https://docs.aws.amazon.com/datasync/latest/userguide/creating-azure-blob-location.html
[2] https://aws.amazon.com/blogs/storage/migrating-azure-blob-storage-to-amazon-s3-using-aws-datasync/

AWS
answered 9 days ago
  • I should have specified that my use case forbids deployment in MS Azure, as a third party manages that side and sells us endpoints only, not any other service.
