Secure Ground Truth labeling job inside a VPC

Hi team,

I created a SageMaker Ground Truth labeling job and want to ensure it is secure. I followed this article: Run an Amazon SageMaker Ground Truth Labeling Job in an Amazon Virtual Private Cloud, but I'm not completely clear on it.

The article mentions that

your workers must access the worker portal from your VPC.

How is this possible? The workers receive an email with a link, such as https://g27kl2zxx.labeling.region.sagemaker.aws, to log into the labeling project, and they access it from their own laptops (outside the VPC?).

How can I use the instructions from the tutorial to secure my labeling job?

  • If I add "aws:SourceVpc": "myVpcId" to the bucket policy, won't this block workers from accessing the labeling job via the link they receive by email, since they log in from their laptops rather than from within the VPC?

  • Additionally, should I also add the Ground Truth endpoint (from the Ground Truth service endpoints by region table) to the bucket policy for extra security?

Any explanations or example policies would be greatly appreciated!

Note: I created a private work team in Ground Truth, connected to a workforce that has no specific VPC configuration.

Thank you!

1 Answer

Thank you for your question about securing a SageMaker Ground Truth labeling job inside a VPC. I'll clarify some points and provide guidance on how to secure your labeling job effectively.

First, it's important to understand that there are two main scenarios for using Ground Truth in a VPC:

  1. Running a labeling job in a VPC
  2. Using Amazon VPC mode from a private worker portal

For your specific case, where workers are accessing the labeling job from their own laptops outside the VPC, you're correct that adding "aws:SourceVpc": "myVpcId" to the bucket policy would block access. This setting is more applicable when you're using a private workforce within your VPC.
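
To make the effect of that condition concrete, here is a minimal sketch of a bucket policy that uses "aws:SourceVpc". The bucket name, VPC ID, statement ID, and the helper function are placeholders chosen for illustration, not values from the question or the tutorial. Because requests from a worker's laptop never pass through a VPC endpoint, the "aws:SourceVpc" key is absent from them, and a Deny with StringNotEquals matches when the key is missing, so those requests would be rejected.

```python
import json


def build_vpc_only_bucket_policy(bucket_name: str, vpc_id: str) -> dict:
    """Build a bucket policy that denies object access from outside one VPC.

    Hypothetical helper for illustration; adapt the actions and resource
    to what your labeling job actually needs.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAccessFromOutsideVpc",  # placeholder statement ID
                "Effect": "Deny",
                "Principal": "*",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
                # Matches any request whose source VPC differs from vpc_id,
                # including requests that carry no source VPC at all.
                "Condition": {"StringNotEquals": {"aws:SourceVpc": vpc_id}},
            }
        ],
    }


# Placeholder names; replace with your own bucket and VPC ID.
policy = build_vpc_only_bucket_policy("my-labeling-bucket", "vpc-0123456789abcdef0")
print(json.dumps(policy, indent=2))
```

A policy like this only makes sense when every worker reaches the portal from inside the VPC; with workers on the public internet it would block them.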

To secure your labeling job for workers accessing it from outside your VPC, consider the following approaches:

  1. Use IP-restricted presigned URLs: This is a new feature that adds an extra layer of security by restricting access to presigned URLs based on the worker's IP address. You can enable this by specifying a WorkerAccessConfiguration object when creating or updating a workteam. This allows you to lock down access to specific IP addresses, enhancing data privacy and reducing the risk of unauthorized access.

  2. Encrypt your output data and storage volumes: Use AWS Key Management Service (KMS) to encrypt your output data and storage volumes. This ensures that your data is protected at rest.

  3. Configure workforce authentication and restrictions: Implement strong authentication methods for your workforce, such as email, SMS, or phone call verification. You can also restrict access to specific IP addresses or ranges to control who can access the labeling service.

  4. Use IAM policies and roles: Ensure that you have proper IAM permissions set up for users creating labeling jobs and for the execution role used to start the labeling job. Use the principle of least privilege when assigning permissions.

  5. Enable VPC endpoints for Ground Truth: Even if your workers are not in the VPC, you can still use VPC endpoints to ensure that communication between Ground Truth and other AWS services (like S3) stays within the AWS network, adding an extra layer of security.
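
The IP-restricted presigned URL option from item 1 could be enabled roughly as follows with boto3. This is a sketch, not a verified setup: the work team name is a placeholder, the two helper functions are hypothetical, and the actual AWS call is left commented out so the request can be inspected first.

```python
def build_workteam_update(workteam_name: str, restrict_to_worker_ip: bool = True) -> dict:
    """Build the arguments for the SageMaker UpdateWorkteam call.

    Hypothetical helper: it only toggles the SourceIp constraint on the
    presigned S3 URLs that the worker portal hands out.
    """
    state = "Enabled" if restrict_to_worker_ip else "Disabled"
    return {
        "WorkteamName": workteam_name,
        "WorkerAccessConfiguration": {
            "S3Presign": {"IamPolicyConstraints": {"SourceIp": state}}
        },
    }


def apply_workteam_update(request: dict) -> dict:
    """Send the request to SageMaker (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so building the request needs no SDK

    return boto3.client("sagemaker").update_workteam(**request)


# Placeholder work team name; replace with your private work team.
request = build_workteam_update("my-private-workteam")
print(request)
# apply_workteam_update(request)  # uncomment to apply the change in your account
```

Note that SourceIp takes only "Enabled" or "Disabled"; it does not accept an IP range.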

Regarding adding the Ground Truth endpoint to the bucket policy, this is not typically necessary for extra security. Instead, focus on properly configuring your VPC endpoints and ensuring that your S3 bucket policy allows access from the Ground Truth service.

Remember, while you can't force external workers to access the labeling job from within your VPC, you can implement these security measures to protect your data and control access to your labeling tasks.
Sources
Using Amazon SageMaker Ground Truth in an Amazon Virtual Private Cloud - Amazon SageMaker
Use Amazon VPC Mode from a Private Worker Portal - Amazon SageMaker
Ground Truth Security and Permissions - Amazon SageMaker
Use IP-restricted presigned URLs to enhance security in Amazon SageMaker Ground Truth | AWS Machine Learning Blog

answered 2 years ago

  • Where should I specify the workers' source IP range? The accepted values for SourceIp are Enabled/Disabled, so where should I specify my IP range x.x.x.x/cidr?

    {
      "S3Presign": {
        "IamPolicyConstraints": {
          "SourceIp": "Enabled"|"Disabled",
          "VpcSourceIp": "Enabled"|"Disabled"
        }
      }
    }
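
One possible reading, offered as an assumption and not confirmed anywhere in this thread: the SourceIp flag only switches on a check of the worker's own IP address against the presigned URL, while CIDR ranges are configured separately on the workforce through the SourceIpConfig parameter of the SageMaker UpdateWorkforce API. A minimal boto3 sketch under that assumption follows. The workforce name "default", the example ranges, the limit of ten CIDRs, and the helper names are illustrative and should be checked against the API reference.

```python
import ipaddress
from typing import List


def build_workforce_ip_allow_list(workforce_name: str, cidrs: List[str]) -> dict:
    """Build the arguments for the SageMaker UpdateWorkforce call.

    Hypothetical helper: validates each CIDR locally so that a typo fails
    here and not in the API call.
    """
    # ip_network raises ValueError for malformed ranges or set host bits.
    normalized = [str(ipaddress.ip_network(cidr, strict=True)) for cidr in cidrs]
    if not 1 <= len(normalized) <= 10:  # assumed API limit; verify in the docs
        raise ValueError("expected between 1 and 10 CIDR ranges")
    return {
        "WorkforceName": workforce_name,
        "SourceIpConfig": {"Cidrs": normalized},
    }


def apply_workforce_update(request: dict) -> dict:
    """Send the request to SageMaker (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so building the request needs no SDK

    return boto3.client("sagemaker").update_workforce(**request)


# Documentation-only example ranges; replace with your workers' ranges.
request = build_workforce_ip_allow_list("default", ["203.0.113.0/24", "198.51.100.0/24"])
print(request)
# apply_workforce_update(request)  # uncomment to apply the change in your account
```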
