
EFS access_point posix_user: How to add task_definition volume without CannotCreateContainerError: operation not permitted (UID:GID mismatch)


I have an EC2-backed ECS setup that runs a single container/task. The task binds an EFS access point for persistent storage, and I want all the files on the EFS to be owned by 1000:1000 by default. Normally you'd do this by creating an EFS access_point and setting posix_user, but how do you do that when the directory inside the container is owned by 0:0? (The core of the problem is in the Volumes class at the end of the reproducible example.)

The Error

If the directory you're trying to mount INSIDE the container is owned by 1000:1000 already, everything works great! But if it's owned by 0:0, you get the following error (Indented just for readability):

CannotCreateContainerError:
    Error response from daemon:
        failed to copy file info for
            /var/lib/ecs/volumes/ecs-PosixBugStackValheimContainerNestedStackTaskDefinition8ABA8AF7-2-config-b4b9d9998e91d583f801:
        failed to chown
            /var/lib/ecs/volumes/ecs-PosixBugStackValheimContainerNestedStackTaskDefinition8ABA8AF7-2-config-b4b9d9998e91d583f801:
        lchown
            /var/lib/ecs/volumes/ecs-PosixBugStackValheimContainerNestedStackTaskDefinition8ABA8AF7-2-config-b4b9d9998e91d583f801:
        operation not permitted
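A toy model of the failure, under two assumptions that the error message suggests but does not prove: Docker's copy-up first creates the image's existing files in the empty volume and then restores their original owner with lchown, and the access point's posix_user makes every NFS operation on the volume run as 1000:1000. With the ordinary Linux rule that only a privileged process may change a file's owner, restoring 1000:1000 is a no-op that succeeds, while restoring 0:0 is refused. `Identity`, `chown_allowed` and `copy_up` are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    """The user an operation runs as (here: the access point's forced posix_user)."""
    uid: int
    gid: int
    supplementary_gids: frozenset[int] = frozenset()


def chown_allowed(caller: Identity, file_uid: int, new_uid: int, new_gid: int) -> bool:
    """Simplified Linux rule: root may chown anything; an unprivileged owner may
    keep itself as owner and switch the group only to one of its own groups."""
    if caller.uid == 0:
        return True
    own_groups = {caller.gid} | set(caller.supplementary_gids)
    return file_uid == caller.uid and new_uid == caller.uid and new_gid in own_groups


def copy_up(source_owner: tuple[int, int], forced: Identity) -> str:
    """Copy one image file into the volume, then try to restore its original owner."""
    created_uid = forced.uid  # the new file is created as the forced identity
    new_uid, new_gid = source_owner
    if chown_allowed(forced, created_uid, new_uid, new_gid):
        return "ok"
    return "lchown: operation not permitted"


access_point_user = Identity(uid=1000, gid=1000)
print(copy_up((1000, 1000), access_point_user))  # image directory already 1000:1000
print(copy_up((0, 0), access_point_user))        # image directory owned by root
```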

The few things I could find on this error say to make sure the UID:GID match, but is there anything you can do when the container is a third-party image that you can't change?

  • This GitHub issue seems like a similar problem. Does ECS/EFS have an equivalent "no_copy" flag I can test with?
  • Is there a way to remove the pre-existing /data container path before mounting the access_point, so the two don't conflict?
  • Is there a way to add a second access_point for /, move the posix_user to it instead of the existing access point, but have it force all the files to match its POSIX permissions?
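For comparison, this is how the flag from the GitHub issue is spelled in plain Docker: `volume-nocopy` in `--mount` (or `nocopy` in `-v name:/path:nocopy`) tells Docker not to copy the image's existing contents into an empty named volume. That could be handy for experimenting directly on the EC2 host; whether ECS exposes anything equivalent for EFS volumes is exactly the open question. The helper and the volume name below are hypothetical.

```python
import shlex


def docker_run_args(volume: str, container_path: str, image: str, nocopy: bool) -> list[str]:
    """Build a `docker run` command that mounts a named volume, optionally
    disabling Docker's copy of the image's existing files into an empty volume."""
    options = f"type=volume,source={volume},target={container_path}"
    if nocopy:
        options += ",volume-nocopy"
    return ["docker", "run", "--rm", "--mount", options, image]


args = docker_run_args("valheim-config", "/config", "lloesche/valheim-server", nocopy=True)
print(shlex.join(args))
```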

What I've Tried

I've been at this for about a month, but here's what I remember trying:

  • A second access point with posix_user, with posix_user removed from the original access point: This lets the container start up, but the files are still owned by root when I access them through the second one. Only files created through the second access point get the right permissions.
  • Just chowning the files: The problem is when to do it. If I do it when the EC2 instance starts, any files the container creates afterwards won't have the right permissions until the next restart.
  • Adding ecs.Capability.ALL to the container's Linux parameters, in case it was a kernel permissions issue instead: no luck.
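The "just chown the files" attempt amounts to something like this sketch, assuming it runs as root on the host against a hypothetical host-side mount of the file system. It also makes the timing problem visible: it only touches paths that exist when it runs, so anything the container writes afterwards keeps whatever owner it was created with.

```python
import os


def chown_tree(root: str, uid: int, gid: int) -> int:
    """Set ownership of `root` and everything below it; return how many paths were visited.

    os.lchown changes symlinks themselves rather than their targets, and
    os.walk does not descend into symlinked directories by default.
    """
    os.lchown(root, uid, gid)
    visited = 1
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            os.lchown(os.path.join(dirpath, name), uid, gid)
            visited += 1
    return visited
```

Usage would be along the lines of `chown_tree("/mnt/efs/config", 1000, 1000)`, where the path is hypothetical.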

Reproducible CDK Code

I tried to make it as small as I could, but since you need an ECS cluster / ASG, and those need a VPC, it kinda snowballed. Sorry about that.

The top of this file has two configs. The one that isn't commented out gives the above error, but you can switch to the commented-out one to verify that if the container's directory is already owned by 1000:1000, everything works. Both containers are third-party images.

./aws_posix_bug/aws_posix_bug_stack.py

from aws_cdk import (
    Stack,
    RemovalPolicy,
    NestedStack,
    aws_ecs as ecs,
    aws_ec2 as ec2,
    aws_efs as efs,
    aws_iam as iam,
    aws_autoscaling as autoscaling,
)
from constructs import Construct


# config = {
#     "Id": "Minecraft",
#     "Image": "itzg/minecraft-server",
#     "Environment": {
#         "EULA": "True",
#     },
#     "Path": "/data",
# }

config = {
    "Id": "Valheim",
    "Image": "lloesche/valheim-server",
    "Environment": {},
    "Path": "/config",
}

### Nested Stack info:
# https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.NestedStack.html
class Container(NestedStack):
    def __init__(
        self,
        scope: Construct,
        **kwargs
    ) -> None:
        super().__init__(scope, "ContainerNestedStack", **kwargs)
        container_id_alpha = "".join(e for e in config["Id"].title() if e.isalpha())
        # https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_ecs.TaskDefinition.html
        self.task_definition = ecs.Ec2TaskDefinition(self, "TaskDefinition")

        ## Details for add_container:
        # https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_ecs.TaskDefinition.html#addwbrcontainerid-props
        ## And what it returns:
        # https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_ecs.ContainerDefinition.html
        self.container = self.task_definition.add_container(
            container_id_alpha,
            command=["sleep", "infinity"],
            image=ecs.ContainerImage.from_registry(config["Image"]),
            essential=True,
            memory_reservation_mib=2*1024,
            environment=config["Environment"],
        )

class Volumes(NestedStack):
    def __init__(
        self,
        scope: Construct,
        vpc: ec2.Vpc,
        task_definition: ecs.Ec2TaskDefinition,
        container: ecs.ContainerDefinition,
        sg_efs_traffic: ec2.SecurityGroup,
        **kwargs,
    ) -> None:
        super().__init__(scope, "VolumesNestedStack", **kwargs)

        # https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_efs.AccessPointOptions.html#createacl
        self.efs_ap_acl = efs.Acl(owner_uid="1000", owner_gid="1000", permissions="755")
        # https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_efs.PosixUser.html
        posix_user = efs.PosixUser(uid=self.efs_ap_acl.owner_uid, gid=self.efs_ap_acl.owner_gid)

        # https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_efs.FileSystem.html
        self.efs_file_system = efs.FileSystem(
            self,
            f"Efs-{config['Id']}",
            vpc=vpc,
            removal_policy=RemovalPolicy.DESTROY,
            security_group=sg_efs_traffic,
            allow_anonymous_access=False,
            enable_automatic_backups=False,
            encrypted=True,
        )
        self.efs_file_system.grant_read_write(task_definition.task_role)
        access_point_name = config["Path"].strip("/").replace("/", "-")
        ## Creating an access point:
        # https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_efs.FileSystem.html#addwbraccesswbrpointid-accesspointoptions
        ## What it returns:
        # https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_efs.AccessPoint.html
        container_access_point = self.efs_file_system.add_access_point(
            access_point_name,
            create_acl=self.efs_ap_acl,
            path=config["Path"],
            posix_user=posix_user,
        )

        # https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_ecs.TaskDefinition.html#aws_cdk.aws_ecs.TaskDefinition.add_volume
        task_definition.add_volume(
            name=access_point_name,
            # https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_ecs.EfsVolumeConfiguration.html
            efs_volume_configuration=ecs.EfsVolumeConfiguration(
                file_system_id=self.efs_file_system.file_system_id,
                # https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_ecs.AuthorizationConfig.html
                authorization_config=ecs.AuthorizationConfig(
                    access_point_id=container_access_point.access_point_id,
                    iam="ENABLED",
                ),
                transit_encryption="ENABLED",
            ),
        )
        # https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_ecs.ContainerDefinition.html#addwbrmountwbrpointsmountpoints
        container.add_mount_points(
            ecs.MountPoint(
                container_path=config["Path"],
                source_volume=access_point_name,
                read_only=False,
            )
        )

class EcsAsg(NestedStack):
    def __init__(
        self,
        scope: Construct,
        leaf_construct_id: str,
        vpc: ec2.Vpc,
        task_definition: ecs.Ec2TaskDefinition,
        sg_container_traffic: ec2.SecurityGroup,
        efs_file_system: efs.FileSystem,
        **kwargs,
    ) -> None:
        super().__init__(scope, "EcsAsgNestedStack", **kwargs)

        self.ecs_cluster = ecs.Cluster(
            self,
            "EcsCluster",
            cluster_name=f"{leaf_construct_id}-ecs-cluster",
            vpc=vpc,
        )

        self.ec2_role = iam.Role(
            self,
            "Ec2ExecutionRole",
            assumed_by=iam.ServicePrincipal("ec2.amazonaws.com"),
            description="The instance's permissions (HOST of the container)",
        )
        self.ec2_role.add_managed_policy(iam.ManagedPolicy.from_aws_managed_policy_name("service-role/AmazonEC2ContainerServiceforEC2Role"))
        efs_file_system.grant_root_access(self.ec2_role)

        # https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_ec2.LaunchTemplate.html
        asg_launch_template = ec2.LaunchTemplate(
            self,
            "AsgLaunchTemplate",
            instance_type=ec2.InstanceType("m5.large"),
            machine_image=ecs.EcsOptimizedImage.amazon_linux2023(),
            security_group=sg_container_traffic,
            role=self.ec2_role,
            http_tokens=ec2.LaunchTemplateHttpTokens.REQUIRED,
            require_imdsv2=True,
        )

        # https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_autoscaling.AutoScalingGroup.html
        self.auto_scaling_group = autoscaling.AutoScalingGroup(
            self,
            "Asg",
            vpc=vpc,
            launch_template=asg_launch_template,
            min_capacity=0,
            max_capacity=1,
            new_instances_protected_from_scale_in=False,
        )

        # https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_ecs.AsgCapacityProvider.html
        self.capacity_provider = ecs.AsgCapacityProvider(
            self,
            "AsgCapacityProvider",
            capacity_provider_name=f"{config['Id']}-capacity-provider",
            auto_scaling_group=self.auto_scaling_group,
            enable_managed_termination_protection=False,
            enable_managed_draining=False,
            enable_managed_scaling=False,
        )
        self.ecs_cluster.add_asg_capacity_provider(self.capacity_provider)

        ## This creates a service using the EC2 launch type on an ECS cluster
        # https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_ecs.Ec2Service.html
        self.ec2_service = ecs.Ec2Service(
            self,
            "Ec2Service",
            cluster=self.ecs_cluster,
            task_definition=task_definition,
            enable_ecs_managed_tags=True,
            daemon=True,
            min_healthy_percent=0,
            max_healthy_percent=100,
        )

class AwsPosixBugStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_ec2.Vpc.html
        self.vpc = ec2.Vpc(
            self,
            "Vpc",
            nat_gateways=0,
            max_azs=1,
            subnet_configuration=[
                ec2.SubnetConfiguration(
                    name=f"public-{construct_id}-sn",
                    subnet_type=ec2.SubnetType.PUBLIC,
                )
            ],
            restrict_default_security_group=True,
        )

        # https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_ec2.SecurityGroup.html
        self.sg_container_traffic = ec2.SecurityGroup(
            self,
            "SgContainerTraffic",
            vpc=self.vpc,
            allow_all_outbound=True,
        )
        self.sg_efs_traffic = ec2.SecurityGroup(
            self,
            "SgEfsTraffic",
            vpc=self.vpc,
            allow_all_outbound=False,
        )
        self.sg_efs_traffic.connections.allow_from(
            self.sg_container_traffic,
            port_range=ec2.Port.tcp(2049),
        )
        ## Tie the Nested Stacks Finally:
        self.container_nested_stack = Container(
            self,
            description=f"Container Logic for {construct_id}",
        )
        self.volumes_nested_stack = Volumes(
            self,
            description=f"Volume Logic for {construct_id}",
            vpc=self.vpc,
            task_definition=self.container_nested_stack.task_definition,
            container=self.container_nested_stack.container,
            sg_efs_traffic=self.sg_efs_traffic,
        )
        self.ecs_asg_nested_stack = EcsAsg(
            self,
            description=f"Ec2Service Logic for {construct_id}",
            leaf_construct_id=construct_id,
            vpc=self.vpc,
            task_definition=self.container_nested_stack.task_definition,
            sg_container_traffic=self.sg_container_traffic,
            efs_file_system=self.volumes_nested_stack.efs_file_system,
        )
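For orientation, the Volumes class ends up contributing roughly the following to the registered task definition, shown here in the ECS `RegisterTaskDefinition` JSON shape with placeholder IDs (`fs-EXAMPLE`, `fsap-EXAMPLE`); the helper functions are hypothetical. Note that the access point's posix_user and create_acl do not appear in it: they are properties of the EFS access point itself, and the task definition only references it by `accessPointId`.

```python
import json


def efs_volume(name: str, file_system_id: str, access_point_id: str) -> dict:
    """Task-definition volume backed by an EFS access point (IAM auth + TLS)."""
    return {
        "name": name,
        "efsVolumeConfiguration": {
            "fileSystemId": file_system_id,
            "transitEncryption": "ENABLED",
            "authorizationConfig": {"accessPointId": access_point_id, "iam": "ENABLED"},
        },
    }


def container_with_mount(container_name: str, image: str, volume_name: str, path: str) -> dict:
    """Container definition that mounts `volume_name` read-write at `path`."""
    return {
        "name": container_name,
        "image": image,
        "mountPoints": [{"sourceVolume": volume_name, "containerPath": path, "readOnly": False}],
    }


# Placeholder IDs; "config" and "Valheim" match what the repro derives from config["Path"] and config["Id"].
task_definition = {
    "containerDefinitions": [container_with_mount("Valheim", "lloesche/valheim-server", "config", "/config")],
    "volumes": [efs_volume("config", "fs-EXAMPLE", "fsap-EXAMPLE")],
}
print(json.dumps(task_definition, indent=2))
```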

Thank you so much!

1 Answer

The issue you're experiencing is related to a permission mismatch between the container's file system and the EFS access point. When the container has a directory owned by 0:0 (root) and you try to mount an EFS access point with POSIX user 1000:1000, you get the "operation not permitted" error due to this mismatch.

Here are some solutions you can try:

  1. Add mount options to allow other users access: You can modify your EFS volume configuration to include mount options that allow other users to access the mounted directory. Add the --allow-other option to your mount configuration.

In your Volumes class, modify the task_definition.add_volume call to include mount options:

task_definition.add_volume(
    name=access_point_name,
    efs_volume_configuration=ecs.EfsVolumeConfiguration(
        file_system_id=self.efs_file_system.file_system_id,
        authorization_config=ecs.AuthorizationConfig(
            access_point_id=container_access_point.access_point_id,
            iam="ENABLED",
        ),
        transit_encryption="ENABLED",
    ),
    # Add mount options here
    mount_options=["allow-other"],
)
  2. Use container init process: Enable the init process in your container, which can help with permission handling:

self.container = self.task_definition.add_container(
    container_id_alpha,
    command=["sleep", "infinity"],
    image=ecs.ContainerImage.from_registry(config["Image"]),
    essential=True,
    memory_reservation_mib=2*1024,
    environment=config["Environment"],
    linux_parameters=ecs.LinuxParameters(
        self, f"{container_id_alpha}LinuxParams",
        init_process_enabled=True,
    ),
)
  3. Use a lifecycle hook or entry point script: Add a custom entry point script to your task definition that runs before your main container process and handles the permission issues by creating the directory with the correct permissions or changing ownership.

  4. Create a custom container image: If possible, create a custom container image based on the third-party image that pre-configures the directory with the correct permissions.

  5. Use a different mount point: Instead of mounting directly to the container's existing directory, mount to a different path and then create a symbolic link or use a volume binding within the container.
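A sketch of the different-mount-point idea as it might run in a hypothetical wrapper entrypoint: the EFS volume is mounted at a path the image does not use (for example `/mnt/efs-config`), and the image's own `/config` is replaced by a symlink to it. Both paths are assumptions, and whether this works depends on the image accepting a symlinked directory and on the wrapper being allowed to delete `/config`. It has not been checked against the concern raised in the comment on #5 below.

```python
import os
import shutil


def redirect_dir(image_path: str, mounted_path: str) -> None:
    """Replace the image's directory with a symlink to the mounted EFS directory.

    Files the image shipped in `image_path` are moved into `mounted_path` unless an
    entry of the same name is already there (persisted data wins).
    """
    if os.path.islink(image_path):
        return  # already redirected on a previous start
    if os.path.isdir(image_path):
        for name in os.listdir(image_path):
            target = os.path.join(mounted_path, name)
            if not os.path.lexists(target):
                shutil.move(os.path.join(image_path, name), target)
        shutil.rmtree(image_path)  # removes whatever was not moved
    os.symlink(mounted_path, image_path)
```

In a wrapper entrypoint this would run before handing control to the image's original command, e.g. `redirect_dir("/config", "/mnt/efs-config")`, with both paths hypothetical.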

Since you're working with third-party containers that you can't modify, options 1 and 2 are likely the most straightforward to implement first.

answered a year ago
  • For #1) mount_options doesn't appear to be a parameter of add_volume().

    For #2) Reading up on docker run --init (which looks like it's what that flag maps to), it's related to stopping zombie processes inside the container. I'm not sure how that'd help here.

    For #3) That'd only affect files that are already there on startup, not any that are created after.

    For #4) It's impossible for this project. I want other people to eventually use the project, and be able to just pass the container uri and go. That's the main reason I want to figure out how to make the files 1000:1000 too, to make it as easy to use as possible. Asking people to chown the files before being able to copy them in/out seems like a step I can avoid.

    For #5) Yes I can use a second mount point, but only files created through that mount point will have the right permissions. The files created through the container's mount point will still be owned by root.

