How do I troubleshoot an AWS Replication Agent installation failure on my Linux server?

I want to install the AWS Replication Agent for AWS Application Migration Service or AWS Elastic Disaster Recovery, but the installation failed.

Resolution

To automatically identify issues when you install the Application Migration Service or Elastic Disaster Recovery replication agent in Linux source servers, use the AWSSupport-TroubleshootLinuxMGNDRSAgentLogs automation runbook. The runbook uses the AWS Replication Agent installation log files to provide a list of detected errors and how to resolve them.

Before you start the AWSSupport-TroubleshootLinuxMGNDRSAgentLogs runbook, make sure that your AWS Identity and Access Management (IAM) user or role has the required permissions. For more information, see Required IAM permissions on AWSSupport-TroubleshootLinuxMGNDRSAgentLogs. Also, upload the installer-path/aws_replication_agent_installer.log replication agent log file to an Amazon Simple Storage Service (Amazon S3) bucket.

To run AWSSupport-TroubleshootLinuxMGNDRSAgentLogs, see Instructions on AWSSupport-TroubleshootLinuxMGNDRSAgentLogs.

For the ServiceName (Required) input parameter, select one of the following values:

  • To use Application Migration Service, select AWS MGN.
  • To use Elastic Disaster Recovery, select AWS DRS.

Or, run the following command to manually identify AWS Replication Agent installation errors:

less +G installer-path/aws_replication_agent_installer.log

Note: Replace installer-path with the path used to install the replication agent.
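
To narrow a long log down to the errors that this article covers, you can search for their message text. The following is a minimal sketch, not part of the agent tooling: the function name scan_agent_log is hypothetical, and the patterns are taken from the error messages quoted in this article.

```shell
# Hypothetical helper: print the log lines (with line numbers) that match
# the error messages covered in this article. Returns non-zero when
# nothing matches.
scan_agent_log() {
  local log_file="$1"
  grep -n -E \
    -e 'failed to map segment from shared object' \
    -e 'security token included in the request is expired' \
    -e 'SSLCertVerificationError' \
    -e 'CredentialRetrievalError' \
    -e 'A dependency job for aws-replication\.target failed' \
    -e 'InternalServerException' \
    -e 'could not insert module' \
    -e 'Unexpected error while making agent driver' \
    -e 'Kernel development' \
    "$log_file"
}
```

For example, run scan_agent_log installer-path/aws_replication_agent_installer.log. An error that isn't in the pattern list won't appear, so an empty result doesn't mean that the log is clean.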

Based on the error that you identified, use the following troubleshooting steps to resolve the issue.

"failed to map segment from shared object: Operation not permitted" error

The installation script uses the /tmp directory. If you set noexec on /tmp, then libz.so can't map segments and you receive the following error message:

"error while loading shared libraries: libz.so.1: failed to map segment from shared object: Operation not permitted"

To resolve this issue, run the following command to mount the volume with execute permissions:

sudo mount -o remount,exec /tmp

If you don't want to remove noexec from the /tmp directory, then add the following environment variable to the command:

TMPDIR='my_temp_dir' AGENT INSTALLATION COMMAND

Note: Replace my_temp_dir with a directory that doesn't have noexec and AGENT INSTALLATION COMMAND with the command that you use to install the agent.

Example command:

sudo chmod +x aws-replication-installer-init; sudo TMPDIR='temp1' ./aws-replication-installer-init
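
Before you choose between the two options, it can help to confirm which directories are on a noexec mount. The following is a minimal sketch under stated assumptions: the helper names has_noexec and dir_is_noexec are hypothetical, and the second helper relies on findmnt from util-linux being installed.

```shell
# Hypothetical helper: succeed if a comma-separated mount option string
# (for example "rw,nosuid,noexec") contains the noexec option.
has_noexec() {
  case ",$1," in
    *,noexec,*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: succeed if the file system that holds the given
# directory is mounted with noexec. Assumes findmnt (util-linux) exists.
dir_is_noexec() {
  has_noexec "$(findmnt -n -o OPTIONS -T "$1")"
}
```

For example, dir_is_noexec /tmp && echo "/tmp is mounted noexec" shows whether the error applies, and the same check on a candidate directory shows whether it's a suitable value for TMPDIR.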

"security token included in the request is expired" error

If the temporary security credentials for your IAM role expire, then API calls to the Application Migration Service or Elastic Disaster Recovery endpoint fail and you receive the following error message:

"botocore.exceptions.ClientError: An error occurred (ExpiredTokenException) when calling the GetAgentInstallationAssetsForDrs operation: The security token included in the request is expired [installation_id: 1a9af9d3-9485-4e02-965e-611929428c61, agent_version: 3.7.0, mac_addresses: 206915885515739,206915885515740, _origin_client_type: installer]"

To resolve this issue, request new temporary security credentials to generate a new token. Or, install the agent with an access key and secret access key for Application Migration Service or Elastic Disaster Recovery.
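
Before you run the installer again, you can confirm that AWS accepts the new credentials. The following is a minimal sketch that assumes the AWS CLI is installed and configured with the same credentials that you pass to the installer. The function name credentials_are_valid is hypothetical.

```shell
# Hypothetical helper: succeed if AWS STS accepts the credentials that
# the AWS CLI resolves in the current environment. An expired token
# makes this call fail.
credentials_are_valid() {
  aws sts get-caller-identity >/dev/null 2>&1
}
```

For example, credentials_are_valid || echo "Credentials were rejected; request new ones" reports a problem before the installation starts. The check can also fail for other reasons, such as no network access to AWS STS.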

"ssl.SSLCertVerificationError" error

If you use an earlier operating system (OS) version with Python 3.10 or later, then you might receive the following error message:

"ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997) - urllib.error.URLError: urlopen error unknown url type: https"

Earlier OS versions don't include OpenSSL 1.1.1 or newer, which Python 3.10 and later require. For more information, see PEP 644 -- Require OpenSSL 1.1.1 or newer on the Python Enhancement Proposals website. In this scenario, the AWS Replication Agent installation can't verify the SSL certificate of the Application Migration Service or Elastic Disaster Recovery endpoint.

To avoid this issue, use an earlier version of Python, such as version 2.7 or 3.8.

Note: To resolve most "urllib" or "SSL" errors, use an earlier version of Python.
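
To see whether this scenario applies, you can compare the Python version with the OpenSSL version that Python is linked against. The following is a minimal sketch; the function name is_python_310_or_later is hypothetical, and it only classifies a version string that you pass in.

```shell
# Hypothetical helper: succeed if a Python version string such as
# "3.10.4" is version 3.10 or later.
is_python_310_or_later() {
  local major minor
  major=${1%%.*}       # text before the first dot
  minor=${1#*.}        # text after the first dot
  minor=${minor%%.*}   # keep only the minor component
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 10 ]; }
}
```

To print both versions, run python3 -c 'import platform, ssl; print(platform.python_version(), ssl.OPENSSL_VERSION)'. This assumes that python3 on your PATH is the interpreter that the installer uses.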

"botocore.exceptions.CredentialRetrievalError" error

If you modify the AWSElasticDisasterRecoveryAgentRole or AWSApplicationMigrationAgentRole IAM service role, then you receive the following error message:

"botocore.exceptions.CredentialRetrievalError: Error when retrieving credentials from cert: Oct 17, 2022 9:38:54 AM com.amazonaws.cloudendure.credentials_provider.SharedMain createAndSaveJks"

To resolve this issue, update the trust policy for the IAM service role based on the service that you use.

Application Migration Service trust policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "mgn.amazonaws.com"
            },
            "Action": [
                "sts:AssumeRole",
                "sts:SetSourceIdentity"
            ],
            "Condition": {
                "StringLike": {
                    "sts:SourceIdentity": "s-*",
                    "aws:SourceAccount": "AWS-Account-Number"
                }
            }
        }
    ]
}

Note: Replace AWS-Account-Number with your AWS account ID.

Elastic Disaster Recovery trust policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "drs.amazonaws.com"
            },
            "Action": [
                "sts:AssumeRole",
                "sts:SetSourceIdentity"
            ],
            "Condition": {
                "StringLike": {
                    "sts:SourceIdentity": "s-*",
                    "aws:SourceAccount": "AWS-Account-Number"
                }
            }
        }
    ]
}

Note: Replace AWS-Account-Number with your AWS account ID.
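
The two policy documents above differ only in the service principal. Because they contain a Principal element and the sts:AssumeRole action, they're role trust policies. The following is a minimal sketch that renders either document from a service prefix and an account ID. The function name render_trust_policy and the output file name are hypothetical.

```shell
# Hypothetical helper: print the policy document from this article for a
# service prefix (mgn or drs) and an AWS account ID.
render_trust_policy() {
  local service="$1" account_id="$2"
  cat <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "${service}.amazonaws.com"
            },
            "Action": [
                "sts:AssumeRole",
                "sts:SetSourceIdentity"
            ],
            "Condition": {
                "StringLike": {
                    "sts:SourceIdentity": "s-*",
                    "aws:SourceAccount": "${account_id}"
                }
            }
        }
    ]
}
EOF
}
```

For example, render_trust_policy mgn 111122223333 > trust-policy.json writes the Application Migration Service document. One way to apply it, assuming that your identity is allowed to update the role, is aws iam update-assume-role-policy --role-name AWSApplicationMigrationAgentRole --policy-document file://trust-policy.json.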

"A dependency job for aws-replication.target failed" error

If the /var directory has 754 permissions or there was an issue when you created a Linux group for the aws-replication user, then you receive the following error message:

"stderr: A dependency job for aws-replication.target failed. See 'journalctl -xe' for details"

To resolve the /var issue, run the following command:

sudo chmod 755 /var

To resolve the Linux group issue, complete the following steps:

  1. Uninstall the AWS Replication Agent for Application Migration Service or Elastic Disaster Recovery.

  2. Run the following commands to delete the aws-replication user and aws-replication group:

    sudo userdel aws-replication
    sudo groupdel aws-replication

  3. Reinstall the AWS Replication Agent for Application Migration Service or Elastic Disaster Recovery.
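
To find out which of the two causes applies, you can check the /var permissions and the aws-replication group before you change anything. The following is a minimal sketch; both function names are hypothetical, and the first one relies on GNU stat, which is an assumption about your distribution.

```shell
# Hypothetical helper: succeed if "other" users have both read and
# execute permission on the directory, as they do with mode 755.
others_can_read_and_traverse() {
  local mode last
  mode=$(stat -c '%a' "$1") || return 2
  last=${mode#"${mode%?}"}   # last octal digit: permissions for others
  [ $(( last & 5 )) -eq 5 ]
}

# Hypothetical helper: succeed if the named group exists.
group_exists() {
  getent group "$1" >/dev/null 2>&1
}
```

For example, others_can_read_and_traverse /var || echo "Check /var permissions" flags mode 754, and group_exists aws-replication shows whether the group was created.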

For installation prerequisites for Application Migration Service, see Installation requirements. For Elastic Disaster Recovery, see Installation requirements for AWS Replication Agent.

"Exception in thread "main" com.amazonaws.services.drs.model.InternalServerException" error

If you deactivate the AWS Security Token Service (AWS STS) endpoint, then you receive the following error message:

"Exception in thread "main" com.amazonaws.services.drs.model.InternalServerException: An unexpected error has occurred (Service: Drs; Status Code: 500; Error Code: InternalServerException; Request ID: 4f4a76cb-aaec-44cc-a07a-c3579454ca55; Proxy: null"

This error occurs because Application Migration Service and Elastic Disaster Recovery call AWS STS to assume the role in the client account. To resolve this issue, activate the AWS STS endpoint in the client account.

"could not insert module ./aws-replication-driver.ko:" error

If you activated SecureBoot on the source server, then you receive one of the following error messages:

"insmod: ERROR: could not insert module ./aws-replication-driver.ko: Required key not available"

-or-

"insmod: ERROR: could not insert module ./aws-replication-driver.ko: Key was rejected by service"

You can't use SecureBoot in a Linux OS with Application Migration Service or Elastic Disaster Recovery.

To resolve this issue, deactivate SecureBoot for the Linux OS.

Note: Typically, you use the hypervisor to deactivate SecureBoot.

To check the SecureBoot status, run the following command:

sudo mokutil --sb-state
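
If you check many source servers, you can test the command output in a script. The following is a minimal sketch that assumes mokutil reports the state on a line that starts with "SecureBoot enabled" or "SecureBoot disabled". The function name sb_state_is_enabled is hypothetical.

```shell
# Hypothetical helper: read `mokutil --sb-state` output on stdin and
# succeed if it reports that Secure Boot is enabled.
sb_state_is_enabled() {
  grep -q -i '^SecureBoot enabled'
}
```

For example, sudo mokutil --sb-state | sb_state_is_enabled && echo "Deactivate SecureBoot before you install the agent". On servers that don't boot with UEFI, mokutil reports that EFI variables aren't supported, and the helper returns a non-zero status.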

"could not insert module ./aws-replication-driver.ko: Cannot allocate memory" error

If your Linux OS doesn't have enough memory to install the agent, then you receive the following error message:

"insmod: ERROR: could not insert module ./aws-replication-driver.ko: Cannot allocate memory"

To resolve this issue, make sure that your OS has at least 300 MB of free memory when you run the installation. Memory fragmentation can also cause this issue. To resolve memory fragmentation, reboot the source server. Also, check whether security or antivirus software, such as Falcon, Trend Micro, SentinelOne, or McAfee, applies memory or kernel protection that blocks the agent installation.
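
You can check the available memory before you run the installer. The following is a minimal sketch that treats the MemAvailable value in /proc/meminfo as "free memory", which is an assumption; the article doesn't say which measure the installer uses. Both function names are hypothetical.

```shell
# Hypothetical helper: print MemAvailable in MB (rounded down) from a
# meminfo-format file. Defaults to /proc/meminfo.
available_memory_mb() {
  awk '/^MemAvailable:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Hypothetical helper: succeed if at least 300 MB is available.
has_enough_memory() {
  local mb
  mb=$(available_memory_mb "$@")
  [ -n "$mb" ] && [ "$mb" -ge 300 ]
}
```

For example, has_enough_memory || echo "Less than 300 MB available". A passing check doesn't rule out memory fragmentation, so a reboot might still be needed.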

"Unexpected error while making agent driver! ", "Kernel development package ...missing from repositories", or "Kernel development or header package...did not install" errors

When you install the agent, the installer downloads a kernel-devel package that matches the running kernel from the package repository that's configured in your Linux OS. If the installer can't install the kernel-devel package for the running kernel, then you receive one of the following error messages:

"Unexpected error while making agent driver! Are kernel linux headers installed correctly?"

-or-

"Kernel development package for '************' are missing from repositories"

-or-

"Kernel development or header package for ************ did not install"

To resolve this issue, review the installation log to check for repository access issues.

Then, search for and manually download the kernel-devel package based on your distribution:

  • For Red Hat Enterprise Linux (RHEL), CentOS, Oracle Linux, and SUSE, see Search on the RPM website.
  • For Debian, see Packages on the Debian website.
  • For Ubuntu, see Ubuntu packages search on the Ubuntu packages website.

After you download and install the package, run the agent installation again. The AWS Replication Agent installer also installs other dependencies that the installation requires, such as make, gcc, perl, tar, gawk, and rpm. For more information, see Linux installation requirements.
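
When you search for the package, you need the exact name that matches the running kernel. The following is a minimal sketch that maps a distribution ID and kernel release to the usual package name. The function name kernel_devel_package is hypothetical, the mapping covers only common cases, and SUSE is left out because its package naming differs.

```shell
# Hypothetical helper: print the package that usually provides kernel
# headers for a distribution ID (the ID field in /etc/os-release) and a
# kernel release (the output of `uname -r`).
kernel_devel_package() {
  case "$1" in
    # Assumption: a stock kernel. Oracle UEK kernels use another package.
    rhel|centos|ol) printf 'kernel-devel-%s\n' "$2" ;;
    debian|ubuntu)  printf 'linux-headers-%s\n' "$2" ;;
    *) return 1 ;;
  esac
}
```

For example, run . /etc/os-release; kernel_devel_package "$ID" "$(uname -r)" on the source server, and then search the matching package site for the name that it prints.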

Related information

Troubleshooting agent issues

Troubleshooting Elastic Disaster Recovery

Troubleshooting (Application Migration Service)

4 Comments

Thanks for sharing all the tips. I am still facing the issue "Kernel version is not supported". I am running an Ubuntu 22.04 VM on Azure and trying to migrate it to AWS using MGN.

replied 3 years ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
MODERATOR
replied 3 years ago

Remounting /tmp can break processes that are already running in my production environment. Can you please share another workaround or solution for this?

replied 3 years ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
MODERATOR
replied 3 years ago