dnf and yum both fail when executed during instance bootstrap on Amazon Linux 2023


Basic description of the problem:

Both commands listed below fail with the following message when they are executed while the instance is bootstrapping:

RPM: error: can't create transaction lock on /var/lib/rpm/.rpm.lock (Resource temporarily unavailable)

dnf --assumeyes --releasever=2023.0.20230503 update
yum install -y java-1.8.0-amazon-corretto jq ruby

If I connect to the instance after bootstrap and execute both commands manually, they run without problems.

What's the best approach to get these two commands executed when the instance launches?

Additional info:

The commands were created/launched this way:

First I create a LaunchTemplate for an Autoscaling Group.

The image is an Amazon Linux 2023, AMI-ID: ami-02396cdd13e9a1257

The userdata that runs the commands contains this:

## This script is intended to upgrade instance to the latest Amazon Linux release version.
UPDATE_RELEASE_VERSION=`dnf check-release-update 2>&1 |grep Version|grep -v Available|awk -F"[ :]" '{print $4}'`
if [[ -n "${!UPDATE_RELEASE_VERSION}" ]]
then
     echo "Updating Amazon Linux release to ${!UPDATE_RELEASE_VERSION}"
     sudo dnf --assumeyes --releasever=${!UPDATE_RELEASE_VERSION} update. ## This command fails
else
     echo "There is no newer release of Amazon Linux than the current one. Skipping update."
fi
yum install -y java-1.8.0-amazon-corretto jq ruby

The expression ${!UPDATE_RELEASE_VERSION} evaluates to 2023.0.20230503

Both dnf and yum fail with the same message:

RPM: error: can't create transaction lock on /var/lib/rpm/.rpm.lock (Resource temporarily unavailable)

asked a year ago, 2908 views
8 Answers
0

Indeed, I posted just a small excerpt of the script for the sake of simplicity and to focus on what I think may be the problem.

Here is a fully functional excerpt of the script, followed by the errors seen at /var/log/cloud-init-output.log

Content-Type: multipart/mixed; boundary="//"
MIME-Version: 1.0
--//
Content-Type: text/x-shellscript; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="userdata.txt"

#!/bin/bash
AWS_REGION=us-east-1
echo "AWS_REGION=${!AWS_REGION}"
UPDATE_RELEASE_VERSION=`dnf check-release-update 2>&1 |grep Version|grep -v Available|awk -F"[ :]" '{print $4}'`
if [[ -n "${!UPDATE_RELEASE_VERSION}" ]]
then
  echo "Updating Amazon Linux release to ${!UPDATE_RELEASE_VERSION}"
  sudo dnf --assumeyes --releasever=${!UPDATE_RELEASE_VERSION} update
else
  echo "There is no newer release of Amazon Linux than the current one. Skipping update."
fi
dnf install --assumeyes java-1.8.0-amazon-corretto ruby3.2 jq
# Create user Kafka and Install Kafka from Apache repo 
adduser kafka
pushd /home/kafka/
curl "https://archive.apache.org/dist/kafka/2.4.1/kafka_2.13-2.4.1.tgz" --create-dirs -o "./downloads/kafka_2.13-2.4.1.tgz"
tar -xvzf ./downloads/kafka_2.13-2.4.1.tgz
mkdir /home/kafka/.aws
DEST=/home/kafka/.aws/config
echo "[default]" >> $DEST
echo "output = json" >> $DEST
echo "region = ${!AWS_REGION}" >> $DEST
chown -R kafka:kafka ./downloads ./kafka_2.13-2.4.1 ./.bash_profile ./.aws
popd
# Install CodeDeploy agent
pushd /home/ec2-user
wget https://aws-codedeploy-us-east-1.s3.us-east-1.amazonaws.com/latest/install
chmod +x ./install
sudo ./install auto
popd
--//

The script runs to completion, and inspecting /var/log/cloud-init-output.log shows these errors for the commands at lines 7 and 11 (the dnf and yum installs respectively):

Cloud-init v. 22.2.2 running 'init' at Thu, 11 May 2023 17:33:00 +0000. Up 9.44 seconds.
ci-info: ++++++++++++++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++++++++++++
ci-info: +--------+------+----------------------------+---------------+--------+-------------------+
ci-info: | Device |  Up  |          Address           |      Mask     | Scope  |     Hw-Address    |

... <log suppressed to avoid large message> ...

Cloud-init v. 22.2.2 running 'modules:config' at Thu, 11 May 2023 17:33:05 +0000. Up 14.36 seconds.
Cloud-init v. 22.2.2 running 'modules:final' at Thu, 11 May 2023 17:33:06 +0000. Up 15.59 seconds.
AWS_REGION=us-east-1
Updating Amazon Linux release to 2023.0.20230503
Amazon Linux 2023 repository                     18 MB/s |  13 MB     00:00
Amazon Linux 2023 Kernel Livepatch repository   387 kB/s | 156 kB     00:00
Dependencies resolved.
================================================================================
 Package                    Arch   Version                    Repository   Size
================================================================================
Installing:
 kernel                     x86_64 6.1.25-37.47.amzn2023      amazonlinux  31 M
Upgrading:
 amazon-linux-repo-s3       noarch 2023.0.20230503-0.amzn2023 amazonlinux  18 k
 bind-libs                  x86_64 32:9.16.38-1.amzn2023.0.1  amazonlinux 1.3 M
 bind-license               noarch 32:9.16.38-1.amzn2023.0.1  amazonlinux  16 k



... <log suppressed to avoid large message> ...



(22/24): bind-license-9.16.38-1.amzn2023.0.1.no 542 kB/s |  16 kB     00:00
(23/24): grub2-common-2.06-61.amzn2023.0.6.noar  25 MB/s | 1.8 MB     00:00
(24/24): grub2-pc-modules-2.06-61.amzn2023.0.6. 9.6 MB/s | 913 kB     00:00
--------------------------------------------------------------------------------
Total                                            29 MB/s |  42 MB     00:01
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
RPM: error: can't create transaction lock on /var/lib/rpm/.rpm.lock (Resource temporarily unavailable)
The downloaded packages were saved in cache until the next successful transaction.
You can remove cached packages by executing 'dnf clean packages'.
Error: Could not run transaction.


... <log suppressed to avoid large message> ...


The error shown above corresponds to the execution of:

sudo dnf --assumeyes --releasever=${!UPDATE_RELEASE_VERSION} update

This time the other dnf command ran successfully.

While testing, I tried adding a sleep 60 at the beginning of the script. The commands then ran without problems, which makes me think that something in the bootstrapping process is holding the rpm lock.

Perhaps some AWS initialization takes place at the same time as the userdata is executed? On an older Amazon Linux 2 instance that I set up with a similar script, I saw yum complain about the lock too, but that version kept trying to acquire the lock until it succeeded. The dnf equivalent does not.

The sleep is not the ideal solution, but it buys some time to focus on other aspects of the project. Nevertheless, any solution to the lock problem will be appreciated.
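
To test the hypothesis that another bootstrap process holds the rpm lock, it may help to record which processes are running at the moment dnf fails. The following is a minimal sketch, not a verified fix. The function name run_with_snapshot and the list of process-name patterns are assumptions chosen for illustration, and the lock holder may already have exited by the time the snapshot is taken.

```shell
#!/bin/bash
# Hypothetical helper: run a command and, if it fails, print a snapshot of
# processes whose command line mentions a package-management or bootstrap tool.
run_with_snapshot() {
  local rc=0
  "$@" || rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "Command failed with exit code $rc: $*" >&2
    echo "Candidate lock holders at time of failure:" >&2
    # pid, parent pid, elapsed time and full command line of each process;
    # the pattern list below is an assumption, adjust it as needed
    ps -eo pid,ppid,etime,args \
      | grep -E 'rpm|dnf|yum|ssm|cloud-init' \
      | grep -v grep >&2 || true
  fi
  return "$rc"
}

# Example usage in user data:
# run_with_snapshot dnf --assumeyes update
```

The snapshot goes to stderr, so under cloud-init it should end up alongside the dnf error in the same log.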

answered a year ago
  • "Perhaps some AWS initialization that takes place at the same time as the userdata gets executed"

    That could well be it. It shouldn't happen, but maybe on the odd occasion it does.

    Could you do the Kafka download and install, which must take a few tens of seconds, before the dnf update? That would give dnf time to straighten out whatever it needs to do in the background.

    As for the other issue, the dnf update should have completely finished and cleaned up after itself before the dnf install starts, but it obviously hasn't. If you put a sleep 2 before the dnf install, does the problem persist?

  • RWC, good suggestion to do the Kafka install first. I will try that and post the result here. Perhaps it gives more time for the lock to be released.

0

When using user data via cloud-init it is possible that other processes will be running simultaneously. If those processes are out of your control, it is best to just implement the required retry mechanisms in your own scripts. Here is an example of retrying yum installs:

max_attempts=5
attempt_num=1
success=false
while [ $success = false ] && [ $attempt_num -le $max_attempts ]; do
  echo "Trying yum install"
  # Chain both commands so that a failure in either one triggers a retry
  if yum update -y && yum install java-1.8.0 java-17-amazon-corretto-devel.x86_64 wget telnet -y; then
    echo "Yum install succeeded"
    success=true
  else
    echo "Attempt $attempt_num failed. Sleeping for 3 seconds and trying again..."
    sleep 3
    ((attempt_num++))
  fi
done
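
The same idea can be packaged as a reusable function that wraps any command, so that each dnf or yum call in the user data gets its own retries. This is a sketch under stated assumptions: the function name retry, its argument order, and the attempt counts and delays in the usage lines are illustrative choices, not values recommended by AWS.

```shell
#!/bin/bash
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-run COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "Attempt $attempt failed. Retrying in $delay seconds..." >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}

# Example usage in user data (package names taken from the question):
# retry 5 15 dnf --assumeyes update
# retry 5 15 dnf install --assumeyes java-1.8.0-amazon-corretto jq
```

Wrapping each command separately means a failed update is retried on its own instead of being masked by a later successful install.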

Related: https://repost.aws/questions/QU_tj7NQl6ReKoG53zzEqYOw/amazon-linux-2023-issue-with-installing-packages-with-cloud-init

AWS
answered 5 months ago
0

This is likely just something that has crept in during a cut & paste, but your command that starts sudo dnf --assumeyes has got a dot at the end of it, which shouldn't be there.

This userdata script won't run anyway, because cloud-init won't recognise it as a script. It needs to start with a shebang, e.g. #!/bin/bash.

I've tried standing up an EC2 with that same AMI & userdata, and the presence or absence of a shebang meant the software was or wasn't installed/updated.

Admittedly, the error in my /var/log/cloud-init-output.log on those occasions was __init__.py[WARNING]: Unhandled non-multipart (text/x-not-multipart) userdata: 'b'## This script is intend'...' and not your one about the .rpm.lock file. So there may be more than just the shebang at play here. Is there more to your userdata script than just the ten lines you posted above?

EXPERT
Steve_M
answered a year ago
  • RWC, I posted my reply as a new answer because the comment box limits the number of characters I can write.

0

I've encountered the same issue while moving some Elastic Beanstalk platform hooks to AL2023. The issue in my case is that you can't run rpm within a script that is itself being run by rpm, hence the lock file being unavailable.

We use platform hooks to execute arbitrary shell scripts at different points in the Elastic Beanstalk life cycle. I can only guess that in AL2023 those life cycle hooks are themselves run by rpm, which prevents us from installing packages with rpm, as opposed to installing a package or utility that already exists in the AL2023 repository via dnf/yum.

Hopefully this helps. I found the same explanation for this issue on Fedora, where people run rpm scripts inside rpm scripts.

answered 8 months ago
  • @localpath I fixed this by moving scripts from prebuild -> predeploy

0

I have the same issue. I run dnf from a script called from user data and the dnf command fails with:


(43/45): glibc-headers-x86-2.34-52.amzn2023.0.3  24 MB/s | 448 kB     00:00
(44/45): perl-File-Find-1.37-477.amzn2023.0.5.n 795 kB/s |  26 kB     00:00
(45/45): gcc-11.3.1-4.amzn2023.0.3.x86_64.rpm    70 MB/s |  32 MB     00:00
--------------------------------------------------------------------------------
Total                                            64 MB/s |  59 MB     00:00
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
RPM: error: can't create transaction lock on /var/lib/rpm/.rpm.lock (Resource temporarily unavailable)
The downloaded packages were saved in cache until the next successful transaction.
You can remove cached packages by executing 'dnf clean packages'.
Error: Could not run transaction.

In other words, the dnf command runs for a while, downloads artifacts, and then consistently fails. This is Amazon Linux 2023, al2023-ami-2023.1.20230809.0-kernel-6.1-x86_64.

Here is the source for the script:

#!/bin/env bash
# turn on echoing of commands and exit on errors
set -o xtrace
set -e
set -o pipefail
################### install_httpd.sh: script that will install httpd on the new instance

#===== Some initial output of state for debugging
# echo out all of the ENV variables that are supposed to be already set
echo "install_httpd.sh: Proof of life that the script ran" >> /var/opt/infor-install/userscript.txt
echo "install_httpd.sh: Proof of life that the script ran appended to /var/opt/infor-install/userscript.txt"
# echo "install_httpd.sh: Supposed to be set, INFOR_INSTALL_TOMCAT_VERSION=$INFOR_INSTALL_TOMCAT_VERSION"

#===== Do some package updates and install httpd.  Amazon Linux uses "dnf" which is the new "yum"
echo "install_httpd.sh: dnf update"
while pgrep yum || pgrep rpm || pgrep dnf; do sleep 5; echo "sleep"; done
dnf update -y
sleep 10
echo "install_httpd.sh: dnf install httpd"
while pgrep yum || pgrep rpm || pgrep dnf; do sleep 5; echo "sleep"; done
dnf install -y httpd-devel httpd

I added the pgrep command to try to make sure that yum, rpm, and dnf were not already running, but it does not help. The word "sleep" never appears in the output, so the loop did not find any matching process at the time of the check.

answered 7 months ago
  • Was wondering if there was a solution for this. Having the same issue.

    #!/bin/bash
    
    set -e
    set -x
    
    sudo dnf update
    sudo sleep 5
    sudo dnf install docker -y
    

    Output

    ...
    ...
    Running transaction check
    Transaction check succeeded.
    Running transaction test
    Transaction test succeeded.
    Running transaction
    RPM: error: can't create transaction lock on /var/lib/rpm/.rpm.lock (Resource temporarily unavailable)
    The downloaded packages were saved in cache until the next successful transaction.
    You can remove cached packages by executing 'dnf clean packages'.
    Error: Could not run transaction.
    
0

Actually, I may have found a possible solution:

sudo dnf upgrade --refresh rpm glibc
sudo rm /var/lib/rpm/.rpm.lock
dnf -y update
dnf install  <MY PACKAGES>

Ref: https://github.com/amazonlinux/amazon-linux-2023/issues/397

answered 7 months ago
  • I tried this and many other suggested methods, and the only thing that worked for me was to stop the SSM agent as the very first command in userdata, then start it again at the end, either in userdata or through the cfn-init configset services. BTW, I'm using CloudFormation and ami-03e34865d6f563985.

    UserData:
     'Fn::Base64':
       !Sub |
         #!/bin/bash -xe
    
         systemctl stop amazon-ssm-agent
    
         dnf update -y aws-cfn-bootstrap
         ...
         dnf install <something>
         ...
         systemctl start amazon-ssm-agent
    
    services:
      sysvinit:
        amazon-ssm-agent:
          enabled: true
          ensureRunning: true
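
One caveat with the userdata above is that, with bash -xe, a failing dnf command ends the script before the agent is started again. A possible way around that is to wrap the package commands so the agent is restarted whatever the outcome. This is only a sketch: the function name with_ssm_agent_stopped is an assumption, and amazon-ssm-agent is the unit name used in the comment above. It deliberately avoids bash brace expansions of the form dollar-brace so that it can sit inside a !Sub block without escaping.

```shell
#!/bin/bash
# with_ssm_agent_stopped COMMAND [ARGS...]
# Hypothetical helper: stop the SSM agent, run COMMAND, and start the agent
# again even if COMMAND fails. Returns COMMAND's exit code.
with_ssm_agent_stopped() {
  local rc=0
  systemctl stop amazon-ssm-agent
  "$@" || rc=$?
  systemctl start amazon-ssm-agent
  return "$rc"
}

# Example usage in user data:
# with_ssm_agent_stopped dnf update -y aws-cfn-bootstrap
```

Because the command's exit code is passed through, a script running with -e still stops on failure, but only after the agent is back up.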
    
0

I've been trying to get to the bottom of this issue over the last 48 hours and I think it is caused by the SSM agent updater running at the same time as the user data scripts.

See: https://docs.aws.amazon.com/systems-manager/latest/userguide/ssm-agent-automatic-updates.html#ssm-agent-automatic-updates-console

I've disabled this in SSM Fleet Manager and the issue has gone away.

If you do disable the automatic SSM agent updates, it's important that you know the consequences of this and implement the updates in another way!

answered 5 months ago
0

I have the same issue with cloud-init.

A simple cloud-config like this:

#cloud-config

repo_update: true
repo_upgrade: true
package_reboot_if_required: true

packages:
  - docker
  - postgresql15
  - python3-boto3

locale: en_AU
timezone: Australia/Brisbane

gives me this error:

Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
RPM: error: can't create transaction lock on /var/lib/rpm/.rpm.lock (Resource temporarily unavailable)
The downloaded packages were saved in cache until the next successful transaction.
You can remove cached packages by executing 'dnf clean packages'.
Error: Could not run transaction.

I have confirmed that this is random. Sometimes it works. It only recently became an issue. We always use the latest AL2023.

Disabling repo_upgrade doesn't help. Disabling SSM automatic updates doesn't help either.
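
Because the failure is intermittent, it can be useful to detect it after boot instead of finding missing packages later. The following sketch scans a log file for the lock error. The function name has_rpm_lock_error is hypothetical, and the default path assumes the output ends up in /var/log/cloud-init-output.log, as it did in the original question.

```shell
#!/bin/bash
# has_rpm_lock_error [LOGFILE]
# Hypothetical helper: returns 0 if LOGFILE contains the rpm lock error.
# The default log path is an assumption based on the original question.
has_rpm_lock_error() {
  local logfile=${1:-/var/log/cloud-init-output.log}
  grep -qF "can't create transaction lock on /var/lib/rpm/.rpm.lock" "$logfile"
}

# Example usage after boot:
# if has_rpm_lock_error; then
#   echo "Package installation hit the rpm lock during bootstrap" >&2
# fi
```

A check like this only reports the problem. It does not fix the underlying contention for the lock.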

answered 21 days ago
