EMR step fails because of version issue

0

Hi Mate,

I have steps running on EMR that worked until 13 January 2025. When I ran the job again today, it started failing with this error: AttributeError: module 'awscrt.checksums' has no attribute 'crc64nvme'.

The cluster runs emr-7.5.0.

Cheers

asked 9 months ago, 684 views
3 Answers
1

The error you're encountering with EMR steps failing due to an AttributeError related to 'awscrt.checksums' is not a commonly reported issue for EMR 7.5.0. However, it appears to be related to a version compatibility problem with one of the AWS libraries or dependencies.

To troubleshoot this issue, you can try the following steps:

  1. Check if there have been any recent updates to the AWS SDK or related libraries that your EMR steps depend on. The error suggests a mismatch between the expected and available attributes in the 'awscrt.checksums' module.

  2. Review your EMR step code and ensure it's compatible with the EMR 7.5.0 environment. There might be changes in the underlying libraries that require adjustments in your code.

  3. Consider creating a bootstrap action that updates or installs the correct version of the AWS SDK or any other required libraries before your steps run.

  4. If possible, try running your steps on an earlier EMR version (e.g., 7.4.0 or 6.x) to see if the issue persists. This can help isolate whether it's specific to EMR 7.5.0 or a broader compatibility issue.

  5. Check the EMR and EC2 instance logs for any additional error messages or warnings that might provide more context about the failure.
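For steps 1 and 5, a small diagnostic run on the cluster (for example as a step or from a notebook) can show which library versions are actually installed and whether the attribute from the error message exists. This is only a sketch: the package list is an assumption about what is relevant here, and the helper names `installed_versions` and `has_crc64nvme` are made up for illustration.

```python
import importlib
from importlib import metadata

# Distributions assumed to be relevant to this error; adjust as needed.
PACKAGES = ("boto3", "botocore", "awscrt", "s3transfer")


def installed_versions(packages=PACKAGES):
    """Return {package: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def has_crc64nvme():
    """True/False if awscrt.checksums has/lacks crc64nvme, None if awscrt is missing."""
    try:
        checksums = importlib.import_module("awscrt.checksums")
    except ImportError:
        return None
    return hasattr(checksums, "crc64nvme")


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
    print("awscrt.checksums.crc64nvme available:", has_crc64nvme())
```

If the last line reports False, the installed awscrt predates the checksum function that the newer botocore is trying to call, which matches the AttributeError in the question.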

If the issue persists after trying these steps, you may need to reach out to AWS Support for further assistance, as this seems to be a specific compatibility issue that may require deeper investigation.
Sources
EMR Cluster failure with "Failed to start the job flow due to an internal error" | AWS re:Post

answered 9 months ago
AWS
EXPERT
reviewed 9 months ago
0

We encountered the same problem and found that it is caused by the combination of boto3, botocore and awscrt versions. It also occurs on EMR Serverless, which has the following versions installed by default:

    boto3==1.36.10
    botocore==1.36.10
    awscrt==0.19.19

We reproduced it by installing these versions on EMR 7.5.0 on EC2. It occurs with a manually uploaded SQL file (such a file automatically gets a checksum that uses the crc64nvme algorithm) that a Python script then reads with the boto3 S3 client and passes to the spark.sql() method.
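For anyone trying to reproduce this, the read pattern described above looks roughly like the sketch below. The function names, bucket and key are placeholders and not from our actual job; the client and Spark session are passed in so the snippet does not depend on a particular setup. Presumably the `get_object` call or the read of its body is where the checksum validation fails with the affected version combination.

```python
def read_sql_from_s3(s3_client, bucket, key):
    """Fetch an object via get_object and decode its body as UTF-8 text."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    return response["Body"].read().decode("utf-8")


def run_sql_file(spark, s3_client, bucket, key):
    """Read a SQL statement from S3 and pass it to spark.sql()."""
    query = read_sql_from_s3(s3_client, bucket, key)
    return spark.sql(query)


# Usage on a cluster (bucket and key are placeholders):
#   import boto3
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   df = run_sql_file(spark, boto3.client("s3"), "my-bucket", "queries/report.sql")
```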

On EMR on EC2 we have a bootstrap action that installs the latest boto3 version, and we changed it to also upgrade awscrt. (Note: with boto3 and botocore 1.36.17 it works even with the outdated awscrt==0.19.19.) If you keep the boto3 and botocore versions above (1.36.10) and upgrade only awscrt to the latest version, it also works. So the awscrt package in the default installation of EMR on EC2 and EMR Serverless should probably be upgraded to resolve this issue.

answered 9 months ago
  • How did you bundle the upgraded version of boto3/awscrt? I tried bundling them as a zip and passing it using Spark's spark.submit.pyFiles configuration but that doesn't work. I get module missing errors.

    I found that another way to solve the checksum issue is to downgrade to EMR Serverless 7.4.

  • In EMR on EC2 we added a bootstrap script which updates boto3 and awscrt by using:

    sudo pip3 install --ignore-installed boto3[crt]

    On EMR Serverless we also downgraded the version, which fixed the issue. Hopefully it will be fixed in one of the next releases. Customer service told me it is on the roadmap, but gave no details on when it will be fixed.

0

Installed package versions:

    awscli      2.17.18
    awscrt      0.19.19
    boto        2.49.0
    boto3       1.36.12
    botocore    1.36.12
    s3fs        0.4.2
    s3transfer  0.11.2

answered 9 months ago
