EC2 Image Builder WebDownload checksum doesn't match


I'm working on an Image Recipe, and I'm running into an issue with the WebDownload Checksum that has me stumped.

Here's the component configuration I have:

"phases":
- "name": "build"
  "steps":
  - "action": "WebDownload"
    "inputs":
    - "algorithm": "SHA256"
      "checksum": "31c589c76d96965ddeec3e3d89c0bf5322513dbe3f523dcc8d2352c6167cdc71"
      "destination": "/tmp/pdflib.tar.gz"
      "source": "https://www.pdflib.com/binaries/PDFlib/1001/PDFlib-10.0.1-Linux-x64-php.tar.gz"
    "maxAttempts": 1
    "name": "Download_PHPlib"
"schemaVersion": 1

I've verified the checksum on my laptop and on 2 different Amazon Linux 2023 EC2 instances using the following

curl https://www.pdflib.com/binaries/PDFlib/1001/PDFlib-10.0.1-Linux-x64-php.tar.gz | sha256sum

which consistently results in

31c589c76d96965ddeec3e3d89c0bf5322513dbe3f523dcc8d2352c6167cdc71  -

If instead I do the following

curl https://www.pdflib.com/binaries/PDFlib/1001/PDFlib-10.0.1-Linux-x64-php.tar.gz -o /tmp/pdflib.tar.gz
sha256sum /tmp/pdflib.tar.gz

I consistently get

31c589c76d96965ddeec3e3d89c0bf5322513dbe3f523dcc8d2352c6167cdc71  /tmp/pdflib.tar.gz

However, when I build an image from this recipe, it fails with the following logs:

2024-01-18T05:18:38.262000+00:00 0.1.0/1 Phase build
2024-01-18T05:18:38.262000+00:00 0.1.0/1 Step Download_PHPlib
2024-01-18T05:18:38.264000+00:00 0.1.0/1 WebDownload: STARTED EXECUTION
2024-01-18T05:18:38.264000+00:00 0.1.0/1 WebDownload: Starting download
2024-01-18T05:18:38.264000+00:00 0.1.0/1 WebDownload: Source:https://www.pdflib.com/binaries/PDFlib/1001/PDFlib-10.0.1-Linux-x64-php.tar.gz, Destination:/tmp/pdflib.tar.gz
2024-01-18T05:18:38.264000+00:00 0.1.0/1 WebDownload: Target destination - /tmp/pdflib.tar.gz
2024-01-18T05:18:38.264000+00:00 0.1.0/1 WebDownload: Creating directories /tmp
2024-01-18T05:18:38.264000+00:00 0.1.0/1 WebDownload: Created directories /tmp
2024-01-18T05:18:38.264000+00:00 0.1.0/1 WebDownload: Checking if destination /tmp/pdflib.tar.gz exists
2024-01-18T05:18:38.264000+00:00 0.1.0/1 WebDownload: Creating file /tmp/pdflib.tar.gz
2024-01-18T05:18:38.264000+00:00 0.1.0/1 WebDownload: Created file /tmp/pdflib.tar.gz
2024-01-18T05:18:38.776000+00:00 0.1.0/1 WebDownload: Received HTTP status code for HEAD request: 200
2024-01-18T05:18:38.776000+00:00 0.1.0/1 WebDownload: Size of source https://www.pdflib.com/binaries/PDFlib/1001/PDFlib-10.0.1-Linux-x64-php.tar.gz = 48309824 bytes
2024-01-18T05:18:38.776000+00:00 0.1.0/1 WebDownload: Can download source https://www.pdflib.com/binaries/PDFlib/1001/PDFlib-10.0.1-Linux-x64-php.tar.gz of size 48309824 bytes
2024-01-18T05:18:38.921000+00:00 0.1.0/1 WebDownload: Received HTTP status code for GET request: 200
2024-01-18T05:18:38.921000+00:00 0.1.0/1 WebDownload: Success! Received HTTP status code 200
2024-01-18T05:18:38.921000+00:00 0.1.0/1 WebDownload: Copying HTTP response body to file /tmp/pdflib.tar.gz
2024-01-18T05:18:41.739000+00:00 0.1.0/1 WebDownload: Copied HTTP response body to file /tmp/pdflib.tar.gz
2024-01-18T05:18:41.739000+00:00 0.1.0/1 WebDownload: Matching checksums with algorithm SHA256...
2024-01-18T05:18:41.924000+00:00 0.1.0/1 Waiting for command to complete (command id: eced9d6e-6336-43dd-a8c9-584d0835c902). Attempt number: 1.
2024-01-18T05:18:42.204000+00:00 0.1.0/1 WebDownload: Matching given checksum 31c589c76d96965ddeec3e3d89c0bf5322513dbe3f523dcc8d2352c6167cdc71 with calculated checksum b9f3f512ea646b7158f2ca27c3a3776a1c8faf817c746e6f6924b396b30b32db
2024-01-18T05:18:42.204000+00:00 0.1.0/1 WebDownload: [ ERROR ] Checksums do not match. Deleting file /tmp/pdflib.tar.gz
2024-01-18T05:18:42.219000+00:00 0.1.0/1 WebDownload: Ending download operation - Destination /tmp/pdflib.tar.gz, Error checksums do not match
2024-01-18T05:18:42.219000+00:00 0.1.0/1 WebDownload: [ ERROR ] Operation could not be completed due to error - Source https://www.pdflib.com/binaries/PDFlib/1001/PDFlib-10.0.1-Linux-x64-php.tar.gz, Destination /tmp/pdflib.tar.gz, Error checksums do not match
2024-01-18T05:18:43.228000+00:00 0.1.0/1 Executor: FINISHED EXECUTION OF ALL DOCUMENTS

I'm really confused about where the checksum b9f3f512ea646b7158f2ca27c3a3776a1c8faf817c746e6f6924b396b30b32db is coming from. I set the Image Builder infrastructure configuration to keep the instance after the build, then ran the commands shown above on the actual instance that ran and failed this job, and I got the same sha256sum result as before. I thought WebDownload might be using sha256hmac rather than sha256sum, but that produces 1cf38be352b26de149250acda5af07ab1819449126a02f88f247ec6b147a6cb1, which doesn't match the SHA256 result that WebDownload is giving either.

Is there some obscure implementation of SHA256 that this uses, or is there a bug?

  • To make sure that I was specifying SHA256 correctly, I attempted to specify SHA256SUM as the algorithm, and got the following error

    Invalid action module inputs in phase 'build', step 'Download_PHPlib'. Error: WebDownload: Given algorithm SHA256SUM for source https://www.pdflib.com/binaries/PDFlib/1001/PDFlib-10.0.1-Linux-x64-php.tar.gz is not one of MD5, SHA1, SHA256, and SHA512
    

    So it seems like the SHA256 checksum algorithm that WebDownload uses is not consistent with sha256sum

asked a month ago · 105 views
1 Answer
Accepted Answer

Copying from my Slack response to Jamie: It looks like EC2 Image Builder is gunzipping the tarball and calculating the checksum on the decompressed .tar rather than on the .tar.gz as downloaded:

% wget "https://www.pdflib.com/binaries/PDFlib/1001/PDFlib-10.0.1-Linux-x64-php.tar.gz"
...
2024-01-18 13:20:21 (3.74 MB/s) - ‘PDFlib-10.0.1-Linux-x64-php.tar.gz’ saved [48309824/48309824]
% shasum -a 256 PDFlib-10.0.1-Linux-x64-php.tar.gz
31c589c76d96965ddeec3e3d89c0bf5322513dbe3f523dcc8d2352c6167cdc71  PDFlib-10.0.1-Linux-x64-php.tar.gz
% gunzip PDFlib-10.0.1-Linux-x64-php.tar.gz
% shasum -a 256 PDFlib-10.0.1-Linux-x64-php.tar
b9f3f512ea646b7158f2ca27c3a3776a1c8faf817c746e6f6924b396b30b32db  PDFlib-10.0.1-Linux-x64-php.tar
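
For anyone who wants to run the same comparison without the shell tools, here is a minimal Python sketch that hashes a gzip file both as stored and after decompression. The helper name gzip_checksums is made up for illustration, and it reads the whole file into memory, which is fine for a tarball of this size but is only an assumption about your environment.

```python
import gzip
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def gzip_checksums(path: str) -> dict:
    """Hash a .gz file as stored on disk and after gunzipping it.

    Hypothetical helper, not part of Image Builder or any AWS tooling.
    """
    with open(path, "rb") as f:
        raw = f.read()
    return {
        # What `sha256sum file.tar.gz` reports.
        "compressed": sha256_hex(raw),
        # What `gunzip -c file.tar.gz | sha256sum` reports.
        "decompressed": sha256_hex(gzip.decompress(raw)),
    }


if __name__ == "__main__":
    import sys

    for label, digest in gzip_checksums(sys.argv[1]).items():
        print(f"{label}: {digest}")
```

If the diagnosis above holds, the "decompressed" value is the one WebDownload is comparing against, so that would be the value to try in the component's checksum field. That is an inference from the log output rather than documented behavior.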
answered a month ago
