How to specify metadata in the job manifest file.


I am trying to use AWS S3 Batch Operations to copy all versioned objects from Bucket-1 to Bucket-2. I am using the S3 bucket's Management > Inventory Management settings to generate the manifest JSON file.

I then supply this JSON file to the S3 Batch Operations job to copy the data.

Additionally, I want to specify object metadata in the manifest file itself, so that it is attached to each item when it is copied to the target location.

My question is: how can I change the CSV file so that this object metadata is applied? For example, I want to set x-amz-meta-x-version, where the value should be the old versionId.

Here is what my JSON file (manifesto.json) looks like:

{
  "sourceBucket" : "dev-djool-xyubd4",
  "destinationBucket" : "arn:aws:s3:::dev-djool-s3-reports",
  "version" : "2016-11-30",
  "creationTimestamp" : "1712797200000",
  "fileFormat" : "CSV",
  "fileSchema" : "Bucket, Key, VersionId, IsLatest, IsDeleteMarker, LastModifiedDate, ETag",
  "files" : [ {
    "key" : "dev-djool-xyubd4/dev-djool-xyubd4-copy-job/data/4dba9155-e423-4352-b031-191c26e01cae.csv.gz",
    "size" : 29959,
    "MD5checksum" : "a50e27f226b1a68a7496466c9303bd6a"
  } ]
}

And here is sample data from my CSV file:

dev-djool-xyubd4	09c0f49d-6fde-4fb3-8b5d-0712a87b38a2	A0R3_1XRkN4trkDAAC0k_OQOsW2ZkE0c	TRUE	FALSE	2024-04-05T09:54:41.000Z	6c77fefb538025cabfaf8d9b38d00faa
dev-djool-xyubd4	09c0f49d-6fde-4fb3-8b5d-0712a87b38a2	iWH_.ve8lcAcgqlujJtD6TTY3yQk_BuF	FALSE	FALSE	2024-04-05T09:54:41.000Z	6c77fefb538025cabfaf8d9b38d00faa
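
To make the relationship between the fileSchema and the CSV rows concrete, here is a minimal sketch that labels each row with the schema's column names. The function name `rows_from_inventory` is hypothetical, and the sample row assumes the comma-separated, quoted layout of a raw inventory file, which may differ from how the rows are displayed above.

```python
import csv
import io


def rows_from_inventory(csv_text, file_schema, delimiter=","):
    """Yield one dict per inventory row, keyed by the fileSchema column names."""
    columns = [name.strip() for name in file_schema.split(",")]
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    for row in reader:
        if row:  # skip blank lines
            yield dict(zip(columns, row))


schema = "Bucket, Key, VersionId, IsLatest, IsDeleteMarker, LastModifiedDate, ETag"
# Assumed raw layout of one inventory row (quoted, comma-separated).
sample = (
    '"dev-djool-xyubd4","09c0f49d-6fde-4fb3-8b5d-0712a87b38a2",'
    '"A0R3_1XRkN4trkDAAC0k_OQOsW2ZkE0c","true","false",'
    '"2024-04-05T09:54:41.000Z","6c77fefb538025cabfaf8d9b38d00faa"\n'
)

for row in rows_from_inventory(sample, schema):
    # VersionId is the value I would like to carry over as metadata.
    print(row["Key"], row["VersionId"])
```

The third column (VersionId) is the value that should end up in x-amz-meta-x-version on the copied object.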

asked 18 days ago, 266 views
2 Answers
Accepted Answer

💡 You can add the desired metadata fields to the fileSchema section of the JSON file. For example, to include a custom x-amz-meta-x-version field, you would update the fileSchema to include that field, and then populate the corresponding column in the CSV data file with the appropriate version IDs.

Example:

{
  "sourceBucket" : "dev-djool-xyubd4",
  "destinationBucket" : "arn:aws:s3:::dev-djool-s3-reports",
  "version" : "2016-11-30",
  "creationTimestamp" : "1712797200000",
  "fileFormat" : "CSV",
  "fileSchema" : "Bucket, Key, VersionId, IsLatest, IsDeleteMarker, LastModifiedDate, ETag, x-amz-meta-x-version",
  "files" : [ {
    "key" : "dev-djool-xyubd4/dev-djool-xyubd4-copy-job/data/4dba9155-e423-4352-b031-191c26e01cae.csv.gz",
    "size" : 29959,
    "MD5checksum" : "a50e27f226b1a68a7496466c9303bd6a"
  } ]
}

Key Source:

EXPERT
answered 15 days ago
  • Every time I make even a small change to the CSV or to the manifest file with its hard-coded checksum, I get this error:

    "Manifest checksum mismatch occurred. Expected: df8485b1964c44b3af09599e984b0b70. Calculated: b1495a57cddb893113a63bc108294ceb" Since I am altering the CSV file manually, how can I handle the checksum problem?

  • @secondabhi_aws appears to have the solution you need. Please review their suggestion at Solution Manifest checksum mismatch and let us know if your issue persists. We're here to help!

  • @Osvaldo Marte, the checksum issue is resolved now. But AWS does not recognize the x-amz-meta-x-version field in the manifest.json file.

    Reasons for failure: Unknown task fields are present in the schema: [x-amz-meta-x-version]. I set up the CSV and manifest files exactly the way you described, but no luck yet.

  • I dug into this further, and I think the main issue is that AWS S3 Batch Operations does not support applying custom metadata directly from the manifest file; instead, you need to use an AWS Lambda function to set metadata like x-amz-meta-x-version dynamically during the operation. The operations that can be configured (copy, tag, delete, etc.) do not include setting object metadata from entries in the manifest file.

    😑 I somehow missed this.

  • One possible solution is to establish an AWS Lambda Function that will be triggered by your S3 Batch Operation. Its purpose will be to copy each object and explicitly set the x-amz-meta-x-version metadata to the old version ID as specified in the operation's input.
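
A minimal sketch of such a Lambda function follows. It assumes the S3 Batch Operations invocation schema version 1.0 (a `tasks` list whose entries carry `taskId`, `s3Key`, `s3VersionId`, and `s3BucketArn`), that keys arrive URL-encoded, and that the manifest includes version IDs. The `DEST_BUCKET` environment variable and the helper name `copy_with_version_metadata` are hypothetical. Treat it as a starting point, not a tested production handler.

```python
import os
from urllib.parse import unquote_plus

# Hypothetical setting: the destination bucket name is not taken from the thread.
DEST_BUCKET = os.environ.get("DEST_BUCKET", "my-destination-bucket")


def copy_with_version_metadata(s3, task, dest_bucket):
    """Copy one source version and record its version ID as user metadata."""
    source_bucket = task["s3BucketArn"].split(":::")[-1]
    key = unquote_plus(task["s3Key"])  # assumption: keys are URL-encoded
    version_id = task.get("s3VersionId")
    if not version_id:
        raise ValueError("manifest entry has no version ID")
    s3.copy_object(
        Bucket=dest_bucket,
        Key=key,
        CopySource={"Bucket": source_bucket, "Key": key, "VersionId": version_id},
        # boto3 adds the x-amz-meta- prefix, so this is stored as x-amz-meta-x-version.
        Metadata={"x-version": version_id},
        # REPLACE makes S3 use the metadata given here instead of copying the source's.
        MetadataDirective="REPLACE",
    )
    return version_id


def lambda_handler(event, context, s3=None):
    if s3 is None:
        import boto3  # imported lazily so the module also loads where boto3 is absent

        s3 = boto3.client("s3")
    results = []
    for task in event["tasks"]:
        try:
            version_id = copy_with_version_metadata(s3, task, DEST_BUCKET)
            code, message = "Succeeded", f"copied version {version_id}"
        except Exception as exc:  # report the failure back to the batch job
            code, message = "PermanentFailure", str(exc)
        results.append(
            {"taskId": task["taskId"], "resultCode": code, "resultString": message}
        )
    return {
        "invocationSchemaVersion": event["invocationSchemaVersion"],
        "treatMissingKeysAs": "PermanentFailure",
        "invocationId": event["invocationId"],
        "results": results,
    }
```

Two caveats with this approach: because `MetadataDirective="REPLACE"` is used, any user metadata already on the source object would have to be re-supplied in `Metadata` to survive the copy, and a single `copy_object` call only handles objects up to 5 GB.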

0

Hi

Please follow the steps outlined in this re:Post thread and let me know if you face any difficulty.

I'd be more than happy to assist if there are any additional questions or challenges you see while following the steps.

Edit:

Earlier I missed the x-amz-meta-x-version part that you mentioned. To understand this better, I would like to see the inventory configuration settings and check whether this metadata is attached to all the objects in your S3 bucket/prefix. It would be tedious to remove the metadata from each object, and modifying, adding, or removing metadata would also create a new version of the object. Hence, I'd suggest removing "x-amz-meta-x-version" from the fileSchema in manifest.json, saving the file, computing the new checksum and size of manifest.json, and updating those in the manifest.checksum file. Upload these files to their respective locations and then run your S3 Batch job. Long term, if the objects' metadata is supposed to stay that way, you may need to work out the purpose of this user-defined metadata and whether it is really required.
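
The checksum refresh described above can be sketched roughly as follows. It assumes that manifest.checksum holds the hex MD5 of the manifest.json bytes and that each `files` entry's `size` and `MD5checksum` describe the data file exactly as it is uploaded (still gzipped). The function name `refresh_checksums` and the key-to-local-path mapping are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def refresh_checksums(manifest_path: Path, data_files: dict) -> str:
    """Update size/MD5checksum for edited data files, then rewrite manifest.checksum.

    data_files maps each manifest "key" to a local copy of that data file.
    """
    manifest = json.loads(manifest_path.read_text())
    for entry in manifest["files"]:
        local = data_files.get(entry["key"])
        if local is None:
            continue  # this data file was not edited locally
        payload = Path(local).read_bytes()
        entry["size"] = len(payload)
        entry["MD5checksum"] = md5_hex(payload)

    # Hash exactly the bytes that are written, so the two files stay in sync.
    manifest_bytes = json.dumps(manifest, indent=2).encode("utf-8")
    manifest_path.write_bytes(manifest_bytes)
    checksum = md5_hex(manifest_bytes)
    manifest_path.with_name("manifest.checksum").write_text(checksum)
    return checksum
```

After running it, both manifest.json and manifest.checksum need to be uploaded again, together with any edited data file, so that the values the job expects and the values it calculates match.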

Let me know if it doesn't work (I'm sure it will solve your problem); happy to help further.

Abhishek

AWS
EXPERT
answered 14 days ago
  • Hey Abhishek, I get "Unknown task fields are present in the schema: [x-amz-meta-x-version]". AWS does not recognize the x-amz-meta-x-version field. Could you please help with this issue?

  • Hello Dnyaneshwar, please refer to the Edit section of my answer and follow the steps there. That should solve the problem.

  • I don't know how I can share my Inventory Management configuration with you for review. As for your other point, whether this is really needed: the answer is yes, because I want to keep a reference to the old versionId somewhere on the newly copied items. For that, I thought of using object metadata with a custom-defined attribute. So x-amz-meta-x-version itself may not be essential, but preserving each item's old versionId is what I am trying to achieve. By the way, without x-amz-meta-x-version the whole scenario works perfectly fine, but that does not meet the ultimate goal.
