Running a Glue crawler on encrypted S3 objects in a different account


Hi all, we have an S3 bucket in Account A with SSE-KMS encryption enabled. We want to give a Glue crawler in Account B access to the objects in that bucket. To do this, we applied the following steps:

  1. Added a bucket policy in Account A to grant Account B access to the S3 objects:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": { "AWS": "arn:aws:iam::AccountB:root" },
                "Action": "s3:GetObject",
                "Resource": "arn:aws:s3:::AccountA_Bucket/test/"
            },
            {
                "Effect": "Allow",
                "Principal": { "AWS": "arn:aws:iam::AccountB:root" },
                "Action": "s3:ListBucket",
                "Resource": "arn:aws:s3:::AccountA_Bucket",
                "Condition": {
                    "StringLike": { "s3:prefix": "test/" }
                }
            }
        ]
    }

  2. Added a statement to the KMS key policy in Account A to allow the kms:Decrypt action for Account B:

    {
        "Sid": "Allow use of the key",
        "Effect": "Allow",
        "Principal": { "AWS": "arn:aws:iam::AccountB:root" },
        "Action": "kms:Decrypt",
        "Resource": "*"
    }

  3. In Account B, created an IAM role for the Glue crawler that can get objects from the S3 bucket in Account A and has kms:Decrypt permission on the KMS key in Account A:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [ "s3:GetObject", "s3:PutObject" ],
                "Resource": [ "arn:aws:s3:::AccountA_Bucket/test/*" ]
            },
            {
                "Action": [ "kms:Decrypt" ],
                "Effect": "Allow",
                "Resource": "KMSKeyARNOfAccountA"
            }
        ]
    }
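
For cross-account access, both the bucket policy in Account A and the role policy in Account B have to allow the request. As posted, the s3:GetObject Resource in the bucket policy (step 1) ends in `test/` with no wildcard, while the role policy (step 3) uses `test/*`. The sketch below is only a rough local approximation of how wildcard matching on a policy Resource behaves (`*` for any run of characters, `?` for one character); it is not the real IAM evaluation engine, and the object key `test/data.csv` is a made-up example. It may be worth checking whether the deployed bucket policy really lacks the `*`.

```python
import re


def arn_pattern_matches(pattern: str, arn: str) -> bool:
    """Approximate policy Resource matching: '*' = any run of chars, '?' = exactly one."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, arn, flags=re.DOTALL) is not None


# Resource strings copied from the policies in steps 1 and 3.
BUCKET_POLICY_RESOURCE = "arn:aws:s3:::AccountA_Bucket/test/"
ROLE_POLICY_RESOURCE = "arn:aws:s3:::AccountA_Bucket/test/*"

# Hypothetical object under the test/ prefix, for illustration only.
sample_object = "arn:aws:s3:::AccountA_Bucket/test/data.csv"

bucket_policy_allows = arn_pattern_matches(BUCKET_POLICY_RESOURCE, sample_object)
role_policy_allows = arn_pattern_matches(ROLE_POLICY_RESOURCE, sample_object)

print("bucket policy Resource matches:", bucket_policy_allows)
print("role policy Resource matches:  ", role_policy_allows)
```

If the first line prints False for a real object key, the crawler role would be able to list the prefix but not read the objects, which could leave the crawler unable to classify the data.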

After these changes, the Glue crawler runs successfully and creates a table, but the schema is not as expected. When we run Athena queries on the table, we receive the following error: HIVE_UNKNOWN_ERROR: serDe should not be accessed from a null StorageFormat.

I suspect this happens because the Glue crawler builds the table from the encrypted objects, i.e. it does not decrypt them before inferring the schema. Earlier, when the S3 bucket had no encryption, the table schema was created as expected and Athena queries ran against it.
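
One way to test this hypothesis: with SSE-KMS, S3 decrypts the object on the server side during GetObject, so a caller with the right S3 and KMS permissions receives plaintext, and a caller without them gets an error instead of ciphertext. The following is a minimal sketch, assuming a boto3-style S3 client created with the crawler role's credentials; `probe_object` is a hypothetical helper name, and the bucket and key in the usage comment are placeholders.

```python
def probe_object(s3_client, bucket, key, num_bytes=64):
    """Try to read the first bytes of an object and report what happened.

    Returns (ok, detail). A failure here (for example an access-denied
    error) suggests a permissions problem rather than a crawler problem.
    """
    try:
        resp = s3_client.get_object(
            Bucket=bucket,
            Key=key,
            Range=f"bytes=0-{num_bytes - 1}",  # fetch only a small slice
        )
    except Exception as exc:  # broad on purpose: this is just a probe
        return False, f"{type(exc).__name__}: {exc}"

    head = resp["Body"].read()
    encryption = resp.get("ServerSideEncryption", "none reported")
    return True, f"read {len(head)} bytes, ServerSideEncryption={encryption}"


# Usage (assumes boto3 is installed and the crawler role's credentials are active):
#   import boto3
#   print(probe_object(boto3.client("s3"), "AccountA_Bucket", "test/<some-key>"))
```

If the probe succeeds and the returned bytes look like the expected file content, decryption is working and the schema problem most likely lies elsewhere.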

My question is: what changes are needed so that the Glue crawler decrypts the objects it receives from the S3 bucket in Account A before creating the table schema?

1 Answer

I have tested this scenario and did not face any issue while querying cross-account SSE-KMS encrypted data via Athena. The issue appears to be specific to your table or data: in most cases, the error below occurs when the table classification is UNKNOWN.

"HIVE_UNKNOWN_ERROR: serDe should not be accessed from a null StorageFormat"

For detailed troubleshooting, we recommend opening a support ticket with the Athena PS team so that we can check the Glue tables.
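
Before opening a ticket, the table definition can be inspected directly. Below is a minimal sketch that looks for the symptoms mentioned above in a table definition shaped like the `Table` element of a Glue `get_table` response. `diagnose_table` is a hypothetical helper, and the database and table names in the usage comment are placeholders.

```python
def diagnose_table(table):
    """Return a list of findings for a Glue table definition (a dict)."""
    findings = []

    # Crawlers record the detected format under Parameters["classification"].
    classification = table.get("Parameters", {}).get("classification")
    if classification is None or classification.upper() == "UNKNOWN":
        findings.append(f"classification is {classification!r}")

    storage = table.get("StorageDescriptor", {})
    if not storage.get("SerdeInfo", {}).get("SerializationLibrary"):
        findings.append("no SerDe serialization library set")
    for field in ("InputFormat", "OutputFormat"):
        if not storage.get(field):
            findings.append(f"{field} is missing")

    return findings


# Usage (assumes boto3 is installed and credentials for Account B are active):
#   import boto3
#   glue = boto3.client("glue")
#   table = glue.get_table(DatabaseName="<database>", Name="<table>")["Table"]
#   print(diagnose_table(table) or "table definition looks complete")
```

An empty result means the storage format and SerDe are populated; any findings point to a table the crawler could not classify.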

answered 8 months ago
