I am trying to use the AWS DynamoDB export-to-S3 feature, but when I read the data in Glue, the values of an entire column come back as null.
I have tried this multiple times with the same result. I also ran a scan job to check for rows with null values in "voucherRefId"; since this is a mandatory column, all values are filled (the scan returned 0 results), yet Glue cannot find any data for it. I suspect the issue is on the Glue side, because when I open the exported data in S3 as JSON, I can see a value against each item:
"voucherRefId": {"S": "comptia-network#EV#GBP"}
voucherRefId is missing in the Glue preview, but present in DynamoDB.
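The export stores every attribute wrapped in a DynamoDB type descriptor like {"S": ...} above. As a point of reference, here is a minimal sketch of what unnesting DynamoDB-JSON conceptually does; the function name and coverage of types are illustrative, not Glue's actual implementation:

```python
def unnest_item(item):
    """Strip DynamoDB type descriptors ({"S": ...}, {"N": ...}, ...) from one exported item."""
    def unwrap(av):
        (tag, value), = av.items()  # each attribute value has exactly one type tag
        if tag in ("S", "BOOL"):
            return value
        if tag == "N":
            # DynamoDB serializes numbers as strings
            return float(value) if "." in value else int(value)
        if tag == "NULL":
            return None
        if tag == "M":
            return {k: unwrap(v) for k, v in value.items()}
        if tag == "L":
            return [unwrap(v) for v in value]
        return value  # other types (B, SS, NS, ...) left as-is

    return {k: unwrap(av) for k, av in item.items()}

print(unnest_item({"voucherRefId": {"S": "comptia-network#EV#GBP"}}))
# → {'voucherRefId': 'comptia-network#EV#GBP'}
```

If Glue's unnesting worked here, voucherRefId should come through as a plain string exactly like this.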
Some other key info:
- The table was created using AWS Amplify.
- A sparse index strategy is used on multiple columns.
- voucherRefId is used as the index partition key for 3 different indexes.
- voucherRefId is composed as {voucherId}#{type}#{region}.
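For concreteness, the composite key described above would be assembled like this (hypothetical helper, not from the actual codebase):

```python
def make_voucher_ref_id(voucher_id, type_, region):
    """Build the composite voucherRefId key: {voucherId}#{type}#{region}."""
    return f"{voucher_id}#{type_}#{region}"

print(make_voucher_ref_id("comptia-network", "EV", "GBP"))
# → comptia-network#EV#GBP
```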
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Script generated for node DynamoDB table
DynamoDBtable_node1 = glueContext.create_dynamic_frame.from_options(
    connection_type="dynamodb",
    connection_options={
        "dynamodb.export": "s3",
        "dynamodb.s3.bucket": "aws-glue-assets-287544200199-us-east-1",
        "dynamodb.s3.prefix": "temporary/fulltest/",
        "dynamodb.tableArn": "arn:aws:dynamodb:us-east-1:000000000000:table/placeholder_arn",
        "dynamodb.unnestDDBJson": True,
    },
    transformation_ctx="DynamoDBtable_node1",
)

# Script generated for node ApplyMapping
ApplyMapping_node2 = ApplyMapping.apply(
    frame=DynamoDBtable_node1, mappings=[], transformation_ctx="ApplyMapping_node2"
)

# Script generated for node S3 bucket
S3bucket_node3 = glueContext.getSink(
    path="s3://dt-ds-analytics/inventory-db/",
    connection_type="s3",
    updateBehavior="UPDATE_IN_DATABASE",
    partitionKeys=[],
    compression="gzip",
    enableUpdateCatalog=True,
    transformation_ctx="S3bucket_node3",
)
S3bucket_node3.setCatalogInfo(
    catalogDatabase="dynamodb-tables", catalogTableName="Inventory"
)
S3bucket_node3.setFormat("json")  # output format assumed; set to whatever the visual node uses
S3bucket_node3.writeFrame(ApplyMapping_node2)
job.commit()
Please share the code generated for the ETL job.
I have updated the question with the autogenerated visual ETL code.
What is the Glue schema for the Inventory table?
{
  "voucherRefId": string,
  "currency": string,
  "costPrice": float,
  "orderId": string,
  "soldOn": string,
  "status": string,
  "yymm": string,
  "createdAt": string,
  "seenAt": string,
  "updatedAt": string,
  "userId": string,
  "isTake2Sale": boolean,
  "expiry": string,
  "id": string
}
Here is the mapping I have added in Glue. The DB has a few more columns; some of them are of type AWSJSON and hold metadata.
This is how I am importing it; I have a mapping step where I map the date columns to date columns.
Your code shows no mappings: the ApplyMapping step in your script passes mappings=[].