Glue column values read as null when values contain #


I am trying to use the AWS DynamoDB export to S3, but when I read the data in Glue, the values of an entire column come through as null.

I have tried the same thing multiple times. I also ran a scan job to check whether any rows have null values in "voucherRefId"; since this is a mandatory column, all values are filled (the scan returned 0 results; a boto3 sketch of that check follows the key-info list below), yet Glue cannot find any data. I am guessing the issue is with Glue, because when I open the exported data in S3 as JSON, I can see the value against each item as

"voucherRefId":{"S":"comptia-network#EV#GBP"}"

voucherRefId missing in Glue preview (screenshot).

voucherRefId present in the DB (screenshot).
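To rule out the export itself, one quick diagnostic is to read the raw export files directly with Spark. This is a minimal sketch, assuming the standard AWSDynamoDB/<export-id>/data/ layout under the bucket and prefix used by the job below:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Export data files are gzipped, newline-delimited DynamoDB JSON; the path
# below mirrors the bucket/prefix from the job and is an assumption.
raw = spark.read.json(
    "s3://aws-glue-assets-287544200199-us-east-1/temporary/fulltest/AWSDynamoDB/*/data/*.json.gz"
)

# Each exported line looks like {"Item": {"voucherRefId": {"S": "..."}}}, so
# the value sits under Item.voucherRefId.S in DynamoDB-typed JSON.
raw.select("Item.voucherRefId.S").show(5, truncate=False)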

Some other key info:

  • The table was created using AWS Amplify.
  • A sparse-index strategy is used on multiple columns.
  • voucherRefId is used as the index partition key for 3 different indexes.
  • voucherRefId is composed of {voucherId}#{type}#{region}.
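For reference, the null check mentioned above can be reproduced with a boto3 scan. This is a minimal sketch; the table name is an assumption (the post only shows a placeholder ARN), so substitute the real one:

import boto3
from boto3.dynamodb.conditions import Attr

# Hypothetical table name, borrowed from the Glue catalog name used below.
table = boto3.resource("dynamodb", region_name="us-east-1").Table("Inventory")

# Count items where voucherRefId is absent; the post reports 0 matches.
# A large table would need LastEvaluatedKey pagination on top of this.
resp = table.scan(
    FilterExpression=Attr("voucherRefId").not_exists(),
    Select="COUNT",
)
print(resp["Count"])

The autogenerated visual ETL script follows.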
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Script generated for node DynamoDB table
DynamoDBtable_node1 = glueContext.create_dynamic_frame.from_options(
    connection_type="dynamodb",
    connection_options={
        "dynamodb.export": "s3",
        "dynamodb.s3.bucket": "aws-glue-assets-287544200199-us-east-1",
        "dynamodb.s3.prefix": "temporary/fulltest/",
        "dynamodb.tableArn": "arn:aws:dynamodb:us-east-1:000000000000:table/placeholder_arn",
        "dynamodb.unnestDDBJson": True,
    },
    transformation_ctx="DynamoDBtable_node1",
)

# Script generated for node ApplyMapping
ApplyMapping_node2 = ApplyMapping.apply(
    frame=DynamoDBtable_node1, mappings=[], transformation_ctx="ApplyMapping_node2"
)

# Script generated for node S3 bucket
S3bucket_node3 = glueContext.getSink(
    path="s3://dt-ds-analytics/inventory-db/",
    connection_type="s3",
    updateBehavior="UPDATE_IN_DATABASE",
    partitionKeys=[],
    compression="gzip",
    enableUpdateCatalog=True,
    transformation_ctx="S3bucket_node3",
)
S3bucket_node3.setCatalogInfo(
    catalogDatabase="dynamodb-tables", catalogTableName="Inventory"
)
# Write the mapped frame through the sink and commit the job.
S3bucket_node3.setFormat("json")
S3bucket_node3.writeFrame(ApplyMapping_node2)
job.commit()
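If the built-in unnesting is the suspect, one workaround sketch is to read the same export with dynamodb.unnestDDBJson disabled and flatten the DynamoDB JSON by hand (this assumes the un-unnested frame keeps the export's Item wrapper):

# Same export read as above, but with unnesting disabled so the
# DynamoDB-typed structure is preserved.
raw_node = glueContext.create_dynamic_frame.from_options(
    connection_type="dynamodb",
    connection_options={
        "dynamodb.export": "s3",
        "dynamodb.s3.bucket": "aws-glue-assets-287544200199-us-east-1",
        "dynamodb.s3.prefix": "temporary/fulltest/",
        "dynamodb.tableArn": "arn:aws:dynamodb:us-east-1:000000000000:table/placeholder_arn",
        "dynamodb.unnestDDBJson": False,
    },
    transformation_ctx="raw_node",
)

# Pull the string value out of the DynamoDB-typed struct manually; if it
# shows up here but not in the unnested frame, the unnesting step is
# dropping the column.
raw_node.toDF().select("Item.voucherRefId.S").show(5, truncate=False)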
  • Share the code being generated for the ETL job

  • I have updated the question with the autogenerated visual ETL code

  • What is the Glue schema for the Inventory table?

  • { "voucherRefId": string, "currency": string, "costPrice": float, "orderId": string, "soldOn": string "status": string, "yymm": string, "createdAt": string, "seenAt": string, "updatedAt": string, "userId": string, "isTake2Sale":boolean, "expiry": string, "id": string }

    Here is the mapping I have added in Glue. The DB has a few more columns; some of them are of type AWSJSON, holding some metadata.

    This is how I am importing it; I have a mapping step where I am mapping date columns to date columns.

  • Your code shows no mappings:

    # Script generated for node ApplyMapping
    ApplyMapping_node2 = ApplyMapping.apply(
        frame=DynamoDBtable_node1, mappings=[], transformation_ctx="ApplyMapping_node2"
    )
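For illustration only, a non-empty mapping covering part of the posted schema might look like this. Each entry is a (source field, source type, target field, target type) 4-tuple; the field names come from the schema in the comments, and the date target types are assumptions based on the "mapping date columns to date columns" remark above:

    # Hypothetical explicit mappings; field names taken from the posted
    # schema, target types assumed.
    ApplyMapping_node2 = ApplyMapping.apply(
        frame=DynamoDBtable_node1,
        mappings=[
            ("voucherRefId", "string", "voucherRefId", "string"),
            ("currency", "string", "currency", "string"),
            ("costPrice", "float", "costPrice", "float"),
            ("soldOn", "string", "soldOn", "date"),
            ("createdAt", "string", "createdAt", "date"),
        ],
        transformation_ctx="ApplyMapping_node2",
    )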
    