Glue column values read as null when values contain #


I am trying to use the AWS DynamoDB export to S3, but when I read the data in Glue, the values of an entire column come through as null.

I have tried the same thing multiple times. I also ran a scan job to check whether any rows have null values in "voucherRefId"; since this is a mandatory column, all values are filled (the scan returned 0 results; a boto3 sketch of that check follows the key-info list below), yet Glue cannot find any data. I am guessing the issue is with Glue, because when I open the exported data in S3 as JSON, I can see the value against each item as

"voucherRefId":{"S":"comptia-network#EV#GBP"}"

voucherRefId missing in Glue preview (screenshot).

voucherRefId present in the DB (screenshot).
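To rule out the export itself, one quick diagnostic is to read the raw export files directly with Spark. This is a minimal sketch, assuming the standard AWSDynamoDB/<export-id>/data/ layout under the bucket and prefix used by the job below:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Export data files are gzipped, newline-delimited DynamoDB JSON; the path
# below mirrors the bucket/prefix from the job and is an assumption.
raw = spark.read.json(
    "s3://aws-glue-assets-287544200199-us-east-1/temporary/fulltest/AWSDynamoDB/*/data/*.json.gz"
)

# Each exported line looks like {"Item": {"voucherRefId": {"S": "..."}}}, so
# the value sits under Item.voucherRefId.S in DynamoDB-typed JSON.
raw.select("Item.voucherRefId.S").show(5, truncate=False)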

Some other key info:

  • The table was created using AWS Amplify.
  • A sparse-index strategy is used on multiple columns.
  • voucherRefId is used as the index partition key for 3 different indexes.
  • voucherRefId is composed of {voucherId}#{type}#{region}.
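For reference, the null check mentioned above can be reproduced with a boto3 scan. This is a minimal sketch; the table name is an assumption (the post only shows a placeholder ARN), so substitute the real one:

import boto3
from boto3.dynamodb.conditions import Attr

# Hypothetical table name, borrowed from the Glue catalog name used below.
table = boto3.resource("dynamodb", region_name="us-east-1").Table("Inventory")

# Count items where voucherRefId is absent; the post reports 0 matches.
# A large table would need LastEvaluatedKey pagination on top of this.
resp = table.scan(
    FilterExpression=Attr("voucherRefId").not_exists(),
    Select="COUNT",
)
print(resp["Count"])

The autogenerated visual ETL script follows.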
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Script generated for node DynamoDB table
DynamoDBtable_node1 = glueContext.create_dynamic_frame.from_options(
    connection_type="dynamodb",
    connection_options={
        "dynamodb.export": "s3",
        "dynamodb.s3.bucket": "aws-glue-assets-287544200199-us-east-1",
        "dynamodb.s3.prefix": "temporary/fulltest/",
        "dynamodb.tableArn": "arn:aws:dynamodb:us-east-1:000000000000:table/placeholder_arn",
        "dynamodb.unnestDDBJson": True,
    },
    transformation_ctx="DynamoDBtable_node1",
)

# Script generated for node ApplyMapping
ApplyMapping_node2 = ApplyMapping.apply(
    frame=DynamoDBtable_node1, mappings=[], transformation_ctx="ApplyMapping_node2"
)

# Script generated for node S3 bucket
S3bucket_node3 = glueContext.getSink(
    path="s3://dt-ds-analytics/inventory-db/",
    connection_type="s3",
    updateBehavior="UPDATE_IN_DATABASE",
    partitionKeys=[],
    compression="gzip",
    enableUpdateCatalog=True,
    transformation_ctx="S3bucket_node3",
)
S3bucket_node3.setCatalogInfo(
    catalogDatabase="dynamodb-tables", catalogTableName="Inventory"
)
# Write the mapped frame through the sink and commit the job.
S3bucket_node3.setFormat("json")
S3bucket_node3.writeFrame(ApplyMapping_node2)
job.commit()
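If the built-in unnesting is the suspect, one workaround sketch is to read the same export with dynamodb.unnestDDBJson disabled and flatten the DynamoDB JSON by hand (this assumes the un-unnested frame keeps the export's Item wrapper):

# Same export read as above, but with unnesting disabled so the
# DynamoDB-typed structure is preserved.
raw_node = glueContext.create_dynamic_frame.from_options(
    connection_type="dynamodb",
    connection_options={
        "dynamodb.export": "s3",
        "dynamodb.s3.bucket": "aws-glue-assets-287544200199-us-east-1",
        "dynamodb.s3.prefix": "temporary/fulltest/",
        "dynamodb.tableArn": "arn:aws:dynamodb:us-east-1:000000000000:table/placeholder_arn",
        "dynamodb.unnestDDBJson": False,
    },
    transformation_ctx="raw_node",
)

# Pull the string value out of the DynamoDB-typed struct manually; if it
# shows up here but not in the unnested frame, the unnesting step is
# dropping the column.
raw_node.toDF().select("Item.voucherRefId.S").show(5, truncate=False)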
  • Share the code being generated for the ETL job

  • I have updated the question with the autogenerated visual ETL code

  • What is the Glue schema for the Inventory table?

  • { "voucherRefId": string, "currency": string, "costPrice": float, "orderId": string, "soldOn": string "status": string, "yymm": string, "createdAt": string, "seenAt": string, "updatedAt": string, "userId": string, "isTake2Sale":boolean, "expiry": string, "id": string }

    Here is the mapping I have added in Glue. The DB has a few more columns; some of them are of type AWSJSON, holding some metadata.

    This is how I am importing it; I have a mapping step where I am mapping date columns to date columns.

  • Your code shows no mappings:

    # Script generated for node ApplyMapping
    ApplyMapping_node2 = ApplyMapping.apply(
        frame=DynamoDBtable_node1, mappings=[], transformation_ctx="ApplyMapping_node2"
    )
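For illustration only, a non-empty mapping covering part of the posted schema might look like this. Each entry is a (source field, source type, target field, target type) 4-tuple; the field names come from the schema in the comments, and the date target types are assumptions based on the "mapping date columns to date columns" remark above:

    # Hypothetical explicit mappings; field names taken from the posted
    # schema, target types assumed.
    ApplyMapping_node2 = ApplyMapping.apply(
        frame=DynamoDBtable_node1,
        mappings=[
            ("voucherRefId", "string", "voucherRefId", "string"),
            ("currency", "string", "currency", "string"),
            ("costPrice", "float", "costPrice", "float"),
            ("soldOn", "string", "soldOn", "date"),
            ("createdAt", "string", "createdAt", "date"),
        ],
        transformation_ctx="ApplyMapping_node2",
    )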
    