Glue column values read as null when values contain #


I am trying to use the AWS DynamoDB export to S3, but when I read the data in Glue, the values of an entire column come through as null.

I have tried this multiple times with the same result. I also ran a scan job to check whether any rows have null values in "voucherRefId" (a sketch of that check appears after the screenshots below); since this is a mandatory column, all values are filled (the scan returned 0 results), yet Glue cannot find any data. I suspect the issue is on the Glue side, because when I open the exported data in S3 as JSON, I can see the value against each item as

"voucherRefId":{"S":"comptia-network#EV#GBP"}"

[Screenshot: voucherRefId missing in the Glue preview]

[Screenshot: voucherRefId present in the DynamoDB table]
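The scan check mentioned above was along these lines; a minimal boto3 sketch (the table name here is hypothetical, substitute the real one):

import boto3
from boto3.dynamodb.conditions import Attr

ddb = boto3.resource("dynamodb", region_name="us-east-1")
table = ddb.Table("Inventory")  # hypothetical table name

# Count items where voucherRefId is missing, paging through the full scan
missing = 0
scan_kwargs = {"FilterExpression": Attr("voucherRefId").not_exists()}
while True:
    resp = table.scan(**scan_kwargs)
    missing += resp["Count"]
    if "LastEvaluatedKey" not in resp:
        break
    scan_kwargs["ExclusiveStartKey"] = resp["LastEvaluatedKey"]

print(f"items without voucherRefId: {missing}")  # 0 in my case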

Some other key info:

  • The table was created using AWS Amplify.
  • A sparse-index strategy is used on multiple columns.
  • voucherRefId is used as the index partition key for 3 different indexes.
  • voucherRefId is composed as {voucherId}#{type}#{region} (see the sketch after this list).
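For illustration, the composite key is packed and unpacked like this (a trivial sketch; the field names come from the list above):

# voucherRefId packs three fields with '#' separators,
# e.g. "comptia-network#EV#GBP"
def make_voucher_ref_id(voucher_id: str, type_: str, region: str) -> str:
    return f"{voucher_id}#{type_}#{region}"

def split_voucher_ref_id(ref: str) -> tuple[str, str, str]:
    voucher_id, type_, region = ref.split("#")
    return voucher_id, type_, region

assert make_voucher_ref_id("comptia-network", "EV", "GBP") == "comptia-network#EV#GBP"

Here is the autogenerated visual ETL script: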
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Script generated for node DynamoDB table
DynamoDBtable_node1 = glueContext.create_dynamic_frame.from_options(
    connection_type="dynamodb",
    connection_options={
        "dynamodb.export": "s3",
        "dynamodb.s3.bucket": "aws-glue-assets-287544200199-us-east-1",
        "dynamodb.s3.prefix": "temporary/fulltest/",
        "dynamodb.tableArn": "arn:aws:dynamodb:us-east-1:000000000000:table/placeholder_arn",
        "dynamodb.unnestDDBJson": True,
    },
    transformation_ctx="DynamoDBtable_node1",
)

# Script generated for node ApplyMapping
ApplyMapping_node2 = ApplyMapping.apply(
    frame=DynamoDBtable_node1, mappings=[], transformation_ctx="ApplyMapping_node2"
)

# Script generated for node S3 bucket
S3bucket_node3 = glueContext.getSink(
    path="s3://dt-ds-analytics/inventory-db/",
    connection_type="s3",
    updateBehavior="UPDATE_IN_DATABASE",
    partitionKeys=[],
    compression="gzip",
    enableUpdateCatalog=True,
    transformation_ctx="S3bucket_node3",
)
S3bucket_node3.setCatalogInfo(
    catalogDatabase="dynamodb-tables", catalogTableName="Inventory"
)
# (the script was cut off here in the original; a typical generated ending follows)
S3bucket_node3.setFormat("json")  # actual format unknown; "json" is an assumption matching the gzip compression
S3bucket_node3.writeFrame(ApplyMapping_node2)
job.commit()
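As a quick check right after the read, the frame's schema and null count can be printed (a sketch; it would go before the ApplyMapping node):

# Diagnostic: confirm whether the unnested frame even contains the column,
# and how many of its values are null
df = DynamoDBtable_node1.toDF()
df.printSchema()
if "voucherRefId" in df.columns:
    print("null voucherRefId rows:", df.filter(df["voucherRefId"].isNull()).count())
else:
    print("voucherRefId is missing from the Glue schema entirely")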
  • Share the code being generated for the ETL job

  • I have updated the question with the autogenerated visual ETL code

  • What is the Glue schema for the Inventory table?

  • { "voucherRefId": string, "currency": string, "costPrice": float, "orderId": string, "soldOn": string "status": string, "yymm": string, "createdAt": string, "seenAt": string, "updatedAt": string, "userId": string, "isTake2Sale":boolean, "expiry": string, "id": string }

    Here is the mapping I have added in Glue. The DB has a few more columns; some of them are of the AWSJSON type and hold metadata.

    This is how I am importing it; I have a mapping step where I map date columns to date columns.

  • Your code shows no mappings:

    # Script generated for node ApplyMapping
    ApplyMapping_node2 = ApplyMapping.apply(
        frame=DynamoDBtable_node1, mappings=[], transformation_ctx="ApplyMapping_node2"
    )
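    As far as I know, ApplyMapping keeps only the fields listed in mappings, so an empty list drops every column, which would match the empty preview. A populated mapping for the schema above might look like this (a sketch only; the source types are assumptions, check the frame's printSchema() output for the real ones):

    # Sketch: map/cast fields explicitly; unlisted fields are dropped
    ApplyMapping_node2 = ApplyMapping.apply(
        frame=DynamoDBtable_node1,
        mappings=[
            ("voucherRefId", "string", "voucherRefId", "string"),
            ("currency", "string", "currency", "string"),
            ("costPrice", "string", "costPrice", "float"),  # assumed source type
            ("soldOn", "string", "soldOn", "timestamp"),
            ("createdAt", "string", "createdAt", "timestamp"),
            ("updatedAt", "string", "updatedAt", "timestamp"),
            ("isTake2Sale", "boolean", "isTake2Sale", "boolean"),
            ("id", "string", "id", "string"),
        ],
        transformation_ctx="ApplyMapping_node2",
    )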
    
Asked 9 months ago · 98 views

No answers
