Glue job to transform JSON to Parquet, but the first JSON object gets lost!

Hello,

I have an AWS Glue job which reads JSON files from S3, transforms the JSON to Parquet, and saves the result to S3 and a Glue table.

The JSON file contains an array of JSON objects, e.g. the following file:

[
  {
    "Listen Date": "2023-08-27",
    "Show Title": "Title of Show Travel",
    "Episode Title": "Episode Nordirland",
    "Listener Country": "Germany",
    "Listener OS": "iOS",
    "Listener User Agent": "Podcasts/4022.700.8 CFNetwork/1410.0.3 Darwin/22.6.0",
    "Episode ID": "64e713c4de5a4f0011114111",
    "Show ID": "64905e9ec4784300116c1cb0",
    "Listens": "2"
  },
  {
    "Listen Date": "2023-08-27",
    "Show Title": "Title of Show Story",
    "Episode Title": "Episode Däumelinchen",
    "Listener Country": "Germany",
    "Listener OS": "iOS",
    "Listener User Agent": "Podcasts/4022.700.8 CFNetwork/1410.0.3 Darwin/22.6.0",
    "Episode ID": "64e5d556aae05200110c4d3b",
    "Show ID": "6492d63c895f9d0011789a66",
    "Listens": "1"
  }
]

The data is transformed and saved to the correct S3 folder and Glue table. But the first JSON object (in this case the object with Show ID 64905e9ec4784300116c1cb0) gets lost!

If I have more than 2 objects in the array, all objects get transformed; just the first one is skipped. If I reorder the objects in the array, again the first one gets lost. So I assume that it's not a problem with the object/data.
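One way to confirm which record is dropped is to compare the IDs in the source file with the IDs that reach the target. The following is only a minimal sketch using the Python standard library; the helper name `missing_ids` and the idea of collecting the output IDs into a set are assumptions for illustration, not part of the Glue job.

```python
import json


def missing_ids(source_json, output_ids, key="Episode ID"):
    """Return the IDs present in the source array but absent from the output.

    source_json: text of the source file (a top-level JSON array of objects)
    output_ids:  set of IDs read back from the target (hypothetical input)
    """
    records = json.loads(source_json)
    return [record[key] for record in records if record[key] not in output_ids]


# Two records shaped like the sample file, with only the second one in the output.
source = json.dumps([
    {"Episode ID": "64e713c4de5a4f0011114111", "Listens": "2"},
    {"Episode ID": "64e5d556aae05200110c4d3b", "Listens": "1"},
])
dropped = missing_ids(source, {"64e5d556aae05200110c4d3b"})
```

With the sample above, `dropped` contains only the ID of the first object, which matches the behavior described.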

My Glue job is a standard visual Glue job with 3 steps:

Step 1: Data source (S3 bucket)
Step 2: Transformation (I figured out that I have to rename my properties and remove the spaces, otherwise the values of all objects get lost, e.g. "Listen Date" => "listendate")
Step 3: Data target (S3 bucket)
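The renaming in Step 2 can be sketched in plain Python as follows. This is only an illustration of the rule described above (remove spaces, lowercase the name); the function names `normalize_key` and `normalize_record` are made up for this example, and the visual job itself does the renaming through its transform node rather than through code like this.

```python
def normalize_key(name):
    """Remove spaces and lowercase the name, e.g. "Listen Date" -> "listendate"."""
    return name.replace(" ", "").lower()


def normalize_record(record):
    """Apply normalize_key to every property name of one JSON object."""
    return {normalize_key(key): value for key, value in record.items()}


example = normalize_record({"Listen Date": "2023-08-27", "Show ID": "64905e9ec4784300116c1cb0"})
```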

Does anyone have an idea how to fix this issue?

(Screenshots: Step 1, Step 2, Step 3)

Stefan
asked 8 months ago, 355 views
2 Answers
Accepted Answer

Unfortunately, Multiline did not help.

The solution was to set the JsonPath to: $[*]
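For anyone using a script-based job instead of the visual editor: as far as I know, the same setting corresponds to the `jsonPath` entry in the `format_options` of a JSON source (passed to `glueContext.create_dynamic_frame.from_options` with `format="json"`). The sketch below does not call Glue at all. It only shows such an options dict and uses the standard library to emulate what `$[*]` selects, namely every element of the top-level array. The helper name `select_all_elements` is made up for this illustration.

```python
import json

# Options as they might be passed to a Glue JSON source (assumption: script-based job).
json_format_options = {"jsonPath": "$[*]"}


def select_all_elements(document):
    """Emulate the JsonPath $[*]: return every element of the top-level array."""
    root = json.loads(document)
    if not isinstance(root, list):
        raise ValueError("expected a top-level JSON array")
    return list(root)


sample = '[{"Show ID": "64905e9ec4784300116c1cb0"}, {"Show ID": "6492d63c895f9d0011789a66"}]'
records = select_all_elements(sample)
```

Each element of `records` is one object from the array, including the first one, which is the behavior the job shows once the JSON path is set.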

It's still a little confusing. Without the JSON path, the Infer Schema button produces a deep (nested) schema.

I tried many times to flatten this deep schema into a flat list. Then I realized that the schema at runtime already accounted for the array, so I didn't need to handle the deep structure; it just lost the first entry.

With the JSON path set, the array is again handled correctly, and the first entry is no longer lost. In my opinion, without the JSON path the schema at runtime should be deep (as inferred). Otherwise it's confusing and inconsistent.

Stefan
answered 8 months ago

Try marking the "Multiline" checkbox in the source; otherwise it sounds like a bug.

AWS EXPERT
answered 8 months ago
