Syntax error with Glue dynamic data frame


I tried to use a Glue ETL job to convert nested JSON data to Parquet. It works, but because Glue samples the data, it could not determine a single data type for some fields and used a struct covering every possible type, which changed the schema.

I tried to use ResolveChoice to force a single type instead of a struct, but I keep getting a syntax error. I followed the documentation but still cannot figure it out. Here is my code. Could someone help? What is the right syntax, and does it support nested data?

resolvechoice2 = ResolveChoice.apply(frame = applymapping1, specs = [("in_reply_to_user_id", "project:long"),("user.id", "project:long"),("quoted_status.user.id", "project:long"),("entities.user_mentions.element.id", "project:long"),("entities.media.element.source_user_id", "project:long"),("retweeted_status.user.id", "project:long"),("extended_entities.media.element.source_user_id", "project:long")]), transformation_ctx = "resolvechoice2")

Syntax Error: File "/tmp/g-7d4adc26f6e5bb15ba8d86e7b4fced4ba08ca29d-6847579850030189659/script_2019-04-30-06-11-30.py", line 30
resolvechoice2 = ResolveChoice.apply(frame = applymapping1, specs = [("in_reply_to_user_id", "project:long"),("user.id", "project:long"),("quoted_status.user.id", "project:long"),("entities.user_mentions.element.id", "project:long"),("entities.media.element.source_user_id", "project:long"),("retweeted_status.user.id", "project:long"),("extended_entities.media.element.source_user_id", "project:long")]), transformation_ctx = "resolvechoice2")
SyntaxError: invalid syntax

Thanks, Juan

MODERATOR
asked 5 years ago, 583 views
1 Answer
Accepted Answer
dynamicframe0_with_cast = datasource0.resolveChoice(
    specs = [
        ("in_reply_to_user_id", "cast:long"),
        ("user.id", "cast:long"),
        ("quoted_status.user.id", "cast:long"),
        ("entities.user_mentions[].id", "cast:long"),
        ("entities.media[].source_user_id", "cast:long"),
        ("retweeted_status.user.id", "cast:long"),
        ("extended_entities.media[].source_user_id", "cast:long"),
    ]
)

When casting fields inside embedded collections, reference array elements with "[]" in the spec path, not with ".element". The ".element" form is only how printSchema displays array items.
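Separately from the path notation, the call in the question appears to have one closing parenthesis too many right after the specs list ("...)]), transformation_ctx"), which would be enough on its own to raise the SyntaxError. The sketch below runs with plain Python and no Glue installation. It assumes two hypothetical helpers that are not part of AWS Glue: to_spec_path, which rewrites printSchema-style paths into the "[]" form, and parses, which checks whether a line is syntactically valid. The shortened broken and fixed lines are illustrative, not the exact script.

```python
import ast

def to_spec_path(schema_path: str) -> str:
    """Hypothetical helper (not a Glue API): turn a printSchema-style path
    such as 'a.b.element.c' into the spec form 'a.b[].c'.
    Assumes no real field is literally named 'element'."""
    out = []
    for part in schema_path.split("."):
        if part == "element" and out:
            out[-1] += "[]"   # mark the preceding field as an array
        else:
            out.append(part)
    return ".".join(out)

def parses(source: str) -> bool:
    """Hypothetical helper: True if the source is syntactically valid Python.
    ast.parse only checks syntax, so undefined names are fine."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

# Paths as written in the question.
schema_paths = [
    "in_reply_to_user_id",
    "user.id",
    "quoted_status.user.id",
    "entities.user_mentions.element.id",
    "entities.media.element.source_user_id",
    "retweeted_status.user.id",
    "extended_entities.media.element.source_user_id",
]
specs = [(to_spec_path(p), "cast:long") for p in schema_paths]

# Shortened versions of the call: the first keeps the stray ")" after "]".
broken = 'r = ResolveChoice.apply(frame = f, specs = [("user.id", "cast:long")]), transformation_ctx = "r")'
fixed = 'r = ResolveChoice.apply(frame = f, specs = [("user.id", "cast:long")], transformation_ctx = "r")'

print(parses(broken), parses(fixed))
for spec in specs:
    print(spec)
```

In a Glue job, the resulting specs list could then be passed to resolveChoice or ResolveChoice.apply in place of the hand-written list, assuming the frame variable from your own script.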

AWS
Tom_B
answered 5 years ago
