Syntax error with Glue dynamic data frame

I tried to use a Glue ETL job to convert nested JSON data to Parquet. It works, but because it samples the data, it couldn't determine the data type of some fields and used a struct covering all possible values, which changed the schema.

I tried to use ResolveChoice to force it to use one type instead of a struct, but I keep getting a syntax error. I followed the docs but still cannot figure it out. Here is my code; could someone help? What is the right syntax? Does it support nested data?

resolvechoice2 = ResolveChoice.apply(frame = applymapping1, specs = [("in_reply_to_user_id", "project:long"),("user.id", "project:long"),("quoted_status.user.id", "project:long"),("entities.user_mentions.element.id", "project:long"),("entities.media.element.source_user_id", "project:long"),("retweeted_status.user.id", "project:long"),("extended_entities.media.element.source_user_id", "project:long")]), transformation_ctx = "resolvechoice2")

Syntax Error: File "/tmp/g-7d4adc26f6e5bb15ba8d86e7b4fced4ba08ca29d-6847579850030189659/script_2019-04-30-06-11-30.py", line 30 resolvechoice2 = ResolveChoice.apply(frame = applymapping1, specs = [("in_reply_to_user_id", "project:long"),("user.id", "project:long"),("quoted_status.user.id", "project:long"),("entities.user_mentions.element.id", "project:long"),("entities.media.element.source_user_id", "project:long"),("retweeted_status.user.id", "project:long"),("extended_entities.media.element.source_user_id", "project:long")]), transformation_ctx = "resolvechoice2") SyntaxError: invalid syntax
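
The SyntaxError is raised when Python parses the script, before any Glue code runs: the `)` immediately after the closing `]` of `specs` ends the `ResolveChoice.apply(` call early, so `transformation_ctx` falls outside it. A minimal sketch that reproduces this without Glue, assuming a `specs` list shortened to one entry for illustration (`parses_ok` is a hypothetical helper, not part of any Glue API):

```python
def parses_ok(source):
    """Return True if `source` is syntactically valid Python.

    compile() only parses the text; nothing is executed, so the Glue
    names inside the string do not need to be importable.
    """
    try:
        compile(source, "<glue-script>", "exec")
        return True
    except SyntaxError:
        return False


# Shortened copy of the failing line: note the extra ")" right after "]".
broken = (
    'resolvechoice2 = ResolveChoice.apply(frame = applymapping1, '
    'specs = [("user.id", "project:long")]), '
    'transformation_ctx = "resolvechoice2")'
)

# Same line with the stray ")" removed, so transformation_ctx stays inside the call.
fixed = (
    'resolvechoice2 = ResolveChoice.apply(frame = applymapping1, '
    'specs = [("user.id", "project:long")], '
    'transformation_ctx = "resolvechoice2")'
)

print(parses_ok(broken))
print(parses_ok(fixed))
```

This only addresses the parse error; whether each path and action in `specs` is accepted by Glue is a separate question.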

Thanks, Juan

Moderator
Asked 5 years ago · 603 views
1 Answer
Accepted Answer
dynamicframe0_with_cast = datasource0.resolveChoice(
    specs=[
        ("in_reply_to_user_id", "cast:long"),
        ("user.id", "cast:long"),
        ("quoted_status.user.id", "cast:long"),
        ("entities.user_mentions[].id", "cast:long"),
        ("entities.media[].source_user_id", "cast:long"),
        ("retweeted_status.user.id", "cast:long"),
        ("extended_entities.media[].source_user_id", "cast:long"),
    ]
)

When casting fields inside embedded collections, use "[]" in the path to denote an array, not the ".element" segment that appears in the printSchema output.
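
As a rough illustration of that rule, the sketch below rewrites printSchema-style paths into the "[]" form and builds a `specs` list from them. `to_spec_path` is a hypothetical helper, not part of the Glue library, and it assumes no real field is literally named `element`; the final Glue call is left as a comment because it only runs inside a Glue job.

```python
def to_spec_path(schema_path):
    """Convert 'a.b.element.c' (printSchema style) to 'a.b[].c' (spec style).

    Assumption: every 'element' segment marks an array, i.e. no actual
    field in the data is named 'element'.
    """
    parts = []
    for segment in schema_path.split("."):
        if segment == "element" and parts:
            # Mark the preceding field as an array instead of adding a segment.
            parts[-1] += "[]"
        else:
            parts.append(segment)
    return ".".join(parts)


# Paths as they appear in the question, copied from the printSchema output.
schema_paths = [
    "in_reply_to_user_id",
    "user.id",
    "quoted_status.user.id",
    "entities.user_mentions.element.id",
    "entities.media.element.source_user_id",
    "retweeted_status.user.id",
    "extended_entities.media.element.source_user_id",
]

# One (path, action) tuple per ambiguous field, all cast to long.
specs = [(to_spec_path(path), "cast:long") for path in schema_paths]

for spec in specs:
    print(spec)

# Inside a Glue job, the list would then be passed on, for example:
# dynamicframe0_with_cast = datasource0.resolveChoice(specs=specs)
```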

AWS
Tom_B
Answered 5 years ago
