Syntax error with Glue dynamic data frame

0

I tried to use glue ETL to convert nested json data to parquet. It works but since it does sampling, it couldn't determine data type of some fields and use struct for all possible value which changed the schema.

I tried to use ResolveChoice to force it use one type instead of struct. but I keep getting a syntax error. I followed the doc but still cannot figure it out. Here is my code, could someone help? what's the right syntax? does it support nested data?

resolvechoice2 = ResolveChoice.apply(frame = applymapping1, specs = [("in_reply_to_user_id", "project:long"),("user.id", "project:long"),("quoted_status.user.id", "project:long"),("entities.user_mentions.element.id", "project:long"),("entities.media.element.source_user_id", "project:long"),("retweeted_status.user.id", "project:long"),("extended_entities.media.element.source_user_id", "project:long")]), transformation_ctx = "resolvechoice2")

Syntax Error: File "/tmp/g-7d4adc26f6e5bb15ba8d86e7b4fced4ba08ca29d-6847579850030189659/script_2019-04-30-06-11-30.py", line 30 resolvechoice2 = ResolveChoice.apply(frame = applymapping1, specs = [("in_reply_to_user_id", "project:long"),("user.id", "project:long"),("quoted_status.user.id", "project:long"),("entities.user_mentions.element.id", "project:long"),("entities.media.element.source_user_id", "project:long"),("retweeted_status.user.id", "project:long"),("extended_entities.media.element.source_user_id", "project:long")]), transformation_ctx = "resolvechoice2") SyntaxError: invalid syntax

Thanks, Juan

审核人员
已提问 5 年前670 查看次数
1 回答
0
已接受的回答
dynamicframe0_with_cast = datasource0.resolveChoice(
    specs = [("in_reply_to_user_id", "cast:long"),
                    ("user.id", "cast:long"),
                    ("quoted_status.user.id", "cast:long"),
                    ("entities.user_mentions[].id", "cast:long"),
                    ("entities.media[].source_user_id", "cast:long"),
                    ("retweeted_status.user.id", "cast:long"),
                    ("extended_entities.media[].source_user_id", "cast:long")])

When casting embedded collections, "[]" must be used when specifying array types, not ".element" which is in the printSchema output.

AWS
Tom_B
已回答 5 年前

您未登录。 登录 发布回答。

一个好的回答可以清楚地解答问题和提供建设性反馈,并能促进提问者的职业发展。

回答问题的准则