Syntax error with Glue dynamic data frame

0

I tried to use glue ETL to convert nested json data to parquet. It works but since it does sampling, it couldn't determine data type of some fields and use struct for all possible value which changed the schema.

I tried to use ResolveChoice to force it use one type instead of struct. but I keep getting a syntax error. I followed the doc but still cannot figure it out. Here is my code, could someone help? what's the right syntax? does it support nested data?

resolvechoice2 = ResolveChoice.apply(frame = applymapping1, specs = [("in_reply_to_user_id", "project:long"),("user.id", "project:long"),("quoted_status.user.id", "project:long"),("entities.user_mentions.element.id", "project:long"),("entities.media.element.source_user_id", "project:long"),("retweeted_status.user.id", "project:long"),("extended_entities.media.element.source_user_id", "project:long")]), transformation_ctx = "resolvechoice2")

Syntax Error: File "/tmp/g-7d4adc26f6e5bb15ba8d86e7b4fced4ba08ca29d-6847579850030189659/script_2019-04-30-06-11-30.py", line 30 resolvechoice2 = ResolveChoice.apply(frame = applymapping1, specs = [("in_reply_to_user_id", "project:long"),("user.id", "project:long"),("quoted_status.user.id", "project:long"),("entities.user_mentions.element.id", "project:long"),("entities.media.element.source_user_id", "project:long"),("retweeted_status.user.id", "project:long"),("extended_entities.media.element.source_user_id", "project:long")]), transformation_ctx = "resolvechoice2") SyntaxError: invalid syntax

Thanks, Juan

MODERADOR
feita há 5 anos670 visualizações
1 Resposta
0
Resposta aceita
dynamicframe0_with_cast = datasource0.resolveChoice(
    specs = [("in_reply_to_user_id", "cast:long"),
                    ("user.id", "cast:long"),
                    ("quoted_status.user.id", "cast:long"),
                    ("entities.user_mentions[].id", "cast:long"),
                    ("entities.media[].source_user_id", "cast:long"),
                    ("retweeted_status.user.id", "cast:long"),
                    ("extended_entities.media[].source_user_id", "cast:long")])

When casting embedded collections, "[]" must be used when specifying array types, not ".element" which is in the printSchema output.

AWS
Tom_B
respondido há 5 anos

Você não está conectado. Fazer login para postar uma resposta.

Uma boa resposta responde claramente à pergunta, dá feedback construtivo e incentiva o crescimento profissional de quem perguntou.

Diretrizes para responder a perguntas