In my testing, I was able to run column-specific queries against a Parquet table that has a dot in one of its column names.
Table: parquet_table
root
|-- name: string
|-- url: string
|-- sample.key: string
Query:
SELECT "sample.key" FROM "parquet_table" limit 10;
SELECT * FROM "parquet_table" WHERE "sample.key" LIKE 'sample%' limit 10;
Can you share a bit more detail? What schema does your table have, and which query produced the errors?
It turns out that when querying with Athena I did not enclose the column names containing dots in double quotes, hence the error.
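For anyone generating such queries from a script, here is a minimal sketch of quoting identifiers before they go into an Athena SELECT. The helper names `quote_identifier` and `build_select` are made up for illustration, and the sketch assumes the usual SQL convention of doubling any embedded double quote; it only builds the query string and does not call Athena.

```python
def quote_identifier(name):
    """Wrap an identifier in double quotes, doubling any embedded double quote."""
    return '"' + name.replace('"', '""') + '"'


def build_select(table, columns, limit=10):
    """Build a SELECT statement with every identifier double-quoted (hypothetical helper)."""
    column_list = ", ".join(quote_identifier(column) for column in columns)
    return f"SELECT {column_list} FROM {quote_identifier(table)} LIMIT {int(limit)};"


# Table and column names taken from the example schema earlier in the thread.
print(build_select("parquet_table", ["sample.key"]))
```

Quoting every identifier, not only the ones with dots, keeps the generated SQL uniform and avoids having to decide which names need it.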
I am still interested in removing the dots from the column names, so I would like to know a good approach for renaming multiple columns in AWS Glue. I changed my approach: I first convert the DynamicFrame to a PySpark DataFrame and then use a snippet that I found on Stack Overflow.
new_column_name_list = list(map(lambda x: x.replace(".", "_"), df_relationalized.columns))
df_renamed = df_relationalized.toDF(*new_column_name_list)
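One thing the snippet above does not guard against is two columns collapsing into the same name after the replacement (for example `a.b` and `a_b`). Below is a small sketch of the same renaming with a collision check. The function name `sanitize_column_names` is hypothetical, and the logic is plain Python, so it can be tried without a Spark session.

```python
def sanitize_column_names(columns, old=".", new="_"):
    """Return the column names with `old` replaced by `new`, failing on collisions."""
    renamed = [name.replace(old, new) for name in columns]
    seen = {}  # maps each new name to the original name that produced it
    for original, candidate in zip(columns, renamed):
        if candidate in seen:
            raise ValueError(
                f"{original!r} and {seen[candidate]!r} both map to {candidate!r}"
            )
        seen[candidate] = original
    return renamed


# Column names from the example schema earlier in the thread.
print(sanitize_column_names(["name", "url", "sample.key"]))

# With a real PySpark DataFrame, the result would be applied as in the post:
# df_renamed = df_relationalized.toDF(*sanitize_column_names(df_relationalized.columns))
```

Failing loudly on a collision is a design choice for this sketch; appending a suffix to the duplicate would also work if silent renaming is acceptable.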
Do you have any STRUCT data types in your columns? Does this solution also rename the fields inside the struct?
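As far as I know, `toDF(*names)` only renames top-level columns, so fields nested inside a struct keep their dots. One possible way to handle nested fields is to rewrite the schema itself. The sketch below works on the JSON-dict form of a Spark schema (the shape returned by `df.schema.jsonValue()`) and renames every field at every depth. The function name `rename_schema_fields` and the sample fields `payload` and `inner.key` are made up for illustration, and the final step of applying the schema is shown only as a comment because I have not tested it on Glue.

```python
def rename_schema_fields(node, old=".", new="_"):
    """Recursively copy a Spark schema in JSON-dict form, renaming every field."""
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            if key == "name" and isinstance(value, str):
                out[key] = value.replace(old, new)  # a struct field name
            elif key == "metadata":
                out[key] = value  # leave user metadata untouched
            else:
                out[key] = rename_schema_fields(value, old, new)
        return out
    if isinstance(node, list):
        return [rename_schema_fields(item, old, new) for item in node]
    return node  # primitive type names, booleans, etc.


# Hypothetical schema: one dotted top-level column and one dotted nested field.
schema_json = {
    "type": "struct",
    "fields": [
        {"name": "sample.key", "type": "string", "nullable": True, "metadata": {}},
        {
            "name": "payload",
            "type": {
                "type": "struct",
                "fields": [
                    {"name": "inner.key", "type": "string", "nullable": True, "metadata": {}}
                ],
            },
            "nullable": True,
            "metadata": {},
        },
    ],
}

renamed = rename_schema_fields(schema_json)
print([field["name"] for field in renamed["fields"]])

# Assumed way to apply it with PySpark (untested here):
# from pyspark.sql.types import StructType
# new_schema = StructType.fromJson(rename_schema_fields(df.schema.jsonValue()))
# df_renamed = spark.createDataFrame(df.rdd, new_schema)
```

If the data has already gone through Relationalize, the structs are usually flattened into top-level columns, in which case the simpler top-level renaming is enough.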