In my testing, I was able to run column-specific queries against a Parquet table that has a dot in a column name.
Table: parquet_table
root
|-- name: string
|-- url: string
|-- sample.key: string
Queries:
SELECT "sample.key" FROM "parquet_table" limit 10;
SELECT * FROM "parquet_table" WHERE "sample.key" LIKE 'sample%' limit 10;
Can you share a few more details? What schema does your table have, and which query produced the error?
It turns out that when querying with Athena I had not enclosed the column names containing dots in double quotes, which caused the error.
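For anyone generating such queries from code, here is a minimal sketch of quoting identifiers before they go into an Athena statement. The helper names `quote_identifier` and `build_select` are made up for illustration and are not part of any AWS library; the sketch assumes Athena's usual double-quote identifier syntax, with an embedded double quote escaped by doubling it.

```python
def quote_identifier(name):
    """Wrap a column or table name in double quotes for an Athena query."""
    # Assumption: an embedded double quote is escaped by doubling it.
    return '"' + name.replace('"', '""') + '"'


def build_select(table, columns, limit=10):
    """Build a SELECT statement in which every identifier is quoted."""
    column_list = ", ".join(quote_identifier(c) for c in columns)
    return f"SELECT {column_list} FROM {quote_identifier(table)} LIMIT {limit}"


# Hypothetical usage with the table from this thread.
query = build_select("parquet_table", ["name", "sample.key"])
print(query)
```

Quoting every identifier, not only the ones with dots, keeps the helper simple and avoids having to decide which names are special.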
I am still interested in removing the dots from the column names, so I would like to know a good approach for renaming multiple columns in AWS Glue. I changed my approach: I first convert the DynamicFrame to a PySpark DataFrame and then use a snippet that I found on Stack Overflow.
new_column_name_list = list(map(lambda x: x.replace(".", "_"), df_relationalized.columns))
df_renamed = df_relationalized.toDF(*new_column_name_list)
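One thing the snippet above does not guard against is two columns ending up with the same name after the replacement (for example `sample.key` and `sample_key`). A small sketch of a check for that follows; `underscore_names` is a hypothetical helper, not a Glue or PySpark function, and it only works on the plain list of column names.

```python
def underscore_names(columns):
    """Replace dots with underscores, failing loudly if two names collide."""
    renamed = [c.replace(".", "_") for c in columns]
    seen = {}  # maps each new name to the original column that produced it
    for old, new in zip(columns, renamed):
        if new in seen:
            raise ValueError(f"{old!r} and {seen[new]!r} both map to {new!r}")
        seen[new] = old
    return renamed


# Assumed usage with the DataFrame from the snippet above:
# df_renamed = df_relationalized.toDF(*underscore_names(df_relationalized.columns))
print(underscore_names(["name", "url", "sample.key"]))
```

Because `toDF` assigns the new names by position, the returned list keeps the original column order.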
Do any of your columns have a STRUCT data type? If so, does this solution also rename the fields inside the struct?