In my testing, I was able to run column-specific queries against a Parquet table that has a dot in a column name.
Table: parquet_table
root
|-- name: string
|-- url: string
|-- sample.key: string
Query:
SELECT "sample.key" FROM "parquet_table" limit 10;
SELECT * FROM "parquet_table" WHERE "sample.key" LIKE 'sample%' limit 10;
Can you explain in a bit more detail? What schema does your table have? Which query produced the errors?
It turns out that when querying with Athena I had not enclosed the column names containing dots in double quotes, which caused the error.
I am still interested in removing the dots from the column names, so I would like to know a good approach for renaming multiple columns in AWS Glue. I changed my approach: I first convert the DynamicFrame to a PySpark DataFrame and then use a snippet that I found on Stack Overflow:
new_column_name_list = list(map(lambda x: x.replace(".", "_"), df_relationalized.columns))
df_renamed = df_relationalized.toDF(*new_column_name_list)
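One thing to watch with the snippet above: if the table already has a column such as sample_key next to sample.key, the replacement would produce two columns with the same name. Here is a minimal sketch of the same renaming with a collision check added. The helper name sanitize_column_names is made up for illustration, and the commented usage assumes df_relationalized is a PySpark DataFrame as in the snippet above.

```python
def sanitize_column_names(columns, old=".", new="_"):
    """Replace `old` with `new` in each column name; fail if two names collide.

    Hypothetical helper for illustration, not part of Glue or PySpark.
    """
    renamed = [name.replace(old, new) for name in columns]
    seen = {}  # new name -> original name that produced it
    for original, candidate in zip(columns, renamed):
        if candidate in seen:
            raise ValueError(
                f"{original!r} and {seen[candidate]!r} both map to {candidate!r}"
            )
        seen[candidate] = original
    return renamed


# Assumed usage with a PySpark DataFrame (not executed here):
#   new_names = sanitize_column_names(df_relationalized.columns)
#   df_renamed = df_relationalized.toDF(*new_names)

print(sanitize_column_names(["name", "url", "sample.key"]))
# prints ['name', 'url', 'sample_key']
```

Failing loudly seemed safer to me than silently ending up with duplicate column names, but you could also append a suffix instead of raising.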
Do you have any STRUCT columns? Does this solution also rename the fields inside the struct?
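As far as I know, toDF(*names) only renames top-level columns, so field names inside a struct stay as they are. If nested fields also need renaming, one possible approach is to rewrite the schema itself. The sketch below works on the plain dict form of a Spark schema (the layout returned by StructType.jsonValue()); the function name rename_fields is hypothetical, and applying the rewritten schema back to the data is left out because it depends on your job.

```python
def rename_fields(dtype, old=".", new="_"):
    """Recursively replace `old` with `new` in every field name of a schema dict.

    `dtype` follows the JSON layout of a Spark schema: primitives are strings
    such as "string"; structs, arrays and maps are dicts with a "type" key.
    Hypothetical helper for illustration.
    """
    if not isinstance(dtype, dict):
        return dtype  # primitive type name, nothing to rename
    kind = dtype.get("type")
    if kind == "struct":
        return {
            **dtype,
            "fields": [
                {
                    **field,
                    "name": field["name"].replace(old, new),
                    "type": rename_fields(field["type"], old, new),
                }
                for field in dtype["fields"]
            ],
        }
    if kind == "array":
        return {**dtype, "elementType": rename_fields(dtype["elementType"], old, new)}
    if kind == "map":
        return {
            **dtype,
            "keyType": rename_fields(dtype["keyType"], old, new),
            "valueType": rename_fields(dtype["valueType"], old, new),
        }
    return dtype


# Example schema dict with a dotted name at the top level and inside a struct.
schema = {
    "type": "struct",
    "fields": [
        {"name": "name", "type": "string", "nullable": True, "metadata": {}},
        {
            "name": "sample.key",
            "type": {
                "type": "struct",
                "fields": [
                    {"name": "inner.id", "type": "string", "nullable": True, "metadata": {}}
                ],
            },
            "nullable": True,
            "metadata": {},
        },
    ],
}

renamed = rename_fields(schema)
print([f["name"] for f in renamed["fields"]])
```

With PySpark available, the resulting dict could presumably be turned back into a schema object with StructType.fromJson(renamed); I have not tested that step against a Glue job.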