- Newest
- Most votes
- Most comments
Adding columns with non-ASCII characters like Korean to an Iceberg table via Athena SQL can be challenging, as Athena may not directly support such column names in its SQL syntax. However, there are a few approaches you can consider to work around this limitation:
-
Use ASCII-compatible column names: Instead of using Korean characters directly, you could use romanized versions of the Korean words or English translations as column names. For example, instead of
진단결과, you could usejindan_gyeolgwaordiagnosis_result. This approach ensures compatibility with Athena's SQL syntax. -
Use AWS Glue ETL jobs: You can create an AWS Glue ETL job to update the table schema and add the column with Korean characters. In your Glue ETL script, you would read the existing table data, define the new schema with the Korean column name, and write the data back to the same table or a new one with the updated schema. When writing the data, make sure to set the appropriate job options to update the schema in the Glue Data Catalog.
-
Use Apache Spark with Amazon EMR: If you have access to Apache Spark through Amazon EMR, you can use Spark SQL to alter the table and add the column with Korean characters. Spark may have better support for non-ASCII column names.
-
Manual update through AWS Glue Console: As a last resort, you can manually edit the table schema through the AWS Glue Console. Navigate to the table in the Glue Data Catalog, edit the schema, and add the new column with the Korean name.
-
Use Iceberg Table API: If possible, you could use the Iceberg Table API directly to add the column, which might have better support for non-ASCII characters. However, this would require using a programming language like Java or Python rather than SQL.
Remember that even if you manage to add the column with Korean characters, you may face challenges when querying or manipulating this column in Athena later. It's generally recommended to use ASCII-compatible names for better compatibility across different tools and services in the AWS ecosystem.
If you absolutely need to use Korean characters in column names, consider using a combination of these approaches - perhaps use ASCII-compatible names in Athena and maintain a mapping to the Korean names in your application layer or documentation.
Sources
How to add partition fields to Iceberg table | AWS re:Post
ALTER TABLE ADD COLUMNS - Amazon Athena
Relevant content
- AWS OFFICIALUpdated 8 months ago
