Skip to content

How to add a column with Korean characters in Iceberg via Athena?

0

I'm currently working with an AWS Athena Iceberg table and trying to use ALTER TABLE ... ADD COLUMNS to add new columns. However, I need to add a column whose name is written in Korean (e.g., 진단결과), and I'm encountering a syntax error like: mismatched input 'COLUMNS'. Expecting: '.', 'ADD'

I suspect the error is due to non-ASCII characters (Korean in this case) in the column name. I’ve tried enclosing the column name in double quotes ("진단결과"), but it still fails.

Is there a recommended way to add non-English (e.g., Korean) column names to an Iceberg table via Athena SQL? If not supported, is there any workaround (like using Glue Data Catalog or Apache Iceberg APIs directly)?

Any guidance would be appreciated.

asked a year ago240 views
1 Answer
1

Adding columns with non-ASCII characters like Korean to an Iceberg table via Athena SQL can be challenging, as Athena may not directly support such column names in its SQL syntax. However, there are a few approaches you can consider to work around this limitation:

  1. Use ASCII-compatible column names: Instead of using Korean characters directly, you could use romanized versions of the Korean words or English translations as column names. For example, instead of 진단결과, you could use jindan_gyeolgwa or diagnosis_result. This approach ensures compatibility with Athena's SQL syntax.

  2. Use AWS Glue ETL jobs: You can create an AWS Glue ETL job to update the table schema and add the column with Korean characters. In your Glue ETL script, you would read the existing table data, define the new schema with the Korean column name, and write the data back to the same table or a new one with the updated schema. When writing the data, make sure to set the appropriate job options to update the schema in the Glue Data Catalog.

  3. Use Apache Spark with Amazon EMR: If you have access to Apache Spark through Amazon EMR, you can use Spark SQL to alter the table and add the column with Korean characters. Spark may have better support for non-ASCII column names.

  4. Manual update through AWS Glue Console: As a last resort, you can manually edit the table schema through the AWS Glue Console. Navigate to the table in the Glue Data Catalog, edit the schema, and add the new column with the Korean name.

  5. Use Iceberg Table API: If possible, you could use the Iceberg Table API directly to add the column, which might have better support for non-ASCII characters. However, this would require using a programming language like Java or Python rather than SQL.

Remember that even if you manage to add the column with Korean characters, you may face challenges when querying or manipulating this column in Athena later. It's generally recommended to use ASCII-compatible names for better compatibility across different tools and services in the AWS ecosystem.

If you absolutely need to use Korean characters in column names, consider using a combination of these approaches - perhaps use ASCII-compatible names in Athena and maintain a mapping to the Korean names in your application layer or documentation.
Sources
How to add partition fields to Iceberg table | AWS re:Post
ALTER TABLE ADD COLUMNS - Amazon Athena

answered a year ago
EXPERT
reviewed a year ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.