Athena Table timestamp column with null values


I have some data in a csv in s3 in the format:

id  confirmed_at   cancelled_at   suspended_at
a   1678414716269  1678414716269
b   1678316522493
c   1678231915787  1678231898220  1678231915787

I've defined my table with the structure

column name    data type
id             string
confirmed_at   timestamp
cancelled_at   timestamp
suspended_at   timestamp

but when I query the data with Athena, it fails on rows where cancelled_at or suspended_at is blank, with the error: Error parsing field value '' for field 1: For input string: ""

I am using OpenCSVSerDe, but I have also tried LazySimpleSerDe with 'serialization.null.format'='' and get the same error.

Is it possible for Athena to support a timestamp column that may be null or blank?

alvinz
asked 1 year ago · 2490 views
1 Answer

Hello,

Please note that errors that specify a null or empty input string (For input string: "") occur when both of the following are true:

  • You're using Athena with OpenCSVSerDe, which means that your source data uses double quotes (") as the default quote character.
  • The source data contains null values ("") or empty cells.

This is described in the following document:

 https://aws.amazon.com/premiumsupport/knowledge-center/athena-hive-bad-data-error-csv/

This error appears to occur because some of the columns contain empty cells.

As mentioned in the above document, to mitigate the issue: “Define each column as STRING. The parser in Athena parses the values from STRING into actual types based on what it finds. This prevents Athena from throwing an error when it finds null values (empty strings with double quotes and no spaces) or empty cells (no values or double quotes).”
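
Once every column is declared as STRING, the blank cells still have to become real NULL timestamps at query time. The following is a minimal sketch of one way to do that, not an official recipe. It assumes the values are epoch milliseconds, as in the sample rows in the question. The table name `events` is hypothetical, and `parse_epoch_millis` is only a local Python mirror of the SQL expression, included to show the intended behaviour.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical table name; the column names come from the question.
TABLE = "events"

# Athena SQL, assuming all four columns are declared as STRING.
# NULLIF turns a blank cell into NULL, and from_unixtime converts
# epoch seconds (milliseconds / 1000) into a timestamp.
QUERY = f"""
SELECT
  id,
  from_unixtime(CAST(NULLIF(confirmed_at, '') AS double) / 1000) AS confirmed_at,
  from_unixtime(CAST(NULLIF(cancelled_at, '') AS double) / 1000) AS cancelled_at,
  from_unixtime(CAST(NULLIF(suspended_at, '') AS double) / 1000) AS suspended_at
FROM {TABLE}
"""

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_epoch_millis(cell: str) -> Optional[datetime]:
    """Local mirror of the SQL above: blank -> None, else a UTC datetime."""
    if cell == "":
        return None
    # Integer arithmetic avoids floating-point rounding of the milliseconds.
    return EPOCH + timedelta(milliseconds=int(cell))


if __name__ == "__main__":
    print(QUERY)
    print(parse_epoch_millis("1678316522493"))
    print(parse_epoch_millis(""))
```

If a column may also contain values that are not numeric at all, replacing CAST with TRY_CAST makes those rows return NULL instead of failing the query.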

AWS
answered 1 year ago
