Athena Table timestamp column with null values

I have some data in a CSV file in S3 in the following format:

id | confirmed_at | cancelled_at | suspended_at
a | 1678414716269 | | 1678414716269
b | 1678316522493 | |
c | 1678231915787 | 1678231898220 | 1678231915787

I've defined my table with the following structure:

column name | data type
id | string
confirmed_at | timestamp
cancelled_at | timestamp
suspended_at | timestamp

But when I query the data in Athena, it fails on rows where cancelled_at or suspended_at is blank, with the error: Error parsing field value '' for field 1: For input string: ""

I am using OpenCSVSerDe, but I have also tried LazySimpleSerDe with 'serialization.null.format'='' and get the same error.

Is it possible for Athena to support a timestamp column that can be null or blank?

alvinz
Asked 1 year ago. Viewed 2,493 times.
1 answer

Hello,

Please note that errors that report an empty input string (For input string: "") occur when both of the following are true:

  • You're using Athena with OpenCSVSerDe, which means that your source data uses double quotes (") as the default quote character.
  • The source data contains null values ("") or empty cells.
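To make the two conditions above concrete, here is a small local analogy in Python. It does not run Athena or OpenCSVSerDe: parse_strict and find_failures are hypothetical helpers, the values are assumed to be epoch milliseconds as in the question's sample, and row d is a made-up row that uses quoted empty strings (""). Each timestamp cell is converted strictly, the way a typed column would be, so both the empty cells and the "" cells fail to parse, which is the same kind of failure the Athena error reports.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

# Rows b and c are modeled on the question; row d is hypothetical and
# uses quoted empty strings ("") instead of truly empty cells.
SAMPLE = (
    "id,confirmed_at,cancelled_at,suspended_at\n"
    "b,1678316522493,,\n"
    "c,1678231915787,1678231898220,1678231915787\n"
    'd,1678316522493,"",""\n'
)

TIMESTAMP_COLUMNS = ("confirmed_at", "cancelled_at", "suspended_at")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_strict(cell: str) -> datetime:
    """Hypothetical strict converter: epoch milliseconds -> UTC datetime.

    int("") raises ValueError, much like a typed TIMESTAMP column
    rejecting an empty cell.
    """
    return EPOCH + timedelta(milliseconds=int(cell))


def find_failures(text: str) -> list[tuple[str, str]]:
    """Return (id, column) pairs whose cell cannot be parsed strictly."""
    failures = []
    for row in csv.DictReader(io.StringIO(text)):
        for col in TIMESTAMP_COLUMNS:
            try:
                parse_strict(row[col])
            except ValueError:
                failures.append((row["id"], col))
    return failures


if __name__ == "__main__":
    for row_id, col in find_failures(SAMPLE):
        print(f"row {row_id}: blank {col} is not a valid timestamp")
```

Python's csv module reads both an empty cell and a quoted "" as the empty string, so rows b and d fail in the same way.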

This is described in the following Knowledge Center article:

 https://aws.amazon.com/premiumsupport/knowledge-center/athena-hive-bad-data-error-csv/

It seems this error occurs because some of your columns contain empty cells.

As that document explains, the way to mitigate the issue is to “Define each column as STRING. The parser in Athena parses the values from STRING into actual types based on what it finds. This prevents Athena from throwing an error when it finds null values (empty strings with double quotes and no spaces) or empty cells (no values or double quotes).”
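The string-first idea can be mirrored locally. The Python sketch below is only an analogy under stated assumptions: it does not call Athena, to_timestamp and load_rows are hypothetical helpers, the values are assumed to be epoch milliseconds, and row d is a made-up row with quoted empty strings. Every field is read as a string, and only non-blank timestamp cells are converted, so blank or "" cells become None (Python's counterpart of NULL) instead of raising an error.

```python
import csv
import io
from datetime import datetime, timedelta, timezone
from typing import Optional

# Rows b and c are modeled on the question; row d is hypothetical ("" cells).
SAMPLE = (
    "id,confirmed_at,cancelled_at,suspended_at\n"
    "b,1678316522493,,\n"
    "c,1678231915787,1678231898220,1678231915787\n"
    'd,1678316522493,"",""\n'
)

TIMESTAMP_COLUMNS = ("confirmed_at", "cancelled_at", "suspended_at")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_timestamp(cell: str) -> Optional[datetime]:
    """Convert an epoch-milliseconds string to UTC; blank -> None (NULL)."""
    cell = cell.strip()
    if not cell:
        return None
    return EPOCH + timedelta(milliseconds=int(cell))


def load_rows(text: str) -> list[dict]:
    """Read every field as a string first, then convert the timestamp columns."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        # raw holds only strings, like a table whose columns are all STRING.
        row = {"id": raw["id"]}
        for col in TIMESTAMP_COLUMNS:
            row[col] = to_timestamp(raw[col])
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in load_rows(SAMPLE):
        print(row)
```

The design choice is the one the article describes: the raw data is never rejected at read time, and the decision about what a blank value means is made in a separate conversion step.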

AWS
Answered 1 year ago
