Athena Table timestamp column with null values


I have some data in a CSV file in S3 in the following format:

id  confirmed_at  cancelled_at  suspended_at
a  1678414716269  1678414716269
b  1678316522493
c  1678231915787  1678231898220  1678231915787

I've defined my table with the structure

column name  data type
id  string
confirmed_at  timestamp
cancelled_at  timestamp
suspended_at  timestamp

but when I use Athena to query my data, it fails on rows where cancelled_at or suspended_at is blank, with the error: Error parsing field value '' for field 1: For input string: ""

I am using OpenCSVSerDe, but I have also tried LazySimpleSerDe with 'serialization.null.format'='' and get the same error.

Is it possible for Athena to support a timestamp column that can be null or blank?

alvinz
asked a year ago, 2429 views
1 Answer

Hello,

Please note that errors that specify a null or empty input string ("For input string: "") happen when both of the following are true:

  • You're using Athena with OpenCSVSerDe, which means that your source data uses double quotes (") as the default quote character.
  • The source data contains null values ("") or empty cells.

This is described in the following document:

 https://aws.amazon.com/premiumsupport/knowledge-center/athena-hive-bad-data-error-csv/

It seems that this error occurs because some of the columns contain empty cells.
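As a rough illustration of why an empty cell breaks a typed column, here is a minimal Python sketch. It is only an analogy and not Athena or SerDe code; the helper name parse_epoch_millis is hypothetical. A strict numeric parser has nothing to convert when the field is an empty string, so it raises an error instead of returning null.

```python
def parse_epoch_millis(field):
    """Strictly convert a CSV field to an integer, as a typed column would require."""
    return int(field)


try:
    parse_epoch_millis("")  # an empty cell, as in the failing rows
    error = None
except ValueError as exc:
    # Python's counterpart of the 'For input string: ""' failure
    error = exc
    print("failed to parse empty field:", exc)

# A populated cell parses without any problem.
print(parse_epoch_millis("1678316522493"))
```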

As mentioned in the above document, to mitigate the issue: “Define each column as STRING. The parser in Athena parses the values from STRING into actual types based on what it finds. This prevents Athena from throwing an error when it finds null values (empty strings with double quotes and no spaces) or empty cells (no values or double quotes).”
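The idea of reading every column as a string and converting afterwards can be sketched in Python as follows. This is a minimal sketch that runs outside Athena. The comma-separated sample, the choice of which cells are blank, and the names SAMPLE, EPOCH, parse_millis and load_rows are all assumptions made for illustration.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

# Hypothetical sample in the question's layout; which cells are blank is assumed.
SAMPLE = """id,confirmed_at,cancelled_at,suspended_at
b,1678316522493,,
c,1678231915787,1678231898220,1678231915787
"""

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_millis(value):
    """Return a UTC datetime for an epoch-millisecond string, or None if blank."""
    if value is None or value.strip() == "":
        return None
    return EPOCH + timedelta(milliseconds=int(value))


def load_rows(text):
    """Read every field as a string first, then convert the timestamp columns."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {"id": raw["id"]}
        for col in ("confirmed_at", "cancelled_at", "suspended_at"):
            row[col] = parse_millis(raw[col])
        rows.append(row)
    return rows


rows = load_rows(SAMPLE)
for row in rows:
    print(row)
```

In Athena itself, the same conversion would probably be done in the query over the STRING columns, for example by turning '' into NULL with NULLIF before casting to a number and passing it to from_unixtime. That expression is my assumption and is not taken from the linked document.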

AWS
answered a year ago
