Athena Table timestamp column with null values


I have some data in a CSV file in S3 in the following format:

id | confirmed_at  | cancelled_at  | suspended_at
a  | 1678414716269 | 1678414716269 |
b  | 1678316522493 |               |
c  | 1678231915787 | 1678231898220 | 1678231915787
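The values look like 13-digit Unix epoch timestamps in milliseconds. That unit is inferred from their length and is not stated in the question. A small Python sketch for checking what one of them decodes to (the helper name `epoch_millis_to_utc` is hypothetical):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def epoch_millis_to_utc(ms: int) -> datetime:
    """Convert Unix epoch milliseconds to an aware UTC datetime.

    Adding a timedelta keeps the millisecond part exact, avoiding the
    float rounding that dividing by 1000.0 could introduce.
    """
    return EPOCH + timedelta(milliseconds=ms)

print(epoch_millis_to_utc(1678414716269))  # 2023-03-10 02:18:36.269000+00:00
```

If the decoded dates look plausible (March 2023 here), dividing by 1000 is the right scaling when converting these values to timestamps.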

I've defined my table with the following structure:

column name  | data type
id           | string
confirmed_at | timestamp
cancelled_at | timestamp
suspended_at | timestamp

But when I query the data in Athena, it fails on rows where cancelled_at or suspended_at is blank, with the error:

Error parsing field value '' for field 1: For input string: ""

I am using OpenCSVSerDe, but I have also tried LazySimpleSerDe with 'serialization.null.format'='' and get the same error.

Is it possible for Athena to support a timestamp column that can be null or blank?

alvinz
asked a year ago · 2,492 views
1 Answer

Hello,

Please note that errors that mention a null or empty input string (For input string: "") happen when both of the following are true:

  • You're using Athena with OpenCSVSerDe, which means that your source data uses double quotes (") as the default quote character.
  • The source data contains null values ("") or empty cells.
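To check whether a file meets the second condition, a quick scan for empty cells can help. A minimal Python sketch follows; the sample is modeled on rows b and c from the question, but row b's blank cells are written once as a quoted "" and once as a truly empty field to illustrate both forms from the list above (a hypothetical variation). The function name `find_empty_cells` is also hypothetical.

```python
import csv
import io

# Hypothetical sample based on rows b and c from the question.
# Row b: cancelled_at is a quoted empty string, suspended_at is truly empty.
SAMPLE = """id,confirmed_at,cancelled_at,suspended_at
b,1678316522493,"",
c,1678231915787,1678231898220,1678231915787
"""

def find_empty_cells(text):
    """Return (row id, column name) pairs whose cell is empty."""
    empties = []
    for row in csv.DictReader(io.StringIO(text)):
        for column, value in row.items():
            # Python's csv module yields '' for both "" and a bare empty field.
            if value == "":
                empties.append((row["id"], column))
    return empties

print(find_empty_cells(SAMPLE))
```

Any pair reported in a column declared as TIMESTAMP (or another non-string type) is a candidate for the "For input string" error.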

This is described in the following Knowledge Center article:

 https://aws.amazon.com/premiumsupport/knowledge-center/athena-hive-bad-data-error-csv/

It seems this error occurs because some of the columns contain empty cells.

To mitigate the issue, the article recommends the following: “Define each column as STRING. The parser in Athena parses the values from STRING into actual types based on what it finds. This prevents Athena from throwing an error when it finds null values (empty strings with double quotes and no spaces) or empty cells (no values or double quotes).”
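The STRING-first approach can be sketched in Python: every column is read as text, an empty string becomes None (the equivalent of NULL), and any other value is parsed as epoch milliseconds. In an Athena query over STRING columns the same idea might be written as something like from_unixtime(CAST(NULLIF(cancelled_at, '') AS DOUBLE) / 1000); that expression is an illustrative, untested assumption and is not quoted from the article. The millisecond unit and the helper names `to_timestamp` and `load_rows` are assumptions as well.

```python
import csv
import io
from datetime import datetime, timedelta, timezone
from typing import Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
TIMESTAMP_COLUMNS = ("confirmed_at", "cancelled_at", "suspended_at")

def to_timestamp(raw: str) -> Optional[datetime]:
    """Map '' to None (NULL); otherwise parse epoch milliseconds (assumed unit)."""
    if raw.strip() == "":
        return None
    return EPOCH + timedelta(milliseconds=int(raw))

def load_rows(text: str):
    """Read every field as a string first, then convert the timestamp columns."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {
            "id": row["id"],
            **{col: to_timestamp(row[col]) for col in TIMESTAMP_COLUMNS},
        }

# Rows b and c from the question.
SAMPLE = """id,confirmed_at,cancelled_at,suspended_at
b,1678316522493,,
c,1678231915787,1678231898220,1678231915787
"""

for parsed in load_rows(SAMPLE):
    print(parsed)
```

Because the empty-string check happens before the numeric parse, blank cells become None instead of raising an error, which is the behavior the STRING column type makes possible.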

AWS
answered a year ago
