Amazon Athena error on querying DynamoDB exported data

0

Background

We've configured an export to s3 from dynamodb using the native dynamodb s3 export, and ION as the format output.

After this, we've created a table in Athena

CREATE EXTERNAL TABLE export_2022_07_01_v4 (
  `_PK` STRING,
  URL STRING,
  Item struct<
    `_PK`:string,
    URL:string
  >  
)
ROW FORMAT SERDE
 'com.amazon.ionhiveserde.IonHiveSerDe'
WITH SERDEPROPERTIES (
  "ignore_malformed_ion" = "true"
)
STORED AS ION
LOCATION '...';

Querying this works all right for small simple queries, but attempting to produce a full output with

UNLOAD (
  SELECT Item.URL FROM "helioptileexports"."export_2022_07_01_v4"  WHERE Item.URL IS NOT NULL
 ) to '...'  WITH (format = 'TEXTFILE')

Results in this error

HIVE_CURSOR_ERROR: Syntax error at line 1 offset 2: invalid syntax [state:STATE_BEFORE_ANNOTATION_DATAGRAM on token:TOKEN_SYMBOL_OPERATOR]
This query ran against the "helioptileexports" database, unless qualified by the query. Please post the error message on our forum  or contact customer support  with Query Id: f4ca5812-1194-41f1-bfda-00a1a5b2471b

Questions

  1. Is there a way to make Athena more tolerant of formatting errors on specific files? As shown in the example, we are attempting without success to use ignore_malformed_ion. Is there anything beyond that that can be done?
  2. Is this a bug on DynamoDB ION export process?
  3. Is there any mechanism or logging to identify the files which have the malformed data and remove them?
  • Try "ion.ignore_malformed" instead of "ignore_malformed_ion"

asked 2 years ago121 views
No Answers

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions