Amazon Athena error on querying DynamoDB exported data


Background

We've configured an export from DynamoDB to S3 using the native DynamoDB export-to-S3 feature, with Amazon Ion as the output format.

After this, we created a table in Athena:

CREATE EXTERNAL TABLE export_2022_07_01_v4 (
  `_PK` STRING,
  URL STRING,
  Item struct<
    `_PK`:string,
    URL:string
  >  
)
ROW FORMAT SERDE
 'com.amazon.ionhiveserde.IonHiveSerDe'
WITH SERDEPROPERTIES (
  "ignore_malformed_ion" = "true"
)
STORED AS ION
LOCATION '...';

Small, simple queries against this table work fine, but attempting to produce a full output with

UNLOAD (
  SELECT Item.URL FROM "helioptileexports"."export_2022_07_01_v4"
  WHERE Item.URL IS NOT NULL
) TO '...' WITH (format = 'TEXTFILE')

results in this error:

HIVE_CURSOR_ERROR: Syntax error at line 1 offset 2: invalid syntax [state:STATE_BEFORE_ANNOTATION_DATAGRAM on token:TOKEN_SYMBOL_OPERATOR]
This query ran against the "helioptileexports" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: f4ca5812-1194-41f1-bfda-00a1a5b2471b

Questions

  1. Is there a way to make Athena more tolerant of formatting errors in specific files? As the example shows, we tried the ignore_malformed_ion SerDe property without success. Is there anything else that can be done?
  2. Is this a bug in the DynamoDB Ion export process?
  3. Is there any mechanism or logging that identifies the files containing malformed data, so that we can remove them?

  • Try "ion.ignore_malformed" instead of "ignore_malformed_ion" in SERDEPROPERTIES.
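
The suggestion in the comment above could be applied roughly as follows. This is only a sketch: it reuses the columns from the question, the table name export_2022_07_01_v5 is a hypothetical name chosen so that the existing table is left untouched, and the LOCATION stays elided as in the original. It assumes "ion.ignore_malformed" is the property name the Ion Hive SerDe recognizes, as the comment states; whether it suppresses this particular HIVE_CURSOR_ERROR has not been confirmed in this thread.

```sql
-- Hypothetical new table (name is an assumption) using the property name
-- suggested in the comment instead of "ignore_malformed_ion".
CREATE EXTERNAL TABLE export_2022_07_01_v5 (
  `_PK` STRING,
  URL STRING,
  Item struct<
    `_PK`:string,
    URL:string
  >
)
ROW FORMAT SERDE
 'com.amazon.ionhiveserde.IonHiveSerDe'
WITH SERDEPROPERTIES (
  -- Assumed to tell the SerDe to skip malformed Ion instead of failing
  "ion.ignore_malformed" = "true"
)
STORED AS ION
LOCATION '...';
```

If the new table behaves as hoped, the UNLOAD statement from the question can be rerun against it by swapping in the new table name.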

Asked 2 years ago · 125 views
No answers
