Amazon Athena error when querying DynamoDB exported data

Background

We've configured an export from DynamoDB to S3 using DynamoDB's native export to S3 feature, with Amazon Ion as the output format.

We then created a table in Athena:

CREATE EXTERNAL TABLE export_2022_07_01_v4 (
  `_PK` STRING,
  URL STRING,
  Item struct<
    `_PK`:string,
    URL:string
  >  
)
ROW FORMAT SERDE
 'com.amazon.ionhiveserde.IonHiveSerDe'
WITH SERDEPROPERTIES (
  "ignore_malformed_ion" = "true"
)
STORED AS ION
LOCATION '...';

Querying this table works fine for small, simple queries, but attempting to produce a full output with

UNLOAD (
  SELECT Item.URL FROM "helioptileexports"."export_2022_07_01_v4" WHERE Item.URL IS NOT NULL
) TO '...' WITH (format = 'TEXTFILE')

results in this error:

HIVE_CURSOR_ERROR: Syntax error at line 1 offset 2: invalid syntax [state:STATE_BEFORE_ANNOTATION_DATAGRAM on token:TOKEN_SYMBOL_OPERATOR]
This query ran against the "helioptileexports" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: f4ca5812-1194-41f1-bfda-00a1a5b2471b

Questions

  1. Is there a way to make Athena more tolerant of formatting errors in specific files? As shown in the example, we have tried ignore_malformed_ion without success. Is there anything beyond that we can do?
  2. Is this a bug in the DynamoDB Ion export process?
  3. Is there any mechanism or logging to identify the files that contain the malformed data, so that we can remove them?
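For question 3, a rough way to test one file at a time, sketched under an assumption: Athena exposes the S3 object each row was read from through the "$path" pseudo-column, so a count restricted to a single object should only fail if that object is the malformed one. This assumes Athena skips the objects excluded by the "$path" predicate; if it still opens them, the query fails regardless and the files have to be checked outside Athena. The S3 URI below is a hypothetical placeholder for one object under the table's LOCATION.

```sql
-- Count the rows Athena can read from a single S3 object.
-- The URI is a hypothetical placeholder: substitute a real object key
-- from under the table's LOCATION and repeat for each object.
SELECT count(*) AS row_count
FROM "helioptileexports"."export_2022_07_01_v4"
WHERE "$path" = 's3://example-bucket/example-prefix/example-object';
```

A run that fails with the same HIVE_CURSOR_ERROR points at that object; a run that succeeds clears it.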

• Try "ion.ignore_malformed" instead of "ignore_malformed_ion".
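Following that suggestion, a minimal sketch of the table definition with only the SerDe property renamed. It assumes "ion.ignore_malformed" is the property name the Ion Hive SerDe actually reads (the suggestion's claim, not verified here) and uses a hypothetical new table name, export_2022_07_01_v5, so the original table stays available for comparison; the columns and the elided LOCATION are unchanged.

```sql
-- Hypothetical table name; same columns, SerDe and LOCATION as the
-- original definition, with the malformed-Ion property renamed.
CREATE EXTERNAL TABLE export_2022_07_01_v5 (
  `_PK` STRING,
  URL STRING,
  Item struct<
    `_PK`:string,
    URL:string
  >
)
ROW FORMAT SERDE
  'com.amazon.ionhiveserde.IonHiveSerDe'
WITH SERDEPROPERTIES (
  -- assumed property name, per the suggestion above
  "ion.ignore_malformed" = "true"
)
STORED AS ION
LOCATION '...';
```

If the UNLOAD then succeeds against the new table, the malformed data is presumably being skipped rather than repaired, so the output may be missing records.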

Asked 2 years ago · Viewed 125 times
No answers