Amazon Athena error when querying DynamoDB exported data

Background

We've configured an export from DynamoDB to S3 using the native DynamoDB S3 export feature, with Amazon Ion as the output format.

After this, we created a table in Athena:

CREATE EXTERNAL TABLE export_2022_07_01_v4 (
  `_PK` STRING,
  URL STRING,
  Item struct<
    `_PK`:string,
    URL:string
  >  
)
ROW FORMAT SERDE
 'com.amazon.ionhiveserde.IonHiveSerDe'
WITH SERDEPROPERTIES (
  "ignore_malformed_ion" = "true"
)
STORED AS ION
LOCATION '...';

Querying this table works fine for small, simple queries, but attempting to produce a full output with

UNLOAD (
  SELECT Item.URL
  FROM "helioptileexports"."export_2022_07_01_v4"
  WHERE Item.URL IS NOT NULL
)
TO '...'
WITH (format = 'TEXTFILE')

results in this error:

HIVE_CURSOR_ERROR: Syntax error at line 1 offset 2: invalid syntax [state:STATE_BEFORE_ANNOTATION_DATAGRAM on token:TOKEN_SYMBOL_OPERATOR]
This query ran against the "helioptileexports" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: f4ca5812-1194-41f1-bfda-00a1a5b2471b

Questions

  1. Is there a way to make Athena more tolerant of formatting errors in specific files? As shown in the table definition, we tried the ignore_malformed_ion SerDe property without success. Is there anything else we can do?
  2. Is this a bug in the DynamoDB Ion export process?
  3. Is there any mechanism or logging that identifies the files containing malformed data, so that we can remove them?

  • Try the SerDe property "ion.ignore_malformed" instead of "ignore_malformed_ion".
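
To make the suggestion above concrete, here is a minimal sketch of the same table definition with only the SerDe property name changed. It has not been verified against this particular export. The table name export_2022_07_01_v5 is hypothetical, and the elided LOCATION is left as in the question. The second statement uses Athena's "$path" pseudo-column to list the files that actually returned rows. It assumes that comparing that list with the S3 object listing would point to any file the SerDe skipped, which is worth confirming before deleting anything.

```sql
-- Sketch only: same schema as the original table, with the SerDe property
-- renamed as suggested in the comment. The table name is hypothetical.
CREATE EXTERNAL TABLE export_2022_07_01_v5 (
  `_PK` STRING,
  URL STRING,
  Item struct<
    `_PK`:string,
    URL:string
  >
)
ROW FORMAT SERDE
 'com.amazon.ionhiveserde.IonHiveSerDe'
WITH SERDEPROPERTIES (
  "ion.ignore_malformed" = "true"
)
STORED AS ION
LOCATION '...';

-- Assumption: files that could not be read contribute no rows, so they
-- would be absent from this list when compared with the S3 listing.
SELECT "$path" AS source_file, COUNT(*) AS row_count
FROM "helioptileexports"."export_2022_07_01_v5"
GROUP BY "$path"
ORDER BY source_file;
```

If the UNLOAD query succeeds against this table, the original failure was most likely caused by the property name and not by the export itself.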

Asked 2 years ago · 125 views
No answers
