Background
We've configured an export to s3 from dynamodb using the native dynamodb s3 export, and ION as the format output.
After this, we've created a table in Athena
CREATE EXTERNAL TABLE export_2022_07_01_v4 (
`_PK` STRING,
URL STRING,
Item struct<
`_PK`:string,
URL:string
>
)
ROW FORMAT SERDE
'com.amazon.ionhiveserde.IonHiveSerDe'
WITH SERDEPROPERTIES (
"ignore_malformed_ion" = "true"
)
STORED AS ION
LOCATION '...';
Querying this works all right for small simple queries, but attempting to produce a full output with
UNLOAD (
SELECT Item.URL FROM "helioptileexports"."export_2022_07_01_v4" WHERE Item.URL IS NOT NULL
) to '...' WITH (format = 'TEXTFILE')
Results in this error
HIVE_CURSOR_ERROR: Syntax error at line 1 offset 2: invalid syntax [state:STATE_BEFORE_ANNOTATION_DATAGRAM on token:TOKEN_SYMBOL_OPERATOR]
This query ran against the "helioptileexports" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: f4ca5812-1194-41f1-bfda-00a1a5b2471b
Questions
- Is there a way to make Athena more tolerant of formatting errors on specific files? As shown in the example, we are attempting without success to use
ignore_malformed_ion
. Is there anything beyond that that can be done?
- Is this a bug on DynamoDB ION export process?
- Is there any mechanism or logging to identify the files which have the malformed data and remove them?
Try "ion.ignore_malformed" instead of "ignore_malformed_ion"