I'm using Athena to query geospatial data stored as GeoJSON. With uncompressed GeoJSON files it works fine, but when I compress those files with gzip I get:
HIVE_CURSOR_ERROR: Illegal character ((CTRL-CHAR, code 31)): only regular white space (\r, \n, \t) is allowed between tokens at [Source: org.apache.hadoop.fs.FSDataInputStream@6acf4fe5: org.apache.hadoop.fs.BufferedFSInputStream@2793fa25; line: 1, column: 2]
Is it possible to use compressed geospatial data on Athena?
EDIT: adding the CREATE TABLE statement, as requested:
CREATE EXTERNAL TABLE `locations`(
`id` bigint COMMENT 'from deserializer',
`boundaryshape` binary COMMENT 'from deserializer')
ROW FORMAT SERDE
'com.esri.hadoop.hive.serde.JsonSerde'
STORED AS INPUTFORMAT
'com.esri.json.hadoop.EnclosedJsonInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://mydata/transformed/'
TBLPROPERTIES (
'classification'='json',
'last_modified_by'='hadoop',
'last_modified_time'='1674393793',
'transient_lastDdlTime'='1674393793',
'write.compression'='GZIP')
Can you paste the result of SHOW CREATE TABLE <your table name>?
I have updated the question with the statement, as requested.
Please try the following: 1/ replace the uppercase GZIP property with the lowercase 'compressionType'='gzip', 2/ make sure each gzip file contains exactly one JSON file.
I have made that change but am still getting the same error. The gzip file does contain only one file.
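One observation that may help narrow this down: character code 31 is 0x1F, which is the first byte of the gzip magic number (0x1F 0x8B). That suggests the JSON parser is being handed the raw compressed bytes rather than decompressed text, although it does not by itself say why. Below is a minimal Python sketch of that check. The sample JSON payload and the helper names are hypothetical, made up for illustration; they are not taken from the actual data or from any Athena API.

```python
import gzip

# The two-byte gzip magic number. Its first byte, 0x1F, is decimal 31,
# the same code reported as the illegal CTRL-CHAR in the error message.
GZIP_MAGIC = b"\x1f\x8b"


def first_byte_code(data: bytes) -> int:
    """Return the numeric code of the first byte of the payload."""
    return data[0]


def looks_gzipped(data: bytes) -> bool:
    """Return True if the payload starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


# Hypothetical stand-in for the contents of one file under the table location.
sample_json = b'{"id": 1, "name": "example"}'
compressed = gzip.compress(sample_json)

print(first_byte_code(compressed))   # code of the first compressed byte
print(looks_gzipped(compressed))     # compressed payload has the magic number
print(looks_gzipped(sample_json))    # plain JSON does not
print(gzip.decompress(compressed) == sample_json)  # round trip is lossless
```

To run the same check against a real file, read its first two bytes in binary mode, for example open(path, "rb").read(2), and compare them with GZIP_MAGIC. If they match, the file itself is a valid gzip stream and the problem is more likely in how the table's input format reads it.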