Hello,
First of all, thank you for adding first-class support for Delta tables in Athena!
I'm trying to query a Delta table from Athena engine version 3, where the table was created with a SHALLOW CLONE
operation in Spark SQL. That operation writes a new Delta transaction log in which the data file paths are absolute rather than relative.
The original table's log has relative paths like:
{
"add": {
"path": "part-00156-c812f51c-c290-499c-b3b5-f33642e8b428.c000.snappy.parquet",
...
}
The cloned table might live at s3://bucket/cloned_table/ and have log entries where the paths are absolute, like this:
{
"add": {
"path": "s3://bucket/original_table/part-00156-c812f51c-c290-499c-b3b5-f33642e8b428.c000.snappy.parquet",
...
}
To be clear, these are "add" entries in the Delta transaction log (files like _delta_log/xxxxxx.json), not symlink_file_manifest files.
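To see which entries in a given commit use absolute paths, a small check like the one below could help. It is a minimal sketch that assumes you have downloaded one _delta_log/<version>.json file locally; each line of such a file is a single JSON action. The function name add_paths is hypothetical, and "absolute" here simply means "the path carries a URI scheme such as s3".

```python
import json
from urllib.parse import urlparse


def add_paths(commit_text: str) -> list[tuple[str, bool]]:
    """Return (path, is_absolute) for every "add" action in one Delta commit file.

    Hypothetical helper: commit_text is assumed to be the contents of a
    _delta_log/<version>.json file, i.e. one JSON action per line.
    """
    result = []
    for line in commit_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        action = json.loads(line)
        add = action.get("add")
        if add is None:
            continue  # commitInfo, metaData, remove, etc. are ignored
        path = add["path"]
        # An absolute path has a URI scheme (e.g. "s3"); a relative one does not.
        result.append((path, bool(urlparse(path).scheme)))
    return result
```

Running it over a commit from the cloned table should flag the s3://bucket/original_table/... entries as absolute, while commits from the original table should only show relative paths.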
When I run an Athena query against the cloned table, I get an error like:
HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split "s3://bucket/cloned_table/s3://bucket/original_table/part-00156-c812f51c-c290-499c-b3b5-f33642e8b428.c000.snappy.parquet (offset=0, length=67108864): io.trino.plugin.hive.s3.TrinoS3FileSystem$UnrecoverableS3OperationException: com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: XXX; S3 Extended Request ID: XXX; Proxy: null), S3 Extended Request ID: XXX (Path: s3://bucket/cloned_table/s3://bucket/original_table/part-00156-c812f51c-c290-499c-b3b5-f33642e8b428.c000.snappy.parquet ...)
I'm assuming there's a limitation or bug in how Athena handles paths in the Delta log: it should recognize an absolute path and not append it to the table base location. Please let me know if I'm mistaken or if there's a workaround.
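The resolution rule I would expect can be sketched as follows. This is a hypothetical helper (resolve_data_file), not Athena's or Trino's actual code: it treats any path with a URI scheme as absolute and uses it unchanged, and only joins relative paths onto the table root. Because the protocol describes the path as a URI that must be decoded, the sketch also percent-decodes it with urllib.parse.unquote.

```python
from urllib.parse import unquote, urlparse


def resolve_data_file(table_root: str, add_path: str) -> str:
    """Resolve an "add" action path against the table root (hypothetical sketch)."""
    if urlparse(add_path).scheme:
        # Absolute URI (e.g. written by SHALLOW CLONE): use it as-is.
        return unquote(add_path)
    # Relative path: interpret it relative to the table root.
    return table_root.rstrip("/") + "/" + unquote(add_path)
```

With this rule, resolve_data_file("s3://bucket/cloned_table/", "s3://bucket/original_table/part-1.parquet") yields the original table's file rather than the concatenated s3://bucket/cloned_table/s3://bucket/original_table/... path seen in the error above.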
Note that the Delta protocol specification does allow absolute paths, as documented here: https://github.com/delta-io/delta/blob/master/PROTOCOL.md#add-file-and-remove-file
Thanks!