
HIVE_UNKNOWN_ERROR: Invalid port in file system location

I'm using the Airbyte "AWS Datalake" destination to push data into AWS Glue/S3. I have to use the S3 access point ARN, since the location where Airbyte is running does not have open TCP:443 egress, so I have a number of VPC private links and private DNS configured. However, one cannot get private DNS for s3.amazonaws.com, and I can't get Airbyte to respect an S3 endpoint for the "S3 Data Lake" destination, so I switched to the "AWS Datalake" destination, which is working using the S3 access point ARN for the data lake bucket. Tables are created in Glue and life would be good... except the table location is set as "s3://[the S3 access point ARN]/[database]/[table]" (where the brackets denote information specific to my environment). Athena hates the Location and refuses any query, with the error: "HIVE_UNKNOWN_ERROR: Invalid port in file system location: ..." I'm guessing this may be an issue with Airbyte, but I need to get something to work.

  1. Is it ever possible to update the Location of a table?
  2. Is there some other method to get code (Java/Python) to access the AWS API through a VPC PrivateLink? Maybe an env var I can force?

Perhaps there's some other work-around I haven't thought of... I'm also asking Airbyte if there is some way to pre-create tables in AWS Glue, that Airbyte could push data to without resetting the table location.

asked a year ago, 166 views

1 Answer
Accepted Answer

When using S3 access points with Athena, there are several issues to address:

  1. Regarding updating the table location: Yes, you can update the location of an existing table in AWS Glue using the ALTER TABLE statement in Athena. The syntax would be:
ALTER TABLE database_name.table_name 
SET LOCATION 's3://proper-bucket-path/database/table'

However, the core issue is that Athena doesn't support using S3 access point ARNs directly in the LOCATION property. When specifying table locations in Athena, you should avoid using ARNs, HTTP notations, or special patterns.
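The "Invalid port" wording is probably a side effect of the location being parsed as a URI: everything after s3:// is read as a host, and the colons in the ARN then look like a host:port separator. That explanation is an assumption, not something the error message confirms. As a minimal sketch of turning such a location into the standard form, the helper below swaps the access point ARN for the underlying bucket name. The function name, the regular expression, and the sample account ID, access point name, and bucket name are all hypothetical.

```python
import re

# Matches s3://<access point ARN>[/key...]; the partition and region parts
# are kept loose so aws, aws-cn and aws-us-gov style ARNs are all accepted.
_AP_LOCATION = re.compile(
    r"^s3://arn:aws[\w-]*:s3:[\w-]*:\d{12}:accesspoint/[^/]+(?P<key>/.*)?$"
)


def to_bucket_location(location: str, bucket: str) -> str:
    """Replace an access point ARN in a table location with a bucket name.

    Locations that do not contain an access point ARN are returned unchanged.
    """
    match = _AP_LOCATION.match(location)
    if match is None:
        return location
    return f"s3://{bucket}{match.group('key') or ''}"
```

For example, to_bucket_location("s3://arn:aws:s3:us-east-1:123456789012:accesspoint/my-ap/db/table", "my-bucket") gives "s3://my-bucket/db/table", which is the shape Athena expects in LOCATION.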

  2. For accessing S3 through VPC PrivateLink, you have a few options:
  • Use S3 access point aliases instead of ARNs. Access points have auto-generated aliases that look like regular S3 bucket names and can be used in place of bucket names in most S3 operations.

  • If using the AWS CLI or SDK through your VPC privatelink, you can specify the endpoint URL parameter to route requests through your VPC endpoint while using standard bucket notation.

  • For Athena specifically, make sure your table locations use standard S3 paths (s3://bucket-name/path) rather than access point ARNs.

A potential workaround for your situation would be:

  1. Have Airbyte write to the access point
  2. After data is written, update the table location in Glue to use the standard S3 path format
  3. Configure proper IAM permissions to ensure Athena can access the data through the standard path
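Step 2 of this workaround can also be done through the Glue API instead of Athena DDL, which may be easier to run right after each sync. The following is a sketch only, assuming boto3's Glue client (get_table and update_table); the helper names and the sample database, table, and bucket names are made up for illustration. get_table returns read-only fields that update_table rejects, so the table definition is filtered down to TableInput fields first.

```python
import copy

# Fields of a get_table response that update_table accepts in TableInput.
TABLE_INPUT_KEYS = {
    "Name", "Description", "Owner", "LastAccessTime", "LastAnalyzedTime",
    "Retention", "StorageDescriptor", "PartitionKeys", "ViewOriginalText",
    "ViewExpandedText", "TableType", "Parameters", "TargetTable",
}


def relocated_table_input(table: dict, new_location: str) -> dict:
    """Build a TableInput from a get_table 'Table' dict with a new Location."""
    table_input = {
        key: copy.deepcopy(value)
        for key, value in table.items()
        if key in TABLE_INPUT_KEYS
    }
    table_input.setdefault("StorageDescriptor", {})["Location"] = new_location
    return table_input


def relocate_table(glue, database: str, name: str, new_location: str) -> None:
    """Point an existing Glue table at new_location.

    `glue` is expected to be a boto3 Glue client, e.g. boto3.client("glue").
    """
    table = glue.get_table(DatabaseName=database, Name=name)["Table"]
    glue.update_table(
        DatabaseName=database,
        TableInput=relocated_table_input(table, new_location),
    )
```

Usage would look like relocate_table(boto3.client("glue"), "my_db", "my_table", "s3://my-bucket/my_db/my_table"). Two caveats: partitioned tables store a location per partition, so those would need the same treatment, and Airbyte may write the access point location back on its next sync.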

For Java/Python code accessing AWS APIs through a VPC privatelink, you can configure the AWS client to use a specific endpoint URL that points to your VPC endpoint.
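In Python with boto3, that could look roughly like the sketch below. It assumes an S3 interface endpoint whose DNS name has the documented form *.vpce-....s3.<region>.vpce.amazonaws.com, where the leading * is replaced with "bucket" for bucket operations. The helper names and the sample DNS name are hypothetical, so substitute the DNS name shown for your own endpoint.

```python
def s3_endpoint_url(vpce_dns_name: str, prefix: str = "bucket") -> str:
    """Turn an S3 interface endpoint DNS name into an endpoint URL.

    Accepts the name with or without the leading "*." that the console shows,
    e.g. "*.vpce-1a2b3c4d-5e6f.s3.us-east-1.vpce.amazonaws.com".
    """
    host = vpce_dns_name.strip()
    if host.startswith("*."):
        host = host[2:]
    return f"https://{prefix}.{host}"


def make_s3_client(vpce_dns_name: str, region: str):
    """Create a boto3 S3 client that sends requests to the VPC endpoint."""
    import boto3  # imported here so the URL helper works without boto3

    return boto3.client(
        "s3",
        region_name=region,
        endpoint_url=s3_endpoint_url(vpce_dns_name),
    )
```

On the env var question: recent AWS SDK releases (including boto3 and the Java SDK v2) read AWS_ENDPOINT_URL and service-specific variables such as AWS_ENDPOINT_URL_S3, so exporting one of those may be enough without touching code. Whether Airbyte's connector picks it up depends on the SDK version it bundles, which I have not checked.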
Sources
Specify a table location in Amazon S3 - Amazon Athena
AWS PrivateLink for Amazon S3 - Amazon Simple Storage Service
Athena query failed when trying to access S3 from access point | AWS re:Post

answered a year ago

EXPERT

reviewed a year ago
