I am trying to follow the notebook here to create a AWS Athena database, fill it with data from Amazon reviews and execute a query.
https://github.com/data-science-on-aws/data-science-on-aws/blob/oreilly-book/04_ingest/03_Register_S3_TSV_With_Athena.ipynb
Creating a database and table works but querying fails. I have added the "AmazonAthenaFullAccess" permissions to my AmazonSageMaker-ExecutionRole but it seems that I am missing something.
The S3 bucket where the data resides is a private bucket created earlier in the tutorial.
Below is a minimal code example and the error message. Thanks in advance.
print(statement)
SELECT * FROM dsoaws.amazon_reviews_tsv
WHERE product_category = 'Digital_Software' LIMIT 100
df = pd.read_sql(statement, conn)
df.head(5)
/tmp/ipykernel_361/2446512133.py:1: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
df = pd.read_sql(statement, conn)
---------------------------------------------------------------------------
OperationalError Traceback (most recent call last)
File /opt/conda/lib/python3.10/site-packages/pandas/io/sql.py:2262, in SQLiteDatabase.execute(self, sql, params)
2261 try:
-> 2262 cur.execute(sql, *args)
2263 return cur
File /opt/conda/lib/python3.10/site-packages/pyathena/cursor.py:121, in Cursor.execute(self, operation, parameters, work_group, s3_staging_dir, cache_size, cache_expiration_time, result_reuse_enable, result_reuse_minutes)
120 else:
--> 121 raise OperationalError(query_execution.state_change_reason)
122 return self
OperationalError: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 0GPA67ZPRGW096F1; S3 Extended Request ID: DMtFNEmNnRrWK38OHGNMHc8v60/yT0rxQP6TKuAxZ7StgJKvvBJ4tb1Utxnz4PvDcgdwKMbusmQ=; Proxy: null), S3 Extended Request ID: DMtFNEmNnRrWK38OHGNMHc8v60/yT0rxQP6TKuAxZ7StgJKvvBJ4tb1Utxnz4PvDcgdwKMbusmQ= (Bucket: data-science-on-aws, Key: amazon-reviews-pds/tsv/amazon_reviews_us_PC_v1_00.tsv.gz)
During handling of the above exception, another exception occurred:
[...]
DatabaseError: Execution failed on sql: SELECT * FROM dsoaws.amazon_reviews_tsv
WHERE product_category = 'Digital_Software' LIMIT 100
com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 0GPA67ZPRGW096F1; S3 Extended Request ID: DMtFNEmNnRrWK38OHGNMHc8v60/yT0rxQP6TKuAxZ7StgJKvvBJ4tb1Utxnz4PvDcgdwKMbusmQ=; Proxy: null), S3 Extended Request ID: DMtFNEmNnRrWK38OHGNMHc8v60/yT0rxQP6TKuAxZ7StgJKvvBJ4tb1Utxnz4PvDcgdwKMbusmQ= (Bucket: data-science-on-aws, Key: amazon-reviews-pds/tsv/amazon_reviews_us_PC_v1_00.tsv.gz)
unable to rollback