Questions tagged with Amazon Athena


Browse through the questions and answers listed below or filter and sort to narrow down your results.

Hi there! I'm streaming a bunch of JSON data into an S3 bucket via Firehose. I was wondering what the best practice is for uniqueness checks on those individual JSON objects, since Firehose aggregates them into one S3 object. (If Firehose didn't aggregate, I guess I could identify the uniqueness of the objects by their respective keys in the bucket.) Thanks for reading!
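To make the question concrete, here is roughly the kind of de-duplication query I imagine running on the Athena side, assuming each JSON record carries some identifying field (the `event_id`, `ingest_time`, and table names below are just placeholders):

```
-- Sketch only: keep one row per event_id, preferring the most recent arrival.
-- my_firehose_table is assumed to be an Athena table defined over the
-- Firehose-delivered JSON objects in the bucket.
SELECT *
FROM (
    SELECT *,
           row_number() OVER (PARTITION BY event_id ORDER BY ingest_time DESC) AS rn
    FROM my_firehose_table
)
WHERE rn = 1;
```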
2
answers
0
votes
44
views
asked 23 days ago
The Parquet file has a nested structure where a map contains a struct which is itself nested. I'm not sure if Athena's SerDe can handle it, but I would like to find a way to read it with Athena. The table creation worked fine, and the structure I described matches the Parquet file.

```
CREATE EXTERNAL TABLE amitworkloads14(
  `name` string,
  ....
  `containers` map<string,struct<
      name:string,
      ......
      stats:struct<
        lastCpuReq:double,
        ....
        oomEventsTimed:map<string,double>,
        cpuThrottleTimed:map<string,struct<
          avgThrottle:double,
          ...
        >>,
        ...
      >
  >>,
  `maxInstances` bigint,
  ...
  `runningMinutesByNodeGroup` map<string,bigint>
)
STORED AS PARQUET
LOCATION 's3://dev-customers-data/test/new-pqt-lib/workloads/'
tblproperties ("parquet.compression"="SNAPPY")
```

**The full error message:**

```
HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split s3://dev-customers-data/test/new-pqt-lib/workloads/new_workloads.parquet (offset=0, length=2227521): required group stats { ..................... } is not primitive
```

This query ran against the "new-pqt-lib" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: 4a72c1ed-fd99-4149-9f57-af240ccbef6b
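For reference, the kind of query I would eventually like to run against the nested fields looks roughly like this (a sketch only; 'some-container' and 'some-period' are placeholder map keys, and I have not confirmed this syntax works against this table):

```
-- Sketch: pull nested values out of the map-of-structs, assuming Athena can
-- read the file at all. element_at returns NULL when a key is absent.
SELECT name,
       element_at(containers, 'some-container').stats.lastCpuReq AS last_cpu_req,
       element_at(
           element_at(containers, 'some-container').stats.cpuThrottleTimed,
           'some-period'
       ).avgThrottle AS avg_throttle
FROM amitworkloads14
LIMIT 10;
```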
0
answers
1
votes
37
views
asked 24 days ago
Hi, I have a Parquet file which I am trying to read in Athena, but this query gives the error below:

```
SELECT * FROM "xxx_db"."bot_info" where bot_alias='test_bot';
```

```
HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split s3://xxxxxx/xxxxxx/test_bot.parquet (offset=0, length=1198): java.io.IOException: can not read class org.apache.parquet.format.ColumnIndex: Required field 'null_pages' was not present! Struct: ColumnIndex(null_pages:null, min_values:[74 65 73 74 5F 62 6F 74], max_values:[74 65 73 74 5F 62 6F 74], boundary_order:null)
```

But when I run `SELECT * FROM "xxx_db"."bot_info" limit 10`, it works totally fine.
1
answers
0
votes
42
views
Rishabh
asked a month ago
Hi Team, I created an Athena table and used a QuickSight dataset to connect to it. However, the QuickSight page shows the error "The column(s) in this query are not found. Verify the column(s) specified in the dataset or in the custom SQL.". I checked the column names in Athena and they all look correct. What are the possible causes? Screenshots attached for reference.

![Quicksight dataset error](/media/postImages/original/IMg-Av7H8jSb-lc4xfQgnejg)

![Athena table info](/media/postImages/original/IMBKXNw268RCG446aGdmNKBQ)

The screenshots below are from Athena's 'Recent queries'; if I run the query manually, it works.

![Enter image description here](/media/postImages/original/IMJM_bqxxuQGqWliIctQbxgA)

![Enter image description here](/media/postImages/original/IMDOqKb6YpSiGc22rK3g68_Q)

Successful result when run manually:

![Enter image description here](/media/postImages/original/IMFaSotW-WRlqPqavij4gT7w)
0
answers
0
votes
40
views
asked a month ago
In my case, when I query information_schema.columns in Athena, the result does not include the [Comment] column. Is there any update on this, or is it just a temporary error?
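For context, this is roughly how I am querying the metadata (the schema and table names below are placeholders):

```
-- Sketch: inspect column metadata for one table. The result set is where
-- I would expect a comment column to appear.
SELECT *
FROM information_schema.columns
WHERE table_schema = 'my_database'
  AND table_name   = 'my_table';
```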
1
answers
0
votes
29
views
asked a month ago
All, I recently updated an AVRO schema. I have checked that the updates made to the schema are backward compatible, and they are. Next, I saved an Avro file with the new schema to an S3 bucket and created a crawler. Please note that I have updated the crawler with the following configs:

- Update the table definition in the data catalog
- Update all new and existing partitions with metadata from the table
- Mark the table as deprecated in the data catalog

The crawler runs successfully and creates a table. However, when I run an Athena query against the created table I get this error:

`GENERIC_INTERNAL_ERROR: Unknown object inspector category: UNION`

I have checked the Amazon documentation on [GENERIC_INTERNAL_ERROR](https://aws.amazon.com/premiumsupport/knowledge-center/athena-generic-internal-error/) issue types, but there's nothing mentioned about this type of issue. Any ideas on how to resolve this?
0
answers
0
votes
16
views
tivo
asked a month ago
I have a database of large .csv files which I am trying to query, but it does not fetch any results when I filter the string column with the "like" function. Here is my query (abc is declared as INT, pc is declared as STRING):

```
-- View Example
CREATE OR REPLACE VIEW view_name AS
SELECT abc
FROM "test_log"
WHERE pc like 'IG%';
```
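For reference, a variant of the same filter with explicit case and whitespace normalization (just a sketch; `view_name_normalized` is a made-up name, and I have not confirmed that casing or padding is actually the cause):

```
-- Sketch: normalize the string column before matching, to rule out
-- case or leading-whitespace mismatches.
CREATE OR REPLACE VIEW view_name_normalized AS
SELECT abc
FROM "test_log"
WHERE upper(trim(pc)) LIKE 'IG%';
```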
1
answers
0
votes
37
views
asked a month ago
I am trying to delete entries from my Lake Formation governed table. I ran the commands via the SDK, and it all looked successful, but the linked Athena table still shows the data that was supposedly deleted. Deleting the S3 objects afterwards (since DeleteObject on the governed table doesn't touch S3) now causes errors in Athena because the expected files are missing. Is there something wrong with my process for deleting from Lake Formation governed tables?
1
answers
0
votes
26
views
rf
asked a month ago
Hey, I am trying to use S3 Select on a gzipped file containing JSONL records. My file looks like this:

```
{"foo": 1, "bar": True...}
{"foo": 1, "bar": True...}
{"foo": 1, "bar": True...}
...
```

I am using the following Python code:

```
stream = s3_client.select_object_content(
    Bucket=bucket_name,
    Key=object_key,
    ExpressionType="SQL",
    Expression="SELECT * FROM s3object s",
    InputSerialization={"JSON": {"Type": "LINES"}, "CompressionType": "GZIP"},
    OutputSerialization={"JSON": {"RecordDelimiter": "\n"}},
    RequestProgress={"Enabled": False},
)
for record in stream["Payload"]:
    ...
```

My problem is that `record` is not a single JSONL record; it is a 65K-byte chunk of the file. I am able to query line-by-line when the file is not gzipped.
1
answers
0
votes
27
views
asked a month ago
When creating a new data set in QuickSight, I receive an error:

```
[Simba][AthenaJDBC](100071) An error has been thrown from the AWS Athena client. Access denied when writing to location: s3://<data lake bucket>/athena-workgroups/<data mart-specific prefix>/output/<guid>.csv [Execution ID: <guid>]
```

I have the QS service role, `arn:aws:iam::<account id>:role/service-role/aws-quicksight-service-role-v0`. I *do not* have a role with the name `aws-quicksight-s3-consumers-role-v0`. The service role has the 4 policies that are created when you attach Athena & an S3 bucket to QS:

* `AWSQuickSightIAMPolicy`
* `AWSQuickSightRDSPolicy`
* `AWSQuickSightRedshiftPolicy`
* `AWSQuickSightS3Policy`

`AWSQuickSightS3Policy` contains this:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:ListAllMyBuckets",
            "Resource": "arn:aws:s3:::*"
        },
        {
            "Action": [
                "s3:ListBucket"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::<data lake bucket>"
            ]
        },
        {
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::<data lake bucket>/*"
            ]
        },
        {
            "Action": [
                "s3:ListBucketMultipartUploads",
                "s3:GetBucketLocation"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::<data lake bucket>"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:AbortMultipartUpload",
                "s3:ListMultipartUploadParts"
            ],
            "Resource": [
                "arn:aws:s3:::<data lake bucket>/*"
            ]
        }
    ]
}
```

I've added the following policy statements to the KMS key used on the data lake S3 bucket:

```
{
    "Sid": "Allow access to data for Athena.",
    "Effect": "Allow",
    "Principal": {
        "Service": "athena.amazonaws.com"
    },
    "Action": [
        "kms:Decrypt",
        "kms:Encrypt",
        "kms:GenerateDataKey*"
    ],
    "Resource": "*"
},
{
    "Sid": "Allow access to data for QuickSight.",
    "Effect": "Allow",
    "Principal": {
        "AWS": "arn:aws:iam::<account id>:role/service-role/aws-quicksight-service-role-v0"
    },
    "Action": [
        "kms:Decrypt",
        "kms:Encrypt",
        "kms:GenerateDataKey*"
    ],
    "Resource": "*"
},
{
    "Sid": "Allow access to data for QuickSight.",
    "Effect": "Allow",
    "Principal": {
        "Service": "quicksight.amazonaws.com"
    },
    "Action": [
        "kms:Decrypt",
        "kms:Encrypt",
        "kms:GenerateDataKey*"
    ],
    "Resource": "*"
}
```

I'm at a bit of a loss for how to continue. I've got what feels like wide-open permissions, but I still receive that error while trying to create a data set that targets anywhere in that S3 bucket.
0
answers
0
votes
40
views
asked a month ago
When we attempt to query a table and include boolean fields that may be null in the `select` clause, we receive an exception. (Note: when the records with a null are filtered out with `where table.boolColumn is not null`, the query succeeds.) In Athena, I get:

```
GENERIC_USER_ERROR: Encountered an exception[java.lang.NullPointerException] from your LambdaFunction[arn:aws:lambda:<redacted>:function:dynamodb2] executed in context[S3SpillLocation{bucket='<redacted>:', key='athena-spill/f91514ab-130d-4464-9c5f-cfef12dd2e91/e4763105-fb52-4106-b619-006b6d7f521b', directory=true}] with message[java.lang.NullPointerException]
This query ran against the "default" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: f91514ab-130d-4464-9c5f-cfef12dd2e91
```

In our CloudWatch log, I see this exception:

```
java.lang.NullPointerException: java.lang.NullPointerException
java.lang.NullPointerException
    at com.amazonaws.athena.connectors.dynamodb.util.DDBTypeUtils.lambda$makeExtractor$4(DDBTypeUtils.java:433)
    at com.amazonaws.athena.connector.lambda.data.writers.fieldwriters.BitFieldWriter.write(BitFieldWriter.java:76)
    at com.amazonaws.athena.connector.lambda.data.writers.GeneratedRowWriter.writeRow(GeneratedRowWriter.java:116)
    at com.amazonaws.athena.connectors.dynamodb.DynamoDBRecordHandler.lambda$readWithConstraint$0(DynamoDBRecordHandler.java:207)
    at com.amazonaws.athena.connector.lambda.data.S3BlockSpiller.writeRows(S3BlockSpiller.java:183)
    at com.amazonaws.athena.connectors.dynamodb.DynamoDBRecordHandler.readWithConstraint(DynamoDBRecordHandler.java:207)
    at com.amazonaws.athena.connector.lambda.handlers.RecordHandler.doReadRecords(RecordHandler.java:192)
    at com.amazonaws.athena.connector.lambda.handlers.RecordHandler.doHandleRequest(RecordHandler.java:158)
    at com.amazonaws.athena.connector.lambda.handlers.CompositeHandler.handleRequest(CompositeHandler.java:138)
    at com.amazonaws.athena.connector.lambda.handlers.CompositeHandler.handleRequest(CompositeHandler.java:103)
java.lang.NullPointerException: java.lang.NullPointerException
java.lang.NullPointerException
    at com.amazonaws.athena.connectors.dynamodb.util.DDBTypeUtils.lambda$makeExtractor$4(DDBTypeUtils.java:433)
    at com.amazonaws.athena.connector.lambda.data.writers.fieldwriters.BitFieldWriter.write(BitFieldWriter.java:76)
    at com.amazonaws.athena.connector.lambda.data.writers.GeneratedRowWriter.writeRow(GeneratedRowWriter.java:116)
    at com.amazonaws.athena.connectors.dynamodb.DynamoDBRecordHandler.lambda$readWithConstraint$0(DynamoDBRecordHandler.java:207)
    at com.amazonaws.athena.connector.lambda.data.S3BlockSpiller.writeRows(S3BlockSpiller.java:183)
    at com.amazonaws.athena.connectors.dynamodb.DynamoDBRecordHandler.readWithConstraint(DynamoDBRecordHandler.java:207)
    at com.amazonaws.athena.connector.lambda.handlers.RecordHandler.doReadRecords(RecordHandler.java:192)
    at com.amazonaws.athena.connector.lambda.handlers.RecordHandler.doHandleRequest(RecordHandler.java:158)
    at com.amazonaws.athena.connector.lambda.handlers.CompositeHandler.handleRequest(CompositeHandler.java:138)
    at com.amazonaws.athena.connector.lambda.handlers.CompositeHandler.handleRequest(CompositeHandler.java:103)
```
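For reference, the failing and working query shapes look roughly like this (table, column, and catalog names are placeholders for our real ones):

```
-- Sketch only: ddb_catalog stands in for however the DynamoDB connector is
-- registered as a data source.

-- Fails with the NullPointerException above when boolColumn is null on some items:
SELECT id, boolColumn
FROM ddb_catalog.default.my_table;

-- Succeeds once the null booleans are filtered out:
SELECT id, boolColumn
FROM ddb_catalog.default.my_table
WHERE boolColumn IS NOT NULL;
```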
1
answers
0
votes
41
views
asked a month ago
Hello. I've encountered an issue that might be related to not having enough IAM permissions, but I can't find any documentation about which policies I could be missing. My IAM role has the following managed policies:

```
arn:aws:iam::aws:policy/service-role/AWSQuicksightAthenaAccess
arn:aws:iam::aws:policy/AmazonS3FullAccess
arn:aws:iam::aws:policy/AmazonAthenaFullAccess
```

I have the IAM role configured and working in my .aws/config file, and when I execute an Athena query like this:

```
aws athena start-query-execution \
    --query-string "SELECT * FROM my_schema.my_table WHERE dt='2022-08-17-10'" \
    --query-execution-context Database=default,Catalog=awsdatacatalog \
    --result-configuration OutputLocation=s3://aws-athena-query-results-my-athena-results123344-us-east-1/ \
    --profile my-athena-role
```

I get a result like the following:

```
{
    "ResultSet": {
        "Rows": [
            {
                "Data": [
                    {
                        "VarCharValue": "dt"
                    }
                ]
            },
            {
                "Data": [
                    {
                        "VarCharValue": "2022-08-17-10"
                    }
                ]
            },
            {
                "Data": [
                    {
                        "VarCharValue": "2022-08-17-10"
                    }
                ]
            },
            {
                "Data": [
                    {
                        "VarCharValue": "2022-08-17-10"
                    }
                ]
            },
            {
                "Data": [
                    {
                        "VarCharValue": "2022-08-17-10"
                    }
                ]
            },
            ...
```

As you can see, only the column `dt` is returned in the result set. That table contains many more columns. Running the same query without the IAM role, using my default account that has full access to all resources, produces the desired results. So it seems to be related to the limited permissions, but I can't figure out which ones are missing from my IAM role. Thanks in advance.
1
answers
0
votes
39
views
erickm
asked a month ago