
Questions tagged with Amazon Athena


Query DynamoDB Backup to S3 with Athena

I'm getting errors for a variety of attempts to read my S3 backup of DynamoDB data with Athena. The data looks like this:

```
{"SensorId":{"s":"sensor-10"},"Temperature":{"n":"63.85"},"CurrentTime":{"n":"1661540734"}}
{"SensorId":{"s":"sensor-10"},"Temperature":{"n":"63.84"},"CurrentTime":{"n":"1661540736"}}
```

I tried the following, which succeeds:

```sql
CREATE EXTERNAL TABLE IF NOT EXISTS sensor_data_presto (
  sensorID struct<s:string>,
  temperature struct<n:float>,
  currentTime struct<n:bigint>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://bucket-name/';
```

But the following query then fails with this error:

```
SELECT DISTINCT sensorID.s FROM sensor_data_presto;

Row is not a valid JSON Object - JSONException: A JSONObject text must end with '}' at 39 [character 40 line 1]
```

I also tried the following, which likewise succeeds at creation:

```sql
CREATE EXTERNAL TABLE IF NOT EXISTS `mydatabase`.`sensor_data_regex` (
  sensorid string,
  temperature float,
  currenttime bigint
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  'input.regex'='/^\{"SensorId":\{"s":("[a-z]*-[0-9]*"){1}\},"Temperature":\{"n":"([0-9]*[\.]?[0-9]*)"\}{1},"CurrentTime":\{"n":"([0-9]*){1}"\}\}$/g'
)
LOCATION 's3://bucket-name/'
TBLPROPERTIES ('has_encrypted_data'='false');
```

I tried the above with and without the beginning `/` and ending `/g`, and with and without the ending `$`. The following query gets this error:

```
SELECT * FROM sensor_data_regex;

GENERIC_INTERNAL_ERROR: PatternSyntaxException thrown initializing deserializer org.apache.hadoop.hive.serde2.RegexSerDe. Cause: Illegal repetition near index 1
/^{"SensorId":{"s":("[a-z]*-[0-9]*"){1}},"Temperature":{"n":"([0-9]*[.]?[0-9]*)"}{1},"CurrentTime":{"n":"([0-9]*){1}"}}$/g
 ^
```

I was trying to replicate the example in Chapter 7: DynamoDB from the book Seven Databases in Seven Weeks, 2nd Edition for use in my college class. I was able to update the entire pipeline to work on AWS until the final step, which uses Athena. I would appreciate advice on correcting these errors or completing the pipeline (cost-effectively) to query the DynamoDB data.
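One hedged observation on the `PatternSyntaxException`: RegexSerDe hands the string to Java's regex engine, which has no `/.../g` delimiter syntax, so the leading `/` is read as a literal character and the following `{` becomes an "illegal repetition". The sketch below (Python, whose regex syntax is close enough to Java's for this pattern) checks a delimiter-free candidate pattern against the sample lines; capturing without the surrounding quotes is my choice here so the groups map cleanly onto the three columns. Verify against your actual export files before using it as `input.regex`.

```python
import re

# Candidate pattern written without the /.../g delimiters (those are
# JavaScript-style; Java's Pattern class treats them as literals).
# Braces are escaped, and the capture groups exclude the quotes so
# they map onto sensorid / temperature / currenttime directly.
pattern = re.compile(
    r'^\{"SensorId":\{"s":"([a-z]*-[0-9]*)"\},'
    r'"Temperature":\{"n":"([0-9]*\.?[0-9]*)"\},'
    r'"CurrentTime":\{"n":"([0-9]*)"\}\}$'
)

samples = [
    '{"SensorId":{"s":"sensor-10"},"Temperature":{"n":"63.85"},"CurrentTime":{"n":"1661540734"}}',
    '{"SensorId":{"s":"sensor-10"},"Temperature":{"n":"63.84"},"CurrentTime":{"n":"1661540736"}}',
]

for line in samples:
    m = pattern.match(line)
    print(m.groups() if m else "no match")
```

For the JSON SerDe attempt, the "A JSONObject text must end with '}'" error often means a line in the backup contains more than one JSON object (or trailing data); the OpenX SerDe expects exactly one object per line, so that is worth checking first.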
1 answer · 0 votes · 43 views · asked a month ago

HIVE_CURSOR_ERROR in Athena with GeoJSON data

I am trying to set up a table in Athena to query geospatial data in GeoJSON format. I have been trying to follow the examples here: https://docs.aws.amazon.com/athena/latest/ug/geospatial-example-queries.html, but my real-world case is not working and I get HIVE_CURSOR_ERROR every time I run a simple query. I have some GeoJSON data in a folder in S3, like this:

```json
{
  "type": "FeatureCollection",
  "features": [
    {
      "id": "0",
      "type": "Feature",
      "properties": { "ID": 123456789 },
      "geometry": {
        "type": "Polygon",
        "coordinates": [ [
          [ -5.31743910570464733, 58.73284229699937 ],
          [ -5.3173675298630483, 58.73284570013168 ],
          [ -5.3172547539133792, 58.73284991264882 ],
          [ -5.31722304203592152, 58.73285104690907 ],
          [ -5.3172311032208171, 58.73294883810302 ],
          [ -5.3174512653466437, 58.732947705224115 ],
          [ -5.31744743304667152, 58.73291410139787 ],
          [ -5.31743910570464733, 58.73284229699937 ]
        ] ]
      }
    },
    {
      "id": "1",
      "type": "Feature",
      "properties": { "ID": 987654321 },
      "geometry": {
        "type": "Polygon",
        "coordinates": [ [
          [ -3.41743023894912347, 40.432765268202344 ],
          [ -3.41736141173642975, 40.432767724053 ],
          [ -3.41724749711747035, 40.432772009098655 ],
          [ -3.41721649151368283, 40.43277324408217 ],
          [ -3.41722304203592152, 40.43285104690907 ],
          [ -3.4172547539133792, 40.43284991264882 ],
          [ -3.4173675298630483, 40.43284570013168 ],
          [ -3.41743910570464733, 40.43284229699937 ],
          [ -3.41743140982736242, 40.432775538535216 ],
          [ -3.41743023894912347, 40.432765268202344 ]
        ] ]
      }
    }
  ]
}
```

And I create the table in Athena following the example, like this:

```sql
CREATE external TABLE IF NOT EXISTS mydata (
  Id bigint,
  BoundaryShape binary
)
ROW FORMAT SERDE 'com.esri.hadoop.hive.serde.JsonSerde'
STORED AS INPUTFORMAT 'com.esri.json.hadoop.EnclosedJsonInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://mybucket/folder/sub/location/';
```

However, any query I run against it returns a HIVE_CURSOR_ERROR, even simply `SELECT * FROM mydata`. What am I doing wrong?
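One hedged note: HIVE_CURSOR_ERROR with the Esri SerDe frequently means the reader hit input it could not parse, such as a non-GeoJSON file sitting under the `LOCATION` prefix or a feature missing a field the table expects. Before pointing Athena at the folder, a minimal structural check over each file can rule that out; this sketch (with field names mirroring the question, which is an assumption about your real data) returns a list of problems, empty if the structure looks OK.

```python
import json

# Minimal sanity check over one GeoJSON document before pointing Athena
# at it. The table in the question expects each feature to carry
# properties.ID and a Polygon geometry; adjust to your real schema.
def check_feature_collection(text):
    doc = json.loads(text)
    assert doc.get("type") == "FeatureCollection"
    problems = []
    for i, feat in enumerate(doc.get("features", [])):
        if "ID" not in feat.get("properties", {}):
            problems.append((i, "missing properties.ID"))
        geom = feat.get("geometry") or {}
        if geom.get("type") != "Polygon" or not geom.get("coordinates"):
            problems.append((i, "missing/unexpected geometry"))
    return problems

sample = (
    '{"type":"FeatureCollection","features":[{"id":"0","type":"Feature",'
    '"properties":{"ID":123456789},"geometry":{"type":"Polygon",'
    '"coordinates":[[[-5.3,58.7],[-5.2,58.7],[-5.2,58.8],[-5.3,58.7]]]}}]}'
)
print(check_feature_collection(sample))  # [] means the structure looks OK
```

Running every object under `s3://mybucket/folder/sub/location/` through a check like this (and confirming nothing else lives under that prefix) narrows down whether the error is in the data or in the table definition.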
1 answer · 0 votes · 21 views · asked a month ago

Problem to perform simple SQL query via Athena SDK

I tried to perform a simple SQL query via the Athena SDK. If I run the same SQL query in the web console, I get the correct result. Here is my input:

```php
$athenaClient->startQueryExecution(
    array(
        'QueryExecutionContext' => array(
            'Catalog' => 'AwsDataCatalog',
            'Database' => 'db_mydatabase',
        ),
        'QueryString' => 'SELECT * FROM "db_mydatabase"."my_table"',
        'ResultConfiguration' => array(
            'OutputLocation' => 'S3://my-s3bucket/my-folder/',
        ),
        'WorkGroup' => 'primary',
    )
);
```

When I query the result via GetQueryExecution(), I get this back:

```
[Status] => Array
(
    [State] => FAILED
    [StateChangeReason] => [ErrorCode: INTERNAL_ERROR_QUERY_ENGINE] Amazon Athena experienced an internal error while executing this query. Please contact AWS support for further assistance. You will not be charged for this query. We apologize for the inconvenience.
    [SubmissionDateTime] => Aws\Api\DateTimeResult Object
    (
        [date] => 2022-08-24 15:03:52.024000
        [timezone_type] => 3
        [timezone] => UTC
    )
    [CompletionDateTime] => Aws\Api\DateTimeResult Object
    (
        [date] => 2022-08-24 15:03:54.168000
        [timezone_type] => 3
        [timezone] => UTC
    )
    [AthenaError] => Array
    (
        [ErrorCategory] => 1
        [ErrorType] => 401
        [Retryable] =>
        [ErrorMessage] => [ErrorCode: INTERNAL_ERROR_QUERY_ENGINE] Amazon Athena experienced an internal error while executing this query. Please contact AWS support for further assistance. You will not be charged for this query. We apologize for the inconvenience.
    )
)
```

I know I probably have some setting-related issue, as I can get the correct results in the Amazon Console. Thanks!
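One detail that stands out in the input is `'OutputLocation' => 'S3://my-s3bucket/my-folder/'`: the scheme is conventionally lowercase `s3://`. Whether the uppercase scheme is actually what trips the query engine here is an assumption, but it is cheap to rule out by normalizing the URI before the call. A sketch of that check (in Python rather than PHP, purely to illustrate the normalization):

```python
from urllib.parse import urlparse

def normalize_output_location(uri):
    # Athena's OutputLocation is conventionally written "s3://...".
    # Normalizing the scheme guards against "S3://..." typos; whether
    # that is the actual cause of the failure here is an assumption.
    parts = urlparse(uri)
    if parts.scheme.lower() != "s3":
        raise ValueError(f"not an S3 URI: {uri}")
    return "s3://" + uri.split("://", 1)[1]

print(normalize_output_location("S3://my-s3bucket/my-folder/"))
```

If the lowercase URI still fails, the next things usually worth checking are that the workgroup does not enforce a different (inaccessible) output location and that the calling credentials can write to the bucket.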
0 answers · 0 votes · 25 views · asked a month ago

HIVE_PARTITION_SCHEMA_MISMATCH

Error:

```
HIVE_PARTITION_SCHEMA_MISMATCH: There is a mismatch between the table and partition schemas. The types are incompatible and cannot be coerced. The column 'isfallbackkey' in table 'analytics.events' is declared as type 'boolean', but partition 'dt=2022-08-22/appid=jovo-game-starter' declared column 'translationkey' as type 'string'.
```

Data was passed as JSON in this format:

```
{"type":"device_capabilities","supportScreen":true,"supportAudio":true,"supportLongformAudio":false,"supportVideo":false,"eventId":"668c9479-9eee-4025-8b9a-1323db06b21f","appId":"jovo-game-starter","eventDate":"2022-08-22T17:26:36.376Z","timestamp":1661189196,"locale":"en","timeZone":"America/Phoenix","userId":"a8ad82ba-bfac-4f93-a46d-aae37e842a7b","sessionId":"139163c0-fcf2-4bcc-9ece-a8e6ab5c322e"}
{"type":"session_start","userId":"a8ad82ba-bfac-4f93-a46d-aae37e842a7b","sessionId":"139163c0-fcf2-4bcc-9ece-a8e6ab5c322e","eventId":"411e3abf-07fc-453c-9edd-a0a84f29b75f","appId":"jovo-game-starter","eventDate":"2022-08-22T17:26:36.383Z","timestamp":1661189196,"locale":"en","timeZone":"America/Phoenix"}
{"type":"intent","userId":"a8ad82ba-bfac-4f93-a46d-aae37e842a7b","sessionId":"139163c0-fcf2-4bcc-9ece-a8e6ab5c322e","intent":"LAUNCH","eventId":"09287f39-e487-474b-bafc-c0c1b9f59959","appId":"jovo-game-starter","eventDate":"2022-08-22T17:26:36.387Z","timestamp":1661189196,"locale":"en","timeZone":"America/Phoenix"}
{"type":"translation","translationKey":"start","isFallbackKey":false,"translationLanguage":"en","translationPlatform":"core","eventId":"15b87be7-5349-4a9e-b950-76bd76b63972","appId":"jovo-game-starter","eventDate":"2022-08-22T17:26:37.889Z","timestamp":1661189198,"locale":"en","timeZone":"America/Phoenix","userId":"a8ad82ba-bfac-4f93-a46d-aae37e842a7b","sessionId":"139163c0-fcf2-4bcc-9ece-a8e6ab5c322e"}
```

Using Kinesis Firehose dynamic partitioning: `events/dt=!{partitionKeyFromQuery:dt}/appid=!{partitionKeyFromQuery:appid}/`
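A hedged note on why this tends to happen with this kind of data: fields like `isFallbackKey` and `translationKey` appear only on `translation` events, so when schemas are inferred per partition (e.g. by a Glue crawler), a partition whose files happen to contain no such events can end up with a different column set or type than the table. This sketch illustrates the divergence with the sample lines (the inference logic is a simplification, not what Glue actually runs):

```python
import json

# Sketch of why per-partition schema inference can disagree: only
# "translation" events carry isFallbackKey / translationKey, so a
# partition without such events infers a narrower schema.
lines = [
    '{"type":"session_start","appId":"jovo-game-starter","timestamp":1661189196}',
    '{"type":"translation","translationKey":"start","isFallbackKey":false,'
    '"appId":"jovo-game-starter","timestamp":1661189198}',
]

def infer(records):
    schema = {}
    for rec in records:
        for key, val in rec.items():
            schema.setdefault(key.lower(), type(val).__name__)
    return schema

full = infer(json.loads(l) for l in lines)
partial = infer(json.loads(l) for l in lines[:1])  # partition with no translation events
print(sorted(set(full) - set(partial)))  # columns missing from the sparse partition
```

The usual remedies are to declare the table schema explicitly (so every partition inherits it) or, if a crawler manages the table, to configure it to update partition metadata from the table rather than letting each partition keep its own inferred schema.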
0 answers · 0 votes · 34 views · asked a month ago

Query for S3 objects requested on a specific day - Athena?

Hi, we sometimes experience a large increase in GET requests to our production S3 bucket. This bucket contains customer assets that are frequently requested. Usually these assets are requested via CloudFront, but in this instance that doesn't look likely, as the GET request rate via CloudFront does not correlate with the number of GET requests from the bucket. When this happens, I want to query for the objects requested on that specific day so I can find the `remoteip` making the requests. The anatomy of an object key is: `public/asset/raw/<accountId>/<hash>/<timestamp>/<assetId>`. The bucket was created back in 2017 and thus contains over 40 TB of data. I was previously using the Athena console to get this information. An example query, if I was looking for the requests from 18/07/2022, would be:

```sql
SELECT remoteip, requestdatetime, requester, requestid, operation, key, request_uri
FROM "s3_access_logs_db"."mybucket_logs"
WHERE operation = 'REST.GET.OBJECT'
  AND key LIKE 'public/asset/raw/<accountId>/%'
  AND bucket_name = '<bucketName>'
  AND bucketowner = '<bucketOwner>'
  AND httpstatus = '200'
  AND date(parse_datetime(requestdatetime, 'dd/MMM/yyyy:HH:mm:ss Z')) = date '2022-07-18'
ORDER BY requestdatetime DESC
```

However, because the bucket is so large and because of how Athena queries, I always get a "Please reduce your request rate" error message. I can't find a way to implement exponential backoff in Athena. I thought about adding another condition to only query objects created in a specific period, but the issue with that is you may miss some object requests. For example, someone might still be requesting today an object that was created in 2018, and if I had a condition whereby I only got the requests for objects from 2019-2022, that request would be missed. Even if I set the `key` condition to be an exact match (`=`) instead of a `LIKE`, the query still fails because the request rate is too high.

Is there a better way to do this than Athena, or is Athena my only option and, if so, what is a better way to go about this?
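On the "can't implement exponential backoff in Athena" point: Athena itself has no backoff setting, but the client issuing the query (or the retries of a split-up query) can be wrapped in a generic jittered-backoff helper. A sketch, with a pluggable callable so it is not tied to any particular SDK (the function and exception names are illustrative, not an Athena API):

```python
import random
import time

def with_backoff(fn, max_attempts=5, base_delay=1.0, retryable=(RuntimeError,)):
    """Retry fn() with jittered exponential backoff.

    Generic helper: wrap your query-submission call (e.g. via an SDK)
    in a zero-argument function and pass it here. Which exception types
    count as retryable throttling errors depends on your client.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.5)
            time.sleep(delay)

# Example with a stand-in that fails twice before succeeding:
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("Please reduce your request rate")
    return "SUCCEEDED"

print(with_backoff(flaky, base_delay=0.01))
```

Separately, and probably more impactful for a 40 TB unpartitioned log bucket: partitioning the access-logs table by date, or splitting one big query into several smaller date/key-range queries run through a wrapper like the one above, reduces how many S3 objects a single query has to touch, which is typically what triggers the rate error in the first place.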
1 answer · 0 votes · 26 views · asked a month ago