By using AWS re:Post, you agree to the Terms of Use

Questions tagged with S3 Select

Sort by most recent

Browse through the questions and answers listed below or filter and sort to narrow down your results.

Query DynamoDB Backup to S3 with Athena

I'm getting errors for a variety of attempts to read my S3 backup for DynamoDB. The data looks like this: ``` {"SensorId":{"s":"sensor-10"},"Temperature":{"n":"63.85"},"CurrentTime":{"n":"1661540734"}} {"SensorId":{"s":"sensor-10"},"Temperature":{"n":"63.84"},"CurrentTime":{"n":"1661540736"}} ``` I tried the following which succeeds, but then the query fails: ``` CREATE EXTERNAL TABLE IF NOT EXISTS sensor_data_presto ( sensorID struct<s:string>, temperature struct<n:float>, currentTime struct<n:bigint> ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' LOCATION 's3://bucket-name/'; ``` The following query, gets the following error: ``` SELECT DISTINCT sensorID.s FROM sensor_data_presto; Row is not a valid JSON Object - JSONException: A JSONObject text must end with '}' at 39 [character 40 line 1] ``` I also tried the following which succeeds, but then the query fails: ``` CREATE EXTERNAL TABLE IF NOT EXISTS `mydatabase`.`sensor_data_regex` ( sensorid string, temperature float, currenttime bigint ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES ( 'input.regex'='/^\{"SensorId":\{"s":("[a-z]*-[0-9]*"){1}\},"Temperature":\{"n":"([0-9]*[\.]?[0-9]*)"\}{1},"CurrentTime":\{"n":"([0-9]*){1}"\}\}$/g' ) LOCATION 's3://bucket-name/' TBLPROPERTIES ('has_encrypted_data'='false'); ``` I tried the above with and without the beginning / and ending /g. I tried it with and without the ending $. The following query gets the following errors: ``` SELECT * FROM sensor_data_regex; GENERIC_INTERNAL_ERROR: PatternSyntaxException thrown initializing deserializer org.apache.hadoop.hive.serde2.RegexSerDe. Cause: Illegal repetition near index 1 /^{"SensorId":{"s":("[a-z]*-[0-9]*"){1}},"Temperature":{"n":"([0-9]*[.]?[0-9]*)"}{1},"CurrentTime":{"n":"([0-9]*){1}"}}$/g ^ ``` I was trying to replicate the example in Chapter 7: DynamoDB from the book Seven Databases in Seven Weeks, 2nd Edition for use in my college class. I was able to update the entire pipeline to work on AWS until the final step that uses Athena. I would appreciate advice on correcting these errors or completing the pipeline (cost effectively) to query the DynamoDB data.
1
answers
0
votes
44
views
asked a month ago