Query S3 data using Athena without Glue Data Catalog?

0

Is it possible to query S3 data (semi-structured json files) using s3 data without Glue Catalog?

basically on query time, choose the s3 location i want to query.

s3 location - s3://process/2023/08/01/0001.json , etc (there are hundreds of JSON files in this folder).

can i query this folder of files using Athena without using Glue catalog at all? this s3 path can change (today, i want to query 08/01 folder; tmrw, i want to query a different folder).

example, I want to query a specific folder using Athena, run a couple transformations in the SELECT and then output the results to a new parquet file elsewhere. Is a table required here? can i drop it after my export is complete?

I can't find much documentation on how to do this.

Gabe
asked 6 months ago949 views
2 Answers
1

No , Athena uses the AWS Glue Data Catalog to store and retrieve table metadata for the Amazon S3 data.

Read - https://docs.aws.amazon.com/athena/latest/ug/glue-athena.html

Yes - It is possible to query S3 data (semi-structured json files) using s3 data without Glue Catalog via S3 select.

Does S3 select option work for you ?

Read this - > https://aws.amazon.com/blogs/storage/querying-data-without-servers-or-databases-using-amazon-s3-select/

Read more - > https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-select-sql-reference-select.html

AWS
EXPERT
answered 6 months ago
profile picture
EXPERT
reviewed 6 months ago
0

Sure, here's an example using the OpenStreetMaps public dataset:

# Create db
CREATE DATABASE osm;

# Create table
CREATE EXTERNAL TABLE planet (
  id BIGINT,
  type STRING,
  tags MAP<STRING,STRING>,
  lat DECIMAL(9,7),
  lon DECIMAL(10,7),
  nds ARRAY<STRUCT<ref: BIGINT>>,
  members ARRAY<STRUCT<type: STRING, ref: BIGINT, role: STRING>>,
  changeset BIGINT,
  timestamp TIMESTAMP,
  uid BIGINT,
  user STRING,
  version BIGINT
)
STORED AS ORCFILE
LOCATION 's3://osm-pds/planet/';

# Test Query
select * from planet limit 100;
answered a month ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions