Query S3 data using Athena without Glue Data Catalog?

0

Is it possible to query S3 data (semi-structured json files) using s3 data without Glue Catalog?

basically on query time, choose the s3 location i want to query.

s3 location - s3://process/2023/08/01/0001.json , etc (there are hundreds of JSON files in this folder).

can i query this folder of files using Athena without using Glue catalog at all? this s3 path can change (today, i want to query 08/01 folder; tmrw, i want to query a different folder).

example, I want to query a specific folder using Athena, run a couple transformations in the SELECT and then output the results to a new parquet file elsewhere. Is a table required here? can i drop it after my export is complete?

I can't find much documentation on how to do this.

Gabe
gefragt vor 6 Monaten1010 Aufrufe
2 Antworten
1

No , Athena uses the AWS Glue Data Catalog to store and retrieve table metadata for the Amazon S3 data.

Read - https://docs.aws.amazon.com/athena/latest/ug/glue-athena.html

Yes - It is possible to query S3 data (semi-structured json files) using s3 data without Glue Catalog via S3 select.

Does S3 select option work for you ?

Read this - > https://aws.amazon.com/blogs/storage/querying-data-without-servers-or-databases-using-amazon-s3-select/

Read more - > https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-select-sql-reference-select.html

AWS
EXPERTE
beantwortet vor 6 Monaten
profile picture
EXPERTE
überprüft vor 6 Monaten
0

Sure, here's an example using the OpenStreetMaps public dataset:

# Create db
CREATE DATABASE osm;

# Create table
CREATE EXTERNAL TABLE planet (
  id BIGINT,
  type STRING,
  tags MAP<STRING,STRING>,
  lat DECIMAL(9,7),
  lon DECIMAL(10,7),
  nds ARRAY<STRUCT<ref: BIGINT>>,
  members ARRAY<STRUCT<type: STRING, ref: BIGINT, role: STRING>>,
  changeset BIGINT,
  timestamp TIMESTAMP,
  uid BIGINT,
  user STRING,
  version BIGINT
)
STORED AS ORCFILE
LOCATION 's3://osm-pds/planet/';

# Test Query
select * from planet limit 100;
beantwortet vor einem Monat

Du bist nicht angemeldet. Anmelden um eine Antwort zu veröffentlichen.

Eine gute Antwort beantwortet die Frage klar, gibt konstruktives Feedback und fördert die berufliche Weiterentwicklung des Fragenstellers.

Richtlinien für die Beantwortung von Fragen