How to filter by file type in a Glue table?


I have an S3 bucket with mixed files (CSV and JSON), written there by an external process that is outside of my control.

I need to create two Glue tables:

  1. Contains all of the data from CSVs
  2. Contains all of the data from JSONs

Is this possible?

m0ltar
asked 2 months ago, 132 views
1 Answer

The only standard way I know of to do that is to create a symlink table: it reads a manifest that points to specific files, and you would need to keep that manifest updated. Normally, though, you would want to convert to a columnar format, standardize, transform, partition, or otherwise process the data before creating the table. Use that step to separate the files and generate well-formed tables. I would spare my users the pain of querying CSV/JSON tables.
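To make the manifest idea concrete, here is a minimal sketch of splitting a mixed listing of object keys into one manifest per file type. It assumes the file type can be read from the key's suffix, and the bucket name and keys are hypothetical. A symlink manifest is taken here to be a plain text file with one full S3 URI per line.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def build_manifests(bucket: str, keys: list[str]) -> dict[str, str]:
    """Group S3 object keys by suffix and return manifest text per file type.

    Each manifest holds one full S3 URI per line, sorted for stable output.
    Keys with any other suffix are ignored.
    """
    # Assumption: the file type is encoded in the key's suffix.
    wanted = {".csv": "csv", ".json": "json"}
    grouped: dict[str, list[str]] = defaultdict(list)
    for key in keys:
        suffix = PurePosixPath(key).suffix.lower()
        if suffix in wanted:
            grouped[wanted[suffix]].append(f"s3://{bucket}/{key}")
    return {kind: "\n".join(sorted(uris)) for kind, uris in grouped.items()}


# Hypothetical bucket and keys, for illustration only.
manifests = build_manifests(
    "example-mixed-bucket",
    ["in/a.csv", "in/b.json", "in/c.CSV", "in/readme.txt"],
)
print(manifests["csv"])
print(manifests["json"])
```

Each manifest would then be written to its own S3 prefix (for example with boto3's `put_object`) and regenerated whenever the external process drops new files, which is the "you would need to update" part.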

AWS
EXPERT
answered 2 months ago
  • The only standard way I know to do that is creating a symlink table (it has a manifest that points to the specific files, which you would need to update).

    Could you please elaborate on this? I am not too familiar with the concept of a "symlink table".

    But normally you would want to convert to columnar formats, standardize, transform, partitions or do other things before you create the table

    And to do that, we want to use Athena, potentially via dbt. The idea was to define base raw tables, with partitions, then compact the data using Athena.
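For readers also unfamiliar with the term: a symlink table is an ordinary external table whose input format is Hive's `SymlinkTextInputFormat`, so its LOCATION holds manifest text files rather than the data itself. The sketch below renders such a DDL statement for the CSV case as a string that could be submitted to Athena. The table name, columns, and manifest location are hypothetical, and the choice of `OpenCSVSerde` is an assumption borrowed from the pattern AWS documents for querying S3 Inventory. The JSON table would presumably swap in a JSON SerDe, which has not been verified here.

```python
def symlink_table_ddl(table: str, columns: dict[str, str], manifest_location: str) -> str:
    """Render CREATE EXTERNAL TABLE DDL for a CSV-backed symlink table.

    manifest_location is the S3 prefix holding the manifest file(s),
    not the prefix holding the data files.
    """
    cols = ",\n  ".join(f"`{name}` {dtype}" for name, dtype in columns.items())
    return (
        f"CREATE EXTERNAL TABLE {table} (\n  {cols}\n)\n"
        # Assumption: OpenCSVSerde suits the CSV files; adjust to the real data.
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'\n"
        "STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'\n"
        "OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'\n"
        f"LOCATION '{manifest_location}'"
    )


# Hypothetical table, columns, and manifest prefix.
ddl = symlink_table_ddl(
    "raw_csv",
    {"id": "string", "name": "string"},
    "s3://example-mixed-bucket/manifests/csv/",
)
print(ddl)
```

A raw table defined this way could then serve as the source for the Athena or dbt compaction step described above, as long as the manifest is refreshed before each run.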
