AWS Glue Crawler Scalability for Large Number of Delta Tables


Question: We currently have approximately 100 tables in Delta format, partitioned by yyyy, mm, dd, hh, mm. Our current process reads these Delta tables via a Glue crawler, catalogs them, and queries them as Redshift Spectrum external tables to build business logic.

However, we are hitting scalability limitations due to the maximum of 10 tables per crawler. As we continue to add tables, managing additional crawlers becomes cumbersome. The data volume on some of these tables is also substantial, up to 500k records per hour.

Considering these constraints, what would be the optimal approach to read the delta tables in parallel via the crawler? Can we configure the crawler to utilize an RDS database for improved scalability? Any insights or best practices would be appreciated.

  • Could you share how you're creating these Delta tables? Where is the source data coming from for these tables?

  • We are creating the Delta tables via Glue ETL. The source data comes from an API.
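One way to work around the per-crawler table limit is to stop creating crawlers by hand and instead generate one crawler per batch of Delta table paths. Below is a minimal sketch of that idea; the bucket layout, crawler naming scheme, IAM role ARN, and database name are all assumptions, and the actual `boto3` `create_crawler` calls are left commented out since they require AWS credentials. The `DeltaTargets` / `WriteManifest` fields follow the Glue crawler API; `WriteManifest` is set because Redshift Spectrum has historically read Delta tables through the generated symlink manifest.

```python
# import boto3  # needed only for the commented-out create_crawler calls

MAX_TABLES_PER_CRAWLER = 10  # per-crawler Delta table limit noted in the question


def batch(paths, size):
    """Split a list of Delta table S3 paths into crawler-sized batches."""
    return [paths[i:i + size] for i in range(0, len(paths), size)]


def crawler_configs(table_paths, role_arn, database):
    """Build one create_crawler kwargs dict per batch of Delta tables."""
    configs = []
    for i, chunk in enumerate(batch(table_paths, MAX_TABLES_PER_CRAWLER)):
        configs.append({
            "Name": f"delta-crawler-{i}",  # hypothetical naming scheme
            "Role": role_arn,
            "DatabaseName": database,
            "Targets": {
                "DeltaTargets": [{
                    "DeltaTables": chunk,
                    # Spectrum reads Delta via the symlink manifest the
                    # crawler writes; assumption based on the Spectrum docs.
                    "WriteManifest": True,
                }]
            },
        })
    return configs


if __name__ == "__main__":
    # Hypothetical bucket/path layout for the ~100 tables in the question.
    paths = [f"s3://my-bucket/delta/table_{n}/" for n in range(100)]
    configs = crawler_configs(
        paths, "arn:aws:iam::123456789012:role/GlueRole", "delta_db"
    )
    print(len(configs))  # 100 tables / 10 per crawler -> 10 crawlers
    # glue = boto3.client("glue")
    # for cfg in configs:
    #     glue.create_crawler(**cfg)
```

The crawlers generated this way can then be started in parallel (each crawler run is independent), which also addresses the parallelism part of the question without involving an RDS database.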

No answers
