Cannot connect to MongoDB tables in catalog

Hi all, I'm trying to use AWS Glue to run ETL scripts, extracting data from MongoDB collections stored in AWS DocumentDB, and loading it into PostgreSQL tables in AWS RDS.

For ease of use in the Glue jobs themselves, I created a connection to my MongoDB and added the tables I will read from to the Data catalog. When creating the connection, the AWS console only let me proceed when the connection URL was of the form mongodb://<host>:<port>/<database>.

In the Python script, I tried to create a dynamic frame from the catalog by calling

glueContext.create_dynamic_frame_from_catalog(
    database='database', # Database name as defined in the catalog
    table_name='table_name', # Table name as defined in the catalog
    additional_options={
        'database': 'database', # Database name in MongoDB
        'collection': 'collection', # Collection name in MongoDB
        'ssl': 'true'
    }
)

However, this results in an error:

pyspark.sql.utils.IllegalArgumentException: "requirement failed: Invalid uri: 'mongodb://<username>:<password>@<host>:<port>/<database>/?ssl=true'"

Notice the '/' after <database>, which I believe is the cause of the exception being thrown.
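To illustrate the suspicion above, here is a minimal sketch that only inspects the shape of such a URI with Python's standard library. It does not reproduce the validation done by the MongoDB Spark connector, and the host, credentials, and database name are hypothetical placeholders.

```python
from urllib.parse import urlsplit


def database_segment(uri: str) -> str:
    """Return the URI path without its leading '/', i.e. the database part."""
    return urlsplit(uri).path[1:]


# Placeholder URIs (hypothetical values), shaped like the one in the error message.
bad_uri = "mongodb://user:secret@docdb.example.com:27017/mydb/?ssl=true"
good_uri = "mongodb://user:secret@docdb.example.com:27017/mydb?ssl=true"

bad_segment = database_segment(bad_uri)    # database part keeps a trailing '/'
good_segment = database_segment(good_uri)  # database part is a plain name

print(repr(bad_segment), repr(good_segment))
```

With the trailing slash, the database part of the path is no longer a plain name, which is consistent with the connector rejecting the URI.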

I tried setting <database> to both the database that contains the collection I want to query and the admin database, to no avail. And, like I said previously, leaving it empty didn't work either. Any idea on what I am doing wrong? How can I connect to my DB?

asked 2 years ago, 268 views
1 Answer
You could write your connection code in the following manner to fix the issue:

documentdb_write_uri = 'mongodb://yourdocumentdbcluster.amazonaws.com:27017'
write_documentdb_options = {
    "uri": documentdb_write_uri,
    "database": "yourdbname",
    "collection": "yourcollectionname",
    "username": "###",
    "password": "###"
}

For further information, see the reference documentation: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-connect.html#aws-glue-programming-etl-connect-documentdb
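The options in this answer are named for writing, while the question is about reading. Here is a rough sketch of the read side, assuming the same option keys apply and that the collection is read with create_dynamic_frame.from_options and connection_type "documentdb" instead of going through the catalog connection. The helper names build_docdb_options and read_collection are made up for illustration, the "ssl" option is carried over from the question, and all values are placeholders.

```python
def build_docdb_options(host, port, database, collection, username, password):
    """Build connection options with a bare host:port URI (no database path, no query)."""
    return {
        "uri": f"mongodb://{host}:{port}",
        "database": database,
        "collection": collection,
        "username": username,
        "password": password,
        "ssl": "true",  # carried over from the question; adjust to your cluster
    }


def read_collection(glue_context, options):
    """Read the collection into a DynamicFrame without using the catalog table."""
    return glue_context.create_dynamic_frame.from_options(
        connection_type="documentdb",
        connection_options=options,
    )


# Hypothetical placeholder values; replace with your own cluster details.
read_options = build_docdb_options(
    "yourdocumentdbcluster.amazonaws.com", 27017,
    "yourdbname", "yourcollectionname", "###", "###",
)
```

Inside a Glue job this would be called as read_collection(glueContext, read_options). Keeping the database and collection out of the URI means no path or query string has to be appended to it.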

AWS
answered 10 months ago
