Cannot connect to MongoDB tables in catalog


Hi all, I'm trying to use AWS Glue to run ETL scripts, extracting data from MongoDB collections stored in AWS DocumentDB, and loading it into PostgreSQL tables in AWS RDS.

For ease of use in the Glue jobs themselves, I created a connection to my MongoDB and added the tables I will read from to the Data catalog. When creating the connection, the AWS console only let me proceed when the connection URL was of the form mongodb://<host>:<port>/<database>.

In the Python script, I tried to create a dynamic frame from the catalog by calling

glueContext.create_dynamic_frame_from_catalog(
    database='database', # Database name as defined in the catalog
    table_name='table_name', # Table name as defined in the catalog
    additional_options={
        'database': 'database', # Database name in MongoDB
        'collection': 'collection', # Collection name in MongoDB
        'ssl': 'true'
    }
)

However, this results in an error:

pyspark.sql.utils.IllegalArgumentException: "requirement failed: Invalid uri: 'mongodb://<username>:<password>@<host>:<port>/<database>/?ssl=true'"

Notice the '/' after <database>, which I believe is the cause of the exception being thrown.

I tried setting <database> to both the database that contains the collection I want to query and the admin database, to no avail. And, like I said previously, leaving it empty didn't work either. Any idea on what I am doing wrong? How can I connect to my DB?

  • I have the same problem. I believe it depends on whether TLS is enabled for your MongoDB cluster. To check whether TLS is enabled, see https://docs.aws.amazon.com/documentdb/latest/developerguide/connect_programmatically.html#connect_programmatically-tls_enabled. A follow-up question to yours: how can I pass the Amazon DocumentDB public key, rds-combined-ca-bundle.pem, so that I can use the create_dynamic_frame_from_catalog method? Is that possible at all? FYI, I am able to connect to my MongoDB cluster (v4.0) using the create_dynamic_frame_from_options function, but then the credentials and other parameters have to be passed explicitly, and the benefit of using the catalog is lost.

asked 2 years ago · 268 views
1 Answer

You could write your connection code in the following manner to fix the issue:

documentdb_write_uri = 'mongodb://yourdocumentdbcluster.amazonaws.com:27017'
write_documentdb_options = {
    "uri": documentdb_write_uri,
    "database": "yourdbname",
    "collection": "yourcollectionname",
    "username": "###",
    "password": "###"
}

For further information, see the reference documentation: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-connect.html#aws-glue-programming-etl-connect-documentdb
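
As a rough illustration of how such an options dictionary might be used for the read side of the job, here is a minimal sketch. It assumes the job bypasses the catalog and calls create_dynamic_frame.from_options with connection_type="documentdb". The helper names build_documentdb_options and read_collection are hypothetical, the host and credentials are placeholders, and the "ssl" option is only needed if TLS is enabled on the cluster.

```python
def build_documentdb_options(host, database, collection, username, password,
                             port=27017, ssl=True):
    """Build a connection_options dict for a DocumentDB connection.

    Hypothetical helper: the URI carries only host and port, while the
    database and collection are passed as separate options instead of
    being appended to the URI path.
    """
    return {
        "uri": f"mongodb://{host}:{port}",
        "database": database,
        "collection": collection,
        "username": username,
        "password": password,
        # Assumption: TLS is enabled on the cluster; pass ssl=False otherwise.
        "ssl": "true" if ssl else "false",
    }


def read_collection(glue_context, options):
    """Read one collection into a DynamicFrame without going through the catalog.

    glue_context is expected to be an awsglue.context.GlueContext created
    inside the Glue job; it is passed in so this module has no import-time
    dependency on the Glue libraries.
    """
    return glue_context.create_dynamic_frame.from_options(
        connection_type="documentdb",
        connection_options=options,
    )
```

Inside a Glue job, this would be called with the job's existing glueContext, for example read_collection(glueContext, build_documentdb_options(...)). As noted in the comment above, the trade-off is that credentials must then be supplied to the script (ideally from a secret store rather than hard-coded) instead of coming from the catalog connection.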

AWS
answered 10 months ago
