glueContext.create_dynamic_frame.from_catalog(...) not using supplied JDBC connection

0

I have created a Glue Connection to my PostgreSQL (RDS) instance. I have created and run a Crawler for a table to create a corresponding table in my Glue Data Catalog. I have been unable to read from that catalog from either Visual ETL, a Glue notebook, or a SageMaker notebook. In all three cases, it seems that the connection, which is defined for the Data Catalog copy of the table, is not being used. For instance, my latest attempt was a Visual ETL job to copy the Data Catalog table to S3. Visual ETL created the following:

node_name = glueContext.create_dynamic_frame.from_catalog(database="default", table_name="my_table_name", transformation_ctx="ctx_name")

Once run, this results in:

An error occurred while calling o146.getCatalogSource. We don't support this connection type: null

Does Glue support reading from Data Catalog tables that are backed by PostgreSQL? Thank you in advance for any help!

  • J.
asked 2 months ago227 views
1 Answer
-1

Yes, AWS Glue does support reading from Data Catalog tables that are backed by PostgreSQL databases. However, the error you're encountering suggests there might be an issue with how the connection is being utilized or configured.

When working with PostgreSQL in AWS Glue, it's important to ensure that the connection is properly set up and that the necessary information is being passed to the Glue job. Here are a few things to check:

  1. Verify that the connection type for your PostgreSQL RDS instance is correctly set to "JDBC" in the Glue connection settings.

  2. Make sure the JDBC URL, username, and password are correctly configured in the connection properties.

  3. Check that the VPC, subnet, and security group settings in the Glue connection match those of your RDS instance.

  4. Ensure that the IAM role associated with your Glue job has the necessary permissions to access both the Glue Data Catalog and the RDS instance.

  5. When creating your dynamic frame, you may need to explicitly specify the connection name. Try modifying your code to include the connection_type parameter:

    node_name = glueContext.create_dynamic_frame.from_catalog(
        database="default",
        table_name="my_table_name",
        transformation_ctx="ctx_name",
        connection_type="postgresql"
    )
  6. If you're using a custom JDBC driver for PostgreSQL, make sure it's properly uploaded and configured in your Glue environment.

  7. Check if there are any version compatibility issues between your PostgreSQL database and the JDBC driver being used by Glue.

If you've verified all these points and are still encountering issues, it may be worth testing the connection directly using the AWS Glue connection test feature or the AWSSupport-TroubleshootGlueConnection runbook in AWS Systems Manager. These tools can help identify any network, IAM, or credential issues that might be preventing the connection from working properly.

Remember that AWS Glue Studio creates unified connections for PostgreSQL data sources, which may require additional steps for accessing Secrets Manager and VPC resources. Make sure you've completed any necessary additional configuration for these unified connections.

If the problem persists after checking these items, you may need to contact AWS support for further assistance, as there could be an underlying issue with the Glue service or how it's interacting with your specific PostgreSQL setup.
Sources
AWS Glue connection properties - AWS Glue
Troublehsoot AWS Glue connection to a JDBC data source | AWS re:Post
Using custom connectors and connections with AWS Glue Studio - AWS Glue

profile picture
answered 2 months ago
  • As mentioned, I have created a connection to the DB. I have created Glue jobs that can successfully read from the DB, directly. I have created a Crawler that can read from the DB and has created a Data Catalog table with the correct schema.

    The problem is that I am unable to read from that table. Here is a more complete example:

    from_db = glueContext.create_dynamic_frame.from_options(
        connection_type = "postgresql",
        connection_options = {
            "useConnectionProperties": "true",
            "dbtable": f"public.product_detail",
            "connectionName": "Sitara PostgreSQL Connection",
        },
        transformation_ctx = 'product_detail_ctx'
    )
    
    print('Printing schema of PG table:')
    from_db.printSchema()
    
    from_catalog = glueContext.create_dynamic_frame.from_catalog(
        database="default",
        table_name="sitara_public_product_detail",
        transformation_ctx="ctx_name",
        connection_type = "postgresql",
        connection_options = {
            "useConnectionProperties": "true",
            "dbtable": f"public.product_detail",
            "connectionName": "Sitara PostgreSQL Connection",
        },
    )
    
    print('Printing schema of Data Catalog table:')
    node_name.printSchema()
    

    This successfully prints out the from_db schema but fails after defining from_catalog. In the logs, I see:

    DataCatalogWrapper: got connection from glue client with name: PostgreSQL Connection and type: POSTGRESQL

    Just before:

    getCatalogSource: We don't support this connection type: null

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions