AWS Glue: Crawler worked with the connection, but using it in a Glue job results in "while calling o145.pyWriteDynamicFrame. The connection attempt failed."


When I created a crawler to crawl an RDS (PostgreSQL) instance, it was able to connect and crawl the one table I specified. But when I created a job using the node type "AWS Glue Data Catalog table with PostgreSQL as the data target" and pointed it at the same database and table, it failed to connect to the target, giving me "An error occurred while calling o145.pyWriteDynamicFrame. The connection attempt failed." I've checked the security group and subnet of the RDS instance and the connection in Glue. What else should I be checking?

Asked a year ago · 507 views
2 answers

Hi,

I see that you are receiving the following error while trying to connect to RDS as a target from a Glue job:

"An error occurred while calling o145.pyWriteDynamicFrame. The connection attempt failed."

This error is commonly caused by the subnet the job runs in being unable to reach the RDS instance, or by the RDS instance's security group not allowing access from the security group used by the Glue job. As a first reference, we have a step-by-step guide on setting up the environment for access to RDS data stores, which covers the configuration needed here: https://docs.aws.amazon.com/glue/latest/dg/setup-vpc-for-glue-access.html
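One requirement the guide above calls out is a self-referencing inbound rule on the security group attached to the Glue connection. As a purely illustrative sketch (the helper function and sample data below are hypothetical, not an AWS API), this is the kind of check you could run against the `IpPermissions` list returned by boto3's `ec2.describe_security_groups`:

```python
# Illustrative helper (not an AWS API): Glue needs the connection's security
# group to have an inbound rule that allows traffic from the group itself.
def has_self_referencing_rule(ingress_rules, group_id):
    """Return True if any ingress rule allows traffic from group_id itself."""
    for rule in ingress_rules:
        for pair in rule.get("UserIdGroupPairs", []):
            if pair.get("GroupId") == group_id:
                return True
    return False

# Sample data shaped like ec2.describe_security_groups()
# -> SecurityGroups[n]["IpPermissions"]; the group ID is made up.
rules = [
    {"IpProtocol": "-1",
     "UserIdGroupPairs": [{"GroupId": "sg-0123456789abcdef0"}]},
]
print(has_self_referencing_rule(rules, "sg-0123456789abcdef0"))  # True
```

If this check comes back False for the security group on your Glue connection, adding a self-referencing all-TCP inbound rule is the usual fix.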

Since you mentioned your Glue crawler worked with RDS using the same Glue connection, I would also ask you to check the data source of the Glue job, in case you used a different connection to read the Data Catalog table. Note that AWS Glue supports one connection per job: if you specify more than one connection, AWS Glue uses only the first. If your job requires access to more than one virtual private cloud (VPC), you have to create a dedicated VPC for the Glue connection and then configure a peering connection with the VPCs where your data sources live. Please see the links below for more details.

https://aws.amazon.com/premiumsupport/knowledge-center/connection-timeout-glue-redshift-rds/

https://aws.amazon.com/blogs/big-data/connecting-to-and-running-etl-jobs-across-multiple-vpcs-using-a-dedicated-aws-glue-vpc/
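To make the "first connection only" behavior concrete, here is a minimal plain-Python sketch (the function name is hypothetical; this only mimics the documented behavior of a Glue job's Connections list, it is not Glue code):

```python
# Illustrative only: when a Glue job lists several connections, Glue uses
# just the first one; the others are silently ignored, so only the first
# connection's VPC, subnet, and security groups apply to the job.
def effective_connection(job_connections):
    """Return the connection Glue would actually use, or None if empty."""
    return job_connections[0] if job_connections else None

print(effective_connection(["rds-postgres-conn", "redshift-conn"]))
# rds-postgres-conn -- the second connection contributes nothing
```

This is why a job that reads from one VPC and writes to another fails unless both data stores are reachable from the first connection's network, e.g. via the dedicated-VPC-plus-peering setup described in the blog post above.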

If the issue still persists, please open a support case with AWS, providing the connection details and the code snippet used: https://docs.aws.amazon.com/awssupport/latest/user/case-management.html#creating-a-support-case

Thank you.

AWS
Support Engineer
Answered a year ago

Following up in case people are wondering about this: what I actually did was start with a new canvas, fill in the target first (working backwards), and add my source last. This fixed the issue and the run succeeded. Very quirky; I don't know if AWS is aware of this issue.

Answered a year ago
