AWS Glue + RDS Postgres

0

Hello, can anyone give any advice on this? I created a very simple test Glue job: Source - RDS Postgres, Destination - S3 bucket. The run takes about 23 minutes and ends with a timeout error. In the log I see "AWSGlueJobExecutor; Status Code: 400; Error Code: AccessDeniedException". However, if I replace the Source with S3 (instead of RDS Postgres) - so the source is S3 and the destination is S3 - it ends successfully. And if I replace the Destination with RDS Postgres, so the Source is RDS Postgres and the Destination is RDS Postgres, it ends successfully too. Here is the log file: https://drive.google.com/file/d/1O8ZdksZmHI7VKDz2n2JGJ2Qh-LNBlP4-/view?usp=drive_link. So what should I do?

P.S. What do you think about Glue in general? To me it seems very unfriendly and buggy.

Pavel
asked 15 days ago · 593 views
3 Answers
0

Hello.

Is the subnet where RDS is located a private subnet?
If so, try setting up an S3 gateway VPC endpoint.
It is possible that there is no route to S3 from the subnet Glue uses to connect to RDS, which would result in this error.
https://docs.aws.amazon.com/glue/latest/dg/vpc-endpoints-s3.html
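If the endpoint does not exist yet, it can be created from the console or from code. A minimal boto3 sketch, assuming us-east-1 and using placeholder VPC and route table IDs that you would replace with your own:

import boto3

# Placeholders -- substitute your own VPC ID and the route table(s) used by the Glue/RDS subnets.
ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.create_vpc_endpoint(
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.s3",
    VpcEndpointType="Gateway",
    RouteTableIds=["rtb-0123456789abcdef0"],
)
print(response["VpcEndpoint"]["VpcEndpointId"])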

EXPERT
answered 15 days ago
EXPERT
reviewed 9 days ago
  • RDS belongs to a VPC consisting of two subnets, both of which are public.

  • Glue does not get a public IP even when it is connected to a VPC, so it cannot reach S3 by default. You therefore need to set up an S3 gateway VPC endpoint so the job can access S3. https://docs.aws.amazon.com/glue/latest/dg/start-connecting.html

    If a job needs to run in your VPC subnet—for example, transforming data from a JDBC data store in a private subnet—AWS Glue sets up elastic network interfaces that enable your jobs to connect securely to other resources within your VPC. Each elastic network interface is assigned a private IP address from the IP address range within the subnet you specified. No public IP addresses are assigned. Security groups specified in the AWS Glue connection are applied on each of the elastic network interfaces. For more information, see Setting up Amazon VPC for JDBC connections to Amazon RDS data stores from AWS Glue.

  • The endpoint does exist: Service Name = "com.amazonaws.us-east-1.s3", its route table includes both public subnets of the VPC, the endpoint Status is "Available", the Type is "Gateway", and the route table's "Main" flag is No. (A quick way to double-check this from code is sketched after these comments.)
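As a quick sanity check, you can list the gateway endpoints for the S3 service in the VPC and see which route tables they are attached to. A minimal boto3 sketch; the VPC ID is a placeholder:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# List S3 endpoints in the VPC (replace the VPC ID with your own).
endpoints = ec2.describe_vpc_endpoints(
    Filters=[
        {"Name": "service-name", "Values": ["com.amazonaws.us-east-1.s3"]},
        {"Name": "vpc-id", "Values": ["vpc-0123456789abcdef0"]},
    ]
)["VpcEndpoints"]

for ep in endpoints:
    print(ep["VpcEndpointId"], ep["State"], ep["RouteTableIds"])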

0

Hi Pavel,

Could you check if the IAM role assumed by the Glue ETL job includes permissions to your “b20240516” S3 bucket?

{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Effect": "Allow",
			"Action": [
				"s3:GetObject",
				"s3:PutObject",
				"s3:DeleteObject"
			],
			"Resource": [
				"arn:aws:s3:::b20240516*/*"
			]
		}
	]
}

I was only able to reproduce your error when the role lacked these permissions.

If you have that policy already, please could you also attach the logs for the scenario "source is S3 and the destination is S3"?
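To see what the role can currently do, you can list its attached and inline policies. A minimal boto3 sketch, assuming a placeholder role name "GlueJobRole" (use whatever role your job actually assumes):

import boto3

iam = boto3.client("iam")
role_name = "GlueJobRole"  # placeholder -- the role attached to your Glue job

# Managed policies attached to the role
for policy in iam.list_attached_role_policies(RoleName=role_name)["AttachedPolicies"]:
    print("attached:", policy["PolicyName"])

# Inline policies defined directly on the role
for name in iam.list_role_policies(RoleName=role_name)["PolicyNames"]:
    print("inline:", name)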

AWS
answered 12 days ago
EXPERT
reviewed 9 days ago
0

Thank you, Monica_A and Riku_Kobayashi, for trying to help. The problem turned out to be that the endpoint is for Service Name = "com.amazonaws.us-east-1.s3" (and there were no other suitable values to choose from), but all my S3 buckets were NOT in "us-east-1". After I replaced the S3 bucket with another one located in "us-east-1", the job run completed successfully. Anyway, all I want to say is: I hate AWS Glue :-), the simplest things, which should take a couple of minutes, take days, and on top of that you have to fight with various Glue bugs. I would be happy to work with Azure Data Factory, but unfortunately on the current project I need to work with Glue.
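For anyone hitting the same thing: one quick way to confirm which region a bucket actually lives in (so it matches the region of the S3 gateway endpoint) is get_bucket_location. A minimal boto3 sketch, using the bucket name mentioned earlier as a placeholder:

import boto3

s3 = boto3.client("s3")

# LocationConstraint is None for us-east-1, otherwise it is the region name.
location = s3.get_bucket_location(Bucket="b20240516")["LocationConstraint"]
print(location or "us-east-1")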

Pavel
answered 12 days ago
EXPERT
reviewed 9 days ago
