Questions tagged with Aurora PostgreSQL
AWS Glue PySpark job - bookmarks not working as expected. I have bookmarks enabled, job.init and job.commit are in place, and my DynamicFrames use the transformation_ctx property. The primary keys on the tables are also sequential.
I expect the first run to insert customer "corey" and subsequent runs to skip that customer. Instead it is picked up again and the JDBC from_options write fails.
Error: "An error occurred while calling o122.pyWriteDynamicFrame. ERROR: duplicate key value violates unique constraint "customer_customerid_key"" -- the same customer should not be inserted again here... why?
Here is the PostgreSQL table for customer:
```
CREATE TABLE poc.customer (
    customerid int4 NOT NULL,
    firstname varchar NOT NULL,
    lastname varchar NULL,
    myid serial PRIMARY KEY,
    CONSTRAINT customer_customerid_key UNIQUE (customerid)
);
```
SCRIPT
```
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.sql import DataFrame
from awsglue.dynamicframe import DynamicFrame
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args["JOB_NAME"], args)
dynamoDbConnection = glueContext.create_dynamic_frame.from_catalog(
database="corey-reporting-db",
table_name="cust-corey_customer",
transformation_ctx="dynamoDbConnection_node1",
)
#dynamoDbConnection.printSchema()
#dynamoDbConnection.show()
relationalized_json = dynamoDbConnection.relationalize(
    root_table_name='customer',
    staging_path='s3://*********/data/ddb/corey-customer/job-output',
    transformation_ctx="relationalized_json"
)
#relationalized_json.keys()
customer = relationalized_json.select('customer')
#customer.printSchema()
#customer.show()
customerAddress = relationalized_json.select('customer_addresses')
#customerAddress.printSchema()
#customerAddress.show()
customerPhoneNumbers = relationalized_json.select('customer_phonenumbers')
#customerPhoneNumbers.printSchema()
#customerPhoneNumbers.show()
customerMapping = ApplyMapping.apply(frame = customer, mappings = [
("customerId", "string", "customerId", "int"),
("firstname", "string", "firstname", "string"),
("lastname", "string", "lastname", "string")
], transformation_ctx = "customerMapping")
#customerMapping.printSchema()
#customerMapping.show()
customerAddressMapping = ApplyMapping.apply(frame = customerAddress, mappings = [
("id", "long", "customerId", "int"),
("`addresses.val.zip`", "string", "zip", "string"),
("`addresses.val.state`", "string", "state", "string"),
("`addresses.val.type`", "string", "type", "string"),
("`addresses.val.street`", "string", "street", "string")
], transformation_ctx = "customerAddressMapping")
#customerAddressMapping.printSchema()
#customerAddressMapping.show()
customerPhoneMapping = ApplyMapping.apply(frame = customerPhoneNumbers, mappings = [
("id", "long", "customerId", "int"),
("`phonenumbers.val.type`", "string", "type", "string"),
("`phonenumbers.val.value`", "string", "value", "string")
], transformation_ctx = "customerPhoneMapping")
#customerPhoneMapping.printSchema()
#customerPhoneMapping.show()
customerSink = glueContext.write_dynamic_frame.from_options(
frame = customerMapping,
connection_type = 'postgresql',
connection_options={
"url": "jdbc:postgresql://********.us-east-1.rds.amazonaws.com:5432/corey_reporting",
"dbtable": "poc.customer",
"user": "postgres",
"password": "********"
},
transformation_ctx = "customerSink"
)
customerAddressSink = glueContext.write_dynamic_frame.from_options(
frame = customerAddressMapping,
connection_type = 'postgresql',
connection_options={
"url": "jdbc:postgresql://*******.us-east-1.rds.amazonaws.com:5432/corey_reporting",
"dbtable": "poc.address",
"user": "postgres",
"password": "****"
},
transformation_ctx = "customerAddressSink"
)
customerPhoneSink = glueContext.write_dynamic_frame.from_options(
frame = customerPhoneMapping,
connection_type = 'postgresql',
connection_options={
"url": "jdbc:postgresql://*********.us-east-1.rds.amazonaws.com:5432/corey_reporting",
"dbtable": "poc.phonenumber",
"user": "postgres",
"password": "****"
},
transformation_ctx = "customerPhoneSink"
)
job.commit()
```
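If explicit bookmark keys end up being needed, here is a minimal sketch of how they can be passed for the catalog source. The database/table names are the ones from the script above, `customerId` as a bookmark key is an assumption, and `jobBookmarkKeys`/`jobBookmarkKeysSortOrder` are documented for JDBC-backed sources, so whether bookmarks cover a DynamoDB-backed catalog table at all is worth confirming first:
```
# Sketch only: explicit bookmark keys for the catalog source. jobBookmarkKeys
# needs a column that only ever increases (or decreases) between runs.
source = glueContext.create_dynamic_frame.from_catalog(
    database="corey-reporting-db",
    table_name="cust-corey_customer",
    additional_options={
        "jobBookmarkKeys": ["customerId"],   # assumed key column
        "jobBookmarkKeysSortOrder": "asc",
    },
    transformation_ctx="dynamoDbConnection_node1",
)
```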
I'm trying to migrate a non-partitioned Aurora PostgreSQL table to a partitioned table on the same Aurora PostgreSQL instance (source and target endpoint are the same instance). The non-partitioned table is very large!
I'm thinking of using DMS to minimize downtime while moving the data into the partitioned table. Is that possible?
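For context, the kind of partitioned target I have in mind looks roughly like the sketch below; the table and column names are hypothetical, and range partitioning by a date column is just one option. DMS would then need to load this with minimal downtime:
```
# Rough sketch of the intended target: a range-partitioned copy of the existing
# table. All names, columns and the connection string are placeholders.
import psycopg2

ddl = """
CREATE TABLE poc.big_table_partitioned (
    id         bigint      NOT NULL,
    created_at timestamptz NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (created_at);

CREATE TABLE poc.big_table_2024_01 PARTITION OF poc.big_table_partitioned
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
"""

with psycopg2.connect("host=<cluster-endpoint> dbname=<db> user=<user> password=<pw>") as conn:
    with conn.cursor() as cur:
        cur.execute(ddl)
```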
I have 2 questions
* Does Babelfish for Aurora support TLS 1.2 encryption in flight?
* If so, what port does it use for encrypted connections (1433?), and is that port changeable?
Hello,
I am trying to connect to an Aurora RDS cluster located in our private subnets from our Amazon Managed Grafana workspace as a data source.
I have created a Managed Grafana interface VPC endpoint in the RDS cluster's VPC, in the same region where our Managed Grafana workspace was created (for the `com.amazonaws.us-east-2.grafana` service). I have also attached this endpoint to all the private subnets in the RDS cluster's subnet group.
When I test the connection to our RDS cluster in Grafana, it fails with the following:
```
<html><head><title>504 Gateway Time-out</title></head>
<body><center><h1>504 Gateway Time-out</h1></center></body></html>
```
I'm specifying the endpoint and port of a reader instance, which is reachable when accessed from our bastion host.
The reader instance's VPC security group allows inbound TCP traffic from `0.0.0.0/0` to port `5432` and allows all outbound traffic.
What could be the problem with my setup?
Thank you for your time.
I want to fetch the hardware details (physical RAM, number of cores, operating system) of an RDS (PostgreSQL, Aurora) instance. There are APIs to check CPU usage, memory usage, etc., but my use case is not the amount of resource consumed; it is the capacity of the instance the database is running on. Is there any way to achieve this?
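The closest I have found is a sketch like the following, assuming the `db.` prefix of the instance class maps onto an EC2 instance type (which is not guaranteed for every class, e.g. db.serverless) and accepting that the underlying OS is not exposed:
```
# Sketch: derive capacity (vCPUs, RAM) from the instance class, since RDS does
# not expose the underlying host directly. "my-aurora-instance" is a placeholder.
import boto3

rds = boto3.client("rds")
ec2 = boto3.client("ec2")

db = rds.describe_db_instances(DBInstanceIdentifier="my-aurora-instance")["DBInstances"][0]
db_class = db["DBInstanceClass"]          # e.g. "db.r5.2xlarge"

# Assumption: stripping "db." yields a real EC2 instance type.
ec2_type = db_class.removeprefix("db.")
spec = ec2.describe_instance_types(InstanceTypes=[ec2_type])["InstanceTypes"][0]

print("vCPUs:", spec["VCpuInfo"]["DefaultVCpus"])
print("Memory (MiB):", spec["MemoryInfo"]["SizeInMiB"])
```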
We jumped on the Aurora Serverless v1 (PostgreSQL) wagon, as the use case for us - micro-services and many databases - fit perfectly together with AppSync.
Only too late did we realize how poorly Aurora Serverless v1 handles scaling: it can't scale while there are open transactions, and there is a hard limit of 500 connections for the Data API.
This causes a lot of DB outages for us.
We were among the early adopters of Aurora Serverless v2 and tested the scaling, which works nicely (it's what v1 should have been).
The lack of the Data API in v2 surprised us, as we can't upgrade because of AppSync.
Can you share any details on whether and when the Data API will be available on v2, or whether v1's flawed scaling will be fixed?
I've created an Aurora PostgreSQL cluster with the db.r5.2xlarge instance type, which has 64 GiB of memory.
I checked Enhanced Monitoring for one of the RDS instances and looked at the graphs for Total Memory, Active Memory, and Free Memory.
My expectation was that, at any point in time, the sum of Free Memory and Active Memory should equal Total Memory.
But in the monitoring graph, the sum of Free and Active memory is ~18 GiB, while Total Memory is 62.13 GiB.

I want to understand whether I'm reading the metrics correctly.
Why is there such a large difference between the sum of Active and Free memory and Total Memory?
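For reference, this is roughly how I'm pulling the raw Enhanced Monitoring samples. It's only a sketch: it assumes the metrics land in the `RDSOSMetrics` CloudWatch Logs group with one stream per DbiResourceId, and it also prints the cached/buffers fields in case those account for the gap:
```
# Sketch: read the latest Enhanced Monitoring sample from CloudWatch Logs and
# print the memory breakdown. "my-aurora-instance" is a placeholder.
import json
import boto3

rds = boto3.client("rds")
logs = boto3.client("logs")

db = rds.describe_db_instances(DBInstanceIdentifier="my-aurora-instance")["DBInstances"][0]
resource_id = db["DbiResourceId"]   # Enhanced Monitoring streams are named after this

events = logs.get_log_events(
    logGroupName="RDSOSMetrics",
    logStreamName=resource_id,
    limit=1,                        # most recent sample
)["events"]

mem = json.loads(events[0]["message"])["memory"]
for key in ("total", "active", "free", "cached", "buffers"):
    print(key, mem.get(key), "KB")
```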
When trying to add cron.task_running_timeout to increase the timeout for pg_cron, I am getting the error:
> Error: error modifying DB Cluster Parameter Group: InvalidParameterValue: Could not find parameter with name: cron.task_running_timeout
Does this mean the parameter is unsupported on RDS? Do I need to set it at the session level instead?
Thanks!
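In case it helps, this is how I've been checking which cron.* parameters the engine family exposes at all. It's a sketch that assumes the `aurora-postgresql14` parameter group family; substitute the family of your cluster parameter group:
```
# Sketch: list the cron.* parameters exposed by the cluster parameter group
# family. "aurora-postgresql14" is an assumption; use your own family name.
import boto3

rds = boto3.client("rds")

params, marker = [], None
while True:
    kwargs = {"DBParameterGroupFamily": "aurora-postgresql14"}
    if marker:
        kwargs["Marker"] = marker
    page = rds.describe_engine_default_cluster_parameters(**kwargs)["EngineDefaults"]
    params.extend(page["Parameters"])
    marker = page.get("Marker")
    if not marker:
        break

for p in params:
    if p["ParameterName"].startswith("cron."):
        print(p["ParameterName"], "- modifiable:", p.get("IsModifiable"))
```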
Hello. We want to migrate to Aurora.
1. We have a PostgreSQL RDS database instance in us-east-1.
2. We created an Aurora read replica in the same region.
3. When we try to promote the read replica, we see a blank screen in the UI.
We tried this from multiple locations, with multiple browsers, with a cleared cache, and in incognito mode - the result is the same. The error has persisted for 24 hours and is still not fixed. Feedback about the issue has been sent twice.
Errors in the browser console:
```
browser-polyfill.js:1163 Uncaught (in promise) Error: Cannot read properties of undefined (reading 'allowlisted')
    at wrappedSendMessageCallback (browser-polyfill.js:1163:16)
```
```
DevTools failed to load source map: Could not load content for https://dado4e41kbenk.cloudfront.net/xxxxxxxxxxx/chunkxxxxx.js.map: HTTP error: status code 403, net::ERR_HTTP_RESPONSE_CODE_FAILURE
```
```
chunk2477.js:2 TypeError: Cannot read properties of null (reading 'Engine')
```
Please help.
I need to load data from an S3 bucket into Aurora PostgreSQL tables. I read this documentation - https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/USER_PostgreSQL.S3Import.html#USER_PostgreSQL.S3Import.Reference - and I will try it. But is there any other way to handle this case?
In particular, is there any LOAD command available on Aurora PostgreSQL?
Thanks!
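For reference, what I understood from the linked documentation is something along these lines. The table, bucket, file and connection details are placeholders, and the cluster needs the `aws_s3` extension plus an IAM role that allows reading from the bucket:
```
# Sketch of the documented aws_s3 import path, run from Python via psycopg2.
# All identifiers below are placeholders.
import psycopg2

sql = """
SELECT aws_s3.table_import_from_s3(
    'poc.my_table',                 -- target table
    '',                             -- column list ('' = all columns)
    '(format csv, header true)',    -- COPY options
    aws_commons.create_s3_uri('my-bucket', 'data/my_file.csv', 'us-east-1')
);
"""

with psycopg2.connect("host=<cluster-endpoint> dbname=<db> user=<user> password=<pw>") as conn:
    with conn.cursor() as cur:
        cur.execute("CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;")
        cur.execute(sql)
        print(cur.fetchone()[0])    # import summary returned by the function
```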
I use an Aurora PostgreSQL cluster (4 nodes in total). The table is partitioned by month, with the largest partition being around 1.3 TB of data. One of the columns in the table is of type JSONB. I want to add a GIN index on that column so that I can query by fields within the JSONB object.
I am creating the GIN index concurrently so as not to affect live traffic. I was able to create the GIN index in the QA environment because the data there is relatively small. However, when I try to create it in production, the server runs out of temp storage while building the index ([see here](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.Managing.html#AuroraPostgreSQL.Managing.TempStorage) for the temp storage available per node size).
An easy solution would be to say 'just scale up the nodes within the cluster' so that there is more temp space to build the GIN index. This would likely work, but it seems a little overkill: if I scaled the nodes up just for the index creation and then scaled them down, there would no longer be sufficient hardware to rebuild the GIN index should it ever need rebuilding - this seems like a smell in production.
If I scaled the instances up and left them scaled up, they would be massively over-provisioned and very expensive.
I'm curious whether there is any workaround for this temp-space issue so that I don't have to scale the servers up so drastically. Also, if I scaled the servers up and then back down after the GIN index build completes, would that be risky - is there any reason the GIN index would have to be completely rebuilt after the initial build?
Thanks
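One approach I'm considering is building the index one partition at a time, so that only a single partition's build consumes temp storage at any moment. A sketch with hypothetical table, column and index names (CREATE INDEX CONCURRENTLY isn't allowed on the partitioned parent itself, so the parent index is created ON ONLY and the per-partition indexes are attached afterwards):
```
# Sketch: build the GIN index partition by partition to cap temp-storage usage.
# Table, column, index names and the connection string are hypothetical.
import psycopg2

PARTITIONS = ["events_2024_01", "events_2024_02", "events_2024_03"]

conn = psycopg2.connect("host=<cluster-endpoint> dbname=<db> user=<user> password=<pw>")
conn.autocommit = True   # CREATE INDEX CONCURRENTLY cannot run inside a transaction
cur = conn.cursor()

# Invalid parent index on the partitioned table; it becomes valid once every
# partition has an attached child index.
cur.execute("CREATE INDEX IF NOT EXISTS events_payload_gin ON ONLY events USING gin (payload);")

for part in PARTITIONS:
    # Each child build only needs temp space for that partition's data.
    cur.execute(f"CREATE INDEX CONCURRENTLY IF NOT EXISTS {part}_payload_gin "
                f"ON {part} USING gin (payload);")
    cur.execute(f"ALTER INDEX events_payload_gin ATTACH PARTITION {part}_payload_gin;")

cur.close()
conn.close()
```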
I have a PostgreSQL database created in Amazon RDS. In that database there is a table with a column whose data type is 'time with time zone'. When I crawl this table using an AWS Glue crawler, the column's data type is shown as 'timestamp'. When I try to edit the schema, there is no 'time' data type in the list. What should I do to solve this issue? Any help is appreciated.
Thank you
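The workaround I'm currently experimenting with is casting that column in the ETL job rather than in the crawler-generated schema. A sketch with a hypothetical database, table and column name, using the same ApplyMapping pattern as the Glue script earlier on this page:
```
# Sketch: keep the time-with-time-zone column as a string in Glue, since the
# Glue catalog has no 'time' type. Database/table/column names are placeholders.
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.transforms import ApplyMapping

glueContext = GlueContext(SparkContext.getOrCreate())

source = glueContext.create_dynamic_frame.from_catalog(
    database="my_database",
    table_name="my_table",
    transformation_ctx="source",
)

# Cast the column the crawler reported as 'timestamp' to a plain string so the
# original time-with-time-zone value survives downstream processing.
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[
        ("id", "long", "id", "long"),
        ("event_time", "timestamp", "event_time", "string"),
    ],
    transformation_ctx="mapped",
)
```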