RDS Aurora PostgreSQL crashes with segmentation fault


Hey guys, I have the following issue. After the initial load, during normal operation, pglogical crashes once a day on the destination side, which is RDS Aurora PostgreSQL 14.8 with pglogical 2.4.2:

Destination side:

2024-01-25 13:17:09 UTC::@:[537]:LOG: background worker "pglogical apply 131082:4047160452" (PID 6709) was terminated by signal 11: Segmentation fault
2024-01-25 13:17:09 UTC::@:[537]:LOG: terminating any other active server processes
2024-01-25 13:17:09 UTC::@:[537]:FATAL: Can't handle storage runtime process crash
2024-01-25 13:17:09 UTC::@:[537]:LOG: database system is shut down

After this initial error, the cluster enters a continuous cycle of rebooting and crashing, causing significant CPU and resource usage.

On the source side, some queries run a couple of seconds before the crash, but they don't seem to be the cause: after re-creating the environment and re-executing the same queries, the problem doesn't occur.

On the source cluster we get these errors after the initial error on the destination:

2024-01-25 13:17:09 UTC:(63772):user@database_name:[26536]:LOG: could not receive data from client: Connection reset by peer
2024-01-25 13:17:09 UTC:(63772):user@database_name:[26536]:STATEMENT: START_REPLICATION SLOT "replication_slot_name" LOGICAL 12/28C9A430 (expected_encoding 'UTF8', min_proto_version '1', max_proto_version '1', startup_params_format '1', "binary.want_internal_basetypes" '1', "binary.want_binary_basetypes" '1', "binary.basetypes_major_version" '1400', "binary.sizeof_datum" '8', "binary.sizeof_int" '4', "binary.sizeof_long" '8', "binary.bigendian" '0', "binary.float4_byval" '0', "binary.float8_byval" '1', "binary.integer_datetimes" '0', "hooks.setup_function" 'pglogical.pglogical_hooks_setup', "pglogical.forward_origins" '"all"', "pglogical.replication_set_names" 'tenant_service', "relmeta_cache_size" '-1', pg_version '140008', pglogical_version '2.4.2', pglogical_version_num '20402', pglogical_apply_pid '6709')
2024-01-25 13:17:09 UTC:*(63772):user@database_name:[26536]:LOG: unexpected EOF on standby connection
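In case it helps with triage, here is a minimal sketch of how the two logs can be tied together: the PID of the crashed apply worker on the destination (6709) is the same value the source reports as pglogical_apply_pid in the START_REPLICATION statement. The helper names (crashed_workers, apply_pids, matching_crashes) are made up for illustration, and the regular expressions assume only the log wording shown in this post, not any documented log format.

```python
import re

# Excerpts copied from the logs in this post (destination, then source).
DEST_LOG = (
    '2024-01-25 13:17:09 UTC::@:[537]:LOG: background worker '
    '"pglogical apply 131082:4047160452" (PID 6709) was terminated '
    'by signal 11: Segmentation fault'
)
SOURCE_LOG = (
    "pg_version '140008', pglogical_version '2.4.2', "
    "pglogical_version_num '20402', pglogical_apply_pid '6709')"
)

# Assumed patterns, based only on the wording of the lines above.
CRASH_RE = re.compile(
    r'background worker "(?P<name>[^"]+)" \(PID (?P<pid>\d+)\) '
    r'was terminated by signal (?P<signal>\d+)'
)
APPLY_PID_RE = re.compile(r"pglogical_apply_pid '(?P<pid>\d+)'")


def crashed_workers(log_text):
    """Return (worker name, pid, signal) for each crash line found."""
    return [
        (m["name"], int(m["pid"]), int(m["signal"]))
        for m in CRASH_RE.finditer(log_text)
    ]


def apply_pids(log_text):
    """Return the set of apply-worker PIDs announced to the source."""
    return {int(m["pid"]) for m in APPLY_PID_RE.finditer(log_text)}


def matching_crashes(dest_log, source_log):
    """Keep only destination crashes whose PID also appears on the source."""
    pids = apply_pids(source_log)
    return [crash for crash in crashed_workers(dest_log) if crash[1] in pids]


if __name__ == "__main__":
    for name, pid, signal in matching_crashes(DEST_LOG, SOURCE_LOG):
        print(f"worker {name!r} (PID {pid}) died with signal {signal}")
```

Running the same functions over the full log files for each daily crash would show whether it is always the apply worker for the same subscription that segfaults.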

Not sure if it's important, but pglogical replicates between different AWS accounts.

Source and Destination: RDS Aurora PostgreSQL 14.8
pglogical: 2.4.2

Source: 1 Writer, 1 Reader

Destination: 1 Writer

Julien
asked 3 months ago, 342 views
No Answers
