Aurora PostgreSQL 9.6.12: Segmentation fault and restart

0

Our large production database ran stably on PostgreSQL 9.6.6/8 for a year. Today we upgraded to 9.6.12 and within the first few hours encountered two segmentation faults, each causing the database to restart. Generally …

LOG: Segmentation fault
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
FATAL:  Can't handle storage runtime process crash
LOG:  database system is shut down
FATAL:  the database system is in recovery mode
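For anyone tracking how often this recurs, here is a minimal sketch that counts crash and recovery-mode lines in an exported log. It assumes the log has already been downloaded as plain text and that the marker strings match the excerpt above; the function name `summarize_crashes` is hypothetical.

```python
import re

# Marker strings taken from the log excerpt in this post (assumption: your
# logs use the same wording, possibly with a timestamp prefix).
CRASH_MARKER = "LOG: Segmentation fault"
RECOVERY_MARKER = "the database system is in recovery mode"


def summarize_crashes(log_lines):
    """Count segfault lines and recovery-mode rejections in an iterable of log lines."""
    crashes = 0
    recovery_rejections = 0
    for line in log_lines:
        # Collapse runs of whitespace, since the excerpt mixes "LOG: " and "LOG:  ".
        normalized = re.sub(r"\s+", " ", line.strip())
        if CRASH_MARKER in normalized:
            crashes += 1
        elif RECOVERY_MARKER in normalized:
            recovery_rejections += 1
    return {"crashes": crashes, "recovery_rejections": recovery_rejections}
```

Feeding it the lines of a log file (for example `summarize_crashes(open("postgresql.log"))`, where the filename is a placeholder) gives a quick count to compare before and after any change.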

The same query was logged in both cases (basic INNER JOINs).

Is this a known issue? Any advice to bypass it?

Dump/Restore is prohibitive. I'd like to get ahead of this before the work-week starts.

Asked 5 years ago · 1966 views
10 answers
0

We continue to see the identical segfault/restart pattern every 6 hours or so.

We're rebuilding the large indexes and trying to narrow to a reproducible case.

A similar case, but on a different stack: https://github.com/postgrespro/pg_pathman/issues/193

Answered 5 years ago
0

We copied the db to a test instance and can reproduce the segfault with a one-line SELECT statement.

Answered 5 years ago
0

Can we grab a stack trace?

Answered 5 years ago
0

Hi Northrock. I am a development manager for Aurora PostgreSQL. We are very sorry for the issue you have encountered on 9.6.12. Our engineers have identified the problem and we have prepared a patch release. This is being deployed to our production regions now and will appear as available maintenance for your cluster once it reaches your region.

If you would like, send me a PM with your region and cluster name and we can provide some additional patching options.

AWS
Answered 5 years ago
0

That's great news! Once deployed, I'll recheck our reproducible case and report back quickly.

Answered 5 years ago
0

The issue appears to be fully resolved.

Thanks for the special support. Much appreciated.

Answered 5 years ago
0

Hello,

We upgraded from 9.6.8 to 9.6.12 last night and immediately started seeing crashes about once per hour.

Is there another fix/upgrade available for the 9.6-compatible Aurora series?

Answered 5 years ago
0

Updating Aurora from 1.5.0 to 1.5.1 resolved our issue. You can check your version with:

SELECT AURORA_VERSION();
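
If you check many clusters, a small helper can compare the reported version against the release that fixed things for us. This is a minimal sketch: it assumes the query returns a plain dotted string such as `1.5.0`, and the `1.5.1` threshold is only what worked in this thread, not an official statement. The names `parse_version` and `is_patched` are hypothetical.

```python
def parse_version(text):
    """Turn a dotted version string such as '1.5.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_patched(current, fixed="1.5.1"):
    """Return True if `current` is at or above `fixed`.

    The default threshold is an assumption based on this thread only.
    """
    # Tuples compare element by element, so (1, 10, 0) > (1, 5, 1).
    return parse_version(current) >= parse_version(fixed)
```

Comparing tuples of integers rather than raw strings avoids ranking `1.10.0` below `1.5.1`.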
Answered 5 years ago
0

Hello AWS,

Check my thread. We are on Aurora 10.7 and run into this all the time, a few times a week. It just brought down our live site again.

https://forums.aws.amazon.com/thread.jspa?threadID=303997

VBK
Answered 5 years ago
0

Roll out the AURORA update faster, AWS Support!

This same problem started randomly hitting our 9.6 clusters this week, and we can only watch them restart until things clear up.

Typically, random queries start taking longer than normal, some of them 10 minutes or more. Normally the same queries would average under 2 seconds.

I've had to delete and reindex some indices due to errors like: Attempting to read past EOF of relation "base/16402/24527244". blockno=6400 nblocks=3834
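
To find which relations are affected, the EOF error above can be parsed for its identifiers. This is a minimal sketch assuming the message always has the form quoted above and follows the stock PostgreSQL path layout `base/<database OID>/<relfilenode>`; the function name `parse_eof_error` is hypothetical.

```python
import re

# Pattern written against the single error message quoted in this post;
# other variants of the message are not handled.
EOF_ERROR = re.compile(
    r'read past EOF of relation "base/(?P<db_oid>\d+)/(?P<relfilenode>\d+)"\.'
    r"\s*blockno=(?P<blockno>\d+)\s+nblocks=(?P<nblocks>\d+)"
)


def parse_eof_error(message):
    """Extract database OID, relfilenode, and block numbers, or None if no match."""
    match = EOF_ERROR.search(message)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}
```

In stock PostgreSQL the relfilenode can then be mapped to a relation name with `SELECT pg_filenode_relation(0, 24527244);` run in the affected database. Whether that behaves the same on Aurora is an assumption I have not verified.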

Aurora is on version 1.5.0 -- which is something only AWS can update.

Answered 5 years ago
