Redshift Serverless workgroup crashed and now namespace metrics are unavailable. How to recover?


We had a very interesting but worrying incident with a staging environment in Redshift Serverless. Upon execution of a stored procedure that used DDL, Redshift Serverless got stuck in an unusable state.

  • According to pg_locks, there were several locks on the various tables used in the procedure. The processes holding them could not be terminated.
  • There were no running queries anymore.
  • I couldn't drop the schema or tables because of the aforementioned table locks.
  • Looking at resource monitoring, the Redshift Serverless workgroup was constantly consuming its maximum RPU capacity.
  • After a while, no connection could be made to the workgroup anymore; everything was completely unresponsive.
  • As a last resort, the workgroup was destroyed and a new workgroup was attached to the namespace.
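For anyone wanting to reproduce the checks above, here is a minimal sketch of how the lock inspection and termination attempt could be scripted through the Redshift Data API. It assumes boto3 is installed and that the workgroup still accepts statements, which was not the case at the end of this incident. The workgroup name, database name, and helper function names are hypothetical, and `pg_terminate_backend` did not succeed in the situation described, so this is a diagnostic aid rather than a fix.

```python
# Sketch only: helper names and the example workgroup/database are hypothetical.

# Lists relation-level locks and the process holding each one.
LOCKS_SQL = """
SELECT pid, relation, mode, granted
FROM pg_locks
WHERE relation IS NOT NULL
ORDER BY pid;
"""


def terminate_sql(pid: int) -> str:
    """Build the statement that asks Redshift to terminate one session."""
    # int() rejects anything that is not a plain process id.
    return f"SELECT pg_terminate_backend({int(pid)});"


def run_statement(client, workgroup: str, database: str, sql: str) -> str:
    """Submit SQL to a Serverless workgroup via the Redshift Data API.

    `client` is expected to be boto3.client("redshift-data").
    The call is asynchronous; the returned id can be passed to
    describe_statement / get_statement_result to fetch the outcome.
    """
    response = client.execute_statement(
        WorkgroupName=workgroup,
        Database=database,
        Sql=sql,
    )
    return response["Id"]


# Example usage (requires AWS credentials, so it is left commented out):
#   import boto3
#   client = boto3.client("redshift-data")
#   statement_id = run_statement(client, "my-workgroup", "dev", LOCKS_SQL)
#   run_statement(client, "my-workgroup", "dev", terminate_sql(12345))
```

Passing the client in as a parameter keeps the helpers usable with a stub when no AWS access is available.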

This seems to have worked, and the behaviour is normal again. However, all CloudWatch metrics related to the namespace have stopped reporting since the incident, meaning we currently can't monitor storage or resource usage; there is simply no data available.

So my question is twofold:

  • Is there any way to restore CloudWatch metrics reporting for this namespace?
  • How could this have happened and, more importantly, how could it have been prevented? It seems quite worrying that Redshift Serverless could end up in such a condition without the right tools being available to resolve the incident.
Bart
Asked 9 months ago · Viewed 311 times
1 Answer

The metrics currently available for Amazon Redshift Serverless namespace include TotalTableCount and DataStorage.

It's important to note that the Amazon Redshift Serverless workgroup is directly linked to this namespace. If your workgroup is unavailable, the related databases will also be inaccessible, subsequently rendering the TotalTableCount and DataStorage metrics inapplicable.

In order to have visibility into these metrics through CloudWatch, we recommend attaching the Amazon Redshift Serverless namespace to a workgroup. This step will ensure that the metrics - TotalTableCount and DataStorage - become accessible, providing you with valuable insights.
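As a rough way to verify whether these metrics are being published at all, the sketch below asks CloudWatch which metric streams it currently knows for a given metric and dimension. It assumes boto3, and it assumes that the CloudWatch namespace is `AWS/Redshift-Serverless` and that the dimensions are named `Namespace` and `Workgroup`; please check these against the metrics shown in your own CloudWatch console. The function name and the example values are hypothetical.

```python
# Sketch only: the CloudWatch namespace string and dimension names are
# assumptions to verify in your own account.
CW_NAMESPACE = "AWS/Redshift-Serverless"


def list_serverless_metrics(cloudwatch, metric_name, dimension_name, dimension_value):
    """Return every metric descriptor matching one metric/dimension pair.

    `cloudwatch` is expected to be boto3.client("cloudwatch").
    An empty list means CloudWatch has no such metric stream on record.
    """
    metrics = []
    token = None
    while True:
        kwargs = {
            "Namespace": CW_NAMESPACE,
            "MetricName": metric_name,
            "Dimensions": [{"Name": dimension_name, "Value": dimension_value}],
        }
        if token:
            kwargs["NextToken"] = token  # continue a paginated listing
        response = cloudwatch.list_metrics(**kwargs)
        metrics.extend(response.get("Metrics", []))
        token = response.get("NextToken")
        if not token:
            return metrics


# Example usage (requires AWS credentials, so it is left commented out):
#   import boto3
#   cw = boto3.client("cloudwatch")
#   print(list_serverless_metrics(cw, "DataStorage", "Namespace", "my-namespace"))
#   print(list_serverless_metrics(cw, "ComputeCapacity", "Workgroup", "my-workgroup"))
```

If the listing is empty for both the namespace-level and workgroup-level metrics even though a workgroup is attached, that is useful evidence to include in a support case.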

Thank you for your attention, and we look forward to enhancing your experience with Amazon Redshift Serverless metrics.

Answered 9 months ago
  • Thank you for your response. To clarify, the namespace is attached to a new workgroup, and CloudWatch metrics are available for neither the workgroup nor the namespace. For example, DataStorage, TotalTableCount, ComputeCapacity, and ComputeSeconds have all stopped reporting data.
