Redshift Serverless Workgroup crashed and now Namespace metrics are unavailable. How to recover?


We had a very interesting but worrying incident with a staging environment in Redshift Serverless. Upon execution of a stored procedure that used DDL, Redshift Serverless got stuck in an unusable state.

  • According to pg_locks, there were several locks on the various tables used in the procedure. The accompanying processes could not be terminated.
  • There were no running queries anymore.
  • I couldn't drop the schema or tables because of the aforementioned table locks.
  • Looking at resource monitoring, the Redshift Serverless Workgroup was using max RPU resources at a constant rate.
  • After a while, no connection could be made to the workgroup anymore; everything was completely unresponsive.
  • As a last resort, the workgroup was destroyed and a new workgroup was attached to the namespace.
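
For reference, the lock inspection described above can be sketched roughly as follows. This is a minimal sketch, not the exact queries we ran: the helper names are hypothetical, and it assumes pg_locks exposes the relation, pid, mode and granted columns and that the connected user is allowed to call pg_terminate_backend.

```python
# Hypothetical helpers: they only build SQL text and do not connect anywhere.

# Lists which process holds which lock on which table.
# Assumption: pg_locks.relation can be joined to pg_class.oid.
LOCKS_SQL = """
SELECT l.pid, c.relname, l.mode, l.granted
FROM pg_locks l
JOIN pg_class c ON c.oid = l.relation
ORDER BY l.pid;
""".strip()


def terminate_sql(pids):
    """Return one pg_terminate_backend statement per distinct process id."""
    # int() rejects anything that is not a plain process id.
    distinct_pids = sorted({int(pid) for pid in pids})
    return [f"SELECT pg_terminate_backend({pid});" for pid in distinct_pids]


if __name__ == "__main__":
    print(LOCKS_SQL)
    for statement in terminate_sql([1234, 5678]):  # placeholder process ids
        print(statement)
```

The generated statements can be run from any SQL client. In our incident, terminating the processes did not release the locks.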

This seems to have worked, and the behaviour is normal again. However, all CloudWatch metrics related to the namespace have stopped reporting since the incident. This means we currently can't monitor storage or resource usage, because there is no data available.

So my question is twofold:

  • Is there any way to restore CloudWatch metrics reporting for this namespace?
  • How could this have happened and, more importantly, how could it have been prevented? It is quite worrying that Redshift Serverless could end up in such a condition without us having the right tools to resolve the incident.
asked 8 months ago, 293 views
1 Answer

The metrics currently available for Amazon Redshift Serverless namespace include TotalTableCount and DataStorage.

It's important to note that the Amazon Redshift Serverless workgroup is directly linked to this namespace. If your workgroup is unavailable, the related databases will also be inaccessible, subsequently rendering the TotalTableCount and DataStorage metrics inapplicable.

In order to have visibility into these metrics through CloudWatch, we recommend attaching the Amazon Redshift Serverless namespace to a workgroup. This step will ensure that the metrics - TotalTableCount and DataStorage - become accessible, providing you with valuable insights.
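
To check whether these metrics are reporting, a request along the following lines may help. This is only a sketch under assumptions: it uses the boto3 CloudWatch get_metric_statistics call, assumes the metrics are published under the AWS/Redshift-Serverless CloudWatch namespace with a Namespace dimension, and my-namespace is a placeholder for your own namespace name. The helper function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def build_metric_request(metric_name, dimension_name, dimension_value,
                         hours=24, now=None):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    now = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Redshift-Serverless",  # assumed CloudWatch namespace
        "MetricName": metric_name,
        "Dimensions": [{"Name": dimension_name, "Value": dimension_value}],
        "StartTime": now - timedelta(hours=hours),
        "EndTime": now,
        "Period": 3600,  # one datapoint per hour
        "Statistics": ["Average"],
    }


def fetch_datapoints(request):
    """Send the request and return its datapoints, oldest first."""
    import boto3  # imported lazily so the builder works without boto3

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(**request)
    return sorted(response["Datapoints"], key=lambda d: d["Timestamp"])


if __name__ == "__main__":
    for metric in ("DataStorage", "TotalTableCount"):
        # "my-namespace" is a placeholder, not a real namespace name.
        request = build_metric_request(metric, "Namespace", "my-namespace")
        print(metric, len(fetch_datapoints(request)), "datapoints in last 24h")
```

An empty result for a window in which the workgroup was attached and running would suggest the metrics are genuinely not being published, rather than simply hidden in the console.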

Thank you for your attention, and we look forward to enhancing your experience with Amazon Redshift Serverless metrics.

answered 8 months ago
  • Thank you for your response. To clarify, the namespace is attached to a new workgroup, and CloudWatch metrics are available for neither the workgroup nor the namespace. For example, DataStorage, TotalTableCount, ComputeCapacity and ComputeSeconds have all stopped reporting data.
