Get Hands-on with Amazon EKS - Workshop Event Series
Whether you're taking your first steps with Kubernetes or you're an experienced practitioner looking to sharpen your skills, our Amazon EKS workshop series delivers practical, real-world experience that moves you forward. Learn directly from AWS solutions architects and EKS specialists through hands-on sessions designed to build your confidence with Kubernetes. Register now and start building with Amazon EKS!
How do I troubleshoot a failed or canceled query in Amazon Redshift?
I want to troubleshoot why my query in Amazon Redshift failed or canceled unexpectedly.
Short description
Amazon Redshift can fail or cancel a query for the following reasons:
- Amazon Redshift workload management (WLM) query monitoring rules (QMRs)
- Statement timeout value
- ABORT, PG_CANCEL_BACKEND, or PG_TERMINATE_BACKEND requests
- Network issues
- Cluster maintenance upgrades
- Internal processing errors
- ASSERT errors
Resolution
Note: The following steps apply only to Amazon Redshift provisioned clusters, not to Amazon Redshift Serverless.
WLM QMRs
You can use QMRs to define metric-based performance boundaries for your queues. QMRs can also specify the actions that Amazon Redshift takes when a query exceeds WLM time limits.
QMR specifies an abort rule
If a query cancels because of the ABORT action that's specified in a QMR, then the query returns the following error: "ERROR: Query (500029) cancelled by WLM abort action of Query Monitoring Rule "testrule"."
To check whether an ABORT action canceled a query, run the following query:
SELECT * FROM stl_wlm_rule_action WHERE action = 'abort';
The query output lists all queries that the ABORT action canceled. If your query ID is listed in the output, then check the WLM QMR parameter that caused the ABORT action.
No available queues for the query to hop to
WLM can hop a query if the query monitoring rule specifies a "hop" action. When WLM hops a query, it routes the query to the next matching queue according to the WLM queue assignment rules.
If the query doesn't match a queue definition, then the query cancels. A canceled query isn't reassigned to the default queue. For more information, see Properties for the WLM configuration parameter.
Note: You can hop queries only in a manual WLM configuration.
If WLM hops a query, but no matching queues are available, then the canceled query returns the following error message: "ERROR: Query (500104) canceled on user's request and ran out of wlm queues for restart."
If you receive the error message, then check the user-defined queues. Run the following query:
SELECT * FROM stl_wlm_query WHERE query=<query-id>;
Note: Replace query-id with the canceled query ID.
The results show service_class numbers 6-13 for your user-defined queues. For example, service_class 6 might list Queue1 in the WLM configuration, and service_class 7 might list Queue2.
To check how service_class numbers map to your queues, run the following query:
SELECT * FROM stv_wlm_service_class_config WHERE service_class>5;
After you get the queue mapping information, check the WLM configuration in the Amazon Redshift console. Verify that the queues match the WLM configuration. You can only hop a query if there's a matching queue available for the user group or query group configuration.
For more information, see WLM query queue hopping.
Statement timeout value
The statement_timeout value is the maximum amount of time that a query can run before Amazon Redshift terminates it. If a query exceeds statement timeout, then Amazon Redshift cancels queries submitted during the session with the following error message: "ERROR: Query (150) cancelled on user's request".
To verify whether Amazon Redshift canceled a query because of statement timeout, run the following query:
SELECT * FROM svl_statementtext WHERE text ILIKE '%set%statement_timeout%to%' AND pid IN (SELECT pid FROM STL_QUERY WHERE query = <queryid>);
Note: Replace queryid with the canceled query ID.
You can also set statement timeouts in the cluster parameter group. Check your cluster parameter group and any statement_timeout configuration settings for additional confirmation. For more information, see Modifying a parameter group.
Abort, cancel, or terminate requests
To check if a user manually stopped or canceled a particular query, run the following query with the relevant query ID:
SELECT * FROM svl_statementtext WHERE text ILIKE '%cancel%' AND xid in (SELECT xid FROM STL_QUERY WHERE query = <queryid>); SELECT * FROM svl_statementtext WHERE text ILIKE '%abort%' AND xid IN (SELECT xid FROM STL_QUERY WHERE query = <queryid>);
Note: Replace queryid with the canceled query ID.
If the query ID appears in the output, then a user stopped or canceled the query.
Note: Users can only terminate their own session. A superuser can terminate all sessions.
A user can stop individual queries if they cancel or terminate the process that runs the query. The following examples are commands that can cancel or terminate a query:
- PG_TERMINATE_BACKEND
- PG_CANCEL_BACKEND
- CANCEL
If one of these commands cancels or terminates a process, then Amazon Redshift logs an entry in SVL_TERMINATE. Check the SVL_TERMINATE logs to confirm that the query stopped because a session terminated. The following example is a query that returns the relevant details from the SVL_TERMINATE logs:
SELECT * FROM svl_terminate WHERE pid=(SELECT pid FROM STL_QUERY WHERE query=500534);
Network issues
Some queries cancel because of underlying network issues. The following error messages indicate potential network problems:
- Internal pubytes: failed to send errors
- SocketTimeoutException errors
- Connection timed out errors
- "Unexpected EOF: Server closed the connection unexpectedly" errors
- "Could not translate host name "" to address" errors
- "SSL connection has been closed unexpectedly" errors
Note: These errors typically occur when there are interruptions in network connectivity between your client and the service endpoint.
To verify whether a network issue causes your query to cancel, run the following query to check the STL_CONNECTION_LOG entries:
SELECT * FROM stl_connection_log WHERE pid IN(SELECT pid FROM STL_QUERY WHERE query = <query_id>);
Note: Replace query_id with the canceled query ID.
Maintenance upgrades
If scheduled maintenance occurs while your query runs, then your cluster requires a cluster reboot, and the query terminates and rolls back. It's a best practice to schedule long-running operations (such as large data loads or the VACUUM operation) to avoid maintenance windows. For more information, see Schedule around maintenance windows.
To check if there was maintenance on your Amazon Redshift cluster, use the Events tab in the Amazon Redshift console.
Internal processing errors
The STL_ERROR table records internal processing errors generated by Amazon Redshift. The STL_ERROR table doesn't record SQL errors or messages.
To verify whether an internal error canceled your query, run the following query to check the STL_ERROR entries:
SELECT * FROM stl_error WHERE pid=<query pid>;
Note: Replace query pid with your process ID.
ASSERT errors
Queries can stop when an ASSERT error occurs. Unlike the previous reasons, issues with the query itself can prompt an ASSERT error. If you receive an ASSERT error, then contact AWS Support for assistance.
- Topics
- Analytics
- Tags
- Amazon Redshift
- Language
- English

Relevant content
- asked 10 months ago
- asked 5 years ago
AWS OFFICIALUpdated a year ago
AWS OFFICIALUpdated 8 months ago