To understand how RDS monitors "Idle in Transaction" sessions and how to resolve.
How RDS monitors "Idle in Transaction" sessions
Amazon RDS Performance Insights (paid tier) proactively monitors the metric idle_in_transaction_max_time, which tracks the longest-running transaction in an idle in transaction state. The alert triggers when this metric exceeds a threshold. However, the system uses a multi-step process to resolve the alert. Let's go into the detailed steps.
Detection:
The alert opens if the metric reports non-zero data points (indicating active idle transactions).
Resolution:
After the metric returns to zero (no idle transactions), RDS continues monitoring for additional time (buffer period) to ensure no new spikes occur. Only then is the alert marked as resolved. In case, Internal AWS monitoring observed intermittent spikes in idle_in_transaction_max_time,then even though manual checks (e.g., querying pg_stat_activity) show no active sessions, transient spikes may keep the metric alert active. These spikes could occur due to:
- Short-lived application issues: For example, a connection pooler briefly leaving transactions open.
- Sampling intervals: The metric is collected periodically (e.g., every 5 minutes), so manual checks might miss short-lived spikes.
Key Factors Contributing to the Alert:
- Default Parameter Settings:
The default idle_in_transaction_session_timeout (86,400,000 ms/24 hours) allows sessions to stay idle longer than the alert threshold. Thus, alerts can fire long before the parameter terminates sessions.
- Application behaviour:
Idle transactions often stem from:
- Uncommitted/rolled-back transactions in application code.
- Disabled AUTOCOMMIT in clients like psql or JDBC.
- Long-running operations within open transactions.
Resolution Steps:
- Adjust idle_in_transaction_session_timeout and reduce the parameter to align with your alert threshold.
The following command changes idle_in_transaction_session_timeout for a specific DB parameter group.
aws rds modify-db-parameter-group \ --db-parameter-group-name <parameter_group_name> \ --parameters "ParameterName=idle_in_transaction_session_timeout,ParameterValue=<new_value>,ApplyMethod=immediate"
- Monitor metrics proactively using the below:
Use Performance Insights to track:
* idle_in_transaction_max_time
* idle_in_transaction_count
- Set up CloudWatch alarms for these metrics to monitor spikes.
-
Audit Application Code:
- Ensure AUTOCOMMIT is enabled in clients.
- Add explicit COMMIT/ROLLBACK statements after transactions.
- Handle errors properly to avoid orphaned transactions.
-
Query for Idle Sessions:
SELECT now() - state_change as idle_in_transaction_duration, now() - xact_start as xact_duration,*
FROM pg_stat_activity
WHERE state = 'idle in transaction'
AND xact_start is not null
ORDER BY 1 DESC;
- Terminate long-running sessions with SELECT pg_terminate_backend(pid);
Why recommendations persist after mitigation:
The RDS recommendation system has a built-in buffer to avoid sending multiple similar recommendations to customers. Currently, this buffer is set to 24 hours, but it may be adjusted over time based on our learnings and customer feedback.
References:
[+] How DevOps Guru for RDS works - https://docs.aws.amazon.com/devops-guru/latest/userguide/working-with-rds.overview.how-it-works.html