The behavior you're experiencing is related to how AWS Systems Manager Session Manager handles WebSocket connections and session resumption. Here's what's happening:
WebSocket Recycling and Session Resumption: The idle session timeout timer resets when Session Manager receives client-side inputs, including session reconnection (ResumeSession). These reconnections can occur due to network interruptions, browser tab management, or WebSocket disconnections. Because these events reset the idle timer, a session might remain active longer than the configured timeout period even without direct terminal commands.
The Silent Failure Pattern: Your logs show that the WebSocket connections are being closed approximately every 60 minutes with a normal closure (close 1000). Both sides attempt to reconnect, but when the EC2-side session-manager-plugin fails to reconnect successfully, the session becomes terminated on the service side. When your on-prem agent then tries to recreate the data channel, it receives "Session is already terminated" because the session has already been ended.
Why This Happens Intermittently: ResumeSession normally handles idle timeouts and routine WebSocket recycling, but the plugin occasionally fails to reconnect before the service marks the session as terminated, which points to a race condition or timing issue. A terminated session cannot be resumed: the connection is permanently ended and the data channel is closed.
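To confirm that a dead tunnel corresponds to a session the service has already terminated, you can query the session's status. The following is a minimal sketch, assuming a boto3 SSM client is passed in and that the `SessionId` filter is accepted for both the `Active` and `History` states; the helper name `session_status` is hypothetical.

```python
def session_status(ssm_client, session_id):
    """Return the Status string for a session, or None if it is not found.

    Assumption: ssm_client is a boto3 SSM client (boto3.client("ssm")).
    Active sessions are listed under State="Active"; ended ones under
    State="History", so both are checked in turn.
    """
    for state in ("Active", "History"):
        response = ssm_client.describe_sessions(
            State=state,
            Filters=[{"key": "SessionId", "value": session_id}],
        )
        for session in response.get("Sessions", []):
            if session.get("SessionId") == session_id:
                return session.get("Status")
    return None


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   status = session_status(boto3.client("ssm"), "<your-session-id>")
#   if status in ("Terminated", "Terminating", "Failed"):
#       print("Session cannot be resumed; start a new one.")
```

A status of `Terminated` here would match the "Session is already terminated" error seen by the agent, and means a new session has to be started.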
Recommendations:
- Consider implementing session monitoring and automatic restart logic in your application layer to detect when the tunnel dies and establish a new session
- If you need strict control over session duration, you can use the Maximum session duration setting to enforce limits, though this won't solve the reconnection issue
- Monitor for connection failures and implement retry logic at the application level rather than relying solely on the plugin's reconnection mechanism
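The monitoring and restart recommendations above could be sketched as a small supervisor that starts the port forwarding session, probes it periodically, and starts a fresh session with exponential backoff when the probe fails. This is an illustrative sketch, not a definitive implementation: the function names, intervals, and the TCP probe are assumptions, and a plain TCP connect to the local port may not detect every silent failure, so an end-to-end check against the remote service is preferable where possible.

```python
import json
import socket
import subprocess
import time


def build_command(target, remote_port, local_port):
    """Build the AWS CLI command for a port forwarding session."""
    params = {"portNumber": [str(remote_port)],
              "localPortNumber": [str(local_port)]}
    return ["aws", "ssm", "start-session",
            "--target", target,
            "--document-name", "AWS-StartPortForwardingSession",
            "--parameters", json.dumps(params)]


def port_is_open(host, port, timeout=2.0):
    """Simplest possible probe: can we open a TCP connection?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def backoff_delays(base=1.0, cap=60.0):
    """Yield exponentially growing delays, capped at `cap` seconds."""
    delay = base
    while True:
        yield delay
        delay = min(cap, delay * 2)


def supervise(start, is_healthy, sleep=time.sleep,
              check_interval=30.0, max_restarts=None):
    """Restart the tunnel whenever the process exits or the probe fails.

    `start` returns a Popen-like object; `is_healthy` is a caller-supplied
    probe. Returns the number of restarts once max_restarts is exhausted.
    """
    restarts = 0
    delays = backoff_delays()
    proc = start()
    while True:
        sleep(check_interval)
        if proc.poll() is None and is_healthy():
            delays = backoff_delays()  # healthy again: reset the backoff
            continue
        if proc.poll() is None:        # process alive but tunnel is dead
            proc.terminate()
            proc.wait()
        if max_restarts is not None and restarts >= max_restarts:
            return restarts
        sleep(next(delays))
        proc = start()                 # a new session, not a resume
        restarts += 1


# Example usage (hypothetical target and ports):
#   cmd = build_command("<instance-or-node-id>", 5432, 15432)
#   supervise(lambda: subprocess.Popen(cmd),
#             lambda: port_is_open("127.0.0.1", 15432))
```

Because a terminated session cannot be resumed, the supervisor always starts a new session rather than trying to revive the old one.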
In short, long-lived port forwarding sessions (12-24 hours) occasionally fail to reconnect during the periodic WebSocket recycling, and once the service has terminated the session, it cannot be recovered; a new session must be started.
Sources
Specify an idle session timeout value - AWS Systems Manager
terminate_session - Boto3 1.42.50 documentation
