Based on your description, here are the most likely causes and solutions for your SSH connection issue:
Common causes to investigate:
- Network ACL (NACL) restrictions - Even though your security group allows SSH on port 22, check whether the subnet's Network ACL is blocking inbound traffic on port 22 or outbound traffic on ephemeral ports (1024-65535). NACLs are stateless, so both inbound and outbound rules matter.
- Route table configuration - Verify that your subnet's route table has a route to an Internet Gateway (0.0.0.0/0 → igw-xxxxx) if you're trying to connect from the internet. Without this, your instance cannot receive traffic from outside the VPC.
- SSH service not running - The instance OS might not have the SSH daemon running or properly configured. This can happen if the instance didn't initialize correctly or if there was a configuration issue during launch.
- Wrong username - While you're using "ubuntu", which is correct for Ubuntu AMIs, verify you're using the right username for your specific AMI (e.g., "ec2-user" for Amazon Linux, "admin" for Debian).
- No public IP, or instance in a private subnet - Confirm your instance actually has a public IP address assigned and is in a public subnet. An instance in a private subnet isn't directly reachable from the internet; a NAT gateway only handles outbound traffic, so you would need a bastion host, a VPN, or Session Manager to reach it.
- Firewall within the OS - The operating system itself might have firewall rules (like iptables or firewalld) blocking SSH connections.
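To make the stateless NACL point concrete, here is a minimal sketch that evaluates NACL entries the way the service does: lowest rule number first, first match wins. It assumes the entries have the shape of the `Entries` list returned by DescribeNetworkAcls; the function name `nacl_allows` and the sample rules are hypothetical, not taken from your account.

```python
import ipaddress


def nacl_allows(entries, *, egress, port, peer_ip):
    """Return True if the first matching NACL rule allows TCP traffic on `port`."""
    peer = ipaddress.ip_address(peer_ip)
    for rule in sorted(entries, key=lambda r: r["RuleNumber"]):
        if rule["Egress"] != egress:
            continue
        if rule["Protocol"] not in ("-1", "6"):  # -1 = all protocols, 6 = TCP
            continue
        cidr = rule.get("CidrBlock")  # IPv6-only rules have no CidrBlock
        if cidr is None or peer not in ipaddress.ip_network(cidr):
            continue
        ports = rule.get("PortRange")  # absent when the rule covers all ports
        if ports and not (ports["From"] <= port <= ports["To"]):
            continue
        return rule["RuleAction"] == "allow"
    return False  # nothing matched: treat as denied


# Hypothetical NACL: SSH allowed in, but only HTTPS allowed out.
sample_entries = [
    {"RuleNumber": 100, "Egress": False, "Protocol": "6", "RuleAction": "allow",
     "CidrBlock": "0.0.0.0/0", "PortRange": {"From": 22, "To": 22}},
    {"RuleNumber": 32767, "Egress": False, "Protocol": "-1", "RuleAction": "deny",
     "CidrBlock": "0.0.0.0/0"},
    {"RuleNumber": 100, "Egress": True, "Protocol": "6", "RuleAction": "allow",
     "CidrBlock": "0.0.0.0/0", "PortRange": {"From": 443, "To": 443}},
    {"RuleNumber": 32767, "Egress": True, "Protocol": "-1", "RuleAction": "deny",
     "CidrBlock": "0.0.0.0/0"},
]

inbound_ok = nacl_allows(sample_entries, egress=False, port=22, peer_ip="203.0.113.10")
outbound_ok = nacl_allows(sample_entries, egress=True, port=50000, peer_ip="203.0.113.10")
print(inbound_ok, outbound_ok)  # True False
```

In this sample the SSH packet gets in, but the reply to the client's ephemeral port is dropped on the way out, which looks like a timeout from the client side.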
Recommended troubleshooting steps:
- Use Systems Manager Session Manager (which needs no inbound port 22, only the SSM Agent and an instance role) or EC2 Instance Connect as alternative connection methods to access the instance and check SSH service status
- Review VPC Flow Logs to see if traffic is being rejected at the network level
- Check the system log and instance screenshot from the EC2 console to identify any boot or configuration issues
- Verify the instance is in a public subnet with proper internet gateway routing
- Ensure your local firewall or corporate network isn't blocking outbound SSH connections
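For the VPC Flow Logs step, a small filter like the one below can pull out rejected SSH attempts. This is only a sketch: it assumes the default (version 2) flow log record format, and the function name and sample records are hypothetical.

```python
# Field order of the default VPC Flow Logs record format.
FIELDS = (
    "version account_id interface_id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log_status"
).split()


def rejected_ssh(lines, port=22):
    """Yield parsed records for rejected TCP traffic addressed to `port`."""
    for line in lines:
        parts = line.split()
        if len(parts) != len(FIELDS):
            continue  # custom format or malformed line
        record = dict(zip(FIELDS, parts))
        if (
            record["action"] == "REJECT"
            and record["protocol"] == "6"  # 6 = TCP
            and record["dstport"] == str(port)
        ):
            yield record


# Hypothetical records, for illustration only.
sample_lines = [
    "2 123456789012 eni-0abc 203.0.113.10 10.0.1.25 50122 22 6 3 180 1700000000 1700000060 REJECT OK",
    "2 123456789012 eni-0abc 198.51.100.7 10.0.1.25 40111 22 6 9 900 1700000000 1700000060 ACCEPT OK",
    "2 123456789012 eni-0abc 203.0.113.10 10.0.1.25 50123 443 6 1 60 1700000000 1700000060 REJECT OK",
]

rejected = list(rejected_ssh(sample_lines))
print([r["srcaddr"] for r in rejected])  # ['203.0.113.10']
```

A REJECT on port 22 means a security group or NACL dropped the traffic; no record at all for your client IP suggests the packets never reached the network interface (routing or public IP).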
Since your status checks are passing (2/2), the instance itself is running, which points to a networking configuration issue rather than an instance health problem.
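A quick way to narrow this down from your own machine is to look at how the TCP connection fails. The sketch below uses only the standard library; the function name `probe_tcp` is made up for this example.

```python
import socket


def probe_tcp(host, port=22, timeout=5.0):
    """Classify a TCP connection attempt as 'open', 'refused', 'timeout' or 'error'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "error"  # e.g. DNS failure or no route to host


# Example call (replace the placeholder address with your instance's public IP):
# print(probe_tcp("203.0.113.10"))
```

Roughly: "timeout" usually means packets are being dropped silently (security group, NACL, route table, or missing public IP); "refused" means the host was reached but nothing accepted the connection (sshd not running, or an OS firewall rejecting it); "open" means the network path is fine and the problem is more likely the username or key.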
Your Blocker: Payload Passing Fix
The issue is syntax. <$.body> is input template syntax, not target parameter syntax. Use the .$ suffix:
"Environment": [
  {
    "Name": "RAG_JOB_PAYLOAD",
    "Value.$": "$.body"
  }
]
"Value.$" tells Pipes to evaluate the JSONPath. Without .$, it's treated as a literal string. Your worker then parses it:
import json
import os
# The SQS message body arrives as a JSON string in this environment variable.
payload = json.loads(os.environ["RAG_JOB_PAYLOAD"])
If Value.$ doesn't work in your Pipe's ECS overrides (the .$ suffix originates in Step Functions, and Pipes may not evaluate it there), use one of the alternatives below.
Recommended Alternatives
Option A: SQS > Lambda > ECS RunTask (Simplest)
Lambda receives the SQS body natively and calls ecs.run_task() with container overrides. You get full control and no JSONPath issues, and because the function typically finishes in about a second, its cost is negligible.
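A minimal sketch of what that Lambda could look like. The environment variable names (`ECS_CLUSTER`, `TASK_DEFINITION`, `CONTAINER_NAME`, `SUBNET_IDS`) and the Fargate network settings are assumptions for illustration; adjust them to your setup. The `ecs` parameter exists only so a stub client can be passed in tests.

```python
import os


def build_run_task_request(body, *, cluster, task_definition, container, subnets):
    """Build keyword arguments for ECS RunTask that pass `body` as an env var."""
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": "FARGATE",
        "networkConfiguration": {
            # Assumption: private subnets with outbound access for image pulls.
            "awsvpcConfiguration": {"subnets": subnets, "assignPublicIp": "DISABLED"}
        },
        "overrides": {
            "containerOverrides": [
                {
                    "name": container,
                    "environment": [{"name": "RAG_JOB_PAYLOAD", "value": body}],
                }
            ]
        },
    }


def handler(event, context, ecs=None):
    """SQS-triggered entry point: start one ECS task per message."""
    if ecs is None:
        import boto3  # imported lazily so the module loads without AWS libraries

        ecs = boto3.client("ecs")
    started = []
    for record in event["Records"]:
        request = build_run_task_request(
            record["body"],
            cluster=os.environ["ECS_CLUSTER"],
            task_definition=os.environ["TASK_DEFINITION"],
            container=os.environ["CONTAINER_NAME"],
            subnets=os.environ["SUBNET_IDS"].split(","),
        )
        response = ecs.run_task(**request)
        if response.get("failures"):
            # Raising makes the batch fail so SQS redelivers the message.
            raise RuntimeError(f"RunTask failed: {response['failures']}")
        started.extend(task["taskArn"] for task in response.get("tasks", []))
    return {"started": started}
```

Raising on `failures` matters here: RunTask can return successfully while reporting that no task was placed, and without the check the message would be deleted from the queue anyway.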
Option B: SQS > Step Functions > ECS RunTask (Most Robust)
Use the ecs:runTask.sync integration. Step Functions waits for the task to complete, provides built-in retry and error handling, and evaluates Value.$ natively. This is the best fit if you need completion tracking without polling.
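As a rough illustration, the state machine definition could be generated like this. The cluster, task definition, container, and subnet values are placeholders, and the sketch assumes the execution input is a single object whose `body` field holds the payload as a string; if your trigger delivers a batch (an array), the JSONPath would need to change.

```python
import json

# Placeholder names below are hypothetical; substitute your own resources.
run_worker_state = {
    "Type": "Task",
    "Resource": "arn:aws:states:::ecs:runTask.sync",  # waits for the task to stop
    "Parameters": {
        "LaunchType": "FARGATE",
        "Cluster": "my-cluster",
        "TaskDefinition": "rag-worker",
        "NetworkConfiguration": {
            "AwsvpcConfiguration": {"Subnets": ["subnet-EXAMPLE"]}
        },
        "Overrides": {
            "ContainerOverrides": [
                {
                    "Name": "worker",
                    "Environment": [
                        # The .$ suffix tells Step Functions to evaluate the path.
                        {"Name": "RAG_JOB_PAYLOAD", "Value.$": "$.body"}
                    ],
                }
            ]
        },
    },
    "Retry": [
        {
            "ErrorEquals": ["States.TaskFailed"],
            "IntervalSeconds": 30,
            "MaxAttempts": 2,
            "BackoffRate": 2.0,
        }
    ],
    "End": True,
}

definition = {"StartAt": "RunWorker", "States": {"RunWorker": run_worker_state}}
print(json.dumps(definition, indent=2))
```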
Option C: Store Payload in MongoDB, SQS Triggers Only (Most Decoupled)
The API writes the job to MongoDB and sends only the job_id to SQS; the ECS worker fetches the payload from the database. This gives the simplest Pipe config (no overrides) and no payload size limits, and the worker updates status in the same database the frontend polls.
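One way the worker side could look, as a sketch only. Since the Pipe passes no overrides, this version has the worker claim the next pending job itself rather than receive a job_id. `jobs` is expected to be a pymongo collection (or anything exposing `find_one_and_update` and `update_one`), and the field names `status`, `payload`, `result`, and `error` are assumptions about your schema.

```python
from datetime import datetime, timezone


def claim_next_job(jobs):
    """Atomically flip one pending job to running and return it, or None."""
    return jobs.find_one_and_update(
        {"status": "pending"},
        {"$set": {"status": "running", "started_at": datetime.now(timezone.utc)}},
    )


def run_once(jobs, process):
    """Claim one job, run `process` on its payload, and record the outcome."""
    job = claim_next_job(jobs)
    if job is None:
        return None  # nothing to do; the task can exit
    try:
        result = process(job["payload"])
    except Exception as exc:
        jobs.update_one(
            {"_id": job["_id"]},
            {"$set": {"status": "failed", "error": str(exc)}},
        )
        raise
    jobs.update_one(
        {"_id": job["_id"]},
        {"$set": {"status": "done", "result": result}},
    )
    return job["_id"]
```

The atomic claim keeps two tasks started close together from picking up the same job, which matters because SQS standard queues can deliver a message more than once.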
Comparison
- Pipes to ECS — Lowest cost, but payload passing is tricky and hard to debug.
- Lambda to ECS — Easy payload passing, easy debugging, near-zero cost.
- Step Functions to ECS — Easy payload passing, built-in completion tracking and retries, visual execution history.
- DB + Trigger — Most decoupled, no payload passing needed, worker fetches from DB.
Recommendation
Option A (Lambda to ECS) for simplicity. Option C (DB + Trigger) if you're already polling MongoDB for status.
Avoid an always-running ECS worker that polls SQS; it adds idle cost.
Tips:
- Set SQS visibility timeout longer than your longest job.
- Add a dead letter queue for failed messages.
- Use Fargate Spot for worker tasks (up to 70% savings); occasional interruptions are tolerable because the tasks aren't user-facing.
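The first two tips translate into queue attributes roughly as follows. This is a sketch: the helper name, the 1.5x safety margin, and the default `max_receive_count` are arbitrary choices for illustration, and the ARN is a placeholder.

```python
import json

SQS_MAX_VISIBILITY_SECONDS = 43200  # SQS caps the visibility timeout at 12 hours


def queue_attributes(longest_job_seconds, dlq_arn, max_receive_count=5):
    """Build SQS attributes: visibility timeout with headroom, plus a DLQ."""
    visibility = min(int(longest_job_seconds * 1.5), SQS_MAX_VISIBILITY_SECONDS)
    return {
        # SQS attribute values are passed as strings.
        "VisibilityTimeout": str(visibility),
        "RedrivePolicy": json.dumps(
            {
                "deadLetterTargetArn": dlq_arn,
                "maxReceiveCount": str(max_receive_count),
            }
        ),
    }


attrs = queue_attributes(600, "arn:aws:sqs:us-east-1:123456789012:jobs-dlq")
print(attrs["VisibilityTimeout"])  # 900

# With boto3 this could then be applied along the lines of:
# sqs.set_queue_attributes(QueueUrl=queue_url, Attributes=attrs)
```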
