1. Check ECS Service Connect Configuration
Ensure that your ECS Service Connect configuration is correctly set up. Double-check the following:
Service Discovery: Verify that service discovery is properly configured and that service A can correctly resolve service B.
Service Connect Definition: Ensure that the services are correctly defined and connected in your Service Connect configuration.
2. Validate Metric Collection
Sometimes, discrepancies in metrics might be due to a delay or issue in metric collection. Make sure:
CloudWatch Metrics: Look for any anomalies or gaps in CloudWatch metrics that might indicate a problem with metric collection.
Metric Aggregation: Ensure that the metrics are not being aggregated in a way that could cause confusion (e.g., if you’re aggregating over longer time periods).
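To rule out console aggregation as the cause, the raw datapoints can be pulled at a short period and inspected directly. The following is only a rough sketch using boto3's CloudWatch `get_metric_statistics` call. The `AWS/ECS` namespace and the dimension names in the usage comment are assumptions to verify against the metric as it appears in your own CloudWatch console, and all resource names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def build_query(dimensions, period_seconds=60, hours=3, now=None):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ECS",  # assumption: check the namespace in your console
        "MetricName": "TargetResponseTime",
        "Dimensions": [
            {"Name": name, "Value": value} for name, value in dimensions.items()
        ],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period_seconds,
        "Statistics": ["Minimum", "Average", "Maximum"],
    }


def fetch_datapoints(params):
    """Run the query; needs boto3 and AWS credentials with CloudWatch read access."""
    import boto3  # imported lazily so build_query works without boto3 installed

    client = boto3.client("cloudwatch")
    response = client.get_metric_statistics(**params)
    return sorted(response["Datapoints"], key=lambda point: point["Timestamp"])


# Hypothetical usage; dimension names and values are assumptions:
# params = build_query({
#     "ClusterName": "my-cluster",
#     "ServiceName": "service-b",
#     "TargetDiscoveryName": "service-b",
# })
# for point in fetch_datapoints(params):
#     print(point["Timestamp"], point["Minimum"], point["Average"], point["Maximum"])
```

If Minimum, Average, and Maximum are identical for every datapoint, the constant value is already present in the reported data and is not an artifact of the aggregation period.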
3. Inspect Application and Load Balancer Logs
Check the application logs of both services and any load balancer logs that might be in use:
Service Logs: Verify the request and response times in the logs of service B to ensure they align with the 16-second processing time you expect.
Load Balancer Logs: If you're using a load balancer in front of your ECS services, check its logs to see if it provides additional insights into request handling times.
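As a small illustration of the load balancer check, here is a sketch that pulls target_processing_time out of an Application Load Balancer access log entry. It assumes the standard space-delimited ALB access log layout, where target_processing_time is the seventh field. The sample line is made up for illustration and is not taken from a real log.

```python
import shlex

# Position of target_processing_time in an ALB access log entry:
# type, time, elb, client:port, target:port, request_processing_time,
# target_processing_time, response_processing_time, ...
TARGET_PROCESSING_TIME_INDEX = 6


def target_processing_seconds(log_line):
    """Return target_processing_time (in seconds) from one ALB access log line."""
    fields = shlex.split(log_line)  # keeps quoted request and user-agent strings whole
    return float(fields[TARGET_PROCESSING_TIME_INDEX])


# Made-up sample entry, shortened to the leading fields.
SAMPLE_LINE = (
    'http 2024-01-01T00:00:17.000000Z app/example-alb/0123456789abcdef '
    '203.0.113.10:40000 10.0.0.5:8080 0.001 17.042 0.000 200 200 120 512 '
    '"GET http://example.com:80/report HTTP/1.1" "curl/8.0" - -'
)

print(target_processing_seconds(SAMPLE_LINE))
```

Note that the load balancer reports this value in seconds, while the Service Connect metric is in milliseconds, so convert before comparing the two.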
4. Review ECS Task Definitions and Health Checks
Ensure that:
Task Definitions: The task definitions for both services are correctly configured and up-to-date.
Health Checks: Verify that the health checks for your services are correctly configured and that they do not interfere with the response time metrics.
5. Cross-Verify with Other Metrics
Sometimes cross-verifying with other metrics or logs can provide additional insights:
AWS X-Ray: If you’re using AWS X-Ray, it can help trace requests across services and provide more granular timing information.
Custom Metrics: Consider adding custom metrics or logs to your application code to provide more precise timing information.
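For the custom timing suggestion, one lightweight option is to wrap the call from service A to service B (or the handler in service B) in a timer and log the elapsed time. This is a minimal sketch using only the Python standard library. The `timed` helper and the operation name are hypothetical and not part of any AWS SDK.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("timing")


@contextmanager
def timed(operation, record=None):
    """Log how long the wrapped block takes, in milliseconds.

    If a list is passed as `record`, the (operation, elapsed_ms) pair is
    appended to it so the measurement can be inspected or exported later.
    """
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        logger.info("%s took %.1f ms", operation, elapsed_ms)
        if record is not None:
            record.append((operation, elapsed_ms))


# Hypothetical usage around the request to service B:
# with timed("call-service-b"):
#     response = call_service_b()
```

Logging the value in milliseconds keeps it directly comparable with the TargetResponseTime metric.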
It sounds like you have thoroughly investigated several aspects of your ECS Service Connect setup and are still encountering issues with the TargetResponseTime metric being consistently reported as 30 seconds, despite other indicators suggesting a response time closer to 16 seconds.
Given your findings, let's consider a few additional avenues to explore:
1. Metric Collection and Reporting Issue
Since the TargetResponseTime metric consistently shows 30,000 ms regardless of actual request times and aggregation attempts, it's possible there might be an issue with how this metric is being reported or collected. Here are a few things you might consider:
CloudWatch Metric Data Delay: There could be a delay in metric reporting or aggregation. Although unusual, CloudWatch metrics sometimes experience delays. If you haven’t already, try waiting a bit longer and re-checking the metrics.
Metric Source: Ensure that the metric source is correctly configured and there are no misconfigurations causing incorrect reporting. Sometimes, discrepancies can arise from how the metric is aggregated or reported.
2. ECS Service Connect and Target Response Time
Review the ECS Service Connect documentation and configuration to ensure that the metric being observed is indeed the one intended. Service Connect might have specific nuances in how metrics are reported:
Service Connect Metrics Documentation: Check AWS documentation for any known issues or peculiarities with TargetResponseTime metrics for ECS Service Connect. There might be notes or known issues that could explain the behavior you're seeing.
Service Configuration: Re-check the configuration of the Service Connect proxy. Sometimes, incorrect configurations can lead to unexpected behavior in metrics.
3. Metric Interpretation and Calibration
There might be an issue with how the metric is interpreted:
Metric Calibration: If possible, attempt to calibrate or validate the metric by running controlled tests or synthetic workloads to see if the metric behavior aligns with expectations.
Compare with Other Metrics: Although TargetResponseTime might be unreliable, look at other related metrics (like RequestLatency or TargetProcessingTime) for consistency. If these metrics show correct values, it might suggest an issue specific to TargetResponseTime.
- Metric Collection and Reporting Issue
Service B has been running for almost a day and the metric has not changed. The other services I mentioned have been running for a few months and their metrics are the same as well.
I'm not sure what you mean by metric source. The metric is reported by the Service Connect proxy, and according to the docs the metric is:
The latency of the application request processing. The time elapsed, in milliseconds, after the request reached the Service Connect proxy in the target task until a response from the target application is received back to the proxy.
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/available-metrics.html
- ECS Service Connect and Target Response Time
As mentioned above, the docs say the metric is the time in milliseconds from when the request arrives at the Service Connect proxy until the proxy gets a response from the target container. I see no notes specific to this metric that explain my issue.
- Metric Interpretation and Calibration
As mentioned in my response above, I have periodic synthetic workloads in place, and I also tried manually inducing the workload between services A and B, with exactly the same results.
The RequestLatency and TargetProcessingTime metrics you mention do not appear anywhere in the AWS documentation. target_processing_time is a value emitted by a load balancer, and its value for these requests is ~17 seconds, as also mentioned above.
The constant 30,000ms reported by the TargetResponseTime metric in AWS ECS Service Connect may be due to default timeout settings, network overhead, or Service Connect-specific behavior. This metric might not fully reflect the actual application processing time, which could explain the discrepancy with the 16-second response seen in logs.
It’s recommended to verify timeout settings, consider network factors, and possibly use alternative metrics for more accurate performance monitoring. If the issue persists, contacting AWS support may help clarify the behavior.

All requests from service A to service B succeed, their responses are as expected and service A runs completely correctly.
Any aggregation and/or statistic I try to apply to the TargetResponseTime metric always shows 30000ms. I have a synthetic canary set up against service A which runs every 5 minutes. These are the requests which take ~16s, but the entries in the metric correlating with these requests say 30s.
Application logs of service A and B say 16-17 seconds for these requests. The ALB in front of service A has a target_processing_time of ~17 seconds in its load balancer access logs.
As far as I understand it, container health checks for service B are run directly against the container, not through the ECS Service Connect proxy, and as such should not contribute to this metric in any way.
Unfortunately I am not using X-Ray for these services.
One additional piece of information that might be relevant is that there are other services similar to service B that service A is calling. Their requests are much faster than service B's, but their TargetResponseTime metric also has curious values. Those are always rounded to some tens of ms (50ms, 100ms, 250ms) and they never fluctuate; they just remain constant (even when inspecting the Minimum statistic over 1s).
Can you check below? I've posted another answer.