EndpointConnectionError from Python app when trying to get sampling rules


I have a Python / Flask application running in a Docker container under Elastic Beanstalk. The X-Ray daemon (v3.2.0) is installed inside the container. The daemon's logs show that it is successfully sending batches of data to the X-Ray service. However, my application logs show a failure when the Python X-Ray SDK (v2.6.0) attempts to get sampling rules, with a stack dump ending as follows:

File "/usr/local/lib/python3.8/site-packages/botocore/endpoint.py", line 269, in _send
return self.http_session.send(request)
File "/usr/local/lib/python3.8/site-packages/botocore/httpsession.py", line 283, in send
raise EndpointConnectionError(endpoint_url=request.url, error=e)
botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "http://127.0.0.1:2000/GetSamplingRules"
[2021-02-11 19:53:22 +0000] [17] [INFO] No effective centralized sampling rule match. Fallback to local rules.

I can confirm that the daemon is running, although a request to the URL shown above returns a 403 Forbidden error. I don't know if that's expected or not.

Can anyone suggest what might be going on here? Is there an incompatibility between the 3.x daemon that's running and the 2.x SDK? Any help would be greatly appreciated. Thank you!
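For reference, the GetSamplingRules call comes from the SDK's centralized sampling poller, which runs whenever the recorder is configured with sampling enabled. A simplified sketch of the standard aws-xray-sdk Flask wiring (the service name is a placeholder, and this is the stock setup rather than my exact code):

from flask import Flask

from aws_xray_sdk.core import xray_recorder
from aws_xray_sdk.ext.flask.middleware import XRayMiddleware

app = Flask(__name__)

# sampling=True turns on the centralized sampling poller, which calls
# GetSamplingRules against the daemon's TCP port (2000 by default).
xray_recorder.configure(
    service='my-flask-app',            # placeholder service name
    daemon_address='127.0.0.1:2000',   # daemon default: UDP for traces, TCP for sampling rules
    sampling=True,
    context_missing='LOG_ERROR',
)
XRayMiddleware(app, xray_recorder)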

asked 3 years ago · 1006 views
4 Answers

Hi jgarbers1
The X-Ray daemon needs to expose 2 ports (UDP and TCP) when it is running in a container. The UDP port is used by the X-Ray SDK to send trace data to the daemon, whereas the TCP port is used for fetching the sampling rules. From your description of the problem, it seems like you may have opened the daemon's UDP port but not the TCP port.
Can you verify this and try exposing both the ports? If you're using a Dockerfile, you can follow this doc to do so: https://docs.aws.amazon.com/xray/latest/devguide/xray-daemon-ecs.html#xray-daemon-ecs-build
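If it helps to confirm which of the two is reachable from inside the container, something like the following quick check could be used (a rough sketch; 2000 is the daemon's default port, and a UDP send only verifies the local socket, not that the daemon actually received anything):

import socket

DAEMON_HOST, DAEMON_PORT = "127.0.0.1", 2000  # daemon default

# TCP: the SDK fetches sampling rules over this connection.
try:
    with socket.create_connection((DAEMON_HOST, DAEMON_PORT), timeout=2):
        print("TCP port reachable (sampling rule fetches should work)")
except OSError as e:
    print(f"TCP connect failed: {e}")

# UDP: the SDK emits trace segments here; a successful send only means the
# local socket worked, since UDP is connectionless.
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
    s.sendto(b"", (DAEMON_HOST, DAEMON_PORT))
    print("UDP send did not raise (daemon receipt is not confirmed)")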

If you still experience the issue, please provide details on the configuration of your app and daemon.
Thanks!

AWS
answered 3 years ago

Thanks for the help, prashataws! I'll review the information at that link today. In the meantime, though, to clarify: the application using the SDK and the daemon are both running in the same container, so it doesn't seem like any ports would need to be exposed...? I know this is somewhat at odds with the Docker "one process per container" guideline, but I didn't want to take on the task of converting both my front end and worker applications into multi-container EB projects.

Could the "could not connect" situation just be transient, if the SDK is trying to connect to it before it's completely up and running?

answered 3 years ago

I see. If you have the app and the daemon running in the same container and you didn't need to expose the UDP port for sending segments to the daemon, then you may not need to expose the TCP port for sampling rules either. But it would still be worth trying with both ports exposed, in my opinion.
What makes you think this issue could be transient? Do you see the error only during the first few calls to your application and then it works fine afterwards? Ideally the daemon process should be up and running before the application starts creating segments/subsegments.
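If timing does turn out to be the issue, one option is to wait for the daemon's TCP port to start accepting connections before configuring the recorder. A rough sketch (the host, port, and timeouts are assumptions based on the daemon defaults):

import socket
import time

def wait_for_daemon(host="127.0.0.1", port=2000, timeout=30.0):
    """Poll the daemon's TCP port until it accepts connections or we give up."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1):
                return True
        except OSError:
            time.sleep(0.5)
    return False

# Call this before xray_recorder.configure(...) so the first
# GetSamplingRules poll doesn't hit a daemon that isn't listening yet.
if not wait_for_daemon():
    print("X-Ray daemon not reachable; SDK will fall back to local sampling rules")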

AWS
answered 3 years ago

I'll experiment with the ports shortly. It's been a few weeks since I had the problem, and my notes aren't detailed enough for me to recall whether the errors eventually went away or not. I do start the daemon before launching my app, but it's possible that the daemon is still getting things together at the time the app starts trying to talk to it. I'll be revising the system later in the week and will follow up here if I'm still having issues. Thanks again for the help!

answered 3 years ago
