Managed Collector for Prometheus Failing to Scrape without Logs


I'm struggling to add a managed Prometheus collector to my EKS cluster.

Initially, I could get some data from a scraper created with the default configuration, but I couldn't modify it to include custom metrics. While debugging, I found that my cluster's endpoint access is set to 'public' only, while the documentation says it must include 'private'.

However, when I switched to 'private' or 'public and private' and recreated the scraper, it no longer sent any data to the Prometheus workspace. Interestingly, the only way I could make it work was to switch the setting back to 'public', create the scraper, and then change it to 'public and private' again.

More importantly, I can't see what went wrong anywhere in CloudWatch, even though I assigned a log stream and set logging to "ALL" in the associated Prometheus workspace.

I don't know how to proceed when there's no feedback whenever it fails, and following the documentation seems to create more issues than it solves.

Any advice would be appreciated. Thanks!

1 Answer

It sounds like you're encountering some challenges with setting up a managed Prometheus collector for your EKS cluster. Let's address your concerns step by step:

  1. Endpoint Access Settings: The documentation is right that your EKS cluster's endpoint access must include 'private', but it doesn't have to be exclusively private: 'public and private' works as well, which matches your observation that ingestion resumed once that setting was in place. (A boto3 sketch for updating the setting follows this list.)

  2. Scraper Configuration: When you create or modify a scraper, make sure its configuration matches the cluster's current settings. If you change the cluster's endpoint access, you may need to recreate the scraper so it picks up the change; after recreating it, check that it actually reaches an active state (see the scraper-status sketch after this list).

  3. Lack of Logs: The absence of logs in CloudWatch when the scraper isn't working is concerning. Even with a log stream assigned and logging set to "ALL" in the Prometheus workspace, you should be seeing some activity, so the silence likely points to a connectivity issue between the scraper and your cluster. (A sketch for verifying the workspace's logging configuration also follows the list.)

  4. Troubleshooting Steps:

    • Double-check that your VPC has DNS hostnames and DNS resolution enabled (see the VPC-attribute sketch after this list).
    • Ensure that the IAM roles and permissions are correctly set up for the scraper to access your cluster.
    • Verify that the security groups and network ACLs allow the necessary traffic between the scraper and your cluster.
    • Check if there are any VPC endpoints that might be interfering with the connection.
  5. Alternative Approach: If you continue to face issues with the managed collector, consider running a self-hosted Prometheus server inside your cluster and configuring it to remote-write to your Amazon Managed Service for Prometheus workspace. This approach gives you more control over, and visibility into, the collection process.

  6. Custom Metrics: For including custom metrics, you typically need to ensure that your applications are exposing metrics in a format that Prometheus can scrape. This usually involves setting up /metrics endpoints in your applications and configuring service discovery in Prometheus.
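
To make some of these checks concrete, here are a few boto3 sketches. They are rough illustrations rather than a verified runbook: every region, cluster name, workspace ID, VPC ID, and scraper ID below is a placeholder to substitute with your own values.

For point 1, a minimal sketch that enables private endpoint access on the cluster. The update is asynchronous, so wait for the cluster to return to ACTIVE before recreating the scraper:

```python
import boto3

eks = boto3.client("eks", region_name="us-east-1")  # placeholder region

# Enable both public and private access to the cluster endpoint.
# The managed collector needs private access; leaving public access on
# as well avoids cutting off anything that currently reaches the API
# over the internet.
eks.update_cluster_config(
    name="my-cluster",  # placeholder cluster name
    resourcesVpcConfig={
        "endpointPublicAccess": True,
        "endpointPrivateAccess": True,
    },
)
```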
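
For points 2 and 3, a scraper that silently stops delivering data often reports why in its own status, which you can read back through the AMP API. The exact response fields can vary by SDK version, so this sketch just prints the raw status objects:

```python
import boto3

amp = boto3.client("amp", region_name="us-east-1")  # placeholder region

# List every scraper in this account/region and show its current status.
for scraper in amp.list_scrapers()["scrapers"]:
    print(scraper["scraperId"], scraper["status"])

# Inspect a single scraper in more detail (placeholder scraper ID).
detail = amp.describe_scraper(scraperId="s-0123456789abcdef0")
print(detail["scraper"])
```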
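
Also for point 3, it's worth confirming that the workspace's logging configuration is active and pointing at the CloudWatch log group you expect (placeholder workspace ID):

```python
import boto3

amp = boto3.client("amp", region_name="us-east-1")  # placeholder region

# Print the workspace's logging configuration, including the CloudWatch
# log group ARN it writes to and the configuration's status.
resp = amp.describe_logging_configuration(
    workspaceId="ws-12345678-1234-1234-1234-123456789012"  # placeholder
)
print(resp["loggingConfiguration"])
```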
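
For the first troubleshooting bullet in point 4, both VPC DNS attributes can be checked programmatically (placeholder VPC ID):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder region
vpc_id = "vpc-0123456789abcdef0"  # placeholder VPC ID

# Both attributes must be enabled for resources inside the VPC (including
# the managed collector's network interfaces) to resolve the cluster's
# private endpoint by name.
support = ec2.describe_vpc_attribute(VpcId=vpc_id, Attribute="enableDnsSupport")
hostnames = ec2.describe_vpc_attribute(VpcId=vpc_id, Attribute="enableDnsHostnames")
print("enableDnsSupport:", support["EnableDnsSupport"]["Value"])
print("enableDnsHostnames:", hostnames["EnableDnsHostnames"]["Value"])
```

If either attribute is disabled, the collector may not be able to resolve the cluster's private endpoint, which would be consistent with a scraper that creates successfully but never delivers data or logs.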

Given the complexity of your setup and the lack of visible logs, it might be beneficial to reach out to AWS support for a more in-depth investigation of your specific configuration. They can help identify any misconfigurations or compatibility issues that may not be immediately apparent.
Sources
Using an AWS managed collector - Amazon Managed Service for Prometheus
Ingest Prometheus metrics to the workspace - Amazon Managed Service for Prometheus

answered a month ago
