Synthetics Heartbeat Canary shows success, logs indicate failure

Hello, I'm trying to sort out a Synthetics Heartbeat Canary that is monitoring our vendor's web services. We've pointed the canary at a collection of endpoint swagger pages as they don't require authentication.

We found an issue today where one of the services went down, but the canary kept showing things as working.

On closer inspection, the HTML for the swagger page was being returned, but the page then requested its data via a JavaScript GET request, and that request was failing:

ERROR: Request failed. ErrorText: net::ERR_ABORTED. Request: GET https://***************/api/ReferenceData/v1/ReferenceData-API-Swagger/swagger.json; ResourceType: fetch

I don't see any place in the autogenerated template where "ERROR: Request failed." is output to the logs. How do we capture this error and fail the canary?

The script I'm using is as follows:

const synthetics = require('Synthetics');
const log = require('SyntheticsLogger');
const syntheticsConfiguration = synthetics.getConfiguration();
const syntheticsLogHelper = require('SyntheticsLogHelper');

const loadBlueprint = async function () {

    const urls = [
        'https://*************/api/Accounting/v1/swagger/index.html',
        'https://*************/api/ReferenceData/v1/swagger/index.html',
        'https://*************/api/ReportWizard/v1/swagger/index.html',
        'https://*************/api/DataImport/v1/swagger/index.html'
    ];

    // Set screenshot option
    const takeScreenshot = true;

    /* Disabling default step screen shots taken during Synthetics.executeStep() calls
     * Step will be used to publish metrics on time taken to load dom content but
     * Screenshots will be taken outside the executeStep to allow for page to completely load with domcontentloaded
     * You can change it to load, networkidle0, networkidle2 depending on what works best for you.
     */
    syntheticsConfiguration.disableStepScreenshots();
    syntheticsConfiguration.setConfig({
       continueOnStepFailure: true,
       includeRequestHeaders: true, // Enable if headers should be displayed in HAR
       includeResponseHeaders: true, // Enable if headers should be displayed in HAR
       restrictedHeaders: [], // Value of these headers will be redacted from logs and reports
       restrictedUrlParameters: [] // Values of these url parameters will be redacted from logs and reports

    });

    let page = await synthetics.getPage();

    for (const url of urls) {
        await loadUrl(page, url, takeScreenshot);
    }
};

// Reset the page in-between
const resetPage = async function(page) {
    try {
        await page.goto('about:blank',{waitUntil: ['load', 'networkidle0'], timeout: 30000} );
    } catch (e) {
        synthetics.addExecutionError('Unable to open a blank page. ', e);
    }
}

const loadUrl = async function (page, url, takeScreenshot) {
    let stepName = null;
    let domcontentloaded = false;

    try {
        stepName = new URL(url).hostname;
    } catch (e) {
        const errorString = `Error parsing url: ${url}. ${e}`;
        log.error(errorString);
        /* If we fail to parse the URL, don't emit a metric with a stepName based on it.
           It may not be a legal CloudWatch metric dimension name and we may not have alarms
           set up on the malformed URL stepName.  Instead, fail this step, which will
           show up in the logs, fail the overall canary, and alarm on the overall canary
           success rate.
        */
        throw e;
    }

    await synthetics.executeStep(stepName, async function () {
        const sanitizedUrl = syntheticsLogHelper.getSanitizedUrl(url);

        /* You can customize the wait condition here. For instance, using 'networkidle2' or 'networkidle0' to load page completely.
           networkidle0: Navigation is successful when the page has had no network requests for half a second. This might never happen if page is constantly loading multiple resources.
           networkidle2: Navigation is successful when the page has no more than 2 network requests for half a second.
           domcontentloaded: It's fired as soon as the page DOM has been loaded, without waiting for resources to finish loading. If needed add explicit wait with await new Promise(r => setTimeout(r, milliseconds))
        */
        const response = await page.goto(url, { waitUntil: ['domcontentloaded'], timeout: 30000});
        if (response) {
            domcontentloaded = true;
            const status = response.status();
            const statusText = response.statusText();

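            // Declare the variable (it was an implicit global) and log the response summary.
            const logResponseString = `Response from url: ${sanitizedUrl}  Status: ${status}  Status Text: ${statusText}`;
            log.info(logResponseString);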

            //If the response status code is not a 2xx success code
            if (response.status() < 200 || response.status() > 299) {
                throw new Error(`Failed to load url: ${sanitizedUrl} ${response.status()} ${response.statusText()}`);
            }
        } else {
            const logNoResponseString = `No response returned for url: ${sanitizedUrl}`;
            log.error(logNoResponseString);
            throw new Error(logNoResponseString);
        }
    });

    // Wait for 15 seconds to let page load fully before taking screenshot.
    if (domcontentloaded && takeScreenshot) {
        await new Promise(r => setTimeout(r, 15000));
        await synthetics.takeScreenshot(stepName, 'loaded');
        await resetPage(page);
    }
};

const urls = [];

exports.handler = async () => {
    return await loadBlueprint();
};
asked 10 months ago, 617 views
2 Answers
Accepted Answer

I solved my issue by making the following changes to the Heartbeat script. It adds a "requestfailed" handler and waits for networkidle0.

const loadUrl = async function (page, url, takeScreenshot) {
    let stepName = null;
    let domcontentloaded = false;

    page.on('requestfailed', request => {
        const requestFailedError = `url: ${request.url()}, errText: ${request.failure().errorText}, method: ${request.method()}`;
        log.error(requestFailedError);
        
        synthetics.addExecutionError(requestFailedError, null);
    });

    try {
        stepName = new URL(url).hostname;
    } catch (e) {
        const errorString = `Error parsing url: ${url}. ${e}`;
        log.error(errorString);
        /* If we fail to parse the URL, don't emit a metric with a stepName based on it.
           It may not be a legal CloudWatch metric dimension name and we may not have alarms
           set up on the malformed URL stepName.  Instead, fail this step, which will
           show up in the logs, fail the overall canary, and alarm on the overall canary
           success rate.
        */
        throw e;
    }

    await synthetics.executeStep(stepName, async function () {
        const sanitizedUrl = syntheticsLogHelper.getSanitizedUrl(url);

        /* You can customize the wait condition here. For instance, using 'networkidle2' or 'networkidle0' to load page completely.
           networkidle0: Navigation is successful when the page has had no network requests for half a second. This might never happen if page is constantly loading multiple resources.
           networkidle2: Navigation is successful when the page has no more than 2 network requests for half a second.
           domcontentloaded: It's fired as soon as the page DOM has been loaded, without waiting for resources to finish loading. If needed add explicit wait with await new Promise(r => setTimeout(r, milliseconds))
        */
        const response = await page.goto(url, { waitUntil: ['networkidle0'], timeout: 30000});
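
One thing to watch with the handler above: page.on('requestfailed', ...) is called inside loadUrl, and loadUrl runs once per URL against the same page, so listeners accumulate and a failure on a later URL may be logged and reported more than once. A minimal sketch of one way around that is below. collectFailedRequests is a hypothetical helper name of my own, not part of the Synthetics library, and it assumes page behaves like a Puppeteer page (on/off for events, and request objects exposing url(), method() and failure()). Also keep in mind that net::ERR_ABORTED can show up for requests the browser cancels for harmless reasons, so you may want to filter by URL or resource type before failing the canary.

```javascript
// Hypothetical helper (not part of the Synthetics library): records every request
// that fails while `action` runs, then detaches the listener again.
async function collectFailedRequests(page, action) {
    const failures = [];
    const onRequestFailed = (request) => {
        const failure = request.failure(); // guard in case no failure details are available
        failures.push({
            url: request.url(),
            method: request.method(),
            errorText: failure ? failure.errorText : 'unknown'
        });
    };

    page.on('requestfailed', onRequestFailed);
    try {
        await action();
    } finally {
        // Detach so the next URL loaded on the same page does not inherit this listener.
        page.off('requestfailed', onRequestFailed);
    }
    return failures;
}

// Possible use inside the executeStep callback (a sketch, untested against a live canary):
//
//   let response;
//   const failures = await collectFailedRequests(page, async () => {
//       response = await page.goto(url, { waitUntil: ['networkidle0'], timeout: 30000 });
//   });
//   if (failures.length > 0) {
//       throw new Error(`${failures.length} request(s) failed, first: ${failures[0].url}`);
//   }
```

Throwing inside the step should mark that step as failed in the same way the existing non-2xx check does, while synthetics.addExecutionError (as used above) remains an option if you would rather record the error without failing the individual step.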
answered 9 months ago

The heartbeat blueprint canary checks that a single web page exists, essentially whether or not the page responds with a success status such as 200 OK.

If the frontend web servers are down, the canary will show the run as Failed. However, if only a component of the web page is unavailable, the canary will not mark the run as a failure.

I'd recommend that you try other blueprints for your purposes. In this case, it looks like you want to know whenever the website no longer looks exactly the way it did before. This is possible with the Visual monitoring blueprint, which compares screenshots of the website against a baseline screenshot.

https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Synthetics_Canaries_Blueprints.html#CloudWatch_Synthetics_Canaries_Blueprints_VisualTesting

AWS
SUPPORT ENGINEER
Dishant
answered 9 months ago
  • Thanks. I was hoping there would be a way to add it via scripting, to just fail the heartbeat if any of the out-of-band HTTP requests fail, since it's clear the runtime knows about the failure and is logging it.

    A visual comparison should work but could possibly get tripped up if the vendor changes their API definition.

    It's probably a weird edge case, as I'm still not sure why our vendor's swagger UI page will load while it can't reach its own data file and the API isn't responding. Usually, as you said, it's either up or down, and heartbeat has been great for that.
