focus table (AWS Glue) used by Focus Dashboard has no data from source accounts

We have successfully deployed Foundational Cost Intelligence Dashboards (CID) and all the source accounts configured are appearing on the QuickSight dashboards. The deployment followed the instructions provided in the AWS CID workshop (https://catalog.workshops.aws/awscid/en-US/dashboards/foundational/cudos-cid-kpi/deploy) using CloudFormation Stack instructions.

Additionally, we deployed the Focus Dashboard (following the AWS Focus Dashboard guide, https://catalog.workshops.aws/awscid/en-US/dashboards/additional/focus) from the command line: cid-cmd deploy --dashboard-id focus-dashboard --focus_database_name cid_data_export --focus_table_name focus --athena_database_name cid_data_export --region sa-east-1. However, only a subset of the configured source accounts appears on the dashboard, even 72 hours after deployment.

Investigation findings:

  1. The base table (focus table) behind the Focus Dashboard views and datasets contains only a subset of the expected source accounts when compared with the cur2 table used by the Foundational dashboards.
  2. The AWS Glue crawler responsible for updating the focus table (cid-DataExportFOCUSCrawler) is running successfully, with no errors in the CloudWatch logs.
  3. All source accounts are correctly replicating up-to-date records (in Parquet format) to the centralized S3 bucket in the Data Collection Account (at s3://${DestinationS3}/focus/). This data serves as input for the cid-DataExportFOCUSCrawler.
  4. CloudWatch logs confirm that the crawler is processing each account folder under s3://${DestinationS3}/focus/.
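To put a number on finding 1, one option is to pull the distinct account IDs from both tables in Athena and diff them. The following is only a sketch: the column names `line_item_usage_account_id` (CUR 2.0) and `subaccountid` (FOCUS, as lowercased in the Glue catalog) are assumptions to check against your own catalog, as is the assumption that cur2 lives in the same cid_data_export database as focus.

```python
# Sketch only: database, table, and column names are assumptions.
# Verify them in the Glue catalog before running the queries in Athena.
CUR2_ACCOUNTS_SQL = (
    "SELECT DISTINCT line_item_usage_account_id AS account_id "
    'FROM "cid_data_export"."cur2"'
)
FOCUS_ACCOUNTS_SQL = (
    "SELECT DISTINCT subaccountid AS account_id "
    'FROM "cid_data_export"."focus"'
)


def missing_accounts(cur2_ids, focus_ids):
    """Return account IDs present in cur2 but absent from focus, sorted."""
    return sorted(set(cur2_ids) - set(focus_ids))
```

Run each query in Athena, then pass the two result columns to `missing_accounts` to get the list of accounts that never reached the focus table.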

Troubleshooting Steps Taken

To address the issue, we turned off Focus management in the CID-DataExports-Destination stack by updating it with Focus = No, and then updated it again with Focus = Yes. This process recreated the following resources:

  • FOCUSCrawler
  • FOCUSTable
  • LocalFOCUSviaCustomResource

Following this redeployment:

  • The FOCUSCrawler has been running as scheduled with no errors reported.
  • The Crawler Run Details confirm that individual source account tables are being updated. However, the focus table remains empty: no data has been ingested, even after several days.
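Since the crawler reports updating individual source account tables, it may help to list every Glue table whose location sits under the focus prefix but which is not the focus table itself. A minimal sketch, assuming boto3 is available and that `stray_tables` and `list_tables` are hypothetical helper names introduced here for illustration:

```python
def stray_tables(tables, focus_prefix, expected_name="focus"):
    """Return (name, location) for tables under focus_prefix other than expected_name."""
    prefix = focus_prefix.rstrip("/") + "/"
    found = []
    for table in tables:
        location = table.get("StorageDescriptor", {}).get("Location", "")
        normalized = location.rstrip("/") + "/"
        if normalized.startswith(prefix) and table["Name"] != expected_name:
            found.append((table["Name"], location))
    return found


def list_tables(database_name, region_name):
    """Fetch all table definitions in a Glue database (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so stray_tables works without AWS access

    glue = boto3.client("glue", region_name=region_name)
    tables = []
    for page in glue.get_paginator("get_tables").paginate(DatabaseName=database_name):
        tables.extend(page["TableList"])
    return tables
```

For example, `stray_tables(list_tables("cid_data_export", "sa-east-1"), "s3://<your-bucket>/focus")` would show which account folders the crawler turned into separate tables, with the bucket name filled in for your deployment.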

We are using the default AWS CloudFormation templates provided in the CID Dashboard documentation, with no modifications.

At this point, we are unsure where else to check. Any insights or recommendations would be greatly appreciated.

2 Answers

Based on the information provided, it appears that you're experiencing an issue with the Focus Dashboard where the focus table in AWS Glue is not being populated with data from all source accounts, despite the crawler running successfully and processing the account folders.

This situation is similar to issues that have been encountered with other CID (Cloud Intelligence Dashboards) deployments. The root cause might be related to how the Glue crawler is interpreting the schema of the data in the S3 bucket.

Here are some potential steps to troubleshoot and resolve the issue:

  1. Check the Glue crawler configuration: Ensure that the crawler is set to recognize new folders as partitions rather than new tables. You may need to set the table level to 2 in the crawler configuration, which would force it to recognize newer folders as partitions instead of tables.

  2. Examine the schema differences: Compare the schema of the existing focus table with the schema of the new data being processed. If there are significant differences, the crawler might be trying to create new tables instead of updating the existing one.

  3. Recreate the crawler: If the crawler has already established metadata for some account folders as separate tables, it may be necessary to delete and recreate the crawler to remove these stale associations.

  4. Use the CID update command: Try running the command "cid-cmd update --force --recursive" in AWS CloudShell. This will update the dashboard to the latest version and recreate underlying resources, including the Glue crawler. Be cautious with this approach if you have made any customizations to Athena views or QuickSight datasets.

  5. Check IAM permissions: Ensure that the IAM role used by the Glue crawler has sufficient permissions to perform all necessary actions on the focus table and related resources.

  6. Verify table metadata: Check if the focus table has all required metadata attributes, particularly the TableType attribute. If it's missing, you may need to update the table definition to include it.
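For step 1, the table level can also be set programmatically through the crawler's Configuration JSON, under `Grouping.TableLevelConfiguration`. The sketch below merges that setting into whatever configuration the crawler already has. The helper names are hypothetical, and the value 2 is simply the one suggested above, so confirm it matches your bucket layout before applying it.

```python
import json


def with_table_level(existing_configuration, level=2):
    """Return a crawler Configuration JSON string with Grouping.TableLevelConfiguration set."""
    if existing_configuration:
        config = json.loads(existing_configuration)
    else:
        config = {"Version": 1.0}
    config.setdefault("Grouping", {})["TableLevelConfiguration"] = level
    return json.dumps(config)


def apply_table_level(crawler_name, region_name, level=2):
    """Update the crawler in place (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so with_table_level works without AWS access

    glue = boto3.client("glue", region_name=region_name)
    current = glue.get_crawler(Name=crawler_name)["Crawler"].get("Configuration")
    glue.update_crawler(Name=crawler_name, Configuration=with_table_level(current, level))
```

Merging into the existing configuration, instead of overwriting it, keeps any other crawler output settings the CID template may have applied.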

If these steps don't resolve the issue, you may need to engage AWS support for further assistance, as there could be underlying complexities specific to your deployment that require more in-depth investigation.
Sources
Backfilled CUR data and Additional Orgs not being displayed | AWS re:Post
Table incompatible when using AWS Glue with Athena in Amazon QuickSight - Amazon QuickSight
AWS_FEEDS TABLE_NOT_FOUND discrepancy | AWS re:Post

answered 9 months ago
EXPERT
reviewed 9 months ago

Items 2 to 6 are all clear.

**For item 1:**

  • I set the table level to 2 under Advanced Options in "Set output and scheduling" while editing the Focus crawler.
  • I left only a subset of accounts under s3://${DestinationS3}/focus/ (after backing up the data from the other source accounts). The Focus crawler then ran successfully and populated the focus table.
  • I then added the remaining source accounts' data back one at a time. For some of them, the crawler again split its output into separate tables (instead of partitions of the focus table). When I removed the account that caused the issue, the crawler worked again. I compared the Parquet schemas of the "bad" and "good" sources, and they are identical.

What else should I check in the source account data (Parquet files) to prevent Glue from splitting the output into multiple tables?

I also noticed that some of the accounts the crawler was not processing had lowercase partition folder names in the S3 data replicated from the source accounts (billing_period=2025-02/ instead of BILLING_PERIOD=2025-02/), even though all source accounts were configured to send FOCUS data to the Data Collection account through the same CloudFormation stack (https://catalog.workshops.aws/awscid/en-US/dashboards/foundational/cudos-cid-kpi/deploy#step-2.-in-managementpayersource-account-create-cur-2.0-and-replication).

This produced the crawler error "Folder partition keys do not match table partition keys, skipped folder".
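To find every account affected by the partition-key casing difference, the object keys under the focus prefix can be grouped by the partition key names they use. A rough sketch, assuming keys shaped like `focus/<account-folder>/.../billing_period=2025-02/file.parquet`; that layout, the `account_depth` parameter, and the function names are assumptions for illustration:

```python
from collections import defaultdict


def partition_keys(s3_key):
    """Extract Hive-style partition key names (the text before '=') from an S3 object key."""
    folders = s3_key.split("/")[:-1]  # drop the file name
    return tuple(part.split("=", 1)[0] for part in folders if "=" in part)


def group_by_partition_layout(s3_keys, account_depth=1):
    """Map each distinct partition-key layout to the set of account folders using it."""
    layouts = defaultdict(set)
    for key in s3_keys:
        parts = key.split("/")
        if len(parts) <= account_depth:
            continue  # key is too shallow to contain an account folder
        layouts[partition_keys(key)].add(parts[account_depth])
    return dict(layouts)
```

If the result has more than one layout, for instance both `("billing_period",)` and `("BILLING_PERIOD",)`, the account folders listed under the minority layout are the likely candidates for the "skipped folder" error. The keys themselves can come from any S3 listing, such as the `list_objects_v2` paginator in boto3.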

answered 9 months ago
