How can I resolve the "Column not found" error with the Athena DynamoDB connector?

Lecture de 5 minute(s)
1

When I use the Amazon Athena DynamoDB connector, I get a "Column not found" error.

Short description

The "Column not found" error occurs because of the limited capability of the Athena DynamoDB connector's built-in schema inference. The Athena DynamoDB connector uses an AWS Lambda function to infer the table schema. Then, it analyzes only a sample of the data in the table.

Because the function analyzes only a sample of the data, the schema can miss data in columns that aren't part of the sample data. Incorrectly formatted data on the Amazon DynamoDB table can also cause the "Column not found" error.

Important:

  • The DynamoDB table can't include camel case, capital letters, or data types that Athena doesn't support.
  • AWS Glue and Athena can't read camel case, capital letters, or special characters other than the underscore.

Resolution

To use AWS Glue to infer the schema from the DynamoDB table, complete the following steps:

  1. Use an AWS Glue crawler to create an AWS Glue table.
  2. Review the DynamoDB table schema to make sure that the crawler correctly discovered the data.
  3. Use the columnMapping property to prevent missing data columns.

Note: Database names, table names, and column names can't be longer than 255 characters. Acceptable characters include lowercase letters, numbers, and the underscore character.

Use an AWS Glue crawler to create an AWS Glue table

Complete the following steps:

  1. Open the AWS Glue console.
  2. In the navigation pane, under Data catalog, choose Crawlers.
  3. On the Crawlers screen, choose Create crawler.
  4. On the Set crawler properties page, under Crawler details, enter a name for the crawler. Then, choose Next.
  5. For Data source configuration, choose Not yet. Then, choose Add a data source.
  6. On the Add data source dialog, enter the following information:
    For Data source, choose DynamoDB.
    For Table name, enter the name of your DynamoDB table.
    Choose Add a DynamoDB data source. Then, choose Next.
  7. On the Configure security settings page, under IAM role, choose an existing IAM role. Or, choose Create new IAM role. Then, choose Next.
    Note: The IAM role must have the required permissions.
  8. On the Set output and scheduling page, under Output configuration, choose Add database. A new tab opens to the Create a database page.
  9. On the Create a database page, enter the following information:
    For Name, enter a name for your database.
    For Location, set the value to dynamo-db-flag. Then, choose Create database. Return to the Set output and scheduling tab.
    Note: For Athena to recognize the table, you must use the same value as the dynamo-db-flag.
  10. On the Set output and scheduling page, for Target database, select your database. Then, choose Next.
  11. Choose Create crawler.
  12. On the crawler page, choose run crawler.

Review the table schema

Complete the following steps:

  1. Open the AWS Glue console.
  2. In the navigation pane, under Data catalog, choose Databases.
  3. Select your database.
  4. Under Tables, select your table.
  5. Review the column to make sure that columns and data types are discovered correctly.

Configure the columnMapping property

Note: Column mapping parameters apply only to top-level column names. They don't apply to nested fields.

Complete the following steps:

  1. Open the AWS Glue console.
  2. In the navigation pane, under Data catalog, choose Tables.
  3. On the Tables page, select your table.
  4. Choose Actions, and then choose Edit table.
  5. On the Edit table page, under Table properties, choose Add, and then enter the following information:
    In the Key field, enter columnMapping.
    In the Value field, enter the column names with the following format: mapping value=column name.
    For example, your table contains the following columns: firstName, Lastname, Age, and region. Use the following columnMapping property value format: 
    firstname=firstName
    lastname=Lastname
    age=Age
    region=region
  6. Open the Amazon Athena console.
  7. On the Query editor page, for Data, enter the following information:
    For Data source, enter DynamoDB.
    For Database, choose your database.

Generic user error

You might receive a "GENERIC_USER_ERROR: Encountered an exception" error. This error can occur when the data in the columns of your DynamoDB table aren't compatible with Athena. To resolve the issue, change the data in your table to a type that's compatible with Athena.

To change the data type in your table, complete the following steps:

  1. Open the AWS Glue console.
  2. In the navigation pane, under Data Catalog, choose Tables.
  3. On the Tables page, choose the table that you want to update.
  4. Under Schema, choose Edit schema.
  5. Select the column with the incompatible data, and then choose Edit.
  6. For Data type, select a data type that's compatible with Athena. Then, choose Save.
  7. Choose Save as new table version.
  8. Open the Athena console.
  9. On the Query editor, run the query again.

Related information

Best practices when using Athena with AWS Glue

AWS OFFICIEL
AWS OFFICIELA mis à jour il y a 2 mois
3 Commentaires

Note that the property is columnMapping with a lower case c, and that several column mappings are separated by comme. See https://docs.aws.amazon.com/athena/latest/ug/connectors-dynamodb.html#connectors-dynamodb-parameters

Andreax
a répondu il y a 5 mois

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

profile pictureAWS
MODÉRATEUR
a répondu il y a 5 mois

It would be much better and time saving if we had a parameter on the lambda about the sample size. That answer feels like "to fix the problem, stop using the DynamoDB connector and make everything by hand - with duplicated databases and data sources". Not very satisfying, especially for teams used to managed services and without any data specialist onboard.

Maxime
a répondu il y a 3 mois