Why do I get a "Column not found" error when I use the Athena DynamoDB connector?

5 minute read
1

When I use the Amazon Athena DynamoDB connector, I get a “Column not found” error.

Short description

The "Column not found" error occurs because of the limited capability of Athena DynamoDB connector's built-in schema inference. The Athena DynamoDB connector uses an AWS Lambda function to infer the table schema. Then, it analyzes only a sample of the data in the table. Because the function analyzes only a sample of the data, the schema can miss data in columns that aren't part of the sample data. Incorrectly formatted data on the Amazon DynamoDB table can also cause the "Column not found" error. The DynamoDB table can't include camel case, capital letters, or data types that Athena doesn't support.

Note: AWS Glue and Athena can't read camel case, capital letters, or special characters other than the underscore.

Resolution

Complete the following steps to use AWS Glue to infer the schema from the DynamoDB table:

  1. Use an AWS Glue crawler to create an AWS Glue table.
  2. Review the DynamoDB table schema to make sure that the crawler correctly discovered the data
  3. Use the columnMapping parameter to prevent missing data columns.

Note: Database names, table names, and column names can't be longer than 255 characters. Acceptable characters include lowercase letters, numbers, and the underscore character.

Use an AWS Glue crawler to create an AWS Glue table

  1. Open the AWS Glue console.
  2. In the navigation pane, under Data catalog, choose Crawlers.
  3. On the Crawlers screen, choose Create crawler.
  4. On the Set crawler properties page, under Crawler details, enter a name for the crawler. Then, choose Next.
  5. For Data source configuration, choose Not yet. Then, choose Add a data source.
  6. On the Add data source dialog, enter the following information:
    For Data source, choose DynamoDB.
    For Table name, enter the name of your DynamoDB table.
    Choose Add a DynamoDB data source. Then, choose Next.
  7. On the Configure security settings page, under IAM role, choose an existing IAM role. Or, choose Create new IAM role. Then, choose Next.
    Note: The IAM role must have the required permissions.
  8. On the Set output and scheduling page, under Output configuration, choose Add database. A new tab opens to the Create a database page.
  9. On the Create a database page, enter the following information:
    For Name, enter a name for your database.
    For Location, set the value to dynamo-db-flag. Then, choose Create database. Return to the Set output and scheduling tab.
    Note: You must use the same value as the dynamo-db-flag for Athena to recognize the table.
  10. On the Set output and scheduling page, for Target database, select the database you created. Then, choose Next.
  11. Choose Create crawler.

Review the table schema

  1. Open the AWS Glue console.
  2. In the navigation pane, under Data catalog, choose Databases.
  3. Choose the database that you created in the previous step.
  4. Under Tables, choose the table that you created in the previous step.
  5. Review the column to make sure columns and data types are discovered correctly.

Configure the columnMapping parameter

Note: Column mapping applies to only top level column names, and doesn't apply to nested fields.

  1. Open the AWS Glue console.
  2. In the navigation pane, under Data catalog, choose Tables.
  3. On the Tables page, choose the table that you created in the preceding step.
  4. Choose Actions, and then choose Edit table.
  5. On the Edit table page, under Table properties, choose Add, and then enter the following information:
    In the Key field, enter ColumnMapping.
    In the Value field, enter the column names with the following format: mapping value=column name.
    For example, suppose your table contains the following columns: firstName, Lastname, Age, and region. For this example, the value of ColumnMapping is:
    firstname=firstName
    lastname=Lastname
    age=Age
    region=region
  6. Open the Amazon Athena console.
  7. On the Query editor page, for Data, enter the following information:
    For Data source, enter DynamoDB.
    For Database, choose the database that you created in the previous step.

Generic user error

You might receive a "GENERIC_USER_ERROR: Encountered an exception" error. This occurs if the data in the columns of your DynamoDB table aren't compatible with Athena. To resolve the error, change the data in your table to a type that's compatible with Athena.

To change the data type in your table, complete the following steps:

  1. Open the AWS Glue console.
  2. In the navigation pane, under Data Catalog, choose Tables.
  3. On the Tables page, choose the table that you want to update.
  4. Under Schema, choose Edit schema.
  5. Select the column with the incompatible data, and then choose Edit.
  6. For Data type, select a data type that's compatible with Athena. Then, choose Save.
  7. Choose Save as new table version.
  8. Open the Athena console.
  9. On the Query editor, run the query again.

Related information

Best practices when using Athena with AWS Glue

Setting up databases and tables in AWS Glue

AWS OFFICIAL
AWS OFFICIALUpdated 7 months ago
3 Comments

Note that the property is columnMapping with a lower case c, and that several column mappings are separated by comme. See https://docs.aws.amazon.com/athena/latest/ug/connectors-dynamodb.html#connectors-dynamodb-parameters

Andreax
replied 3 months ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

profile pictureAWS
MODERATOR
replied 3 months ago

It would be much better and time saving if we had a parameter on the lambda about the sample size. That answer feels like "to fix the problem, stop using the DynamoDB connector and make everything by hand - with duplicated databases and data sources". Not very satisfying, especially for teams used to managed services and without any data specialist onboard.

Maxime
replied a month ago