AWS Glue Job Error

I'm trying to convert CSV files in S3 to Parquet in another S3 bucket. First I read the CSV files using a crawler and load the data into a table, and then I use a job to convert from the table to Parquet format in S3.

The problem is that the job fails with the following error:

Error Category: UNCLASSIFIED_ERROR; An error occurred while calling o106.pyWriteDynamicFrame. Inconsistent data type results in choice type: {"dataType":"struct","fields":[{"name":"id_beneficiario","container":{"dataType":"string","properties":{}},"properties":{}},{"name":"tipo_costo","container":{"dataType":"string","properties":{}},"properties":{}},{"name":"tipo_costo2","container":{"dataType":"string","properties":{}},"properties":{}},{"name":"numero_documento","container":{"dataType":"string","properties":{}},"properties":{}},{"name":"estado_documento","container":{"dataType":"string","properties":{}},"properties":{}},{"name":"codigo_cie10","container":{"dataType":"long","properties":{}},"properties":{}},{"name":"grupo_prestadores","container":{"dataType":"string","properties":{}},"properties":{}},{"name":"rutdv_prestador_profesional","container":{"dataType":"string","properties":{}},"properties":{}},{"name":"rutdv_prestador_institucion","container":{"dataType":"string","properties":{}},"properties":{}},{

What does it mean? How can I fix it?

1 Answer
Hi ignacio,

The error "Inconsistent data type results in choice type" typically occurs when a column in the input data (the CSV files) does not have the same data type in every row, for example when most values in a column are numeric but some rows contain text. Glue then treats that column as a "choice" between several types instead of a single type, and the write step fails because it needs one type per column.
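To make the cause concrete, here is a minimal sketch in plain Python that guesses a type for every cell of a CSV and reports the columns where more than one type shows up. It does not reproduce Glue's actual type inference; the `classify` and `mixed_type_columns` helpers are hypothetical, and the sample rows are made up (only the column names are taken from the error message).

```python
import csv
import io
from collections import defaultdict


def classify(value: str) -> str:
    """Rough type guess for one CSV cell: 'null', 'long', 'double', or 'string'."""
    text = value.strip()
    if text == "":
        return "null"
    try:
        int(text)
        return "long"
    except ValueError:
        pass
    try:
        float(text)
        return "double"
    except ValueError:
        return "string"


def mixed_type_columns(csv_file) -> dict:
    """Return {column: set of guessed types} for columns with more than one non-null type."""
    seen = defaultdict(set)
    for row in csv.DictReader(csv_file):
        for column, value in row.items():
            if column is None:
                # Extra cells beyond the header; ignored in this sketch.
                continue
            kind = classify(value or "")
            if kind != "null":
                seen[column].add(kind)
    return {column: kinds for column, kinds in seen.items() if len(kinds) > 1}


# Made-up sample rows, for illustration only.
sample = io.StringIO(
    "id_beneficiario,codigo_cie10\n"
    "A001,1234\n"
    "A002,J45\n"
)
print(mixed_type_columns(sample))
```

Any column this reports is a candidate for the inconsistency described above, since its values would not all fit one numeric or string type.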

To fix this issue, you could try the following steps:

  • Analyze the input data: Inspect the CSV files to identify the columns that have inconsistent data types. You can use tools like pandas or AWS Glue's built-in data preview feature to analyze the data.

  • Data cleaning and transformation: Depending on the nature of the inconsistency, you may need to perform data cleaning and transformation steps. For example, if a column is supposed to be numeric but contains some non-numeric values, you can either remove those rows or replace the non-numeric values with a default value (e.g., null or 0).

  • Define the schema: After cleaning the data, define the schema for your Glue table explicitly. This will ensure that Glue interprets the data types correctly and consistently across all rows.

  • Use the defined schema in the Glue job: When creating the Glue job to convert CSV to Parquet, specify the defined schema to ensure that the data types are correctly interpreted and converted.
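As a sketch of the last step, one option that the steps above do not name explicitly is the `resolveChoice` method on a Glue `DynamicFrame`, which accepts a list of `(column, action)` specs such as `("some_column", "cast:string")`. The helpers below are hypothetical and only build those specs and pass them on; which columns to cast, and to which type, is an assumption you would replace after inspecting your own data.

```python
def build_cast_specs(column_types: dict) -> list:
    """Turn {"column": "type"} into resolveChoice specs such as ("column", "cast:type")."""
    return [(column, f"cast:{target}") for column, target in column_types.items()]


def resolve_choices(dynamic_frame, column_types: dict):
    """Cast each listed column to a single type so no choice type reaches the writer.

    `dynamic_frame` is expected to be an AWS Glue DynamicFrame (not imported here,
    so this sketch can be read and tested outside a Glue environment).
    """
    return dynamic_frame.resolveChoice(specs=build_cast_specs(column_types))


# Hypothetical mapping: the column and target type depend on your data.
specs = build_cast_specs({"codigo_cie10": "string"})
print(specs)
```

Inside the job, `resolve_choices` would be applied to the DynamicFrame read from the table, before the step that writes Parquet, since the error is raised during the write.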

You can also find further resources following this link.

AWS
answered 2 years ago
