Im trying to load a CSV file from Amazon S3 into an Amazon Aurora MySQL table using the LOAD DATA FROM S3 statement.

0

he CSV file is generated using Java, and it contains empty values (e.g., "" or just empty strings) for some columns that map to INT fields in the database.

When I run the LOAD DATA FROM S3 command, I get an error similar to:

Incorrect integer value: '' for column 'my_column' at row 1

What I tried: Using NULLIF(@col, '') or NULLIF(TRIM(@col), '') in the SET clause

Playing with different FIELDS TERMINATED BY and OPTIONALLY ENCLOSED BY options

Unfortunately, nothing worked consistently, especially when the CSV fields were empty quotes ("").

What worked: Finally, I replaced empty numeric values in the CSV with \N (double-escaped in Java as \N), and that worked! Aurora treated them as proper NULL values, and the data loaded successfully without any conversion error.

My questions: Is using \N the official and recommended way to represent NULL values when importing CSV files into Aurora MySQL?

Is this behavior documented anywhere by Amazon or MySQL?

Is there a way to make NULLIF() work with empty string columns from CSV reliably in this context?

Thanks in advance for any help or clarification!

asked a month ago96 views
2 Answers
0

You've encountered a common challenge when importing CSV data into Aurora MySQL, especially when dealing with empty values for numeric fields. Your solution of using '\N' to represent NULL values is indeed a good approach and is widely recognized in MySQL and Aurora MySQL.

To address your questions:

  1. Using '\N' is a standard and recommended way to represent NULL values when importing CSV files into Aurora MySQL. This convention is recognized by MySQL and Aurora MySQL as a NULL indicator, especially when using LOAD DATA operations.

  2. This behavior is documented in MySQL documentation, which Aurora MySQL follows closely. The '\N' (or '\N' when escaped) is treated as a NULL value by default during LOAD DATA operations.

  3. While NULLIF() can work in some contexts, it's not always reliable for LOAD DATA operations, especially with empty strings. The issue you encountered with empty quotes ("") is a known quirk in how LOAD DATA processes fields.

Your approach of replacing empty numeric values with '\N' in the CSV file before import is a robust solution. It ensures that Aurora MySQL correctly interprets these values as NULL without requiring additional conversion logic in the LOAD DATA statement.

If you need to handle this at the database level rather than modifying the source CSV, you can use a combination of the NULL AS and SET clauses in your LOAD DATA statement. For example:

LOAD DATA FROM S3 's3://your-bucket/your-file.csv'
INTO TABLE your_table
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(column1, @var1, column2, @var2)
SET 
  column1 = NULLIF(@var1, ''),
  column2 = NULLIF(@var2, '');

This approach allows you to specify how empty fields should be treated, potentially avoiding the need to modify the source CSV.

Remember that when using LOAD DATA FROM S3 in Aurora MySQL, you need to ensure that the necessary IAM permissions are set up correctly to allow access to the S3 bucket.

In summary, while using '\N' in your CSV is a reliable method, you have options to handle this at both the data preparation stage and during the import process itself. The choice depends on your specific workflow and requirements.
Sources
Import CSV file in S3 into Aurora MySql | AWS re:Post
Loading data into an Amazon Aurora MySQL DB cluster from text files in an Amazon S3 bucket - Amazon Aurora

profile picture
answered a month ago
0

Hi,

Thank you so much for asking your question.

  1. \N is indeed the official and recommended way to represent NULL values when importing CSV files into MySQL (and by extension, Aurora MySQL). The documentation explicitly states: "For the default FIELDS and LINES values, NULL is written as a field value of \N for output, and a field value of \N is read as NULL for input (assuming that the ESCAPED BY character is )."
  2. This behavior is documented in the MySQL documentation in the "Handling of NULL values" section: https://dev.mysql.com/doc/refman/8.0/en/load-data.html

I hope this might help.

AWS
answered a month ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions