How to convert Database Character Sets to utf8mb4 in AWS RDS/Aurora MySQL

5 minute read
Content level: Intermediate
0

Many customers need to convert their database character sets to utf8mb4, either due to application requirements or when upgrading to MySQL 8. This article covers the detail conversion process for RDS MySQL, Aurora MySQL, and RDS MariaDB.

Many customers need to convert their database character sets to utf8mb4, either due to application requirements or when upgrading to MySQL 8. This article covers the detail conversion process for RDS MySQL, Aurora MySQL, and RDS MariaDB.

Small-Scale Manual Conversion

For databases with few tables and minimal data, you can use manual conversion methods.

  1. Check Current Character Settings

For databases:

SELECT SCHEMA_NAME 'database', 
       default_character_set_name 'character_set', 
       DEFAULT_COLLATION_NAME 'collation' 
FROM information_schema.SCHEMATA;

For tables:

SELECT DISTINCT TABLE_SCHEMA, TABLE_NAME, character_set_name 
FROM information_schema.COLUMNS 
WHERE TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys');
  1. Modify Character Settings

For databases:

ALTER DATABASE <database_name> 
CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;

For tables:

ALTER TABLE <table_name> 
CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
  1. Modify the following character set parameters in the custom parameter group:
character_set_client = utf8mb4
character_set_connection = utf8mb4
character_set_database = utf8mb4
character_set_filesystem = utf8mb4
character_set_results = utf8mb4
character_set_server = utf8mb4
collation_connection = utf8mb4_general_ci
collation_server = utf8mb4_general_ci
  1. All objects (triggers, view, events, routines ) also need to be converted to the new character set.
SELECT * FROM information_schema.triggers 
WHERE (collation_connection != 'utf8mb4_general_ci' OR character_set_client != 'utf8mb4') 
AND  TRIGGER_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys');
SELECT * FROM information_schema.views 
WHERE (collation_connection != 'utf8mb4_general_ci' OR character_set_client != 'utf8mb4') 
AND  table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys');
SELECT * FROM information_schema.events 
WHERE (collation_connection != 'utf8mb4_general_ci' OR character_set_client != 'utf8mb4') 
AND  EVENT_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys');
SELECT * FROM information_schema.routines 
WHERE (collation_connection != 'utf8mb4_general_ci' OR character_set_client != 'utf8mb4') 
AND  ROUTINE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys');

Large-Scale Conversion

For environments with numerous tables and databases, follow these steps:

Note: In (Blue/Green Deployment), set binlog_format = ROW on source parameter group.

  1. Generate Conversion Scripts

Create convert_db_ddl.sql:

SELECT CONCAT('ALTER DATABASE ', SCHEMA_NAME, 
       ' CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;') AS sql_script 
FROM information_schema.SCHEMATA 
WHERE default_character_set_name != 'utf8mb4' 
AND SCHEMA_NAME NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys');

Create convert_table_ddl.sql:

SELECT DISTINCT CONCAT('ALTER TABLE ', table_schema, '.', table_name,
       ' CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;') AS sql_script 
FROM information_schema.COLUMNS 
WHERE character_set_name != 'utf8mb4' 
AND TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys');
  1. Generate SQL Files
mysql -h [green-rds-host] -u [admin] -p -sN < convert_db_ddl.sql > convert_db_utf8mb4.sql
mysql -h [green-rds-host] -u [admin] -p -sN < convert_table_ddl.sql > convert_table_utf8mb4.sql
  1. Execute Conversion
mysql -h [green-rds-host] -u [admin] -p -s < convert_db_utf8mb4.sql
mysql -h [green-rds-host] -u [admin] -p -s < convert_table_utf8mb4.sql

Note: For MySQL 8, the default collation is utf8mb4_0900_ai_ci. To use a different collation, set character_set_server = utf8mb4 and configure collation_connection, collation_server in your parameter group.

  1. Modify the character set parameters in green host's custom parameter group.

  2. Identify a maintenance window to switchover the DB service.

Note: As we are suggesting blue/green deployment, while the character sets might be compatible, differences in collations between source and replica can lead to sorting discrepancies or unexpected query results.

Troubleshooting Common Issues

Row Size Error

ERROR 1118 (42000): Row size too large (> 8126). Changing some columns to TEXT or BLOB may help.

Solutions:

  1. Convert Large Columns to BLOB/TEXT
ALTER TABLE t MODIFY latin1_varchar_col TEXT CHARACTER SET utf8mb4;

Note: For a column that has a data type of VARCHAR or one of the TEXT types, CONVERT TO CHARACTER SET changes the data type as necessary to ensure that the new column is long enough to store as many characters as the original column. For example, a TEXT column has two length bytes, which store the byte-length of values in the column, up to a maximum of 65,535 (TEXT: max 64 KB, MEDIUMTEXT: max 16 MB, LONGTEXT: max 4 GB, JSON: alias for LONGTEXT) .

  1. Reduce Column Sizes
ALTER TABLE t MODIFY latin1_varchar_col VARCHAR(M) CHARACTER SET utf8mb4;
  1. Remove unnecessary index

  2. Refactoring the Table into Multiple Tables

  3. Disable InnoDB Strict Mode

    Set innodb_strict_mode = 0 in parameter group.

Note: This is an unsafe workaround. When disable InnoDB strict mode, the transition from 3-byte (UTF-8) to 4-byte (UTF-8MB4) encoding may lead data truncation. Even though InnoDB is allowing the table to be modified, there is still an opportunity for errors. Regardless of whether InnoDB strict mode is enabled, if a DML statement is executed that attempts to write a row that the table's InnoDB row format can't store, then InnoDB will raise an error with this message. This creates a somewhat unsafe situation, because it means that the application has the chance to encounter an additional error while executing DML. [1]

Identifying Problem Tables with Row size too large

Use MySQL Shell to check for issues:

  1. Install MySQL Shell on EC2 Linux [2]:

sudo yum install mysql-shell

  1. Run upgrade check [3]:

mysqlsh -- util check-for-server-upgrade { --user=admin --host=example-green-rds-host }

Sample output:

1) Issues reported by 'check table x for upgrade' command
  Notice  : Table (testdb.20columnstable) - Row size too large (> 8126). Changing some columns to TEXT or BLOB or using ROW_FORMAT=DYNAMIC
    or ROW_FORMAT=COMPRESSED may help. In current row format, BLOB prefix of 768 bytes is stored inline.
  Notice  : Table (testdb.30columnstable) - Row size too large (> 8126). Changing some columns to TEXT or BLOB or using ROW_FORMAT=DYNAMIC or
    ROW_FORMAT=COMPRESSED may help. In current row format, BLOB prefix of 768 bytes is stored inline.

Reference:

[1] https://mariadb.com/kb/en/troubleshooting-row-size-too-large-errors-with-innodb/

[2] https://dev.mysql.com/doc/mysql-shell/8.0/en/mysql-shell-install.html

[3] https://dev.mysql.com/doc/mysql-shell/8.0/en/mysql-shell-utilities-upgrade.html

AWS
EXPERT
published a month ago257 views