Complete a 3 Question Survey and Earn a re:Post Badge
Help improve AWS Support Official channel in re:Post and share your experience - complete a quick three-question survey to earn a re:Post badge!
How to convert Database Character Sets to utf8mb4 in AWS RDS/Aurora MySQL
Many customers need to convert their database character sets to utf8mb4, either due to application requirements or when upgrading to MySQL 8. This article covers the detail conversion process for RDS MySQL, Aurora MySQL, and RDS MariaDB.
Many customers need to convert their database character sets to utf8mb4, either due to application requirements or when upgrading to MySQL 8. This article covers the detail conversion process for RDS MySQL, Aurora MySQL, and RDS MariaDB.
Small-Scale Manual Conversion
For databases with few tables and minimal data, you can use manual conversion methods.
- Check Current Character Settings
For databases:
SELECT SCHEMA_NAME 'database',
default_character_set_name 'character_set',
DEFAULT_COLLATION_NAME 'collation'
FROM information_schema.SCHEMATA;
For tables:
SELECT DISTINCT TABLE_SCHEMA, TABLE_NAME, character_set_name
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys');
- Modify Character Settings
For databases:
ALTER DATABASE <database_name>
CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
For tables:
ALTER TABLE <table_name>
CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
- Modify the following character set parameters in the custom parameter group:
character_set_client = utf8mb4
character_set_connection = utf8mb4
character_set_database = utf8mb4
character_set_filesystem = utf8mb4
character_set_results = utf8mb4
character_set_server = utf8mb4
collation_connection = utf8mb4_general_ci
collation_server = utf8mb4_general_ci
- All objects (triggers, view, events, routines ) also need to be converted to the new character set.
SELECT * FROM information_schema.triggers
WHERE (collation_connection != 'utf8mb4_general_ci' OR character_set_client != 'utf8mb4')
AND TRIGGER_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys');
SELECT * FROM information_schema.views
WHERE (collation_connection != 'utf8mb4_general_ci' OR character_set_client != 'utf8mb4')
AND table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys');
SELECT * FROM information_schema.events
WHERE (collation_connection != 'utf8mb4_general_ci' OR character_set_client != 'utf8mb4')
AND EVENT_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys');
SELECT * FROM information_schema.routines
WHERE (collation_connection != 'utf8mb4_general_ci' OR character_set_client != 'utf8mb4')
AND ROUTINE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys');
Large-Scale Conversion
For environments with numerous tables and databases, follow these steps:
Note: In (Blue/Green Deployment), set binlog_format = ROW on source parameter group.
- Generate Conversion Scripts
Create convert_db_ddl.sql:
SELECT CONCAT('ALTER DATABASE ', SCHEMA_NAME,
' CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;') AS sql_script
FROM information_schema.SCHEMATA
WHERE default_character_set_name != 'utf8mb4'
AND SCHEMA_NAME NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys');
Create convert_table_ddl.sql:
SELECT DISTINCT CONCAT('ALTER TABLE ', table_schema, '.', table_name,
' CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;') AS sql_script
FROM information_schema.COLUMNS
WHERE character_set_name != 'utf8mb4'
AND TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys');
- Generate SQL Files
mysql -h [green-rds-host] -u [admin] -p -sN < convert_db_ddl.sql > convert_db_utf8mb4.sql
mysql -h [green-rds-host] -u [admin] -p -sN < convert_table_ddl.sql > convert_table_utf8mb4.sql
- Execute Conversion
mysql -h [green-rds-host] -u [admin] -p -s < convert_db_utf8mb4.sql
mysql -h [green-rds-host] -u [admin] -p -s < convert_table_utf8mb4.sql
Note: For MySQL 8, the default collation is utf8mb4_0900_ai_ci. To use a different collation, set character_set_server = utf8mb4 and configure collation_connection, collation_server in your parameter group.
-
Modify the character set parameters in green host's custom parameter group.
-
Identify a maintenance window to switchover the DB service.
Note: As we are suggesting blue/green deployment, while the character sets might be compatible, differences in collations between source and replica can lead to sorting discrepancies or unexpected query results.
Troubleshooting Common Issues
Row Size Error
ERROR 1118 (42000): Row size too large (> 8126). Changing some columns to TEXT or BLOB may help.
Solutions:
- Convert Large Columns to BLOB/TEXT
ALTER TABLE t MODIFY latin1_varchar_col TEXT CHARACTER SET utf8mb4;
Note: For a column that has a data type of VARCHAR or one of the TEXT types, CONVERT TO CHARACTER SET changes the data type as necessary to ensure that the new column is long enough to store as many characters as the original column. For example, a TEXT column has two length bytes, which store the byte-length of values in the column, up to a maximum of 65,535 (TEXT: max 64 KB, MEDIUMTEXT: max 16 MB, LONGTEXT: max 4 GB, JSON: alias for LONGTEXT) .
- Reduce Column Sizes
ALTER TABLE t MODIFY latin1_varchar_col VARCHAR(M) CHARACTER SET utf8mb4;
-
Remove unnecessary index
-
Refactoring the Table into Multiple Tables
-
Disable InnoDB Strict Mode
Set innodb_strict_mode = 0 in parameter group.
Note: This is an unsafe workaround. When disable InnoDB strict mode, the transition from 3-byte (UTF-8) to 4-byte (UTF-8MB4) encoding may lead data truncation. Even though InnoDB is allowing the table to be modified, there is still an opportunity for errors. Regardless of whether InnoDB strict mode is enabled, if a DML statement is executed that attempts to write a row that the table's InnoDB row format can't store, then InnoDB will raise an error with this message. This creates a somewhat unsafe situation, because it means that the application has the chance to encounter an additional error while executing DML. [1]
Identifying Problem Tables with Row size too large
Use MySQL Shell to check for issues:
- Install MySQL Shell on EC2 Linux [2]:
sudo yum install mysql-shell
- Run upgrade check [3]:
mysqlsh -- util check-for-server-upgrade { --user=admin --host=example-green-rds-host }
Sample output:
1) Issues reported by 'check table x for upgrade' command
Notice : Table (testdb.20columnstable) - Row size too large (> 8126). Changing some columns to TEXT or BLOB or using ROW_FORMAT=DYNAMIC
or ROW_FORMAT=COMPRESSED may help. In current row format, BLOB prefix of 768 bytes is stored inline.
Notice : Table (testdb.30columnstable) - Row size too large (> 8126). Changing some columns to TEXT or BLOB or using ROW_FORMAT=DYNAMIC or
ROW_FORMAT=COMPRESSED may help. In current row format, BLOB prefix of 768 bytes is stored inline.
Reference:
[1] https://mariadb.com/kb/en/troubleshooting-row-size-too-large-errors-with-innodb/
[2] https://dev.mysql.com/doc/mysql-shell/8.0/en/mysql-shell-install.html
[3] https://dev.mysql.com/doc/mysql-shell/8.0/en/mysql-shell-utilities-upgrade.html
Relevant content
- asked 3 years ago
- AWS OFFICIALUpdated 10 months ago