What is UTF8MB4 in MySQL

What is the difference between the utf8mb4 and utf8 character sets in MySQL?

I already know the ASCII, UTF-8, UTF-16 and UTF-32 encodings. However, I'm curious what the difference is between the utf8mb4 group of encodings and the other encoding types defined in MySQL Server.

Are there any special benefits or recommendations for using utf8mb4 instead of utf8?

Reply:


UTF-8 is a variable-length encoding: it takes one to four bytes to store a code point. The MySQL encoding called "utf8" (an alias for "utf8mb3") stores at most three bytes per code point.
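To make the one-to-four-byte range concrete, here is a small Python sketch that encodes a few sample characters with Python's built-in UTF-8 codec and reports whether each fits within utf8mb3's three-byte limit. The sample characters and the helper name `utf8_byte_count` are illustrative choices, not anything defined by MySQL.

```python
# Encode sample characters with Python's UTF-8 codec and compare each
# byte count against utf8mb3's maximum of 3 bytes per code point.
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, e-acute, euro sign, grinning face

def utf8_byte_count(ch: str) -> int:
    """Number of bytes the UTF-8 encoding of a single character occupies."""
    return len(ch.encode("utf-8"))

for ch in SAMPLES:
    n = utf8_byte_count(ch)
    print(f"U+{ord(ch):04X}: {n} byte(s), fits in utf8mb3: {n <= 3}")
```

The first three characters need 1, 2 and 3 bytes; only the emoji needs 4 bytes, which is exactly the case utf8mb3 cannot handle.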

The "utf8" / "utf8mb3" character set therefore cannot store all Unicode code points: it only supports the range 0x0000 to 0xFFFF, known as the "Basic Multilingual Plane" (BMP). See also Comparison of Unicode Encodings.
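Why the three-byte limit and the BMP coincide follows from UTF-8's byte-length ranges. The sketch below derives the length from the code point alone and cross-checks it against Python's encoder at the range boundaries. The function name is hypothetical and chosen for illustration.

```python
def utf8_length_from_code_point(cp: int) -> int:
    """Bytes UTF-8 needs for code point `cp`, derived from the encoding's ranges."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    if cp <= 0x7F:
        return 1  # 7 payload bits:  0xxxxxxx
    if cp <= 0x7FF:
        return 2  # 11 payload bits: 110xxxxx 10xxxxxx
    if cp <= 0xFFFF:
        return 3  # 16 payload bits: 1110xxxx 10xxxxxx 10xxxxxx (top of the BMP)
    return 4      # 21 payload bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

# utf8mb3's 3-byte ceiling ends exactly at U+FFFF, the last BMP code point.
for cp in (0xFFFF, 0x10000):
    print(f"U+{cp:04X} -> {utf8_length_from_code_point(cp)} bytes")

# Cross-check against Python's codec at the range boundaries
# (the list deliberately avoids surrogates U+D800..U+DFFF, which cannot be encoded alone).
for cp in (0x41, 0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x10FFFF):
    assert utf8_length_from_code_point(cp) == len(chr(cp).encode("utf-8"))
```

Every code point above U+FFFF lands in the four-byte branch, which is why a three-byte-per-character set can only cover the BMP.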

This is what (an earlier version of) the MySQL documentation page has to say about it:

The character set named utf8 [/ utf8mb3] uses a maximum of three bytes per character and contains only BMP characters. As of MySQL 5.5.3, the utf8mb4 character set uses a maximum of four bytes per character and supports supplementary characters:

  • For a BMP character, utf8 [/ utf8mb3] and utf8mb4 have identical storage characteristics: same code values, same encoding, same length.

  • For a supplementary character, utf8 [/ utf8mb3] cannot store the character at all, while utf8mb4 requires four bytes to store it. Because utf8 [/ utf8mb3] cannot store the character at all, you do not have any supplementary characters in utf8 [/ utf8mb3] columns and you need not worry about converting characters or losing data when upgrading utf8 [/ utf8mb3] data from older versions of MySQL.

So if you want your column to support storing characters outside the BMP (and you usually do), such as emoji, use "utf8mb4". See also What Are the Most Commonly Used Non-BMP Unicode Characters?
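A quick way to tell whether existing or incoming text actually contains characters outside the BMP is to look for code points above U+FFFF. The following is a minimal sketch in Python; the helper names `non_bmp_chars` and `needs_utf8mb4` are hypothetical and not part of any MySQL tool.

```python
from typing import List

def non_bmp_chars(text: str) -> List[str]:
    """Return the characters in `text` outside the BMP (code point > U+FFFF).

    Such characters need four bytes in UTF-8, so a utf8 / utf8mb3 column
    cannot store them, while a utf8mb4 column can.
    """
    return [ch for ch in text if ord(ch) > 0xFFFF]

def needs_utf8mb4(text: str) -> bool:
    """True if at least one character in `text` lies outside the BMP."""
    return bool(non_bmp_chars(text))

print(needs_utf8mb4("Caf\u00e9 \u20ac5"))       # False: every character is in the BMP
print(needs_utf8mb4("Nice work \U0001F44D"))    # True: U+1F44D (thumbs up) is outside the BMP
```

In Python 3 a `str` holds whole code points, so `ord` sees an emoji as a single value above 0xFFFF rather than as a surrogate pair.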

Taken from the MySQL 8.0 reference manual:

  • utf8mb4: A UTF-8 encoding of the Unicode character set using one to four bytes per character.

  • utf8mb3: A UTF-8 encoding of the Unicode character set using one to three bytes per character.
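One practical consequence of these definitions is the worst-case number of bytes a given number of characters can take. The sketch below does only that multiplication; it deliberately ignores MySQL's length-prefix bytes and row-format details, and the 255-character example length is an arbitrary choice.

```python
# Maximum bytes per character for each character set, per the definitions above.
MAX_BYTES_PER_CHAR = {"utf8mb3": 3, "utf8mb4": 4}

def worst_case_bytes(char_length: int, charset: str) -> int:
    """Upper bound on the bytes needed for `char_length` characters in `charset`."""
    return char_length * MAX_BYTES_PER_CHAR[charset]

for cs in ("utf8mb3", "utf8mb4"):
    print(cs, worst_case_bytes(255, cs))  # utf8mb3 765, utf8mb4 1020
```

For characters inside the BMP both sets store the same bytes, so the extra byte only matters for text that actually contains supplementary characters.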

In MySQL, utf8 is currently an alias for utf8mb3, which is deprecated and will be removed in a future MySQL release. At that point, utf8 will become a reference to utf8mb4.

Regardless of this alias, you can deliberately specify the utf8mb4 encoding yourself.

To complete the answer, I'd like to add the comment from @WilliamEntriken below (also taken from the manual):

To avoid ambiguity about the meaning of utf8, consider specifying utf8mb4 explicitly for character set references instead of utf8.
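If your application generates DDL, one way to follow this advice is to refuse the bare alias and always emit an explicit character set. The sketch below is a hypothetical Python helper; the function, table and column names are made up for illustration, and it performs no identifier escaping or validation beyond the alias check. The emitted `CHARACTER SET utf8mb4` clause is standard MySQL column syntax.

```python
def column_definition(name: str, length: int, charset: str = "utf8mb4") -> str:
    """Build a VARCHAR column definition that names its character set explicitly.

    Rejects the ambiguous alias "utf8" so the generated DDL never depends on
    which character set that alias currently points to.
    """
    if charset.lower() == "utf8":
        raise ValueError('use "utf8mb4" (or "utf8mb3") explicitly instead of "utf8"')
    return f"`{name}` VARCHAR({length}) CHARACTER SET {charset}"

print(f"CREATE TABLE messages ({column_definition('body', 255)});")
# CREATE TABLE messages (`body` VARCHAR(255) CHARACTER SET utf8mb4);
```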