
What is the difference between utf8mb4 and utf8 charsets in MySQL?

Posted: 2023-03-10 20:44:32
Tags: What set UTF utf8mb4 utf8 character MySQL


Answer 1

UTF-8 is a variable-length encoding. In the case of UTF-8, this means that storing one code point requires one to four bytes. However, MySQL's encoding called "utf8" (alias of "utf8mb3") only stores a maximum of three bytes per code point.

So the character set "utf8"/"utf8mb3" cannot store all Unicode code points: it only supports the range 0x0000 to 0xFFFF, which is called the "Basic Multilingual Plane". See also Comparison of Unicode encodings.
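To see the one-to-four-byte rule in practice, the Python sketch below encodes a few characters and reports their UTF-8 length and whether they lie in the BMP, i.e. whether a 3-byte utf8/utf8mb3 column could hold them. The helper names and sample characters are illustrative choices, not anything from MySQL.

```python
def utf8_byte_length(ch: str) -> int:
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def in_bmp(ch: str) -> bool:
    """True if the code point lies in the Basic Multilingual Plane (U+0000..U+FFFF)."""
    return ord(ch) <= 0xFFFF


# Sample characters: ASCII letter, accented Latin letter, CJK ideograph, emoji.
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]

for ch in samples:
    print(f"U+{ord(ch):04X}: {utf8_byte_length(ch)} byte(s), in BMP: {in_bmp(ch)}")
```

The first three samples need 1, 2 and 3 bytes; only the emoji (U+1F600) needs 4, which is exactly the case utf8mb3 cannot store.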

This is what (a previous version of the same page at) the MySQL documentation has to say about it:

The character set named utf8[/utf8mb3] uses a maximum of three bytes per character and contains only BMP characters. As of MySQL 5.5.3, the utf8mb4 character set uses a maximum of four bytes per character and supports supplementary characters:

  • For a BMP character, utf8[/utf8mb3] and utf8mb4 have identical storage characteristics: same code values, same encoding, same length.

  • For a supplementary character, utf8[/utf8mb3] cannot store the character at all, while utf8mb4 requires four bytes to store it. Since utf8[/utf8mb3] cannot store the character at all, you do not have any supplementary characters in utf8[/utf8mb3] columns and you need not worry about converting characters or losing data when upgrading utf8[/utf8mb3] data from older versions of MySQL.
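The two bullet points can be modeled in a few lines of Python. This is a hypothetical sketch, not MySQL code: `encode_utf8mb3` simply refuses supplementary characters, while for BMP-only text it produces exactly the same bytes as full UTF-8. What a real server does with a rejected character (error, or warning plus data loss) depends on its SQL mode.

```python
def encode_utf8mb4(text: str) -> bytes:
    """Full UTF-8: any Unicode scalar value, 1 to 4 bytes each."""
    return text.encode("utf-8")


def encode_utf8mb3(text: str) -> bytes:
    """Hypothetical model of a utf8mb3 column: BMP characters only (at most 3 bytes each)."""
    for index, ch in enumerate(text):
        if ord(ch) > 0xFFFF:
            raise ValueError(f"U+{ord(ch):X} at index {index} is a supplementary character")
    # For BMP-only text the bytes are identical to utf8mb4.
    return text.encode("utf-8")


bmp_text = "caf\u00e9 \u4e2d\u6587"
assert encode_utf8mb3(bmp_text) == encode_utf8mb4(bmp_text)  # same code values, same length

try:
    encode_utf8mb3("\U0001F600")
except ValueError as err:
    print("utf8mb3 rejects it:", err)

print(encode_utf8mb4("\U0001F600"))  # 4 bytes: F0 9F 98 80
```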

So if you want your column to support storing characters lying outside the BMP (and you usually want to), such as emoji, use "utf8mb4". See also "What are the most common non-BMP Unicode characters in actual use?"
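Before choosing or converting a column, it can help to check whether the data actually contains non-BMP characters such as emoji. A minimal Python sketch, with hypothetical function names and sample rows, that reports the offending characters:

```python
from typing import Iterable, List, Tuple


def non_bmp_characters(text: str) -> List[Tuple[int, str]]:
    """Return (index, 'U+XXXXX') for every character outside the BMP."""
    return [(i, f"U+{ord(ch):X}") for i, ch in enumerate(text) if ord(ch) > 0xFFFF]


def needs_utf8mb4(values: Iterable[str]) -> bool:
    """True if any value contains a character that utf8/utf8mb3 cannot store."""
    return any(non_bmp_characters(v) for v in values)


rows = ["plain ASCII", "\u4e2d\u6587 (CJK, still BMP)", "thumbs up \U0001F44D"]
for row in rows:
    print(repr(row), non_bmp_characters(row))
print("needs utf8mb4:", needs_utf8mb4(rows))
```

Only the last row is reported (the emoji at index 10), so this sample data would require utf8mb4.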

 

Answer 2

  • utf8 is MySQL's older, flawed implementation of UTF-8 which is in the process of being deprecated.
  • utf8mb4 is what they named their fixed UTF-8 implementation, and is what you should use right now.

In their flawed version, only characters in the first 64k character plane (the Basic Multilingual Plane) work, with all other characters considered invalid. The code point values within that plane, 0 to 65535 (some of which are reserved for special purposes), can be represented in UTF-8 by multi-byte sequences of up to 3 bytes, and MySQL's early version of UTF-8 arbitrarily set that as its limit. This limitation was never a correct interpretation of the UTF-8 rules, because UTF-8 was never defined as allowing only up to 3 bytes per character. In fact, the earliest definitions of UTF-8 allowed up to 6 bytes (since revised to 4). MySQL's original version was arbitrarily crippled from the start.
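The byte limits follow directly from how UTF-8 packs code-point bits. The hand-written encoder below is a sketch of the current RFC 3629 rules (at most 4 bytes, code points up to U+10FFFF), not MySQL's implementation; it shows that every BMP code point fits in 3 bytes and everything above U+FFFF needs 4.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point using the modern (RFC 3629) UTF-8 rules."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:        # up to 7 bits  -> 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # up to 11 bits -> 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # up to 16 bits (the whole BMP) -> 3 bytes
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # up to 21 bits (planes 1-16) -> 4 bytes: the case utf8mb3 refuses
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# Cross-check against Python's built-in encoder at the boundaries of each length.
for cp in [0x41, 0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x1F600, 0x10FFFF]:
    assert utf8_encode(cp) == chr(cp).encode("utf-8"), hex(cp)
    print(f"U+{cp:04X} -> {len(utf8_encode(cp))} byte(s)")
```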

Back when MySQL released this, the consequences of this limitation weren't too bad as most Unicode characters were in that first plane. Since then, more and more newly defined character ranges have been added to Unicode with values outside that first plane. Unicode itself defines 17 planes, though so far only 7 of these are used.
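The plane of a code point is simply its value divided by 0x10000 (65,536). A small Python sketch, with hypothetical helper names, of how the 17 planes follow from the U+10FFFF upper bound:

```python
MAX_CODE_POINT = 0x10FFFF


def plane(cp: int) -> int:
    """Unicode plane number: the code point's bits above the low 16."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError(f"outside the Unicode code space: {cp:#x}")
    return cp >> 16


NUM_PLANES = plane(MAX_CODE_POINT) + 1  # planes 0..16

print(NUM_PLANES)
print(plane(ord("A")), plane(ord("\u4e2d")), plane(ord("\U0001F600")))
```

This prints 17, then 0 0 1: the Latin letter and the CJK ideograph sit in plane 0 (the BMP), while the emoji sits in plane 1, outside what utf8mb3 can store.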

In an effort not to break old code making any particular assumptions, MySQL retained the broken implementation and called the newer, fixed version utf8mb4. This has led to some confusion with the name being misinterpreted as if it's some kind of extension to UTF-8 or alternative form of UTF-8, rather than MySQL's implementation of the true UTF-8.

Future versions of MySQL will eventually phase out the older version, and for now it can be considered deprecated. For the foreseeable future you need to use utf8mb4 to ensure correct UTF-8 encoding. After sufficient time has passed, the current utf8 will be removed, and at some future date utf8 will rise again, this time referring to the fixed version, though utf8mb4 will continue to unambiguously refer to the fixed version.

 

Answer 3

Taken from the MySQL 8.0 Reference Manual:

  • utf8mb4: A UTF-8 encoding of the Unicode character set using one to four bytes per character.

  • utf8mb3: A UTF-8 encoding of the Unicode character set using one to three bytes per character.
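Because the two character sets differ only in the per-character maximum, sizing is simple arithmetic. The Python sketch below compares worst-case and actual byte counts; the dictionary and function names are illustrative, and real row storage also includes length prefixes and other overhead that is not modeled here.

```python
MAX_BYTES_PER_CHAR = {"utf8mb3": 3, "utf8mb4": 4}


def worst_case_bytes(n_chars: int, charset: str) -> int:
    """Upper bound on character data for n_chars characters (length prefix not included)."""
    return n_chars * MAX_BYTES_PER_CHAR[charset]


def actual_bytes(text: str) -> int:
    """Bytes the text really takes in utf8mb4, i.e. plain UTF-8."""
    return len(text.encode("utf-8"))


print(worst_case_bytes(255, "utf8mb3"))  # 765
print(worst_case_bytes(255, "utf8mb4"))  # 1020
print(actual_bytes("abc\U0001F600"))     # 3 + 4 = 7
```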

In MySQL, utf8 is currently an alias for utf8mb3, which is deprecated and will be removed in a future MySQL release. At that point, utf8 will become a reference to utf8mb4.

So, regardless of this alias, you can explicitly choose utf8mb4 yourself.

To complete the answer, I'd like to add @WilliamEntriken's comment below (also taken from the manual):

To avoid ambiguity about the meaning of utf8, consider specifying utf8mb4 explicitly for character set references instead of utf8.
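Following that advice in an existing code base means finding the bare utf8 references. Below is a rough Python sketch of such a check; the regex and function are hypothetical, and a plain regex scan will also match utf8 inside comments or string literals. Explicit utf8mb3 is left alone because it is unambiguous (though deprecated).

```python
import re
from typing import List

# Matches "utf8" unless it is immediately followed by "mb3" or "mb4",
# so it flags both "utf8" and collation names such as "utf8_general_ci".
AMBIGUOUS_UTF8 = re.compile(r"\butf8(?!mb[34])\w*", re.IGNORECASE)


def find_ambiguous_utf8(sql: str) -> List[str]:
    """Return every ambiguous utf8 charset/collation reference in the SQL text."""
    return AMBIGUOUS_UTF8.findall(sql)


ddl = "CREATE TABLE t (name VARCHAR(50)) CHARACTER SET utf8 COLLATE utf8_general_ci"
print(find_ambiguous_utf8(ddl))                             # ['utf8', 'utf8_general_ci']
print(find_ambiguous_utf8(ddl.replace("utf8", "utf8mb4")))  # []
```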

 

What's Navicat?

Navicat is a series of graphical database management and development tools developed by PremiumSoft CyberTech Ltd. It provides a user-friendly interface for managing and developing databases across various database management systems (DBMS) such as MySQL, Oracle, SQL Server, PostgreSQL, SQLite and MariaDB.

Navicat allows users to connect to and manage multiple databases from a single application, making it a popular choice for developers and database administrators who work with multiple databases on different platforms. Some of the key features of Navicat include data modeling, data synchronization, backup and restore, SQL editing, and database administration.

Navicat is available in different editions, including Standard, Professional, and Enterprise, with different levels of functionality and pricing. It is widely used by developers and organizations worldwide, ranging from small businesses to large enterprises.

 

Can I connect to MySQL using SSMS?

SQL Server Management Studio (SSMS) is designed for managing SQL Server and cannot connect to a MySQL server directly: its "Connect to Server" dialog only offers SQL Server server types (Database Engine, Analysis Services, Reporting Services, Integration Services), with no MySQL option.

What you can do is have a SQL Server instance reach MySQL through a linked server and then query it from SSMS. This requires a running MySQL server plus the MySQL ODBC driver (MySQL Connector/ODBC) on the SQL Server machine. MySQL Connector/NET, by contrast, is a library for .NET applications and is not something SSMS uses.

Here are the steps to set up such a linked server:

  1. Install MySQL Connector/ODBC on the SQL Server host and create a system DSN that points to your MySQL server (host, port, user name, password, and default database).
  2. In SSMS, connect to the SQL Server instance, expand "Server Objects" in Object Explorer, right-click "Linked Servers" and select "New Linked Server...".
  3. Enter a name for the linked server, choose "Microsoft OLE DB Provider for ODBC Drivers" as the provider, and enter the DSN name as the data source.
  4. On the "Security" page, supply the MySQL user name and password (for example with "Be made using this security context").
  5. Click "OK", then right-click the new linked server and select "Test Connection".

Once the linked server exists, you can run queries against MySQL from SSMS, typically with OPENQUERY, for example SELECT * FROM OPENQUERY(MYSQL_LINK, 'SELECT ...'), where MYSQL_LINK stands for whatever name you gave the linked server. SSMS itself still only manages SQL Server, though: its designers and administration features do not work on MySQL objects, so for MySQL administration a MySQL client such as MySQL Workbench or Navicat is the better fit.

 

What is collation (COLLATE) in MySQL? How many values can it take? What is the default?

In MySQL, a collation is a set of rules that determines how character data is compared and sorted. Each collation belongs to exactly one character set, and a collation can be specified for the server, a database, a table, a column, or an individual expression in a query.
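To make "compared and sorted" concrete, the Python sketch below imitates three kinds of collation with sort keys. It is only an approximation: real MySQL collations such as utf8mb4_0900_ai_ci use Unicode Collation Algorithm weights rather than `casefold()` and accent stripping, so results will differ for many characters.

```python
import unicodedata


def binary_key(s: str) -> bytes:
    """Like a _bin / binary collation: compare raw encoded bytes."""
    return s.encode("utf-8")


def ci_key(s: str) -> str:
    """Rough stand-in for a case-insensitive (_ci) collation."""
    return s.casefold()


def ai_ci_key(s: str) -> str:
    """Rough stand-in for an accent- and case-insensitive (_ai_ci) collation."""
    decomposed = unicodedata.normalize("NFD", s)  # e.g. é -> e + combining acute accent
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return no_marks.casefold()


words = ["b", "B", "a"]
print(sorted(words, key=binary_key))  # ['B', 'a', 'b']  (uppercase bytes sort first)
print(sorted(words, key=ci_key))      # ['a', 'b', 'B']  (b and B tie; the sort is stable)

print(ci_key("R\u00e9sum\u00e9") == ci_key("resume"))        # False
print(ai_ci_key("R\u00e9sum\u00e9") == ai_ci_key("resume"))  # True
```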

MySQL supports many different collations, and each collation has a unique name that identifies the character set and sorting rules used. Some common collations in MySQL include "utf8_general_ci", "utf8mb4_general_ci", "latin1_general_ci", and "binary".
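Because the name encodes the character set and the sorting rules, much can be read from the name alone. The Python sketch below decodes the common suffixes (_ai, _as, _ci, _cs, _bin) following MySQL's documented naming convention, where a name without _ai/_as takes its accent sensitivity from _ci (implies _ai) or _cs (implies _as). The dataclass and function are hypothetical; they only parse the name and do not query a server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollationInfo:
    charset: str
    binary: bool
    case_sensitive: bool
    accent_sensitive: bool


def parse_collation(name: str) -> CollationInfo:
    """Decode a collation name such as 'utf8mb4_0900_ai_ci' from its suffixes."""
    parts = name.lower().split("_")
    charset, suffixes = parts[0], set(parts[1:])
    # 'binary' and any *_bin collation compare bytes/code points,
    # so they are both case- and accent-sensitive.
    if charset == "binary" or "bin" in suffixes:
        return CollationInfo(charset, True, True, True)
    case_sensitive = "cs" in suffixes
    if "ai" in suffixes:
        accent_sensitive = False
    elif "as" in suffixes:
        accent_sensitive = True
    else:
        # No explicit _ai/_as: _ci implies _ai and _cs implies _as.
        accent_sensitive = case_sensitive
    return CollationInfo(charset, False, case_sensitive, accent_sensitive)


for name in ["utf8mb4_0900_ai_ci", "utf8mb4_general_ci", "utf8mb4_0900_as_cs",
             "latin1_general_ci", "utf8mb4_bin", "binary"]:
    print(name, parse_collation(name))
```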

There is no fixed number of values: COLLATE accepts any collation the server supports, and that list depends on the MySQL version. You can list the available collations with SHOW COLLATION, or narrow the list to one character set with SHOW COLLATION WHERE Charset = 'utf8mb4' (the same data is available in INFORMATION_SCHEMA.COLLATIONS).

The default collation in MySQL depends on the version and server configuration. Up to MySQL 5.7, the default server character set is latin1 with the collation latin1_swedish_ci; since MySQL 8.0, the default is utf8mb4 with utf8mb4_0900_ai_ci. The server-wide default can be changed through the character_set_server and collation_server settings, and it can be overridden per database, table, or column, or in individual expressions, using the "COLLATE" keyword in SQL statements.
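The override order can be sketched as "the most specific explicit COLLATE wins". The Python model below is a simplification under stated assumptions: the function names are hypothetical, it uses the compiled-in server defaults described above, and it ignores that MySQL resolves defaults when each object is created (a table copies the database default at CREATE TABLE time) and that specifying only a CHARACTER SET selects that character set's own default collation.

```python
from typing import Optional


def server_default_collation(mysql_major: int) -> str:
    """Compiled-in default collation_server: latin1 before 8.0, utf8mb4 from 8.0 on."""
    return "utf8mb4_0900_ai_ci" if mysql_major >= 8 else "latin1_swedish_ci"


def effective_collation(mysql_major: int,
                        database: Optional[str] = None,
                        table: Optional[str] = None,
                        column: Optional[str] = None) -> str:
    """Most specific explicit COLLATE wins: column, then table, then database, then server."""
    for explicit in (column, table, database):
        if explicit is not None:
            return explicit
    return server_default_collation(mysql_major)


print(effective_collation(5))                                 # latin1_swedish_ci
print(effective_collation(8))                                 # utf8mb4_0900_ai_ci
print(effective_collation(8, database="utf8mb4_general_ci"))  # utf8mb4_general_ci
print(effective_collation(8, database="utf8mb4_general_ci", column="utf8mb4_bin"))  # utf8mb4_bin
```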

 

 

From: https://www.cnblogs.com/chucklu/p/17204614.html
