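Reading and Writing Chinese Text with CStdioFile under the UNICODE Character Set

Tags: CStdioFile, UNICODE, stream, character set, Unicode, bytes

Problem

When a file containing Chinese text is read and written with CFile::typeBinary, no garbled characters appear.
With CFile::typeText there are two cases. When the project uses the multi-byte character set, CStdioFile::ReadString reads a file containing Chinese text correctly. Once the project is switched to the UNICODE character set, the Chinese text comes out garbled.

Investigation

The MSDN documentation provides the following information: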

Unicode™ Stream I/O in Text and Binary Modes

When a Unicode stream I/O routine (such as fwprintf, fwscanf, fgetwc, fputwc, fgetws, or fputws) operates on a file that is open in text mode (the default), two kinds of character conversions take place:

Unicode-to-MBCS or MBCS-to-Unicode conversion. When a Unicode stream-I/O function operates in text mode, the source or destination stream is assumed to be a sequence of multibyte characters. Therefore, the Unicode stream-input functions convert multibyte characters to wide characters (as if by a call to the mbtowc function). For the same reason, the Unicode stream-output functions convert wide characters to multibyte characters (as if by a call to the wctomb function).

Carriage return – linefeed (CR-LF) translation. This translation occurs before the MBCS – Unicode conversion (for Unicode stream input functions) and after the Unicode – MBCS conversion (for Unicode stream output functions). During input, each carriage return – linefeed combination is translated into a single linefeed character. During output, each linefeed character is translated into a carriage return – linefeed combination.

However, when a Unicode stream-I/O function operates in binary mode, the file is assumed to be Unicode, and no CR-LF translation or character conversion occurs during input or output.

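The key points extracted from the above:

In text mode, the UNICODE versions of the stream I/O functions (fgetws, fputws, and so on) assume that the file holds a multibyte sequence. They convert internally: on reading, the multibyte sequence is converted to wide characters; on writing, wide characters are converted to a multibyte sequence.
In binary mode, the functions assume that the file holds UNICODE data, that is, each character is stored in the file as two bytes (or, in rare cases, four). The file carries no BOM unless one is written explicitly.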
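As a rough, portable illustration of the binary-mode case, the sketch below uses only standard C++ (not MFC or the Microsoft CRT). It writes UTF-16LE code units byte for byte, with an explicitly written BOM, and reads the raw bytes back unchanged. The helper names writeUtf16LeWithBom and readRawBytes are hypothetical and exist only for this example.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: writes `text` as UTF-16LE in binary mode, preceded by
// an explicitly written BOM (FF FE). No character conversion and no CR-LF
// translation takes place.
bool writeUtf16LeWithBom(const std::string& path, const std::u16string& text)
{
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;

    const unsigned char bom[2] = { 0xFF, 0xFE };
    out.write(reinterpret_cast<const char*>(bom), 2);

    for (char16_t unit : text)
    {
        // Low byte first, then high byte: little-endian order.
        const unsigned char bytes[2] = {
            static_cast<unsigned char>(unit & 0xFF),
            static_cast<unsigned char>((unit >> 8) & 0xFF)
        };
        out.write(reinterpret_cast<const char*>(bytes), 2);
    }
    out.flush();
    return static_cast<bool>(out);
}

// Hypothetical helper: returns the file content exactly as stored on disk.
std::string readRawBytes(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}
```

Writing the two characters u"\u4E2D\u6587" this way should give a six-byte file: two bytes of BOM followed by two bytes per character. Leaving out the BOM write gives a four-byte file with no marker at all, which matches the remark that the BOM only exists when it is written explicitly.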

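Information gathered online:

Why the locale has to be set with setlocale:

The C/C++ standards define the default runtime locale as "C", a minimal environment in which only the basic ASCII characters are guaranteed. When mbstowcs runs under this locale, it treats the string in cstr as single-byte characters rather than as a chs-encoded string, so it splits each Chinese character into two bytes and converts them separately. A string of two Chinese characters therefore comes out as four wchar_t values.

To make mbstowcs work correctly, it must be told before the conversion that cstr holds a chs-encoded string. This is done by calling setlocale( LC_ALL, "chs" ). Note that this call changes the locale of the whole application, so the setting has to be restored afterwards by calling setlocale( LC_ALL, "C" ) again. With the locale set, mbstowcs treats cstr as Chinese text and produces two wchar_t characters instead of four.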
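The save, switch, convert, restore sequence described above can be sketched in standard C++ as follows. The class ScopedCtypeLocale and the function toWide are hypothetical names introduced for this illustration. The locale name is passed in by the caller, because names such as "chs" are specific to the Windows CRT and may not exist on other platforms.

```cpp
#include <clocale>
#include <cstdlib>
#include <string>

// Hypothetical RAII guard: switches LC_CTYPE on construction and restores
// the previous setting on destruction.
class ScopedCtypeLocale
{
public:
    explicit ScopedCtypeLocale(const char* name)
    {
        // setlocale(category, nullptr) only queries the current locale. The
        // result is copied because a later setlocale call may overwrite it.
        const char* current = std::setlocale(LC_CTYPE, nullptr);
        if (current != nullptr) m_previous = current;
        m_ok = std::setlocale(LC_CTYPE, name) != nullptr;
    }
    ~ScopedCtypeLocale()
    {
        if (!m_previous.empty()) std::setlocale(LC_CTYPE, m_previous.c_str());
    }
    bool ok() const { return m_ok; }

    ScopedCtypeLocale(const ScopedCtypeLocale&) = delete;
    ScopedCtypeLocale& operator=(const ScopedCtypeLocale&) = delete;

private:
    std::string m_previous;
    bool m_ok = false;
};

// Hypothetical helper: converts a multibyte string to wide characters under
// the given locale. Returns false if the locale could not be set or the
// bytes are not valid in that locale.
bool toWide(const std::string& bytes, const char* localeName, std::wstring& out)
{
    ScopedCtypeLocale guard(localeName);
    if (!guard.ok()) return false;

    // One wchar_t per input byte is an upper bound on the result length.
    std::wstring buffer(bytes.size() + 1, L'\0');
    const std::size_t n = std::mbstowcs(&buffer[0], bytes.c_str(), buffer.size());
    if (n == static_cast<std::size_t>(-1)) return false;

    buffer.resize(n);
    out = buffer;
    return true;
}
```

On Windows, a call such as toWide(bytes, "chs", wide) would correspond to the approach described above. Restoring the saved locale rather than hard-coding "C" avoids overwriting a locale that some other part of the application had set earlier.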

Code example:

    // Set the locale (setlocale and LC_CTYPE come from <locale.h>)
    char* old_locale = _strdup( setlocale(LC_CTYPE, NULL) );
    setlocale( LC_CTYPE, "chs" );

    // Write a Chinese string
    CStdioFile mFile;
    if( mFile.Open( _T("test_file.txt"), CFile::modeCreate |
        CFile::modeReadWrite | CFile::typeText))
    {
        try
        {
            mFile.WriteString( _T("在多字节字符集下，使用CStdioFile::ReadString")
                _T("读取包含中文的文件，正常。但是将工程编码切换至")
                _T("UNICODE字符集，则出现了中文乱码的情况。") );
        }
        catch (CException* e)
        {
            e->ReportError();
            e->Delete();    // MFC exceptions caught by pointer must be deleted
        }
        mFile.Close();
    }

    // Read the string back
    CStdioFile mFileRead;
    if ( mFileRead.Open( _T("test_file.txt"), CFile::modeRead | CFile::typeText ) )
    {
        CString strTemp;
        mFileRead.ReadString( strTemp );
        m_ctlDisplay.SetWindowText( strTemp );
        mFileRead.Close();
    }

    // Restore the original locale
    setlocale( LC_CTYPE, old_locale );
    free( old_locale );

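Going Further

Using CStdioFile to read files encoded as UTF-8, UNICODE (UTF-16), or UTF-16LE:

This requires the other constructor, CStdioFile( FILE* pOpenStream ), which takes a pointer to a FILE object. The FILE object is obtained from _tfopen_s, which supports opening files encoded as UTF-8, UNICODE (UTF-16), and UTF-16LE.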

Modeled on an example found online, I wrote the following test (testing confirms that this usage works):

    // Write
    FILE *fStream = NULL;
    errno_t err = _tfopen_s(&fStream,
        _T("text_file_utf8.txt"), _T("wt,ccs=UTF-8")); // ccs=UTF-16LE and ccs=UNICODE are also accepted
    if (err != 0) return -1; // open failed

    CStdioFile mFile( fStream );
    try
    {
        mFile.WriteString(_T("aaaa中文文本测试11"));
    }
    catch (CException* e)
    {
        e->ReportError();
        e->Delete();    // MFC exceptions caught by pointer must be deleted
    }
    mFile.Close();


    // Read
    fStream = NULL;
    err = _tfopen_s(&fStream,
        _T("text_file_utf8.txt"), _T("rt,ccs=UTF-8")); // ccs=UTF-16LE and ccs=UNICODE are also accepted
    if (err != 0) return -1; // open failed

    CStdioFile mFileRead( fStream );
    try
    {
        CString strTemp;
        mFileRead.ReadString( strTemp );
        m_ctlDisplay.SetWindowText( strTemp );
    }
    catch (CException* e)
    {
        e->ReportError();
        e->Delete();
    }
    mFileRead.Close();
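When the encoding of an input file is not known in advance, one possible approach is to look at its first bytes and choose the mode string from the BOM. The sketch below is only an illustration under that assumption: readModeForBom is a hypothetical helper, and treating a file without a BOM as plain multibyte text is a guess, not something a BOM check can establish. The returned strings are wide, as a UNICODE build of _tfopen_s would expect.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: inspects the leading bytes of a file and returns a
// read-mode string that could be passed to _tfopen_s in a UNICODE build.
const wchar_t* readModeForBom(const unsigned char* data, std::size_t size)
{
    // UTF-8 BOM: EF BB BF
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return L"rt,ccs=UTF-8";

    // UTF-16LE BOM: FF FE
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return L"rt,ccs=UTF-16LE";

    // No BOM found. Assumption: fall back to ordinary text mode, where the
    // content is treated as a multibyte sequence.
    return L"rt";
}
```

A caller would read the first three bytes of the file in binary mode, pass them to readModeForBom, and then reopen the file with the returned mode string before handing the FILE pointer to CStdioFile.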

From: https://blog.51cto.com/u_15905375/5919847