
python 3 open() default encoding

Posted: 2022-11-19 10:45:22
Tags: Python, UTF, encoding, python, default, locale, open

python 3 open() default encoding

Answer 1

Python 3's default UTF-8 encoding applies only to conversions between bytes and str (for example, bytes.decode() and str.encode() called without an encoding). open() instead uses your environment to choose an appropriate encoding:

From the Python 3 docs for open():

encoding is the name of the encoding used to decode or encode the file. This should only be used in text mode. The default encoding is platform dependent (whatever locale.getpreferredencoding() returns), but any text encoding supported by Python can be used. See the codecs module for the list of supported encodings.

In your case, since you're on Windows with a Western European/North American locale, you get the 8-bit Windows-1252 character set (cp1252). Passing encoding="utf-8" overrides this.
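A minimal sketch of the difference, using only the standard library: str.encode() and bytes.decode() default to UTF-8 on every platform, while the text-mode default of open() comes from the locale. The cp1252 decode is done explicitly here so the result does not depend on the machine running it.

```python
import locale

text = "café"

# str <-> bytes conversions default to UTF-8 on every platform.
utf8_bytes = text.encode()            # same as text.encode("utf-8")
assert utf8_bytes == b"caf\xc3\xa9"
assert utf8_bytes.decode() == text

# open() in text mode asks the locale instead
# (e.g. 'cp1252' on a US/Western European Windows install).
print("open() default on this machine:", locale.getpreferredencoding(False))

# Simulate reading a UTF-8 file with the Windows-1252 default:
# each byte of the two-byte 'é' turns into its own cp1252 character.
mojibake = utf8_bytes.decode("cp1252")
assert mojibake == "caf\u00c3\u00a9"  # 'cafÃ©'
```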

 

Fortunately there are recent attempts to end this madness... someday. – Jeyekomon Apr 28, 2020 at 14:05    

Motivation (excerpted from PEP 597, "Add optional EncodingWarning")

Using the default encoding is a common mistake

Developers using macOS or Linux may forget that the default encoding is not always UTF-8.

For example, using long_description = open("README.md").read() in setup.py is a common mistake. Many Windows users cannot install such packages if there is at least one non-ASCII character (e.g. emoji, author names, copyright symbols, and the like) in their UTF-8-encoded README.md file.
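The usual fix is to state the encoding explicitly. A minimal sketch, assuming the README is stored as UTF-8; the helper name read_long_description is hypothetical, and the commented setup() call assumes a setuptools-based setup.py.

```python
from pathlib import Path


def read_long_description(readme: Path) -> str:
    """Read a UTF-8 README the same way on every platform (hypothetical helper)."""
    # encoding="utf-8" overrides the locale default (e.g. cp1252 on Windows),
    # so emoji, author names or a copyright sign no longer break the install.
    return readme.read_text(encoding="utf-8")


# Typical use in a setuptools-based setup.py (assumption, not from the source):
# setup(..., long_description=read_long_description(Path(__file__).parent / "README.md"))
```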

Of the 4000 most downloaded packages from PyPI, 489 use non-ASCII characters in their README, and 82 fail to install from source on non-UTF-8 locales due to not specifying an encoding for a non-ASCII file. [1]

Another example is logging.basicConfig(filename="log.txt"). Some users might expect it to use UTF-8 by default, but the locale encoding is actually what is used. [2]
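Since Python 3.9, logging.basicConfig() accepts an encoding argument that it passes on to the FileHandler it creates. A minimal sketch; the function name and the log file path are illustrative.

```python
import logging


def configure_utf8_logging(logfile: str) -> None:
    """Log to `logfile` as UTF-8 instead of the locale encoding (Python 3.9+)."""
    logging.basicConfig(
        filename=logfile,
        encoding="utf-8",    # without this, the FileHandler uses the locale default
        level=logging.INFO,
        force=True,          # replace handlers left by an earlier basicConfig() call
    )


# Example: configure_utf8_logging("log.txt"); logging.info("Zoë uploaded café.png")
```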

Even Python experts may assume that the default encoding is UTF-8. This creates bugs that only happen on Windows; see [3], [4], [5], and [6] for example.

Emitting a warning when the encoding argument is omitted will help find such mistakes.
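Python 3.10 implemented this as the opt-in EncodingWarning, enabled with the -X warn_default_encoding option (or the PYTHONWARNDEFAULTENCODING environment variable). A minimal sketch that runs two one-line snippets in a child interpreter and shows which one triggers the warning; the helper name warning_output is hypothetical.

```python
import subprocess
import sys

# One snippet relies on the default encoding, the other states it explicitly.
IMPLICIT = "import os; open(os.devnull).close()"
EXPLICIT = "import os; open(os.devnull, encoding='utf-8').close()"


def warning_output(snippet: str) -> str:
    """Run `snippet` in a child interpreter with EncodingWarning enabled (3.10+)."""
    result = subprocess.run(
        # -S skips the site module so only the snippet's own warnings show up.
        [sys.executable, "-S", "-X", "warn_default_encoding", "-c", snippet],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
    )
    return result.stderr


if __name__ == "__main__" and sys.version_info >= (3, 10):
    print("implicit:", warning_output(IMPLICIT).strip())
    print("explicit:", warning_output(EXPLICIT).strip() or "(no warning)")
```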

 

Explicit way to use locale-specific encoding

open(filename) isn’t explicit about which encoding is expected:

  • If ASCII is assumed, this isn’t a bug, but may result in decreased performance on Windows, particularly with non-Latin-1 locale encodings
  • If UTF-8 is assumed, this may be a bug or a platform-specific script
  • If the locale encoding is assumed, the behavior is as expected (but could change if future versions of Python modify the default)

From this point of view, open(filename) is not readable code.

encoding=locale.getpreferredencoding(False) can be used to specify the locale encoding explicitly, but it is too long and easy to misuse (e.g. one can forget to pass False as its argument).

This PEP provides an explicit way to specify the locale encoding.
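In Python 3.10 and later, that explicit spelling is encoding="locale". A minimal sketch of a helper that uses it where available and falls back to the long form described above on older versions; the helper name open_in_locale_encoding is hypothetical.

```python
import locale
import sys


def open_in_locale_encoding(path, mode="r"):
    """Open `path` in text mode, stating explicitly that the locale encoding is wanted."""
    if sys.version_info >= (3, 10):
        # Short, explicit, and does not trigger EncodingWarning.
        return open(path, mode, encoding="locale")
    # Older versions: the long form; note the easy-to-forget False argument.
    return open(path, mode, encoding=locale.getpreferredencoding(False))
```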

Prepare to change the default encoding to UTF-8

Since UTF-8 has become the de-facto standard text encoding, we might default to it for opening files in the future.

However, such a change will affect many applications and libraries. If we start emitting DeprecationWarning everywhere the encoding argument is omitted, it will be too noisy and painful.

Although this PEP doesn’t propose changing the default encoding, it will help enable that change by:

  • Reducing the number of omitted encoding arguments in libraries before we start emitting a DeprecationWarning by default.
  • Allowing users to pass encoding="locale" to suppress the current warning and any DeprecationWarning added in the future, as well as retaining consistent behavior if later Python versions change the default, ensuring support for any Python version >=3.10.

 

Which encoding should Python open function use?

Answer 1

As clearly stated in Python's open documentation:

In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding.

Windows defaults to a localized encoding (cp1252 on US and Western European versions). Linux typically defaults to utf-8.

Because it is platform-dependent, use the encoding parameter and specify the encoding of the file explicitly.
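A minimal sketch of what specifying the encoding buys you: the file is written and read with the same explicit encoding, so the bytes on disk are UTF-8 whatever the locale reports. The temporary directory and file name exist only for the demonstration; newline="\n" keeps Windows from writing \r\n so the byte comparison holds everywhere.

```python
import tempfile
from pathlib import Path

TEXT = "Zoë, café, 東京\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "notes.txt"

    # Write and read with the same explicit encoding on every platform.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(TEXT)
    with open(path, encoding="utf-8") as f:
        round_trip = f.read()

    on_disk = path.read_bytes()

assert round_trip == TEXT
# The stored bytes are UTF-8, independent of the locale encoding.
assert on_disk == TEXT.encode("utf-8")
```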

 

https://docs.python.org/3/library/functions.html#open

encoding is the name of the encoding used to decode or encode the file. This should only be used in text mode. The default encoding is platform dependent (whatever locale.getencoding() returns), but any text encoding supported by Python can be used. See the codecs module for the list of supported encodings.
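Newer documentation refers to locale.getencoding() (added in Python 3.11) rather than locale.getpreferredencoding(). A minimal sketch for inspecting both on the current machine; the function name is hypothetical. The two values can differ when the Python UTF-8 Mode is enabled, because getencoding() ignores that mode.

```python
import locale
import sys


def describe_default_encodings() -> dict:
    """Collect the values behind open()'s text-mode default (hypothetical helper)."""
    info = {
        "utf8_mode": bool(sys.flags.utf8_mode),
        "getpreferredencoding(False)": locale.getpreferredencoding(False),
    }
    if hasattr(locale, "getencoding"):  # Python 3.11+
        # Unlike getpreferredencoding(), this ignores the UTF-8 Mode.
        info["getencoding()"] = locale.getencoding()
    return info


for name, value in describe_default_encodings().items():
    print(f"{name}: {value}")
```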

 

locale.getpreferredencoding(do_setlocale=True)

Return the locale encoding used for text data, according to user preferences. User preferences are expressed differently on different systems, and might not be available programmatically on some systems, so this function only returns a guess.

On some systems, it is necessary to invoke setlocale() to obtain the user preferences, so this function is not thread-safe. If invoking setlocale is not necessary or desired, do_setlocale should be set to False.

On Android or if the Python UTF-8 Mode is enabled, always return 'UTF-8'; the locale encoding and the do_setlocale argument are ignored.

The Python preinitialization configures the LC_CTYPE locale. See also the filesystem encoding and error handler.

Changed in version 3.7: The function now always returns UTF-8 on Android or if the Python UTF-8 Mode is enabled.
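The UTF-8 Mode can be switched per process with the -X utf8 option (or the PYTHONUTF8 environment variable). A minimal sketch that asks a child interpreter for getpreferredencoding(False) with the mode off and on; the helper name is hypothetical, and since the spelling of the returned name ('UTF-8' or 'utf-8') may vary between versions, compare it case-insensitively.

```python
import subprocess
import sys

SNIPPET = "import locale; print(locale.getpreferredencoding(False))"


def child_preferred_encoding(utf8_mode: bool) -> str:
    """Return getpreferredencoding(False) as seen by a child interpreter."""
    # -X utf8=1 enables the UTF-8 Mode; -X utf8=0 explicitly disables it.
    flag = "utf8=1" if utf8_mode else "utf8=0"
    result = subprocess.run(
        [sys.executable, "-X", flag, "-c", SNIPPET],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print("UTF-8 Mode off:", child_preferred_encoding(False))
    print("UTF-8 Mode on: ", child_preferred_encoding(True))
```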


From: https://www.cnblogs.com/chucklu/p/16905610.html
