A Tour of Cool Python Libraries: the Third-Party Library Pandas (136)

# 611. pandas.DataFrame.to_orc method
pandas.DataFrame.to_orc(path=None, *, engine='pyarrow', index=None, engine_kwargs=None)
Write a DataFrame to the ORC format.

New in version 1.5.0.

Parameters:
path
str, file-like object or None, default None
If a string, it will be used as Root Directory path when writing a partitioned dataset. By file-like object, we refer to objects with a write() method, such as a file handle (e.g. via builtin open function). If path is None, a bytes object is returned.

engine
{‘pyarrow’}, default ‘pyarrow’
ORC library to use.

index
bool, optional
If True, include the dataframe’s index(es) in the file output. If False, they will not be written to the file. If None, similar to infer the dataframe’s index(es) will be saved. However, instead of being saved as values, the RangeIndex will be stored as a range in the metadata so it doesn’t require much space and is faster. Other indexes will be included as columns in the file output.

engine_kwargs
dict[str, Any] or None, default None
Additional keyword arguments passed to pyarrow.orc.write_table().

Returns:
bytes if no path argument is provided else None
Raises:
NotImplementedError
Dtype of one or more columns is category, unsigned integers, interval, period or sparse.

ValueError
engine is not pyarrow.

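611-2. Parameters

611-2-1. path (optional, default None): a str, a file-like object, or None. It gives the location of the ORC file to write; a file-like object is any object with a write() method, such as a file handle. If path is None, nothing is written to disk and the ORC data is returned as a bytes object.

611-2-2. engine (optional, default 'pyarrow'): a string naming the ORC library to use. 'pyarrow' is the only supported value; any other value (for example 'fastparquet', which handles Parquet rather than ORC) raises ValueError.

611-2-3. index (optional, default None): bool. If True, the DataFrame's index(es) are written to the file; if False, they are not. If None, the index is still saved, but a RangeIndex is stored as a range in the metadata rather than as values, which takes less space and is faster; other indexes are written as columns.

611-2-4. engine_kwargs (optional, default None): a dict of additional keyword arguments passed to pyarrow.orc.write_table(), for example to configure write options such as compression.

611-3. Function

Converts a pandas DataFrame to the ORC file format and stores it for later querying and analysis. ORC is a columnar format whose main advantage is better performance and compression when storing and reading large datasets.

611-4. Return value

Returns None if path is provided and the file is written successfully. If path is not provided, returns a bytes object containing the DataFrame serialized in ORC format (not a pyarrow.Table).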

611-5. Notes

None

611-6. Usage

611-6-1. Data preparation

None

611-6-2. Code example

# 611. pandas.DataFrame.to_orc method
import pandas as pd
# Create a sample DataFrame
data = {
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
# Path where the ORC file will be saved
path = 'example.orc'
# Save the DataFrame to an ORC file
df.to_orc(
    path=path,
    engine='pyarrow',  # use the pyarrow engine (the only supported one)
    index=True,  # write the DataFrame's index to the file
    engine_kwargs={  # optional engine arguments; left empty in this example
        # for example, a compression algorithm could be specified
        # 'compression': 'snappy'
    }
)
print(f"DataFrame has been saved to {path}")

611-6-3. Output

None

612. pandas.DataFrame.to_dict method

612-1. Syntax

# 612. pandas.DataFrame.to_dict method
pandas.DataFrame.to_dict(orient='dict', *, into=<class 'dict'>, index=True)
Convert the DataFrame to a dictionary.

The type of the key-value pairs can be customized with the parameters (see below).

Parameters:
orient
str {‘dict’, ‘list’, ‘series’, ‘split’, ‘tight’, ‘records’, ‘index’}
Determines the type of the values of the dictionary.

‘dict’ (default) : dict like {column -> {index -> value}}

‘list’ : dict like {column -> [values]}

‘series’ : dict like {column -> Series(values)}

‘split’ : dict like {‘index’ -> [index], ‘columns’ -> [columns], ‘data’ -> [values]}

‘tight’ : dict like {‘index’ -> [index], ‘columns’ -> [columns], ‘data’ -> [values], ‘index_names’ -> [index.names], ‘column_names’ -> [column.names]}

‘records’ : list like [{column -> value}, … , {column -> value}]

‘index’ : dict like {index -> {column -> value}}

New in version 1.4.0: ‘tight’ as an allowed value for the orient argument

into
class, default dict
The collections.abc.MutableMapping subclass used for all Mappings in the return value. Can be the actual class or an empty instance of the mapping type you want. If you want a collections.defaultdict, you must pass it initialized.

index
bool, default True
Whether to include the index item (and index_names item if orient is ‘tight’) in the returned dictionary. Can only be False when orient is ‘split’ or ‘tight’.

New in version 2.0.0.

Returns:
dict, list or collections.abc.MutableMapping
Return a collections.abc.MutableMapping object representing the DataFrame. The resulting transformation depends on the orient parameter.

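612-2. Parameters

612-2-1. orient (optional, default 'dict'): a string that determines the structure of the result. Allowed values:

'dict': the default; keys are column names and each value is a dict mapping index labels to that column's values.
'list': keys are column names and each value is a list of that column's values.
'series': keys are column names and each value is a Series.
'split': a dict with three keys: 'index', 'columns' and 'data'.
'tight': like 'split', with the additional keys 'index_names' and 'column_names' (new in version 1.4.0).
'records': a list of dicts, one per row, each mapping column names to values.
'index': keys are index labels and each value is a dict mapping column names to that row's values.

612-2-2. into (optional, default <class 'dict'>): the collections.abc.MutableMapping subclass used for every mapping in the return value, a plain dict by default. It can be the class itself or an empty instance; a collections.defaultdict must be passed already initialized.

612-2-3. index (optional, default True): bool. Whether to include the 'index' item (and the 'index_names' item when orient is 'tight') in the returned dictionary. It can be False only when orient is 'split' or 'tight' (new in version 2.0.0).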
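
The less common options can be illustrated with a short sketch. It assumes pandas 2.0 or later (needed for the index argument); the DataFrame and variable names are made up for illustration.

```python
from collections import OrderedDict, defaultdict

import pandas as pd

df = pd.DataFrame(
    {'name': ['Myelsa', 'Bryce'], 'age': [43, 6]},
    index=['a', 'b'],
)

# 'split' without the index item (index=False is allowed only for 'split' and 'tight')
split_no_index = df.to_dict(orient='split', index=False)
print(split_no_index)

# 'tight' adds 'index_names' and 'column_names' to the 'split' layout
tight = df.to_dict(orient='tight')
print(sorted(tight))

# into accepts a MutableMapping subclass ...
ordered = df.to_dict(into=OrderedDict)
print(type(ordered))

# ... or an instance; a defaultdict must be passed already initialized
by_list = df.to_dict(orient='list', into=defaultdict(list))
print(by_list['age'])
```

Passing index=False with any other orient value raises ValueError, so the flag is best reserved for the 'split' and 'tight' layouts.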

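612-3. Function

Converts a DataFrame to a dictionary, which is convenient for exchanging data with other data structures or for serialization. The orient option selects the dictionary layout that fits the task.

612-4. Return value

Returns a dict, a list (when orient is 'records') or a collections.abc.MutableMapping whose content and structure depend on orient. If into is not specified, the mappings are standard Python dicts.

612-5. Notes

None

612-6. Usage

612-6-1. Data preparation

None

612-6-2. Code example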

# 612. pandas.DataFrame.to_dict method
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({
    'name': ['Myelsa', 'Bryce', 'Jimmy'],
    'age': [43, 6, 15]
})
# Convert the DataFrame to a dict (default orient)
dict_default = df.to_dict()
# Convert the DataFrame to a dict of lists
dict_list = df.to_dict(orient='list')
# Convert the DataFrame to a list of per-row dicts
dict_records = df.to_dict(orient='records')
# Convert the DataFrame to a dict keyed by index label
dict_index = df.to_dict(orient='index')
print('Default:', dict_default)
print('List:', dict_list)
print('Records:', dict_records)
print('Index:', dict_index)

612-6-3. Output

# 612. pandas.DataFrame.to_dict method
# Default: {'name': {0: 'Myelsa', 1: 'Bryce', 2: 'Jimmy'}, 'age': {0: 43, 1: 6, 2: 15}}
# List: {'name': ['Myelsa', 'Bryce', 'Jimmy'], 'age': [43, 6, 15]}
# Records: [{'name': 'Myelsa', 'age': 43}, {'name': 'Bryce', 'age': 6}, {'name': 'Jimmy', 'age': 15}]
# Index: {0: {'name': 'Myelsa', 'age': 43}, 1: {'name': 'Bryce', 'age': 6}, 2: {'name': 'Jimmy', 'age': 15}}

613. pandas.DataFrame.to_records method

613-1. Syntax

# 613. pandas.DataFrame.to_records method
pandas.DataFrame.to_records(index=True, column_dtypes=None, index_dtypes=None)
Convert DataFrame to a NumPy record array.

Index will be included as the first field of the record array if requested.

Parameters:
index
bool, default True
Include index in resulting record array, stored in ‘index’ field or using the index label, if set.

column_dtypes
str, type, dict, default None
If a string or type, the data type to store all columns. If a dictionary, a mapping of column names and indices (zero-indexed) to specific data types.

index_dtypes
str, type, dict, default None
If a string or type, the data type to store all index levels. If a dictionary, a mapping of index level names and indices (zero-indexed) to specific data types.

This mapping is applied only if index=True.

Returns:
numpy.rec.recarray
NumPy ndarray with the DataFrame labels as fields and each row of the DataFrame as entries.

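613-2. Parameters

613-2-1. index (optional, default True): bool. Whether to include the DataFrame's index as a field of the record array. If True, the index becomes the first field, named 'index' or after the index label if one is set; if False, the index is left out of the result.

613-2-2. column_dtypes (optional, default None): str, type or dict. A string or type sets the data type used for all columns; a dict maps column names or zero-based column positions to specific data types.

613-2-3. index_dtypes (optional, default None): str, type or dict. A string or type sets the data type used for all index levels; a dict maps index level names or zero-based positions to specific data types. The mapping is applied only when index=True.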
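
The two dtype arguments can be sketched as follows. This is only an illustration with made-up variable names; the chosen dtypes ('int32', 'int16') are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Myelsa', 'Bryce', 'Jimmy'],
    'age': [43, 6, 15],
})

# Store the 'age' column as 32-bit integers; 'name' keeps its original dtype
rec = df.to_records(index=False, column_dtypes={'age': 'int32'})
print(rec.dtype)
# Fields of a record array can be read as attributes
print(rec.age)

# Store the index as 16-bit integers (applies only because index=True by default)
rec_idx = df.to_records(index_dtypes='int16')
print(rec_idx.dtype)
```

Narrower dtypes like these can reduce memory use, but values that do not fit the target type will not be represented correctly, so the choice depends on the data.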

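613-3. Function

Converts a DataFrame to a NumPy record array (a structured array) whose fields can be accessed by name or as attributes. This format integrates well with NumPy and other data analysis tools.

613-4. Return value

Returns a numpy.rec.recarray in which each record corresponds to one row of the DataFrame and each field corresponds to one column (plus the index, if included).

613-5. Notes

None

613-6. Usage

613-6-1. Data preparation

None

613-6-2. Code example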

# 613. pandas.DataFrame.to_records method
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({
    'name': ['Myelsa', 'Bryce', 'Jimmy'],
    'age': [43, 6, 15]
})
# Convert the DataFrame to a record array (the index is included by default)
structured_records_default = df.to_records()
# Convert the DataFrame to a record array without the index
structured_records_no_index = df.to_records(index=False)
print('Default Records:')
print(structured_records_default)
print('Records without Index:')
print(structured_records_no_index)

613-6-3. Output

# 613. pandas.DataFrame.to_records method
# Default Records:
# [(0, 'Myelsa', 43) (1, 'Bryce',  6) (2, 'Jimmy', 15)]
# Records without Index:
# [('Myelsa', 43) ('Bryce',  6) ('Jimmy', 15)]

614. pandas.DataFrame.to_string method

614-1. Syntax

# 614. pandas.DataFrame.to_string method
pandas.DataFrame.to_string(buf=None, *, columns=None, col_space=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, justify=None, max_rows=None, max_cols=None, show_dimensions=False, decimal='.', line_width=None, min_rows=None, max_colwidth=None, encoding=None)
Render a DataFrame to a console-friendly tabular output.

Parameters:
buf
str, Path or StringIO-like, optional, default None
Buffer to write to. If None, the output is returned as a string.

columns
array-like, optional, default None
The subset of columns to write. Writes all columns by default.

col_space
int, list or dict of int, optional
The minimum width of each column. If a list of ints is given, every integer corresponds to one column. If a dict is given, the key references the column, while the value defines the space to use.

header
bool or list of str, optional
Write out the column names. If a list of columns is given, it is assumed to be aliases for the column names.

index
bool, optional, default True
Whether to print index (row) labels.

na_rep
str, optional, default ‘NaN’
String representation of NaN to use.

formatters
list, tuple or dict of one-param. functions, optional
Formatter functions to apply to columns’ elements by position or name. The result of each function must be a unicode string. List/tuple must be of length equal to the number of columns.

float_format
one-parameter function, optional, default None
Formatter function to apply to columns’ elements if they are floats. This function must return a unicode string and will be applied only to the non-NaN elements, with NaN being handled by na_rep.

sparsify
bool, optional, default True
Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row.

index_names
bool, optional, default True
Prints the names of the indexes.

justify
str, default None
How to justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box. Valid values are:

left
right
center
justify
justify-all
start
end
inherit
match-parent
initial
unset

max_rows
int, optional
Maximum number of rows to display in the console.

max_cols
int, optional
Maximum number of columns to display in the console.

show_dimensions
bool, default False
Display DataFrame dimensions (number of rows by number of columns).

decimal
str, default ‘.’
Character recognized as decimal separator, e.g. ‘,’ in Europe.

line_width
int, optional
Width to wrap a line in characters.

min_rows
int, optional
The number of rows to display in the console in a truncated repr (when number of rows is above max_rows).

max_colwidth
int, optional
Max width to truncate each column in characters. By default, no limit.

encoding
str, default “utf-8”
Set character encoding.

Returns:
str or None
If buf is None, returns the result as a string. Otherwise returns None.

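614-2. Parameters

614-2-1. buf (optional, default None): str, Path or StringIO-like. The buffer to write to. If None, the formatted string is returned; otherwise the result is written to the given path or file-like object.

614-2-2. columns (optional, default None): array-like. The subset of columns to write; all columns are written by default.

614-2-3. col_space (optional, default None): int, list of int or dict of int. The minimum width of each column. In a list, each integer corresponds to one column; in a dict, the key names the column and the value gives its width.

614-2-4. header (optional, default True): bool or list of str. Whether to write the column names. A list of strings is treated as aliases for the column names.

614-2-5. index (optional, default True): bool. Whether to print the index (row) labels.

614-2-6. na_rep (optional, default 'NaN'): str. The string representation of missing values.

614-2-7. formatters (optional, default None): list, tuple or dict of one-parameter functions. Formatter functions applied to column elements by position or by name. Each function must return a string, and a list or tuple must have one entry per column.

614-2-8. float_format (optional, default None): a one-parameter function applied to floating-point elements, for example '${:.2f}'.format to render floats as currency. It must return a string and is applied only to non-NaN elements; NaN is handled by na_rep.

614-2-9. sparsify (optional, default None): bool. Controls how a MultiIndex is printed. The documented default behavior is True, which omits repeated outer-level labels; set it to False to print every MultiIndex key on each row.

614-2-10. index_names (optional, default True): bool. Whether to print the index names.

614-2-11. justify (optional, default None): str. How to justify the column labels, for example 'left', 'right' or 'center'. If None, the value from the print configuration (set_option) is used, which is 'right' out of the box.

614-2-12. max_rows (optional, default None): int. The maximum number of rows to display.

614-2-13. max_cols (optional, default None): int. The maximum number of columns to display.

614-2-14. show_dimensions (optional, default False): bool. Whether to display the DataFrame dimensions (number of rows by number of columns).

614-2-15. decimal (optional, default '.'): str. The character used as the decimal separator, for example ',' in Europe.

614-2-16. line_width (optional, default None): int. The width, in characters, at which a line is wrapped.

614-2-17. min_rows (optional, default None): int. The number of rows to display in a truncated output, used when the number of rows exceeds max_rows.

614-2-18. max_colwidth (optional, default None): int. The maximum width of each column in characters; longer content is truncated. There is no limit by default.

614-2-19. encoding (optional; None in the signature, documented default "utf-8"): str. The character encoding, relevant only when the output is written to a file.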
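
A short sketch of how formatters, float_format and na_rep interact. The column names and format strings are illustrative assumptions, and the exact column spacing of the output may vary between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Myelsa', 'Bryce', 'Jimmy'],
    'salary': [50000.0, None, 70000.5],
})

text = df.to_string(
    index=False,                      # hide the row labels
    na_rep='-',                       # shown in place of the missing salary
    formatters={'name': str.upper},   # per-column formatter, keyed by column name
    float_format='${:,.2f}'.format,   # one-parameter function applied to non-NaN floats
)
print(text)
```

Note that float_format is given as a callable (the bound format method of a string), which matches the documented "one-parameter function" type.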

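614-3. Function

Provides flexible options for rendering a DataFrame as console-friendly text, which is useful for debugging and data review.

614-4. Return value

Returns a string if buf is None; otherwise writes the result to buf and returns None.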
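
The buf behavior can be sketched with an in-memory buffer. This example also passes max_rows and min_rows so that the long frame is truncated; the frame and the chosen limits are arbitrary illustrations.

```python
import io

import pandas as pd

df = pd.DataFrame({'x': range(100)})

# Writing to a buffer: the text goes to buf and the call itself returns None
buf = io.StringIO()
result = df.to_string(buf=buf, max_rows=10, min_rows=4)
text = buf.getvalue()

print(result)
print(text)
```

Because the frame has more rows than max_rows, the rendered text is shortened and contains a row of dots marking the omitted rows.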

614-5. Notes

None

614-6. Usage

614-6-1. Data preparation

None

614-6-2. Code example

# 614. pandas.DataFrame.to_string method
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({
    'name': ['Myelsa', 'Bryce', 'Jimmy'],
    'age': [43, 6, 15],
    'salary': [50000, None, 70000]
})
# Render the DataFrame with to_string, without the index
output = df.to_string(index=False)
print(output, end='\n\n')
# Render the DataFrame with the maximum column width limited to 10
output_limited_colwidth = df.to_string(max_colwidth=10)
print(output_limited_colwidth)

614-6-3. Output

# 614. pandas.DataFrame.to_string method
#   name  age  salary
# Myelsa   43 50000.0
#  Bryce    6     NaN
#  Jimmy   15 70000.0
# 
#      name  age   salary
# 0  Myelsa   43  50000.0
# 1   Bryce    6      NaN
# 2   Jimmy   15  70000.0

615. pandas.DataFrame.to_markdown method

615-1. Syntax

# 615. pandas.DataFrame.to_markdown method
pandas.DataFrame.to_markdown(buf=None, *, mode='wt', index=True, storage_options=None, **kwargs)
Print DataFrame in Markdown-friendly format.

Parameters:
buf
str, Path or StringIO-like, optional, default None
Buffer to write to. If None, the output is returned as a string.

mode
str, optional
Mode in which file is opened, “wt” by default.

index
bool, optional, default True
Add index (row) labels.

storage_options
dict, optional
Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are forwarded to fsspec.open. Please see fsspec and urllib for more details, and for more examples on storage options refer here.

**kwargs
These parameters will be passed to tabulate.

Returns:
str
DataFrame in Markdown-friendly format.

Notes

Requires the tabulate package.

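615-2. Parameters

615-2-1. buf (optional, default None): str, Path or StringIO-like. The buffer to write to. If None, the formatted string is returned; otherwise the result is written to the given path or file-like object.

615-2-2. mode (optional, default 'wt'): str. The mode in which the file is opened. Common values:

'wt': write in text mode.
'wb': write in binary mode.

615-2-3. index (optional, default True): bool. Whether to add the index (row) labels.

615-2-4. storage_options (optional, default None): dict. Extra options for a particular storage connection, such as host, port, username and password; mainly used with remote storage backends (for example cloud storage).

615-2-5. **kwargs (optional): additional keyword arguments passed to tabulate, the package that renders the table. They can control options such as the table format and column alignment.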

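615-3. Function

Converts a pandas DataFrame to a Markdown table that is easy to read, suitable for documents, reports or Markdown text files. It requires the tabulate package.

615-4. Return value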