前言
pandas
版本低于2.x
,engine
参数不支持calamine
,但是又想提升读取大型Excel
的速度,那么还是可以用python-calamine
库的
# 安装
pip install python-calamine
示例1
import pandas as pd
from python_calamine import CalamineWorkbook
wb = CalamineWorkbook.from_path(filename)
# sht_names = wb.sheet_names # 获取所有sheet名
# row_list = wb.get_sheet_by_name(sheet).to_python()
row_list = wb.get_sheet_by_index(0).to_python()
df = pd.DataFrame(row_list)
df.columns = df.iloc[0].values
df.drop(0, inplace=True)
df.reset_index(drop=True, inplace=True)
print(df.head())
print(df.shape)
"""
Column_1 Column_2 Column_3 Column_4 ... Column_17 Column_18 Column_19 Column_20
0 72.0 32.0 34.0 46.0 ... 78.0 68.0 31.0 97.0
1 46.0 85.0 62.0 62.0 ... 30.0 79.0 28.0 94.0
2 96.0 44.0 80.0 2.0 ... 44.0 30.0 39.0 3.0
3 13.0 26.0 25.0 61.0 ... 69.0 57.0 17.0 89.0
4 15.0 76.0 21.0 74.0 ... 64.0 60.0 4.0 88.0
[5 rows x 20 columns]
(120000, 20)
耗时:2.229128837585449
"""
当然,当你读取一个超大文件时,可能会提示内存分配错误
示例2
import pandas as pd
from python_calamine import CalamineWorkbook
def iter_excel_calamine(file):
workbook = CalamineWorkbook.from_filelike(file)
rows = iter(workbook.get_sheet_by_index(0).to_python())
# headers = list(map(str, next(rows)))
for row in rows:
# yield dict(zip(headers, row))
yield row
tmp_list = []
with open(filename, "rb") as fh:
for row in iter_excel_calamine(fh):
tmp_list.append(row)
df = pd.DataFrame(tmp_list)
print(df.head())
print(df.shape)
"""
0 1 2 ... 17 18 19
0 Column_1 Column_2 Column_3 ... Column_18 Column_19 Column_20
1 72.0 32.0 34.0 ... 68.0 31.0 97.0
2 46.0 85.0 62.0 ... 79.0 28.0 94.0
3 96.0 44.0 80.0 ... 30.0 39.0 3.0
4 13.0 26.0 25.0 ... 57.0 17.0 89.0
[5 rows x 20 columns]
(120001, 20)
耗时:2.1872622966766357
"""
标签:...,读取,Column,list,Excel,python,df,pandas,row
From: https://www.cnblogs.com/rong-z/p/18210989