Python: Quickly Splitting a CSV

Time: 2023-08-20 11:34:21  Views: 60

Tags: split, csv, PYTHON, True, chunk, df, pd, CSV, open

import pandas as pd

path = "f:/te/qh.csv"
path1 = "F:/BaiduNetdiskDownload/行政许可/行政许可/行政许可.csv"
# Optional row counts; each one costs a full pass over the file
##num_rows = sum(1 for row in open(path,encoding="utf-8"))
##num_rows1 = sum(1 for row in open(path1,encoding="utf-8"))
chunksize = 10000
chunk_pointer = 0
# Excel output target; the writer is created here but nothing below writes to it
tt = "f:/te/qhv1.xlsx"
writer = pd.ExcelWriter(tt, engine='openpyxl')
# Read a CSV file chunk by chunk and return it as a single DataFrame
def read_csv_feature(filePath):
    chunkSize = 100000
    chunks = []
    # The with-block closes the file even if parsing fails
    with open(filePath, encoding='utf-8') as f:
        reader = pd.read_csv(f, sep=',', chunksize=chunkSize, low_memory=False)
        # Iterating the reader yields DataFrames of up to chunkSize rows
        for chunk in reader:
            chunks.append(chunk)
    print('Iteration is END!!!')
    df = pd.concat(chunks, axis=0, ignore_index=True)
    return df

# Column names for the headerless file at path
cxx=['company_id','unified_code','ent_name','reg_capital','real_capital','reg_no','legal_person','open_status','old_ent_name','industry','tax_no','license_number','org_no',
'authority','annual_date','start_date','ent_type','open_time','district','district_code','reg_addr','scope','state','create_time','update_time','数据来源']
# Columns to leave out of every output piece
drop_cols = ['state','create_time','update_time','数据来源']

# To split the headerless file at path instead, pass names=cxx:
##reader1 = pd.read_csv(path, sep=',', chunksize=120000, low_memory=False, names=cxx)

# Split path1 into pieces of 120000 rows and write each piece to its own file
with open(path1, encoding='utf-8') as f:
    reader = pd.read_csv(f, sep=',', chunksize=120000, low_memory=False)
    for ab, chunk in enumerate(reader, start=1):
        # drop() returns a new DataFrame, so keep the result
        df = chunk.drop(columns=drop_cols)
        print(df)
        # index=False keeps the row index out of the output file
        df.to_csv("f:/te/qinghai"+str(ab)+".csv", index=False)
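
For comparison, the same kind of split can be sketched without pandas, using only the standard-library csv module and streaming one row at a time so that memory use stays flat. This is a minimal sketch under stated assumptions, not the original author's method: the function name split_csv, its parameters, and the output naming scheme are all hypothetical, and it assumes the input's first row is a header that should be repeated in every piece.

```python
import csv
import os


def split_csv(src_path, out_dir, rows_per_file, prefix="part"):
    """Split src_path into numbered CSV files of at most rows_per_file data rows.

    Hypothetical helper: the name, parameters and output naming are
    illustrative and are not taken from the original script.
    """
    if rows_per_file < 1:
        raise ValueError("rows_per_file must be at least 1")
    os.makedirs(out_dir, exist_ok=True)
    out_paths = []
    out_file = None
    writer = None
    try:
        with open(src_path, newline="", encoding="utf-8") as src:
            reader = csv.reader(src)
            header = next(reader, None)
            if header is None:  # empty input: nothing to split
                return out_paths
            for row_number, row in enumerate(reader):
                # Start a new output file every rows_per_file data rows
                if row_number % rows_per_file == 0:
                    if out_file is not None:
                        out_file.close()
                    name = f"{prefix}{len(out_paths) + 1}.csv"
                    out_path = os.path.join(out_dir, name)
                    out_file = open(out_path, "w", newline="", encoding="utf-8")
                    writer = csv.writer(out_file)
                    writer.writerow(header)  # repeat the header in every piece
                    out_paths.append(out_path)
                writer.writerow(row)
    finally:
        # Close the last piece even if reading or writing failed
        if out_file is not None:
            out_file.close()
    return out_paths
```

With the paths from the script above, a call such as split_csv(path1, "f:/te", 120000, prefix="qinghai") would write qinghai1.csv, qinghai2.csv and so on. Unlike the pandas loop, this sketch copies every column unchanged, so dropping columns would need an extra step.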

From: https://www.cnblogs.com/xkdn/p/17643763.html
