
3.5-Reading and storing data (I/O)

Date: 2023-10-18 15:37:14

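3.5.1 Which I/O methods are available

  • The focus of the data-analysis stage: analysis and modeling

3.5.2 Reading and storing CSV

  • Storing, reading, setting the index
  • Appending data

3.5.3 Reading and storing Excel

  • Storing, reading, choosing the worksheet
  • Appending data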
  In [ ]:
import pandas as pd
import numpy as np
data = pd.DataFrame(np.random.randn(1000,3),columns=["a",'b','c'],index=pd.date_range('20200101',periods=1000))
data
  Out[ ]:  
                   a         b         c
2020-01-01 -0.275344 -0.216934 -0.083554
2020-01-02 -0.667199 -0.722457 0.587480
2020-01-03 0.270504 -0.363369 1.450482
2020-01-04 -0.621467 0.067100 -0.161752
2020-01-05 0.701313 1.162024 0.929627
... ... ... ...
2022-09-22 0.264954 0.972600 0.249330
2022-09-23 0.193843 0.535633 0.796537
2022-09-24 0.209620 -0.995445 -0.202398
2022-09-25 -1.751889 0.253133 0.573625
2022-09-26 -2.156979 1.351400 0.036575

1000 rows × 3 columns

  In [ ]:
# Save the data
data.to_csv('txt.csv')
  In [ ]:
pd.read_csv('txt.csv')
  Out[ ]:  
  Unnamed: 0         a         b         c
0 2020-01-01 -0.275344 -0.216934 -0.083554
1 2020-01-02 -0.667199 -0.722457 0.587480
2 2020-01-03 0.270504 -0.363369 1.450482
3 2020-01-04 -0.621467 0.067100 -0.161752
4 2020-01-05 0.701313 1.162024 0.929627
... ... ... ... ...
995 2022-09-22 0.264954 0.972600 0.249330
996 2022-09-23 0.193843 0.535633 0.796537
997 2022-09-24 0.209620 -0.995445 -0.202398
998 2022-09-25 -1.751889 0.253133 0.573625
999 2022-09-26 -2.156979 1.351400 0.036575

1000 rows × 4 columns

  In [ ]:
## The index changed when the file was read back; there are two ways to restore the original layout
## Method 1: set the index column when reading
pd.read_csv('txt.csv',index_col=['Unnamed: 0'])
 
  Out[ ]:  
                   a         b         c
2020-01-01 -0.275344 -0.216934 -0.083554
2020-01-02 -0.667199 -0.722457 0.587480
2020-01-03 0.270504 -0.363369 1.450482
2020-01-04 -0.621467 0.067100 -0.161752
2020-01-05 0.701313 1.162024 0.929627
... ... ... ...
2022-09-22 0.264954 0.972600 0.249330
2022-09-23 0.193843 0.535633 0.796537
2022-09-24 0.209620 -0.995445 -0.202398
2022-09-25 -1.751889 0.253133 0.573625
2022-09-26 -2.156979 1.351400 0.036575

1000 rows × 3 columns
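
Note that Method 1 brings the dates back as plain strings unless they are parsed. The following is a minimal sketch, assuming a small stand-in frame (`sample`) and an in-memory buffer instead of `txt.csv`; `index_col=0` selects the first column by position and `parse_dates=True` asks pandas to parse the index as dates.

```python
import io

import numpy as np
import pandas as pd

# Small stand-in for the 1000-row frame used above (assumed sample data).
sample = pd.DataFrame(
    np.arange(15, dtype=float).reshape(5, 3),
    columns=["a", "b", "c"],
    index=pd.date_range("20200101", periods=5),
)

# Write to an in-memory buffer so the sketch does not touch txt.csv.
buffer = io.StringIO()
sample.to_csv(buffer)

# Without parse_dates the index comes back as strings.
buffer.seek(0)
as_text = pd.read_csv(buffer, index_col=0)

# With parse_dates=True the index is parsed back into a DatetimeIndex.
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0, parse_dates=True)

print(type(as_text.index).__name__)
print(type(restored.index).__name__)
```

Selecting the column by position avoids depending on the auto-generated `Unnamed: 0` label.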

  In [ ]:
## Method 2: name the index (date) before saving
data.index
  Out[ ]:
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
               '2020-01-05', '2020-01-06', '2020-01-07', '2020-01-08',
               '2020-01-09', '2020-01-10',
               ...
               '2022-09-17', '2022-09-18', '2022-09-19', '2022-09-20',
               '2022-09-21', '2022-09-22', '2022-09-23', '2022-09-24',
               '2022-09-25', '2022-09-26'],
              dtype='datetime64[ns]', length=1000, freq='D')
  In [ ]:
data.index.names = ['date']
  In [ ]:
data
  Out[ ]:  
                   a         b         c
date
2020-01-01 -0.275344 -0.216934 -0.083554
2020-01-02 -0.667199 -0.722457 0.587480
2020-01-03 0.270504 -0.363369 1.450482
2020-01-04 -0.621467 0.067100 -0.161752
2020-01-05 0.701313 1.162024 0.929627
... ... ... ...
2022-09-22 0.264954 0.972600 0.249330
2022-09-23 0.193843 0.535633 0.796537
2022-09-24 0.209620 -0.995445 -0.202398
2022-09-25 -1.751889 0.253133 0.573625
2022-09-26 -2.156979 1.351400 0.036575

1000 rows × 3 columns

  In [ ]:
data.to_csv('txt.csv')   # overwrites/replaces the existing file entirely
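
As an alternative to renaming the index in place, `to_csv` accepts an `index_label` argument that labels the index column only in the written file. The following is a hedged sketch with assumed sample data and an in-memory buffer rather than `txt.csv`; the labelled column can then be selected by name when reading.

```python
import io

import numpy as np
import pandas as pd

# Assumed sample data; the index has no name.
sample = pd.DataFrame(
    np.arange(9, dtype=float).reshape(3, 3),
    columns=["a", "b", "c"],
    index=pd.date_range("20200101", periods=3),
)

buffer = io.StringIO()
# index_label names the index column in the file; sample itself is unchanged.
sample.to_csv(buffer, index_label="date")

# Inspect the header line that was written.
buffer.seek(0)
header_line = buffer.readline().strip()

# Read the index back by its column name and parse it as dates.
buffer.seek(0)
restored = pd.read_csv(buffer, index_col="date", parse_dates=True)

print(header_line)
print(restored.index.name)
```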
  In [ ]:
# Append data to an existing file
data2=data.tail()
data2
  Out[ ]:  
                   a         b         c
date
2022-09-22 0.264954 0.972600 0.249330
2022-09-23 0.193843 0.535633 0.796537
2022-09-24 0.209620 -0.995445 -0.202398
2022-09-25 -1.751889 0.253133 0.573625
2022-09-26 -2.156979 1.351400 0.036575
  In [ ]:
data2.to_csv('txt.csv',mode='a',header=False)  # mode='a' appends; header=False skips writing the column names again
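
Appending with `mode='a'` does not check what the file already contains: `data2` is the tail of `data`, so the last five rows end up in `txt.csv` twice. The following is a minimal sketch, assuming a temporary file, sample data, and a hypothetical helper `append_csv` (not part of pandas), that writes the header only when the file is new and drops repeated index entries after reading.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def append_csv(frame, path):
    """Append frame to path, writing the header only if the file is new or empty."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    frame.to_csv(path, mode="a", header=is_new)


# Assumed sample data with a named date index.
sample = pd.DataFrame(
    np.arange(18, dtype=float).reshape(6, 3),
    columns=["a", "b", "c"],
    index=pd.date_range("20200101", periods=6, name="date"),
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "txt.csv")
    append_csv(sample, path)          # first call creates the file with a header
    append_csv(sample.tail(2), path)  # second call appends two rows already present

    combined = pd.read_csv(path, index_col="date", parse_dates=True)

# Keep the first occurrence of each date to undo the duplicated rows.
deduplicated = combined[~combined.index.duplicated(keep="first")]

print(len(combined), len(deduplicated))
```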
  In [ ]:
# Saving and reading Excel files
filename = 'excel.xlsx'
data.to_excel(filename,sheet_name='a')
  In [ ]:
pd.read_excel(filename)
  Out[ ]:  
        date         a         b         c
0 2020-01-01 -0.275344 -0.216934 -0.083554
1 2020-01-02 -0.667199 -0.722457 0.587480
2 2020-01-03 0.270504 -0.363369 1.450482
3 2020-01-04 -0.621467 0.067100 -0.161752
4 2020-01-05 0.701313 1.162024 0.929627
... ... ... ... ...
995 2022-09-22 0.264954 0.972600 0.249330
996 2022-09-23 0.193843 0.535633 0.796537
997 2022-09-24 0.209620 -0.995445 -0.202398
998 2022-09-25 -1.751889 0.253133 0.573625
999 2022-09-26 -2.156979 1.351400 0.036575

1000 rows × 4 columns
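
`read_excel` behaves like `read_csv` here: the saved index comes back as an ordinary `date` column. The following is a hedged sketch, assuming `openpyxl` is installed and using a temporary file with sample data; `sheet_name` selects the worksheet and `index_col=0` restores the index.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Assumed sample data with a named date index.
sample = pd.DataFrame(
    np.arange(9, dtype=float).reshape(3, 3),
    columns=["a", "b", "c"],
    index=pd.date_range("20200101", periods=3, name="date"),
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "excel.xlsx")
    sample.to_excel(path, sheet_name="a")

    # Default read: the index is returned as a regular "date" column.
    flat = pd.read_excel(path, sheet_name="a")

    # index_col=0 turns the first column back into the index.
    restored = pd.read_excel(path, sheet_name="a", index_col=0)

print(list(flat.columns))
print(restored.index.name)
```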

  In [ ]:
# Write several sheets in one go
with pd.ExcelWriter('writer.xlsx') as writer:
    data.to_excel(writer,sheet_name='a')
    data.to_excel(writer,sheet_name='b')
    data.to_excel(writer,sheet_name='c')
  In [ ]:
# Append a new sheet to the existing workbook
with pd.ExcelWriter('writer.xlsx',mode='a',engine='openpyxl') as writer:
    data2.to_excel(writer,sheet_name="d")
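
In append mode, writing to a sheet name that already exists raises an error by default. The following is a minimal sketch, assuming pandas 1.3 or later (where `ExcelWriter` accepts `if_sheet_exists`) and an installed `openpyxl`, using a temporary file with sample data; passing `sheet_name=None` to `read_excel` returns every sheet in a dict.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Assumed sample data with a named date index.
sample = pd.DataFrame(
    np.arange(9, dtype=float).reshape(3, 3),
    columns=["a", "b", "c"],
    index=pd.date_range("20200101", periods=3, name="date"),
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "writer.xlsx")

    # Create the workbook with two sheets.
    with pd.ExcelWriter(path) as writer:
        sample.to_excel(writer, sheet_name="a")
        sample.to_excel(writer, sheet_name="b")

    # Append a new sheet "d" and replace the existing sheet "b".
    with pd.ExcelWriter(
        path, mode="a", engine="openpyxl", if_sheet_exists="replace"
    ) as writer:
        sample.tail(2).to_excel(writer, sheet_name="d")
        sample.head(1).to_excel(writer, sheet_name="b")

    # sheet_name=None loads all sheets into a dict keyed by sheet name.
    sheets = pd.read_excel(path, sheet_name=None, index_col=0)

print(sorted(sheets))
```

Choosing `"replace"` is only one option; which behavior is appropriate depends on whether the old sheet contents should survive.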

From: https://www.cnblogs.com/mlzxdzl/p/17772473.html