
Pandas Memo

Date: 2024-08-22 20:18:54
Tags: df, column, DataFrame, memo, dataframe, print, my, Pandas

The DataFrame is the central data structure in the pandas API. It's like a spreadsheet, with numbered rows and named columns.

To make the examples below easy to run, import the required modules first.

import numpy as np
import pandas as pd

The following code instantiates the pd.DataFrame class to create a DataFrame.

# Create and populate a 5x2 NumPy array.
my_data = np.array([[0, 3], [10, 7], [20, 9], [30, 14], [40, 15]])

# Create a Python list that holds the names of the two columns.
my_column_names = ['temperature', 'activity']

# Create a DataFrame.
my_dataframe = pd.DataFrame(data=my_data, columns=my_column_names)

# Print the entire DataFrame.
print(my_dataframe)
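As a quick check of the "numbered rows, named columns" description, the sketch below rebuilds the same 5x2 example and inspects its shape, row labels, and column labels. The variable name `frame` is introduced here only for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the 5x2 example so this snippet runs on its own.
frame = pd.DataFrame(
    data=np.array([[0, 3], [10, 7], [20, 9], [30, 14], [40, 15]]),
    columns=["temperature", "activity"],
)

print(frame.shape)          # (number of rows, number of columns)
print(list(frame.index))    # default row labels count up from 0
print(list(frame.columns))  # column names supplied above
```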

You may add a new column to an existing pandas DataFrame just by assigning values to a new column name.

# Create a new column named adjusted.
my_dataframe["adjusted"] = my_dataframe["activity"] + 2

# Print the entire DataFrame.
print(my_dataframe)
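The assigned values do not have to come from another column. A minimal sketch, assuming the same example data: a scalar is repeated for every row, and a list must supply one value per row. The column names `unit` and `day` are hypothetical and not part of the original example.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    data=np.array([[0, 3], [10, 7], [20, 9], [30, 14], [40, 15]]),
    columns=["temperature", "activity"],
)

# Derived from an existing column (element-wise arithmetic).
frame["adjusted"] = frame["activity"] + 2

# A scalar is broadcast to every row (hypothetical column).
frame["unit"] = "celsius"

# A list must contain exactly one value per row (hypothetical column).
frame["day"] = [1, 2, 3, 4, 5]

print(frame)
```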

Pandas provides multiple ways to isolate specific rows, columns, slices, or cells in a DataFrame.

print("Rows #0, #1, and #2:")
print(my_dataframe.head(3), '\n')

print("Row #2:")
print(my_dataframe.iloc[[2]], '\n')  # A list of positions returns a DataFrame.

print("Row #2:")
print(my_dataframe.iloc[2], '\n')  # A single position returns a Series.

print("Rows #1, #2, and #3:")
# Positions are zero-based, so the slice starts at the second row; the end (4) is excluded.
print(my_dataframe[1:4], '\n')

print("Column 'temperature':")
print(my_dataframe['temperature'])
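To see what each selection actually returns, the following sketch checks the result types instead of printing the data. It assumes the same 5x2 example; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    data=np.array([[0, 3], [10, 7], [20, 9], [30, 14], [40, 15]]),
    columns=["temperature", "activity"],
)

row_as_frame = frame.iloc[[2]]      # list of positions -> one-row DataFrame
row_as_series = frame.iloc[2]       # single position -> Series
middle_rows = frame[1:4]            # row slice -> rows 1, 2, 3 (4 is excluded)
one_column = frame["temperature"]   # single column name -> Series

print(type(row_as_frame).__name__, row_as_frame.shape)
print(type(row_as_series).__name__, row_as_series.shape)
print(list(middle_rows.index))
print(type(one_column).__name__)
```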

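Q: What's the difference between a Series and a DataFrame?

A: A Series is a one-dimensional labeled array, while a DataFrame is a two-dimensional table. Each column of a DataFrame is a Series. A single row selected with .iloc is also returned as a Series, indexed by the column names, which is probably why Google Gemini describes a Series as a row.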
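A small sketch of that answer, assuming the 5x2 example from earlier: both a column and a row come back as a Series, and only their labels differ.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    data=np.array([[0, 3], [10, 7], [20, 9], [30, 14], [40, 15]]),
    columns=["temperature", "activity"],
)

column = frame["activity"]  # Series labeled by the row numbers
row = frame.iloc[2]         # Series labeled by the column names

print(column.name, list(column.index))
print(row.name, list(row.index))
```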

How do you index a particular cell of a DataFrame?

# Create a Python list that holds the names of the four columns.
my_column_names = ['Eleanor', 'Chidi', 'Tahani', 'Jason']

# Create a 3x4 NumPy array, each cell populated with a random integer from 0 to 100.
my_data = np.random.randint(low=0, high=101, size=(3, 4))

# Create a DataFrame.
df = pd.DataFrame(data=my_data, columns=my_column_names)

# Print the entire DataFrame.
print(df)

# Print the value in row #1 of the Eleanor column (chained indexing).
print("\nSecond row of the Eleanor column: %d\n" % df['Eleanor'][1])
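Chained indexing is only one way to read a cell. The sketch below compares it with the single-step accessors .at, .loc, .iat, and .iloc. It uses fixed values instead of random ones so the result is predictable; the values and the name `scores` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame(
    data=np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]),
    columns=["Eleanor", "Chidi", "Tahani", "Jason"],
)

by_chain = scores["Eleanor"][1]    # column first, then row label (two steps)
by_at = scores.at[1, "Eleanor"]    # row label, column label (single cell)
by_loc = scores.loc[1, "Eleanor"]  # row label, column label (also accepts slices)
by_iat = scores.iat[1, 0]          # row position, column position (single cell)
by_iloc = scores.iloc[1, 0]        # row position, column position

print(by_chain, by_at, by_loc, by_iat, by_iloc)
```

For reading, all five forms return the same value; they differ in whether they take labels or positions.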

The following code shows how to add a new column to an existing DataFrame by computing it row by row from other columns:

# Create a column named Janet whose contents are the sum
# of two other columns.
df['Janet'] = df['Tahani'] + df['Jason']

# Print the enhanced DataFrame.
print(df)
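To confirm that the column arithmetic really is row by row, this sketch compares it with an explicit Python loop over the two columns. The fixed values and the name `scores` are assumptions used in place of the random data.

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame(
    data=np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]),
    columns=["Eleanor", "Chidi", "Tahani", "Jason"],
)

# Vectorized: one expression covers every row.
scores["Janet"] = scores["Tahani"] + scores["Jason"]

# Equivalent explicit loop, shown only for comparison.
looped = [t + j for t, j in zip(scores["Tahani"], scores["Jason"])]

print(scores["Janet"].tolist())
print(looped)
```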

Pandas provides two different ways to duplicate a DataFrame:

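  • Referencing: the two names stay linked; a change made through one is visible through the other.
  • Copying: the two objects are independent of each other.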

 

# Create a reference by assigning df to a new variable.
print("Experiment with a reference:")
reference_to_df = df

# Print the starting value of a particular cell.
print("  Starting value of df: %d" % df['Jason'][1])
print("  Starting value of reference_to_df: %d\n" % reference_to_df['Jason'][1])

# Modify a cell in df with .at; chained indexing is not reliable for assignment.
df.at[1, 'Jason'] = df['Jason'][1] + 5
print("  Updated df: %d" % df['Jason'][1])
print("  Updated reference_to_df: %d\n\n" % reference_to_df['Jason'][1])
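The reason both names report the updated value is that plain assignment does not create a new DataFrame. A minimal sketch with fixed values (the data and the name `scores` are assumptions for illustration):

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame(
    data=np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]),
    columns=["Eleanor", "Chidi", "Tahani", "Jason"],
)

reference_to_scores = scores  # a second name for the same object

# `is` tests object identity, not equality of contents.
print(reference_to_scores is scores)

# A change made through one name is visible through the other.
scores.at[1, "Jason"] = scores.at[1, "Jason"] + 5
print(reference_to_scores.at[1, "Jason"])
```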

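.iloc, .at, and chained indexing differ in important ways. For reading a cell, df['Jason'][1] and df.at[1, 'Jason'] return the same value, so they look interchangeable. For assignment they are not: chained indexing runs two separate indexing operations, and pandas does not guarantee whether the intermediate result is a view or a copy, so the write may never reach the original DataFrame (pandas signals this with SettingWithCopyWarning, and with Copy-on-Write enabled the original is never updated). .at, .loc, and .iloc perform the lookup and the write in a single operation, which makes them the proper choice for assignment.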
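A sketch of single-step assignment with each accessor, using fixed values (the data and the name `scores` are assumptions). The chained form is deliberately left out of the runnable code because its effect depends on the pandas version and settings.

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame(
    data=np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]),
    columns=["Eleanor", "Chidi", "Tahani", "Jason"],
)

scores.at[1, "Jason"] = 100   # label-based, single cell
scores.loc[2, "Jason"] = 200  # label-based, also accepts slices and lists
scores.iloc[0, 3] = 300       # position-based: row 0, column 3 ("Jason")

# Avoided on purpose: scores["Jason"][1] = 100 (chained assignment).

print(scores["Jason"].tolist())
```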

The following code shows an experiment with a copy (to be finished):

copy_of_my_dataframe = my_dataframe.copy()
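One possible way to finish the experiment, mirroring the reference experiment above. This is a sketch with fixed values rather than the author's own version; the data and the names `scores` and `copy_of_scores` are assumptions.

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame(
    data=np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]),
    columns=["Eleanor", "Chidi", "Tahani", "Jason"],
)

# copy() makes a deep copy by default, so the data is duplicated.
copy_of_scores = scores.copy()

print("  Starting value of scores: %d" % scores.at[1, "Jason"])
print("  Starting value of copy_of_scores: %d\n" % copy_of_scores.at[1, "Jason"])

# Modify a cell in the original only.
scores.at[1, "Jason"] = scores.at[1, "Jason"] + 5

print("  Updated scores: %d" % scores.at[1, "Jason"])
print("  copy_of_scores is unchanged: %d" % copy_of_scores.at[1, "Jason"])
```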

 

From: https://www.cnblogs.com/ArmRoundMan/p/18360508
