The DataFrame is the central data structure in the pandas API. It is like a spreadsheet, with numbered rows and named columns.
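To make the examples easier to follow, import the required modules first.
    # NumPy is needed as well, because the examples build their data with np.array.
    import numpy as np
    import pandas as pd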
The following code instantiates the pd.DataFrame class to create a DataFrame.
    # Create and populate a 5x2 NumPy array.
    my_data = np.array([[0, 3], [10, 7], [20, 9], [30, 14], [40, 15]])

    # Create a Python list that holds the names of the two columns.
    my_column_names = ['temperature', 'activity']

    # Create a DataFrame.
    my_dataframe = pd.DataFrame(data=my_data, columns=my_column_names)

    # Print the entire DataFrame
    print(my_dataframe)
You may add a new column to an existing pandas DataFrame just by assigning values to a new column name.
    # Create a new column named adjusted.
    my_dataframe["adjusted"] = my_dataframe["activity"] + 2

    # Print the entire DataFrame
    print(my_dataframe)
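As a small variation on the assignment above, the sketch below uses DataFrame.assign, which returns a new DataFrame instead of modifying the existing one. It rebuilds the same sample data so that it runs on its own; the name with_adjusted is only illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as in the example above.
my_data = np.array([[0, 3], [10, 7], [20, 9], [30, 14], [40, 15]])
my_dataframe = pd.DataFrame(data=my_data, columns=['temperature', 'activity'])

# assign() returns a new DataFrame; my_dataframe itself is left unchanged.
with_adjusted = my_dataframe.assign(adjusted=my_dataframe['activity'] + 2)

print(with_adjusted)
print(list(my_dataframe.columns))  # still only the two original columns
```

This form can be convenient when the original DataFrame should stay untouched.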
pandas provides multiple ways to isolate specific rows, columns, slices or cells in a DataFrame.
    print("Rows #0, #1, and #2:")
    print(my_dataframe.head(3), '\n')

    print("Row #2:")
    print(my_dataframe.iloc[[2]], '\n')  # The result is a DataFrame.

    print("Row #2:")
    print(my_dataframe.iloc[2], '\n')  # The result is a Series.

    print("Rows #1, #2, and #3:")
    # Row positions start at 0, so this slice begins at the second row, not the first.
    print(my_dataframe[1:4], '\n')

    print("Column 'temperature':")
    print(my_dataframe['temperature'])
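The selections above can also be written with .iloc (by position) and .loc (by label) in a single call. The following is a minimal sketch on the same sample data, rebuilt here so that it runs on its own; the variable names by_position, by_label and cell are illustrative.

```python
import numpy as np
import pandas as pd

my_data = np.array([[0, 3], [10, 7], [20, 9], [30, 14], [40, 15]])
my_dataframe = pd.DataFrame(data=my_data, columns=['temperature', 'activity'])

# Rows at positions 1..3 of the first column, selected by position.
# The end of a positional slice is exclusive.
by_position = my_dataframe.iloc[1:4, 0]

# The same rows and column selected by label.
# The end of a label slice is inclusive.
by_label = my_dataframe.loc[1:3, 'temperature']

# A single cell: row position 2, column position 1.
cell = my_dataframe.iloc[2, 1]

print(by_position.tolist())
print(by_label.tolist())
print(cell)
```

Because the default row labels are 0, 1, 2, ..., labels and positions coincide here; with a custom index the two would differ.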
Q: What's the difference between Series and DataFrame?
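A: A Series is a one-dimensional labeled array, while a DataFrame is two-dimensional. Each column of a DataFrame is a Series. A single row selected with .iloc[2] is also returned as a Series, with the column names as its index, which is probably why Google Gemini describes a Series as a row.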
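The relationship can be checked directly. The sketch below uses a small made-up DataFrame (the name frame and its values are assumptions for illustration) and prints the type and index of a column, a row, and a one-row selection.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(data=np.array([[0, 3], [10, 7], [20, 9]]),
                     columns=['temperature', 'activity'])

column = frame['temperature']     # one column: a Series indexed by row labels
row = frame.iloc[2]               # one row: a Series indexed by column names
one_row_frame = frame.iloc[[2]]   # a list of positions: a DataFrame

print(type(column).__name__, list(column.index))
print(type(row).__name__, list(row.index))
print(type(one_row_frame).__name__, one_row_frame.shape)
```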
How do you index a particular cell of a DataFrame?
    # Create a Python list that holds the names of the four columns.
    my_column_names = ['Eleanor', 'Chidi', 'Tahani', 'Jason']

    # Create a 3x4 numpy array, each cell populated with a random integer.
    my_data = np.random.randint(low=0, high=101, size=(3, 4))

    # Create a DataFrame.
    df = pd.DataFrame(data=my_data, columns=my_column_names)

    # Print the entire DataFrame
    print(df)

    # Print the value in row #1 of the Eleanor column (chained indexing).
    print("\nSecond row of the Eleanor column: %d\n" % df['Eleanor'][1])
The following code shows how to add a new column to an existing DataFrame by combining other columns row by row:
    # Create a column named Janet whose contents are the sum
    # of two other columns.
    df['Janet'] = df['Tahani'] + df['Jason']

    # Print the enhanced DataFrame
    print(df)
Pandas provides two different ways to duplicate a DataFrame:
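- Referencing: the two names stay linked. Both refer to the same underlying DataFrame, so a change made through one is visible through the other.
- Copying: the two DataFrames are independent of each other.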
    # Create a reference by assigning df to a new variable.
    print("Experiment with a reference:")
    reference_to_df = df

    # Print the starting value of a particular cell.
    print("  Starting value of df: %d" % df['Jason'][1])
    print("  Starting value of reference_to_df: %d\n" % reference_to_df['Jason'][1])

    # Modify a cell in df.
    # Why not use chained indexing for DataFrame assignment?
    df.at[1, 'Jason'] = df['Jason'][1] + 5
    print("  Updated df: %d" % df['Jason'][1])
    print("  Updated reference_to_df: %d\n\n" % reference_to_df['Jason'][1])
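.iloc, .at and chained indexing differ in important ways. .iloc selects by integer position, .at reads or writes a single cell by row label and column label, and chained indexing such as df['Jason'][1] performs two separate indexing operations. For reading, chained indexing and .at return the same value here, so they look interchangeable. For assignment they are not: the first step of a chained expression may produce a copy, so the write may never reach the original DataFrame. pandas signals this with SettingWithCopyWarning, and with Copy-on-Write enabled a chained assignment does not update the original. Use a single .at, .loc or .iloc call for assignment instead.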
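A minimal sketch of the three access styles follows, using a small fixed DataFrame (an assumption for illustration; the random values above would work the same way). The chained assignment is deliberately left out, because whether it reaches the original depends on the pandas version and settings.

```python
import pandas as pd

df = pd.DataFrame({'Tahani': [1, 2, 3], 'Jason': [10, 20, 30]})

# Reading: all three forms return the same value.
via_chain = df['Jason'][1]   # two steps: select the column, then label 1
via_at = df.at[1, 'Jason']   # one step: row label and column label
via_iloc = df.iloc[1, 1]     # one step: row position and column position
print(via_chain, via_at, via_iloc)

# Writing: use a single indexing operation so the write reaches df itself.
df.at[1, 'Jason'] = via_at + 5
print(df.at[1, 'Jason'])
```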
The following code shows an experiment with a copy (to be finished):
    copy_of_my_dataframe = my_dataframe.copy()
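One possible way to finish the experiment, mirroring the reference experiment above, is sketched below. It uses a small fixed DataFrame rather than the random one, and the names reference_to_df and copy_of_df are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Tahani': [1, 2, 3], 'Jason': [10, 20, 30]})

reference_to_df = df     # a second name for the same object
copy_of_df = df.copy()   # an independent copy (deep by default)

# Modify a cell in df.
df.at[1, 'Jason'] = df.at[1, 'Jason'] + 5

print("  Updated df: %d" % df.at[1, 'Jason'])
print("  reference_to_df: %d" % reference_to_df.at[1, 'Jason'])
print("  copy_of_df: %d" % copy_of_df.at[1, 'Jason'])
```

The reference follows the change, while the copy keeps the starting value.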
From: https://www.cnblogs.com/ArmRoundMan/p/18360508