pandas\dataframe

Date: 2024-07-29 20:06:07

Tags: subset, netflix, movie, dataframe, movies, duration, 1990s, pandas

# Importing pandas and matplotlib
import pandas as pd
import matplotlib.pyplot as plt

# Read in the Netflix CSV as a DataFrame
netflix_df = pd.read_csv("netflix_data.csv")

# Subset the DataFrame for type "Movie"
netflix_subset = netflix_df[netflix_df["type"] == "Movie"]

# Filter the subset to keep only movies released in the 1990s
# Start by filtering out movies that were released before 1990
subset = netflix_subset[(netflix_subset["release_year"] >= 1990)]

# And then do the same to filter out movies released on or after 2000
movies_1990s = subset[(subset["release_year"] < 2000)]

# Alternatively, combine both conditions with the & operator to filter in a single step
# movies_1990s = netflix_subset[(netflix_subset["release_year"] >= 1990) & (netflix_subset["release_year"] < 2000)]

# Plot a histogram of the duration column to see the distribution of movie durations
# Note which bar is tallest and save its approximate duration; this doesn't need to be exact
plt.hist(movies_1990s["duration"])
plt.title('Distribution of Movie Durations in the 1990s')
plt.xlabel('Duration (minutes)')
plt.ylabel('Number of Movies')
plt.show()

duration = 100

# Filter the data again to keep only the Action movies
action_movies_1990s = movies_1990s[movies_1990s["genre"] == "Action"]

# Use a for loop and a counter to count how many short action movies there were in the 1990s

# Start the counter
short_movie_count = 0

# Iterate over the labels and rows of the DataFrame; add 1 to the counter
# whenever the duration is less than 90, otherwise leave the counter unchanged
for label, row in action_movies_1990s.iterrows():
    if row["duration"] < 90:
        short_movie_count += 1

print(short_movie_count)

# A quicker way of counting values in a column is to use .sum() on the desired column
# (action_movies_1990s["duration"] < 90).sum()
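
The two shortcuts mentioned in the comments (combining conditions with & and counting with .sum()) can be checked against the loop on a small example. The sketch below is only an illustration: sample_df and its rows are invented stand-ins for netflix_data.csv, and Series.between is swapped in for the two release_year comparisons (it is inclusive on both ends by default, so 1990 to 1999 covers the same decade).

```python
import pandas as pd

# Hypothetical stand-in for netflix_data.csv; these rows are invented for illustration.
sample_df = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "Movie", "Movie", "Movie"],
    "release_year": [1994, 1999, 1995, 2003, 1991, 1989],
    "genre": ["Action", "Action", "Action", "Action", "Dramas", "Action"],
    "duration": [85, 120, 45, 88, 95, 80],
})

# Build one boolean mask per condition, then combine them with &
is_movie = sample_df["type"] == "Movie"
in_1990s = sample_df["release_year"].between(1990, 1999)  # inclusive on both ends
is_action = sample_df["genre"] == "Action"
action_movies_1990s = sample_df[is_movie & in_1990s & is_action]

# Loop-based count, as in the script above
loop_count = 0
for _, row in action_movies_1990s.iterrows():
    if row["duration"] < 90:
        loop_count += 1

# Vectorized count: True counts as 1 when summing a boolean Series
vectorized_count = int((action_movies_1990s["duration"] < 90).sum())

print(loop_count, vectorized_count)
```

Both counts should agree on any input. The vectorized form avoids iterrows(), which builds a Series object for every row and tends to be slow on large DataFrames.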

From: https://www.cnblogs.com/hotshotgg/p/18330953
