This is an extension of this accepted answer.
My DataFrame:
import pandas as pd
df = pd.DataFrame(
    {
        'a': [-3, -1, -2, -5, 10, -3, -13, -3, -2, 1, 2, -100],
        'b': [1, 2, 3, 4, 5, 10, 80, 90, 100, 99, 1, 12]
    }
)
Expected output:
     a    b
5   -3   10
6  -13   80
7   -3   90
8   -2  100
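Logic:
a) Select the longest streak of negative numbers in 'a'.
b) If two streaks have the same size, I want the one with the greater sum of 'b'. In df there are two streaks of size 4, but I want the second one because its sum of 'b' is greater.
My attempt: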
import numpy as np
s = np.sign(df['a'])
df['g'] = s.ne(s.shift()).cumsum()
df['size'] = df.groupby('g')['g'].transform('size')
df['b_sum'] = df.groupby('g')['b'].transform('sum')
import numpy as np
def find_group_with_most_negatives(df):
    """
    Finds the longest run of consecutive negative numbers in column 'a',
    breaking ties by selecting the run with the largest sum in column 'b'.

    Args:
        df (pd.DataFrame): The input DataFrame with columns 'a' and 'b'.

    Returns:
        pd.DataFrame: A subset of the input DataFrame containing the group with the most negatives.
    """
    # Calculate sign changes and group consecutive negative/positive sequences
    s = np.sign(df['a'])
    df['g'] = s.ne(s.shift()).cumsum()

    # Calculate group size and sum of 'b' for each group
    df['size'] = df.groupby('g')['g'].transform('size')
    df['b_sum'] = df.groupby('g')['b'].transform('sum')

    # Keep only rows that belong to negative runs, so that a longer
    # positive run cannot be selected by mistake
    negatives = df[s < 0]

    # Find the group(s) with the maximum size (number of negatives)
    max_size_group = negatives['size'].max()
    df_filtered_size = negatives[negatives['size'] == max_size_group]

    # Among groups tied on size, keep the one with the maximum sum of 'b'
    max_b_sum = df_filtered_size['b_sum'].max()
    df_filtered_size = df_filtered_size[df_filtered_size['b_sum'] == max_b_sum]

    return df_filtered_size[['a', 'b']]
# Apply the function to your DataFrame
result_df = find_group_with_most_negatives(df.copy())
# Print the result
print(result_df)
Explanation:
- Sign Changes and Grouping: This part remains the same as your attempt. It calculates sign changes in column 'a' and groups consecutive negatives.
- Group Calculations: Like your attempt, it calculates the size of each group (number of negatives) and the sum of values in column 'b' for each group.
- Filtering by Maximum Size: It identifies the group with the maximum size.
- Filtering by Maximum Sum (Tiebreaker): If there are multiple groups with the same maximum size, it further filters these groups to select the one with the maximum sum of 'b' values.
- Returning the Subset: Finally, it returns the subset of the DataFrame corresponding to the selected group.
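The run-labelling step from the first bullet can be hard to picture, so here is a minimal sketch on a small made-up series (the values and the names `starts` and `run_id` are illustrative, not from the question). It shows how comparing each sign with the previous one and taking a cumulative sum produces one label per run.

```python
import numpy as np
import pandas as pd

# Small illustrative series (hypothetical data, not the question's DataFrame)
a = pd.Series([-1, -2, 3, -4, -5, -6])

s = np.sign(a)              # -1 for negatives, 1 for positives, 0 for zeros
starts = s.ne(s.shift())    # True wherever the sign differs from the previous row
run_id = starts.cumsum()    # counting the run starts gives one label per run

print(pd.DataFrame({'a': a, 'sign': s, 'run_id': run_id}))
```

Note that `np.sign` returns 0 for a zero, so a zero in 'a' forms its own run and splits a negative streak in two.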
For the sample DataFrame, the function returns rows 5 through 8, which matches the expected output.
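As a possible alternative, the same selection could be sketched without helper columns by aggregating once per negative run and sorting. This is only a sketch under the question's stated rules; the names `neg`, `run_id`, `stats`, and `best_run` are made up for illustration, and it assumes 'a' contains at least one negative value (otherwise `index[0]` raises an IndexError).

```python
import pandas as pd

df = pd.DataFrame({
    'a': [-3, -1, -2, -5, 10, -3, -13, -3, -2, 1, 2, -100],
    'b': [1, 2, 3, 4, 5, 10, 80, 90, 100, 99, 1, 12],
})

neg = df['a'] < 0
# Label runs: a new run starts wherever the negative/non-negative flag flips
run_id = neg.ne(neg.shift()).cumsum().rename('run_id')

# One row per negative run: its length and its sum of 'b'
stats = (
    df[neg]
    .groupby(run_id[neg])
    .agg(size=('a', 'size'), b_sum=('b', 'sum'))
)

# Sort by size, then by b_sum, both descending; the first row is the chosen run
best_run = stats.sort_values(['size', 'b_sum'], ascending=False).index[0]
result = df[neg & run_id.eq(best_run)]
print(result)
```

If two runs tie on both size and sum of 'b', this version returns just one of them, whereas filtering on the maximum values returns the rows of every tied run.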
Tags: python, pandas, dataframe From: 78828009