Sum over permutations in a pandas DataFrame grows superexponentially

Time: 2024-07-20 22:02:50  Views: 21
Tags: python pandas dataframe algorithm group-by


I have the following pandas dataframe:

import pandas as pd

data = {
  "Race_ID": [2,2,2,2,2,5,5,5,5,5,5],
  "Student_ID": [1,2,3,4,5,9,10,2,3,6,5],
  "theta": [8,9,2,12,4,5,30,3,2,1,50]
}

df = pd.DataFrame(data)

and I want to create a new column df['feature'] as follows: for each Race_ID, let i be a Student_ID in that race; the feature for i is then defined as

[image: formula defining the feature for student i]

def f(thetak, thetaj, thetai, *theta):
  # product of all remaining thetas passed after thetai
  prod = 1
  for t in theta:
    prod = prod * t
  return ((thetai + thetaj) / (thetai + thetaj + thetai * thetak)) * prod

where k, j, l are Student_IDs in the same Race_ID such that k ≠ i, j ≠ i, k and l ≠ k, j, i, and theta_i is the theta of the student whose Student_ID equals i. For example, for Race_ID = 2 and Student_ID = 1, writing each Student_ID in place of its theta, the feature equals

f(2,3,1,4,5) + f(2,3,1,5,4) + f(2,4,1,3,5) + f(2,4,1,5,3) + f(2,5,1,3,4) + f(2,5,1,4,3)
+ f(3,2,1,4,5) + f(3,2,1,5,4) + f(3,4,1,2,5) + f(3,4,1,5,2) + f(3,5,1,2,4) + f(3,5,1,4,2)
+ f(4,2,1,3,5) + f(4,2,1,5,3) + f(4,3,1,2,5) + f(4,3,1,5,2) + f(4,5,1,2,3) + f(4,5,1,3,2)
+ f(5,2,1,3,4) + f(5,2,1,4,3) + f(5,3,1,2,4) + f(5,3,1,4,2) + f(5,4,1,2,3) + f(5,4,1,3,2)

which equals 299.1960138012742.
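As a sanity check on the definition, the sum in this example can be evaluated by brute force with itertools.permutations. This is only a sketch: the helper name brute_force_feature is made up for illustration, and it assumes that the arguments of f in the example stand for the theta values of the listed students.

```python
from itertools import permutations

def f(thetak, thetaj, thetai, *theta):
    prod = 1
    for t in theta:
        prod *= t
    return ((thetai + thetaj) / (thetai + thetaj + thetai * thetak)) * prod

def brute_force_feature(thetas, i):
    """Hypothetical helper: sum f over every ordering of the thetas other than thetas[i]."""
    others = [t for pos, t in enumerate(thetas) if pos != i]
    total = 0.0
    # The first two entries of each ordering play the roles of theta_k and theta_j;
    # whatever is left is passed as *theta.
    for thetak, thetaj, *rest in permutations(others):
        total += f(thetak, thetaj, thetas[i], *rest)
    return total

race_2_thetas = [8, 9, 2, 12, 4]  # thetas of students 1, 2, 3, 4, 5 in race 2
value = brute_force_feature(race_2_thetas, 0)  # student 1 is at position 0
print(value)
```

With the thetas of race 2 this should print a value close to 299.196, and the loop visits (n-1)! = 24 orderings, which is exactly the growth problem the question is about.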

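However, one quickly realizes that the number of terms in the sum grows superexponentially with the number of students in a race: if there are n students in a race, the sum has (n-1)! terms. Fortunately, thanks to the symmetry of f, we can reduce the number of terms to just (n-1)(n-2) by noting the following: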

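Let i, j, k be given, and let 1, 2, 3 (say, for the sake of example) be distinct from i, j, k (i.e. 1, 2, 3 are the values passed in *theta). Then f(k,j,i,1,2,3) = f(k,j,i,1,3,2) = f(k,j,i,2,1,3) = f(k,j,i,2,3,1) = f(k,j,i,3,1,2) = f(k,j,i,3,2,1), because f only uses the product of the trailing arguments. Therefore, if we compute just one of these terms and multiply it by (n-3)!, we can reduce the number of terms!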
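Written out, the reduction amounts to the identity below. It is a reconstruction from the definition of f and the worked examples, not the asker's original formula; n is the number of students in the race, R is the set of students other than i, j, k, and the inner sum on the first line runs over all orderings of R.

```latex
\mathrm{feature}_i
  = \sum_{k \neq i} \; \sum_{j \neq i,\, k} \;
    \sum_{(l_1, \dots, l_{n-3}) \text{ ordering of } R}
    f\bigl(\theta_k, \theta_j, \theta_i, \theta_{l_1}, \dots, \theta_{l_{n-3}}\bigr)
  = (n-3)! \sum_{k \neq i} \; \sum_{j \neq i,\, k}
    \frac{\theta_i + \theta_j}{\theta_i + \theta_j + \theta_i \theta_k}
    \prod_{l \in R} \theta_l
```

The outer double sum has (n-1)(n-2) terms, which is where the count in the question comes from.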


For example, if Race_ID = 5 and Student_ID = 9, the sum already has 5! = 120 terms, but using the symmetry above we only need to sum 5 x 4 = 20 terms (5 choices for k, 4 choices for j, and 1 (non-unique) choice for l), namely

f(2,3,9,5,6,10) + f(2,5,9,3,6,10) + f(2,6,9,3,5,10) + f(2,10,9,3,5,6)
+ f(3,2,9,5,6,10) + f(3,5,9,2,6,10) + f(3,6,9,2,5,10) + f(3,10,9,2,5,6)
+ f(5,2,9,3,6,10) + f(5,3,9,2,6,10) + f(5,6,9,2,3,10) + f(5,10,9,2,3,6)
+ f(6,2,9,3,5,10) + f(6,3,9,2,5,10) + f(6,5,9,2,3,10) + f(6,10,9,2,3,5)
+ f(10,2,9,3,5,6) + f(10,3,9,2,5,6) + f(10,5,9,2,3,6) + f(10,6,9,2,3,5)

The feature for student 9 in race 5 is then the above sum multiplied by 3!, which gives 53588.197759.
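The 20-term shortcut for student 9 in race 5 can be compared against the full 120-term sum. This is a minimal sketch under the same reading of f as in the question; reduced_feature and brute_force_feature are illustrative names, not part of the original post.

```python
import math
from itertools import permutations

def f(thetak, thetaj, thetai, *theta):
    prod = 1
    for t in theta:
        prod *= t
    return ((thetai + thetaj) / (thetai + thetaj + thetai * thetak)) * prod

def reduced_feature(thetas, i):
    """Sum one representative per (k, j) pair, then scale by (n-3)!."""
    n = len(thetas)
    positions = [p for p in range(n) if p != i]
    total = 0.0
    for k, j in permutations(positions, 2):  # (n-1)(n-2) ordered pairs
        rest = [thetas[l] for l in positions if l not in (j, k)]
        total += f(thetas[k], thetas[j], thetas[i], *rest)
    return math.factorial(n - 3) * total

def brute_force_feature(thetas, i):
    """Sum f over all (n-1)! orderings of the other thetas."""
    others = [t for pos, t in enumerate(thetas) if pos != i]
    return sum(f(tk, tj, thetas[i], *rest) for tk, tj, *rest in permutations(others))

race_5_thetas = [5, 30, 3, 2, 1, 50]  # thetas of students 9, 10, 2, 3, 6, 5 in race 5
reduced = reduced_feature(race_5_thetas, 0)  # student 9 is at position 0
brute = brute_force_feature(race_5_thetas, 0)
print(reduced, brute)
```

Both numbers should agree up to floating-point rounding and come out close to 53588.197759.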



So my question is: how do I write the sum for the above dataframe? I have computed the features by hand for checking, and the desired outcome looks like:

import pandas as pd

data = {
  "Race_ID": [2,2,2,2,2,5,5,5,5,5,5],
  "Student_ID": [1,2,3,4,5,9,10,2,3,6,5],
  "theta": [8,9,2,12,4,5,30,3,2,1,50],
  "feature": [299.1960138012742, 268.93506341257876, 634.7909309816431, 204.18901708653254, 483.7234700875771, 53588.197759, 9395.539167178009, 78005.26224935807, 92907.8753942894, 118315.38359654899, 5600.243276203378]
}

df = pd.DataFrame(data)

Thank you so much.

import math
from itertools import permutations

import pandas as pd

def f(thetak, thetaj, thetai, *theta):
  # One term of the sum: the ratio in theta_i, theta_j, theta_k
  # times the product of the remaining thetas.
  prod = 1
  for t in theta:
    prod = prod * t
  return ((thetai + thetaj) / (thetai + thetaj + thetai * thetak)) * prod

def calculate_feature(df):
    # Assumes the rows of each race are contiguous in df (so the returned list
    # lines up with the row order) and that every race has at least 3 students.
    features = []
    for race_id in df['Race_ID'].unique():
        thetas = df.loc[df['Race_ID'] == race_id, 'theta'].tolist()
        n = len(thetas)
        for i in range(n):
            feature = 0
            # permutations(..., 2) already guarantees j != k
            for j, k in permutations(range(n), 2):
                if j != i and k != i:
                    other_thetas = [thetas[l] for l in range(n) if l not in (i, j, k)]
                    feature += f(thetas[k], thetas[j], thetas[i], *other_thetas)
            # Each (k, j) pair stands for (n-3)! orderings of the remaining thetas
            features.append(feature * math.factorial(n - 3))
    return features

data = {
  "Race_ID": [2,2,2,2,2,5,5,5,5,5,5],
  "Student_ID": [1,2,3,4,5,9,10,2,3,6,5],
  "theta": [8,9,2,12,4,5,30,3,2,1,50]
}

df = pd.DataFrame(data)
df['feature'] = calculate_feature(df)

This code defines two functions:

  1. f(thetak, thetaj, thetai, *theta) : This function calculates the individual term of your sum, as defined in your question.

  2. calculate_feature(df) : This function iterates through the dataframe and calculates the feature for each Student_ID within each Race_ID .

    • For each Race_ID , it iterates through each student i and sums f over all ordered pairs (j, k) of distinct students other than i , passing the remaining thetas in one fixed order, as you described in your optimization.
    • It then multiplies the sum by (n-3)! to account for the symmetry of f in those remaining thetas.

The final result is stored in a new column called 'feature' in the original dataframe.
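If the Python loops become a bottleneck, one possible alternative is to compute all (i, j, k) terms of a race at once with NumPy broadcasting and apply it per race via groupby. This is a different technique from the loop above, sketched under two assumptions that the question does not state: every theta is nonzero (the product of the remaining thetas is obtained by division) and every race has at least 3 students. The function name race_feature is made up for illustration, and the intermediate arrays take O(n^3) memory per race.

```python
import math

import numpy as np
import pandas as pd

def race_feature(theta):
    """Hypothetical helper: features for all students of one race, given its theta Series."""
    t = theta.to_numpy(dtype=float)
    n = t.size
    ti = t[:, None, None]  # axis 0 indexes i
    tj = t[None, :, None]  # axis 1 indexes j
    tk = t[None, None, :]  # axis 2 indexes k
    ratio = (ti + tj) / (ti + tj + ti * tk)
    # Product of the thetas other than i, j, k (assumes no theta is zero).
    rest = t.prod() / (ti * tj * tk)
    idx = np.arange(n)
    i_ix, j_ix, k_ix = idx[:, None, None], idx[None, :, None], idx[None, None, :]
    distinct = (i_ix != j_ix) & (i_ix != k_ix) & (j_ix != k_ix)
    terms = np.where(distinct, ratio * rest, 0.0)
    values = math.factorial(n - 3) * terms.sum(axis=(1, 2))
    return pd.Series(values, index=theta.index)

data = {
    "Race_ID": [2, 2, 2, 2, 2, 5, 5, 5, 5, 5, 5],
    "Student_ID": [1, 2, 3, 4, 5, 9, 10, 2, 3, 6, 5],
    "theta": [8, 9, 2, 12, 4, 5, 30, 3, 2, 1, 50],
}
df = pd.DataFrame(data)
# Cast to float first so the transformed result is not coerced back to integers.
df["feature"] = df["theta"].astype(float).groupby(df["Race_ID"]).transform(race_feature)
print(df)
```

Because transform aligns results on the index, this version does not depend on the rows of a race being contiguous in the dataframe.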

From: 78760550

