Sum of permutations in a Pandas dataframe growing super-exponentially

Time: 2024-07-20 22:02:50

Tags: python pandas dataframe algorithm group-by

I have a pandas dataframe that looks like

import pandas as pd

data = {
  "Race_ID": [2,2,2,2,2,5,5,5,5,5,5],
  "Student_ID": [1,2,3,4,5,9,10,2,3,6,5],
  "theta": [8,9,2,12,4,5,30,3,2,1,50]
}

df = pd.DataFrame(data)

and I want to create a new column df['feature'] by the following method: for each Race_ID, suppose the Student_ID equals i; then we define the feature as a sum of terms of the form

def f(thetak, thetaj, thetai, *theta):
  prod = 1
  for t in theta:
    prod = prod * t
  return ((thetai + thetaj) / (thetai + thetaj + thetai * thetak)) * prod

where k, j, l are Student_IDs in the same Race_ID such that k ≠ i, j ≠ i, k, and l ≠ i, j, k, and theta_i is the theta of the student whose Student_ID equals i. Writing each argument as a Student_ID in place of its theta, the feature for Race_ID = 2, Student_ID = 1 is, for example, f(2,3,1,4,5)+f(2,3,1,5,4)+f(2,4,1,3,5)+f(2,4,1,5,3)+f(2,5,1,3,4)+f(2,5,1,4,3)+f(3,2,1,4,5)+f(3,2,1,5,4)+f(3,4,1,2,5)+f(3,4,1,5,2)+f(3,5,1,2,4)+f(3,5,1,4,2)+f(4,2,1,3,5)+f(4,2,1,5,3)+f(4,3,1,2,5)+f(4,3,1,5,2)+f(4,5,1,2,3)+f(4,5,1,3,2)+f(5,2,1,3,4)+f(5,2,1,4,3)+f(5,3,1,2,4)+f(5,3,1,4,2)+f(5,4,1,2,3)+f(5,4,1,3,2)

which equals 299.1960138012742.
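As a sanity check on the definition, the 24-term sum can be brute-forced directly. This is only a minimal sketch: the helper name brute_force_feature and the dictionary keyed by Student_ID are illustrative choices, not part of the original question.

```python
from itertools import permutations


def f(thetak, thetaj, thetai, *theta):
    prod = 1
    for t in theta:
        prod *= t
    return ((thetai + thetaj) / (thetai + thetaj + thetai * thetak)) * prod


# theta values for Race_ID = 2, keyed by Student_ID (taken from the dataframe)
theta = {1: 8, 2: 9, 3: 2, 4: 12, 5: 4}


def brute_force_feature(theta, i):
    """Sum f over every ordering (k, j, l, ...) of the students other than i.

    Hypothetical helper; assumes the race has at least three students.
    """
    others = [s for s in theta if s != i]
    total = 0.0
    for k, j, *rest in permutations(others):
        total += f(theta[k], theta[j], theta[i], *(theta[l] for l in rest))
    return total


feature_1 = brute_force_feature(theta, 1)
print(feature_1)  # should be close to the 299.196... quoted above
```

This enumerates all (n-1)! orderings, so it is only practical for small races.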

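But as one quickly realises, the number of terms in the sum grows super-exponentially with the number of students in a race: if there are n students in a race, then there are (n-1)! terms in the sum. Fortunately, due to the symmetry of f, we can reduce the number of terms to a mere (n-1)(n-2) by noting the following: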
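To get a feel for the gap between the two counts, here is a small sketch that tabulates (n-1)! against (n-1)(n-2) for a few race sizes. The function name term_counts is made up for illustration.

```python
import math


def term_counts(n):
    """Return (full, reduced) term counts for a race with n students."""
    full = math.factorial(n - 1)   # every ordering of the other n-1 students
    reduced = (n - 1) * (n - 2)    # ordered (k, j) pairs only
    return full, reduced


for n in (5, 6, 10, 15):
    full, reduced = term_counts(n)
    print(f"n={n:2d}  full={full:>14,d}  reduced={reduced}")
```

For the two races in the example data (n = 5 and n = 6) this gives 24 versus 12 terms and 120 versus 20 terms respectively.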

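Let i, j, k be given, and let 1, 2, 3 (for the sake of example) be different from i, j, k (i.e. 1, 2, 3 are passed in *theta). Then f(k,j,i,1,2,3) = f(k,j,i,1,3,2) = f(k,j,i,2,1,3) = f(k,j,i,2,3,1) = f(k,j,i,3,1,2) = f(k,j,i,3,2,1), because f only uses the product of the trailing arguments. Hence we can reduce the number of terms by computing any one of them and multiplying it by (n-3)!.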

So, for example, for Race_ID = 5, Student_ID = 9, there would have been 5! = 120 terms to sum, but using the above symmetry we only have to sum 5 x 4 = 20 terms (5 choices for k, 4 choices for j, and one (non-unique) choice for the l's), namely f(2,3,9,5,6,10)+f(2,5,9,3,6,10)+f(2,6,9,3,5,10)+f(2,10,9,3,5,6)+f(3,2,9,5,6,10)+f(3,5,9,2,6,10)+f(3,6,9,2,5,10)+f(3,10,9,2,5,6)+f(5,2,9,3,6,10)+f(5,3,9,2,6,10)+f(5,6,9,2,3,10)+f(5,10,9,2,3,6)+f(6,2,9,3,5,10)+f(6,3,9,2,5,10)+f(6,5,9,2,3,10)+f(6,10,9,2,3,5)+f(10,2,9,3,5,6)+f(10,3,9,2,5,6)+f(10,5,9,2,3,6)+f(10,6,9,2,3,5). The feature for student 9 in race 5 is then the above sum times 3!, i.e. 53588.197759.
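The reduced 20-term sum for this student can be checked numerically. The sketch below is illustrative only: reduced_feature and the Student_ID-keyed dictionary are assumed names, and the code assumes a race has at least three students.

```python
import math
from itertools import permutations


def f(thetak, thetaj, thetai, *theta):
    prod = 1
    for t in theta:
        prod *= t
    return ((thetai + thetaj) / (thetai + thetaj + thetai * thetak)) * prod


# theta values for Race_ID = 5, keyed by Student_ID (taken from the dataframe)
theta = {9: 5, 10: 30, 2: 3, 3: 2, 6: 1, 5: 50}


def reduced_feature(theta, i):
    """Sum f over ordered (k, j) pairs only, then scale by (n-3)!."""
    others = [s for s in theta if s != i]
    total = 0.0
    for k, j in permutations(others, 2):
        rest = [theta[l] for l in others if l not in (k, j)]
        total += f(theta[k], theta[j], theta[i], *rest)
    # len(others) - 2 == n - 3: the number of trailing thetas per term
    return math.factorial(len(others) - 2) * total


# Symmetry check for one fixed (k, j) = (2, 3): reordering the trailing
# thetas should not change the value of f.
one_term = [
    f(theta[2], theta[3], theta[9], *p)
    for p in permutations((theta[5], theta[6], theta[10]))
]

feature_9 = reduced_feature(theta, 9)
print(feature_9)  # should be close to the 53588.197759 quoted above
```

The six entries of one_term correspond to the six orderings of the trailing arguments for a single (k, j) pair, which is exactly the factor of 3! applied at the end.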


So my question is: how do I write this sum for the above dataframe? I have computed the features by hand as a check, and the desired outcome looks like:

import pandas as pd

data = {
  "Race_ID": [2,2,2,2,2,5,5,5,5,5,5],
  "Student_ID": [1,2,3,4,5,9,10,2,3,6,5],
  "theta": [8,9,2,12,4,5,30,3,2,1,50],
  "feature": [299.1960138012742, 268.93506341257876, 634.7909309816431, 204.18901708653254, 483.7234700875771, 53588.197759, 9395.539167178009, 78005.26224935807, 92907.8753942894, 118315.38359654899, 5600.243276203378]
}

df = pd.DataFrame(data)

Thank you so much.

import math
import pandas as pd
from itertools import permutations

def f(thetak, thetaj, thetai, *theta):
    # One term of the sum: a ratio built from thetai, thetaj and thetak,
    # times the product of all remaining thetas.
    prod = 1
    for t in theta:
        prod = prod * t
    return ((thetai + thetaj) / (thetai + thetaj + thetai * thetak)) * prod

def calculate_feature(df):
    # Note: features come back in race order, so this assumes the rows of
    # each Race_ID are contiguous, as in the example data.
    features = []
    for race_id in df['Race_ID'].unique():
        race_df = df[df['Race_ID'] == race_id]
        thetas = race_df['theta'].tolist()
        n = len(thetas)
        for i in range(n):
            feature = 0
            # Ordered pairs (j, k) of distinct students other than i
            for j, k in permutations(range(n), 2):
                if j != i and k != i:
                    other_thetas = [thetas[l] for l in range(n) if l not in (i, j, k)]
                    feature += f(thetas[k], thetas[j], thetas[i], *other_thetas)
            # Each (j, k) pair stands for (n-3)! orderings of the remaining thetas
            features.append(feature * math.factorial(max(n - 3, 0)))
    return features

data = {
  "Race_ID": [2,2,2,2,2,5,5,5,5,5,5],
  "Student_ID": [1,2,3,4,5,9,10,2,3,6,5],
  "theta": [8,9,2,12,4,5,30,3,2,1,50]
}

df = pd.DataFrame(data)
df['feature'] = calculate_feature(df)
print(df)

This code defines two functions:

f(thetak, thetaj, thetai, *theta) : This function calculates the individual term of your sum, as defined in your question.
calculate_feature(df) : This function iterates through the dataframe and calculates the feature for each Student_ID within each Race_ID .
- For each Race_ID , it iterates through each student i and sums f over all ordered pairs ( j , k ) of distinct other students, as you described in your optimization.
- It then multiplies the sum by (n-3)! to account for the symmetry of f .

The final result is stored in a new column called 'feature' in the original dataframe.
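Because calculate_feature returns a plain list in race order, the assignment only lines up when the rows of each race are contiguous. One possible alternative, sketched here under the assumption that an index-aligned result is wanted, is to compute the feature per group with groupby and transform. The helper race_feature is a hypothetical name, not part of the answer above.

```python
import math
from itertools import permutations

import numpy as np
import pandas as pd


def race_feature(theta: pd.Series) -> pd.Series:
    """Feature for every student in one race (hypothetical helper)."""
    t = theta.to_numpy(dtype=float)
    n = len(t)
    out = np.zeros(n)
    for i in range(n):
        others = [m for m in range(n) if m != i]
        total = 0.0
        # Ordered pairs (j, k) of distinct students other than i
        for j, k in permutations(others, 2):
            rest = [t[l] for l in others if l not in (j, k)]
            total += (t[i] + t[j]) / (t[i] + t[j] + t[i] * t[k]) * math.prod(rest)
        # Each pair stands for (n-3)! orderings of the remaining thetas
        out[i] = total * math.factorial(max(n - 3, 0))
    # Keep the group's index so the result aligns with the original rows
    return pd.Series(out, index=theta.index)


df = pd.DataFrame({
    "Race_ID": [2, 2, 2, 2, 2, 5, 5, 5, 5, 5, 5],
    "Student_ID": [1, 2, 3, 4, 5, 9, 10, 2, 3, 6, 5],
    "theta": [8, 9, 2, 12, 4, 5, 30, 3, 2, 1, 50],
})
df["feature"] = df.groupby("Race_ID")["theta"].transform(race_feature)
print(df)
```

This still costs on the order of n^3 operations per race, the same as the loop above; the only change is that the values are matched to rows by index instead of by position. math.prod requires Python 3.8 or later.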

From: 78760550
