Isotonic regression
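Isotonic regression is essentially an automated version of regression on a reliability diagram: it chooses the bin boundaries automatically. As shown in the figure below: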
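In terms of MSE, the loss will be larger than that of an OLS fit, but isotonic regression guarantees that the model's ranking ability is not changed. In my view, a model trained with either contrastive loss or cross-entropy loss will output values that deviate from the true p(y|x), because the optimization objective is not the same thing as estimating p(y|x). If the model was selected by AUC, calibrating with some other method such as logistic regression can change its ranking ability. At that point you have to weigh the trade-off: which matters more, calibration accuracy or ranking ability?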
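To make the "automatic bin boundaries" idea concrete, here is a minimal sketch of pool-adjacent-violators (PAV), the standard algorithm for least-squares isotonic regression. It is an illustration with unit weights, not sklearn's implementation, and the helper name `pav` and the toy labels are made up for this example. Each merged block plays the role of a bin, and the fitted value is the mean label inside it.

```python
import numpy as np

def pav(y):
    """Pool-adjacent-violators: least-squares non-decreasing fit to y (unit weights)."""
    # Each block stores [sum of labels, count]; its fitted value is sum / count.
    blocks = []
    for v in y:
        blocks.append([float(v), 1])
        # Merge backwards while the previous block's mean exceeds the last block's mean.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    # Expand every block back to one fitted value per sample.
    return np.concatenate([np.full(n, s / n) for s, n in blocks])

# Hypothetical toy data: 0/1 labels already sorted by ascending model score.
labels_sorted_by_score = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1])
fitted = pav(labels_sorted_by_score)
print(fitted)
```

On this toy input the violating neighbours are pooled into blocks with means 0.5 and 2/3, and the result is non-decreasing in the score. That is why the ordering of samples can only be preserved or tied, never reversed.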
Although the usage described in [1] requires binning the data in advance, in practice sklearn produces the correct curve without any binning.
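    import matplotlib.pyplot as plt
    import pandas as pd
    from sklearn.isotonic import IsotonicRegression

    # all_prob_list: raw model probabilities; all_label_list: the 0/1 labels.
    ir = IsotonicRegression(out_of_bounds="clip")
    ir_prob_list = y_ = ir.fit_transform(all_prob_list, all_label_list)
    x = all_prob_list
    y = all_label_list
    df = pd.DataFrame({"x": x, "y": y, "y_": y_})
    df = df.sort_values("x")
    plt.plot(df.x, df.y, "C0.")    # raw labels against the predicted probability
    plt.plot(df.x, df.y_, "C1.-")  # fitted isotonic (calibrated) curve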
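The snippet above fits and transforms the same data. A common variation, sketched here under assumptions, is to fit the calibrator on a held-out calibration split and then apply it to unseen scores with `predict`. The synthetic data (a score `s` whose true positive rate is `s**2`) and all variable names below are hypothetical and only serve the illustration.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
# Hypothetical synthetic data: the raw score s over-estimates the true P(y=1 | s) = s**2.
scores = rng.uniform(0, 1, size=4000)
labels = (rng.uniform(0, 1, size=4000) < scores ** 2).astype(int)

# Fit on a calibration split, then apply the fitted mapping to unseen scores.
cal_scores, cal_labels = scores[:2000], labels[:2000]
new_scores = scores[2000:]

# out_of_bounds="clip" maps scores outside the fitted range to the nearest endpoint.
ir = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
ir.fit(cal_scores, cal_labels)
calibrated = ir.predict(new_scores)

# Mean absolute distance to the true probability, before and after calibration.
print(np.mean(np.abs(new_scores - new_scores ** 2)))
print(np.mean(np.abs(calibrated - new_scores ** 2)))
```

Because the fitted mapping is non-decreasing, sorting by `calibrated` never reverses the order given by `new_scores`, although distinct scores can end up tied.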
Appendix: code for computing ECE
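    import numpy as np

    def compute_ece(prob_list, label_list, num_bins=20):
        """Expected calibration error with num_bins equal-width bins on [0, 1]."""
        prob_list = np.asarray(prob_list, dtype=float)
        label_list = np.asarray(label_list, dtype=float)
        bins = np.linspace(0, 1, num_bins + 1)
        # Digitize against the interior edges so indices run 0..num_bins-1 and a
        # probability of exactly 0 lands in the first bin instead of being dropped.
        bin_indices = np.digitize(prob_list, bins[1:-1], right=True)

        bin_sums = np.bincount(bin_indices, weights=prob_list, minlength=num_bins)
        bin_true = np.bincount(bin_indices, weights=label_list, minlength=num_bins)
        bin_total = np.bincount(bin_indices, minlength=num_bins)

        non_empty_bins = bin_total > 0
        # Per-bin gap between the observed positive rate and the mean predicted probability.
        bin_diffs = np.abs(bin_true[non_empty_bins] / bin_total[non_empty_bins]
                           - bin_sums[non_empty_bins] / bin_total[non_empty_bins])

        bin_weights = bin_total[non_empty_bins] / bin_total.sum()
        ece = (bin_diffs * bin_weights).sum()

        return ece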
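For reference, the quantity that `compute_ece` evaluates can be written as follows. The notation is introduced here for illustration and is not from the original post: $B_m$ is the set of samples whose predicted probability falls in the $m$-th of $M$ equal-width bins, $n$ is the total number of samples, $p_i$ is the predicted probability and $y_i \in \{0, 1\}$ the label of sample $i$. Empty bins contribute nothing to the sum.

```latex
\mathrm{ECE}
  = \sum_{m=1}^{M} \frac{|B_m|}{n}
    \left|
      \frac{1}{|B_m|} \sum_{i \in B_m} y_i
      \;-\;
      \frac{1}{|B_m|} \sum_{i \in B_m} p_i
    \right|
```

In the code, `bin_true / bin_total` is the first term inside the absolute value, `bin_sums / bin_total` is the second, and `bin_weights` is the factor $|B_m| / n$.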
[1] http://vividfree.github.io/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0/2015/12/21/classifier-calibration-with-isotonic-regression
From: https://www.cnblogs.com/kunrenzhilu/p/17630969.html