Abstract
Background:
- The de facto industry standard for assessing the quality of DNNs is to check their performance (accuracy) on a collected set of labeled test data.
- Test selection saves labeling effort: only a selected subset of the data is labeled, and that subset is then used to assess the model.
Assumption: the model should have similar prediction accuracy on data that lie at similar distances from the decision boundary.
This paper: Aries
GitHub: https://github.com/wellido/Aries
Task: estimate the performance of DNNs on new unlabeled data using only the information obtained from the original test data
Experiments:
Datasets: CIFAR-10, Tiny-ImageNet
Subjects: CIFAR10-ResNet20, CIFAR10-VGG16, TinyImageNet-ResNet101, TinyImageNet-DenseNet; 13 types of data transformation methods.
Competitors: Cross Entropy-based Sampling, Practical Accuracy Estimation
Results:
- The accuracy estimated by Aries differs from the true accuracy by only 0.03%-2.60%.
- Aries outperforms other labeling-free methods in 50/52 cases.
- Aries outperforms other selection-labeling-based methods in 96/128 cases.
3. Methodology
Finding 1: A DNN has similar accuracy on the data sets that have similar distances to the decision boundary.
Finding 2: There is a linear relationship between the percentage of highly confident data (LVR = 1) and the accuracy on the whole set. Therefore, given some labeled data, if we know 1) the accuracy of the DNN in each bucket and 2) the percentage of highly confident data, it is promising to estimate the accuracy on new unlabeled data.
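A minimal sketch of how the two findings could combine into a bucket-based estimate. It is an illustration, not the Aries implementation. It assumes each input already has a score in [0, 1] acting as a proxy for distance to the decision boundary (for example an LVR-like value), and that buckets are equal-width intervals over that score. The function names, the bucket count, and the fallback for empty buckets are all hypothetical choices made for the example.

```python
import numpy as np

def bucket_accuracy(scores, correct, n_buckets=10):
    """Per-bucket accuracy on the labeled test data.

    scores:  per-sample score in [0, 1] (assumed proxy for boundary distance)
    correct: 1 if the DNN prediction was right, else 0
    """
    scores = np.asarray(scores, dtype=float)
    correct = np.asarray(correct, dtype=float)
    # Equal-width buckets; a score of exactly 1.0 goes into the last bucket.
    idx = np.minimum((scores * n_buckets).astype(int), n_buckets - 1)
    acc = np.zeros(n_buckets)
    for b in range(n_buckets):
        mask = idx == b
        # Assumption: an empty bucket falls back to the overall accuracy.
        acc[b] = correct[mask].mean() if mask.any() else correct.mean()
    return acc

def estimate_accuracy(new_scores, bucket_acc):
    """Estimate accuracy on unlabeled data from its bucket distribution."""
    n_buckets = len(bucket_acc)
    new_scores = np.asarray(new_scores, dtype=float)
    idx = np.minimum((new_scores * n_buckets).astype(int), n_buckets - 1)
    # Fraction of the new data that falls into each bucket.
    weights = np.bincount(idx, minlength=n_buckets) / len(idx)
    # Weighted sum of the per-bucket accuracies from the labeled data.
    return float(weights @ bucket_acc)

# Toy usage with two buckets.
acc = bucket_accuracy([0.05, 0.05, 0.95, 0.95], [0, 1, 1, 1], n_buckets=2)
est = estimate_accuracy([0.1, 0.9, 0.9, 0.9], acc)
```

The estimate relies on Finding 1: per-bucket accuracy measured on the labeled data is assumed to carry over to new data that falls into the same bucket, so only the bucket proportions of the new data need to be computed.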