- WeChat public account: 机器学习炼丹术
- Notes by: 陈亦新
- Reference paper: DeepSTAPLE: Learning to predict multimodal registration quality for unsupervised domain adaptation
- Reference code: multimodallearning/deep_staple (github.com)
Introduction
To overcome these issues, transferring existing annotations from a labelled source domain to the target domain is desirable.
Achieving this label transfer requires two steps:
- First, multiple sample annotations are transferred to the target images via image registration.
- Secondly, label fusion can be applied to build a label consensus.
The notions of label fusion and label consensus need to be understood in more detail later.
Method
Data Parameters
The paper above formulates the data parameter and curriculum-learning approach as a modification that alters the logits fed into the loss function.
With a learnable logits weighting, improvements can be shown in different scenarios, e.g. noisy training samples, or different classes being weighted during training. The Data Parameters (DP) here focus on per-sample parameters. For the STAPLE strategy, refer to the paper below:
We will come back to that paper later; it presumably describes some strategy for computing the STAPLE consensus.
First, a sigmoid is applied to the learned DP.
The loss is then simply the segmentation cross-entropy, with the sigmoid-activated DP acting as the weight of each sample (see the sketch below):
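A minimal PyTorch sketch of this per-sample weighting, under my assumptions (not the repo's exact code): the data parameters are stored in a sparse `nn.Embedding` so that `SparseAdam` can update only the samples seen in a batch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PerSampleDataParameters(nn.Module):
    """One learnable scalar per training sample; sigma(dp_i) weights that sample's loss."""
    def __init__(self, num_samples: int, init_value: float = 0.0):
        super().__init__()
        # sparse=True so a SparseAdam optimizer only touches the parameters of samples in the batch
        self.dp = nn.Embedding(num_samples, 1, sparse=True)
        nn.init.constant_(self.dp.weight, init_value)

    def forward(self, sample_idx: torch.Tensor) -> torch.Tensor:
        # returns sigma(dp_i) for the samples in the batch, shape (B,)
        return torch.sigmoid(self.dp(sample_idx).squeeze(-1))

def dp_weighted_ce(logits, target, sample_weight):
    """Segmentation cross-entropy, weighted per sample by sigma(dp_i)."""
    # logits: (B, C, H, W), target: (B, H, W), sample_weight: (B,)
    ce = F.cross_entropy(logits, target, reduction="none")  # (B, H, W)
    per_sample = ce.flatten(1).mean(dim=1)                  # (B,)
    return (sample_weight * per_sample).mean()
```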
Risk Regularisation
[Why is this needed?]
Even when a foreground class is present in the image and a registered target label only contains background voxels, the network can achieve a zero-loss value by overfitting.
As a consequence, upweighting the over-fitted samples does no harm in terms of loss reduction, which leads to maximally noisy (empty) samples being upweighted.
Therefore a risk regularisation term is added to encourage the network to take risks; in it, $p^{+}$ and $p^{-}$ denote the positive and negative predicted voxel counts, respectively.
A sample can reduce its loss by predicting more target voxels, i.e. by behaving like a clean sample.
The formulation is balanced, because if the prediction is incorrect, predicting more foreground voxels will also increase the cross-entropy part of the loss.
I do not fully understand this part yet.
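A rough sketch of what such a risk term could look like, under my assumptions (binary task with channel 1 as foreground, soft softmax voxel counts to keep the term differentiable, imports continued from the sketch above); the paper's exact formula may differ.

```python
def risk_regularised_loss(logits, target, sample_weight):
    """DP-weighted cross-entropy minus a bonus proportional to the predicted foreground fraction."""
    ce = F.cross_entropy(logits, target, reduction="none").flatten(1).mean(dim=1)  # (B,)
    probs = logits.softmax(dim=1)
    pos = probs[:, 1].flatten(1).sum(dim=1)  # soft count of predicted foreground voxels (p+)
    neg = probs[:, 0].flatten(1).sum(dim=1)  # soft count of predicted background voxels (p-)
    fg_fraction = pos / (pos + neg)          # fraction of predicted foreground, in [0, 1]
    # Predicting more foreground earns a loss reduction; if those voxels are wrong,
    # the growing cross-entropy term outweighs the bonus, so the scheme stays balanced.
    return (sample_weight * (ce - fg_fraction)).mean()
```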
Fixed weighting scheme
We found that the learned parameters correlate strongly with the number of ground-truth voxels present in the sample. Applying a fixed compensation weighting to the data parameters can improve the correlation between the learned parameters and our target scores.
This effectively applies a correction to the DP, because the raw DP were found to carry a systematic bias.
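Only to illustrate the idea (the exact compensation function is not spelled out here, so the functional form and the exponent below are pure assumptions): divide each parameter by a power of its sample's labelled voxel count to remove the size correlation.

```python
def compensated_scores(dp_values, label_voxel_counts, kappa=0.5):
    """Hypothetical fixed compensation: remove the correlation between the data
    parameters and label size by dividing by a power of the labelled voxel count.
    kappa and the division itself are assumptions, not the paper's exact scheme."""
    return dp_values / (label_voxel_counts.clamp(min=1).float() ** kappa)
```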
Out-of-line backpropagation process for improved stability
There is an inter-dependency between the data parameters and the model parameters, which causes problems in the early training phase when predictions are still inaccurate.
This is solved with a two-step approach (see the sketch after this paragraph):
- first train the main model,
- then train the data parameters (out-of-line).
This keeps training stable while still estimating the label noise.
[What does out-of-line mean?]
When using the out-of-line, two-step approach, data parameter optimization becomes a hypothesis of "what would help the model optimize right now?" without actually intervening.
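A minimal sketch of how such a two-step update could be wired (my reading; the repo's actual loop may differ): the model is updated with the plain loss, and the data parameters are updated afterwards on detached logits, so their gradients never reach the network. Names such as `model_opt`, `dp_opt` and the helpers from the earlier sketches are assumptions.

```python
def train_step_out_of_line(model, data_params, model_opt, dp_opt, images, targets, sample_idx):
    """One two-step update: (1) in-line model update, (2) out-of-line data-parameter update."""
    # Step 1: ordinary update of the segmentation network; data parameters are not involved
    model_opt.zero_grad()
    logits = model(images)
    F.cross_entropy(logits, targets).backward()
    model_opt.step()

    # Step 2: update the data parameters on detached logits, so they only *observe*
    # the current model state ("what would help the model right now?") without changing it
    dp_opt.zero_grad()
    weights = data_params(sample_idx)                        # sigma(dp_i), shape (B,)
    dp_weighted_ce(logits.detach(), targets, weights).backward()
    dp_opt.step()
```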
Consensus generation via weighted voting
This part probably only becomes clear after reading the code.
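Going by the name of this step, a plausible weighted-voting sketch (an assumption, not taken from the repo): each registered atlas label votes with weight sigma(dp), and the consensus is the thresholded weighted sum.

```python
import torch

def weighted_vote_consensus(warped_labels, dp_values, threshold=0.5):
    """warped_labels: (N, D, H, W) binary labels registered onto one fixed image;
    dp_values: (N,) learned data parameters of the corresponding moving images."""
    w = torch.sigmoid(dp_values)                                      # per-label confidence in (0, 1)
    w = w / w.sum()                                                   # normalise the votes
    prob = (w.view(-1, 1, 1, 1) * warped_labels.float()).sum(dim=0)  # soft consensus map
    return (prob >= threshold).long()                                 # binary consensus label
```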
Experiments
Still filling this part in…
Dataset
For the experiments, a multimodal segmentation task was chosen that is part of the CrossMoDA challenge. (I have worked on this challenge before; it is a multimodal domain-transfer challenge.)
The data contains:
- contrast-enhanced T1-weighted brain tumour MRI scans (384/448 × 384/448 × … vox @ …)
- high-resolution T2-weighted images (… vox @ …)
We also used the TCIA dataset to provide the omitted labels of the CrossMoDA challenge, which served as oracle labels.
Preprocessing:
- Prior to training, isotropic resampling to 0.5 mm × 0.5 mm × 0.5 mm was performed, as well as cropping the data to 128 × 128 × 128 vox around the tumour.
- This raises the question of how the data is cropped to 128 vox around the tumour. My guess is that the crop is derived from the labels, which effectively imposes a constraint on which data can be included (see the sketch after this list).
- We omitted the provided cochlea labels and train on binary masks of background/tumour, i.e. only a binary segmentation task.
- As the tumour is contained in either the right or the left hemisphere, right-sided samples were flipped to provide pre-oriented training data, and data without tumour structures was omitted.
- For the 2D experiments we sliced along the last data dimension.
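A sketch of this preprocessing under the assumptions above (label-derived crop centre, last axis as the left-right axis); the repo's actual pipeline is likely more careful about borders and orientation.

```python
import torch
import torch.nn.functional as F

def preprocess_case(image, label, spacing_mm, crop=128):
    """0.5 mm isotropic resampling, flip of right-sided tumours, 128^3 crop around the tumour."""
    # Resample to 0.5 mm isotropic spacing; image/label: (D, H, W) tensors, spacing_mm: per-axis mm
    new_size = [round(d * s / 0.5) for d, s in zip(image.shape, spacing_mm)]
    img = F.interpolate(image[None, None].float(), size=new_size,
                        mode="trilinear", align_corners=False)[0, 0]
    lab = F.interpolate(label[None, None].float(), size=new_size, mode="nearest")[0, 0].long()

    # Flip right-sided tumours so all training data is pre-oriented (assumes last axis = left-right)
    centre = torch.nonzero(lab).float().mean(dim=0).round().long()   # tumour centroid from the label
    if centre[-1] > lab.shape[-1] // 2:
        img, lab = img.flip(-1), lab.flip(-1)
        centre[-1] = lab.shape[-1] - 1 - centre[-1]

    # Crop a 128^3 box centred on the tumour (border padding omitted for brevity)
    starts = [int(max(min(c - crop // 2, s - crop), 0)) for c, s in zip(centre.tolist(), lab.shape)]
    window = tuple(slice(st, st + crop) for st in starts)
    return img[window], lab[window]
```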
Model and training settings
【2D segmentation】
- For 2D segmentation, an LR-ASPP MobileNetV3-Large model is employed.
- AdamW optimizer, learning rate 0.0005, batch size 32, cosine annealing schedule with warm restarts after 500 batch steps and a multiplication factor of 2.
- For the data parameters, the SparseAdam optimizer implementation is used.
【3D segmentation】
- For 3D experiments we use a custom 3D-MobileNet backbone with an adapted 3D-LR-ASPP head.
- Learning rate 0.01, batch size 8, exponentially decayed scheduling with factor 0.99.
- During training, no weight clipping and no l2 weight decay are applied to the data parameters.
- The data parameters (DP) were initialized with a value of 0.
- For all experiments, spatial affine and b-spline augmentation as well as random-noise augmentation on the image intensities were used.
- Prior to augmentation, the input images and labels were upscaled to 256 × 256 px for 2D and 192 × 192 × 192 vox for 3D training.
- The data was split into one third for validation and two thirds for training.
- Global class weights are used.
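As an illustration, the 2D optimizer setup could be wired roughly like this (a sketch of the listed hyper-parameters, not the repo's code; `model` and `data_params` are assumed to exist, and the DP learning rate is not stated above).

```python
import torch

def build_optimizers(model, data_params):
    model_opt = torch.optim.AdamW(model.parameters(), lr=5e-4)
    # Cosine annealing with warm restarts: first restart after 500 batch steps, period doubled each time
    model_sched = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(model_opt, T_0=500, T_mult=2)
    # Data parameters live in a sparse nn.Embedding, hence SparseAdam;
    # their learning rate is not given above, so the default is kept as a placeholder
    dp_opt = torch.optim.SparseAdam(data_params.parameters())
    return model_opt, model_sched, dp_opt
```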
Experiment I
- 2D model training on artificially disturbed ground-truth labels
Experiment II
- 2D model training on quality-mixed registered single-atlas labels
- 30 T1-weighted images are used as fixed targets, and the T2-weighted images with their labels as moving pairs.
- Registration was performed with the Convex Adam method.
- Two registration-quality settings were selected to show their influence on training:
- best-quality registration: the single best registration, with an average of around 80% oracle-Dice across all atlas registrations
- Combined-quality:
Experiment III
- 3D registration was performed with iterative deeds and Convex Adam.
- An interesting observation: suppose there are 40 fixed images and each fixed image is registered with 10 moving images; then in the end we obtain 400 pairs of fixed images and warped labels.
The figure above shows the difference between in-line and out-of-line training of the data parameters; in-line is clearly worse. Besides Dice, the Spearman correlation is also shown. Can the Spearman correlation really be computed between these two quantities?
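My reading of what is being correlated (an assumption): the per-sample data-parameter scores sigma(dp) against the per-sample oracle Dice of the registered labels, e.g.:

```python
import torch
from scipy.stats import spearmanr

def dp_oracle_correlation(dp_values, oracle_dice):
    """Spearman rank correlation between learned data parameters and oracle Dice scores.
    Assumes one dp value and one oracle Dice value per registered training sample."""
    scores = torch.sigmoid(dp_values).detach().cpu().numpy()
    rho, pval = spearmanr(scores, oracle_dice)
    return rho, pval
```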
Experiment IV
- Consensus generation and subsequent network training
- With the two registration methods, two consensi were obtained:
- 10 deeds-registered labels @ 40 fixed images
- 10 Convex Adam-registered labels @ 40 fixed images
- Consensi were built by applying the STAPLE algorithm as a baseline and, opposed to that, our proposed weighted-sum method on the data parameters.
- On top of these consensus labels, several nnU-Net models were trained for segmentation.
This is the final performance comparison: the STAPLE- and DP-based methods reach Dice scores roughly between 63 and 67.