
A Literature Survey about Why Is Prompt Tuning for Vision-Language Models Robust to Noisy Labels


I.Summary Overview

Background: A vision-language model can be adapted to a new classification task through few-shot prompt tuning. The authors find that such a prompt tuning process is highly robust to label noise.
Interest: Studying the key reasons contributing to the robustness of the prompt tuning paradigm.

Findings:

  1. the fixed classname tokens provide a strong regularization to the optimization of the model, reducing gradients induced by the noisy samples;
  2. the powerful pre-trained image-text embedding that is learned from diverse and generic web data provides strong prior knowledge for image classification.

II.Research Interests

The author studies the key reasons contributing to the robustness of the prompt tuning paradigm.

III.Problems Solved

In their work, the authors demonstrate that prompt tuning is robust to noisy labels and investigate the mechanisms that enable this robustness.

IV.Previous Research

While prompt tuning has proven effective when training on downstream tasks with accurately annotated datasets, its robustness to noisy labels has been largely neglected.

V.Author's Innovation

The author investigates the mechanisms that enable this robustness and proposes a simple yet effective method for unsupervised prompt tuning, showing that randomly selected noisy pseudo labels can be effectively used to enhance CLIP zero-shot performance.

VI.Author's Contribution

  • We demonstrate that prompt tuning for pre-trained vision-language models (e.g., CLIP) is more robust to noisy labels than traditional transfer learning approaches, such as model fine-tuning and linear probes.
  • We further demonstrate that this robustness can be enhanced through the use of a robust training objective.
  • We conduct an extensive analysis on why prompt tuning is robust to noisy labels to discover which components contribute the most to its robustness.
  • Motivated by this property, we propose a simple yet effective method for unsupervised prompt tuning, showing that randomly selected noisy pseudo labels can be effectively used to enhance CLIP zero-shot performance. The proposed robust prompt tuning outperformed prior work on a variety of datasets, even though noisier pseudo-labels are used for self-training.

VII.Algorithm Flow

Recent Research

  • CLIP: CLIP applies prompt engineering to incorporate the category information in the text input such that its pre-trained model can adapt to various image classification tasks without further training.
  • CoOp: CoOp introduces learnable prompts optimized on the target dataset, removing the need for CLIP's hand-crafted prompt design.
  • ProDA: ProDA addresses CoOp's neglect of visual diversity by learning diverse prompts that capture the distribution of varying visual representations.
  • UPL: UPL proposes a framework to perform prompt tuning without labeled data.
  • TPT: TPT achieves zero-shot transfer by dynamically adjusting prompts using only a single test sample.
  • Label noise-robust learning (an area where prompt tuning shows potential):
    • robust losses that tolerate noisy labels
    • loss correction approaches that estimate a transition matrix to correct the predictions
    • meta-learning frameworks that learn to correct the label noise in training examples
    • regularization techniques that are customized to lower the negative impact of noise

Existing Problems

  • CLIP: the design of a proper prompt is challenging and requires heuristics.
  • CoOp: CoOp has also faced criticism for disregarding the diversity of visual representations.

Author's Processing

  • Demonstrate that prompt tuning on CLIP naturally holds powerful noise robustness.
  • Explore the key factors behind such robustness.
  • Show its application on unsupervised prompt tuning.

Constructed Model

  • CLIP
    In the case of image classification, a normalized image embedding \(\boldsymbol{f}^v\) is obtained by passing an image \(\boldsymbol{x}\) through CLIP's visual encoder, and a set of normalized class embeddings \([\boldsymbol{f}^t_i]^K_{i=1}\) by feeding template prompts of the form "A photo of a [CLASS]" into CLIP's text encoder (a minimal code sketch of these components follows this list).

\[Pr(y=i|\boldsymbol{x})=\frac{\exp(sim(\boldsymbol{f}^v,\boldsymbol{f}^t_i)/\tau)}{\sum_{j=1}^K\exp(sim(\boldsymbol{f}^v,\boldsymbol{f}^t_j)/\tau)} \]

  • Prompt Tuning
    The name of a class \(c\) is first converted into a classname embedding \(\boldsymbol{w_c}\in R^d\) and preceded by a sequence of \(M\) learnable tokens \(\boldsymbol{p_m}\in R^d\) shared across all classes.

\[P_c=[\boldsymbol{p_1}, \boldsymbol{p_2}, \cdots, \boldsymbol{p_M}, \boldsymbol{w_c}]\rightarrow \boldsymbol{f}^t_c \]

CoOp optimizes the shared learnable tokens \(\boldsymbol{p_1}, \boldsymbol{p_2}, \cdots, \boldsymbol{p_M}\) on a small labeled dataset \(D = \{(\boldsymbol{x_i}, c_i)\}^N_{i=1}\) to minimize the cross-entropy loss

\[L_{CE}=-E_{(\boldsymbol{x},c)\in D}[\log Pr(y=c|\boldsymbol{x})]. \]

  • Robust Prompt Tuning
    Further enhance this robustness by optimizing the learnable prompts using the generalized cross-entropy (GCE) loss

\[L_{GCE}=E_{(\boldsymbol{x},c)\in D}[\frac{1-Pr(y=c|\boldsymbol{x})^q}{q}]. \]

  • Author's Conclusion: \(q = 0.7\) leads to overall good performance across several experimental settings.
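
The components above (the zero-shot posterior, the prompt construction, and the CE/GCE objectives) can be summarized in the following minimal PyTorch-style sketch. This is not the authors' released code; the function names (`zero_shot_probs`, `build_prompt`, `gce_loss`) and the temperature default are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def zero_shot_probs(img_feat, text_feats, tau=0.01):
    """img_feat: (d,), text_feats: (K, d); both assumed L2-normalized."""
    sims = text_feats @ img_feat            # cosine similarities, shape (K,)
    return F.softmax(sims / tau, dim=0)     # Pr(y = i | x)

def build_prompt(learnable_tokens, classname_emb):
    """P_c = [p_1, ..., p_M, w_c]: M shared learnable tokens followed by the fixed classname token."""
    return torch.cat([learnable_tokens, classname_emb.unsqueeze(0)], dim=0)  # (M + 1, d)

def ce_loss(probs, target):
    """Cross-entropy L_CE on one sample's class posterior."""
    return -torch.log(probs[target] + 1e-12)

def gce_loss(probs, target, q=0.7):
    """Generalized cross-entropy L_GCE; q -> 0 recovers CE, q = 1 gives MAE."""
    return (1.0 - probs[target] ** q) / q
```

During training only the shared learnable tokens receive gradients; the image encoder, text encoder, and classname embeddings stay frozen, which is the constraint the robustness analysis below builds on.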

VIII.Robustness Analysis

Different Model Structures

Pre-trained CLIP Generates Effective Class Embeddings

  • Author's Conclusions:
    • Classifier-R vs. Classifier-C: CLIP class embeddings provide a strong initialization for few-shot learning.
    • TEnc-FT vs. Classifier-C: The highly expressive CLIP text encoder can easily overfit to the noisy labels.
    • Prompt Tuning vs. Classifiers: The text encoder is essential for providing a strong but informative regularization of the text embeddings to combat noisy inputs.
    • Prompt Tuning vs. TEnc-FT: The text encoder should be fixed to prevent overfitting.

Effectiveness of Prompt

  • Author's Conclusions:
    • Full Prompt Tuning vs. CLS Tuning: The class embeddings generated by the CLIP pre-trained text encoder play a critical role in noise robustness.
  • Hypothesis:
    • The classname token \(\boldsymbol{w_c}\) provides a strong regularization to the model, since it is leveraged by the text encoder to encode relationships between the different visual concepts.

Prompt Tuning Suppresses Noisy Gradients

  • Prompt tuning can suppress gradient updates from noisy samples, while aggregating gradients from clean samples.
  • This property likely arises from the highly constrained prompt tuning optimization, which restricts the model's ability to fit the noisy labels.
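
One way to probe this claim is to compare the gradient magnitude that clean and mislabeled samples induce on the learnable tokens. The sketch below is only an assumption about how such an analysis could be run (the paper does not give exact code); `prompt_tokens` is assumed to be the tensor of shared learnable tokens with `requires_grad=True`.

```python
import torch

def prompt_grad_norm(sample_loss, prompt_tokens):
    """L2 norm of the gradient of one sample's loss w.r.t. the learnable prompt tokens."""
    (grad,) = torch.autograd.grad(sample_loss, prompt_tokens, retain_graph=True)
    return grad.norm().item()

# Averaging prompt_grad_norm over samples whose labels were kept clean versus samples
# whose labels were corrupted indicates whether noisy samples contribute smaller updates.
```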

Generalization Across Model Architectures

  • Context length
    • The optimal context length is dataset dependent.
  • Image encoders
    • ViT-B/32-PT outperforms RN50-PT under most settings. Moreover, both methods do not suffer from a large performance drop and maintain competitive accuracy at high noise rates.

Robustness to Correlated Label Noise

  • Confusion noise: Each mislabeled sample is labeled as the incorrect class that is most favored by zero-shot CLIP.
  • Author's Conclusions:
    • Confusion noise presents a bigger challenge to transfer learning, leading to larger degradation of classification accuracy at high noise ratios compared to random noise.
    • Prompt tuning still achieves the best overall performance, providing further evidence for its robustness even to more challenging types of noise.
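
As an illustration, confusion noise of this kind could be synthesized from zero-shot CLIP posteriors roughly as follows. This is a hedged reconstruction of the protocol described above, not the authors' script; the function and argument names are assumptions.

```python
import torch

def make_confusion_noise(labels, zero_shot_posteriors, noise_ratio, generator=None):
    """Relabel a fraction of samples with the incorrect class most favored by zero-shot CLIP.

    labels: (N,) ground-truth class indices
    zero_shot_posteriors: (N, K) zero-shot CLIP probabilities per sample
    """
    n = labels.numel()
    flip = torch.randperm(n, generator=generator)[: int(noise_ratio * n)]
    probs = zero_shot_posteriors[flip].clone()
    probs[torch.arange(flip.numel()), labels[flip]] = -1.0   # rule out the true class
    noisy_labels = labels.clone()
    noisy_labels[flip] = probs.argmax(dim=1)                  # most-confused wrong class
    return noisy_labels
```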

IX.Application to Unsupervised Prompt Tuning

  • Baseline UPL
    • Phase 1: Leverage pre-trained CLIP to generate pseudo labels for unlabeled images.
    • Phase 2: Select the \(K\) most confident samples per class to optimize the learnable tokens through the typical prompt-tuning optimization process (described in CoOp).
    • Features: UPL improved transfer performance by ensembling multiple predictions generated by models with different learnable prompts.
  • Robust UPL
    • Overview: Building on UPL, randomly sample \(K\) training samples per class and optimize the prompt with the robust GCE loss.
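
A minimal sketch of the selection step that distinguishes Robust UPL from the UPL baseline might look like the following. The helper name is an assumption, and the pseudo labels are taken to be the argmax of the zero-shot CLIP posteriors from Phase 1.

```python
import random
from collections import defaultdict

def random_per_class_subset(pseudo_labels, k, seed=0):
    """Pick K pseudo-labeled samples per class uniformly at random (Robust UPL),
    instead of the K most confident ones (original UPL)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, c in enumerate(pseudo_labels):
        by_class[c].append(idx)
    chosen = []
    for idxs in by_class.values():
        chosen.extend(rng.sample(idxs, min(k, len(idxs))))
    return chosen
```

The selected subset is then used to optimize the learnable prompt with the GCE loss introduced above.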

X.Summary And Views

Summary

This paper focuses on prompt tuning and analyzes why it is naturally robust to label noise. The author also combines these findings with the UPL model and proposes a more robust UPL model for unsupervised prompt tuning.

Personal Views

Firstly, I learned a lot from this paper, which analyzes the robustness of prompt tuning to label noise. Its research spirit and methodology strongly motivate me to work on robustness research. What impresses me most is the robust UPL model, which is the author's innovation over previous research.

XI.Domain Learning

  • Vision-language model
  • text-image embedding and image-text embedding
  • few-shot prompt tuning
  • fixed classname tokens
  • zero-shot learning
  • downstream tasks: few-shot learning, continual learning, object segmentation
  • model-informed structure
  • traditional fine-tuning and linear probing paradigms
  • generalized cross-entropy (GCE)
  • Vision-Language Pre-Trained Models (VL-PTMs)
  • meta-learning

From: https://www.cnblogs.com/LZHMS/p/17998714
