• 2024-09-20 ACL 2024 Conference: A Close Reading of MPLMM
    Paper: Multimodal Prompt Learning with Missing Modalities for Sentiment Analysis and Emotion Recognition - ACL Anthology. Code: GitHub - zrguo/MPLMM: [ACL 2024 Main] Official PyTorch implementation of the paper "Multimodal Prompt Learning with Missing Mo
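  • 2024-09-01 Multimodal Large Models
    A Survey on Multimodal Large Language Models https://arxiv.org/pdf/2306.13549 A multimodal large language model is built on an LLM and can additionally receive, reason over, and output multimodal information. In light of this complementarity, LLM and LVM run towards each other, leading to the new field of Multimodal L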
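  • 2024-08-27 DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming
    DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming. arXiv: http://arxiv.org/abs/2406.19101 Architecture: visual processor + LLM. Visual processor: Swin Transformer. Contributions: (1) removing redundant pixels from the image; (2) removing redundant tokens. Both aim to reduce the parameter count of the model's visual processor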
  • 2024-08-03 How to pass multimodal data directly to models
    How to pass multimodal data directly to models https://python.langchain.com/v0.2/docs/how_to/multimodal_inputs/ Here we demonstrate how to pass multimodal input directly to models. We currently expect all input to be passed in the same format as OpenAI e
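  • 2024-07-14 Robotics Frontiers -- PalmE: An Embodied Multimodal Language Model (Embodied Multimodal Large (Language) Model)
    First, an explanation of the name of this work, Palm-E, published in 2023.03. Palm is the large language model Google developed internally in 2022.04; it is functionally similar to ChatGPT but, for various reasons, never became as popular. E is the first letter of Embodied, so the name translates to embodied multimodal large language model, which we usually call an embodied multimodal large model. What does embodied mean? The word sounds very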
  • 2024-06-21 Reflective Journal Final
    1. Initially, I thought that digital multimodal writing simply combines traditional text writing with multimedia elements such as images, audio, video, etc. However, as I explored this field more deeply, I came to realize that digital multimodal writing is
  • 2024-06-21 Reflective Journal Final
    Reflective Journal Final 1. At the beginning of the course, I had only a blurry understanding of digital multimodal composing. After attending many classes, I gradually grasped the concept of digital multimodal composing. The key resides in “multim
  • 2024-06-20 Reflective Journal Final
    First of all, I would like to thank my teachers, Liu Fulan and Zhou Mengchen, for their guidance throughout the semester. I am also very grateful to the teachers for giving me this opportunity to learn digital multimodal writing systematically. Although I
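  • 2024-06-08 GLaMM: Pixel Grounding Large Multimodal Model
    Disclaimer: for the original paper, see the title; in case of infringement, please contact the author and this post will be taken down! Abstract: Large multimodal models (LMMs) extend large language models to the vision domain. Early LMMs used holistic images and text prompts to generate ungrounded text responses. More recently, region-level LMMs have been used to generate visually grounded responses. However, they are limited to referring to a single object category at a time, require users to specify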
  • 2024-05-27 LGMRec: Local and Global Graph Learning for Multimodal Recommendation
    Contents: Overview, Notation, Motivation, LGMRec, Local Graph Embedding, Global Graph Embedding, Fusion, Code. Guo Z., Li J., Li G., Wang C., Shi S. and Ruan B. LGMRec: Local and global graph learning for multimodal recommendation. AAAI, 2024. Overview: This paper takes a decomposition approach to the ID and modality inf
  • 2024-03-25 Reflective Journal 1
    In the past two weeks, I have learned to infer the emotions of characters from details. For example, in the first class, we observed the protagonist's emotions through the details of the video "The Necklace", and discovered her unwillingnes
  • 2024-03-24 Reflective Journal I
    During the two weeks of study, I have gained a deeper understanding of multimodal writing, which is a combination of text, images, audio and video so as to express information more comprehensively and understand the content more intuitively. Besides, I ha
  • 2024-03-21 A Tale of Two Graphs: Freezing and Denoising Graph Structures for Multimodal Recommendation
    Contents: Overview, FREEDOM, Motivation, Frozen Item-Item Graph, Denoising User-Item Bipartite Graph, Two Graphs for Learning, Code. Zhou X. and Shen Z. A tale of two graphs: Freezing and denoising graph structures for multimodal recommendation. Overview: This paper is mainly an improvement on LATTICE. FREE
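  • 2023-12-16 Instruction-Following Agents with Multimodal Transformer
    Overview: The paper proposes InstructRL, which comprises a multimodal transformer that encodes visual observations and language instructions, and a transformer-based policy that outputs actions from the encoded representations. The former is trained on 1M image-text pairs and natural-language text; the latter tracks the full history of observations and actions and outputs actions autoregressively. Problem: Pure-language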
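  • 2023-01-16 Embracing Domain Differences in Fake News - Cross-domain Fake News Detection using Multimodal Data(AA
    1. Abstract: With the rapid development of social media, fake news has become a major social problem that cannot be resolved in time through manual investigation. This has motivated a large body of research on automatic fake news detection. Most studies have explored news-record-based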
  • 2022-12-21 Paper Walkthrough: Multimodal Machine Translation with Embedding Prediction
    Paper walkthrough: Multimodal Machine Translation with Embedding Prediction. A very important problem in machine translation is the prediction of unknown words and rare words. Much work has
  • 2022-11-25 [Cohort 5, Yang Zhi] CCF-A (KDD'20) Multimodal Learning with Incomplete Modalities by Knowledge Distillation
    Wang Q, Zhan L, Thompson P, et al. Multimodal learning with incomplete modalities by knowledge distillation[C]//Proceedings of the 26th ACM SIGKDD Internatio
  • 2022-10-25 AI Course Library from Top Universities Worldwide (13) | CMU (Carnegie Mellon) · Multimodal Machine Learning Course "Multimodal Machine Learning"