Multimodal

2024-12-27祝大家这周圣诞快乐！！本周进军多模态！From LLMs to MLLMs:Exploring the Landscape of Multimodal Jailbreaking
从LLMs到MLLMs:探索多模态越狱攻击的前景禁止盗用，侵权必究！！！欢迎大家积极举报
2024-12-20Apollo: An Exploration of Video Understanding in Large Multimodal Models
本文是LLM系列文章，针对《Apollo:AnExplorationofVideoUnderstandinginLargeMultimodalModels》的翻译。阿波罗：大型多模态模型中的视频理解探索摘要1引言2现有的视频问答基准有多有效？3缩放一致性：在模型设计过程中，你能做到多小？4探索视频LMM设计空间：什么
2024-12-10驾校预约系统｜Java｜SSM｜VUE｜前后端分离
【技术栈】1⃣️：架构:B/S、MVC2⃣️：系统环境：Windowsh/Mac3⃣️：开发环境：IDEA、JDK1.8、Maven、Mysql5.7+4⃣️：技术栈：Java、Mysql、SSM、Mybatis-Plus、VUE、jquery,html5⃣️数据库可视化工具：navicat6⃣️服务器：SpringBoot自带apachetom
2024-11-30SpringBoot 在新冠密接者跟踪系统中的应用：可扩展性与适应性的完美结合
第3章系统分析在进行系统分析之前，需要从网络上或者是图书馆的开发类书籍中收集大量的资料，因为这个环节也是帮助即将开发的程序软件制定一套最优的方案，一旦确定了程序软件需要具备的功能，就意味着接下来的工作和任务都是围绕着这个方案执行的，所以系统分析需要对程序功能反复
2024-11-29计算机网络八股整理（四）
目录八股整理（四）应用层1：怎么解决tcp粘包？2:tcp的拥塞控制介绍一下？网络场景1：描述一下打开百度首页后发生的网络过程？2：网页非常慢转圈圈的时候需要从哪些方面考虑问题？3：servera和serverb如何判断两个服务器是否正常连接？4：服务器ping不通但是http请求能请求成功，会出现这种情况
2024-11-26树莓派5自启动.py（一）
要在树莓派5上设置名为 ydd5.py 的脚本在启动时自动运行，您可以按照下述步骤使用 systemd 方法进行设置。（脚本名为ydd5.py且存放在/home/wu/wu/YDD目录下）一、使用systemd设置自启动打开终端（Terminal）创建systemd服务文件：输入以下命令来创建一个新的服务文件：sud
2024-09-20ACL会议2024-MPLMM精读
论文地址：MultimodalPromptLearningwithMissingModalitiesforSentimentAnalysisandEmotionRecognition-ACLAnthology代码地址：GitHub-zrguo/MPLMM:[ACL2024Main]OfficialPyTorchimplementationofthepaper"MultimodalPromptLearningwithMissingMo
2024-09-01多模态大模型
ASurveyonMultimodalLargeLanguageModelshttps://arxiv.org/pdf/2306.13549多模态大预言模型，其是基于LLM，同时具有了接收、推理、输出多模态信息的能力。Inlightofthiscomplementarity,LLMandLVMruntowardseachother,leadingtothenewfieldofMultimodalL
2024-08-27DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming
DocKylin:ALargeMultimodalModelforVisualDocumentUnderstandingwithEfficientVisualSlimmingarxiv:http://arxiv.org/abs/2406.19101视觉处理器+LLM：视觉处理器：SwinTransformer创新点：通过：1、去除图片冗余像素；2、去除冗余token。来减小模型中的视觉处理器的参数量
2024-08-03How to pass multimodal data directly to models
Howtopassmultimodaldatadirectlytomodelshttps://python.langchain.com/v0.2/docs/how_to/multimodal_inputs/Herewedemonstratehowtopassmultimodalinputdirectlytomodels.WecurrentlyexpectallinputtobepassedinthesameformatasOpenAIe
2024-07-14机器人前沿--PalmE：An Embodied Multimodal Language Model 具身多模态大(语言)模型
首先解释这篇工作名称Palm-E，发表时间为2023.03，其中的Palm是谷歌内部在2022.04开发的大语言模型，功能类似ChatGPT，只是由于各种原因没有那样火起来，E是Embodied的首字母，翻译过来就是具身多模态大语言模型大模型，我们一般习惯将其称为具身多模态大模型。何为具身？这个词听起来非常
2024-06-21Reflective Journal Final
1.Initially,Ithoughtthatdigitalmultimodalwritingsimplycombinestraditionaltextwritingwithmultimediaelementssuchasimages,audio,video,etc.However,asIexploredthisfieldmoredeeply,Icametorealizethatdigitalmultimodalwritingis
2024-06-21Reflective Journal Final
ReflectiveJournalFinal1.Atthebeginningofthecourse,Ijusthaveablurryunderstandingofdigitalmultimodalcomposing.Afterhavingcoursesformanytimes,Igraduallygraspedtheconceptofdigitalmultimodalcomposing.Thekeyresidesin“multim
2024-06-20Reflective Journal Final
Firstofall,Iwouldliketothankmyteachers,LiuFulanandZhouMengchen,fortheirguidancethroughoutthesemester.Iamalsoverygratefultotheteachersforgivingmethisopportunitytolearndigitalmultimodalwritingsystematically.AlthoughI
2024-06-08GLaMM : Pixel Grounding Large Multimodal Model
郑重声明：原文参见标题，如有侵权，请联系作者，将会撤销发布！ Abstract大型多模态模型(LMM)将大语言模型扩展到视觉领域。最初的LMM使用整体图像和文本提示词来生成无定位的文本响应。最近，区域级LMM已被用于生成视觉定位响应。然而，它们仅限于一次仅引用单个目标类别，要求用户指定
2024-05-27LGMRec Local and Global Graph Learning for Multimodal Recommendation
目录概符号说明MotivationLGMRecLocalGraphEmbeddingGlobalGraphEmbeddingFusion代码GuoZ.,LiJ.,LiG.,WangC.,ShiS.andRuanB.LGMRec:Localandglobalgraphlearningformultimodalrecommendation.AAAI,2024.概本文采用分解的方法进行对ID和模态信
2024-03-25Reflective Journal 1
Inthepasttwoweeks,Ihavelearnedtoinfertheemotionsofcharactersfromdetails.Forexample,inthefirstclass,weobservedtheprotagonist'semotionsthroughthedetailsofthevideo"TheNecklace",anddiscoveredherunwillingnes
2024-03-24Reflective Journal I
Duringthetwoweeksofstudy,Ihavegainedadeeperunderstandingofmultimodalwriting,whichisacombinationoftext,images,audioandvideosoastoexpressinformationmorecomprehensivelyandunderstandthecontentmoreintuitively.Besides,Iha
2024-03-21A Tale of Two Graphs: Freezing and Denoising Graph Structures for Multimodal Recommendation
目录概FREEDOMMotivationFrozenItem-ItemgraphDenoisingUser-ItemBipartiteGraphTwoGraphsforLearning代码ZhouX.andShenZ.Ataleoftwographs:Freezinganddenoisinggraphstructuresformultimodalrecommendation.概本文主要是对LATTICE的改进.FREE
2023-12-16Instruction-Following Agents with Multimodal Transformer
概述提出了InstructRL，包含一个multimodaltransformer用来将视觉obs和语言的instruction进行编码，以及一个transformer-basedpolicy，可以基于编码的表示来输出actions。前者在1M的image-text对和NL的text上进行训练，后者跟踪了整个obs和act的历史，自回归地输出动作。问题纯语言
2023-01-16Embracing Domain Differences in Fake News- Cross-domain Fake News Detection using Multimodal Data(AA
一、摘要随着社交媒体的快速发展，假新闻已经成为一个重大的社会问题，它无法通过人工调查及时解决。这激发了大量关于自动假新闻检测的研究。大多数研究探索了基于新闻记录
2022-12-21论文解读：Multimodal Machine Translation with Embedding Prediction
论文解读：MultimodalMachineTranslationwithEmbeddingPrediction 机器翻译中有一个非常重要的问题即是对未知词（unknownword）和罕见词（rareword）的预测。有许多工作着
2022-11-25【五期杨志】CCF-A (KDD'20) Multimodal Learning with Incomplete Modalities by Knowledge Distillation
WangQ,ZhanL,ThompsonP,etal.Multimodallearningwithincompletemodalitiesbyknowledgedistillation[C]//Proceedingsofthe26thACMSIGKDDInternatio