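2025-01-15
Proj CJI Paper Reading: A Wolf in Sheep’s Clothing: Generalized Nested Jailbreak Prompts can Fool La

Abstract
background: The paper argues that existing jailbreaking methods require either manual effort or a large model; the method proposed here requires neither.
This paper: ReNeLLM
Task: Jailbreaking LLMs in a black-box setting
Method: Prompt Rewriting and Scenario Nesting, using the attacked LLM itself to generate the jailbreak prompts
Prompt Rewriting appears, on every iteration, to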