What Does ChatGPT4 + Stable Diffusion + Midjourney V5 Mean?

Tags: Diffusion GPT4 GPT3.5 prompt AI ChatGPT4 V4 V5

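The AI image-generation service Midjourney has released its v5 model.

There are already a number of articles online introducing v5, and most of them marvel at the model's ability to generate hyper-realistic photos.

Sure, that is a powerful capability: it can easily produce photos that pass for the real thing. But to be honest, the open-source Stable Diffusion community has already put out hyper-realistic photo models too, and the key point is that in creative work "realistic" is just one style among many. Much of the time, what we need may be some other artistic style.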

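Many people have overlooked what is truly awesome about Midjourney V5 this time: V5 leans toward natural-language input rather than a string of keywords (a prompt)!

This is a revolutionary change for AI image models. Until now, every AI image model required a combination of prompt keywords, the "prompt", as its input. The quality of the output was closely tied to the prompt, and this even gave rise to a new job title, "prompt engineer": an engineer who specializes in studying prompts for AI image models.

But now Midjourney V5 actually supports natural-language input!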

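Some will say: even in plain human language, I'm not much good at describing the details in my head.

Don't forget, we also have GPT4.

What if we let GPT4 generate the text description, or even expand an existing prompt directly into a natural-language description? Surely chatterbox GPT could push MJ V5's potential to the limit!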

Talk is cheap, show me the pictures!

Below, let's see what the same incantation (prompt) can achieve with Stable Diffusion, Midjourney V4, V5, V5+GPT3.5, and V5+GPT4.

First, an interstellar battle scene.

Incantation:

starcraft,terran,battlecruiser,combat,base,galaxy,laser gun,fire, quality,extremely detailed CG,unity 8K wallpaper,hyperdetailed,highres,cyber screen frame,absurderes,intricate and refined delicate detailed,cinematic lighting,strong rim light,brighter colours,depth of field

SD:

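This result is passable at best (incidentally, SD's training data for sci-fi content seems insufficient, and its overall performance there is weak).

Midjourney V4 (top) | V5 (bottom)

V4 is already better than SD, and V5 clearly takes the detail a step further.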

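So what if we bring GPT in on top of V5 + prompt?

Just say one simple thing to GPT (the original instruction was in Chinese): "Describe in English a prompt for an AI painting; the keywords are as follows", and then feed it the prompt.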
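As a rough illustration of this step, the message handed to GPT can be assembled from the keyword prompt with a few lines of Python. This is only a sketch: the `INSTRUCTION` text is an English rendering of the instruction quoted above, and the helper names `split_keywords` and `build_gpt_request` are hypothetical, not part of any tool mentioned in this post. The resulting string is simply pasted into the GPT chat.

```python
# Hypothetical helpers for turning a keyword prompt into a request for GPT.
# INSTRUCTION is an English rendering of the instruction quoted in the post.
INSTRUCTION = (
    "Describe in English a prompt for an AI painting; "
    "the keywords are as follows"
)


def split_keywords(prompt: str) -> list[str]:
    """Split a comma-separated prompt into trimmed, non-empty keywords."""
    return [part.strip() for part in prompt.split(",") if part.strip()]


def build_gpt_request(prompt: str, instruction: str = INSTRUCTION) -> str:
    """Put the instruction on the first line and the cleaned keywords below it."""
    keywords = split_keywords(prompt)
    return f"{instruction}:\n" + ", ".join(keywords)


if __name__ == "__main__":
    # A shortened version of the space-battle prompt from this post.
    print(build_gpt_request("starcraft,terran,battlecruiser,combat,base,galaxy"))
```

Cleaning the keywords first (trimming spaces, dropping empty entries left by a trailing comma) keeps the request tidy, but the approach should work just as well if the raw prompt is pasted in unchanged.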

That gives us GPT's chatterbox version of the description. Note that GPT3.5 and GPT4 give different answers:

Natural-language text generated by GPT3.5:

This AI-generated artwork depicts a battle scene in the world of Starcraft, Create a stunning AI-generated painting with the prompt of a battle scene set in the StarCraft universe. The focus of the painting is a Terran battlecruiser engaged in combat, hovering over a strategic base in the midst of a bustling galaxy. The battlecruiser is equipped with a powerful laser gun, firing at enemy forces with intense bursts of fire. The painting should be of extremely high quality and hyperdetailed, featuring intricate and refined delicate details. The unity 8K wallpaper should showcase the painting's impressive high resolution, cyber screen frame, and absurderes style. The cinematic lighting should be strong, with a powerful rim light casting the battlecruiser and base in a bright glow. The colours should be brighter and more intense, adding depth and dimension to the painting's composition. Finally, the depth of field should add a sense of realism, focusing the viewer's attention on the battlecruiser while blurring the background for added visual impact.

Natural-language text generated by GPT4:

Title: "Terran Triumph: An Epic StarCraft Battle in 8K"

In this extremely detailed CG masterpiece, inspired by the iconic StarCraft universe, a Terran battlecruiser dominates the scene, surrounded by a fierce and electrifying combat. Set against the backdrop of a distant galaxy, the intricately designed base bustles with activity, as laser guns discharge fiery beams in a thrilling display of high-resolution warfare.

This stunning 8K wallpaper showcases hyperdetailed visuals, immersing viewers in the captivating world of StarCraft. The cyber screen frame adds a futuristic touch, while the absurdly high resolution ensures that even the most refined, delicate details are meticulously captured.

Cinematic lighting, accompanied by strong rim light, illuminates the scene, highlighting the battlecruiser's sleek design and intensifying the sense of action. Brighter colors punctuate the chaos, guiding the viewer's eye through the vast expanse of the conflict. The depth of field effect brings the central elements into sharp focus, while the unity of the composition anchors the viewer in the heart of the battle.

This exceptional 8K wallpaper pays homage to the beloved StarCraft franchise, combining cutting-edge technology and artistic skill to deliver an unforgettable visual experience filled with intricate details, refined delicacy, and unparalleled realism.

So how does it work in practice?

Feed each generated text into Midjourney V5, and the results are as follows:

V5+GPT3.5:

V5+GPT4:

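Clearly, whether GPT3.5 or GPT4 drives V5, the result has more detail than V5 driven directly by the prompt, and GPT4's result is a notch above GPT3.5's!

Picking one large image from each for comparison, GPT3.5 (top) vs. GPT4 (bottom):

Look at the details on the ships. The details!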

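OK, we have to admit it: humans not only no longer need to paint, they don't even need to rack their brains over prompt combinations; just leave the text generation to GPT4. We only need to tell GPT4 a few key points, and the detail-driven text generation is all left to the AI.

The prompt given to GPT in the example above is still a bit complex; there are very simple examples further down. As a human, you only need to tell the AI a handful of words, and with the unbeatable duo of Midjourney V5 + GPT4, prompt engineers who haven't even started the job can go straight to being unemployed.

Below are more examples. Let's compare SD (prompt only), MJ V4 (prompt only), V5 with the prompt, V5+GPT3.5, and V5+GPT4!

(To save space, GPT's chatterbox output is no longer shown; the prompts are still provided, and you can have GPT generate the text yourselves.)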

Magical girl

Incantation:

magic girl,library underground,candles,anime,posing,very long hair,white hair,detailed beautiful hair,floating hair,diamond earring,emotionless,ribbon choker,intricated filigree,aqua eyes,glowing eyes,crystal textured skin,cloaks,detached collar,summoning,light smile,bracelets,white lace detailed stockings,frilled hat,beautiful pupil,hair ornament,parted lips,magic book,masterpiece,best quality,extremely detailed CG,unity 8K wallpaper,hyperdetailed,highres,cyber screen frame,absurderes,intricate and refined delicate detailed,cinematic lighting,strong rim light,brighter colours,depth of field,

SD series:

MJ V4 | V5

V5+GPT3.5 | GPT4:

Large images:

Girl in the sunshine

Incantation:

Meticulous painting, sunshine, delicate, light, ancient girl, delicate facial features, watery eyes, game

SD:

MJ V4 | V5:

MJ+GPT3.5 | GPT4:

Large images:

Armored girl

Incantation:

extremely detailed CG unity 8k wallpaper, masterpiece,1girl, ahoge, armor, armored_dress, artoriapendragon(fate), blonde_hair, braid, glowing, holding, holding_sword, holding_weapon, looking_at_viewer, solo, weapon, yellow_eyes,album cover,movie poster

SD:

V4 | V5:

V5+GPT3.5 | GPT 4:

Large images:

Now let's look at some scenes:

One in a Japanese anime style:

Incantation:

street on a hill, urban, sunny day, animation concept art, clean, studio ghibli style, Clear reflection, Makoto Shinkai style, Anno Hideaki, Toriyama Akira, Oshii Mamoru, Mamoru Hosoda, Katsura Masakazu, cyberpunk, 1.2 Neon lights, 0.4 8k, octane render, animated by madhouse inc, studio trigger, vibrant color scheme, epic composition, golden ratio, cinematic

V4 | V5:

V5+GPT3.5 | GPT4:

Large images:

A fantasy-style castle scene:

Incantation:

Beautiful fantasy, castle courtyard with a stunning maple tree in the centre, autumn, hyperdetailed, beautiful lighting, Arches, steps, symmetry, shadows, Jordon grimmer, Grzegorz rutkowski, Breath of the wild, Studio ghibli

SD:

V4|V5

V5+GPT3.5 | GPT4:

Large images:

Morning on a small-town hilltop, Makoto Shinkai style

Incantation:

house and street on a hill, sunrise, 2d, urban, by Makoto Shinkai

SD:

V4 | V5:

V5+GPT3.5 | GPT4:

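Large images:

In fact... you don't even need to worry about English prompts; simply ask GPT to translate.

Take the following Chinese keyword description, for example: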

"孙悟空形象的高达，手持机械长棒，极致细节，大师之作，超级华丽，概念美术，游戏CG，杰作，高解析度，穆迪照明，延德尔效应，巨大的星空之门背景，金属,巨型机甲，月光，紫色光粒子特效，雷电特效，对称",

Have GPT4 translate it:

A Gundam with the likeness of Sun Wukong, wielding a mechanical staff, in 8K resolution, as a wallpaper. A masterpiece with ultimate attention to detail, super glamorous, created as concept art on Pixiv for game CG. Featuring high-resolution, moody lighting, and the Van Der Waals effect, with a massive gateway to the starry sky as the background. The metallic giant mecha is illuminated by moonlight, adorned with purple light particle effects, lightning effects, and symmetric features.

And then it becomes a usable incantation.
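The Chinese keyword list above mixes full-width commas ("，") with ASCII commas, which is easy to miss when preparing it for translation. The sketch below normalizes such a list before it is handed to GPT. The function names and the wording of `TRANSLATE_INSTRUCTION` are assumptions made for illustration; the post does not record the exact translation request that was used.

```python
import re

# Assumed wording; the post does not give the exact translation request.
TRANSLATE_INSTRUCTION = (
    "Translate the following keywords into an English description "
    "for an AI painting"
)


def split_mixed_commas(text: str) -> list[str]:
    """Split on ASCII ',' or full-width '，' and drop empty pieces."""
    return [piece.strip() for piece in re.split(r"[,，]", text) if piece.strip()]


def build_translation_request(text: str) -> str:
    """Instruction on the first line, keywords re-joined with full-width commas."""
    return f"{TRANSLATE_INSTRUCTION}:\n" + "，".join(split_mixed_commas(text))


if __name__ == "__main__":
    # A short excerpt of the keyword list from this post.
    print(build_translation_request("金属,巨型机甲，月光，紫色光粒子特效"))
```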

MJ V4 | V5: