python - Generating music from text

Tags: python generate

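Please give me some suggestions.

To explain: I enter "sleep music for deep sleep" and it returns a wav file like https://www.youtube.com/watch?v=1wAdQhFJy54, or I give it a wav file and it returns something similar.

This is what I have tried so far:

https://github.com/facebookresearch/audiocraft but its quality is low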

  import torchaudio
  from audiocraft.data.audio import audio_write
  from audiocraft.models import MusicGen

  # pre, pret_loca, kwargs and input_text are not shown in this snippet
  au_crmode = MusicGen.get_pretrained(pre, cache_dir=pret_loca)
  au_crmode.set_generation_params(duration=kwargs['thoigian'])

  melody, sr = torchaudio.load('./176_183.wav')
  # generates using the melody from the given audio and the provided descriptions.
  # expand(3, ...) repeats the melody for a batch of three descriptions in input_text
  wav = au_crmode.generate_with_chroma(input_text, melody[None].expand(3, -1, -1), sr)

  for idx, one_wav in enumerate(wav):
      # writes {idx}.wav with loudness normalization
      audio_write(f'{idx}', one_wav.cpu(), au_crmode.sample_rate, strategy="loudness")

or

from transformers import AutoProcessor, MusicgenForConditionalGeneration

# pretra, pret_loca, device, input_text and g_scale are not shown in this snippet
processor = AutoProcessor.from_pretrained(pretra[0], cache_dir=pret_loca)
model = MusicgenForConditionalGeneration.from_pretrained(pretra[0], cache_dir=pret_loca).to(device)

sample_length = 30  # seconds
num_tokens = sample_length * model.config.audio_encoder.frame_rate + 3
inputs = processor(
    text=input_text,
    padding=True,
    return_tensors="pt",
).to(device)
audio_values = model.generate(
    **inputs,
    # defines the length of the generated music piece
    max_new_tokens=num_tokens,
    # used in classifier-free guidance; a higher guidance scale encourages the
    # model to generate samples that are more closely linked to the input prompt
    guidance_scale=g_scale,
    # enables stochastic sampling; sampling leads to significantly better
    # results than greedy decoding (do_sample=False)
    do_sample=True,
)

https://github.com/yy1lab/Lyrics-Conditioned-Neural-Melody-Generation I could not get this to run, either on Python 2.7 or on newer versions
https://github.com/lucidrains/musiclm-pytorch the HuBERT models seem to have errors; in my case they are

KeyError: 'hubert_pretraining'

with HuBERT Base or

'NoneType' object has no attribute 'task'

with HuBERT Large

Any ideas?

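Generating music from text is a hard problem. Existing models such as MusicGen (from Audiocraft) are promising, but producing high-quality, varied music is still difficult. Here are some ideas for improving your results, followed by suggestions for the specific problems you ran into: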

Tips for improving generation quality

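Fine-tune a pretrained model: Rather than training from scratch, fine-tune a pretrained model such as MusicGen on a larger music dataset. This can improve both the quality of the output and how well it matches the style you want.
Experiment with generation parameters: MusicGen exposes parameters such as guidance_scale (called cfg_coef in Audiocraft's set_generation_params) and duration. Trying different values can change the results noticeably. A higher guidance_scale ties the output more closely to the input text, while a lower value gives more varied output.
Hierarchical approach: Consider splitting text-to-music generation into smaller, more manageable tasks. For example, first generate a melody or chord progression as MIDI, then use a dedicated model to synthesize it into audio.
Reinforcement learning: Explore reinforcement learning, training the model with a reward function based on perceived quality, consistency, and other music-related criteria.
Generative adversarial networks (GANs): GANs have shown promise for generating realistic images and audio. Train a GAN in which the generator creates music and the discriminator tries to tell real music from generated music.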
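
The parameter experiments suggested above can be organised as a small sweep. The sketch below is only a planning helper and is not part of Audiocraft or transformers: `tokens_for_duration` and `build_sweep` are hypothetical names, the default of 50 frames per second is an assumption (read the real value from `model.config.audio_encoder.frame_rate`), and the `+ 3` margin simply mirrors the question's code. Each resulting dict is shaped so that it could be passed as keyword arguments to `model.generate(**inputs, **settings)` in the transformers snippet from the question.

```python
from itertools import product


def tokens_for_duration(seconds, frame_rate=50, margin=3):
    """Convert a target duration in seconds into a max_new_tokens budget.

    frame_rate=50 is an assumption; prefer the value reported by
    model.config.audio_encoder.frame_rate. margin mirrors the "+ 3"
    used in the question's code.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return int(seconds * frame_rate) + margin


def build_sweep(durations, guidance_scales, temperatures=(1.0,)):
    """Return one settings dict per parameter combination."""
    sweep = []
    for seconds, scale, temp in product(durations, guidance_scales, temperatures):
        sweep.append({
            "max_new_tokens": tokens_for_duration(seconds),
            "guidance_scale": scale,
            "temperature": temp,
            "do_sample": True,  # sampling, as in the question's code
        })
    return sweep


if __name__ == "__main__":
    for settings in build_sweep(durations=[10, 30], guidance_scales=[2.0, 3.0, 4.0]):
        print(settings)
```

Generating one clip per setting with a fixed prompt, then listening to them side by side, makes it easier to hear what each parameter actually changes.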

Addressing the specific code problems

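Low Audiocraft quality: This may be caused by insufficient model training or a limited dataset. Consider fine-tuning the model on a larger dataset for better results and, as mentioned above, try different generation parameters.
Lyrics-Conditioned-Neural-Melody-Generation compatibility: The project may require a specific Python version or specific dependencies. Create a virtual environment and install the required dependencies from the project's requirements.txt file, if it provides one. If you still run into problems, ask for help on the project's GitHub page.
HuBERT errors in musiclm-pytorch: These errors point to a configuration or model-loading problem. Make sure you are using a version of fairseq that is compatible with the HuBERT checkpoint, and that the model loads correctly. Check the project's documentation or issue tracker for fixes for these specific errors.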
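
Version mismatches can be one cause of errors like the ones above, so it may help to record exactly what is installed before digging further or opening an issue. A minimal sketch using only the standard library (Python 3.8+); `report_versions` is a hypothetical helper, and the package list is an assumption based on the libraries mentioned in the question:

```python
import sys
from importlib import metadata


def report_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names are assumptions; adjust them to the project's requirements.
    wanted = ["torch", "torchaudio", "fairseq", "transformers", "audiocraft"]
    print("python", sys.version.split()[0])
    for name, version in report_versions(wanted).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this inside each virtual environment and including the output in a bug report gives maintainers the information they usually ask for first.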

Other resources and models

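MuseNet (OpenAI): MuseNet is a large neural network trained to generate 4-minute musical compositions with different instruments and styles.
Jukebox (OpenAI): Jukebox is a generative music model that produces complete songs, including vocals, as raw audio.
Amper Music: Amper Music is an AI music composition platform that lets users create custom royalty-free music.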

Keep in mind that text-to-music generation is a fast-moving field. Watch for new models, techniques, and datasets that could improve your results.

From: 78809784
