Learning the Beauty in Songs: Neural Singing Voice Beautifier

This repository is the official PyTorch implementation of our ACL 2022 paper.

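0. Obtaining the Dataset (PopBuTFy)

Audio samples

The dataset is available for download. To register, please send us an email (see the application form for details).
Dataset preview.

Text labels

NeuralSVB does not take text as input, but the ASR model used to extract PPGs (phonetic posteriorgrams) requires text. We therefore also provide the text labels for PopBuTFy.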

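1. Preparation

Environment preparation

Most of the required packages are listed in https://github.com/NATSpeech/NATSpeech/blob/main/requirements.txt

Alternatively, you can set up the environment with the Requirements.txt file in the repository directory.

pip install -r Requirements.txt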
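After installing, a quick check that key packages can be found may save a failed run later. The sketch below is only an illustration: the package shortlist (`torch`, `numpy`, `yaml`) is an assumption, not taken from the requirements file, which remains the authoritative list.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical shortlist for illustration; edit it to match the requirements file.
ASSUMED_PACKAGES = ["torch", "numpy", "yaml"]


def report(names=ASSUMED_PACKAGES):
    """Build a one-line summary of the check."""
    missing = missing_packages(names)
    if missing:
        return "Missing packages: " + ", ".join(missing)
    return "All checked packages are importable."


print(report())
```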

Data preparation

Extract the voice timbre embeddings:

CUDA_VISIBLE_DEVICES=0 python data_gen/tts/bin/binarize.py --config egs/datasets/audio/PopBuTFy/save_emb.yaml

Pack (binarize) the dataset:

CUDA_VISIBLE_DEVICES=0 python data_gen/tts/bin/binarize.py --config egs/datasets/audio/PopBuTFy/para_bin.yaml
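The two data preparation steps can also be chained from a small Python driver so that the second step does not start if the first one fails. This is a minimal sketch, not part of the repository: the helper names `build_commands` and `run_all` are hypothetical, and it assumes it is run from the repository root.

```python
import os
import subprocess
import sys

BINARIZE_SCRIPT = "data_gen/tts/bin/binarize.py"
CONFIGS = [
    "egs/datasets/audio/PopBuTFy/save_emb.yaml",  # step 1: voice timbre embeddings
    "egs/datasets/audio/PopBuTFy/para_bin.yaml",  # step 2: pack the dataset
]


def build_commands(python=sys.executable):
    """Return one argument list per step, in the order given in this README."""
    return [[python, BINARIZE_SCRIPT, "--config", cfg] for cfg in CONFIGS]


def run_all(gpu="0"):
    """Run both steps on the given GPU; stop at the first failure."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    for cmd in build_commands():
        # check=True raises CalledProcessError if the step exits with a non-zero code.
        subprocess.run(cmd, check=True, env=env)
```

Calling `run_all()` from the repository root is intended to be equivalent to typing the two commands above one after the other.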

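Vocoder preparation

We provide a pretrained HifiGAN-Singing model, which is designed specifically for SVS (singing voice synthesis) with the NSF (neural source-filter) mechanism.

Before training the acoustic model, unzip the pretrained vocoder into the checkpoints directory.

This singing vocoder was trained on more than 100 hours of singing data (including Chinese and English songs).

PPG extractor preparation

We provide a pretrained PPG extractor model.

Before training the acoustic model, unzip the pretrained PPG extractor into the checkpoints directory.

After following the instructions above, the directory structure should look like this: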

.
|--data
    |--processed
        |--PopBuTFy (unzip PopBuTFy.zip)
            |--data
                |--directories containing wav files
    |--binary
        |--PopBuTFyENSpkEM
|--checkpoints
    |--1009_pretrain_asr_english
        |--
        |--config.yaml
    |--1012_hifigan_all_songs_nsf
        |--
        |--config.yaml
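
Before starting training, the layout above can be checked with a few lines of Python. This is a minimal sketch rather than a script shipped with the repository: `missing_paths` is a hypothetical helper, and it only checks the entries that are named explicitly in the tree above.

```python
from pathlib import Path

# Paths named explicitly in the directory tree above, relative to the repository root.
EXPECTED_PATHS = [
    "data/processed/PopBuTFy/data",
    "data/binary/PopBuTFyENSpkEM",
    "checkpoints/1009_pretrain_asr_english/config.yaml",
    "checkpoints/1012_hifigan_all_songs_nsf/config.yaml",
]


def missing_paths(root="."):
    """Return the expected paths that do not exist under root."""
    root = Path(root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


for path in missing_paths():
    print("missing:", path)
```

An empty result means every listed directory and config file is in place; it says nothing about the files that the tree leaves unnamed.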

2. Training example

CUDA_VISIBLE_DEVICES=0,1 python tasks/run.py --config egs/datasets/audio/PopBuTFy/vae_global_mle_eng.yaml --exp_name exp_name --reset

3. Inference

Inference from the packed test set

CUDA_VISIBLE_DEVICES=0,1 python tasks/run.py --config egs/datasets/audio/PopBuTFy/vae_global_mle_eng.yaml --exp_name exp_name --reset --infer

By default, inference results are saved under ./checkpoints/EXP_NAME/generated_
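
To collect the generated audio after inference, something like the following could be used. It is a sketch under two assumptions that this README does not confirm: that each output directory name begins with `generated_`, and that the results are stored as `.wav` files somewhere below it. `find_generated_wavs` is a hypothetical helper.

```python
from pathlib import Path


def find_generated_wavs(exp_name, checkpoints_dir="checkpoints"):
    """List .wav files under every generated_* directory of one experiment."""
    exp_dir = Path(checkpoints_dir) / exp_name
    if not exp_dir.is_dir():
        return []
    wavs = []
    # Assumption: output directories share the "generated_" prefix.
    for out_dir in sorted(exp_dir.glob("generated_*")):
        if out_dir.is_dir():
            # Assumption: results are .wav files, possibly in subdirectories.
            wavs.extend(sorted(out_dir.rglob("*.wav")))
    return wavs
```

For example, `find_generated_wavs("exp_name")` would search the experiment used in the command above.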

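We provide:

the pretrained model of NSVB (English version);

Remember to put the pretrained model in the checkpoints directory.

Inference from raw inputs

Under development.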

Limitations

Please refer to Appendix D, "Limitations and Solutions", in our paper.

Citation

If this repository helps your research, please cite:

@inproceedings{liu-etal-2022-learning-beauty,
title = "Learning the Beauty in Songs: Neural Singing Voice Beautifier",
author = "Liu, Jinglin  and
  Li, Chengxi  and
  Ren, Yi  and
  Zhu, Zhiying  and
  Zhao, Zhou",
booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = may,
year = "2022",
address = "Dublin, Ireland",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.acl-long.549",
pages = "7970--7983",}