Bark Voice Cloning

Please read

This code works on Python 3.10. I have not tested it on other versions, and some older versions may have problems.
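Because only Python 3.10 has been tested, a project could warn early when it runs on another version. This is a minimal sketch; the helper `python_version_warning` is hypothetical and not part of this repository.

```python
import sys


def python_version_warning(version_info=sys.version_info):
    """Return a warning string when not on the tested Python 3.10, else None.

    Hypothetical helper: the README only states that 3.10 was tested.
    """
    major, minor = version_info[0], version_info[1]
    if (major, minor) == (3, 10):
        return None
    return (
        f"Running on Python {major}.{minor}; this code was only tested on 3.10, "
        "and some older versions may have problems."
    )


if __name__ == "__main__":
    message = python_version_warning()
    if message:
        print(message, file=sys.stderr)
```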

Voice cloning with Bark in high quality?

It's possible now.

https://github.com/gitmylo/bark-voice-cloning-HuBERT-quantizer/assets/36931363/516375e2-d699-44fe-a928-cd0411982049

How do I clone a voice?

For developers:

See the code examples on the Huggingface model page.

For everyone:

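The cloned voice doesn't sound realistic. Why are other people's clones better than mine?

Make sure your voice input does not contain any of the following (in no particular order):

Noise (run a noise remover first)
Music (there are tools to remove music too), unless you want background music
A cut-off ending (this makes the generation try to continue it)
Less than 1 second of training data (I personally recommend about 10 seconds, but 5 seconds can also work very well)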
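Some items on this list can be checked automatically before cloning. The sketch below is a heuristic, not part of the repository: the function `check_prompt_wav` is hypothetical, it only reads 16-bit PCM WAV files, the length limits come from the list above, and the "cut-off ending" check (loudness of the last 0.1 s, threshold 0.01 of full scale) is an assumption you may need to tune. Noise and music are not detected.

```python
import array
import math
import sys
import wave


def check_prompt_wav(path, tail_seconds=0.1, silence_rms=0.01):
    """Return a list of problems found in a 16-bit PCM WAV voice prompt.

    Heuristic sketch: length limits follow the README; the cut-off test
    (RMS loudness of the last `tail_seconds`) and its threshold are assumptions.
    """
    with wave.open(str(path), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        channels = wf.getnchannels()
        rate = wf.getframerate()
        n_frames = wf.getnframes()
        raw = wf.readframes(n_frames)

    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":  # WAV samples are stored little-endian
        samples.byteswap()

    problems = []
    duration = n_frames / rate
    if duration < 1.0:
        problems.append(f"only {duration:.2f} s of audio; use at least 1 s (about 10 s is recommended)")
    elif duration < 5.0:
        problems.append(f"{duration:.2f} s of audio; 5 to 10 s usually gives better results")
    if channels > 1:
        problems.append(f"{channels} channels; consider converting to mono")

    # RMS loudness (full scale = 1.0) of the last `tail_seconds`, first channel only
    first_channel = samples[0::channels]
    tail = first_channel[-max(1, int(rate * tail_seconds)):]
    if len(tail) > 0:
        rms = math.sqrt(sum(s * s for s in tail) / len(tail)) / 32768.0
        if rms > silence_rms:
            problems.append("still loud at the very end; the clip may be cut off mid-sentence")
    return problems
```

An empty list only means these simple checks passed; listening to the clip is still the best test.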

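What makes a good prompt audio? (in no particular order)

Clear speech
No strange background noise
Only one speaker
Audio that ends right after the sentence ends
Regular, plain speech (usually more successful; complex voices can be cloned, but not as well as plain speech)
About 10 seconds of data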
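Two of these points (a single mono track, and audio that ends right after the sentence) can be helped by downmixing and trimming silence. This is a minimal sketch using a simple energy gate, a swapped-in technique rather than anything this repository does; the function name `prepare_prompt` and the threshold and padding defaults are assumptions to tune by ear.

```python
import numpy as np


def prepare_prompt(audio, rate, threshold=0.01, frame_ms=20, pad_ms=100):
    """Downmix to mono and trim leading/trailing silence with an energy gate.

    audio: floats in [-1, 1], shape (samples,) or (channels, samples)
    (the latter is the layout torchaudio.load returns).
    Returns (trimmed_mono_audio, duration_in_seconds).
    """
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 2:  # (channels, samples) -> mono
        audio = audio.mean(axis=0)

    frame = max(1, int(rate * frame_ms / 1000))
    n_frames = len(audio) // frame
    if n_frames == 0:
        return audio, len(audio) / rate

    # RMS loudness of each non-overlapping frame
    frames = audio[: n_frames * frame].reshape(n_frames, frame)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    voiced = np.flatnonzero(rms > threshold)
    if voiced.size == 0:  # nothing above the threshold
        return audio[:0], 0.0

    # Keep a little padding so word onsets and endings are not clipped
    pad = int(rate * pad_ms / 1000)
    start = max(0, voiced[0] * frame - pad)
    end = min(len(audio), (voiced[-1] + 1) * frame + pad)
    trimmed = audio[start:end]
    return trimmed, len(trimmed) / rate
```

The returned duration can then be compared against the roughly 10 seconds recommended above. Trimming silence is safe, but cutting a long clip down to 10 s should be done at a sentence boundary, not at a fixed sample count.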

Pretrained models

Official

Name	HuBERT model	Quantizer version	Epochs	Language	Dataset
quantifier_hubert_base_ls960.pth	HuBERT Base	0	3	English	GitMylo/bark-semantic-training
quantifier_hubert_base_ls960_14.pth	HuBERT Base	0	14	English	GitMylo/bark-semantic-training
quantifier_V1_hubert_base_ls960_23.pth	HuBERT Base	1	23	English	GitMylo/bark-semantic-training

Community

Author	Name	HuBERT model	Quantizer version	Epochs	Language	Dataset
HobisPL	polish-HuBERT-quantizer_8_epoch.pth	HuBERT Base	1	8	Polish	Hobis/bark-polish-semantic-wav-training
C0untFloyd	german-HuBERT-quantizer_14_epoch.pth	HuBERT Base	1	14	German	CountFloyd/bark-german-semantic-wav-training
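If a project supports several languages, the two tables can be turned into a small lookup. This sketch copies the file names from the tables; the helper `pick_quantizer` is hypothetical, and its rule (prefer the newer quantizer version, then more epochs) is an assumption, not an official recommendation.

```python
# Pretrained quantizers listed in the tables: (file name, quantizer version, epochs, language)
QUANTIZERS = [
    ("quantifier_hubert_base_ls960.pth", 0, 3, "English"),
    ("quantifier_hubert_base_ls960_14.pth", 0, 14, "English"),
    ("quantifier_V1_hubert_base_ls960_23.pth", 1, 23, "English"),
    ("polish-HuBERT-quantizer_8_epoch.pth", 1, 8, "Polish"),
    ("german-HuBERT-quantizer_14_epoch.pth", 1, 14, "German"),
]


def pick_quantizer(language):
    """Return the file name of a listed quantizer for `language` (case-insensitive).

    Assumed preference: newer quantizer version first, then more training epochs.
    """
    candidates = [q for q in QUANTIZERS if q[3].lower() == language.lower()]
    if not candidates:
        raise ValueError(f"no pretrained quantizer listed for {language!r}")
    return max(candidates, key=lambda q: (q[1], q[2]))[0]
```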

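For developers: implementing voice cloning in your Bark projects

Simply copy the files from this directory into your project.
The HuBERT manager contains methods to download HuBERT and the custom quantizer models.
Loading CustomHubert should be pretty straightforward.
The notebook contains code that works on cuda or cpu, rather than only on cpu.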
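The core idea behind a "download if missing" manager can be sketched with the standard library alone. This is a hypothetical helper, not the repository's HuBERTManager API: `ensure_downloaded` skips the download when the file already exists and writes to a temporary `.part` file first, so an interrupted download is not mistaken for a complete model.

```python
import shutil
import urllib.request
from pathlib import Path


def ensure_downloaded(url, dest):
    """Download `url` to `dest` unless `dest` already exists; return the path.

    Hypothetical helper illustrating the idea; not HuBERTManager's API.
    """
    dest = Path(dest)
    if dest.is_file():
        return dest  # already present, nothing to do
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")
    # Stream to a temporary file, then move it into place once complete
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out)
    tmp.replace(dest)
    return dest
```

For example, `ensure_downloaded(model_url, "data/models/hubert/hubert.pt")` would place the HuBERT checkpoint at the default path used below, where `model_url` is whatever checkpoint URL you use.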

from hubert.pre_kmeans_hubert import CustomHubert
import torchaudio

# Load the HuBERT model.
# With the default config, checkpoint_path should be data/models/hubert/hubert.pt
hubert_model = CustomHubert(checkpoint_path='path/to/checkpoint')

# Run the model to extract semantic features from an audio file; wav is your audio
wav, sr = torchaudio.load('path/to/wav')  # Load your wav here, e.g. with soundfile or torchaudio

if wav.shape[0] > 1:  # Downmix stereo (or multi-channel) audio to mono if needed
    wav = wav.mean(0, keepdim=True)

semantic_vectors = hubert_model.forward(wav, input_sample_hz=sr)
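To estimate how many semantic vectors (and therefore tokens) a prompt yields, it helps to know that HuBERT Base's convolutional front end uses kernel sizes 10, 3, 3, 3, 3, 2, 2 with strides 5, 2, 2, 2, 2, 2, 2, a total stride of 320 samples, or one vector every 20 ms at 16 kHz. This sketch assumes the audio has already been resampled to 16 kHz (which `input_sample_hz` appears to request) and ignores any padding or trimming the wrapper may apply; `hubert_frame_count` is a hypothetical helper.

```python
# HuBERT Base convolutional feature extractor as (kernel_size, stride) pairs
CONV_LAYERS = [(10, 5)] + [(3, 2)] * 4 + [(2, 2)] * 2


def hubert_frame_count(num_samples):
    """Number of feature vectors HuBERT Base produces for `num_samples` of 16 kHz audio."""
    length = num_samples
    for kernel, stride in CONV_LAYERS:
        length = (length - kernel) // stride + 1  # unpadded convolution output length
        if length <= 0:
            return 0
    return length


print(hubert_frame_count(16000))   # 1 s of audio
print(hubert_frame_count(160000))  # 10 s of audio
```

With one token per vector (an assumption about the tokenizer below), a 10 s prompt gives roughly 500 semantic tokens.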

Loading and running the custom kmeans

import torch
from hubert.customtokenizer import CustomTokenizer

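# Load the CustomTokenizer model from a checkpoint.
# With the default config, you can use a pretrained model from Huggingface.
# With HuBERTManager's default settings, it will be at data/models/hubert/tokenizer.pth
tokenizer = CustomTokenizer.load_from_checkpoint('data/models/hubert/tokenizer.pth')  # Automatically uses the correct layers

# Process the semantic vectors from the HuBERT run above (this can be batched, so you can pass the whole HuBERT output)
semantic_tokens = tokenizer.get_token(semantic_vectors)

# Congratulations! You now have semantic tokens you can use in a speaker prompt file.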
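Bark speaker (history) prompts are `.npz` files with the arrays `semantic_prompt`, `coarse_prompt` and `fine_prompt`; the coarse prompt holds the first two EnCodec codebooks. The sketch below assumes you already have EnCodec codes for the same audio, with shape (codebooks, frames); obtaining them is not shown. `save_speaker_prompt` is a hypothetical helper, and torch tensors should be converted with `.cpu().numpy()` before calling it.

```python
import numpy as np


def save_speaker_prompt(path, semantic_tokens, encodec_codes):
    """Save a Bark-style speaker prompt (.npz).

    semantic_tokens: 1-D integer array (e.g. get_token output after .cpu().numpy())
    encodec_codes:   2-D integer array (codebooks, frames) for the same audio (assumed input)
    """
    semantic = np.asarray(semantic_tokens, dtype=np.int64).reshape(-1)
    fine = np.asarray(encodec_codes, dtype=np.int64)
    if fine.ndim != 2:
        raise ValueError("encodec_codes must have shape (codebooks, frames)")
    coarse = fine[:2, :]  # the coarse prompt uses the first two codebooks
    np.savez(path, semantic_prompt=semantic, coarse_prompt=coarse, fine_prompt=fine)
```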