whisper-diarization

<h1 align="center">Speaker Diarization Using OpenAI Whisper</h1> <p align="center"> <a href="https://github.com/MahmoudAshraf97/whisper-diarization/stargazers"> <img src="https://yellow-cdn.veclightyear.com/2b54e442/6b3f5b55-229d-4288-b1e5-13fb4b44f183.svg?colorA=orange&colorB=orange&logo=github" alt="GitHub stars"> </a> <a href="https://github.com/MahmoudAshraf97/whisper-diarization/issues"> <img src="https://yellow-cdn.veclightyear.com/2b54e442/fc014b33-2a45-4c47-8f1b-0f628b50a668.svg" alt="GitHub issues"> </a> <a href="https://github.com/MahmoudAshraf97/whisper-diarization/blob/master/LICENSE"> <img src="https://yellow-cdn.veclightyear.com/2b54e442/ed9f7d3a-6f58-48e0-854b-c87b9b759334.svg" alt="GitHub license"> </a> <a href="https://twitter.com/intent/tweet?text=&url=https%3A%2F%2Fgithub.com%2FMahmoudAshraf97%2Fwhisper-diarization"> <img src="https://yellow-cdn.veclightyear.com/2b54e442/3d48638f-87ff-406d-b1fb-2487a776df99.svg?style=social" alt="Twitter"> </a> <a href="https://colab.research.google.com/github/MahmoudAshraf97/whisper-diarization/blob/main/Whisper_Transcription_%2B_NeMo_Diarization.ipynb"> <img src="https://yellow-cdn.veclightyear.com/2b54e442/21f038b2-afbb-4ce5-ad79-761378a46abe.svg" alt="Open in Colab"> </a> </p>

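A speaker diarization pipeline based on OpenAI Whisper. I would like to thank @m-bain for Batched Whisper Inference and @mu4farooqi for the punctuation realignment algorithm.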

<img src="https://yellow-cdn.veclightyear.com/2b54e442/27d4a204-da21-4204-a2c0-b1986b069d31.png" alt="drawing" width="25"/> If you appreciate my contributions to the community, please star this project on GitHub (top-right corner)!

What is it

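This repository combines Whisper's automatic speech recognition (ASR) capabilities with voice activity detection (VAD) and speaker embeddings to identify the speaker of each sentence in the transcript that Whisper generates. First, the vocals are extracted from the audio to improve the accuracy of the speaker embeddings. Whisper then generates the transcript, and WhisperX corrects and aligns its timestamps to minimize diarization errors caused by time shifts. Next, the audio is passed to MarbleNet for VAD and segmentation to exclude silence, and TitaNet extracts speaker embeddings to identify the speaker of each segment. These results are matched against the WhisperX timestamps to assign a speaker to each word, and finally a punctuation model realigns the speaker labels to compensate for minor time shifts.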
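The word-level speaker assignment and the realignment step can be illustrated with a simplified sketch. It assumes each word already carries start/end timestamps (as after WhisperX alignment) and that diarization yields speaker-labelled time segments. The `Word` and `Segment` types and all three functions are hypothetical and are not the code in helpers.py. In place of a punctuation model, the realignment here simply takes a majority vote over each sentence, using sentence-ending punctuation already present in the words.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


# Hypothetical containers: word timestamps (e.g. after WhisperX alignment)
# and speaker segments (e.g. after VAD + speaker-embedding clustering).
@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float


@dataclass
class Segment:
    speaker: str
    start: float
    end: float


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words: list[Word], segments: list[Segment]) -> list[str]:
    """Label each word with the speaker whose segment overlaps it the most.

    Words that fall in a gap between segments get the speaker of the segment
    whose nearest boundary is closest to the word's midpoint.
    """
    if not segments:
        raise ValueError("need at least one speaker segment")
    labels = []
    for w in words:
        best = max(segments, key=lambda s: overlap(w.start, w.end, s.start, s.end))
        if overlap(w.start, w.end, best.start, best.end) == 0.0:
            mid = (w.start + w.end) / 2
            best = min(segments, key=lambda s: min(abs(mid - s.start), abs(mid - s.end)))
        labels.append(best.speaker)
    return labels


def realign_by_sentence(words: list[Word], labels: list[str], enders: str = ".?!") -> list[str]:
    """Give every word in a sentence the sentence's majority speaker.

    A simplified stand-in for punctuation-based realignment: it smooths
    labels that flip near a speaker boundary because of small timestamp shifts.
    """
    out = list(labels)
    start = 0
    for i, w in enumerate(words):
        is_last = i == len(words) - 1
        if (w.text and w.text[-1] in enders) or is_last:
            majority = Counter(labels[start:i + 1]).most_common(1)[0][0]
            out[start:i + 1] = [majority] * (i + 1 - start)
            start = i + 1
    return out


if __name__ == "__main__":
    words = [Word("Hello", 0.0, 0.3), Word("over", 0.3, 0.6), Word("there.", 0.6, 1.0),
             Word("Hi", 1.05, 1.25), Word("back!", 1.25, 1.6)]
    segments = [Segment("A", 0.0, 0.75), Segment("B", 0.75, 1.7)]
    raw = assign_speakers(words, segments)
    print(raw)                              # ['A', 'A', 'B', 'B', 'B']
    print(realign_by_sentence(words, raw))  # ['A', 'A', 'A', 'B', 'B']
```

In the example, "there." drifts slightly into speaker B's segment and is labelled B, but the sentence-level vote pulls it back to A, which is the kind of minor time shift the realignment step is meant to absorb.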

The WhisperX and NeMo parameters are hard-coded in diarize.py and helpers.py; I will add CLI arguments to change them later.

Installation

FFMPEG and Cython must be installed first as prerequisites:

pip install cython

or

sudo apt update && sudo apt install cython3

# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on macOS using Homebrew (https://brew.sh/)
brew install ffmpeg

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg

# on Windows using WinGet (https://github.com/microsoft/winget-cli)
winget install ffmpeg
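Before running diarize.py, a quick check like the following can confirm that both prerequisites are reachable. This is a hypothetical helper, not part of the repository. It assumes FFmpeg is on `PATH` and that Cython is importable as the `Cython` module from the same Python environment used to run the pipeline.

```python
from __future__ import annotations

import importlib.util
import shutil
import subprocess


def check_prerequisites() -> dict[str, bool]:
    """Report whether FFmpeg (on PATH) and Cython (importable) are available."""
    return {
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # The module name is "Cython"; this only checks the current interpreter.
        "cython": importlib.util.find_spec("Cython") is not None,
    }


def ffmpeg_version() -> str | None:
    """Return the first line of `ffmpeg -version`, or None if FFmpeg is not found."""
    path = shutil.which("ffmpeg")
    if path is None:
        return None
    result = subprocess.run([path, "-version"], capture_output=True, text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
    version = ffmpeg_version()
    if version:
        print(version)
```

Run it with the same interpreter (or virtual environment) you use for diarize.py, since the Cython check only inspects that environment.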