Getting Started with Aurora: An MoE Model That Activates the Chinese Chat Capability of Mixtral-8x7B

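About Aurora

Aurora is a Chinese MoE (mixture-of-experts) model built on Mixtral-8x7B. Instruction fine-tuning activates the model's open-domain chat capability in Chinese. The project was developed by a team at the Faculty of Applied Sciences, Macao Polytechnic University, and aims to strengthen the Chinese conversational ability of the sparse mixture-of-experts model Mixtral-8x7B.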


Project resources

Code repository

GitHub: WangRongsheng/Aurora

Model downloads

Base model Mixtral-8x7B-Instruct-v0.1:
- HuggingFace
- ModelScope
Aurora LoRA weights:
Aurora-Plus (recommended):
- HuggingFace
- WiseModel
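As a minimal sketch, the weights listed above could be fetched from the Hugging Face Hub with the `huggingface-cli download` command that ships with the `huggingface_hub` package. The base-model repository id is the public `mistralai/Mixtral-8x7B-Instruct-v0.1`; the LoRA repository id is left as a placeholder (`YOUR_LORA_REPO_ID`) because this guide does not spell it out, so copy it from the HuggingFace link. The local directory names are chosen to match the paths used by the commands in this guide. The script only prints the commands unless `RUN=1` is set, and the base model may require logging in and accepting its terms first.

```shell
#!/usr/bin/env bash
# Sketch only: assumes `pip install -U huggingface_hub` has been run.

BASE_REPO="mistralai/Mixtral-8x7B-Instruct-v0.1"  # base model on the Hugging Face Hub
BASE_DIR="./Mixtral-8x7B-Instruct-v0.1"           # matches --model_name_or_path in this guide
LORA_REPO="${LORA_REPO:-YOUR_LORA_REPO_ID}"       # placeholder: take the id from the HuggingFace link
LORA_DIR="./Aurora"                               # matches --checkpoint_dir in this guide

# Print the download command for one repository; run it only when RUN=1.
fetch() {
    if [ "${RUN:-0}" = "1" ]; then
        huggingface-cli download "$1" --local-dir "$2"
    else
        echo "huggingface-cli download $1 --local-dir $2"
    fi
}

fetch "$BASE_REPO" "$BASE_DIR"
fetch "$LORA_REPO" "$LORA_DIR"
```

Running it as `LORA_REPO=<id> RUN=1 bash download.sh` (the script name is hypothetical) would perform the actual downloads; the base model is a very large download, so check disk space first.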

Quick start

Clone the project and install the dependencies:

git clone https://github.com/WangRongsheng/Aurora.git
cd Aurora
pip install -r requirements.txt

Download the model weights (both the base model and the LoRA weights).
Run inference:

Web interface:

CUDA_VISIBLE_DEVICES=0 python src/web_demo.py \
    --model_name_or_path ./Mixtral-8x7B-Instruct-v0.1 \
    --checkpoint_dir Aurora \
    --finetuning_type lora \
    --quantization_bit 4 \
    --template mistral

Command-line interface:

CUDA_VISIBLE_DEVICES=0 python src/cli_demo.py \
    --model_name_or_path ./Mixtral-8x7B-Instruct-v0.1 \
    --checkpoint_dir Aurora \
    --finetuning_type lora \
    --quantization_bit 4 \
    --template mistral
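The two inference commands above differ only in the script they call. A small wrapper, sketched below, could keep the shared arguments in one place. The function names (`demo_script`, `launch`) and the `RUN` and `GPU` environment variables are hypothetical conveniences, not part of the Aurora repository; the arguments themselves are copied from the commands above.

```shell
#!/usr/bin/env bash
# Shared inference arguments, copied from the two commands in this guide.
COMMON_ARGS=(
    --model_name_or_path ./Mixtral-8x7B-Instruct-v0.1
    --checkpoint_dir Aurora
    --finetuning_type lora
    --quantization_bit 4
    --template mistral
)

# Map a mode name to the demo script; any other value is an error.
demo_script() {
    case "$1" in
        web) echo "src/web_demo.py" ;;
        cli) echo "src/cli_demo.py" ;;
        *)   echo "unknown mode: $1 (expected web or cli)" >&2; return 1 ;;
    esac
}

# Print the full command; execute it only when RUN=1.
launch() {
    local script
    script="$(demo_script "$1")" || return 1
    if [ "${RUN:-0}" = "1" ]; then
        CUDA_VISIBLE_DEVICES="${GPU:-0}" python "$script" "${COMMON_ARGS[@]}"
    else
        echo "CUDA_VISIBLE_DEVICES=${GPU:-0} python $script ${COMMON_ARGS[*]}"
    fi
}

launch "${1:-cli}"
```

Without `RUN=1` the wrapper only prints the command it would run, which makes it easy to check the arguments before loading the model.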

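Model evaluation

Aurora performs strongly on several Chinese benchmarks:

[Figure: evaluation results]

On the medical benchmark CMB, Aurora scores 29.87, well above the 22.26 scored by Mistral-7B.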
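To make the size of that gap concrete, the following sketch computes the absolute and relative difference between the two CMB scores quoted above. The function name `score_gain` is hypothetical; the only inputs are the two numbers from the text.

```shell
#!/usr/bin/env bash
# Print "<absolute gain> <relative gain in percent>" for two benchmark scores.
score_gain() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f %.1f\n", a - b, (a - b) / b * 100 }'
}

# CMB scores quoted in this guide: Aurora 29.87, Mistral-7B 22.26.
score_gain 29.87 22.26
```

This prints `7.61 34.2`, that is, a gain of 7.61 points, or roughly 34% relative to the Mistral-7B score.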

Training details

Aurora is instruction fine-tuned on three Chinese instruction datasets. The training command is:

CUDA_VISIBLE_DEVICES=5 python src/train_bash.py \
    --stage sft \
    --model_name_or_path ./Mixtral-8x7B-Instruct-v0.1 \
    --do_train \
    --dataset alpaca_zh,alpaca_gpt4_zh,sharegpt \
    --finetuning_type lora \
    --quantization_bit 4 \
    --output_dir output/ \
    --per_device_train_batch_size 2 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --fp16 \
    --template mistral
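The `--quantization_bit 4` flag in the commands above matters mostly for memory. As a back-of-the-envelope sketch, the script below estimates the storage needed for the model weights alone at 16-bit and 4-bit precision. It assumes roughly 46.7 billion total parameters for Mixtral-8x7B, a figure that is not stated in this guide, and it ignores quantization metadata, activations, the LoRA adapters, and optimizer state, so real usage will be higher.

```shell
#!/usr/bin/env bash
# Rough weight-storage estimate:
#   parameters (in billions) x bits per parameter / 8 = gigabytes (10^9 bytes).
# Overheads are deliberately ignored.
weight_gb() {
    LC_ALL=C awk -v p="$1" -v b="$2" 'BEGIN { printf "%.2f\n", p * b / 8 }'
}

PARAMS_B=46.7   # assumption: approximate total parameter count of Mixtral-8x7B, in billions

echo "16-bit weights: $(weight_gb "$PARAMS_B" 16) GB"
echo " 4-bit weights: $(weight_gb "$PARAMS_B" 4) GB"
```

Under these assumptions the weights shrink from about 93 GB to about 23 GB, which suggests why the commands in this guide can target a single GPU.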