Llama-2-7B-32K-Instruct

Introduction to the Llama-2-7B-32K-Instruct Project

Project Overview

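Llama-2-7B-32K-Instruct is an open-source, long-context chat model fine-tuned from Llama-2-7B-32K on high-quality instruction and chat data. The model was built with fewer than 200 lines of Python code using the Together API. The project aims to let more people fine-tune their own version of Llama-2-7B-32K, experiment with it through the Together API, and share feedback on their experience.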

Data Collection Details

The fine-tuning data for Llama-2-7B-32K-Instruct comes from two sources:

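19K single-turn and multi-turn conversations generated from human instructions and Llama-2-70B-Chat outputs. This dataset was collected following the distillation paradigm used by Alpaca, Vicuna, WizardLM, and Orca, in which a powerful language model is queried to produce the instruction data. The complete dataset and the full data collection process are both publicly available.
Long-context summarization and long-context question answering. This part of the training follows the recipe of Llama-2-7B-32K and uses the BookSum dataset and a multi-document question answering dataset.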

The final data mixture used for fine-tuning is: instruction data (50%) + BookSum (25%) + multi-document question answering (25%).
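
The 50/25/25 mixture can be illustrated with a short sketch. This is only an assumption about how such a ratio might be realized through weighted sampling; the names `MIXTURE` and `sample_sources` and the sampling approach itself are hypothetical and are not taken from the project's training code.

```python
import random
from collections import Counter

# Target proportions stated in the text; the source labels are illustrative names.
MIXTURE = {"instruction": 0.50, "booksum": 0.25, "mqa": 0.25}

def sample_sources(n, mixture=MIXTURE, seed=0):
    """Draw n source labels with probability proportional to the mixture weights."""
    rng = random.Random(seed)
    names = list(mixture)
    weights = [mixture[name] for name in names]
    return rng.choices(names, weights=weights, k=n)

# Check that the empirical shares land close to the target ratio.
counts = Counter(sample_sources(10_000))
shares = {name: counts[name] / 10_000 for name in MIXTURE}
print(shares)
```

With enough draws, the observed shares approach the stated proportions, which is the property a mixed training stream needs.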

Model Usage

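The project encourages users to try the model through the Together API. For the best performance when running locally, installing Flash Attention V2 is recommended. The commands for installing the dependencies and loading the model follow: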

# Please update the path of `CUDA_HOME`
export CUDA_HOME=/usr/local/cuda-11.8
pip install transformers==4.31.0
pip install sentencepiece
pip install ninja
pip install flash-attn --no-build-isolation
pip install git+https://github.com/HazyResearch/flash-attention.git#subdirectory=csrc/rotary

Load the model:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("togethercomputer/Llama-2-7B-32K-Instruct")
model = AutoModelForCausalLM.from_pretrained("togethercomputer/Llama-2-7B-32K-Instruct",
    trust_remote_code=True, torch_dtype=torch.float16)
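# Wrap the instruction in the [INST] ... [/INST] prompt format
input_ids = tokenizer.encode("[INST]\nWrite a poem about cats\n[/INST]\n\n", return_tensors="pt")
# do_sample=True is needed for temperature, top_p, and top_k to take effect
output = model.generate(input_ids, max_length=128, do_sample=True,
    temperature=0.7, repetition_penalty=1.1, top_p=0.7, top_k=50)
output_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(output_text)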

The model can also be used directly in the Together Playground with a prompt in the following format:

[INST]\n<your instruction here>\n[/INST]\n\n

For example, given the input:

[INST]\nWrite a poem about cats\n[/INST]\n\n

the model will return a poem about cats.
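
As a small convenience, the single-turn prompt format can be wrapped in a helper. The function name `build_prompt` is hypothetical; the template matches the string passed to `tokenizer.encode` in the loading example, with `[/INST]` as the closing tag.

```python
def build_prompt(instruction: str) -> str:
    """Wrap a single-turn instruction in the [INST] ... [/INST] template."""
    return f"[INST]\n{instruction.strip()}\n[/INST]\n\n"

prompt = build_prompt("Write a poem about cats")
print(repr(prompt))  # '[INST]\nWrite a poem about cats\n[/INST]\n\n'
```

The resulting string can be passed to `tokenizer.encode` or pasted into the Playground (with the `\n` sequences entered as real line breaks).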

Model Evaluation

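The model is evaluated in three ways: 1) Alpaca Eval; 2) Rouge scores on BookSum; and 3) accuracy on multi-document question answering (MQA). Compared against other models such as GPT-3.5-Turbo-16K and Longchat-7b-16k, the results are summarized below: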

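On Alpaca Eval, Llama-2-7B-32K-Instruct achieves a win rate of 70.36%.
On the BookSum dataset, Llama-2-7B-32K-Instruct obtains higher Rouge scores than the other compared models on the R1, R2, and RL metrics.
On multi-document question answering, Llama-2-7B-32K-Instruct also performs well, with accuracy comparable to GPT-3.5-Turbo-16K.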
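
For readers unfamiliar with the Rouge metrics mentioned above, the following is a simplified sketch of an n-gram overlap F1 score in the spirit of R1 and R2. It assumes whitespace tokenization and lowercasing, omits stemming and RL (longest common subsequence), and is not the evaluation script used by the project; the function names are illustrative.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate: str, reference: str, n: int = 1) -> float:
    """F1 over overlapping n-grams, using simple whitespace tokenization."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Clipped overlap: each n-gram counts at most as often as it appears in both.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

r1 = rouge_n("the cat sat on the mat", "the cat lay on the mat", n=1)
r2 = rouge_n("the cat sat on the mat", "the cat lay on the mat", n=2)
print(round(r1, 3), round(r2, 3))  # 0.833 0.6
```

In this toy pair, five of six unigrams and three of five bigrams overlap, which gives the two scores shown.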

Limitations and Bias

Like all language models, Llama-2-7B-32K-Instruct may generate incorrect or biased content. Users should keep this in mind when using the model.

Community

You are welcome to join our Together Discord community to exchange ideas and collaborate.