<div align="center"> <img src="https://yellow-cdn.veclightyear.com/835a84d5/00694156-3677-4909-91a9-a4f019101431.svg?raw=true" width="60%" alt="DeepSeek-V2" /> </div>
<hr>
<div align="center" style="line-height: 1;"> <a href="https://www.deepseek.com/" target="_blank" style="margin: 2px;"> <img alt="Homepage" src="https://yellow-cdn.veclightyear.com/835a84d5/601eac3c-7c0f-4ccd-9e94-fca3f2cbd933.svg?raw=true" style="display: inline-block; vertical-align: middle;"/> </a> <a href="https://chat.deepseek.com/" target="_blank" style="margin: 2px;"> <img alt="Chat" src="https://img.shields.io/badge/🤖%20Chat-DeepSeek%20V2-536af5?color=536af5&logoColor=white" style="display: inline-block; vertical-align: middle;"/> </a> <a href="https://huggingface.co/deepseek-ai" target="_blank" style="margin: 2px;"> <img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/> </a> </div>
<div align="center" style="line-height: 1;"> <a href="https://discord.gg/Tc7c45Zzu5" target="_blank" style="margin: 2px;"> <img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&logoColor=white&color=7289da" style="display: inline-block; vertical-align: middle;"/> </a> <a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true" target="_blank" style="margin: 2px;"> <img alt="WeChat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&logoColor=white" style="display: inline-block; vertical-align: middle;"/> </a> <a href="https://twitter.com/deepseek_ai" target="_blank" style="margin: 2px;"> <img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&logoColor=white" style="display: inline-block; vertical-align: middle;"/> </a> </div>
<div align="center" style="line-height: 1;"> <a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/LICENSE-CODE" style="margin: 2px;"> <img alt="Code License" src="https://img.shields.io/badge/Code_License-MIT-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/> </a> <a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/LICENSE-MODEL" style="margin: 2px;"> <img alt="Model License" src="https://img.shields.io/badge/Model_License-Model_Agreement-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/> </a> </div>
<p align="center"> <a href="#2-model-downloads">Model Download</a> | <a href="#3-evaluation-results">Evaluation Results</a> | <a href="#4-model-architecture">Model Architecture</a> | <a href="#6-api-platform">API Platform</a> | <a href="#8-license">License</a> | <a href="#9-citation">Citation</a> </p>
<p align="center"> <a href="https://arxiv.org/abs/2405.04434"><b>Paper Link</b>👁️</a> </p>

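# DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

## 1. Introduction

Today, we introduce DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It has 236B total parameters, of which 21B are activated for each token. Compared with DeepSeek 67B, DeepSeek-V2 achieves stronger performance while saving 42.5% of training costs, reducing the KV cache by 93.3%, and raising the maximum generation throughput to 5.76 times that of DeepSeek 67B.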
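To put the headline figures in perspective, the short calculation below derives two ratios from them: the share of parameters active per token, and the share of the KV cache that remains after the reduction. It is only a reading aid; the constant and function names are made up for this sketch, and the inputs are the numbers quoted in the paragraph above.

```python
# Figures quoted in the introduction (not measured by this script).
TOTAL_PARAMS_B = 236.0       # total parameters, in billions
ACTIVE_PARAMS_B = 21.0       # parameters activated per token, in billions
KV_CACHE_REDUCTION = 0.933   # KV cache is 93.3% smaller than DeepSeek 67B


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of all parameters that take part in processing one token."""
    return active_b / total_b


def remaining_share(reduction: float) -> float:
    """Fraction left after a relative reduction, e.g. 0.933 leaves 0.067."""
    return 1.0 - reduction


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    left = remaining_share(KV_CACHE_REDUCTION)
    print(f"parameters active per token: {frac:.1%}")
    print(f"KV cache relative to DeepSeek 67B: {left:.1%}")
```

In other words, roughly one parameter in eleven is used for any given token, and the KV cache shrinks to about one fifteenth of its former size.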

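We pretrained DeepSeek-V2 on a diverse, high-quality corpus of 8.1 trillion tokens. Pretraining was followed by Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unlock the model's capabilities. The evaluation results validate the effectiveness of our approach: DeepSeek-V2 performs strongly on both standard benchmarks and open-ended generation evaluation.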

## 2. News

- 2024.05.16: We released DeepSeek-V2-Lite.
- 2024.05.06: We released DeepSeek-V2.

## 3. Model Downloads

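<div align="center">

| **Model** | **#Total Params** | **#Activated Params** | **Context Length** | **Download** |
|:---------:|:-----------------:|:---------------------:|:------------------:|:------------:|
| DeepSeek-V2-Lite | 16B | 2.4B | 32k | 🤗 HuggingFace |
| DeepSeek-V2-Lite-Chat (SFT) | 16B | 2.4B | 32k | 🤗 HuggingFace |
| DeepSeek-V2 | 236B | 21B | 128k | 🤗 HuggingFace |
| DeepSeek-V2-Chat (RL) | 236B | 21B | 128k | 🤗 HuggingFace |

</div>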

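Due to limitations of HuggingFace, the open-source code currently runs slower on GPUs with Huggingface than our internal codebase does. To make it easier to run our model efficiently, we provide a dedicated vLLM solution that is optimized for it.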
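To fetch one of the checkpoints listed above, something like the following may work. It is a minimal sketch, assuming the `huggingface_hub` package is installed and that every model lives under the `deepseek-ai` organisation with the same name as in the table. The inference examples in this README confirm that naming for `DeepSeek-V2` and `DeepSeek-V2-Chat`; for the Lite variants it is an assumption to verify. The helper names `repo_id` and `download` and the `checkpoints` directory are hypothetical.

```python
from pathlib import Path

# Organisation used by the inference examples in this README.
HF_ORG = "deepseek-ai"

# Model names as they appear in the download table.
MODEL_NAMES = (
    "DeepSeek-V2-Lite",
    "DeepSeek-V2-Lite-Chat",
    "DeepSeek-V2",
    "DeepSeek-V2-Chat",
)


def repo_id(model_name: str) -> str:
    """Map a table entry to a Hub repository id (assumed naming pattern)."""
    if model_name not in MODEL_NAMES:
        raise ValueError(f"unknown model: {model_name!r}")
    return f"{HF_ORG}/{model_name}"


def download(model_name: str, target_root: str = "checkpoints") -> str:
    """Download all files of one checkpoint and return the local path."""
    # Imported lazily so repo_id() stays usable without the package.
    from huggingface_hub import snapshot_download

    local_dir = Path(target_root) / model_name
    return snapshot_download(repo_id=repo_id(model_name), local_dir=str(local_dir))
```

Calling `download("DeepSeek-V2-Lite")` would then place the files under `checkpoints/DeepSeek-V2-Lite`. The 236B checkpoints are several hundred gigabytes, so check free disk space first.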

## 4. Evaluation Results

### Base Model

#### Standard Benchmark (Models larger than 67B)

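<div align="center">

| **Benchmark** | **Domain** | **LLaMA3 70B** | **Mixtral 8x22B** | **DeepSeek-V1 (Dense-67B)** | **DeepSeek-V2 (MoE-236B)** |
|:-------------:|:----------:|:--------------:|:-----------------:|:---------------------------:|:--------------------------:|
| **MMLU** | English | 78.9 | 77.6 | 71.3 | 78.5 |
| **BBH** | English | 81.0 | 78.9 | 68.7 | 78.9 |
| **C-Eval** | Chinese | 67.5 | 58.6 | 66.1 | 81.7 |
| **CMMLU** | Chinese | 69.3 | 60.0 | 70.8 | 84.0 |
| **HumanEval** | Code | 48.2 | 53.1 | 45.1 | 48.8 |
| **MBPP** | Code | 68.6 | 64.2 | 57.4 | 66.6 |
| **GSM8K** | Math | 83.0 | 80.3 | 63.4 | 79.2 |
| **Math** | Math | 42.2 | 42.5 | 18.7 | 43.6 |

</div>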

#### Standard Benchmark (Models smaller than 16B)

<div align="center">

| **Benchmark** | **Domain** | **DeepSeek 7B (Dense)** | **DeepSeekMoE 16B** | **DeepSeek-V2-Lite (MoE-16B)** |
|:-------------:|:----------:|:--------------:|:-----------------:|:--------------------------:|
| **Architecture** | - | MHA+Dense | MHA+MoE | MLA+MoE |
| **MMLU** | English | 48.2 | 45.0 | 58.3 |
| **BBH** | English | 39.5 | 38.9 | 44.1 |
| **C-Eval** | Chinese | 45.0 | 40.6 | 60.3 |
| **CMMLU** | Chinese | 47.2 | 42.5 | 64.3 |
| **HumanEval** | Code | 26.2 | 26.8 | 29.9 |
| **MBPP** | Code | 39.0 | 39.2 | 43.2 |
| **GSM8K** | Math | 17.4 | 18.8 | 41.1 |
| **Math** | Math | 3.3 | 4.3 | 17.1 |

</div>

For more evaluation details, such as few-shot settings and prompts, please refer to our paper.

#### Context Window

Evaluation results on the Needle In A Haystack (NIAH) tests. DeepSeek-V2 performs well across all context window lengths up to 128K.

### Chat Model

#### Standard Benchmark (Models larger than 67B)

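<div align="center">

| **Benchmark** | **Domain** | **QWen1.5 72B Chat** | **Mixtral 8x22B** | **LLaMA3 70B Instruct** | **DeepSeek-V1 Chat (SFT)** | **DeepSeek-V2 Chat (SFT)** | **DeepSeek-V2 Chat (RL)** |
|:-------------:|:----------:|:--------------------:|:-----------------:|:-----------------------:|:--------------------------:|:--------------------------:|:-------------------------:|
| **MMLU** | English | 76.2 | 77.8 | 80.3 | 71.1 | 78.4 | 77.8 |
| **BBH** | English | 65.9 | 78.4 | 80.1 | 71.7 | 81.3 | 79.7 |
| **C-Eval** | Chinese | 82.2 | 60.0 | 67.9 | 65.2 | 80.9 | 78.0 |
| **CMMLU** | Chinese | 82.9 | 61.0 | 70.7 | 67.8 | 82.4 | 81.6 |
| **HumanEval** | Code | 68.9 | 75.0 | 76.2 | 73.8 | 76.8 | 81.1 |
| **MBPP** | Code | 52.2 | 64.4 | 69.8 | 61.4 | 70.4 | 72.0 |
| **LiveCodeBench (0901-0401)** | Code | 18.8 | 25.0 | 30.5 | 18.3 | 28.7 | 32.5 |
| **GSM8K** | Math | 81.9 | 87.9 | 93.2 | 84.1 | 90.8 | 92.2 |
| **Math** | Math | 40.6 | 49.8 | 48.5 | 32.6 | 52.7 | 53.9 |

</div>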

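#### Standard Benchmark (Models smaller than 16B)

<div align="center">

| **Benchmark** | **Domain** | **DeepSeek 7B Chat (SFT)** | **DeepSeekMoE 16B Chat (SFT)** | **DeepSeek-V2-Lite 16B Chat (SFT)** |
|:-------------:|:----------:|:--------------------------:|:------------------------------:|:-----------------------------------:|
| **MMLU** | English | 49.7 | 47.2 | 55.7 |
| **BBH** | English | 43.1 | 42.2 | 48.1 |
| **C-Eval** | Chinese | 44.7 | 40.0 | 60.1 |
| **CMMLU** | Chinese | 51.2 | 49.3 | 62.5 |
| **HumanEval** | Code | 45.1 | 45.7 | 57.3 |
| **MBPP** | Code | 39.0 | 46.2 | 45.8 |
| **GSM8K** | Math | 62.6 | 62.2 | 72.0 |
| **Math** | Math | 14.7 | 15.2 | 27.9 |

</div>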

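#### English Open-Ended Generation Evaluation

We evaluate our model on AlpacaEval 2.0 and MTBench; the results show that DeepSeek-V2-Chat-RL is competitive on English conversation generation.

#### Chinese Open-Ended Generation Evaluation

**Alignbench** (https://arxiv.org/abs/2311.18743)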

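<div align="center">

| **Model** | **Open/Closed Source** | **Overall** | **Chinese Reasoning** | **Chinese Language** |
|:---------:|:----------------------:|:-----------:|:---------------------:|:--------------------:|
| gpt-4-1106-preview | Closed source | 8.01 | 7.73 | 8.29 |
| DeepSeek-V2 Chat (RL) | Open source | 7.91 | 7.45 | 8.36 |
| erniebot-4.0-202404 (文心一言) | Closed source | 7.89 | 7.61 | 8.17 |
| DeepSeek-V2 Chat (SFT) | Open source | 7.74 | 7.30 | 8.17 |
| gpt-4-0613 | Closed source | 7.53 | 7.47 | 7.59 |
| erniebot-4.0-202312 (文心一言) | Closed source | 7.36 | 6.84 | 7.88 |
| moonshot-v1-32k-202404 (月之暗面) | Closed source | 7.22 | 6.42 | 8.02 |
| Qwen1.5-72B-Chat (通义千问) | Open source | 7.19 | 6.45 | 7.93 |
| DeepSeek-67B-Chat | Open source | 6.43 | 5.75 | 7.11 |
| Yi-34B-Chat (零一万物) | Open source | 6.12 | 4.86 | 7.38 |
| gpt-3.5-turbo-0613 | Closed source | 6.08 | 5.35 | 6.71 |
| DeepSeek-V2-Lite 16B Chat | Open source | 6.01 | 4.71 | 7.32 |

</div>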

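#### Coding Benchmarks

We evaluate our model on LiveCodeBench (0901-0401), a benchmark designed for live coding challenges. As illustrated, DeepSeek-V2 demonstrates considerable proficiency on LiveCodeBench, achieving a Pass@1 score that surpasses several other sophisticated models; in the chat-model benchmark table above, DeepSeek-V2 Chat (RL) scores 32.5. This performance highlights the model's effectiveness in tackling live coding tasks.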
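For readers unfamiliar with the metric, Pass@1 is the k = 1 case of pass@k: the probability that at least one of k sampled solutions passes all tests. The sketch below uses the commonly cited unbiased estimator, 1 - C(n-c, k) / C(n, k), for n samples of which c are correct. Whether the LiveCodeBench numbers reported here were computed with exactly this estimator is an assumption, and the function names are chosen for illustration only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: any draw of k contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; each entry is (n_samples, n_correct)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For k = 1 the estimator reduces to c / n, the fraction of samples that pass, averaged over all problems in the benchmark.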

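## 5. Model Architecture

DeepSeek-V2 adopts innovative architectures to guarantee economical training and efficient inference:

- For attention, we design MLA (Multi-head Latent Attention), which uses low-rank key-value joint compression to eliminate the bottleneck of the inference-time key-value cache, thus supporting efficient inference.
- For Feed-Forward Networks (FFNs), we adopt the DeepSeekMoE architecture, a high-performance MoE architecture that enables training stronger models at lower cost.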
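The two ideas above can be illustrated with a toy sketch in plain Python. It is not the DeepSeek-V2 implementation: the dimensions, class and function names are hypothetical, the weights are random, and details described in the paper (such as positional-encoding handling in MLA, or the shared and fine-grained experts of DeepSeekMoE) are left out. The first part caches one small latent vector per token and rebuilds keys and values from it on demand; the second part shows generic top-k expert gating.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Small random weights, standing in for trained projections."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class LatentKVCache:
    """Toy low-rank joint KV compression: cache one latent vector per token."""

    def __init__(self, d_model, n_heads, d_head, d_latent, seed=0):
        rng = random.Random(seed)
        self.kv_dim = n_heads * d_head
        self.w_down = random_matrix(d_latent, d_model, rng)      # hidden -> latent
        self.w_up_k = random_matrix(self.kv_dim, d_latent, rng)  # latent -> keys
        self.w_up_v = random_matrix(self.kv_dim, d_latent, rng)  # latent -> values
        self.latents = []

    def append(self, hidden_state):
        """Store only the compressed latent for a new token."""
        self.latents.append(matvec(self.w_down, hidden_state))

    def keys_and_values(self):
        """Rebuild full keys and values from the cached latents."""
        return [(matvec(self.w_up_k, c), matvec(self.w_up_v, c)) for c in self.latents]

    def cached_floats(self):
        """Numbers actually held in the cache."""
        return sum(len(c) for c in self.latents)

    def uncompressed_floats(self):
        """What a plain per-head key/value cache would hold for the same tokens."""
        return len(self.latents) * 2 * self.kv_dim


def top_k_route(router_logits, k):
    """Generic top-k gating: pick k experts and renormalise their softmax weights."""
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]
    chosen = sorted(range(len(exps)), key=lambda i: exps[i], reverse=True)[:k]
    norm = sum(exps[i] for i in chosen)
    return {i: exps[i] / norm for i in chosen}
```

With `d_model=16`, four heads of size 4 and `d_latent=4`, the cache holds 4 numbers per token instead of 32, which is where the memory saving comes from. In the routing function, only the chosen experts would run for a token, which is how a model can have many more total parameters than activated ones.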

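## 6. Chat Website

You can chat with DeepSeek-V2 on DeepSeek's official website: [chat.deepseek.com](https://chat.deepseek.com/)

## 7. API Platform

We also provide an OpenAI-compatible API on the DeepSeek Platform: platform.deepseek.com. Sign up to receive millions of free tokens. You can also pay as you go at an unbeatable price.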
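Because the API is OpenAI-compatible, a plain HTTP request should be enough to try it. The sketch below uses only the standard library and reuses the base URL and model name from the LangChain example later in this README. The `/chat/completions` path and the response layout follow the OpenAI convention and are assumptions here, as are the `DEEPSEEK_API_KEY` environment variable and the helper names.

```python
import json
import os
import urllib.request

# Base URL and model name as used in the LangChain example of this README.
API_BASE = "https://api.deepseek.com/v1"
MODEL = "deepseek-chat"


def build_chat_request(prompt, api_key, temperature=0.3, max_tokens=256):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{API_BASE}/chat/completions",  # assumed OpenAI-style path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def chat(prompt, api_key):
    """Send one user message and return the assistant's reply text."""
    request = build_chat_request(prompt, api_key)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    key = os.environ.get("DEEPSEEK_API_KEY")  # hypothetical variable name
    if key:
        print(chat("Who are you?", key))
```

Any OpenAI-compatible client library pointed at the same base URL should behave equivalently.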

<p align="center"> <img width="40%" src="https://yellow-cdn.veclightyear.com/835a84d5/d882af83-5061-47d8-87d6-39ab610e5c4b.png?raw=true"> </p>

## 8. How to Run Locally

**To run inference with DeepSeek-V2 in BF16 format, 8 GPUs with 80GB of memory each are required.**

### Inference with Huggingface's Transformers

You can directly use [Huggingface's Transformers](https://github.com/huggingface/transformers) for model inference.

#### Text Completion

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_name = "deepseek-ai/DeepSeek-V2"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# `max_memory` should be set based on your devices
max_memory = {i: "75GB" for i in range(8)}
# `device_map` cannot be set to `auto`
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    device_map="sequential",
    torch_dtype=torch.bfloat16,
    max_memory=max_memory,
    attn_implementation="eager",
)
model.generation_config = GenerationConfig.from_pretrained(model_name)
# Reuse the end-of-sequence token as the padding token
model.generation_config.pad_token_id = model.generation_config.eos_token_id

text = "An attention function can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The output is"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs.to(model.device), max_new_tokens=100)

# Decode the full sequence (prompt plus continuation)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
```
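As a rough check on the hardware requirement and on the `max_memory` setting used in the example above, the arithmetic below estimates the memory taken by the weights alone. It assumes 2 bytes per parameter for BF16 and decimal gigabytes, and it ignores activations, the KV cache and framework overhead. The helper names are hypothetical.

```python
def weight_memory_gb(total_params_billion, bytes_per_param=2):
    """Memory for the weights alone, in GB; BF16 uses 2 bytes per parameter."""
    # billions of parameters * bytes per parameter = billions of bytes = GB
    return total_params_billion * bytes_per_param


def build_max_memory(num_gpus, per_gpu_budget_gb):
    """Build a dict shaped like the `max_memory` argument used above."""
    return {i: f"{per_gpu_budget_gb}GB" for i in range(num_gpus)}


weights_gb = weight_memory_gb(236)    # all 236B parameters must be resident
budget_gb = 8 * 75                    # eight GPUs capped at 75GB each
headroom_gb = budget_gb - weights_gb  # left for activations and the KV cache

if __name__ == "__main__":
    print(f"weights: {weights_gb} GB, budget: {budget_gb} GB, headroom: {headroom_gb} GB")
    print(build_max_memory(8, 75))
```

Although only 21B parameters are activated per token, every expert has to be loaded, so the full 236B parameters count toward memory. On GPUs of a different size, adjust the per-GPU budget so that some memory stays free on each device.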

#### Chat Completion

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_name = "deepseek-ai/DeepSeek-V2-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# `max_memory` should be set based on your devices
max_memory = {i: "75GB" for i in range(8)}
# `device_map` cannot be set to `auto`
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    device_map="sequential",
    torch_dtype=torch.bfloat16,
    max_memory=max_memory,
    attn_implementation="eager",
)
model.generation_config = GenerationConfig.from_pretrained(model_name)
# Reuse the end-of-sequence token as the padding token
model.generation_config.pad_token_id = model.generation_config.eos_token_id

messages = [
    {"role": "user", "content": "Write a piece of quicksort code in C++"}
]
input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_tensor.to(model.device), max_new_tokens=100)

# Decode only the newly generated tokens, skipping the prompt
result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)
print(result)
```

The complete chat template can be found in `tokenizer_config.json` in the Hugging Face model repository.

An example of the chat template is shown below:

```text
<｜begin▁of▁sentence｜>User: {user_message_1}

A: {assistant_message_1}<｜end▁of▁sentence｜>User: {user_message_2}

A:
```

You can also add an optional system message:

```text
<｜begin▁of▁sentence｜>{system_message}

User: {user_message_1}

A: {assistant_message_1}<｜end▁of▁sentence｜>User: {user_message_2}

A:
```
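To make the layout of the two templates above concrete, the sketch below renders a conversation into that format by hand. In practice `tokenizer.apply_chat_template` does this for you; the function here is purely illustrative, and its name and parameters are hypothetical. The assistant role label defaults to `Assistant`, which is an assumption: pass `assistant_label` to match whatever the template in `tokenizer_config.json` actually uses.

```python
BOS = "<｜begin▁of▁sentence｜>"
EOS = "<｜end▁of▁sentence｜>"


def render_chat(messages, system_message=None, assistant_label="Assistant"):
    """Render user/assistant turns in the layout shown above.

    `messages` is a list of {"role": "user" | "assistant", "content": str}.
    """
    parts = [BOS]
    if system_message is not None:
        # The optional system message comes first, followed by a blank line.
        parts.append(f"{system_message}\n\n")
    for message in messages:
        if message["role"] == "user":
            parts.append(f"User: {message['content']}\n\n")
        elif message["role"] == "assistant":
            # Each completed assistant turn ends with the end-of-sentence token.
            parts.append(f"{assistant_label}: {message['content']}{EOS}")
        else:
            raise ValueError(f"unsupported role: {message['role']!r}")
    if messages and messages[-1]["role"] == "user":
        # Generation prompt: the model continues after the assistant label.
        parts.append(f"{assistant_label}:")
    return "".join(parts)
```

Comparing the output of this function with the decoded result of `apply_chat_template` is a quick way to confirm which template a given checkpoint uses.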

### Inference with vLLM (recommended)

To use vLLM for model inference, please merge this Pull Request into your vLLM codebase: https://github.com/vllm-project/vllm/pull/4650.

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

max_model_len, tp_size = 8192, 8
model_name = "deepseek-ai/DeepSeek-V2-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = LLM(
    model=model_name,
    tensor_parallel_size=tp_size,
    max_model_len=max_model_len,
    trust_remote_code=True,
    enforce_eager=True,
)
sampling_params = SamplingParams(temperature=0.3, max_tokens=256, stop_token_ids=[tokenizer.eos_token_id])

messages_list = [
    [{"role": "user", "content": "Who are you?"}],
    [{"role": "user", "content": "Translate the following content into Chinese directly: DeepSeek-V2 adopts innovative architectures to guarantee economical training and efficient inference."}],
    [{"role": "user", "content": "Write a piece of quicksort code in C++."}],
]

# Apply the chat template to each conversation; this returns token ids
prompt_token_ids = [tokenizer.apply_chat_template(messages, add_generation_prompt=True) for messages in messages_list]

outputs = llm.generate(prompt_token_ids=prompt_token_ids, sampling_params=sampling_params)

generated_text = [output.outputs[0].text for output in outputs]
print(generated_text)
```

### LangChain Support

Since our API is compatible with OpenAI, you can easily use it in [langchain](https://www.langchain.com/). Here is an example:

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="deepseek-chat",
    openai_api_key="<your-deepseek-api-key>",  # replace with your own key
    openai_api_base="https://api.deepseek.com/v1",
    temperature=0.85,
    max_tokens=8000,
)
```

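## 9. License

This code repository is licensed under [the MIT License](https://github.com/deepseek-ai/DeepSeek-V2/blob/main/LICENSE-CODE). The use of DeepSeek-V2 Base/Chat models is subject to [the Model License](https://github.com/deepseek-ai/DeepSeek-V2/blob/main/LICENSE-MODEL). The DeepSeek-V2 series (including Base and Chat) supports commercial use.

## 10. Citation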

```bibtex
@misc{deepseekv2,
      title={DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model},
      author={DeepSeek-AI},
      year={2024},
      eprint={2405.04434},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```