DeepSeek-Math

<div align="center"> <img src="https://yellow-cdn.veclightyear.com/835a84d5/9884ba28-4953-41b8-b360-3059da6347e3.svg" width="60%" alt="DeepSeek LLM" /> </div> <hr> <div align="center"> <a href="https://www.deepseek.com/" target="_blank"> <img alt="主页" src="https://yellow-cdn.veclightyear.com/835a84d5/73dc9c74-72b8-4990-b94d-b454d96bbed8.svg" /> </a> <a href="https://chat.deepseek.com/" target="_blank"> <img alt="聊天" src="https://img.shields.io/badge/🤖%20聊天-DeepSeek%20LLM-536af5?color=536af5&logoColor=white" /> </a> <a href="https://huggingface.co/deepseek-ai" target="_blank"> <img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&logoColor=white" /> </a> <a href="https://replicate.com/cjwbw/deepseek-math-7b-base" target="_parent"><img src="https://replicate.com/cjwbw/deepseek-math-7b-base/badge" alt="Replicate"/></a> </div> <div align="center"> <a href="https://discord.gg/Tc7c45Zzu5" target="_blank"> <img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&logoColor=white&color=7289da" /> </a> <a href="images/qr.jpeg" target="_blank"> <img alt="微信" src="https://img.shields.io/badge/微信-DeepSeek%20AI-brightgreen?logo=wechat&logoColor=white" /> </a> <a href="https://twitter.com/deepseek_ai" target="_blank"> <img alt="Twitter关注" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&logoColor=white" /> </a> </div> <div align="center"> <a href="LICENSE-CODE"> <img alt="代码许可" src="https://img.shields.io/badge/代码许可-MIT-f5de53?&color=f5de53"> </a> <a href="LICENSE-MODEL"> <img alt="模型许可" src="https://img.shields.io/badge/模型许可-模型协议-f5de53?&color=f5de53"> </a> </div> <p align="center"> <a href="#4-model-downloads">模型下载</a> | <a href="#2-evaluation-results">评估结果</a> | <a href="#5-quick-start">快速开始</a> | <a href="#6-license">许可</a> | <a href="#7-citation">引用</a> </p> <p align="center"> <a href="https://arxiv.org/pdf/2402.03300.pdf"><b>论文链接</b>👁️</a> </p>

1. 简介

DeepSeekMath以DeepSeek-Coder-v1.5 7B为初始模型，并在来自Common Crawl的数学相关标记上继续预训练，同时结合自然语言和代码数据，总共训练了5000亿个标记。DeepSeekMath 7B在竞赛级MATH基准测试中取得了令人印象深刻的**51.7%**的得分，而无需依赖外部工具包和投票技术，接近了Gemini-Ultra和GPT-4的性能水平。为了研究目的，我们向公众发布了基础、指令和强化学习模型的检查点。

2. 评估结果

DeepSeekMath-Base 7B

我们对DeepSeekMath-Base 7B的数学能力进行了全面评估，重点关注其产生自包含数学解决方案的能力（无需依赖外部工具）、使用工具解决数学问题的能力以及进行形式化定理证明的能力。除了数学之外，我们还提供了基础模型的更一般性能概况，包括其自然语言理解、推理和编程技能的表现。

逐步推理的数学问题解决

使用工具的数学问题解决

自然语言理解、推理和代码

上述表格中的评估结果可以总结如下：

**卓越的数学推理能力：**在竞赛级MATH数据集上，DeepSeekMath-Base 7B通过少样本思维链提示，在绝对值上超越了现有开源基础模型10%以上，同时也超越了Minerva 540B。
**强大的工具使用能力：**继续以DeepSeekCoder-Base-7B-v1.5为基础进行预训练，使DeepSeekMath-Base 7B能够更有效地通过编写程序来解决和证明数学问题。
**comparable的推理和编码性能：**DeepSeekMath-Base 7B在推理和编码方面达到了与DeepSeekCoder-Base-7B-v1.5相comparable的性能。

DeepSeekMath-Instruct和-RL 7B

DeepSeekMath-Instruct 7B是基于DeepSeekMath-Base 7B的数学指令调优模型，而DeepSeekMath-RL 7B则是在DeepSeekMath-Instruct 7B的基础上，使用我们提出的群体相对策略优化（GRPO）算法进行训练。

我们在4个英文和中文的定量推理基准测试上评估了不使用工具和使用工具的数学性能。如表所示，DeepSeekMath-Instruct 7B展示了强大的逐步推理能力，而DeepSeekMath-RL 7B在使用工具的情况下，在MATH上的准确率接近60%，超越了所有现有的开源模型。

3. 数据收集

步骤1：选择OpenWebMath作为我们的初始种子语料库，用于训练FastText模型。OpenWebMath是一个高质量数学网页文本集合。
步骤2：使用FastText模型从去重后的Common Crawl数据库中检索数学相关网页。
步骤3：通过统计分析识别潜在的数学相关域名。
步骤4：手动标注这些已识别域名中与数学内容相关的URL。
步骤5：将与这些已标注URL相链接但尚未收集的网页添加到种子语料库中。返回步骤1，重复四次迭代。

经过四轮数据收集，我们最终得到了3550万个数学网页，总计1200亿个标记。

4. 模型下载

我们向公众发布了DeepSeekMath 7B，包括基础、指令和强化学习模型，以支持学术和商业社区更广泛、更多样化的研究。请注意，本模型的使用受许可证章节中列出的条款约束。根据这些条款，允许商业使用。

Huggingface

模型	序列长度	下载链接
DeepSeekMath-Base 7B	4096	🤗 HuggingFace
DeepSeekMath-Instruct 7B	4096	🤗 HuggingFace
DeepSeekMath-RL 7B	4096	🤗 HuggingFace

5. 快速开始

您可以直接使用Huggingface的Transformers进行模型推理。

文本补全

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_name = "deepseek-ai/deepseek-math-7b-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")
model.generation_config = GenerationConfig.from_pretrained(model_name)
model.generation_config.pad_token_id = model.generation_config.eos_token_id

text = "The integral of x^2 from 0 to 2 is"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs.to(model.device), max_new_tokens=100)

result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)

对话补全

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_name = "deepseek-ai/deepseek-math-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")
model.generation_config = GenerationConfig.from_pretrained(model_name)
model.generation_config.pad_token_id = model.generation_config.eos_token_id

messages = [
    {"role": "user", "content": "what is the integral of x^2 from 0 to 2?\nPlease reason step by step, and put your final answer within \boxed{}."}
]
input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_tensor.to(model.device), max_new_tokens=100)

result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)
print(result)

如果不使用提供的apply_chat_template函数，您也可以按照示例模板与我们的模型进行交互。请注意，messages应替换为您的输入。

User: {messages[0]['content']}

A: {messages[1]['content']}<｜end▁of▁sentence｜>User: {messages[2]['content']}

A:

**注意：**默认情况下（add_special_tokens=True），我们的分词器会在输入文本前自动添加一个bos_token（<｜begin▁of▁sentence｜>）。此外，由于系统提示与此版本的模型不兼容，我们不建议在输入中包含系统提示。

❗❗❗ 请使用思维链提示来测试DeepSeekMath-Instruct和DeepSeekMath-RL：

英文问题：{question}\nPlease reason step by step, and put your final answer within \boxed{}.
中文问题：{question}\n请通过逐步推理来解答问题，并把最终答案放置于\boxed{}中。

6. 许可证

此代码仓库采用MIT许可证。DeepSeekMath模型的使用受模型许可证约束。DeepSeekMath支持商业使用。

详情请参阅LICENSE-CODE和LICENSE-MODEL。

7. 引用

@misc{deepseek-math,
  author = {Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Mingchuan Zhang, Y.K. Li, Y. Wu, Daya Guo},
  title = {DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models},
  journal = {CoRR},
  volume = {abs/2402.03300},
  year = {2024},
  url = {https://arxiv.org/abs/2402.03300},
}