OpenCodeInterpreter-DS-6.7B

OpenCodeInterpreter-DS-6.7B项目介绍

项目背景

OpenCodeInterpreter项目通过开源代码生成系统，将大型语言模型与高级专有系统（如GPT-4 Code Interpreter）相结合，大幅提升代码生成能力。项目的核心思想是将执行与迭代改进功能整合进代码生成过程，从而增强其功能。

更多信息及相关研究可以查阅我们的论文：OpenCodeInterpreter: A System for Enhanced Code Generation and Execution。

模型信息

该项目的模型基于deepseek-coder-6.7b-base。

基准测试成绩

OpenCodeInterpreter模型系列展现了编码模型性能的演进，尤其是通过引入执行反馈这个重要功能所带来的显著提升。项目的测试基于两个关键基准：HumanEval和MBPP。以下表格展示了OpenCodeInterpreter-DS-6.7B模型系列在这些基准中的表现，揭示了执行反馈如何提升代码解读和执行任务的能力。

基准	HumanEval (+)	MBPP (+)	平均 (+)
OpenCodeInterpreter-DS-6.7B	76.2 (72.0)	73.9 (63.7)	75.1 (67.9)
+ 执行反馈	81.1 (78.7)	82.7 (72.4)	81.9 (75.6)
+ 合成人工反馈	87.2 (86.6)	86.2 (74.2)	86.7 (80.4)
+ 合成人工反馈（Oracle）	89.7 (86.6)	87.2 (75.2)	88.5 (80.9)

注意："(+)" 的符号表示从扩展版本的HumanEval和MBPP基准中获得的分数。所示结果基于仅一次反馈迭代后的执行反馈，以展示执行反馈对性能的直接提升效果。

模型使用

推断示例

以下是如何使用OpenCodeInterpreter-DS-6.7B模型进行推断的代码示例：

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
model_path="m-a-p/OpenCodeInterpreter-DS-6.7B"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.eval()

prompt = "Write a function to find the shared elements from the given two lists."
inputs = tokenizer.apply_chat_template(
        [{'role': 'user', 'content': prompt }],
        return_tensors="pt"
    ).to(model.device)
outputs = model.generate(
    inputs, 
    max_new_tokens=1024,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
    eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))