AutoCoder

AutoCoder 项目介绍

项目近况

AutoCoder 项目最近上传了一个新模型：AutoCoder_QW_7B。这个模型修复了之前的问题，现在在用户请求代码验证时才会启动代码解释器。AutoCoder_QW_7B 的基础模型是 CodeQwen1.5-7b。

项目简介

AutoCoder 是一个专为代码生成任务设计的新模型。在 HumanEval 基础数据集上的测试准确率超过了 GPT-4 Turbo（2024年4月），达到 90.9%（相比 GPT-4 Turbo 的 90.2%）。与以往的开源模型相比，AutoCoder 提供了一个全新的特性：在用户希望执行代码时，它能自动安装所需软件包，并尝试运行代码直至确认没有问题。

与 GPT-4 Turbo 的区别

AutoCoder 的代码解释器能够自动安装必需的库，这扩展了代码解释器的应用范围。相对而言，GPT-4 Turbo 无法访问外部库。

与 OpenCodeInterpreter 的区别

AutoCoder 的代码解释器与 GPT-4 Turbo 一样，仅当用户需要验证代码时才会被调用，而 OpenCodeInterpreter 则会运行所有生成的 Python 代码。

模型发布

AutoCoder 的模型可以在 Huggingface 上找到：

这两个模型的基础模型是 deepseeker-coder。AutoCoder_QW_7B 则基于 CodeQwen1.5-7b。

快速开始

创建 conda 环境：

conda create -n AutoCoder python=3.11
conda activate AutoCoder
pip install -r requirements.txt

在 HumanEval 上测试性能，基准准确率为 90.9%，扩展为 78.0%。如果不需要测试基准性能，可跳过此步骤：
```
cd Evaluation
python test_humaneval.py
```
完成后会生成一个名为 AutoCoder_HumanEval+.jsonl 的文件。接下来按照 EvalPlus GitHub 的测试框架查看结果。
在 MBPP 上测试性能，基准准确率为 82.5%，扩展为 70.6%。可选择略过：
```
python test_humaneval.py
```
随后进行后处理删除自然语言：
```
python postprocess_mbpp.py
```
结果会生成一个 AutoCoder_Mbpp+-sanitized.jsonl 文件，用于直接测试。
在 DS-1000 上测试：
```
python test_ds1000.py
```
之后的步骤与之前相同。
网页演示（包含代码解释器）：

安装 gradio 并运行：
```
pip install gradio==3.48.0
cd /Web_demo
python chatbot.py
```

注意事项

在使用代码解释器时，建议设置 do_sample = True（默认设置）。最好在 Linux 环境下部署。

联系方式

如需了解更多信息或有任何问题，可通过邮件 leib2765@gmail.com 联系我们。

引用信息

@misc{lei2024autocoder,
      title={AutoCoder: Enhancing Code Large Language Model with \textsc{AIEV-Instruct}}, 
      author={Bin Lei and Yuchen Li and Qiuwu Chen},
      year={2024},
      eprint={2405.14906},
      archivePrefix={arXiv},
      primaryClass={cs.SE}
}