deepseek-coder-33B-instruct-AWQ

DeepSeek Coder 33B Instruct-AWQ Project Introduction

Project Overview

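DeepSeek Coder 33B Instruct-AWQ is a quantized version, produced by TheBloke, of the DeepSeek Coder 33B Instruct model developed by DeepSeek. The project aims to provide a lighter, easier-to-deploy large code language model while preserving the strong performance of the original model.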

Model Features

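Built on an advanced code model: The original model, DeepSeek Coder 33B Instruct, is a large language model trained on 2T tokens of code and natural-language data, with strong code understanding and generation capabilities.
AWQ quantization: The model is quantized to 4-bit precision with AWQ (Activation-aware Weight Quantization), which substantially reduces model size and speeds up inference.
Preserved quality: Compared with the commonly used GPTQ quantization method, AWQ offers faster Transformers-based inference at equivalent or better quality.
Multiple use cases: Suitable for code completion, code generation, question answering, and other programming-related tasks.
Flexible deployment: Supported on a range of platforms and frameworks, including Text Generation WebUI, vLLM, and Hugging Face TGI.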
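To give a rough sense of what 4-bit quantization means for model size, the arithmetic can be sketched as below. This is a back-of-the-envelope estimate only: it assumes a nominal 33 billion parameters (taken from the "33B" in the model name) and ignores the per-group scales and zero points that AWQ stores, as well as any layers kept at higher precision, so the real files will be somewhat larger.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: nominal parameter count read off the "33B" in the model name.
NUM_PARAMS = 33e9

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)  # unquantized half-precision weights
awq_gb = weight_memory_gb(NUM_PARAMS, 4)    # 4-bit weights, metadata ignored

print(f"FP16 weights:  ~{fp16_gb:.1f} GB")
print(f"4-bit weights: ~{awq_gb:.1f} GB")
print(f"Reduction:     ~{fp16_gb / awq_gb:.0f}x")
```

Under these assumptions the weights shrink from roughly 66 GB to roughly 16.5 GB, which is why the quantized model fits on far smaller GPU setups.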

Usage

1. Text Generation WebUI

On the Model tab, download TheBloke/deepseek-coder-33B-instruct-AWQ.
Select "Loader: AutoAWQ".
Load the model and start using it.

2. vLLM

Start the vLLM server with the --quantization awq parameter:

python3 -m vllm.entrypoints.api_server --model TheBloke/deepseek-coder-33B-instruct-AWQ --quantization awq
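Once the server above is running, it can be queried over HTTP. The following is a minimal client sketch using only the standard library. It assumes the server listens on vLLM's default port 8000 on localhost and exposes the simple `/generate` endpoint of `vllm.entrypoints.api_server`, which accepts a JSON body with a `prompt` plus sampling parameters and returns a JSON object with a `text` list; the helper names `build_generate_request` and `generate` are illustrative, and the endpoint details should be checked against the installed vLLM version.

```python
import json
import urllib.request


def build_generate_request(prompt, host="http://localhost:8000",
                           max_tokens=256, temperature=0.2):
    """Build (but do not send) a POST request for the assumed /generate endpoint."""
    payload = {
        "prompt": prompt,
        "max_tokens": max_tokens,      # forwarded to vLLM's sampling parameters
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{host}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request and return the list of generated texts."""
    request = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["text"]


# Example (requires the server to be running):
# print(generate("Write a Python function that reverses a string.")[0])
```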

3. Hugging Face TGI

Run the TGI service with Docker, passing the following parameters:

--model-id TheBloke/deepseek-coder-33B-instruct-AWQ --port 3000 --quantize awq --max-input-length 3696 --max-total-tokens 4096 --max-batch-prefill-tokens 4096
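The limits in these parameters bound how much text a single request may produce: prompt tokens plus generated tokens cannot exceed --max-total-tokens. The sketch below makes that budget explicit and builds a request for TGI's `/generate` endpoint. It assumes the container's port 3000 is reachable at http://127.0.0.1:3000 (the host port mapping is not stated in this document), and the helper names are illustrative.

```python
import json
import urllib.request

MAX_INPUT_LENGTH = 3696  # from --max-input-length
MAX_TOTAL_TOKENS = 4096  # from --max-total-tokens


def max_new_tokens_budget(input_tokens: int) -> int:
    """Largest max_new_tokens that keeps a request within the server limits."""
    if input_tokens > MAX_INPUT_LENGTH:
        raise ValueError(
            f"prompt has {input_tokens} tokens, limit is {MAX_INPUT_LENGTH}"
        )
    return MAX_TOTAL_TOKENS - input_tokens


def build_tgi_request(prompt, max_new_tokens,
                      base_url="http://127.0.0.1:3000"):  # assumed address
    """Build (but do not send) a POST request for TGI's /generate endpoint."""
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example (requires the TGI container to be running):
# req = build_tgi_request("Write a binary search in Python.", 256)
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["generated_text"])
```

With a prompt at the maximum input length of 3696 tokens, the budget leaves 400 tokens for the response.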

4. Using AutoAWQ from Python code

After installing the AutoAWQ package, the model can be loaded and used with the following code:

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_name_or_path = "TheBloke/deepseek-coder-33B-instruct-AWQ"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=False)
model = AutoAWQForCausalLM.from_quantized(model_name_or_path, fuse_layers=True,
                                          trust_remote_code=False, safetensors=True)

# Use the model to generate code or answer questions
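The loading code stops before any text is generated. A possible continuation is sketched below. It assumes the instruct model expects an "### Instruction:" / "### Response:" prompt layout and that an optional system prompt precedes it; the exact template, including the system prompt wording, should be copied from the model card rather than from this sketch. The sampling values are illustrative, and the helper names `build_prompt` and `generate_answer` are hypothetical. A CUDA GPU is assumed for the `.cuda()` call.

```python
def build_prompt(instruction: str, system_prompt: str = "") -> str:
    """Wrap an instruction in the assumed Instruction/Response template."""
    prefix = f"{system_prompt}\n" if system_prompt else ""
    return f"{prefix}### Instruction:\n{instruction}\n### Response:\n"


def generate_answer(model, tokenizer, instruction, max_new_tokens=512):
    """Generate a response with an already loaded AutoAWQ model and tokenizer."""
    prompt = build_prompt(instruction)
    # Tokenize and move the input ids to the GPU (assumes CUDA is available).
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
    output_ids = model.generate(
        input_ids,
        do_sample=True,        # illustrative sampling settings
        temperature=0.7,
        top_p=0.95,
        top_k=40,
        max_new_tokens=max_new_tokens,
    )
    # The decoded text includes the prompt followed by the model's response.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example, using the `model` and `tokenizer` objects loaded above:
# print(generate_answer(model, tokenizer, "Write a quicksort function in Python."))
```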