Q-Diffusion: Quantizing Diffusion Models [website] [paper]

[News!] Q-Diffusion has been adopted by NVIDIA TensorRT! See the official example.

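Q-Diffusion quantizes full-precision unconditional diffusion models to 4-bit in a training-free manner while maintaining comparable performance (an FID change of at most 2.34, compared with more than 100 for traditional PTQ).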

[Figure: LSUN examples]

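Our approach can also be applied to text-guided image generation: for the first time, we run Stable Diffusion with 4-bit weights and achieve high generation quality.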

[Figure: Stable Diffusion examples]

This repository provides the official implementation of Q-Diffusion, together with calibrated (simulated) quantized checkpoints.
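To make "simulated" quantization concrete, here is a minimal sketch in plain Python. It assumes simple min-max, round-to-nearest uniform quantization; the function name `fake_quantize` and the toy weights are hypothetical and are not taken from this repository, whose calibrated checkpoints come from a more elaborate calibration procedure. The point is only that values are snapped to a small set of levels and mapped back to floats, so the model still runs in floating point while behaving as if quantized.

```python
def fake_quantize(values, n_bits=4):
    """Snap floats to 2**n_bits uniform levels, then map them back to floats.

    Hypothetical helper: min-max, round-to-nearest uniform quantization.
    """
    lo, hi = min(values), max(values)
    levels = 2 ** n_bits - 1
    # Step size between adjacent levels; guard against a constant input.
    scale = (hi - lo) / levels if hi > lo else 1.0
    out = []
    for v in values:
        q = round((v - lo) / scale)      # integer code
        q = max(0, min(levels, q))       # clamp to the valid code range
        out.append(q * scale + lo)       # dequantize back to a float
    return out


# Toy weights, for illustration only.
weights = [-0.8, -0.31, 0.0, 0.12, 0.47, 0.9]
w4 = fake_quantize(weights, n_bits=4)
w8 = fake_quantize(weights, n_bits=8)

# Worst-case rounding error at each bit width.
err4 = max(abs(a - b) for a, b in zip(weights, w4))
err8 = max(abs(a - b) for a, b in zip(weights, w8))
print(f"max error at 4-bit: {err4:.4f}, at 8-bit: {err8:.4f}")
```

With round-to-nearest, the worst-case error is bounded by half a quantization step, which is why the 4-bit error is noticeably larger than the 8-bit one.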

Overview

[Figure: teaser]

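Diffusion models have achieved great success in image synthesis through iterative noise estimation using deep neural networks. However, the slow inference, high memory consumption, and computational intensity of the noise estimation model hinder the efficient adoption of diffusion models. Although post-training quantization (PTQ) is considered a go-to compression method for other tasks, it does not work out of the box on diffusion models. We propose a novel PTQ method tailored to the unique multi-timestep pipeline and model architecture of diffusion models, which compresses the noise estimation network to accelerate the generation process. We identify the key difficulties of diffusion model quantization as the changing output distributions of the noise estimation network over multiple time steps and the bimodal activation distribution of the shortcut layers within the noise estimation network. In this work, we tackle these challenges with timestep-aware calibration and split shortcut quantization.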
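The intuition behind split shortcut quantization can be illustrated with a toy sketch. It assumes a shortcut layer concatenates two feature groups with very different value ranges, and it reuses a simple min-max uniform quantizer. All names and values here (`fake_quantize`, `deep`, `skip`) are hypothetical and chosen for illustration; this is not the repository's implementation.

```python
def fake_quantize(values, n_bits=4):
    """Hypothetical min-max uniform quantizer (quantize, then dequantize)."""
    lo, hi = min(values), max(values)
    levels = 2 ** n_bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    return [max(0, min(levels, round((v - lo) / scale))) * scale + lo
            for v in values]


def mean_abs_error(reference, approx):
    return sum(abs(a - b) for a, b in zip(reference, approx)) / len(reference)


# Two toy feature groups that a shortcut connection would concatenate.
deep = [0.01 * i for i in range(-10, 11)]   # narrow range, about [-0.1, 0.1]
skip = [1.0 * i for i in range(-10, 11)]    # wide range, [-10, 10]
n = len(deep)

# Joint: one quantizer for the concatenation, so the wide range sets the step.
joint = fake_quantize(deep + skip, n_bits=4)
# Split: each group gets its own quantizer before concatenation.
split = fake_quantize(deep, n_bits=4) + fake_quantize(skip, n_bits=4)

err_joint_deep = mean_abs_error(deep, joint[:n])
err_split_deep = mean_abs_error(deep, split[:n])
print(f"narrow-range error, joint: {err_joint_deep:.4f}, split: {err_split_deep:.4f}")
```

In the joint case the step size is dictated by the wide-range group, so the narrow-range group collapses onto one or two levels. Quantizing the groups separately keeps the resolution of each.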

Getting Started

Installation

Clone this repository, then create and activate a suitable conda environment named qdiff with the following commands:

git clone https://github.com/Xiuyu-Li/q-diffusion.git
cd q-diffusion
conda env create -f environment.yml
conda activate qdiff

Usage

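For the Latent Diffusion and Stable Diffusion experiments, first download the relevant checkpoints by following the instructions in the latent-diffusion and stable-diffusion repositories from CompVis. We currently use sd-v1-4.ckpt for Stable Diffusion.
Download the quantized checkpoints from Google Drive [link]. The checkpoints for 4/8-bit weight-only quantization are the same as those for 4/8-bit weight and 8-bit activation quantization.
Then run the inference scripts on the quantized checkpoints with the following commands: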

# CIFAR-10 (DDIM)
# 4/8-bit weights only
python scripts/sample_diffusion_ddim.py --config configs/cifar10.yml --use_pretrained --timesteps 100 --eta 0 --skip_type quad --ptq --weight_bit <4 or 8> --quant_mode qdiff --split --resume -l <output_path> --cali_ckpt <quantized_ckpt_path>
# 4/8-bit weights, 8-bit activations
python scripts/sample_diffusion_ddim.py --config configs/cifar10.yml --use_pretrained --timesteps 100 --eta 0 --skip_type quad --ptq --weight_bit <4 or 8> --quant_mode qdiff --quant_act --act_bit 8 --a_sym --split --resume -l <output_path> --cali_ckpt <quantized_ckpt_path>

# LSUN Bedroom (LDM-4)
# 4/8-bit weights only
python scripts/sample_diffusion_ldm.py -r models/ldm/lsun_beds256/model.ckpt -n 20 --batch_size 10 -c 200 -e 1.0 --seed 41 --ptq --weight_bit <4 or 8> --resume -l <output_path> --cali_ckpt <quantized_ckpt_path>
# 4/8-bit weights, 8-bit activations
python scripts/sample_diffusion_ldm.py -r models/ldm/lsun_beds256/model.ckpt -n 20 --batch_size 10 -c 200 -e 1.0 --seed 41 --ptq --weight_bit <4 or 8> --quant_act --act_bit 8 --a_sym --resume -l <output_path> --cali_ckpt <quantized_ckpt_path>

# LSUN Church (LDM-8)
# 4/8-bit weights only
python scripts/sample_diffusion_ldm.py -r models/ldm/lsun_churches256/model.ckpt -n 20 --batch_size 10 -c 400 -e 0.0 --seed 41 --ptq --weight_bit <4 or 8> --resume -l <output_path> --cali_ckpt <quantized_ckpt_path>
# 4/8-bit weights, 8-bit activations
python scripts/sample_diffusion_ldm.py -r models/ldm/lsun_churches256/model.ckpt -n 20 --batch_size 10 -c 400 -e 0.0 --seed 41 --ptq --weight_bit <4 or 8> --quant_act --act_bit 8 --resume -l <output_path> --cali_ckpt <quantized_ckpt_path>

# Stable Diffusion
# 4/8-bit weights only
python scripts/txt2img.py --prompt <prompt, e.g. "a puppy wearing a hat"> --plms --cond --ptq --weight_bit <4 or 8> --quant_mode qdiff --no_grad_ckpt --split --n_samples 5 --resume --outdir <output_path> --cali_ckpt <quantized_ckpt_path>
# 4/8-bit weights, 8-bit activations (16-bit for the attention matrices after softmax)
python scripts/txt2img.py --prompt <prompt, e.g. "a puppy wearing a hat"> --plms --cond --ptq --weight_bit <4 or 8> --quant_mode qdiff --no_grad_ckpt --split --n_samples 5 --resume --quant_act --act_bit 8 --sm_abit 16 --outdir <output_path> --cali_ckpt <quantized_ckpt_path>

Calibration

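To run the calibration process, you must first generate the corresponding calibration datasets. We provide some example calibration datasets here. These datasets contain around 1000-2000 samples of intermediate outputs at each time step, far more than calibration needs. We will soon upload smaller subsets that meet the minimum requirements for calibration. In the meantime, you may consider generating the calibration datasets yourself by following the procedure described in the paper.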
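As a rough illustration of how a smaller calibration subset could be drawn from a larger dump of intermediate samples, the sketch below picks evenly spaced time steps and a fixed number of samples per step. The data layout (a dict mapping time step to a list of samples), the function name `build_calibration_set`, and the parameter names are all assumptions made for this example. The values 20 and 8 only loosely echo the `--cali_st` and `--cali_n` flags in the commands in this README, whose exact semantics are not described here.

```python
import random


def build_calibration_set(samples_by_step, num_steps, n_per_step, seed=0):
    """Pick `num_steps` evenly spaced time steps and `n_per_step` samples from each.

    Hypothetical helper: `samples_by_step` maps a time step to a list of
    intermediate samples (any Python objects).
    """
    rng = random.Random(seed)
    all_steps = sorted(samples_by_step)
    interval = len(all_steps) // num_steps
    if interval < 1:
        raise ValueError("num_steps exceeds the number of available time steps")
    chosen = all_steps[::interval][:num_steps]

    calib = []
    for t in chosen:
        pool = samples_by_step[t]
        picked = rng.sample(pool, min(n_per_step, len(pool)))
        # Keep the time step with each sample, since the noise estimation
        # network takes the time step as an input.
        calib.extend((t, x) for x in picked)
    return calib


# Toy data: 100 time steps with 50 placeholder "samples" (integers) each.
toy = {t: [t * 1000 + i for i in range(50)] for t in range(100)}
calib = build_calibration_set(toy, num_steps=20, n_per_step=8)
steps_used = sorted({t for t, _ in calib})
print(len(calib), steps_used[:4])
```

Sampling at a fixed interval across all time steps keeps the calibration data representative of the whole denoising trajectory rather than a single step.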

To reproduce the calibrated checkpoints, use the following commands:

# CIFAR-10 (DDIM)
python scripts/sample_diffusion_ddim.py --config configs/cifar10.yml --use_pretrained --timesteps 100 --eta 0 --skip_type quad --ptq --weight_bit <4 or 8> --quant_mode qdiff --cali_st 20 --cali_batch_size 32 --cali_n 256 --quant_act --act_bit 8 --a_sym --split --cali_data_path <cali_data_path> -l <output_path>

# LSUN Bedroom (LDM-4)
python scripts/sample_diffusion_ldm.py -r models/ldm/lsun_beds256/model.ckpt -n 50000 --batch_size 10 -c 200 -e 1.0 --seed 40 --ptq --weight_bit <4 or 8> --quant_mode qdiff --cali_st 20 --cali_batch_size 32 --cali_n 256 --quant_act --act_bit 8 --a_sym --a_min_max --running_stat --cali_data_path <cali_data_path> -l <output_path>

# LSUN Church (LDM-8)
python scripts/sample_diffusion_ldm.py -r models/ldm/lsun_churches256/model.ckpt -n 50000 --batch_size 10 -c 400 -e 0.0 --seed 40 --ptq --weight_bit <4 or 8> --quant_mode qdiff --cali_st 20 --cali_batch_size 32 --cali_n 256 --quant_act --act_bit 8 --cali_data_path <cali_data_path> -l <output_path>

# Stable Diffusion
python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms --cond --ptq --weight_bit <4 or 8> --quant_mode qdiff --quant_act --act_bit 8 --cali_st 25 --cali_batch_size 8 --cali_n 128 --no_grad_ckpt --split --running_stat --sm_abit 16 --cali_data_path <cali_data_path> --outdir <output_path>

Note that using different calibration hyperparameters may lead to slightly different performance.

Citation

If you find this work useful in your research, please consider citing our paper:

@InProceedings{li2023qdiffusion,
  author={Li, Xiuyu and Liu, Yijiang and Lian, Long and Yang, Huanrui and Dong, Zhen and Kang, Daniel and Zhang, Shanghang and Keutzer, Kurt},
  title={Q-Diffusion: Quantizing Diffusion Models},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month={October},
  year={2023},
  pages={17535-17545}
}