[CVPR2024] StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On

This repository is the official implementation of StableVITON.

StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On<br> Jeongho Kim, Gyojung Gu, Minho Park, Sunghyun Park, Jaegul Choo

Teaser image

TODO List

~~Inference code~~
~~Release model weights~~
~~Training code~~

Environment

git clone https://github.com/rlawjdghek/StableVITON
cd StableVITON

conda create --name StableVITON python=3.10 -y
conda activate StableVITON

# install packages
pip install torch==2.0.0+cu117 torchvision==0.15.1+cu117 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu117
pip install pytorch-lightning==1.5.0
pip install einops
pip install opencv-python==4.7.0.72
pip install matplotlib
pip install omegaconf
pip install albumentations
pip install transformers==4.33.2
pip install xformers==0.0.19
pip install triton==2.0.0
pip install open-clip-torch==2.19.0
pip install diffusers==0.20.2
pip install scipy==1.10.1
conda install -c anaconda ipython -y

Weights and Data

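Our checkpoint trained on VITON-HD has been released!<br> You can download the VITON-HD dataset from here.<br> Both training and inference require the following dataset structure: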

train
|-- image
|-- image-densepose
|-- agnostic
|-- agnostic-mask
|-- cloth
|-- cloth_mask
|-- gt_cloth_warped_mask (for ATV loss)

test
|-- image
|-- image-densepose
|-- agnostic
|-- agnostic-mask
|-- cloth
|-- cloth_mask
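
Before launching training or inference, it can help to confirm that the folders listed above are actually in place. The following is a minimal sketch, not part of the repository: the function name `missing_dirs` and the constants are hypothetical, and it only checks that the directories exist, not their contents.

```python
from pathlib import Path

# Sub-folders listed in the dataset structure above.
COMMON_DIRS = ["image", "image-densepose", "agnostic", "agnostic-mask", "cloth", "cloth_mask"]
REQUIRED_DIRS = {
    "train": COMMON_DIRS + ["gt_cloth_warped_mask"],  # extra folder used for the ATV loss
    "test": COMMON_DIRS,
}


def missing_dirs(data_root):
    """Return the expected sub-folders that do not exist under data_root (hypothetical helper)."""
    root = Path(data_root)
    missing = []
    for split, names in REQUIRED_DIRS.items():
        for name in names:
            if not (root / split / name).is_dir():
                missing.append(f"{split}/{name}")
    return missing
```

Calling `missing_dirs` on your dataset root returns an empty list when the layout is complete, and otherwise the relative paths (such as `train/gt_cloth_warped_mask`) that still need to be created.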

Preprocessing

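The VITON-HD dataset is a benchmark and ships with an agnostic mask. However, you can attempt virtual try-on on arbitrary images by producing the mask yourself with a segmentation tool such as SAM. Note that for densepose, you should use the same densepose model as VITON-HD.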

Inference

#### paired
CUDA_VISIBLE_DEVICES=4 python inference.py \
 --config_path ./configs/VITONHD.yaml \
 --batch_size 4 \
 --model_load_path <model weight path> \
 --save_dir <save directory>

#### unpaired
CUDA_VISIBLE_DEVICES=4 python inference.py \
 --config_path ./configs/VITONHD.yaml \
 --batch_size 4 \
 --model_load_path <model weight path> \
 --unpair \
 --save_dir <save directory>

#### paired repaint
CUDA_VISIBLE_DEVICES=4 python inference.py \
 --config_path ./configs/VITONHD.yaml \
 --batch_size 4 \
 --model_load_path <model weight path> \
 --repaint \
 --save_dir <save directory>

#### unpaired repaint
CUDA_VISIBLE_DEVICES=4 python inference.py \
 --config_path ./configs/VITONHD.yaml \
 --batch_size 4 \
 --model_load_path <model weight path> \
 --unpair \
 --repaint \
 --save_dir <save directory>

You can also preserve the unmasked region with the '--repaint' option.
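
To make "preserving the unmasked region" concrete, the sketch below composites a generated image with the original person image using a binary mask. This is an illustration of the general idea only, not the code in `inference.py`: the function name `repaint`, the array shapes, and the convention that the mask is 1 inside the try-on region and 0 elsewhere are all assumptions.

```python
import numpy as np


def repaint(person, generated, mask):
    """Keep `person` where mask == 0 and use `generated` where mask == 1.

    person, generated: float arrays of shape (H, W, 3)
    mask: array of shape (H, W); assumed to be 1 inside the masked region, 0 elsewhere
    """
    mask = mask.astype(person.dtype)[..., None]  # (H, W, 1) so it broadcasts over RGB
    return mask * generated + (1.0 - mask) * person


# Toy 2x2 example: black "person" image, white "generated" image.
person = np.zeros((2, 2, 3))
generated = np.ones((2, 2, 3))
mask = np.array([[1, 0], [0, 1]])
out = repaint(person, generated, mask)
```

In the toy example, pixels where the mask is 1 take the generated value and all other pixels are copied unchanged from the person image.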

Training

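For VITON training, we start from the Paint-by-Example (PBE) model and widen the first block of the U-Net from 9 to 13 input channels (adding a zero conv). You should therefore download the modified checkpoint (named 'VITONHD_PBE_pose.ckpt') from the link and place it in the './ckpts/' folder.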

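In addition, to obtain finer person textures, we use a VAE fine-tuned on the VITON-HD dataset. You should also download that checkpoint (named 'VITONHD_VAE_finetuning.ckpt') from the link and place it in the './ckpts/' folder.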
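
The widening of the first U-Net block from 9 to 13 input channels can be pictured as zero-padding the pretrained convolution weight along its input-channel axis, so that the new channels contribute nothing at initialization. The sketch below is one reading of "add zero conv", not the script used to produce 'VITONHD_PBE_pose.ckpt'. The function name `expand_in_channels` is hypothetical, the toy sizes (4 output channels, 3x3 kernel) are illustrative rather than read from the checkpoint, and NumPy is used only to keep the example light (in the repository this would be a PyTorch tensor).

```python
import numpy as np


def expand_in_channels(weight, new_in_channels):
    """Zero-pad a conv weight of shape (out, in, kH, kW) along the input-channel axis."""
    out_ch, in_ch, kh, kw = weight.shape
    if new_in_channels < in_ch:
        raise ValueError("new_in_channels must be >= the current number of input channels")
    expanded = np.zeros((out_ch, new_in_channels, kh, kw), dtype=weight.dtype)
    expanded[:, :in_ch] = weight  # copy the pretrained filters; the new channels stay zero
    return expanded


# Toy weights standing in for the pretrained 9-channel first conv (illustrative sizes).
rng = np.random.default_rng(0)
w9 = rng.standard_normal((4, 9, 3, 3))
w13 = expand_in_channels(w9, 13)

# Conv response at one spatial location: sum over (channel, kH, kW) of weight * input patch.
patch13 = rng.standard_normal((13, 3, 3))
y_old = np.einsum("oikl,ikl->o", w9, patch13[:9])
y_new = np.einsum("oikl,ikl->o", w13, patch13)
assert np.allclose(y_old, y_new)  # the extra 4 channels have no effect at initialization
```

Starting the extra channels at zero means the widened layer initially reproduces the pretrained model's output, and training then learns how to use the added inputs.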

### Base model training
CUDA_VISIBLE_DEVICES=3,4 python train.py \
 --config_name VITONHD \
 --transform_size shiftscale3 hflip \
 --transform_color hsv bright_contrast \
 --save_name Base_test

### ATV loss finetuning
CUDA_VISIBLE_DEVICES=5,6 python train.py \
 --config_name VITONHD \
 --transform_size shiftscale3 hflip \
 --transform_color hsv bright_contrast \
 --use_atv_loss \
 --resume_path <first stage model path> \
 --save_name ATVloss_test

Citation

If you find our work useful for your research, please cite us:

@article{kim2023stableviton,
    title={StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On},
    author={Kim, Jeongho and Gu, Gyojung and Park, Minho and Park, Sunghyun and Choo, Jaegul},
    journal={arXiv preprint arXiv:2312.01725},
    year={2023}
}

Acknowledgements

Sunghyun Park is the corresponding author.