FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning

</div>

FontDiffuser_LOGO

</div> <p align="center"> <strong><a href="#🔥-模型库">🔥 Model Zoo </a></strong> • <strong><a href="#🛠️-安装">🛠️ Installation </a></strong> • <strong><a href="#🏋️-训练">🏋️ Training</a></strong> • <strong><a href="#📺-采样">📺 Sampling</a></strong> • <strong><a href="#📱-运行网页界面">📱 Run WebUI</a></strong> </p>

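🌟 Highlights

Visualization_1 Visualization_2

We propose FontDiffuser, which can generate unseen characters and styles and can be extended to cross-lingual generation, such as Chinese to Korean.
FontDiffuser excels at generating complex characters and handling large style variations, and achieves state-of-the-art performance.
The results generated by FontDiffuser can be passed directly to InstructPix2Pix for decoration, as shown in the figure above.
We have released an online 💻 Hugging Face demo. Feel free to try it out!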

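📅 News

2024.01.27: Phase 2 training is released.
2023.12.20: Our repository is public! 👏🤗
2023.12.19: 🔥🎉 The 💻 Hugging Face demo is public! Feel free to try it out!
2023.12.16: The Gradio app demo is released.
2023.12.10: Source code is released, including phase 1 training and sampling.
2023.12.09: 🎉🎉 Our paper is accepted by AAAI 2024.
Previously: Our repository of recommended diffusion models for text images is public; it contains a collection of recent diffusion-model papers for text-image generation tasks. Feel free to check it out!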

🔥 Model Zoo

Model	Checkpoint	Status
FontDiffuser	GoogleDrive / BaiduYun:gexg	Released
SCR	GoogleDrive / BaiduYun:gexg	Released

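🚧 TODO List

Add phase 1 training and sampling scripts.
Add the WebUI demo.
Push the demo to Hugging Face.
Add the phase 2 training script and checkpoint.
Add pre-training of the SCR module.
Combine with InstructPix2Pix.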

🛠️ Installation

Prerequisites (Recommended)

Linux
Python 3.9
PyTorch 1.13.1
CUDA 11.7

Environment Setup

Clone this repository:

git clone https://github.com/yeungchenwa/FontDiffuser.git

Step 0: Download and install Miniconda from the official website.

Step 1: Create a conda environment and activate it.

conda create -n fontdiffuser python=3.9 -y
conda activate fontdiffuser

Step 2: Install the matching version of PyTorch by following the official PyTorch installation guide.

# Suggested
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117

Step 3: Install the required packages.

pip install -r requirements.txt
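After installation, a quick sanity check can confirm that the installed packages match the versions in the suggested pip command. The following is a minimal sketch, not part of the repository: the helper names `base_version` and `report` are hypothetical, and the recommended versions are copied from the command above.

```python
from importlib import metadata

# Versions taken from the suggested pip command above.
RECOMMENDED = {"torch": "1.13.1", "torchvision": "0.14.1", "torchaudio": "0.13.1"}


def base_version(version: str) -> str:
    """Strip a local build tag such as '+cu117' from a version string."""
    return version.split("+", 1)[0]


def report(recommended: dict = RECOMMENDED) -> dict:
    """Map each package name to 'ok', 'mismatch (found X)', or 'not installed'."""
    status = {}
    for name, wanted in recommended.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            status[name] = "not installed"
            continue
        if base_version(found) == wanted:
            status[name] = "ok"
        else:
            status[name] = f"mismatch (found {found})"
    return status


if __name__ == "__main__":
    for package, state in report().items():
        print(f"{package}: {state}")
```

A mismatch is not necessarily an error, since the listed versions are recommendations rather than hard requirements.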

🏋️ Training

Data Construction

The training data file tree should look as follows (example data is provided in the directory data_examples/train/):

├──data_examples
│   └── train
│       ├── ContentImage
│       │   ├── char0.png
│       │   ├── char1.png
│       │   ├── char2.png
│       │   └── ...
│       └── TargetImage
│           ├── style0
│           │     ├──style0+char0.png
│           │     ├──style0+char1.png
│           │     └── ...
│           ├── style1
│           │     ├──style1+char0.png
│           │     ├──style1+char1.png
│           │     └── ...
│           ├── style2
│           │     ├──style2+char0.png
│           │     ├──style2+char1.png
│           │     └── ...
│           └── ...
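The layout above can be checked before training starts. The sketch below is an illustration only and is not part of the repository: the function name `check_train_layout` is hypothetical, and it assumes the target directory is named `TargetImage`, that every target image is named `<style>+<char>.png`, and that each `<char>` should have a matching file in `ContentImage`.

```python
from pathlib import Path


def check_train_layout(data_root: str) -> list:
    """Return a list of problems found under <data_root>/train (empty if none)."""
    train = Path(data_root) / "train"
    content_dir = train / "ContentImage"
    target_dir = train / "TargetImage"
    for required in (content_dir, target_dir):
        if not required.is_dir():
            return [f"missing directory: {required}"]

    # Content characters are identified by file stem, e.g. 'char0' for char0.png.
    content_chars = {p.stem for p in content_dir.glob("*.png")}

    problems = []
    for style_dir in sorted(p for p in target_dir.iterdir() if p.is_dir()):
        for image in sorted(style_dir.glob("*.png")):
            # Target images are expected to be named '<style>+<char>.png'.
            style, sep, char = image.stem.partition("+")
            if not sep or style != style_dir.name:
                problems.append(f"unexpected file name: {image}")
            elif char not in content_chars:
                problems.append(f"no content image for: {image}")
    return problems


if __name__ == "__main__":
    for problem in check_train_layout("./data_examples"):
        print(problem)
```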

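Training Configuration

Before running the training script (in any of the three modes below), set up the training configuration, for example for distributed training, with: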

accelerate config

Training - Pretraining of SCR

Coming soon ...

Training - Phase 1

sh train_phase_1.sh

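data_root: The data root directory, e.g. ./data_examples
output_dir: The directory where training logs and checkpoints are saved.
resolution: The resolution of the UNet in our diffusion model.
style_image_size: The resolution of the style image; it may differ from resolution.
content_image_size: The resolution of the content image; it should be the same as resolution.
channel_attn: Whether to use channel attention in the MCA block.
train_batch_size: The batch size during training.
max_train_steps: The maximum number of training steps.
learning_rate: The learning rate during training.
ckpt_interval: The interval, in training steps, at which checkpoints are saved.
drop_prob: The probability of dropping the conditions during training, used for classifier-free guidance.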
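To illustrate what `drop_prob` controls, the sketch below shows condition dropping as it is commonly done in classifier-free guidance training: each sample's conditions are independently replaced by a null input with probability `drop_prob`. This is a generic, framework-free illustration; the function names and the `"<null>"` placeholder are hypothetical, and the repository's actual training code may differ.

```python
import random


def drop_mask(batch_size: int, drop_prob: float, rng: random.Random) -> list:
    """Return one boolean per sample; True means 'drop this sample's conditions'."""
    return [rng.random() < drop_prob for _ in range(batch_size)]


def apply_drop(conditions: list, mask: list, null_value):
    """Swap dropped conditions for the null placeholder, leaving the rest untouched."""
    return [null_value if dropped else cond for cond, dropped in zip(conditions, mask)]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    styles = [f"style_image_{i}" for i in range(8)]
    mask = drop_mask(batch_size=len(styles), drop_prob=0.1, rng=rng)
    print(apply_drop(styles, mask, null_value="<null>"))
```

Training on both conditioned and unconditioned samples is what later allows a guidance scale to be applied at sampling time.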

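Training - Phase 2

After phase 1 training, put the trained checkpoint files (unet.pth, content_encoder.pth, and style_encoder.pth) into the directory phase_1_ckpt. In phase 2, these parameters are restored and training resumes from them.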

sh train_phase_2.sh

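phase_2: Flag that enables phase 2 training.
phase_1_ckpt_dir: The directory holding the model checkpoints saved after phase 1 training.
scr_ckpt_path: The path to the checkpoint of the pre-trained SCR module. You can download it from the 🔥 Model Zoo above.
sc_coefficient: The coefficient of the style contrastive loss used for supervision.
num_neg: The number of negative samples; the default is 16.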
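The roles of `num_neg` and `sc_coefficient` can be illustrated with a generic InfoNCE-style contrastive loss. This is a sketch under stated assumptions, not the SCR module's implementation: the cosine similarity, the temperature value, the function names, and the simple weighted sum in `weighted_total` are all assumptions, and the real training objective may contain further terms.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def info_nce(anchor, positive, negatives, temperature=0.07):
    """-log( exp(s_pos/t) / (exp(s_pos/t) + sum_k exp(s_neg_k/t)) )."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - logits[0]


def weighted_total(base_loss, style_contrastive_loss, sc_coefficient):
    """Assumed combination: base loss plus the weighted contrastive term."""
    return base_loss + sc_coefficient * style_contrastive_loss


if __name__ == "__main__":
    anchor, positive = [1.0, 0.0], [0.9, 0.1]
    negatives = [[0.0, 1.0]] * 16  # 16 negatives, mirroring the num_neg default
    print(weighted_total(0.5, info_nce(anchor, positive, negatives), 0.01))
```

The loss is small when the anchor is close to the positive style and far from all negatives, and grows as negatives become more similar to the anchor.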

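📺 Sampling

Step 1 => Prepare the checkpoint

Option (1): Download the checkpoint from GoogleDrive / BaiduYun:gexg, then put the ckpt folder in the root directory; it should include the files unet.pth, content_encoder.pth, and style_encoder.pth.
Option (2): Put your own re-trained checkpoint folder ckpt in the root directory; it should include the files unet.pth, content_encoder.pth, and style_encoder.pth.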
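Whichever option is used, it can help to confirm that all three files are in place before sampling. The following is a small sketch, not part of the repository; the function name `missing_checkpoints` is hypothetical, and the folder name `ckpt` follows the description above.

```python
from pathlib import Path

# File names taken from the checkpoint description above.
REQUIRED_FILES = ("unet.pth", "content_encoder.pth", "style_encoder.pth")


def missing_checkpoints(ckpt_dir: str = "ckpt") -> list:
    """Return the required checkpoint files that are absent from ckpt_dir."""
    root = Path(ckpt_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints("ckpt")
    print("all checkpoints found" if not missing else f"missing: {missing}")
```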

Step 2 => Run the script

(1) Sample from a content image and a reference image.

sh script/sample_content_image.sh

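ckpt_dir: The directory where the model checkpoints are saved.
content_image_path: The path to the content/source image.
style_image_path: The path to the style/reference image.
save_image: Set to True to save the output as images.
save_image_dir: The directory where images are saved; the saved files are out_single.png and out_with_cs.png.
device: The sampling device; a GPU is recommended for acceleration.
guidance_scale: The guidance scale for classifier-free guidance during sampling.
num_inference_steps: The number of inference steps for DPM-Solver++.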
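For readers unfamiliar with `guidance_scale`, the sketch below shows one common way classifier-free guidance combines the unconditional and conditional noise predictions at each sampling step. It uses plain Python lists instead of tensors, and the function name `guided_noise` is hypothetical. Whether the repository uses exactly this form (some implementations parameterize the scale differently) is an assumption.

```python
def guided_noise(eps_uncond, eps_cond, guidance_scale):
    """Element-wise eps = eps_uncond + s * (eps_cond - eps_uncond)."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


if __name__ == "__main__":
    eps_uncond = [0.0, 1.0]  # prediction with the conditions dropped
    eps_cond = [1.0, 3.0]    # prediction with content and style conditions
    for scale in (0.0, 1.0, 2.0):
        print(scale, guided_noise(eps_uncond, eps_cond, scale))
```

In this form, a scale of 1 reproduces the conditional prediction, and larger values push the result further in the direction of the conditions.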

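(2) Sample from a content character. Note that you may need a ttf file that covers a large number of Chinese characters; one can be downloaded from BaiduYun:wrth.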

sh script/sample_content_character.sh

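character_input: If set to True, a character string is used as the content/source input.
content_character: The content/source character string.
The other parameters are the same as in option (1) above.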

📱 Run WebUI

(1) Sampling with FontDiffuser

gradio gradio_app.py

Example:

(2) Sampling with FontDiffuser and rendering with InstructPix2Pix

Coming soon ...

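🌄 Gallery

Characters of hard complexity

vis_hard

Characters of medium complexity

vis_medium

Characters of easy complexity

vis_easy

Cross-lingual generation (Chinese to Korean)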

vis_korean

💙 Acknowledgement

diffusers

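Copyright

This repository may only be used for non-commercial research purposes.
For commercial use, please contact Prof. Lianwen Jin (eelwjin@scut.edu.cn).

Citation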

@inproceedings{yang2024fontdiffuser,
  title={FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning},
  author={Yang, Zhenhua and Peng, Dezhi and Kong, Yuxin and Zhang, Yuyi and Yao, Cong and Jin, Lianwen},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2024}
}