stable-diffusion-2-1-realistic

stable-diffusion-2-1-realistic: Project Introduction

stable-diffusion-2-1-realistic is a text-to-image generation model fine-tuned from Stable Diffusion v2.1. Developed by friedrichor, it aims to improve the realism and quality of generated images.

Model Overview

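This is a diffusion-based text-to-image model that works primarily with English prompts. It uses the Latent Diffusion Model architecture with a pretrained text encoder (OpenCLIP-ViT/H). Besides standalone text-to-image generation, the model is also a component of Tiger, a multimodal dialogue response generation model.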
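In a latent diffusion model, denoising runs on a compressed latent tensor rather than on pixels. The sketch below computes that latent shape, assuming the Stable Diffusion 2.x VAE (8× spatial downsampling, 4 latent channels) and a 768×768 output, the native resolution of SD v2.1 that is assumed to carry over to this fine-tune. The function names are hypothetical.

```python
# Sketch: pixel size -> latent size in a latent diffusion model.
# ASSUMED values for the Stable Diffusion 2.x VAE; not stated in the project description.
VAE_SCALE_FACTOR = 8   # spatial downsampling factor of the VAE
LATENT_CHANNELS = 4    # channels of the latent tensor the U-Net denoises


def latent_shape(height: int, width: int, batch_size: int = 1) -> tuple[int, int, int, int]:
    """Return (batch, channels, height, width) of the latent for a given image size."""
    if height % VAE_SCALE_FACTOR or width % VAE_SCALE_FACTOR:
        raise ValueError(f"height and width must be multiples of {VAE_SCALE_FACTOR}")
    return (batch_size, LATENT_CHANNELS, height // VAE_SCALE_FACTOR, width // VAE_SCALE_FACTOR)


def pixel_to_latent_ratio(height: int, width: int) -> float:
    """How many RGB pixel values correspond to one latent value."""
    _, c, h, w = latent_shape(height, width)
    return (height * width * 3) / (c * h * w)


print(latent_shape(768, 768))           # (1, 4, 96, 96)
print(pixel_to_latent_ratio(768, 768))  # 48.0
```

Under these assumptions, each denoising step works on 48 times fewer values than the 768×768×3 image, which is why latent diffusion is cheaper than running diffusion directly in pixel space.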

Dataset

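The model was fine-tuned on the friedrichor/PhotoChat_120_square_HQ dataset, which contains 120 carefully selected image-text pairs. The images come from the PhotoChat dataset and were cropped and quality-enhanced; the captions were generated with the BLIP-2 model.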

Usage

The model can be loaded through Hugging Face's Diffusers library. A basic workflow consists of the following steps:

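Import the required libraries and load the model
Set the device and generation parameters
Write the prompt and negative prompt
Call the model to generate an image
Save the generated image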
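The five steps above can be sketched with Diffusers' `StableDiffusionPipeline` as follows. This is a minimal sketch, not the project's official example: the Hub repo ID `friedrichor/stable-diffusion-2-1-realistic` is assumed from the project and author names, and the 768×768 size, 20 steps, guidance scale 7.5 and seed 42 are illustrative choices. `build_prompt` and `generate_image` are hypothetical helper names.

```python
# Sketch of the five usage steps with Hugging Face Diffusers.
# ASSUMPTIONS (not confirmed by the project description): the Hub repo ID,
# the 768x768 output size, 20 inference steps, guidance scale 7.5 and seed 42.

MODEL_ID = "friedrichor/stable-diffusion-2-1-realistic"  # assumed Hub repo ID

# Template suffix for real-world images that contain people (from the prompt templates).
PERSON_SUFFIX = (
    ", facing the camera, photograph, highly detailed face, depth of field, moody light, "
    "style by Yasmin Albatoul, Harry Fayt, centered, extremely detailed, Nikon D850, "
    "award winning photography"
)

# Recommended negative prompt. The standard StableDiffusionPipeline does not interpret
# the "(text:weight)" syntax; it reaches the text encoder as plain text.
NEGATIVE_PROMPT = (
    "cartoon, anime, ugly, (aged, white beard, black skin, wrinkle:1.1), "
    "(bad proportions, unnatural feature, incongruous feature:1.4), "
    "(blurry, un-sharp, fuzzy, un-detailed skin:1.2), "
    "(facial contortion, poorly drawn face, deformed iris, deformed pupils:1.3), "
    "(mutated hands and fingers:1.5), disconnected hands, disconnected limbs"
)


def build_prompt(description: str) -> str:
    """Step 3: append the person-template suffix to a scene description."""
    return description + PERSON_SUFFIX


def generate_image(description: str, out_path: str = "image.png", seed: int = 42,
                   steps: int = 20, guidance_scale: float = 7.5,
                   height: int = 768, width: int = 768) -> str:
    # Step 1: import the libraries (inside the function so the helpers above
    # can be used without torch/diffusers installed) and load the model below.
    import torch
    from diffusers import StableDiffusionPipeline

    # Step 2: choose a device; half precision is a common choice on GPU to save memory.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32
    pipe = StableDiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=dtype).to(device)
    generator = torch.Generator(device=device).manual_seed(seed)  # reproducible sampling

    # Steps 3-4: build the prompt and run the pipeline with the negative prompt.
    image = pipe(
        build_prompt(description),
        negative_prompt=NEGATIVE_PROMPT,
        height=height,
        width=width,
        num_inference_steps=steps,
        guidance_scale=guidance_scale,
        generator=generator,
    ).images[0]

    # Step 5: save the PIL image to disk.
    image.save(out_path)
    return out_path
```

Calling `generate_image("a young woman reading in a sunlit cafe")` would download the weights on first use and write `image.png`. The pipeline is loaded inside the function to keep the sketch short; a real application would load it once and reuse it across calls.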

Prompt Templates

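To improve the quality of generated images, the project recommends specific prompt templates. For real-world images that contain people, use the following template: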

{{description}}, facing the camera, photograph, highly detailed face, depth of field, moody light, style by Yasmin Albatoul, Harry Fayt, centered, extremely detailed, Nikon D850, award winning photography

For real-world images without people, use the following template:

{{description}}, depth of field. bokeh. soft light. by Yasmin Albatoul, Harry Fayt. centered. extremely detailed. Nikon D850, (35mm|50mm|85mm). award winning photography.
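Filling these templates is plain string substitution. The sketch below writes the placeholder as `{{description}}` and assumes that a group like `(35mm|50mm|85mm)` means "choose one of these lens options", so it expands the template into one concrete prompt per option. The names `fill_template` and `expand_variants` are hypothetical.

```python
import itertools
import re

# Prompt templates from the project, with "{{description}}" as the placeholder.
PERSON_TEMPLATE = (
    "{{description}}, facing the camera, photograph, highly detailed face, depth of field, "
    "moody light, style by Yasmin Albatoul, Harry Fayt, centered, extremely detailed, "
    "Nikon D850, award winning photography"
)
SCENE_TEMPLATE = (
    "{{description}}, depth of field. bokeh. soft light. by Yasmin Albatoul, Harry Fayt. "
    "centered. extremely detailed. Nikon D850, (35mm|50mm|85mm). award winning photography."
)

# Matches an alternation group such as "(35mm|50mm|85mm)"; ASSUMED to mean "pick one".
_ALTERNATION = re.compile(r"\(([^()]*\|[^()]*)\)")


def expand_variants(template: str) -> list[str]:
    """Return every concrete string obtained by choosing one option per (a|b|...) group."""
    pieces = _ALTERNATION.split(template)  # literals at even indices, group bodies at odd
    literals = pieces[0::2]
    choices = [group.split("|") for group in pieces[1::2]]
    variants = []
    for combo in itertools.product(*choices):
        text = literals[0]
        for option, literal in zip(combo, literals[1:]):
            text += option + literal
        variants.append(text)
    return variants


def fill_template(description: str, has_person: bool) -> list[str]:
    """Insert a description into the matching template and expand the lens choices."""
    template = PERSON_TEMPLATE if has_person else SCENE_TEMPLATE
    return [v.replace("{{description}}", description) for v in expand_variants(template)]


for prompt in fill_template("a quiet harbor at dawn", has_person=False):
    print(prompt)
```

The scene template yields three prompts (one per focal length) and the person template yields one. Generating all variants rather than sampling one at random makes it easy to compare the lens options side by side.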

Negative Prompts

The project also recommends a negative prompt to further improve image quality, for example:

cartoon, anime, ugly, (aged, white beard, black skin, wrinkle:1.1), (bad proportions, unnatural feature, incongruous feature:1.4), (blurry, un-sharp, fuzzy, un-detailed skin:1.2), (facial contortion, poorly drawn face, deformed iris, deformed pupils:1.3), (mutated hands and fingers:1.5), disconnected hands, disconnected limbs
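The `(terms:weight)` groups use the attention-weighting notation of tools such as the AUTOMATIC1111 WebUI, where a weight above 1 emphasizes the enclosed terms. Diffusers' standard `StableDiffusionPipeline` does not parse this notation and passes the parentheses and numbers through as ordinary text. The sketch below, with hypothetical helper names, turns the negative prompt into `(term, weight)` pairs, for example to strip the annotations or convert them for a prompt-weighting tool. It handles only the flat `(a, b:w)` form used here, not nested or unweighted parentheses.

```python
import re

NEGATIVE_PROMPT = (
    "cartoon, anime, ugly, (aged, white beard, black skin, wrinkle:1.1), "
    "(bad proportions, unnatural feature, incongruous feature:1.4), "
    "(blurry, un-sharp, fuzzy, un-detailed skin:1.2), "
    "(facial contortion, poorly drawn face, deformed iris, deformed pupils:1.3), "
    "(mutated hands and fingers:1.5), disconnected hands, disconnected limbs"
)

# A flat weighted group "(term, term:1.3)"; nested or unweighted parentheses are not handled.
_WEIGHTED_GROUP = re.compile(r"\(([^():]+):(\d*\.?\d+)\)")


def _split_terms(text: str, weight: float) -> list[tuple[str, float]]:
    """Split comma-separated text into (term, weight) pairs, dropping empty pieces."""
    return [(term.strip(), weight) for term in text.split(",") if term.strip()]


def parse_weighted_prompt(prompt: str) -> list[tuple[str, float]]:
    """Return (term, weight) pairs; terms outside any group get weight 1.0."""
    pairs = []
    pos = 0
    for match in _WEIGHTED_GROUP.finditer(prompt):
        pairs += _split_terms(prompt[pos:match.start()], 1.0)            # plain terms before the group
        pairs += _split_terms(match.group(1), float(match.group(2)))   # terms inside the group
        pos = match.end()
    pairs += _split_terms(prompt[pos:], 1.0)                             # plain terms after the last group
    return pairs


def strip_weights(prompt: str) -> str:
    """Plain comma-separated version of the prompt, without weight annotations."""
    return ", ".join(term for term, _ in parse_weighted_prompt(prompt))


for term, weight in parse_weighted_prompt(NEGATIVE_PROMPT):
    print(f"{weight:.1f}  {term}")
```

Parsed this way, the recommended negative prompt contains 21 terms, with the strongest weight (1.5) on "mutated hands and fingers".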