ComfyUI-ELLA

ComfyUI implementation of ELLA.

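:star2: Changelog

[2024.4.30] Added the "ELLA Text Encode" node, which automatically concatenates the ella and clip conditioning.
[2024.4.24] Upgraded the ELLA apply method for better compatibility with the ComfyUI ecosystem. See the method mentioned in ComfyUI_ELLA PR #25.
- DEPRECATED: "Apply ELLA" without "sigmas" is deprecated and will be removed in a future version.
[2024.4.22] Fixed unstable image quality with multiple batches. Added CLIP concat (lora trigger words are now supported).
[2024.4.19] Documented the nodes.
[2024.4.19] Initial repo.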

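:pushpin: Notices

The SIGMAS from the "BasicScheduler" node or the TIMESTEPS from the "Set ELLA Timesteps" node must match the KSampler settings. This is because ELLA introduces a Timestep-Aware Semantic Connector (TSC), which dynamically adapts the semantic features to the sampling timestep.
If you need to concatenate the CLIP "CONDITIONING" so that LoRA trigger words take effect, the "CONDITIONING" output by ELLA must always be connected to the "conditioning_to" input of the "Conditioning (Concat)" node.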
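
The first notice can be checked mechanically before queueing a prompt. The sketch below is only an illustration under stated assumptions: it assumes the workflow is available as a dict in which each node id maps to `{"class_type": ..., "inputs": {...}}`, and that both the scheduler-side node and the sampler expose `steps`, `scheduler`, and `denoise` inputs. The helper name `mismatched_schedule_inputs`, the node ids, and the values are made up for the example and are not part of ComfyUI-ELLA.

```python
# Input names assumed to be shared by the scheduler-side node and the sampler node.
SCHEDULE_KEYS = ("steps", "scheduler", "denoise")


def mismatched_schedule_inputs(workflow, schedule_node_id, sampler_node_id):
    """Return the names of schedule inputs whose values differ between two nodes."""
    schedule_inputs = workflow[schedule_node_id]["inputs"]
    sampler_inputs = workflow[sampler_node_id]["inputs"]
    return [
        key
        for key in SCHEDULE_KEYS
        if schedule_inputs.get(key) != sampler_inputs.get(key)
    ]


# Hypothetical two-node excerpt of a workflow; node ids and values are made up.
workflow = {
    "3": {"class_type": "KSampler",
          "inputs": {"steps": 25, "scheduler": "normal", "denoise": 1.0}},
    "7": {"class_type": "BasicScheduler",
          "inputs": {"steps": 20, "scheduler": "normal", "denoise": 1.0}},
}

print(mismatched_schedule_inputs(workflow, "7", "3"))  # ['steps']
```

An empty list means the two nodes agree on the compared inputs; any name that is returned should be aligned before sampling.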

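:books: Example workflows

The examples directory contains example workflows. You can load these images directly into ComfyUI to use them as workflows.

workflow example

All legacy workflows remain compatible, but they are deprecated and will be removed in a future version.

legacy workflow example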

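:tada: It works with controlnet!

controlnet workflow

:tada: It works with lora trigger words by concatenating the CLIP CONDITIONING!

:warning: Note again that the "ELLA CONDITIONING" must always be connected to the "conditioning_to" input of the "Conditioning (Concat)" node.

lora workflow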

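The "ELLA Text Encode" node can be used to simplify the workflow.

With the upgrade (2024.4.24), some interesting workflows become possible, such as using ELLA only for the positive prompt. As shown below:

positive-only ELLA lora workflow

positive + negative	positive only

However, there is no guarantee that positive-only gives better results.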

Workflow with AYS.

ella_ays workflow

AYS yields more visual detail and better text alignment; see the paper.

with AYS	without AYS

EMMA is under development.

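:green_book: Installation

Download or git clone this repository inside the ComfyUI/custom_nodes/ directory. ComfyUI-ELLA requires the latest version of ComfyUI; if something does not work, make sure to upgrade.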

cd ComfyUI/custom_nodes
git clone https://github.com/TencentQQGYLab/ComfyUI-ELLA

Next, install the dependencies.

cd ComfyUI-ELLA
pip install -r requirements.txt

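:orange_book: Models

These models must be placed in the corresponding directories under models.

Remember that you can also use any custom location by setting ella and ella_encoder entries in the extra_model_paths.yaml file.

ComfyUI/models/ella; create it if it does not exist.
- Place the ELLA models here
ComfyUI/models/ella_encoder; create it if it does not exist.
- Place the FLAN-T5 XL text encoder here; it should be a folder in transformers layout that contains config.json

In summary, you should have the following model directory structure: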

ComfyUI/models/ella/
└── ella-sd1.5-tsc-t5xl.safetensors

ComfyUI/models/ella_encoder/
└── models--google--flan-t5-xl--text_encoder
    ├── config.json
    ├── model.safetensors
    ├── special_tokens_map.json
    ├── spiece.model
    ├── tokenizer_config.json
    └── tokenizer.json
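
To confirm the layout before starting ComfyUI, a small check like the following may help. It is a minimal sketch, not part of ComfyUI-ELLA: the function name `check_ella_layout` is hypothetical, it assumes the default ComfyUI/models location rather than a custom path from extra_model_paths.yaml, and it only looks for a `.safetensors` file in `ella` and for a subfolder of `ella_encoder` that contains `config.json`.

```python
from pathlib import Path


def check_ella_layout(models_dir):
    """Return a list of problems; an empty list means the layout looks right."""
    models_dir = Path(models_dir)
    problems = []

    # ELLA weights: at least one .safetensors file directly inside models/ella.
    ella_dir = models_dir / "ella"
    if not ella_dir.is_dir() or not any(ella_dir.glob("*.safetensors")):
        problems.append(f"no .safetensors ELLA model found in {ella_dir}")

    # Text encoder: a subfolder in transformers layout, i.e. one holding config.json.
    encoder_dir = models_dir / "ella_encoder"
    if not encoder_dir.is_dir() or not any(encoder_dir.glob("*/config.json")):
        problems.append(f"no encoder folder with config.json found in {encoder_dir}")

    return problems


if __name__ == "__main__":
    # Assumes the script is run from the directory that contains ComfyUI/.
    for problem in check_ella_layout("ComfyUI/models"):
        print("missing:", problem)
```

The script prints nothing when both directories are populated as described above.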

:book: Node reference

Node reference

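:mag: Common issues

XXX not implemented for 'Half'. See issue #12
AYS + ELLA produces dark images. See issue #39
- Check whether add_noise is enabled on the SamplerCustom node.
- Lower the cfg of the SamplerCustom node.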

:memo: TODO

Support prompt weighting

:hugs: Contributors (direct and indirect)

<table> <tr> <td align="center"><a href="https://github.com/JettHu"><img src="https://avatars.githubusercontent.com/u/35261585?s=460&v=4" width="32px;" alt=""/> JettHu</a></td> <td align="center"><a href="https://github.com/budui"><img src="https://avatars.githubusercontent.com/u/16448529?s=460&v=4" width="32px;" alt=""/> budui</a></td> <td align="center"><a href="https://github.com/kijai"><img src="https://avatars.githubusercontent.com/u/40791699?s=460&v=4" width="32px;" alt=""/> kijai</a></td> <td align="center"><a href="https://github.com/huagetai"><img src="https://avatars.githubusercontent.com/u/1137341?s=460&v=4" width="32px;" alt=""/> huagetai</a></td> </tr> </table>

:yum: Acknowledgements

ComfyUI: https://github.com/comfyanonymous/ComfyUI
Diffusers (the timestep module is borrowed from it): https://github.com/huggingface/diffusers

:wink: Citation

@misc{hu2024ella,
      title={ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment}, 
      author={Xiwei Hu and Rui Wang and Yixiao Fang and Bin Fu and Pei Cheng and Gang Yu},
      year={2024},
      eprint={2403.05135},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}