# Inpaint Anything: Segment Anything Meets Image Inpainting

Inpaint Anything can inpaint anything in images, videos and 3D scenes!

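Authors: Tao Yu, Runseng Feng, Ruoyu Feng, Jinming Liu, Xin Jin, Wenjun Zeng and Zhibo Chen.
Institutes: University of Science and Technology of China; Eastern Institute for Advanced Study.
[Paper] [Website] [Hugging Face Homepage]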

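TL;DR: Users can select any object in an image by clicking on it. With powerful vision models such as SAM, LaMa and Stable Diffusion (SD), Inpaint Anything can remove the object smoothly (i.e., Remove Anything). Furthermore, prompted by user-input text, Inpaint Anything can fill the object with any desired content (i.e., Fill Anything) or replace its background arbitrarily (i.e., Replace Anything).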

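## 📜 News

- [2023/9/15] Remove Anything 3D code is available!
- [2023/4/30] Remove Anything Video is available! You can remove any object from a video!
- [2023/4/24] Local web UI supported! You can run the demo website locally!
- [2023/4/22] Website is online! You can try Inpaint Anything through the web interface!
- [2023/4/22] Remove Anything 3D is available! You can remove any 3D object from a 3D scene!
- [2023/4/13] Technical report on arXiv is available!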

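## 🌟 Features

- Remove Anything
- Fill Anything
- Replace Anything
- Remove Anything 3D (<span style="color:red">🔥New</span>)
- Fill Anything 3D
- Replace Anything 3D
- Remove Anything Video (<span style="color:red">🔥New</span>)
- Fill Anything Video
- Replace Anything Video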

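## 💡 Highlights

- Any aspect ratio supported
- 2K resolution supported
- Technical report on arXiv available (<span style="color:red">🔥New</span>)
- Website available (<span style="color:red">🔥New</span>)
- Local web UI available (<span style="color:red">🔥New</span>)
- Multiple modalities (i.e., image, video and 3D scene) supported (<span style="color:red">🔥New</span>)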

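## <span id="remove-anything">📌 Remove Anything</span>

Click on an object in the image, and Inpaint Anything will remove it instantly!

- Click on an object;
- Segment Anything Model (SAM) segments the object out;
- Inpainting models (e.g., LaMa) fill the "hole".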

### Installation

Requires `python>=3.8`

```bash
python -m pip install torch torchvision torchaudio
python -m pip install -e segment_anything
python -m pip install -r lama/requirements.txt
```

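On Windows, we recommend first installing miniconda and then opening Anaconda Powershell Prompt (miniconda3) as an administrator. Then install the LaMa dependencies with pip from `./lama_requirements_windows.txt` instead of `./lama/requirements.txt`.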
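The platform rule above can be scripted. The helper below is a hypothetical sketch (`lama_requirements_file` is not part of the repository); it only encodes the rule stated in this README, keyed on Python's `sys.platform`.

```python
import sys


def lama_requirements_file(platform: str = sys.platform) -> str:
    """Return the LaMa requirements file for a platform string.

    Hypothetical helper encoding the README rule: Windows uses
    ./lama_requirements_windows.txt, everything else ./lama/requirements.txt.
    """
    # sys.platform is "win32" on Windows, including 64-bit Windows.
    if platform.startswith("win"):
        return "./lama_requirements_windows.txt"
    return "./lama/requirements.txt"


if __name__ == "__main__":
    # Print the pip command to run on the current machine.
    print(f"python -m pip install -r {lama_requirements_file()}")
```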

### Usage

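Download the model checkpoints provided by Segment Anything and LaMa (e.g., `sam_vit_h_4b8939.pth` and `big-lama`) and put them into `./pretrained_models`. For simplicity, you can also go here, download `pretrained_models` directly, and put the directory into `./` to get `./pretrained_models`.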
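Before running the scripts, it can help to confirm that the checkpoints landed where the example commands expect them. The following is a hypothetical helper, not part of the repository; the two paths are copied from the Remove Anything example command and should be adjusted for other SAM or LaMa variants.

```python
from pathlib import Path
from typing import List, Sequence

# Paths used by the Remove Anything example command in this README.
REQUIRED_PATHS = [
    "pretrained_models/sam_vit_h_4b8939.pth",
    "pretrained_models/big-lama",
]


def missing_checkpoints(root: str = ".",
                        required: Sequence[str] = REQUIRED_PATHS) -> List[str]:
    """Return the entries of `required` that do not exist under `root`."""
    base = Path(root)
    return [p for p in required if not (base / p).exists()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All checkpoints found.")
```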

For MobileSAM, `sam_model_type` should be `"vit_t"` and `sam_ckpt` should be `"./weights/mobile_sam.pt"`. For more about the MobileSAM project, please refer to MobileSAM.
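A small sketch of switching between SAM ViT-H and MobileSAM when assembling command-line flags. The flag values come from this README; `SAM_VARIANTS` and `sam_cli_args` are illustrative names of our own, not part of the repository.

```python
from typing import Dict, List

# Flag values quoted from this README.
SAM_VARIANTS: Dict[str, Dict[str, str]] = {
    "sam_vit_h": {
        "sam_model_type": "vit_h",
        "sam_ckpt": "./pretrained_models/sam_vit_h_4b8939.pth",
    },
    "mobile_sam": {
        "sam_model_type": "vit_t",
        "sam_ckpt": "./weights/mobile_sam.pt",
    },
}


def sam_cli_args(variant: str) -> List[str]:
    """Return the --sam_model_type/--sam_ckpt flags for a SAM variant."""
    cfg = SAM_VARIANTS[variant]  # raises KeyError for unknown variants
    return ["--sam_model_type", cfg["sam_model_type"],
            "--sam_ckpt", cfg["sam_ckpt"]]
```

The returned list can be spliced into an argument list such as `["python", "remove_anything.py", *other_flags, *sam_cli_args("mobile_sam")]` and passed to `subprocess.run`.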

```bash
bash script/remove_anything.sh
```

Specify an image and a point, and Remove Anything will remove the object at that point.

```bash
python remove_anything.py \
    --input_img ./example/remove-anything/dog.jpg \
    --coords_type key_in \
    --point_coords 200 450 \
    --point_labels 1 \
    --dilate_kernel_size 15 \
    --output_dir ./results \
    --sam_model_type "vit_h" \
    --sam_ckpt ./pretrained_models/sam_vit_h_4b8939.pth \
    --lama_config ./lama/configs/prediction/default.yaml \
    --lama_ckpt ./pretrained_models/big-lama
```
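The `--dilate_kernel_size` flag suggests that the SAM mask is enlarged before inpainting, so the fill also covers the object's boundary pixels. The sketch below assumes an OpenCV-style binary dilation with a square `kernel_size` by `kernel_size` window anchored at `kernel_size // 2`; we have not confirmed that the script uses exactly this rule. It works on plain nested lists for clarity, whereas a real implementation would use array operations (e.g. OpenCV's `cv2.dilate`).

```python
from typing import List

Mask = List[List[int]]


def dilate_mask(mask: Mask, kernel_size: int) -> Mask:
    """Binary dilation with a square kernel_size x kernel_size window.

    Output pixel (y, x) is 1 if any input pixel in rows
    y - a .. y - a + kernel_size - 1 (same range for columns) is 1,
    where a = kernel_size // 2 is the anchor offset.
    """
    if kernel_size < 1:
        raise ValueError("kernel_size must be >= 1")
    h, w = len(mask), len(mask[0])
    a = kernel_size // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clip the window to the image bounds.
            rows = range(max(0, y - a), min(h, y - a + kernel_size))
            cols = range(max(0, x - a), min(w, x - a + kernel_size))
            out[y][x] = int(any(mask[r][c] for r in rows for c in cols))
    return out
```

Under this assumption, `--dilate_kernel_size 15` grows the mask by 7 pixels on every side.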

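If your machine has a display device, you can change `--coords_type key_in` to `--coords_type click`. With `click`, the image is displayed after you run the command above. (1) Left-click to record the click coordinates. You can change the point; only the coordinates of the last click are recorded. (2) Right-click to finish the selection.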
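The click protocol above behaves like a tiny state machine. This sketch replays a list of mouse events; the event tuples and the `ClickRecorder` class are hypothetical stand-ins for whatever GUI callback the script actually uses.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, int]
Event = Tuple[str, int, int]  # (button, x, y); button is "left" or "right"


class ClickRecorder:
    """Hypothetical model of the click-mode protocol described above."""

    def __init__(self) -> None:
        self.point: Optional[Point] = None
        self.done = False

    def handle(self, button: str, x: int, y: int) -> None:
        if self.done:
            return  # events after the right click are ignored
        if button == "left":
            self.point = (x, y)  # overwrite: only the last left click counts
        elif button == "right":
            self.done = True  # finish the selection


def replay(events: List[Event]) -> Optional[Point]:
    """Feed events to a recorder; return the point once selection is finished."""
    recorder = ClickRecorder()
    for button, x, y in events:
        recorder.handle(button, x, y)
    return recorder.point if recorder.done else None
```

For example, `replay([("left", 10, 20), ("left", 200, 450), ("right", 0, 0)])` yields `(200, 450)`, matching the point used in the `key_in` example. In a matplotlib-based viewer the left and right buttons would arrive as `event.button` values 1 and 3; whether the script uses matplotlib is not stated here.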

### Demo

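## <span id="fill-anything">📌 Fill Anything</span>

<p align="center">Text prompt: "a teddy bear on a bench"</p> <p align="center"> <img src="https://yellow-cdn.veclightyear.com/2b54e442/95722038-bb37-4cf7-82e2-d9aa269e4137.gif" alt="image" style="width:400px;"> </p>

Click on an object, type in what you want to fill it with, and Inpaint Anything will fill it!

- Click on an object;
- SAM segments the object out;
- Input a text prompt;
- A text-prompt-guided inpainting model (e.g., Stable Diffusion) fills the "hole" according to the text.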
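Conceptually, mask-guided inpainting regenerates only the pixels inside the mask. The sketch below shows that contract as a final compositing step on toy 2D grids (integers stand in for RGB pixels). It is a simplification: a Stable Diffusion inpainting pipeline may also slightly change pixels outside the mask, and this is not claimed to be how `fill_anything.py` post-processes its output.

```python
from typing import List, Sequence


def composite(original: Sequence[Sequence[int]],
              generated: Sequence[Sequence[int]],
              mask: Sequence[Sequence[int]]) -> List[List[int]]:
    """Take `generated` where mask == 1 and `original` elsewhere."""
    if not (len(original) == len(generated) == len(mask)):
        raise ValueError("inputs must have the same height")
    return [
        [g if m else o for o, g, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, mask)
    ]
```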

### Installation

Requires `python>=3.8`

```bash
python -m pip install torch torchvision torchaudio
python -m pip install -e segment_anything
python -m pip install diffusers transformers accelerate scipy safetensors
```

### Usage

Download the model checkpoints provided by Segment Anything (e.g., `sam_vit_h_4b8939.pth`) and put them into `./pretrained_models`. For simplicity, you can also go here, download `pretrained_models` directly, and put the directory into `./` to get `./pretrained_models`.

For MobileSAM, `sam_model_type` should be `"vit_t"` and `sam_ckpt` should be `"./weights/mobile_sam.pt"`. For more about the MobileSAM project, please refer to MobileSAM.

```bash
bash script/fill_anything.sh
```

Specify an image, a point and a text prompt, then run:

```bash
python fill_anything.py \
    --input_img ./example/fill-anything/sample1.png \
    --coords_type key_in \
    --point_coords 750 500 \
    --point_labels 1 \
    --text_prompt "a teddy bear on a bench" \
    --dilate_kernel_size 50 \
    --output_dir ./results \
    --sam_model_type "vit_h" \
    --sam_ckpt ./pretrained_models/sam_vit_h_4b8939.pth
```

### Demo

<table> <caption align="center">Text prompt: "a camera lens in the hand"</caption> <tr> <td><img src="https://yellow-cdn.veclightyear.com/2b54e442/a0cf8654-6b01-4392-b5be-3063f7700657.png" width="100%"></td> <td><img src="https://yellow-cdn.veclightyear.com/2b54e442/e42144e8-ba8a-4edb-8dc0-714439897baf.png" width="100%"></td> </tr> </table> <table> <caption align="center">Text prompt: "a Picasso painting on the wall"</caption> <tr> <td><img src="https://yellow-cdn.veclightyear.com/2b54e442/9be8d8ef-50a2-4742-a5d9-cad85ed8ba24.png" width="100%"></td> <td><img src="https://yellow-cdn.veclightyear.com/2b54e442/7dc14575-8f1e-440a-be9a-58dcccf713a2.png" width="100%"></td> <td><img src="https://yellow-cdn.veclightyear.com/2b54e442/ee4d1e09-170f-48ba-ba1a-841a648b85a4.png" width="100%"></td> </tr> </table> <table> <caption align="center">Text prompt: "an aircraft carrier on the sea"</caption> <tr> <td><img src="https://yellow-cdn.veclightyear.com/2b54e442/c50fdc6a-42d6-4dc5-bcf5-77c84087dc80.png" width="100%"></td> <td><img src="https://yellow-cdn.veclightyear.com/2b54e442/1ad13459-461f-4a6b-b6c4-b98cea2ac75e.png" width="100%"></td> <td><img src="https://yellow-cdn.veclightyear.com/2b54e442/fc3c9ded-6b3a-41dd-8374-83879472644d.png" width="100%"></td> </tr> </table> <table> <caption align="center">Text prompt: "a sports car on a road"</caption> <tr> <td><img src="https://yellow-cdn.veclightyear.com/2b54e442/4e7d535c-e6aa-4fa1-868a-aa7f85ae52bf.png" width="100%"></td> <td><img src="https://yellow-cdn.veclightyear.com/2b54e442/d1a785fc-e4c0-4067-82ce-4465f6e7f85c.png" width="100%"></td> <td><img src="https://yellow-cdn.veclightyear.com/2b54e442/658965be-9222-4fc9-9ec8-bcd74c731ebe.png" width="100%"></td> </tr> </table>

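## <span id="replace-anything">📌 Replace Anything</span>

<p align="center">Text prompt: "a man in office"</p> <p align="center"> <img src="https://yellow-cdn.veclightyear.com/2b54e442/c1f2f935-a854-4656-9a84-60c1405826cd.gif" alt="image" style="width:400px;"> </p>

Click on an object, type in the background you want, and Inpaint Anything will replace it!

- Click on an object;
- SAM segments the object out;
- Input a text prompt;
- A text-prompt-guided inpainting model (e.g., Stable Diffusion) replaces the background according to the text.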
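Replace Anything swaps the roles of the two regions: the segmented object is kept and everything outside it is regenerated. A toy sketch of that compositing rule on 2D grids follows; `replace_background` is an illustrative name, and the real pipeline may blend the regions differently.

```python
from typing import List, Sequence


def replace_background(original: Sequence[Sequence[int]],
                       generated: Sequence[Sequence[int]],
                       object_mask: Sequence[Sequence[int]]) -> List[List[int]]:
    """Keep `original` where object_mask == 1, take `generated` elsewhere."""
    if not (len(original) == len(generated) == len(object_mask)):
        raise ValueError("inputs must have the same height")
    return [
        [o if m else g for o, g, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, object_mask)
    ]
```

Note that the example command for Replace Anything passes no `--dilate_kernel_size`, which fits the idea that the object mask is kept as segmented rather than grown.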

### Installation

Requires `python>=3.8`

```bash
python -m pip install torch torchvision torchaudio
python -m pip install -e segment_anything
python -m pip install diffusers transformers accelerate scipy safetensors
```

### Usage

Download the model checkpoints provided by Segment Anything (e.g., `sam_vit_h_4b8939.pth`) and put them into `./pretrained_models`. For simplicity, you can also go here, download `pretrained_models` directly, and put the directory into `./` to get `./pretrained_models`.

For MobileSAM, `sam_model_type` should be `"vit_t"` and `sam_ckpt` should be `"./weights/mobile_sam.pt"`. For more about the MobileSAM project, please refer to MobileSAM.

```bash
bash script/replace_anything.sh
```

Specify an image, a point and a text prompt, then run:

```bash
python replace_anything.py \
    --input_img ./example/replace-anything/dog.png \
    --coords_type key_in \
    --point_coords 750 500 \
    --point_labels 1 \
    --text_prompt "sit on the swing" \
    --output_dir ./results \
    --sam_model_type "vit_h" \
    --sam_ckpt ./pretrained_models/sam_vit_h_4b8939.pth
```

### Demo

<table> <caption align="center">Text prompt: "sit on the swing"</caption> <tr> <td><img src="https://yellow-cdn.veclightyear.com/2b54e442/9366ce80-79a1-4bb6-9e97-da55beaa3565.png" width="100%"></td> <td><img src="https://yellow-cdn.veclightyear.com/2b54e442/909364df-de02-4acb-81e1-2e0492ecd5c8.png" width="100%"></td> <td><img src="https://yellow-cdn.veclightyear.com/2b54e442/250faed6-2612-42ed-a1f0-d7b389ded2ad.png" width="100%"></td> </tr> </table> <table> <caption align="center">Text prompt: "a bus, on the center of a country road, summer"</caption> <tr> <td><img src="https://yellow-cdn.veclightyear.com/2b54e442/017c4287-62fd-48c6-b775-0f352ad91a95.png" width="100%"></td> <td><img src="https://yellow-cdn.veclightyear.com/2b54e442/df28105b-b83c-494d-9e88-c87c9e4c4fe4.png" width="100%"></td> <td><img src="https://yellow-cdn.veclightyear.com/2b54e442/146b6fbb-b4ae-42b6-8c8d-3e9fb7688de1.png" width="100%"></td> </tr> </table> <table> <caption align="center">Text prompt: "breakfast"</caption> <tr> <td><img src="https://yellow-cdn.veclightyear.com/2b54e442/fe07f6c1-c826-49ba-9862-d3435eeb53cd.png" width="100%"></td> <td><img src="https://yellow-cdn.veclightyear.com/2b54e442/835d6361-a328-4684-beb2-2f493df4e029.png" width="100%"></td> <td><img src="https://yellow-cdn.veclightyear.com/2b54e442/d7a9a3d1-f7e1-40fc-9421-a81e928b8f1f.png" width="100%"></td> </tr> </table> <table> <caption align="center">Text prompt: "crossroad in the city"</caption> <tr> <td><img src="https://yellow-cdn.veclightyear.com/2b54e442/24e943a8-b514-4b0a-a2af-f33d99454f90.png" width="100%"></td> <td><img src="https://yellow-cdn.veclightyear.com/2b54e442/5a4d0166-2915-4e65-860b-cad3dd478191.png" width="100%"></td> <td><img src="https://yellow-cdn.veclightyear.com/2b54e442/43294d9e-69fc-43fa-b775-17497998465b.png" width="100%"></td> </tr> </table>

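## <span id="remove-anything-3d">📌 Remove Anything 3D</span>

With a single click on an object in the first of the source views, Remove Anything 3D can remove the object from the whole scene!

- Click on an object in the first of the source views;
- SAM segments the object out (with three possible masks);
- Select one mask;
- A tracking model such as OSTrack tracks the object across the views;
- SAM segments the object out in each source view according to the tracking results;
- An inpainting model such as LaMa inpaints the object in each source view;
- A novel view synthesis model such as NeRF synthesizes novel views of the scene without the object.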
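Box-based trackers such as OSTrack are initialized with a bounding box, and SAM also accepts box prompts. One plausible way to bridge the selected first-view mask and the tracker is to take the mask's tight bounding box. The sketch below assumes an `(x, y, width, height)` box format, which is our choice rather than something this README specifies.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height), an assumed format


def mask_to_bbox(mask: List[List[int]]) -> Optional[Box]:
    """Tight bounding box of the nonzero pixels of a 2D binary mask."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None  # empty mask: nothing to track
    width = len(mask[0])
    cols = [x for x in range(width) if any(row[x] for row in mask)]
    x0, x1 = min(cols), max(cols)
    y0, y1 = min(rows), max(rows)
    return (x0, y0, x1 - x0 + 1, y1 - y0 + 1)
```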

### Installation

Requires `python>=3.8`

```bash
python -m pip install torch torchvision torchaudio
python -m pip install -e segment_anything
python -m pip install -r lama/requirements.txt
python -m pip install jpeg4py lmdb
```

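### Usage

Download the model checkpoints provided by Segment Anything and LaMa (e.g., `sam_vit_h_4b8939.pth`) and put them into `./pretrained_models`. Further, download the OSTrack pretrained model from here (e.g., `vitb_384_mae_ce_32x4_ep300.pth`) and put it into `./pytracking/pretrain`. In addition, download nerf_llff_data and put it into `./example/3d`. For simplicity, you can also go here, download `pretrained_models` directly, and put the directory into `./` to get `./pretrained_models`. Likewise, download `pretrain` and put the directory into `./pytracking` to get `./pytracking/pretrain`.

For MobileSAM, `sam_model_type` should be `"vit_t"` and `sam_ckpt` should be `"./weights/mobile_sam.pt"`. For more about the MobileSAM project, please refer to MobileSAM.

```bash
bash script/remove_anything_3d.sh
```

Specify a 3D scene, a point, the scene config and the mask index (indicating which of the first view's mask results to use), and Remove Anything 3D will remove the object from the whole scene.

```bash
python remove_anything_3d.py \
    --input_dir ./example/3d/horns \
    --coords_type key_in \
    --point_coords 830 405 \
    --point_labels 1 \
    --dilate_kernel_size 15 \
    --output_dir ./results \
    --sam_model_type "vit_h" \
    --sam_ckpt ./pretrained_models/sam_vit_h_4b8939.pth \
    --lama_config ./lama/configs/prediction/default.yaml \
    --
```
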
## <span id="remove-anything-video">📌 Remove Anything Video</span>
<table>
    <tr>
      <td><img src="https://yellow-cdn.veclightyear.com/2b54e442/5f000ef6-c4e4-47fc-a051-b8ecfe869eba.gif" width="100%"></td>
      <td><img src="https://yellow-cdn.veclightyear.com/2b54e442/205b529e-570b-4de8-b698-3e5c584f1a16.gif" width="100%"></td>
      <td><img src="https://yellow-cdn.veclightyear.com/2b54e442/82a65b64-4c2f-4fdd-9da3-c1b02fd8c4f8.gif" width="100%"></td>
    </tr>
</table>

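With a **single click** on an object in the first frame of a video, Remove Anything Video can remove the object from the whole video!
- Click on an object in the first frame of the video;
- [SAM](https://segment-anything.com/) segments the object out (with three possible masks);
- Select one mask;
- A tracking model such as [OSTrack](https://github.com/botaoye/OSTrack) tracks the object through the video;
- SAM segments the object out in each frame according to the tracking results;
- A video inpainting model such as [STTN](https://github.com/researchmm/STTN) inpaints the object in each frame.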

### Installation
Requires `python>=3.8`
```bash
python -m pip install torch torchvision torchaudio
python -m pip install -e segment_anything
python -m pip install -r lama/requirements.txt
python -m pip install jpeg4py lmdb
```

### Usage

Download the model checkpoints provided by Segment Anything and STTN (e.g., `sam_vit_h_4b8939.pth` and `sttn.pth`) and put them into `./pretrained_models`. Further, download the OSTrack pretrained model from here (e.g., `vitb_384_mae_ce_32x4_ep300.pth`) and put it into `./pytracking/pretrain`. For simplicity, you can also go here, download `pretrained_models` directly, and put the directory into `./` to get `./pretrained_models`. Likewise, download `pretrain` and put the directory into `./pytracking` to get `./pytracking/pretrain`.

For MobileSAM, `sam_model_type` should be `"vit_t"` and `sam_ckpt` should be `"./weights/mobile_sam.pt"`. For more about the MobileSAM project, please refer to MobileSAM.

```bash
bash script/remove_anything_video.sh
```

Specify a video, a point, the video FPS and the mask index (indicating which of the first frame's mask results to use), and Remove Anything Video will remove the object from the whole video.

```bash
python remove_anything_video.py \
    --input_video ./example/video/paragliding/original_video.mp4 \
    --coords_type key_in \
    --point_coords 652 162 \
    --point_labels 1 \
    --dilate_kernel_size 15 \
    --output_dir ./results \
    --sam_model_type "vit_h" \
    --sam_ckpt ./pretrained_models/sam_vit_h_4b8939.pth \
    --lama_config lama/configs/prediction/default.yaml \
    --lama_ckpt ./pretrained_models/big-lama \
    --tracker_ckpt vitb_384_mae_ce_32x4_ep300 \
    --vi_ckpt ./pretrained_models/sttn.pth \
    --mask_idx 2 \
    --fps 25
```

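`--mask_idx` is usually set to 2, which is typically the most confident mask result for the first frame. If the object is not segmented well, you can try the other masks (0 or 1).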
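The `--mask_idx` rule can be written as a small selection function. The sketch assumes SAM's multimask output of three candidate masks with one confidence score each; passing `None` to pick the highest-scoring mask is a hypothetical convenience, not a flag of `remove_anything_video.py`.

```python
from typing import Optional, Sequence, Tuple, TypeVar

M = TypeVar("M")


def pick_mask(masks: Sequence[M], scores: Sequence[float],
              mask_idx: Optional[int] = 2) -> Tuple[int, M]:
    """Return (index, mask) of the chosen SAM candidate.

    mask_idx=2 mirrors the README default; mask_idx=None (hypothetical)
    picks the candidate with the highest score instead.
    """
    if len(masks) != len(scores):
        raise ValueError("masks and scores must have the same length")
    if mask_idx is None:
        mask_idx = max(range(len(scores)), key=lambda i: scores[i])
    if not 0 <= mask_idx < len(masks):
        raise IndexError(f"mask_idx {mask_idx} out of range")
    return mask_idx, masks[mask_idx]
```

If index 2 gives a poor segmentation, re-running with 0 or 1 corresponds to `pick_mask(masks, scores, 0)` or `pick_mask(masks, scores, 1)`.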

### Demo

## Acknowledgments

## Other Interesting Repositories

## Citation

If you find this work useful for your research, please cite us:

```bibtex
@article{yu2023inpaint,
  title={Inpaint Anything: Segment Anything Meets Image Inpainting},
  author={Yu, Tao and Feng, Runseng and Feng, Ruoyu and Liu, Jinming and Jin, Xin and Zeng, Wenjun and Chen, Zhibo},
  journal={arXiv preprint arXiv:2304.06790},
  year={2023}
}
```