[CVPR 2024 Highlight] LangSplat: 3D Language Gaussian Splatting


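This repository contains the official authors' implementation associated with the paper "LangSplat: 3D Language Gaussian Splatting" (CVPR 2024), which can be found here. We also provide the preprocessed 3D-OVS dataset with language features, as well as pre-trained models.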

<section class="section" id="BibTeX"> <div class="container is-max-desktop content"> <h2 class="title">BibTeX</h2> <pre><code>@article{qin2023langsplat,
  title={LangSplat: 3D Language Gaussian Splatting},
  author={Qin, Minghan and Li, Wanhua and Zhou, Jiawei and Wang, Haoqian and Pfister, Hanspeter},
  journal={arXiv preprint arXiv:2312.16084},
  year={2023}
}</code></pre> </div> </section>

Cloning the Repository

The repository contains submodules, so please check it out with:

```
# SSH
git clone git@github.com:minghanqin/LangSplat.git --recursive
```

or

```
# HTTPS
git clone https://github.com/minghanqin/LangSplat.git --recursive
```

Overview

The codebase has 3 main components:

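- A PyTorch-based optimizer that produces a LangSplat model from SfM datasets with language feature inputs
- A scene-wise language autoencoder that alleviates the substantial memory demands of explicit modeling
- Scripts that help you turn your own images into optimization-ready SfM datasets with language features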

These components have been tested on Ubuntu Linux 18.04. Instructions for setting up and running each of them are given in the sections below.

Datasets

In the experiments section of our paper, we primarily used two datasets: the 3D-OVS dataset and the LERF dataset.

The 3D-OVS dataset can be downloaded from the following link: Download 3D-OVS Dataset.

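For the LERF dataset, we have extended its existing collection and also provide the corresponding COLMAP data. These resources are available at the following link: Download the Expanded LERF Dataset and COLMAP Data.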

Optimizer

The optimizer uses PyTorch and CUDA extensions in a Python environment to produce trained models.

Hardware Requirements

- CUDA-ready GPU with Compute Capability 7.0+
- 24 GB VRAM (to train to paper evaluation quality)
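A quick way to check a machine against these requirements is sketched below. `meets_requirements` is a hypothetical helper, not part of the repository; the device query assumes PyTorch is installed and simply reports that no GPU is visible otherwise.

```python
"""Check the local GPU against the stated requirements (hypothetical helper)."""


def meets_requirements(capability, total_memory_bytes, min_cc=(7, 0), min_vram_gb=24):
    """Return (compute_capability_ok, vram_ok).

    capability: (major, minor) tuple, e.g. (8, 6).
    total_memory_bytes: device memory as reported by the driver.
    VRAM is compared in decimal GB (1e9 bytes) so that a card sold as
    "24 GB" is not rejected when the driver reports slightly under 24 GiB.
    """
    cc_ok = tuple(capability) >= tuple(min_cc)
    vram_ok = total_memory_bytes / 1e9 >= min_vram_gb
    return cc_ok, vram_ok


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is None or not torch.cuda.is_available():
        print("No CUDA device visible to PyTorch.")
    else:
        # Inspect only the first GPU; change the index for multi-GPU machines.
        props = torch.cuda.get_device_properties(0)
        cc_ok, vram_ok = meets_requirements((props.major, props.minor), props.total_memory)
        print(f"{props.name}: compute capability {props.major}.{props.minor} "
              f"({'ok' if cc_ok else 'too low'}), "
              f"{props.total_memory / 1e9:.1f} GB VRAM ({'ok' if vram_ok else 'below 24 GB'})")
```

Less VRAM may still run, but the 24 GB figure is what the authors give for reaching paper-quality results.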

Software Requirements

- Conda (recommended for easy setup)
- C++ compiler for PyTorch extensions (we used VS Code)
- CUDA SDK 11 for PyTorch extensions (we used 11.8)
- The C++ compiler and the CUDA SDK must be compatible

Setup

Environment Setup

Our default installation method is based on Conda package and environment management:

```
conda env create --file environment.yml
conda activate langsplat
```

Quick Start

Download the pre-trained models to output/, then simply run:

```
python render.py -m output/$CASENAME --include_feature
```
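If you downloaded several pre-trained scenes, a small wrapper can run the command above once per scene. This is a hypothetical convenience script, not part of the repository; the case names are whatever folder names sit under `output/`, and it assumes you run it from the repository root.

```python
"""Render several pre-trained scenes in a row (hypothetical convenience script)."""
import subprocess
import sys


def render_commands(case_names, output_dir="output"):
    """Build one render.py command per case, mirroring the Quick Start command."""
    return [
        [sys.executable, "render.py", "-m", f"{output_dir}/{case}", "--include_feature"]
        for case in case_names
    ]


def run_all(case_names, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in render_commands(case_names):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first scene whose rendering fails
            subprocess.run(cmd, check=True)
```

For example, `run_all(["my_scene"], dry_run=False)` renders `output/my_scene`; leaving `dry_run=True` only prints the commands so you can check them first.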

Processing Your Own Scenes

Before Getting Started

First, put your images into the data directory:

```
<dataset_name>
|---input
|   |---<image 0>
|   |---<image 1>
|   |---...
```
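If your photos live in some other folder, a few lines of standard-library Python can stage them into this layout. `stage_images` and the accepted suffixes are assumptions for illustration, not part of the repository; adjust the suffix set to the formats your camera produces.

```python
"""Copy a folder of photos into the <dataset_name>/input layout (hypothetical helper)."""
import shutil
from pathlib import Path

# Assumed set of image extensions; compared case-insensitively below.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def stage_images(src_dir, dataset_dir):
    """Copy image files from src_dir into dataset_dir/input; return the copied names."""
    input_dir = Path(dataset_dir) / "input"
    input_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(src_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            # copy2 also preserves timestamps and other file metadata
            shutil.copy2(path, input_dir / path.name)
            copied.append(path.name)
    return copied
```

For example, `stage_images("my_photos", "data/my_scene")` creates `data/my_scene/input` and skips non-image files.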

Second, follow the 3dgs repository to obtain the following dataset format and a pre-trained RGB model:

```
<dataset_name>
|---images
|   |---<image 0>
|   |---<image 1>
|   |---...
|---input
|   |---<image 0>
|   |---<image 1>
|   |---...
|---output
|   |---<dataset_name>
|   |   |---point_cloud/iteration_30000/point_cloud.ply
|   |   |---cameras.json
|   |   |---cfg_args
|   |   |---chkpnt30000.pth
|   |   |---input.ply
|---sparse
    |---0
        |---cameras.bin
        |---images.bin
        |---points3D.bin
```
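Before moving on, it can save time to verify that every file in this tree is present. The checker below is a hypothetical helper that hard-codes the paths listed above; it assumes the subfolder under `output/` is named after the dataset folder unless you pass `dataset_name` explicitly.

```python
"""Check that a scene folder has the 3DGS layout listed above (hypothetical helper)."""
from pathlib import Path


def missing_entries(dataset_dir, dataset_name=None, iteration=30000):
    """Return the expected paths (relative to dataset_dir) that do not exist."""
    root = Path(dataset_dir)
    name = dataset_name or root.name  # assumed: output/<dataset_name> matches the folder name
    expected = [
        "images",
        "input",
        f"output/{name}/point_cloud/iteration_{iteration}/point_cloud.ply",
        f"output/{name}/cameras.json",
        f"output/{name}/cfg_args",
        f"output/{name}/chkpnt{iteration}.pth",
        f"output/{name}/input.ply",
        "sparse/0/cameras.bin",
        "sparse/0/images.bin",
        "sparse/0/points3D.bin",
    ]
    return [rel for rel in expected if not (root / rel).exists()]
```

If only `chkpnt30000.pth` is missing: to our understanding, the stock 3DGS `train.py` writes `chkpnt*.pth` files only when checkpoint iterations are requested, for example with `--checkpoint_iterations 30000`.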

Environment Setup

Please install segment-anything-langsplat and download the SAM checkpoints from here into ckpts/.

Pipeline

Follow process.sh to train LangSplat on your own scene.

Step 1: Generate the language features of the scene. Put the image data into the "input" directory under <dataset_name>/, then run the following command:
```
python preprocess.py --dataset_path $dataset_path 
```

Step 2: Train the autoencoder and obtain the lower-dimensional features.

```
# Train the autoencoder
cd autoencoder
python train.py --dataset_name $dataset_path --encoder_dims 256 128 64 32 3 --decoder_dims 16 32 64 128 256 256 512 --lr 0.0007 --output ae_ckpt
# Get the 3-dimensional language features of the scene
python test.py --dataset_name $dataset_path --output
```
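The `--encoder_dims` and `--decoder_dims` flags describe two stacks of layers: features are squeezed down to 3 dimensions and then expanded back to 512. The numpy sketch below only illustrates those shapes. It is not the repository's model: the layer type (plain linear layers with ReLU between them), the 512-d input width (chosen to match the decoder's last width), and the random untrained weights are all assumptions.

```python
"""Shape-level sketch of the autoencoder configured by the Step 2 flags.

Assumptions (not taken from the repository code): fully connected layers with
ReLU between them, no activation after the last layer, 512-d input features,
random untrained weights.
"""
import numpy as np

ENCODER_DIMS = [256, 128, 64, 32, 3]              # --encoder_dims
DECODER_DIMS = [16, 32, 64, 128, 256, 256, 512]   # --decoder_dims
IN_DIM = 512  # assumed width of the input language features


def make_mlp(in_dim, dims, rng):
    """Create (weight, bias) pairs for a chain of linear layers."""
    layers = []
    for out_dim in dims:
        w = rng.standard_normal((in_dim, out_dim)) * np.sqrt(2.0 / in_dim)
        layers.append((w, np.zeros(out_dim)))
        in_dim = out_dim
    return layers


def forward(layers, x):
    """Apply the linear layers, with ReLU after every layer except the last."""
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x


rng = np.random.default_rng(0)
encoder = make_mlp(IN_DIM, ENCODER_DIMS, rng)
decoder = make_mlp(ENCODER_DIMS[-1], DECODER_DIMS, rng)

features = rng.standard_normal((10, IN_DIM))   # 10 stand-in 512-d features
latent = forward(encoder, features)            # (10, 3): compact per-feature code
reconstructed = forward(decoder, latent)       # (10, 512): lifted back to full width
mse = float(np.mean((reconstructed - features) ** 2))  # the quantity training would reduce
```

Keeping 3 numbers per feature instead of 512 is what makes storing language features explicitly in the scene affordable; the decoder's final width must equal the input width for a reconstruction loss to be defined.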

Our model expects the following dataset structure at the source path location:

```
<dataset_name>
|---images
|   |---<image 0>
|   |---<image 1>
|   |---...
|---language_feature
|   |---00_f.npy
|   |---00_s.npy
|   |---...
|---language_feature_dim3
|   |---00_f.npy
|   |---00_s.npy
|   |---...
|---output
|   |---<dataset_name>
|   |   |---point_cloud/iteration_30000/point_cloud.ply
|   |   |---cameras.json
|   |   |---cfg_args
|   |   |---chkpnt30000.pth
|   |   |---input.ply
|---sparse
    |---0
        |---cameras.bin
        |---images.bin
        |---points3D.bin
```
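After Steps 1 and 2, both feature folders should hold matching `<prefix>_f.npy` / `<prefix>_s.npy` pairs. The hypothetical checker below looks only at file names; it makes no assumption about what the arrays contain, since this README does not describe their contents.

```python
"""Check that language_feature/ and language_feature_dim3/ contain matching
<prefix>_f.npy / <prefix>_s.npy pairs (hypothetical helper)."""
from pathlib import Path


def feature_prefixes(folder):
    """Map each prefix (e.g. "00") to the set of suffixes found among {"f", "s"}."""
    folder = Path(folder)
    if not folder.is_dir():
        return {}
    found = {}
    for path in folder.glob("*.npy"):
        prefix, sep, kind = path.stem.rpartition("_")
        if sep and kind in {"f", "s"}:
            found.setdefault(prefix, set()).add(kind)
    return found


def check_language_features(dataset_dir):
    """Return human-readable problems; an empty list means the layout looks complete."""
    root = Path(dataset_dir)
    problems = []
    per_folder = {}
    for name in ("language_feature", "language_feature_dim3"):
        found = feature_prefixes(root / name)
        per_folder[name] = set(found)
        for prefix, kinds in sorted(found.items()):
            if kinds != {"f", "s"}:
                missing = sorted({"f", "s"} - kinds)
                problems.append(f"{name}: {prefix} is missing {missing}")
    # Every prefix produced in Step 1 should also have a 3-d counterpart from Step 2.
    only_full = per_folder["language_feature"] - per_folder["language_feature_dim3"]
    if only_full:
        problems.append(f"language_feature_dim3 lacks prefixes {sorted(only_full)}")
    return problems
```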

Step 3: Train LangSplat.

```
python train.py -s $dataset_path -m output/${casename} --start_checkpoint $dataset_path/output/$casename/chkpnt30000.pth --feature_level ${level}
```

Step 4: Render LangSplat.

```
python render.py -s $dataset_path -m output/${casename} --feature_level ${level}
```
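Steps 3 and 4 are run once per feature level. The sketch below builds those commands from the two command lines given for these steps so they can be executed in sequence. It is a hypothetical wrapper, not `process.sh`; you must supply the levels yourself because this README does not list the valid `--feature_level` values, and it assumes it is run from the repository root.

```python
"""Build the Step 3 (train) and Step 4 (render) commands for several feature levels.

Hypothetical wrapper: `levels` must be supplied by the caller.
"""
import subprocess
import sys


def step_commands(dataset_path, casename, levels):
    """Return command lists: for each level, train first, then render."""
    checkpoint = f"{dataset_path}/output/{casename}/chkpnt30000.pth"
    model_dir = f"output/{casename}"
    commands = []
    for level in levels:
        commands.append([sys.executable, "train.py", "-s", dataset_path, "-m", model_dir,
                         "--start_checkpoint", checkpoint, "--feature_level", str(level)])
        commands.append([sys.executable, "render.py", "-s", dataset_path, "-m", model_dir,
                         "--feature_level", str(level)])
    return commands


def run(commands):
    """Execute the commands in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, `run(step_commands("data/my_scene", "my_scene", levels))` with your own list of levels. Note that the Step 2 commands `cd` into `autoencoder/`, so return to the repository root first.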

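Step 5: Evaluation. First, we generate the 3-dimensional language feature maps with Step 4. The decoder then lifts the features from 3 dimensions back to 512 dimensions. For further operations and detailed explanations, please refer to the supplementary material.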
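- 3D object localization on LERF and 3D semantic segmentation on LERF. Our evaluation code is based on LERF and NerfStudio; thanks to these impressive open-source projects!
  - Please download lerf_ovs first.
  - Set gt_folder to the path of lerf_ovs/label.
  - Make sure you have completed Step 4 before running the evaluation code.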
```
cd eval
sh eval.sh
```
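Once features are back at 512 dimensions, a text query can be scored against them. Since the evaluation builds on LERF, the numpy function below sketches a LERF-style relevancy score as described in the LERF paper: for each canonical phrase (LERF uses phrases such as "object", "things", "stuff" and "texture"), take a two-way softmax between the query similarity and the canonical similarity, then keep the minimum. This is an illustration, not the repository's evaluation code; the `scale` factor on the similarities is an assumption, and implementations may use a value other than 1.

```python
"""LERF-style relevancy score for one text query (sketch, numpy only).

Assumptions: `pixel_feats` are decoded 512-d features, and the query and
canonical-phrase embeddings come from the same text encoder.
"""
import numpy as np


def normalize(x):
    """Scale vectors along the last axis to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def relevancy(pixel_feats, query_emb, canon_embs, scale=1.0):
    """min over canonical phrases of softmax([s_query, s_canon])[query].

    pixel_feats: (N, D); query_emb: (D,); canon_embs: (K, D). Returns (N,).
    """
    f = normalize(pixel_feats)
    q_sim = f @ normalize(query_emb)          # (N,) cosine similarity to the query
    c_sim = f @ normalize(canon_embs).T       # (N, K) cosine similarity to each phrase
    # exp(a*q) / (exp(a*q) + exp(a*c)) == 1 / (1 + exp(a*(c - q))), numerically stable
    pairwise = 1.0 / (1.0 + np.exp(scale * (c_sim - q_sim[:, None])))
    return pairwise.min(axis=1)
```

A score above 0.5 means the pixel is closer to the query than to every canonical phrase; thresholding or taking the arg-max of this map is one way to turn it into a localization or mask.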