RGBD semantic segmentation

A curated list of papers on RGBD semantic segmentation.

Last updated: 2023/10/07

Update log

2020/May - update all recent papers and add some diagrams on the history of RGBD semantic segmentation.
2020/July - update some recent papers (CVPR2020) of RGBD semantic segmentation.
2020/August - update some recent papers (ECCV2020) of RGBD semantic segmentation.
2020/October - update some recent papers (CVPR2020, WACV2020) of RGBD semantic segmentation.
2020/November - update some recent papers (ECCV2020, arXiv), the links of papers and codes for RGBD semantic segmentation.
2020/December - update some recent papers (PAMI, PRL, arXiv, ACCV) of RGBD semantic segmentation.
2021/February - update some recent papers (TMM, NeurIPS, arXiv) of RGBD semantic segmentation.
2021/April - update some recent papers (CVPR2021, ICRA2021, IEEE SPL, arXiv) of RGBD semantic segmentation.
2021/July - update some recent papers (CVPR2021, ICME2021, arXiv) of RGBD semantic segmentation.
2021/August - update some recent papers (IJCV, ICCV2021, IEEE SPL, arXiv) of RGBD semantic segmentation.
2022/January - update some recent papers (TITS, PR, IEEE SPL, arXiv) of RGBD semantic segmentation.
2022/March - update benchmark results on Cityscapes and ScanNet datasets.
2022/April - update some recent papers (CVPR, BMVC, IEEE TMM, arXiv) of RGBD semantic segmentation.
2022/May - update some recent papers of RGBD semantic segmentation.
2022/July - update some recent papers of RGBD semantic segmentation.
2023/January - update some recent papers of RGBD semantic segmentation.
2023/October - update some recent papers of RGBD semantic segmentation.

Datasets

The papers related to the datasets mainly used in RGBD semantic segmentation are as follows.

[NYUDv2] The NYU-Depth V2 dataset consists of 1,449 RGB-D images of indoor scenes, whose labels are usually mapped to 40 classes. The standard training and test sets contain 795 and 654 images, respectively.
[SUN RGB-D] The SUN RGB-D dataset contains 10,335 RGBD images with semantic labels organized in 37 categories. 5,285 images are used for training and 5,050 for testing.
[2D-3D-S] The Stanford-2D-3D-Semantic dataset contains 70,496 RGB and depth images as well as 2D annotations with 13 object categories. Areas 1, 2, 3, 4, and 6 are used as the training set and Area 5 as the test set.
[Cityscapes] Cityscapes contains a diverse set of stereo video sequences recorded in street scenes from 50 different cities, with high-quality pixel-level annotations of 5,000 frames in addition to a larger set of 20,000 weakly annotated frames.
[ScanNet] ScanNet is an RGB-D video dataset containing 2.5 million views in more than 1500 scans, annotated with 3D camera poses, surface reconstructions, and instance-level semantic segmentations.
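The split sizes quoted above can be kept in a small record type so that totals and train/test ratios are easy to check. This is only a sketch: the `DatasetSplit` class and its field names are hypothetical, and only the two datasets whose train and test counts are stated above are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetSplit:
    """Hypothetical record for the split sizes quoted in this list."""
    name: str
    num_classes: int
    train_images: int
    test_images: int

    @property
    def total(self) -> int:
        return self.train_images + self.test_images

    @property
    def train_fraction(self) -> float:
        return self.train_images / self.total


# Numbers copied from the dataset descriptions above.
SPLITS = [
    DatasetSplit("NYUDv2", num_classes=40, train_images=795, test_images=654),
    DatasetSplit("SUN RGB-D", num_classes=37, train_images=5285, test_images=5050),
]

if __name__ == "__main__":
    for split in SPLITS:
        print(f"{split.name}: {split.total} images, "
              f"{split.train_fraction:.1%} used for training")
```

The totals come out to 1,449 and 10,335 images, matching the dataset sizes stated above.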

Metrics

The papers related to metrics used mainly in RGBD semantic segmentation are as follows.

[PixAcc] Pixel accuracy
[mAcc] Mean accuracy
[mIoU] Mean intersection over union
[f.w.IOU] Frequency-weighted intersection over union
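All four metrics can be derived from a single class confusion matrix. The following is a minimal pure-Python sketch using the common definitions, not the evaluation code of any particular paper. The function names, the `ignore_label=255` default, and the choice to average only over classes that occur in the ground truth are assumptions made for illustration.

```python
def confusion_matrix(gt, pred, num_classes, ignore_label=255):
    """Build a num_classes x num_classes matrix from flat label sequences.

    Rows index the ground-truth class, columns the predicted class.
    Pixels whose ground truth equals ignore_label (an assumed convention)
    are skipped.
    """
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g == ignore_label:
            continue
        cm[g][p] += 1
    return cm


def segmentation_metrics(cm):
    """Return PixAcc, mAcc, mIoU and f.w.IOU as fractions in [0, 1]."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tp = [cm[i][i] for i in range(n)]
    gt_count = [sum(cm[i]) for i in range(n)]
    pred_count = [sum(cm[r][i] for r in range(n)) for i in range(n)]

    # Assumption: classes absent from the ground truth are left out of the means.
    present = [i for i in range(n) if gt_count[i] > 0]
    iou = {i: tp[i] / (gt_count[i] + pred_count[i] - tp[i]) for i in present}

    return {
        "PixAcc": sum(tp) / total,
        "mAcc": sum(tp[i] / gt_count[i] for i in present) / len(present),
        "mIoU": sum(iou.values()) / len(present),
        # Each class IoU is weighted by its share of ground-truth pixels.
        "f.w.IOU": sum(gt_count[i] * iou[i] for i in present) / total,
    }


if __name__ == "__main__":
    gt = [0, 0, 1, 1, 2, 2]
    pred = [0, 1, 1, 1, 2, 0]
    print(segmentation_metrics(confusion_matrix(gt, pred, num_classes=3)))
```

The tables in this list report the metrics as percentages, so the fractions returned here would be multiplied by 100 before comparison.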

Performance tables

Speed depends on the hardware (e.g. CPU, GPU, RAM), so a fair speed comparison is hard to make. We therefore compare methods on four metrics: PixAcc, mAcc, mIoU, and f.w.IOU. The closer the segmentation result is to the ground truth, the higher all four metrics are.
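The tables in this section are tab-separated, and a blank cell means the metric was not reported. The sketch below shows one way to read such rows and rank methods by a chosen metric. It uses four rows copied from the NYUDv2 table, and the helper names (`parse_row`, `rank_by`) are hypothetical. Rows without the chosen metric are dropped from the ranking rather than treated as zero.

```python
COLUMNS = ["Method", "PixAcc", "mAcc", "mIoU", "f.w.IOU",
           "Input", "Ref. from", "Published", "Year"]
METRICS = ("PixAcc", "mAcc", "mIoU", "f.w.IOU")

# Four rows copied from the NYUDv2 table; "\t" marks a column break.
SAMPLE_ROWS = [
    "RGBD R-CNN\t60.3\t35.1\t31.3\t47(in LSD-GF)\tRGBD\t\tECCV\t2014",
    "LCSF-Deconv\t\t47.3\t\t\tRGBD\t\tECCV\t2016",
    "RDFNet-152\t76\t62.8\t50.1\t\tRGBD\t\tICCV\t2017",
    "SGNet\t76.8\t63.1\t51\t\tRGBD\t\tTIP\t2020",
]


def to_float(cell):
    """Convert a metric cell to float; blank cells become None.

    Cells such as "47(in LSD-GF)" carry a source note, which is dropped.
    """
    text = cell.split("(")[0].strip()
    return float(text) if text else None


def parse_row(line):
    """Split one tab-separated row into a dict keyed by column name."""
    cells = line.rstrip("\n").split("\t")
    cells += [""] * (len(COLUMNS) - len(cells))  # pad truncated rows
    row = dict(zip(COLUMNS, cells))
    for key in METRICS:
        row[key] = to_float(row[key])
    return row


def rank_by(rows, metric):
    """Sort rows by a metric, best first, skipping rows that lack it."""
    reported = [r for r in rows if r[metric] is not None]
    return sorted(reported, key=lambda r: r[metric], reverse=True)


if __name__ == "__main__":
    rows = [parse_row(line) for line in SAMPLE_ROWS]
    for r in rank_by(rows, "mIoU"):
        print(r["Method"], r["mIoU"], r["Published"], r["Year"])
```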

NYUDv2

Method	PixAcc	mAcc	mIoU	f.w.IOU	Input	Ref. from	Published	Year
POR	59.1	28.4	29.1		RGBD		CVPR	2013
RGBD R-CNN	60.3	35.1	31.3	47 (in LSD-GF)	RGBD		ECCV	2014
DeconvNet	69.9	56.4	42.7	56	RGB	LSD-GF	ICCV	2015
DeepLab	68.7	46.9	36.8	52.5	RGBD	STD2P	ICLR	2015
CRF-RNN	66.3	48.9	35.4	51	RGBD	STD2P	ICCV	2015
Multi-Scale CNN	65.6	45.1	34.1	51.4	RGB	LCSF-Deconv	ICCV	2015
FCN	65.4	46.1	34	49.5	RGBD	LCSF-Deconv	CVPR	2015
Mutex Constraints	63.8	31.5		48.5 (in LSD-GF)	RGBD		ICCV	2015
E2S2	58.1	52.9	31	44.2	RGBD	STD2P	ECCV	2016
BI-3000	58.9	39.3	27.7	43	RGBD	STD2P	ECCV	2016
BI-1000	57.7	37.8	27.1	41.9	RGBD	STD2P	ECCV	2016
LCSF-Deconv		47.3			RGBD		ECCV	2016
LSTM-CF		49.4			RGBD		ECCV	2016
CRF+RF+RFS	73.8				RGBD		PRL	2016
RDFNet-152	76	62.8	50.1		RGBD		ICCV	2017
SCN-ResNet152			49.6		RGBD		ICCV	2017
RDFNet-50	74.8	60.4	47.7		RGBD		ICCV	2017
CFN(RefineNet)			47.7		RGBD		ICCV	2017
RefineNet-152	73.6	58.9	46.5		RGB		CVPR	2017
LSD-GF	71.9	60.7	45.9	59.3	RGBD		CVPR	2017
3D-GNN		55.7	43.1		RGBD		ICCV	2017
DML-Res50			40.2		RGB		IJCAI	2017
STD2P	70.1	53.8	40.1	55.7	RGBD		CVPR	2017
PBR-CNN			33.2		RGB		ICCBS	2017
B-SegNet	68	45.8	32.4		RGB		BMVC	2017
FC-CRF	63.1	39	29.5	48.4	RGBD		TIP	2017
LCR	55.6	31.7	21.8	39.9	RGBD		ICIP	2017
SegNet	54.1	30.5	21	38.5	RGBD	LCR	TPAMI	2017
D-Refine-152	74.1	59.5	47		RGB		ICPR	2018
TRL-ResNet50	76.2	56.3	46.4		RGB		ECCV	2018
D-CNN		56.3	43.9		RGBD		ECCV	2018
RGBD-Geo	70.3	51.7	41.2	54.2	RGBD		MTA	2018
Context	70	53.6	40.6		RGB		TPAMI	2018
DeepLab-LFOV	70.3	49.6	39.4	54.7	RGBD	STD2P	TPAMI	2018
D-depth-reg	66.7	46.3	34.8	50.6	RGBD		PRL	2018
PU-Loop	72.1		44.5		RGB		CVPR	2018
C-DCNN	69	50.8	39.8		RGB		TNNLS	2018
GAD	84.8	68.7	59.6		RGB		CVPR	2019
CTS-IM	76.3		50.6		RGBD		ICIP	2019
PAP	76.2	62.5	50.4		RGB		CVPR	2019
KIL-ResNet101	75.1	58.4	50.2		RGB		ACPR	2019
2.5D-Conv	75.9		49.1		RGBD		ICIP	2019
ACNet			48.3		RGBD		ICIP	2019
3M2RNet	76	63	48		RGBD		SIC	2019
FDNet-16s	73.9	60.3	47.4		RGB		AAAI	2019
DMFNet	74.4	59.3	46.8		RGBD		IEEE Access	2019
MMAF-Net-152	72.2	59.2	44.8		RGBD		arXiv	2019
RTJ-AA			42		RGB		ICRA	2019
JTRL-ResNet50	81.3	60.0	50.3		RGB		TPAMI	2019
3DN-Conv	52.4		39.3		RGB		3DV	2019
SGNet	76.8	63.1	51		RGBD		TIP	2020
SCN-ResNet101			48.3		RGBD		TCYB	2020
RefineNet-Res152-Pool4	74.4	59.6	47.6		RGB		TPAMI	2020
TSNet	73.5	59.6	46.1		RGBD		IEEE IS	2020
PSD-ResNet50	77.0	58.6	51.0		RGB		CVPR	2020
Malleable 2.5D	76.9		50.9		RGBD		ECCV	2020
BCMFP+SA-Gate	77.9		52.4		RGBD		ECCV	2020
MTI-Net	75.3	62.9	49.0		RGB