IEEE/CAA Journal of Automatica Sinica
Citation: W. Luan, L. Pan, J. Li, Y. Zheng, and C. Xu, “Open-vocabulary 3D scene segmentation via dual-modal interaction,” IEEE/CAA J. Autom. Sinica, vol. 12, no. 10, pp. 2156–2158, Oct. 2025. doi: 10.1109/JAS.2024.124857
[1] Y. Liu, B. Jiang, and J. Xu, “Axial assembled correspondence network for few-shot semantic segmentation,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 3, pp. 711–721, 2022.
[2] W. Ren, Y. Tang, Q. Sun, C. Zhao, and Q.-L. Han, “Visual semantic segmentation based on few/zero-shot learning: An overview,” IEEE/CAA J. Autom. Sinica, vol. 11, no. 5, pp. 1106–1126, 2024. doi: 10.1109/JAS.2023.123207
[3] A. K. Bhandari, A. Ghosh, and I. V. Kumar, “A local contrast fusion based 3D Otsu algorithm for multilevel image segmentation,” IEEE/CAA J. Autom. Sinica, vol. 7, no. 1, pp. 200–213, 2020. doi: 10.1109/JAS.2019.1911843
[4] H. Bangalath, M. Maaz, M. U. Khattak, S. H. Khan, and F. Shahbaz Khan, “Bridging the gap between object and image-level representations for open-vocabulary detection,” in Advances in Neural Information Processing Systems, vol. 35, pp. 33781–33794, 2022.
[5] M. Xu, Z. Zhang, F. Wei, et al., “A simple baseline for zero-shot semantic segmentation with pre-trained vision-language model,” arXiv preprint arXiv:2112.14757, 2021.
[6] W. Zhao, Y. Yan, C. Yang, J. Ye, X. Yang, and K. Huang, “Divide and conquer: 3D point cloud instance segmentation with point-wise binarization,” in Proc. IEEE/CVF Int. Conf. Computer Vision, 2023, pp. 562–571.
[7] R. Zhang, Z. Guo, W. Zhang, K. Li, X. Miao, B. Cui, Y. Qiao, P. Gao, and H. Li, “PointCLIP: Point cloud understanding by CLIP,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, 2022, pp. 8552–8562.
[8] B. Michele, A. Boulch, G. Puy, M. Bucher, and R. Marlet, “Generative zero-shot learning for semantic segmentation of 3D point clouds,” in Proc. Int. Conf. 3D Vision, 2021, pp. 992–1002.
[9] A. Cheraghian, S. Rahman, D. Campbell, and L. Petersson, “Transductive zero-shot learning for 3D point cloud classification,” in Proc. IEEE/CVF Winter Conf. Applications of Computer Vision, 2020, pp. 923–933.
[10] P.-S. Wang, “OctFormer: Octree-based transformers for 3D point clouds,” ACM Trans. Graphics, vol. 42, no. 4, pp. 1–11, 2023.
[11] R. Ding, J. Yang, C. Xue, W. Zhang, S. Bai, and X. Qi, “PLA: Language-driven open-vocabulary 3D scene understanding,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, 2023, pp. 7010–7019.
[12] A. Kirillov, E. Mintun, N. Ravi, et al., “Segment anything,” in Proc. IEEE/CVF Int. Conf. Computer Vision, 2023, pp. 4015–4026.
[13] X. Zhou, R. Girdhar, A. Joulin, P. Krähenbühl, and I. Misra, “Detecting twenty-thousand classes using image-level supervision,” in Proc. European Conf. Computer Vision, 2022, pp. 350–368.
[14] J. Li, D. Li, S. Savarese, and S. Hoi, “BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models,” in Proc. Int. Conf. Machine Learning, 2023, pp. 19730–19742.
[15] A. Muzahid, W. Wan, F. Sohel, L. Wu, and L. Hou, “CurveNet: Curvature-based multitask learning deep networks for 3D object recognition,” IEEE/CAA J. Autom. Sinica, vol. 8, no. 6, pp. 1177–1187, 2020.