IEEE/CAA Journal of Automatica Sinica
Citation: W. Luan, L. Pan, J. Li, Y. Zheng, and C. Xu, “Open-vocabulary 3D scene segmentation via dual-modal interaction,” IEEE/CAA J. Autom. Sinica, vol. 12, no. 10, pp. 2156–2158, Oct. 2025. doi: 10.1109/JAS.2024.124857
[1] Y. Liu, B. Jiang, and J. Xu, “Axial assembled correspondence network for few-shot semantic segmentation,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 3, pp. 711–721, 2022.
[2] W. Ren, Y. Tang, Q. Sun, C. Zhao, and Q.-L. Han, “Visual semantic segmentation based on few/zero-shot learning: An overview,” IEEE/CAA J. Autom. Sinica, vol. 11, no. 5, pp. 1106–1126, 2024. doi: 10.1109/JAS.2023.123207
[3] A. K. Bhandari, A. Ghosh, and I. V. Kumar, “A local contrast fusion based 3D Otsu algorithm for multilevel image segmentation,” IEEE/CAA J. Autom. Sinica, vol. 7, no. 1, pp. 200–213, 2020. doi: 10.1109/JAS.2019.1911843
[4] H. Bangalath, M. Maaz, M. U. Khattak, S. H. Khan, and F. Shahbaz Khan, “Bridging the gap between object and image-level representations for open-vocabulary detection,” in Advances in Neural Information Processing Systems, vol. 35, pp. 33781–33794, 2022.
[5] M. Xu, Z. Zhang, F. Wei, et al., “A simple baseline for zero-shot semantic segmentation with pre-trained vision-language model,” arXiv preprint arXiv:2112.14757, 2021.
[6] W. Zhao, Y. Yan, C. Yang, J. Ye, X. Yang, and K. Huang, “Divide and conquer: 3D point cloud instance segmentation with point-wise binarization,” in Proc. IEEE/CVF Int. Conf. Computer Vision, 2023, pp. 562–571.
[7] R. Zhang, Z. Guo, W. Zhang, K. Li, X. Miao, B. Cui, Y. Qiao, P. Gao, and H. Li, “PointCLIP: Point cloud understanding by CLIP,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, 2022, pp. 8552–8562.
[8] B. Michele, A. Boulch, G. Puy, M. Bucher, and R. Marlet, “Generative zero-shot learning for semantic segmentation of 3D point clouds,” in Proc. Int. Conf. 3D Vision, 2021, pp. 992–1002.
[9] A. Cheraghian, S. Rahman, D. Campbell, and L. Petersson, “Transductive zero-shot learning for 3D point cloud classification,” in Proc. IEEE/CVF Winter Conf. Applications of Computer Vision, 2020, pp. 923–933.
[10] P.-S. Wang, “OctFormer: Octree-based transformers for 3D point clouds,” ACM Trans. Graphics, vol. 42, no. 4, pp. 1–11, 2023.
[11] R. Ding, J. Yang, C. Xue, W. Zhang, S. Bai, and X. Qi, “PLA: Language-driven open-vocabulary 3D scene understanding,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, 2023, pp. 7010–7019.
[12] A. Kirillov, E. Mintun, N. Ravi, et al., “Segment anything,” in Proc. IEEE/CVF Int. Conf. Computer Vision, 2023, pp. 4015–4026.
[13] X. Zhou, R. Girdhar, A. Joulin, P. Krähenbühl, and I. Misra, “Detecting twenty-thousand classes using image-level supervision,” in Proc. European Conf. Computer Vision, 2022, pp. 350–368.
[14] J. Li, D. Li, S. Savarese, and S. Hoi, “BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models,” in Proc. Int. Conf. Machine Learning, 2023, pp. 19730–19742.
[15] A. Muzahid, W. Wan, F. Sohel, L. Wu, and L. Hou, “CurveNet: Curvature-based multitask learning deep networks for 3D object recognition,” IEEE/CAA J. Autom. Sinica, vol. 8, no. 6, pp. 1177–1187, 2020.