IEEE/CAA Journal of Automatica Sinica
A journal of the IEEE and the Chinese Association of Automation (CAA), publishing high-quality papers in English on original theoretical/experimental research and development in all areas of automation.
Volume 9, Issue 2, Feb. 2022

Citation: P. L. Huang, J. W. Han, N. Liu, J. Ren, and D. W. Zhang, “Scribble-supervised video object segmentation,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 2, pp. 339–353, Feb. 2022. doi: 10.1109/JAS.2021.1004210

Scribble-Supervised Video Object Segmentation

doi: 10.1109/JAS.2021.1004210
Funds: This work was supported in part by the National Key R&D Program of China (2017YFB0502904) and the National Natural Science Foundation of China (61876140)
Abstract: Recently, video object segmentation has received great attention in the computer vision community. Most existing methods rely heavily on pixel-wise human annotations, which are expensive and time-consuming to obtain. To tackle this problem, we make an early attempt to achieve video object segmentation with scribble-level supervision, which spares large amounts of human labor in collecting manual annotations. However, conventional network architectures and learning objective functions do not work well in this scenario because the supervision information is highly sparse and incomplete. To address this issue, this paper introduces two novel elements for learning the video object segmentation model. The first is the scribble attention module, which captures more accurate context information and learns an effective attention map to enhance the contrast between foreground and background. The second is the scribble-supervised loss, which optimizes the unlabeled pixels and dynamically corrects inaccurately segmented areas during the training stage. To evaluate the proposed method, we conduct experiments on two video object segmentation benchmarks, YouTube-VOS and DAVIS-2017. We first generate scribble annotations from the original per-pixel annotations; we then train our model and compare its test performance with baseline models and other existing works. Extensive experiments demonstrate that the proposed method works effectively and approaches the performance of methods that require dense per-pixel annotations.
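The paper's exact module design is not reproduced on this page; as a hedged illustration of the scribble attention idea, the sketch below implements a minimal non-local-style block in PyTorch whose keys and values are restricted to scribble-annotated positions, so that each pixel aggregates context only from pixels with trusted labels. The class name `ScribbleAttention`, the `scribble_mask` input, and all layer choices are our assumptions, not the authors' implementation.

```python
# Hypothetical scribble-attention block (our reconstruction, not the
# paper's code): a non-local layer that attends only to scribbled pixels.
import torch
import torch.nn as nn

class ScribbleAttention(nn.Module):
    def __init__(self, in_ch: int, key_ch: int):
        super().__init__()
        self.query = nn.Conv2d(in_ch, key_ch, 1)   # queries from every pixel
        self.key = nn.Conv2d(in_ch, key_ch, 1)     # keys from scribbled pixels
        self.value = nn.Conv2d(in_ch, in_ch, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, feat: torch.Tensor, scribble_mask: torch.Tensor):
        # feat: (B, C, H, W) backbone features
        # scribble_mask: (B, 1, H, W), 1 where a scribble labels the pixel
        B, C, H, W = feat.shape
        q = self.query(feat).flatten(2).transpose(1, 2)   # (B, HW, K)
        k = self.key(feat).flatten(2)                     # (B, K, HW)
        v = self.value(feat).flatten(2).transpose(1, 2)   # (B, HW, C)

        logits = torch.bmm(q, k) / (q.shape[-1] ** 0.5)   # (B, HW, HW)
        # Mask out unlabeled positions so context is gathered only from
        # pixels whose foreground/background status is known.
        mask = scribble_mask.flatten(2)                   # (B, 1, HW)
        logits = logits.masked_fill(mask == 0, float("-inf"))
        attn = torch.softmax(logits, dim=-1)
        attn = torch.nan_to_num(attn)  # guard: frames with no scribbles

        ctx = torch.bmm(attn, v).transpose(1, 2).reshape(B, C, H, W)
        return feat + self.gamma * ctx  # residual, as in non-local blocks
```

In a full model, such a block would sit after a backbone stage, and the contrast-enhanced features would feed the segmentation head.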





    Highlights

    • An early attempt to achieve video object segmentation with scribble-level supervision, which spares large amounts of human labor in collecting manual annotations
    • The proposed scribble attention module captures more accurate context information and learns an effective attention map to enhance the contrast between foreground and background
    • The proposed scribble-supervised loss optimizes the unlabeled pixels and dynamically corrects inaccurately segmented areas during the training stage (a hedged sketch of such a loss follows this list)
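
On the loss side, a standard ingredient of scribble-supervised segmentation is a partial cross-entropy that scores only the scribbled pixels. The sketch below shows this ingredient only, under our own `ignore_index` convention; it does not reconstruct the paper's dynamic correction of unlabeled pixels.

```python
# Hedged sketch: partial cross-entropy over scribbled pixels only.
import torch
import torch.nn.functional as F

def partial_cross_entropy(logits: torch.Tensor,
                          scribbles: torch.Tensor,
                          ignore_index: int = 255) -> torch.Tensor:
    """logits: (B, classes, H, W); scribbles: (B, H, W) holding class ids
    on scribbled pixels and `ignore_index` wherever the pixel is unlabeled."""
    # cross_entropy skips ignore_index targets, so unlabeled pixels
    # contribute neither to the loss value nor to the gradient.
    return F.cross_entropy(logits, scribbles, ignore_index=ignore_index)

# Toy check: two classes, a few scribbled rows, everything else unlabeled.
pred = torch.randn(1, 2, 32, 32, requires_grad=True)
ann = torch.full((1, 32, 32), 255, dtype=torch.long)
ann[:, 2:4, :] = 0     # background scribble
ann[:, 15:17, :] = 1   # foreground scribble
partial_cross_entropy(pred, ann).backward()
```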
