CAMERA-BASED SEMANTIC IMAGE SEGMENTATION FOR AUTONOMOUS VEHICLE
Despite notable advances in deep learning for image and video segmentation, existing models still struggle with cross-domain adaptability and generalisation. Segmenting images and videos is a fundamental computer vision task with many applications in autonomous driving, industrial inspection, healthcare, and agriculture. Since the introduction of large-scale foundation models, SAM2, an enhanced version of SAM (Segment Anything Model), has shown improved performance in complex scenarios after being optimised for segmentation tasks. However, more research is needed to fully understand SAM2's versatility and limitations in particular domains. This study systematically examines SAM2's use in image and video segmentation and assesses its performance across a range of domains. We start by defining the fundamental concepts of image segmentation, classifying foundation models, and examining the technical features of SAM and SAM2. We then explore SAM2's use in static image and video segmentation, highlighting the difficulties of cross-domain adaptability and its performance in specialised fields such as medical imaging. We reviewed more than 200 relevant papers to offer a thorough analysis of the subject. The study concludes by summarising SAM2's advantages and disadvantages in segmentation tasks, pointing out the technical difficulties it encounters, and suggesting potential directions for future development. This review offers analysis and practical suggestions for optimising and applying SAM2 in real-world settings.
Tiwari, A., Bilgi, S. & Chaudhari, A. (2026). Camera-Based Semantic Image Segmentation for Autonomous Vehicle. International Journal of Science, Strategic Management and Technology, 02(03). https://doi.org/10.55041/ijsmt.v2i3.358
Tiwari, Aparna, et al. "Camera-Based Semantic Image Segmentation for Autonomous Vehicle." International Journal of Science, Strategic Management and Technology, vol. 02, no. 03, 2026. https://doi.org/10.55041/ijsmt.v2i3.358.
Tiwari, Aparna, Shubham Bilgi, and Anuja Chaudhari. "Camera-Based Semantic Image Segmentation for Autonomous Vehicle." International Journal of Science, Strategic Management and Technology 02, no. 03 (2026). https://doi.org/10.55041/ijsmt.v2i3.358.