LABELGROUND: AN OFFLINE ZERO-SHOT AI PLATFORM FOR EFFICIENT DATASET ANNOTATION IN COMPUTER VISION
This study presents Labelground, a fully offline, AI-augmented annotation platform that addresses the high cost of dataset labeling in computer vision. The system integrates a zero-shot ensemble of YOLO-World, Grounding DINO, and the Segment Anything Model (SAM), fused with Non-Maximum Suppression (NMS), enabling annotation from the first image without prior project-specific training. A correction-driven active learning loop continuously improves model performance through user feedback. Experimental evaluation on a 500-image PASCAL VOC 2012 subset demonstrates a 72.6% reduction in annotation time (from 44.6 s to 12.2 s per image, p < 0.001, Cohen's d = 3.92) while maintaining detection accuracy within 0.7% mAP@0.5 of fully human-annotated baselines. The NMS ensemble delivers a 6.9-point F1 gain over the best single-model baseline, and augmentation yields up to +9.3 mAP in low-data regimes. The system is particularly suitable for privacy-sensitive and air-gapped environments, including medical imaging, defence, and industrial inspection.
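To make the fusion step concrete, the sketch below shows one way detections from several zero-shot models can be merged with class-wise NMS. It is a minimal illustration, not the paper's implementation: the function name `ensemble_nms`, the use of `torchvision.ops.batched_nms`, and the assumption that all models emit boxes in (x1, y1, x2, y2) format over a shared label vocabulary are ours.

```python
import torch
from torchvision.ops import batched_nms

def ensemble_nms(detections, iou_threshold=0.5):
    """Merge per-model detections with class-wise NMS.

    detections: list of (boxes, scores, labels) tuples, one per model.
      boxes  -- float tensor [N, 4] in (x1, y1, x2, y2) pixel coordinates
      scores -- float tensor [N] of confidence scores
      labels -- long tensor [N]; label indices are assumed to share one
                vocabulary across models (an assumption of this sketch)
    Returns the surviving boxes, scores, and labels as one merged set.
    """
    boxes = torch.cat([d[0] for d in detections])
    scores = torch.cat([d[1] for d in detections])
    labels = torch.cat([d[2] for d in detections])

    # batched_nms suppresses overlaps only within the same class, so a
    # "person" box never suppresses an overlapping "bicycle" box.
    keep = batched_nms(boxes, scores, labels, iou_threshold)
    return boxes[keep], scores[keep], labels[keep]

# Hypothetical outputs from two models on one image: near-duplicate boxes
# for the same class, so NMS keeps only the higher-scoring one.
yolo_world = (torch.tensor([[10., 10., 100., 100.]]),
              torch.tensor([0.90]), torch.tensor([0]))
grounding_dino = (torch.tensor([[12., 11., 102., 99.]]),
                  torch.tensor([0.85]), torch.tensor([0]))
boxes, scores, labels = ensemble_nms([yolo_world, grounding_dino])
print(boxes, scores, labels)  # one box remains after suppression
```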
G, T. S. (2026). Labelground: An Offline Zero-Shot AI Platform for Efficient Dataset Annotation in Computer Vision. International Journal of Science, Strategic Management and Technology, 02(05). https://doi.org/10.55041/ijsmt.v2i5.011
G, Thamizh. "Labelground: An Offline Zero-Shot AI Platform for Efficient Dataset Annotation in Computer Vision." International Journal of Science, Strategic Management and Technology, vol. 02, no. 05, 2026, pp. . doi:https://doi.org/10.55041/ijsmt.v2i5.011.
G, Thamizh. "Labelground: An Offline Zero-Shot AI Platform for Efficient Dataset Annotation in Computer Vision." International Journal of Science, Strategic Management and Technology 02, no. 05 (2026). https://doi.org/https://doi.org/10.55041/ijsmt.v2i5.011.
[2] B. Sekachev et al., "Computer Vision Annotation Tool (CVAT)," Zenodo, 2020. DOI: 10.5281/zenodo.4009388
[3] J. Nelson, B. Dwyer, and J. Solawetz, "Roboflow: Give Your Software the Sense of Sight," Roboflow Inc., 2021. [Online]. Available: https://roboflow.com
[4] S. Liu et al., "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection," arXiv preprint arXiv:2303.05499, 2023. DOI: 10.48550/arXiv.2303.05499
[5] T. Cheng et al., "YOLO-World: Real-Time Open-Vocabulary Object Detection," in Proc. IEEE/CVF CVPR, 2024, pp. 16901–16911. DOI: 10.1109/CVPR52733.2024.01599
[6] B. Settles, "Active Learning Literature Survey," Comput. Sci. Tech. Rep. 1648, Univ. of Wisconsin–Madison, 2009.
[7] Y. Gal and Z. Ghahramani, "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning," in Proc. ICML, 2016, pp. 1050–1059.
[8] T. DeVries and G. W. Taylor, "Improved Regularization of Convolutional Neural Networks with Cutout," arXiv preprint arXiv:1708.04552, 2017. DOI: 10.48550/arXiv.1708.04552
[9] H. Zhang et al., "mixup: Beyond Empirical Risk Minimization," in Proc. ICLR, 2018. DOI: 10.48550/arXiv.1710.09412
[10] S. Yun et al., "CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features," in Proc. IEEE/CVF ICCV, 2019, pp. 6023–6032. DOI: 10.1109/ICCV.2019.00612