TAMIL NADU SEED IMAGE DATASET WITH OPENCV PREPROCESSING FOR PRECISION AGRICULTURE
Manual seed quality assessment in Indian agriculture remains labor-intensive, error-prone, and unscalable, contributing to 20-30% crop losses from poor germination. This paper introduces a novel seed image dataset comprising 5,000+ high-resolution images of paddy and millet seeds across three quality classes (good, medium, bad), collected from Tamil Nadu farms under varying humidity conditions. We present a comprehensive preprocessing pipeline that uses OpenCV for noise reduction, CLAHE contrast enhancement, and GrabCut background removal, followed by Albumentations-based augmentation (rotation, flipping, brightness adjustment). The pipeline yields 95% clean images and expands the dataset to 15,000 samples. It delivers PSNR > 30 dB and 98% valid segmentation masks, and a baseline CNN reaches 92% validation accuracy on preprocessed images versus 85% on raw images. Released publicly on Kaggle, this regionally specific, non-IID dataset addresses gaps in existing maize and soybean collections by focusing on tropical Indian crops. The work enables deep learning applications for automated seed sorting and supports precision agriculture for smallholder farmers.
Rajalingam, B., A, M. S., H, A. A. & S, D. K. (2026). Tamil Nadu Seed Image Dataset with OpenCV Preprocessing for Precision Agriculture. International Journal of Science, Strategic Management and Technology, 02(03). https://doi.org/10.55041/ijsmt.v2i3.143
Rajalingam, B., et al. "Tamil Nadu Seed Image Dataset with OpenCV Preprocessing for Precision Agriculture." International Journal of Science, Strategic Management and Technology, vol. 02, no. 03, 2026. https://doi.org/10.55041/ijsmt.v2i3.143.
Rajalingam, B., Mohamed A, Arshath H, and Dinesh S. "Tamil Nadu Seed Image Dataset with OpenCV Preprocessing for Precision Agriculture." International Journal of Science, Strategic Management and Technology 02, no. 03 (2026). https://doi.org/10.55041/ijsmt.v2i3.143.