QUANTUM-ENHANCED SPATIO-TEMPORAL DEEP LEARNING FRAMEWORK FOR REAL-TIME SIGN LANGUAGE TRANSLATION
Real-time sign language translation remains a significant challenge in assistive technology because of the difficulty of capturing dynamic hand gestures and interpreting them accurately. This paper presents a hybrid quantum CNN-LSTM real-time sign language translation system designed to bridge the communication gap between hearing-impaired individuals and the general population. The system integrates computer vision, deep learning, and quantum machine learning: live video is captured with OpenCV, and hand landmarks are extracted with MediaPipe to obtain 42 three-dimensional joint coordinates. These landmarks are processed as temporal sequences and passed through a hybrid architecture that combines convolutional neural networks (CNNs) for spatial feature extraction, long short-term memory (LSTM) networks for temporal modeling, and a quantum convolutional neural network (QCNN) layer implemented in PennyLane to capture complex non-linear relationships through quantum embedding and entanglement. Unlike traditional deep learning models, the proposed system employs a parallel quantum-classical framework that enriches feature representation while maintaining computational efficiency. The model is trained on augmented datasets using temporal expansion, geometric transformations, and noise injection to improve robustness and generalization. Experimental results demonstrate an F1-score of approximately 0.93, achieving competitive performance with reduced parameter complexity and enabling real-time translation on standard computing devices. This work highlights the potential of hybrid quantum-classical models for real-time computer vision and provides an efficient, scalable solution for assistive communication technologies.
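The quantum embedding and entanglement described in the abstract can be illustrated with a minimal sketch. The following is a pure-NumPy simulation of one common QCNN-style feature map (angle embedding via RY rotations followed by a CNOT entangling ring, then Pauli-Z expectation values); it is not the paper's PennyLane implementation, and the qubit count and gate layout are illustrative assumptions only.

```python
import numpy as np

def ry(theta):
    """Single-qubit RY rotation matrix."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def apply_single(state, gate, qubit, n):
    """Apply a 1-qubit gate to `qubit` of an n-qubit state vector."""
    ops = [np.eye(2)] * n
    ops[qubit] = gate
    full = ops[0]
    for op in ops[1:]:
        full = np.kron(full, op)
    return full @ state

def apply_cnot(state, control, target, n):
    """Apply CNOT(control -> target); qubit 0 is the most significant bit."""
    dim = 2 ** n
    new = state.copy()
    for i in range(dim):
        if (i >> (n - 1 - control)) & 1:       # control bit set:
            j = i ^ (1 << (n - 1 - target))    # swap in amplitude with target flipped
            new[i] = state[j]
    return new

def quantum_features(x):
    """Angle-embed classical features, entangle with a CNOT ring,
    and return the Pauli-Z expectation of each qubit as a feature vector."""
    n = len(x)
    state = np.zeros(2 ** n)
    state[0] = 1.0                             # start in |00...0>
    for q, angle in enumerate(x):              # angle embedding: RY(x_q) on qubit q
        state = apply_single(state, ry(angle), q, n)
    for q in range(n):                         # entangling ring of CNOTs
        state = apply_cnot(state, q, (q + 1) % n, n)
    probs = np.abs(state) ** 2
    z = []
    for q in range(n):                         # <Z_q> = P(bit q = 0) - P(bit q = 1)
        signs = np.array([1.0 if not ((i >> (n - 1 - q)) & 1) else -1.0
                          for i in range(2 ** n)])
        z.append(float(probs @ signs))
    return np.array(z)

# All-zero input leaves the state at |0000>, so every <Z> expectation is +1.
features = quantum_features(np.zeros(4))
```

In a hybrid pipeline such as the one described, a vector like `features` would be concatenated with (or fed alongside) the classical CNN-LSTM representation; in practice the circuit would carry trainable rotation parameters and be executed through PennyLane so gradients flow end to end.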
T, S. D. M. (2026). Quantum-Enhanced Spatio-Temporal Deep Learning Framework for Real-Time Sign Language Translation. International Journal of Science, Strategic Management and Technology, 02(04). https://doi.org/10.55041/ijsmt.v2i4.285