JARVIS: A HUMAN ASSISTANT ROBOT USING ROS2 AND COMPUTER VISION
Human assistant robots are becoming increasingly important in domestic and workplace environments due to their ability to automate routine tasks and improve human–robot interaction. This paper presents JARVIS, a multifunctional human assistant robot capable of voice interaction, hand gesture control, human tracking, object detection, remote operation, obstacle avoidance, and autonomous docking. The system is built on a Raspberry Pi, an ESP32, ROS2, OpenCV, MediaPipe, and an offline voice recognition module. Hand gestures are recognized using MediaPipe-based landmark detection, while computer vision techniques enable face tracking and object detection. Voice commands allow users to switch between operational modes and control robot functions intuitively. The robot employs mecanum wheels for omnidirectional movement and integrates ultrasonic and infrared sensors for navigation and docking. A distributed architecture is adopted in which the Raspberry Pi performs high-level processing and the ESP32 manages real-time motor control. Experimental results demonstrate successful implementation of all features with reliable performance under indoor conditions. The proposed system provides a low-cost, scalable, and modular solution for smart assistant and automation applications.
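To illustrate how mecanum wheels yield omnidirectional movement, the following is a minimal sketch of the standard inverse kinematics for a four-wheel mecanum base with rollers in the usual X arrangement. It is not taken from the JARVIS implementation: the function name `mecanum_wheel_speeds`, the `WheelSpeeds` type, the axis convention (x forward, y left, counter-clockwise rotation positive), and the geometry values in the demo are all assumptions made for illustration.

```python
from typing import NamedTuple


class WheelSpeeds(NamedTuple):
    """Angular speed of each wheel in rad/s (hypothetical container type)."""
    front_left: float
    front_right: float
    rear_left: float
    rear_right: float


def mecanum_wheel_speeds(vx, vy, wz, wheel_radius, half_length, half_width):
    """Map a body velocity command to wheel angular speeds.

    vx: forward speed (m/s), vy: leftward speed (m/s),
    wz: counter-clockwise yaw rate (rad/s).
    half_length / half_width: distance from the robot centre to the wheel
    axes along x and y (m). All names here are illustrative assumptions.
    """
    k = half_length + half_width  # lever arm that couples yaw into each wheel
    return WheelSpeeds(
        front_left=(vx - vy - k * wz) / wheel_radius,
        front_right=(vx + vy + k * wz) / wheel_radius,
        rear_left=(vx + vy - k * wz) / wheel_radius,
        rear_right=(vx - vy + k * wz) / wheel_radius,
    )


if __name__ == "__main__":
    # Assumed geometry: 30 mm wheel radius, 160 mm x 200 mm wheel footprint.
    geometry = dict(wheel_radius=0.03, half_length=0.08, half_width=0.10)
    print("forward:", mecanum_wheel_speeds(0.3, 0.0, 0.0, **geometry))
    print("strafe left:", mecanum_wheel_speeds(0.0, 0.3, 0.0, **geometry))
    print("rotate:", mecanum_wheel_speeds(0.0, 0.0, 1.0, **geometry))
```

In a split like the one described above, a mapping of this kind could run on either side of the link. Placing it on the microcontroller would keep the high-level processor's commands independent of wheel geometry, though the abstract does not state which choice JARVIS makes.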
Hadge, S., Mahajan, S. & Khatavkar, A. (2026). JARVIS: A Human Assistant Robot using ROS2 and Computer Vision. International Journal of Science, Strategic Management and Technology, 02(6). https://doi.org/10.55041/ijsmt.v2i6.093
Hadge, Shantanu, et al. "JARVIS: A Human Assistant Robot using ROS2 and Computer Vision." International Journal of Science, Strategic Management and Technology, vol. 02, no. 6, 2026. https://doi.org/10.55041/ijsmt.v2i6.093.
Hadge, Shantanu, Shweta Mahajan, and Asmita Khatavkar. "JARVIS: A Human Assistant Robot using ROS2 and Computer Vision." International Journal of Science, Strategic Management and Technology 02, no. 6 (2026). https://doi.org/10.55041/ijsmt.v2i6.093.