IJSMT Journal

International Journal of Science, Strategic Management and Technology

An international, peer-reviewed, open access scholarly journal indexed in recognized academic databases, with DOIs assigned via Crossref. The journal adheres to the scholarly publishing, peer-review, and research ethics guidelines set by the UGC.

ISSN: 3108-1762 (Online)


VISION-BASED CONTEXT UNDERSTANDING USING MULTIMODAL AI

AUTHORS:
Rudra Gupta
Abdul Khalid
Mentor
Affiliation
B.Tech (Information Technology) NIET, Greater Noida
CC BY 4.0 License:
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract

Understanding an image in a meaningful way requires more than identifying isolated objects. A useful description of a scene must also capture actions, relations, setting, intent, and often a coarse sense of social or emotional context. This ability, which humans perform naturally, remains a difficult

HOW TO CITE

APA: Gupta, R., & Khalid, A. (2026). Vision-Based Context Understanding using Multimodal AI. International Journal of Science, Strategic Management and Technology, 02(05). https://doi.org/10.55041/ijsmt.v2i5.160

MLA: Gupta, Rudra, and Abdul Khalid. "Vision-Based Context Understanding using Multimodal AI." International Journal of Science, Strategic Management and Technology, vol. 02, no. 05, 2026. https://doi.org/10.55041/ijsmt.v2i5.160.

Chicago: Gupta, Rudra, and Abdul Khalid. "Vision-Based Context Understanding using Multimodal AI." International Journal of Science, Strategic Management and Technology 02, no. 05 (2026). https://doi.org/10.55041/ijsmt.v2i5.160.

Ethics and Compliance
✓ All ethical standards met
This article has undergone plagiarism screening and double-blind peer review. Editorial policies have been followed. Authors retain copyright under CC BY-NC 4.0 license. The research complies with ethical standards and institutional guidelines.