VISION-BASED CONTEXT UNDERSTANDING USING MULTIMODAL AI
Understanding an image in a meaningful way requires more than identifying isolated objects. A useful description of a scene must also capture actions, relations, setting, intent, and often a coarse sense of social or emotional context. This ability, which humans perform naturally, remains a difficult
Gupta, R. & Khalid, A. (2026). Vision-Based Context Understanding using Multimodal AI. International Journal of Science, Strategic Management and Technology, 02(05). https://doi.org/10.55041/ijsmt.v2i5.160
Gupta, Rudra, and Abdul Khalid. "Vision-Based Context Understanding using Multimodal AI." International Journal of Science, Strategic Management and Technology, vol. 02, no. 05, 2026. doi:10.55041/ijsmt.v2i5.160.
Gupta, Rudra, and Abdul Khalid. "Vision-Based Context Understanding using Multimodal AI." International Journal of Science, Strategic Management and Technology 02, no. 05 (2026). https://doi.org/10.55041/ijsmt.v2i5.160.