A DEEP LEARNING FRAMEWORK FOR EMOTION-CONDITIONED PERSONALIZED MUSIC RECOMMENDATION
Music is a powerful emotional regulator, yet traditional recommendation systems rely on static listening histories and ignore a user’s immediate affective state. We present EmotionMuse, a modular deep learning framework that addresses this gap by combining real-time facial expression analysis with history-conditioned music recommendation. The emotion classifier is a VGG-16 CNN augmented with Squeeze-and-Excitation attention, which achieves 87.2% accuracy on the FER-2013 dataset. The predicted emotions are mapped onto Russell’s valence-arousal plane to produce 64-dimensional affective embeddings, which in turn condition a Bidirectional LSTM (Bi-LSTM) that models user listening sequences from the Million Song Dataset. The two datasets are aligned by matching audio features in Spotify’s audio-feature space, so that the emotion-to-music correspondence rests on the valence-arousal model rather than on ad hoc labels. Experiments yield a Precision@10 of 0.791 and an NDCG@10 of 0.813, a performance gain of 5.8% over affect-blind baselines. End-to-end latency is 94 ms, which supports real-time deployment on standard consumer hardware.
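The two ranking metrics quoted above can be illustrated with a minimal sketch. It assumes binary relevance (a track counts as relevant if the user actually listened to it) and the common log2 position discount; the paper's exact evaluation protocol is not stated here, and the function names `precision_at_k` and `ndcg_at_k` are hypothetical.

```python
import math


def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k recommended items that are relevant."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def ndcg_at_k(ranked, relevant, k=10):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    # Each relevant item at 0-based position pos contributes 1 / log2(pos + 2).
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    # The ideal ranking places all relevant items (up to k) at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical toy data: four recommended tracks, two of them relevant.
    ranked_tracks = ["a", "b", "c", "d"]
    listened = {"a", "c"}
    print(precision_at_k(ranked_tracks, listened, k=4))
    print(ndcg_at_k(ranked_tracks, listened, k=4))
```

Precision@k ignores the order of items inside the top k, whereas NDCG@k rewards placing relevant tracks earlier, which is why the two figures are usually reported together.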
BARGOTI, S. (2026). A Deep Learning Framework for Emotion-Conditioned Personalized Music Recommendation. International Journal of Science, Strategic Management and Technology, 02(05). https://doi.org/10.55041/ijsmt.v2i5.178