APPROXIMATE SOFTMAX ARCHITECTURE FOR ENERGY-EFFICIENT DEEP NEURAL NETWORKS
Softmax is commonly used in neural network-based classification systems to convert raw output scores into probabilities, but its conventional implementation involves complex operations such as exponentiation and division, which are inefficient for FPGA and VLSI hardware due to their high area, power, and latency costs. This project presents a hardware-efficient approximate softmax architecture designed for low-power FPGA systems using a simplified Top-1 approximation approach. The proposed design identifies the dominant output class using comparator-based logic and fixed-point arithmetic, thereby eliminating the computationally expensive operations while preserving correct classification decisions. The approximate softmax module is implemented in synthesizable SystemVerilog and validated through simulation and synthesis in Xilinx Vivado, with Python used only for basic numerical verification. The developed design is hardware-ready and suitable for integration into FPGA-based neural network accelerators and embedded VLSI systems, with scope for future hardware implementation.
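Since the abstract notes that Python was used for basic numerical verification, the Top-1 idea can be illustrated with a minimal sketch (function names and the Q8.8 fixed-point format are illustrative assumptions, not taken from the paper): because softmax is monotonic, the class with the largest logit is always the class with the largest probability, so a comparator-style integer argmax reproduces the final decision without any exponentials or division.

```python
import numpy as np

def top1_approx(logits_fixed):
    """Comparator-based Top-1 selection over fixed-point logits.
    Mirrors what a comparator tree in hardware would do: no exp, no divide."""
    best = 0
    for i in range(1, len(logits_fixed)):
        if logits_fixed[i] > logits_fixed[best]:  # one comparator per step
            best = i
    return best

def exact_softmax(logits):
    """Reference floating-point softmax (numerically stabilized)."""
    z = np.exp(logits - np.max(logits))
    return z / z.sum()

# Example: quantize logits to Q8.8 fixed point (assumed format) and
# check that Top-1 selection matches the exact softmax's argmax.
logits = np.array([1.2, -0.5, 3.1, 0.7])
fixed = np.round(logits * 256).astype(np.int32)  # Q8.8: 8 fractional bits
assert top1_approx(fixed) == int(np.argmax(exact_softmax(logits)))
```

The check passes because quantization preserves the ordering of the logits whenever their gaps exceed the quantization step, which is the condition under which the Top-1 approximation makes the same decision as exact softmax.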
R., A. K., J., D. S., & A., F. M. (2026). Approximate Softmax Architecture for Energy-Efficient Deep Neural Networks. International Journal of Science, Strategic Management and Technology, 02(03). https://doi.org/10.55041/ijsmt.v2i3.262