Speech Decoding on Hardware (Pani Prithvi Raj (EE16D202))


Name of the Guide:

Automatic speech recognition (ASR) on low-resource devices such as mobile phones and wearables is in high demand. Cloud-based ASR depends inherently on the quality and security of the network, and in localized applications such as ATMs or information kiosks, relying on high-speed internet solely for ASR can be overkill. An offline speech decoder instead performs Viterbi decoding locally: the large speech models are stored in external memory (DRAM), and the most probable word sequence is obtained from the speech input. However, the energy cost of a single DRAM access is almost two orders of magnitude higher than that of a single on-chip memory (OCM) access. On the other hand, the SRAMs used for OCM are more expensive than DRAMs in terms of design, which precludes a large OCM. Hence, we need to reduce the number of DRAM accesses while also reducing OCM storage, without adversely affecting performance or accuracy. In this work, the tradeoff between external memory and OCM is addressed using a binary search tree – max heap (BST-MH) data structure. Together with the cache, it uses much less OCM than existing works, with minimal degradation in accuracy. Consequently, the number of external memory accesses is also reduced while the speech is decoded in real time. The results of its implementation on a Xilinx FPGA, along with a comparison with other works, are discussed in this seminar.
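
The abstract does not describe the internals of BST-MH, so the following is only a generic software illustration of the underlying idea: keep a fixed number of Viterbi hypotheses in a small store, look them up by state, and evict the worst-scoring one when the store is full. It is a minimal sketch under stated assumptions, not the hardware design from this work. The class name `BoundedHypothesisStore`, the use of a Python dict as a stand-in for the search tree, and the convention that a lower cost is better are all assumptions made for the example.

```python
import heapq


class BoundedHypothesisStore:
    """Holds at most `capacity` hypotheses; the lowest costs are kept.

    Hypothetical illustration only: a dict plays the role of the lookup
    structure, and a max-heap on cost identifies the worst hypothesis.
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.best = {}   # state_id -> lowest cost seen so far
        self.heap = []   # (-cost, state_id): max-heap on cost via negation

    def _worst(self):
        # Drop stale heap entries whose cost no longer matches `best`.
        while self.heap:
            neg_cost, state = self.heap[0]
            if self.best.get(state) == -neg_cost:
                return state, -neg_cost
            heapq.heappop(self.heap)
        return None

    def insert(self, state, cost):
        """Insert or update a hypothesis; return False if it was pruned."""
        old = self.best.get(state)
        if old is not None:
            # Viterbi recombination: keep only the cheaper path into a state.
            if cost < old:
                self.best[state] = cost
                heapq.heappush(self.heap, (-cost, state))
            return True
        if len(self.best) >= self.capacity:
            worst_state, worst_cost = self._worst()
            if cost >= worst_cost:
                return False  # new hypothesis is no better than the worst kept
            del self.best[worst_state]
            heapq.heappop(self.heap)  # remove the evicted entry from the heap
        self.best[state] = cost
        heapq.heappush(self.heap, (-cost, state))
        return True


store = BoundedHypothesisStore(capacity=2)
store.insert(10, 5.0)
store.insert(11, 3.0)
store.insert(12, 4.0)   # store is full: evicts state 10 (cost 5.0)
store.insert(13, 9.0)   # pruned: worse than everything currently kept
```

Bounding the store in this way fixes the amount of fast memory needed regardless of how many hypotheses the decoder generates, at the cost of occasionally pruning a path that would later have won, which is one way to read the accuracy tradeoff mentioned above.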