Automatic digit recognition and synthesis system for Marathi
Abstract
In this work, we combine the automatic recognition process and the automatic synthesis process for the first ten Marathi digits into a single system that we call the Automatic Recognition and Synthesis System of Marathi Digits (ARSSAD). The system is therefore composed of two sub-systems: a recognizer and a synthesizer. The main task of the recognizer is the automatic recognition of the pronounced digit: it transforms the input sound wave into text representing the appropriate digit. The second sub-system performs the opposite process; in other words, it transforms the text (digit) produced by the recognizer into speech generated by the synthesizer. The methodology used for the system design is based on three essential stages: the creation of the acoustic database (corpus development), the recognition of the read signal, and the generation of the synthetic speech. We explain the basic modules that compose the system, starting from signal acquisition and ending with the final decision. For the recognition sub-system, we chose the Dynamic Time Warping (DTW) method for the comparison task. ARSSAD contains a front-end and a back-end module: the front-end module converts the input sound into feature vectors based on Mel Frequency Cepstral Coefficients (MFCCs), which are then used by the DTW method. The back-end module uses the concatenative method to synthesize the recognized digit; to this end, we created a sound database containing di-phones of the Marathi alphabet. The obtained results show that the system achieves a success rate of 94.85% on the three corpora that we recorded in a noisy environment.

Keywords: analysis techniques, speech recognition, speech synthesis, synthesis by di-phones, synthesis by phonemes, PRAAT, MFCC, DTW, Standard Marathi.
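The abstract does not give implementation details, so the following is only a minimal sketch of the front-end (MFCC feature extraction) and the DTW-based template comparison described above, assuming Python with librosa for MFCC computation; the function names, the 16 kHz sample rate, and the 13-coefficient setting are illustrative choices, not taken from the paper.

```python
import numpy as np
import librosa

def mfcc_features(path, n_mfcc=13):
    """Load a recording and return its MFCC feature matrix (frames x coefficients)."""
    signal, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.T  # one feature vector per analysis frame

def dtw_distance(a, b):
    """Classic dynamic-time-warping distance between two MFCC sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])   # local Euclidean distance
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

def recognize(test_path, reference_paths):
    """Return the reference digit whose recorded template is closest to the test utterance."""
    test = mfcc_features(test_path)
    scores = {digit: dtw_distance(test, mfcc_features(path))
              for digit, path in reference_paths.items()}
    return min(scores, key=scores.get)
```

Likewise, a minimal sketch of how the back-end's concatenative di-phone synthesis could be realized: pre-recorded di-phone waveforms are simply concatenated in order for the recognized digit. The di-phone labels, file paths, and the decomposition of "ek" (Marathi for one) are hypothetical placeholders, not the authors' actual database.

```python
import numpy as np
import soundfile as sf

# Hypothetical di-phone inventory: label -> WAV file extracted from the recorded database.
DIPHONE_FILES = {
    "sil-e": "db/sil-e.wav",
    "e-k":   "db/e-k.wav",
    "k-sil": "db/k-sil.wav",
}

# Hypothetical mapping from a digit's text form to its di-phone sequence.
DIGIT_TO_DIPHONES = {
    "ek": ["sil-e", "e-k", "k-sil"],
}

def synthesize(digit_text, out_path="digit.wav", sr=16000):
    """Concatenate the stored di-phone waveforms for the recognized digit and write a WAV file."""
    units = []
    for label in DIGIT_TO_DIPHONES[digit_text]:
        wave, file_sr = sf.read(DIPHONE_FILES[label])
        assert file_sr == sr, "all di-phones are assumed to share one sample rate"
        units.append(wave)
    sf.write(out_path, np.concatenate(units), sr)
```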