Announcing the availability of the Chrome extension Mozhi Page Reader, built using the APIs! It serves as a proof-of-concept demo of how text-to-speech synthesis can be integrated into a website.
The authors of the model StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis, Li et al., 2022, have made the source code for training available.
As the pre-trained model was available only for English, the model was retrained in a
bilingual way using both Malayalam and English datasets.
speaker count : 1344
total duration : 278 hours
malayalam speaker count : 210
english speaker count : 1134
duration malayalam speakers : 78.6 hours
duration english speakers : 199.4 hours
databases for english : 'nplt', 'ljspeech', 'libritts-r'
databases for malayalam : 'imasc', 'commonvoice', 'nplt', 'openslr_ml', 'smc_msc', 'gmasc'
From the total speaker count of 1344, around 451 speakers who had at least 10 minutes of
audio were used for reference embeddings. They are available for speech generation as
speaker ids :
spkr : ml_male_<1-12>, ml_female_<1-13>, en_male_<1-185>, en_female_<1-241>
The model is relatively lightweight at around 110 million parameters and is available for
use with :
modelver : 'v5.0'
For word and sentence tokenization, the nltk tokenizer was used. For phoneme conversion, the popular phonemizer package was used. Though the model does not expose the other hooks such as pitch/pace control and the noise filter, the quality of the generated audio sounds noticeably better. Hope you will feel the same! :-)
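As a quick-start illustration, a minimal Python sketch for calling the service is shown below. The endpoint URL and the "text" field and response handling are assumptions made for illustration; spkr and modelver are the parameters described above.

    import requests

    API_URL = "https://example.com/api/tts"  # placeholder; substitute the actual API endpoint

    payload = {
        "text": "Hello, welcome to Mozhi text to speech.",  # input text (English or Malayalam)
        "spkr": "ml_female_1",   # one of the speaker ids listed above
        "modelver": "v5.0",      # the bilingual StyleTTS model
    }

    response = requests.post(API_URL, json=payload, timeout=60)
    response.raise_for_status()

    # Assuming the service returns raw audio bytes; adapt to the actual response format.
    with open("output.wav", "wb") as f:
        f.write(response.content)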
An updated model with API hooks for controlling high-frequency noise is now available. Use the input parameter noise_scale along with modelver : 'v4.0'.
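For illustration, a request payload using this hook might look like the sketch below; it would be sent the same way as the request sketch above. The value shown and the exact semantics of noise_scale are assumptions.

    payload = {
        "text": "Testing the noise control hook.",
        "spkr": "ml_male_1",
        "modelver": "v4.0",
        "noise_scale": 0.3,  # assumed semantics: lower values suppress more high-frequency noise
    }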
Happy to announce that we have upgraded the bare-bones website to a more colourful one, with improved readability :)
The naturalness of the synthesized audio is improved by adding randomness to the synthesis of breaks at punctuations. Support for dynamic pitch_scale, pitch_offset, pace_scale and the new parameter punctuation_breaks was added from model v3.3 onwards. Please see the API documentation for more details.
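For illustration, a payload combining these controls might look like the sketch below; only the parameter names come from the note above, while the values and their exact semantics are assumptions.

    payload = {
        "text": "Reading the news, a little slower and slightly higher.",
        "spkr": "ml_female_2",
        "modelver": "v3.3",
        "pitch_scale": 1.1,          # assumed: multiplicative pitch adjustment
        "pitch_offset": 0.0,         # assumed: additive pitch shift
        "pace_scale": 0.9,           # assumed: values below 1 slow the speech down
        "punctuation_breaks": 1.0,   # assumed: scales the pauses inserted at punctuations
    }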
The breaks in sentences due to punctuation can be efficiently synthesized by inserting dummy durations. There were also efficiency improvements under the hood which reduced the synthesis time for long articles. This is available from model v3.2 onwards. Please see the API documentation for more details.
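As a rough sketch of the dummy-duration idea (not the service's actual implementation), the example below appends a silent token with a fixed duration after each punctuation mark in a per-token duration sequence.

    # Assumes durations are per-token frame counts; values and the "<sil>" token are illustrative.
    PUNCTUATION = {",", ".", ";", ":", "?", "!"}

    def insert_dummy_durations(tokens, durations, pause_frames=20):
        """Return token/duration lists with a dummy silence inserted after punctuation."""
        out_tokens, out_durations = [], []
        for tok, dur in zip(tokens, durations):
            out_tokens.append(tok)
            out_durations.append(dur)
            if tok in PUNCTUATION:
                out_tokens.append("<sil>")          # dummy silence token
                out_durations.append(pause_frames)  # dummy duration realising the break
        return out_tokens, out_durations

    print(insert_dummy_durations(["hello", ",", "world", "."], [12, 2, 15, 2]))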
We have added support for read-along, i.e. the audio and the corresponding text will be paired. A new key "durations", which captures this information, is added to the output. This is available from model v3.1 onwards. Please see the API documentation for more details.
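As a consumer-side illustration, the sketch below pairs words with playback times using the "durations" key; only the key name comes from the note above, and the entry layout shown is an assumption.

    import json

    # Illustrative response body; the real structure of "durations" may differ.
    response_body = '{"durations": [{"word": "hello", "start": 0.0, "end": 0.42}, {"word": "world", "start": 0.42, "end": 0.95}]}'

    for entry in json.loads(response_body)["durations"]:
        # Highlight entry["word"] in the UI while playback is between start and end.
        print(entry["word"], "->", entry["start"], "to", entry["end"], "seconds")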
We have added support for punctuation with the model from v3.0 onwards. Please see the API documentation for more details.
We have made available a more natural-sounding model as v2.0. Please see the API documentation for more details.
We have made available API access to the text-to-speech service. Please see the API documentation.
keywords : api, cloud text to speech synthesis
മലയാളം അക്ഷരത്തിൽ നിന്നും ശബ്ദം സാധ്യമാകുന്ന ഒരു സോഫ്റ്റ്വെയർ അവതരിപ്പിക്കുന്നതിൽ സന്തോഷം ഉണ്ട്. ഇത് നിലവിലുള്ള ന്യൂറൽ നെറ്റ്വർക്ക് സാങ്കേതിക വിദ്യയും, അതോടൊപ്പമുള്ള ഓപ്പൺ സോഴ്സ് സോഫ്റ്റ്വെയറും ഉപയോഗിച്ച് നിർമിച്ചതാണ്. ഇന്നത്തെ നിലയ്ക്ക് ഈ ശബ്ദം വൈകാരികമായ ഉള്ളടക്കം ഇല്ലാത്ത വാർത്തകൾ വായിക്കാൻ ഉചിതമാണ് . ഇത് ഒരു തുടക്കം മാത്രം. ഒരോ വ്യക്തിയുടെയും വൈകാരികമായ ശബ്ദം ഉത്പാദിപ്പിക്കുന്ന സോഫ്റ്റ്വെയർ ആണ് ലക്ഷ്യം.
Happy to announce the availability of Malayalam text-to-speech synthesis. This is built using some of the latest neural network architectures and the open-source code around them. In its current shape, the generated speech is good for reading out news or articles which do not have emotional content. This is just a start, and the goal is to reach personalized, emotional text-to-speech synthesis.
keywords : cloud text to speech synthesis, malayalam, bi-lingual, english, multiple speakers