Announcing the availability of the Chrome extension Mozhi Page Reader, built using the APIs! It serves as a proof-of-concept demo of how text-to-speech synthesis can be integrated into a website.
The authors of the model StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis, Li et al., 2022, have made the source code for training available.
As the pre-trained model was available only for English, the model was retrained in a
bilingual way using both Malayalam and English datasets.
speaker count : 1344
total duration : 278 hours
malayalam speaker count : 210
english speaker count : 1134
duration malayalam speakers : 78.6 hours
duration english speakers : 199.4 hours
databases for english : 'nplt', 'ljspeech', 'libritts-r'
databases for malayalam : 'imasc', 'commonvoice', 'nplt', 'openslr_ml', 'smc_msc', 'gmasc'
From the total speaker count of 1344, around 451 speakers who had at least 10 minutes of
audio were used for reference embeddings. They are available for speech generation as
speaker ids :
spkr : ml_male_<1-12>, ml_female_<1-13>, en_male_<1-185>, en_female_<1-241>
The model is relatively lightweight at around 110 million parameters and is available for
use with :
modelver : 'v5.0'
For word and sentence tokenization, the nltk tokenizer was used. For phoneme conversion, the popular phonemizer package was used. Though the model does not expose the other hooks such as pitch/pace control and the noise filter, the quality of the generated audio sounds noticeably better. Hope you will feel the same! :-)
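As a quick-start illustration, a minimal Python sketch for calling the service is shown below. The endpoint URL and the "text" field and response handling are assumptions made for illustration; spkr and modelver are the parameters described above.

    import requests

    API_URL = "https://example.com/api/tts"  # placeholder; substitute the actual API endpoint

    payload = {
        "text": "Hello, welcome to Mozhi text to speech.",  # input text (English or Malayalam)
        "spkr": "ml_female_1",   # one of the speaker ids listed above
        "modelver": "v5.0",      # the bilingual StyleTTS model
    }

    response = requests.post(API_URL, json=payload, timeout=60)
    response.raise_for_status()

    # Assuming the service returns raw audio bytes; adapt to the actual response format.
    with open("output.wav", "wb") as f:
        f.write(response.content)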
An updated model with API hooks for controlling high-frequency noise is now available. Use the input parameter noise_scale along with modelver : 'v4.0'.
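For illustration, a request payload using this hook might look like the sketch below; it would be sent the same way as the request sketch above. The value shown and the exact semantics of noise_scale are assumptions.

    payload = {
        "text": "Testing the noise control hook.",
        "spkr": "ml_male_1",
        "modelver": "v4.0",
        "noise_scale": 0.3,  # assumed semantics: lower values suppress more high-frequency noise
    }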
Happy to announce that we have upgraded the bare-bones website to a more colourful one, with improved readability :)
The naturalness of the synthesized audio is improved by adding randomness to the synthesis of breaks at punctuations. Support for dynamic pitch_scale, pitch_offset, pace_scale and the new parameter punctuation_breaks was added from model v3.3 onwards. Please see the API documentation for more details.
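For illustration, a payload combining these controls might look like the sketch below; only the parameter names come from the note above, while the values and their exact semantics are assumptions.

    payload = {
        "text": "Reading the news, a little slower and slightly higher.",
        "spkr": "ml_female_2",
        "modelver": "v3.3",
        "pitch_scale": 1.1,          # assumed: multiplicative pitch adjustment
        "pitch_offset": 0.0,         # assumed: additive pitch shift
        "pace_scale": 0.9,           # assumed: values below 1 slow the speech down
        "punctuation_breaks": 1.0,   # assumed: scales the pauses inserted at punctuations
    }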
The breaks in sentences due to punctuation can be efficiently synthesized by inserting dummy durations. There were also efficiency improvements under the hood which reduced the synthesis time for long articles. This is available from model v3.2 onwards. Please see the API documentation for more details.
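As a rough sketch of the dummy-duration idea (not the service's actual implementation), the example below appends a silent token with a fixed duration after each punctuation mark in a per-token duration sequence.

    # Assumes durations are per-token frame counts; values and the "<sil>" token are illustrative.
    PUNCTUATION = {",", ".", ";", ":", "?", "!"}

    def insert_dummy_durations(tokens, durations, pause_frames=20):
        """Return token/duration lists with a dummy silence inserted after punctuation."""
        out_tokens, out_durations = [], []
        for tok, dur in zip(tokens, durations):
            out_tokens.append(tok)
            out_durations.append(dur)
            if tok in PUNCTUATION:
                out_tokens.append("<sil>")          # dummy silence token
                out_durations.append(pause_frames)  # dummy duration realising the break
        return out_tokens, out_durations

    print(insert_dummy_durations(["hello", ",", "world", "."], [12, 2, 15, 2]))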
We have added support for read-along, i.e. the audio and the corresponding text will be paired. A new key "durations", which captures this information, is added to the output. This is available from model v3.1 onwards. Please see the API documentation for more details.
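As a consumer-side illustration, the sketch below pairs words with playback times using the "durations" key; only the key name comes from the note above, and the entry layout shown is an assumption.

    import json

    # Illustrative response body; the real structure of "durations" may differ.
    response_body = '{"durations": [{"word": "hello", "start": 0.0, "end": 0.42}, {"word": "world", "start": 0.42, "end": 0.95}]}'

    for entry in json.loads(response_body)["durations"]:
        # Highlight entry["word"] in the UI while playback is between start and end.
        print(entry["word"], "->", entry["start"], "to", entry["end"], "seconds")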
We have added support for punctuation with the model from v3.0 onwards. Please see the API documentation for more details.
We have made available a more natural-sounding model as v2.0. Please see the API documentation for more details.
We have made available API access to the text-to-speech service. Please see the API documentation.
keywords : api, cloud text to speech synthesis
മലയാളം അക്ഷരത്തിൽ നിന്നും ശബ്ദം സാധ്യമാകുന്ന ഒരു സോഫ്റ്റ്വെയർ അവതരിപ്പിക്കുന്നതിൽ സന്തോഷം ഉണ്ട്. ഇത് നിലവിലുള്ള ന്യൂറൽ നെറ്റ്വർക്ക് സാങ്കേതിക വിദ്യയും, അതോടൊപ്പമുള്ള ഓപ്പൺ സോഴ്സ് സോഫ്റ്റ്വെയറും ഉപയോഗിച്ച് നിർമിച്ചതാണ്. ഇന്നത്തെ നിലയ്ക്ക് ഈ ശബ്ദം വൈകാരികമായ ഉള്ളടക്കം ഇല്ലാത്ത വാർത്തകൾ വായിക്കാൻ ഉചിതമാണ് . ഇത് ഒരു തുടക്കം മാത്രം. ഒരോ വ്യക്തിയുടെയും വൈകാരികമായ ശബ്ദം ഉത്പാദിപ്പിക്കുന്ന സോഫ്റ്റ്വെയർ ആണ് ലക്ഷ്യം.
Happy to announce the availability of Malayalam text-to-speech synthesis. This is built using some of the latest neural network architectures and the open-source code around them. In its current shape, the generated speech is good for reading out news or articles which do not have emotional content. This is just a start, and the goal is to reach personalized, emotional text-to-speech synthesis.
keywords : cloud text to speech synthesis, malayalam, bi-lingual, english, multiple speakers