Speech synthesis is difficult. If you're on the lookout for text-to-speech software, you don't have many options, at least not for a few more years, until computers get more powerful and machine learning makes a few more advances. In the meantime, you have to resort to old TTS software, or give up on yourself and use an online service.
Here we'll explore two options: espeak and flite.
You can provide the text to read via `stdin`, as a file with `-f`, or as arguments. It's very useful to make espeak write its output to `stdout` with the `--stdout` option, so you can pipe the audio into other programs.
```
espeak [option]... ["<words>"]
```
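Because espeak can write a WAV stream to `stdout`, it combines well with other tools. The sketch below assumes espeak and SoX are both installed. It wraps the three input modes in two hypothetical POSIX shell functions, `speak` and `speak_file` (names of my choosing, not part of espeak), that pipe the audio into SoX's `play`, which reads a WAV stream from stdin via `-t wav -`.

```shell
# Hypothetical helpers (not part of espeak): send espeak's WAV output
# on stdout to SoX's `play`. Requires espeak and SoX on the PATH.

# Speak the arguments, or read the text from stdin if there are none.
speak() {
  if [ "$#" -gt 0 ]; then
    espeak --stdout "$*"
  else
    espeak --stdout            # no text argument: espeak reads stdin
  fi | play -q -t wav -        # -t wav -: read a WAV stream from stdin
}

# Speak the contents of a text file via espeak's -f option.
speak_file() {
  espeak --stdout -f "$1" | play -q -t wav -
}
```

Usage: `speak "hello world"`, `echo "hello world" | speak`, or `speak_file notes.txt`.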
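Partial list of espeak options:

| Option | Description | Range |
|---|---|---|
| `-f <file>` | file to read | |
| `-w <file>` | write to a WAV file | |
| `-v <voice>` | change the voice | |
| `--voices` | list all voices | |
| `-p <integer>` | pitch adjustment, default 50 | 0-99 |
| `-s <integer>` | speed in words per minute, default 175 | 80-450 |
| `--punct` | speak the names of punctuation characters | |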
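If you script espeak, it's worth enforcing the pitch and speed ranges listed in the options table. Here is a minimal sketch. `tts_wav` is a hypothetical function that checks both values against those bounds before passing them to `-p` and `-s`, together with `-v` and `-w`. The function body is a subshell `( ... )`, so its variables don't leak into the caller. Everything beyond the espeak flags themselves is my own assumption.

```shell
# Hypothetical wrapper: tts_wav VOICE PITCH SPEED OUT.wav TEXT...
# Rejects pitch outside 0-99 and speed outside 80-450 wpm (the ranges
# in the options table), then writes the speech to OUT.wav with -w.
tts_wav() (
  voice=$1 pitch=$2 speed=$3 out=$4
  shift 4
  if [ "$pitch" -lt 0 ] || [ "$pitch" -gt 99 ]; then
    echo "tts_wav: pitch must be 0-99, got $pitch" >&2
    exit 1                      # exits the subshell, i.e. the function
  fi
  if [ "$speed" -lt 80 ] || [ "$speed" -gt 450 ]; then
    echo "tts_wav: speed must be 80-450 wpm, got $speed" >&2
    exit 1
  fi
  espeak -v "$voice" -p "$pitch" -s "$speed" -w "$out" "$*"
)
```

Usage: `tts_wav en 40 140 reminder.wav "you need to leave in 20 minutes"`.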
```shell
espeak "hello world"
```
This will greet us with a very robotic "Hello world". There's some kind of flanger effect going on. The result would be good enough if we were to build a Jarvis assistant in the 1980s.
Try making it say other things, and you'll notice that sometimes it's hard to understand what it's saying.
You can install additional voices to use with espeak, for instance MBROLA voices.
Let's check out the list of available voices. To do that, run `espeak --voices`. Then let espeak know which voice to use with the `-v` option. Pass any of the voices you discovered earlier (use the values from the VoiceName or the File column).
To better test this, we'll switch from Hello world to something more useful.
```shell
espeak -v english-mb-en1 "you need to leave in 20 minutes"
```
Now that's interesting: it would make a fine voice for a Doom Jarvis. It's still robotic, but deeper and at a slower pace. This particular voice is an MBROLA voice. Explore the other voices on your own.
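When exploring, it can help to extract just the voice names from `espeak --voices`. This sketch assumes the usual column layout (Pty, Language, Age/Gender, VoiceName, File, Other Languages), which makes VoiceName the fourth whitespace-separated field. `list_voice_names` is a hypothetical helper, and the layout may differ between espeak versions.

```shell
# Hypothetical helper: print the VoiceName column of `espeak --voices`,
# keeping only names that contain PATTERN (all names if PATTERN is empty).
# Assumes VoiceName is the 4th field, as in espeak's usual output layout.
list_voice_names() {
  espeak --voices | awk -v pat="${1:-}" '
    NR > 1 && (pat == "" || index($4, pat) > 0) { print $4 }
  '
}
```

For example, `for v in $(list_voice_names mb-); do espeak -v "$v" "hello"; done` would say "hello" once with each MBROLA voice that espeak knows about.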
You'll notice that all the voices (not the pronunciations) sound kind of the same. The only good use case I've found for espeak is a Stephen Hawking voice simulator.
To get a deeper understanding of how Festival and the FestVox suite work, read the document.
flite gets the text to read from `stdin`, from a file, or from arguments, and its output can be played directly or written to a WAV file.
```
flite TEXT/FILE [output_file]
```
If [output_file] is not specified, or is "play", flite just plays the output.
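Partial list of flite options:

| Option | Description |
|---|---|
| `-o <file>` | explicitly set the output file |
| `-f <file>` | explicitly set the input file |
| `-t <text>` | explicitly set the input text |
| `-voice <voice>` | set the voice: a voice name from `-lv`, or a voice file |
| `-lv` | list available voices |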
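Since `-t` and `-f` remove any doubt about whether an argument is text or a file name, scripts are better off using them explicitly. Here is a minimal sketch of a hypothetical `flite_say` wrapper. It takes a voice (a name from `flite -lv`, or a path to a voice file), an output target, and the text.

```shell
# Hypothetical wrapper: flite_say VOICE OUT TEXT...
# VOICE: a name listed by `flite -lv`, or a path to a .flitevox file.
# OUT:   a .wav path, or "play" to play the audio instead of saving it.
# -t forces the remaining arguments to be read as text, never as a file name.
flite_say() (
  voice=$1 out=$2
  shift 2
  flite -voice "$voice" -t "$*" -o "$out"
)
```

Usage: `flite_say slt play "hello world"`, assuming your flite build includes the `slt` voice (check with `flite -lv`).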
```shell
flite "hello world"
```
FestVox voices sound more human, so we're on the right track to find a more modern voice for our imaginary friend, Jarvis.
Just like with espeak, you can get more voices for flite. You can grab these for testing.
You can pass the voice files directly to flite using the `-voice` option:
```shell
echo "today we will learn how to code" \
  | flite -voice ./cmu_us_axb.flitevox
```
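If you download several voice files, comparing them one by one gets tedious. This sketch renders the same sentence with every `.flitevox` file in a directory. The `audition_voices` name, the default sentence, and the `<voice>.wav` output naming are my own assumptions.

```shell
# Hypothetical helper: audition_voices [DIR] [TEXT]
# Renders TEXT with every DIR/*.flitevox voice into ./<voice>.wav.
audition_voices() (
  dir=${1:-.}
  text=${2:-"today we will learn how to code"}
  for vox in "$dir"/*.flitevox; do
    [ -e "$vox" ] || continue            # glob matched nothing
    name=$(basename "$vox" .flitevox)
    flite -voice "$vox" -t "$text" -o "$name.wav"
  done
)
```

Running `audition_voices ~/voices` would leave, for example, `cmu_us_axb.wav` in the current directory for each voice file found there.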
Oh, the shivers. It sounds like every video course I skip because of the Indian accent (sorry guys, I'm not a native English speaker either, but your accent is like acid to my ears).
SoX, the Swiss Army knife of audio manipulation, is a tool we can use to make some small adjustments to the audio that comes out of flite.
We'll use the `play` command that SoX provides for this:
```shell
echo "very interestingly" \
  | flite -voice ./cmu_indic_aup_mr.flitevox -o tmp.wav -pw
play - pitch -50 speed 1.1 treble 10 < tmp.wav
```
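A fixed `tmp.wav` name works, but it can clash with an existing file and it is left behind afterwards. The sketch below turns the same flite and SoX pipeline into a reusable function. It assumes flite and SoX are installed, and `robot_say` is a hypothetical name. The effect values are copied from the command above: `pitch` in cents, `speed` as a factor, and `treble` gain in dB.

```shell
# Hypothetical wrapper: robot_say VOICE TEXT...
# Synthesizes TEXT with flite into a unique temp file, plays it through
# SoX with the same pitch/speed/treble effects, then deletes the file.
robot_say() (
  voice=$1
  shift
  tmp=$(mktemp "${TMPDIR:-/tmp}/tts.XXXXXX") || exit 1
  trap 'rm -f "$tmp"' EXIT              # runs when the subshell exits
  flite -voice "$voice" -t "$*" -o "$tmp" &&
    play -q -t wav "$tmp" pitch -50 speed 1.1 treble 10
)
```

The temp file has no `.wav` extension, which is why `-t wav` tells `play` the file type explicitly.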
Yes, this is a poor attempt at synthesizing Bisqwit's voice. You'd think that it would be easier to synthesize an already synthesized voice, but this is the best I could do.
Playing with TTS is always fun. Go write your personal assistant, put it on an embedded computer, and let it speak to your brain through some unicorn-powered tech (like this one - recently expired, and before you ask - NO, DO NOT CHEAT ON YOUR EXAMS, YOU HUFFLEPUFF).