Complete Text to Speech Tutorial: From Experience to Export
This tutorial goes from reading aloud directly in the browser, to batch-exporting MP3s with Python, to fine-tuning with SSML. Follow it step by step, and by the end you'll be able to make your own video voiceovers.
Zero Barrier: Edge Browser "Read Aloud"
No installation needed; just open the Edge browser:
Press Ctrl+Shift+U to read the entire page aloud.
This feature is great for listening to articles or checking how your writing flows. However, it can't export audio files, so the next sections show you how to export.
Method 1: edge-tts Python Script (Recommended)
This is currently the most convenient way to export Edge TTS speech as MP3 files. It's cross-platform and works on Mac, Windows, and Linux. Because it uses Microsoft's online speech service, it needs an internet connection.
Installation Steps
Make sure Python 3 is installed (check with python --version)
Install the tool: pip install edge-tts
Usage Commands
# Basic usage: text to MP3
edge-tts --text "Hello, welcome to the text-to-speech tool" --voice en-US-AriaNeural --write-media output.mp3
# Read from a text file
edge-tts --file input.txt --voice en-US-GuyNeural --write-media output.mp3
# List all available voices
edge-tts --list-voices
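The commands above convert one file at a time. For an audiobook or a series of voiceovers, a short Python script can loop over a folder and call the same CLI once per file. This is a minimal sketch, assuming edge-tts is installed and on your PATH; the names build_command, batch_convert and batch_tts.py are hypothetical, and only the --file, --voice and --write-media flags shown above are used.

```python
"""Batch-export every .txt file in a folder to MP3 with the edge-tts CLI.

Sketch: assumes edge-tts is installed (pip install edge-tts) and on PATH.
"""
import subprocess
import sys
from pathlib import Path


def build_command(txt_file: Path, voice: str, out_dir: Path) -> list[str]:
    """Return the edge-tts command that turns one text file into one MP3."""
    mp3_file = out_dir / (txt_file.stem + ".mp3")  # chapter1.txt -> chapter1.mp3
    return [
        "edge-tts",
        "--file", str(txt_file),
        "--voice", voice,
        "--write-media", str(mp3_file),
    ]


def batch_convert(in_dir: Path, out_dir: Path,
                  voice: str = "en-US-AriaNeural") -> None:
    """Convert the .txt files in in_dir one by one, in alphabetical order."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for txt_file in sorted(in_dir.glob("*.txt")):
        cmd = build_command(txt_file, voice, out_dir)
        print("Converting", txt_file.name)
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    # Usage (hypothetical script name): python batch_tts.py texts/ audio/
    batch_convert(Path(sys.argv[1]), Path(sys.argv[2]))
```

Calling the CLI through subprocess keeps the script identical in behavior to the commands you already tested by hand; check=True stops the batch at the first failed file instead of silently skipping it.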
Popular voice codes:
en-US-AriaNeural: Aria (female, warm)
en-US-GuyNeural: Guy (male, clear)
en-US-JennyNeural: Jenny (female, friendly)
en-GB-SoniaNeural: Sonia (British female)
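The full output of --list-voices is long. If you only want the voices for one locale, a few lines of Python can filter it. This is a sketch assuming the edge-tts library's list_voices() coroutine, which returns one dictionary per voice with "ShortName" and "Locale" keys; voices_for_locale is a hypothetical helper, and the call needs an internet connection.

```python
"""List the edge-tts voices available for one locale, e.g. British English."""
import asyncio


def voices_for_locale(voices: list[dict], locale: str) -> list[str]:
    """Return the sorted ShortName values whose Locale matches, e.g. 'en-GB'."""
    return sorted(v["ShortName"] for v in voices if v["Locale"] == locale)


async def main(locale: str = "en-GB") -> None:
    import edge_tts  # imported here so voices_for_locale works without it

    voices = await edge_tts.list_voices()  # queries Microsoft's online service
    for name in voices_for_locale(voices, locale):
        print(name)


if __name__ == "__main__":
    asyncio.run(main())
```

Keeping the filter as a plain function means you can reuse it on a cached copy of the voice list without going online each time.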
Method 2: Balabolka (Windows GUI)
If you don't want to touch the command line, Balabolka is the most intuitive choice.
Go to cross-plus-a.com/balabolka.htm and download Balabolka.
Balabolka also supports batch processing: File → Batch File Conversion → select a folder → set parameters → Start. This feature is a lifesaver when making audiobooks.
Method 3: Recording Method (Simplest but Most Universal)
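If none of the above methods work for you, you can record the speech while it plays, for example by capturing Edge's Read Aloud output with an audio recorder.
The downside is that recording takes as long as the speech itself: a 5,000-word article takes more than half an hour to record. The upside is that it always works, regardless of your operating system.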
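Because recording happens in real time, it helps to estimate the session length before you start. The sketch below divides the word count by an assumed narration pace of 150 words per minute; both the pace and the recording_minutes name are assumptions, and splitting on whitespace only counts words correctly for languages written with spaces.

```python
"""Estimate how long a real-time recording of a text will take."""
import sys
from pathlib import Path


def recording_minutes(text: str, words_per_minute: float = 150.0) -> float:
    """Word count divided by speaking pace (150 wpm is an assumed default)."""
    word_count = len(text.split())  # whitespace-separated words
    return word_count / words_per_minute


if __name__ == "__main__":
    # Usage: python estimate.py article.txt
    article = Path(sys.argv[1]).read_text(encoding="utf-8")
    print(f"About {recording_minutes(article):.0f} minutes of recording")
```

If you slow the voice down, lower words_per_minute accordingly; the estimate scales inversely with the pace.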
Advanced: SSML Fine-Tuning
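SSML (Speech Synthesis Markup Language) allows fine-grained control over TTS output. A typical SSML document looks like this:
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
<voice name="en-US-AriaNeural">
This is a sentence<break time="500ms"/>with a pause.
<prosody rate="slow">This sentence is read slower.</prosody>
<prosody pitch="high">This sentence has a higher pitch!</prosody>
</voice>
</speak>
Save it as script.ssml for a TTS service that accepts full SSML, such as Microsoft's Azure AI Speech. Note that current versions of edge-tts no longer accept custom SSML, because Microsoft's service only allows the markup that Edge itself generates. With edge-tts, pass speed and pitch as options instead: edge-tts --text "This sentence is read slower." --voice en-US-AriaNeural --rate=-30% --pitch=+20Hz --write-media output.mp3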
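In a Python script, edge-tts can adjust speed and pitch without any SSML by passing them to its Communicate class. This is a minimal sketch assuming a recent edge-tts release in which Communicate accepts rate (a signed percentage such as "-30%") and pitch (a signed offset such as "+20Hz") as keyword arguments; prosody_options and speak are hypothetical helper names.

```python
"""Change speed and pitch with the edge-tts Python library (no SSML)."""
import asyncio


def prosody_options(rate_percent: int = 0, pitch_hz: int = 0) -> dict[str, str]:
    """Format numbers as signed strings for edge-tts, e.g. '-30%' and '+20Hz'."""
    return {"rate": f"{rate_percent:+d}%", "pitch": f"{pitch_hz:+d}Hz"}


async def speak(text: str, voice: str, out_file: str,
                rate_percent: int = 0, pitch_hz: int = 0) -> None:
    """Synthesize text to out_file; needs edge-tts and an internet connection."""
    import edge_tts  # imported here so prosody_options works without edge-tts

    options = prosody_options(rate_percent, pitch_hz)
    communicate = edge_tts.Communicate(text, voice, **options)
    await communicate.save(out_file)


if __name__ == "__main__":
    # 30% slower and 20 Hz higher than the voice's default
    asyncio.run(speak("This sentence is read slower.", "en-US-AriaNeural",
                      "output.mp3", rate_percent=-30, pitch_hz=20))
```

The "+d" format spec always writes the sign, so zero becomes "+0%" and "+0Hz", which leaves the voice at its normal speed and pitch.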
Common SSML tags: <break> controls pauses, <prosody> controls speed and pitch, and <emphasis> adds emphasis. Once you get comfortable with these, you can produce voiceovers that sound noticeably more natural than the default reading, often natural enough that listeners won't notice the voice is synthesized.
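For longer scripts, writing SSML by hand gets tedious and error-prone. The sketch below generates a document with the same structure as the example in this section from a list of segments. The (text, rate, pitch) tuple layout and the build_ssml name are hypothetical; the escaping step matters because a bare & or < in your text would make the SSML invalid.

```python
"""Generate SSML with <voice>, <break> and <prosody> from plain sentences."""
import xml.etree.ElementTree as ET
from pathlib import Path
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(segments: list[tuple[str, str, str]],
               voice: str = "en-US-AriaNeural",
               pause_ms: int = 500, lang: str = "en-US") -> str:
    """segments: (text, rate, pitch); an empty rate or pitch is left out."""
    parts = []
    for text, rate, pitch in segments:
        body = escape(text)  # turn &, < and > into entities so the XML stays valid
        attrs = ""
        if rate:
            attrs += f" rate={quoteattr(rate)}"
        if pitch:
            attrs += f" pitch={quoteattr(pitch)}"
        # wrap in <prosody> only when there is something to adjust
        parts.append(f"<prosody{attrs}>{body}</prosody>" if attrs else body)
    pause = f'<break time="{pause_ms}ms"/>'  # inserted between segments
    return (f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang={quoteattr(lang)}>'
            f"<voice name={quoteattr(voice)}>{pause.join(parts)}</voice></speak>")


if __name__ == "__main__":
    ssml = build_ssml([
        ("Fish & chips, please.", "", ""),
        ("This sentence is read slower.", "slow", ""),
        ("This sentence has a higher pitch!", "", "high"),
    ])
    ET.fromstring(ssml)  # raises ParseError if the markup is malformed
    Path("script.ssml").write_text(ssml, encoding="utf-8")
```

Parsing the result with ElementTree before saving is a cheap way to catch malformed markup before you send it to a TTS service.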