Text to Speech FAQ

All the common issues and questions about TTS tools, gathered right here.

Q: How do I export MP3 files from Edge TTS?

Three methods: ① Use the edge-tts Python package (recommended): run pip install edge-tts, then export with a single command; ② On Windows, use Balabolka: paste your text and save the audio file directly; ③ Use Audacity to record system audio. See the Guide for detailed steps.
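Here is a minimal sketch of method ①. It assumes edge-tts is installed and shows both the command-line form and the package's Python API (edge_tts.Communicate(...).save()). zh-CN-XiaoxiaoNeural is the voice ID for "Xiaoxiao". The helper names and the script name export_mp3.py are hypothetical.

```python
import asyncio
import sys

# The "Xiaoxiao" voice; any name printed by `edge-tts --list-voices` works here.
VOICE = "zh-CN-XiaoxiaoNeural"


def cli_command(text: str, out_path: str, voice: str = VOICE) -> list[str]:
    """Argument list for the edge-tts command-line tool (pass it to subprocess.run)."""
    return ["edge-tts", "--voice", voice, "--text", text, "--write-media", out_path]


async def export_mp3(text: str, out_path: str, voice: str = VOICE) -> None:
    """The same export through the Python API: synthesize `text` into an MP3 file."""
    import edge_tts  # imported here so cli_command() works without the package

    await edge_tts.Communicate(text, voice).save(out_path)


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python export_mp3.py "text to read" output.mp3  (needs network access)
    asyncio.run(export_mp3(sys.argv[1], sys.argv[2]))
```

You can do the same export from a terminal with: edge-tts --voice zh-CN-XiaoxiaoNeural --text "你好" --write-media hello.mp3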

Q: Can TTS-generated voices be used commercially?

Edge TTS (Microsoft Azure) voices can be used commercially, because Azure's terms of service don't prohibit it. However, some third-party online services, such as NaturalReader's free tier, explicitly ban commercial use. Always check a tool's terms of service before using its audio commercially. Otherwise you risk copyright complaints after your video is published. Of the options covered here, Edge TTS is the safest choice.

Q: Are there any good mobile TTS apps?

iOS's built-in "Speak Screen" is actually quite good (Settings → Accessibility → Spoken Content). On Android, an app called "T2S" works well. For video voiceovers, though, a computer is the better tool. Mobile app voices generally aren't as good as Edge TTS, and exporting is more cumbersome. It's easier to make the MP3 on a computer and then transfer it to your phone.

Q: Which Chinese TTS voice sounds the most natural?

The general consensus is that Edge TTS's "Xiaoxiao" female voice is the most natural, followed by the "Yunxi" male voice. Baidu's voices are decent, but its free quota is stingy. iFlytek's TTS is also good, but it's paid. For a voice that is both free and natural-sounding, Edge TTS wins, no contest.

Q: Can it handle polyphonic characters (words with more than one pronunciation) correctly? E.g., "read" in the present vs. past tense.

Edge TTS handles polyphonic characters quite well, and most common cases are read correctly. Rare ones can still trip it up, though. If a word needs a specific pronunciation, you can force it with the SSML <phoneme> element. See the SSML section in the Guide.
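The sketch below shows what a phoneme override looks like, using the W3C SSML <phoneme> element with IPA. It only builds the SSML string. To send it, you need an engine that accepts hand-written SSML, such as Azure's Speech service. As far as we know, recent edge-tts releases generate their own SSML from plain text. The function build_ssml and the default voice en-US-JennyNeural are illustrative choices, not part of any tool's API.

```python
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(parts, voice="en-US-JennyNeural", lang="en-US"):
    """Build an SSML document from a list of parts.

    A plain string is spoken as written (XML-escaped). A (word, ipa) tuple is
    wrapped in <phoneme alphabet="ipa"> to force that pronunciation.
    """
    body = []
    for part in parts:
        if isinstance(part, tuple):
            word, ipa = part
            body.append(f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(word)}</phoneme>')
        else:
            body.append(escape(part))
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang="{lang}">'
        f'<voice name="{voice}">{"".join(body)}</voice></speak>'
    )


# Force the past-tense reading of "read" (/rɛd/) instead of the present tense (/riːd/).
example = build_ssml(["Yesterday I ", ("read", "rɛd"), " the whole book."])
print(example)
```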

Q: The generated speech sounds too robotic. What can I do?

Three approaches: ① Switch voices: Edge TTS's "Xiaoxiao" and "Yunxi" already sound very natural. If you're still using older engines such as the default Windows voices, switch to Edge voices right away; ② Adjust the speed: speech that is too fast or too slow both sound unnatural, and a range of 0.9x to 1.1x feels most natural; ③ Add pauses: use SSML to insert <break time="300ms"/> between sentences so the voice pauses like a real person thinking.
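Approaches ② and ③ can be sketched in Python. speed_to_rate converts a speed multiplier into the signed-percentage string that edge-tts uses for its rate setting. That string goes to the CLI's --rate option or to the rate argument of edge_tts.Communicate. ssml_with_pauses joins sentences with <break> elements. Both helper names are hypothetical. As with any custom SSML, the pause version assumes an engine that accepts hand-written SSML.

```python
from xml.sax.saxutils import escape

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def speed_to_rate(factor: float) -> str:
    """Turn a speed multiplier (0.9 = 10% slower) into a signed percentage
    string such as "-10%", the format edge-tts expects for its rate setting."""
    percent = round((factor - 1.0) * 100)
    return f"{percent:+d}%"


def ssml_with_pauses(sentences, pause_ms=300, voice="zh-CN-XiaoxiaoNeural", lang="zh-CN"):
    """Join sentences with <break time="...ms"/> so there is a short pause between them."""
    pause = f'<break time="{pause_ms}ms"/>'
    body = pause.join(escape(s) for s in sentences)
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang="{lang}">'
        f'<voice name="{voice}">{body}</voice></speak>'
    )


print(speed_to_rate(0.9), speed_to_rate(1.1))  # -10% +10%
```

For example, edge-tts --rate=-10% --text "..." --write-media slow.mp3 reads about 10% slower. The = form stops the argument parser from mistaking the leading minus sign for a new option.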

Q: Can it generate dialects? Like Cantonese or regional accents?

Edge TTS supports Cantonese (yue-CN), but the quality isn't as good as its Mandarin voices. Regional dialects such as Sichuanese or Northeastern Mandarin currently aren't supported by free TTS tools. If you need a dialect voiceover, you may have to look at commercial solutions or... record it yourself.
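Voice coverage changes over time. To check whether a dialect voice exists, you can filter the live voice list by locale. This sketch assumes the edge-tts package, whose edge_tts.list_voices() returns dictionaries that include "Locale" and "ShortName" fields. Besides the yue-CN locale mentioned above, Azure files Hong Kong Cantonese under zh-HK, so the sketch checks both. The --list flag is a hypothetical switch for this script only.

```python
import asyncio
import sys

# yue-CN is the locale named above; Azure also lists Cantonese voices under zh-HK.
CANTONESE_LOCALES = ("yue-CN", "zh-HK")


def voices_for_locales(voices, locales=CANTONESE_LOCALES):
    """Return the sorted ShortNames of voices whose "Locale" is in `locales`."""
    return sorted(v["ShortName"] for v in voices if v.get("Locale") in locales)


async def main():
    import edge_tts  # network call: fetches the current voice list from Microsoft

    voices = await edge_tts.list_voices()
    for name in voices_for_locales(voices):
        print(name)


if __name__ == "__main__" and "--list" in sys.argv:
    asyncio.run(main())
```

You can also skip the code: edge-tts --list-voices prints the full list, and you can scan it for yue-CN or zh-HK entries.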

Q: How do I use Edge TTS on Mac? I don't have the Edge browser installed.

You don't need the Edge browser. On a Mac, just use the edge-tts Python package: run pip install edge-tts, then call Microsoft's online API from the command line. At its core, Edge TTS is Microsoft Azure's cloud service. The Edge browser is just one of its clients and has nothing to do with the TTS engine itself.
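This sketch checks for two common setup problems. The first is installing the package into a different Python than the one you run. The second is a user-level install that puts the edge-tts script in a directory that isn't on PATH. It uses only the standard library. edge_tts_setup is a hypothetical helper name.

```python
import importlib.util
import shutil
import sys


def edge_tts_setup():
    """Report whether the edge_tts package is importable and the CLI is on PATH."""
    return {
        "package": importlib.util.find_spec("edge_tts") is not None,
        "cli": shutil.which("edge-tts") is not None,
    }


if __name__ == "__main__":
    status = edge_tts_setup()
    if not status["package"]:
        # Install into the interpreter running this script, not whatever `pip` resolves to.
        print(f"Install with: {sys.executable} -m pip install edge-tts")
    elif not status["cli"]:
        print("Package found, but the edge-tts command is not on PATH; "
              "add your Python scripts directory to PATH.")
    else:
        print("Ready: try `edge-tts --list-voices`")
```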

Q: How long of a speech can I generate at once? Is there a limit?

In theory, edge-tts has no hard limit, but a single request that's too long may time out. In practice, generating up to 30 minutes of speech in one go works fine. For anything longer, split the text into segments and stitch the resulting audio together with audio software. Balabolka also processes long text in segments and stitches them together automatically.
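The split-and-stitch approach can be sketched in four steps. Break the text at sentence boundaries, pack the sentences into chunks under a size limit, synthesize each chunk with edge-tts, and append the audio. This rests on a few assumptions. max_chars=2000 is an arbitrary cap, not a documented limit, and the helper names are hypothetical. The chunks are also written back to back into one MP3 file. That usually plays fine because MP3 is frame-based, but audio software gives a cleaner join.

```python
import asyncio
import re
import sys

# Split after Chinese sentence punctuation, or at whitespace following . ! ?
# (so decimals like 3.14 are left intact).
_SENTENCE_BREAK = re.compile(r"(?<=[。！？])|(?<=[.!?])\s+")


def split_sentences(text):
    return [s.strip() for s in _SENTENCE_BREAK.split(text) if s.strip()]


def chunk_text(text, max_chars=2000):
    """Greedily pack whole sentences into chunks of at most max_chars characters.
    A single sentence longer than max_chars is cut at the limit."""
    chunks, current = [], ""
    for sentence in split_sentences(text):
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


async def synthesize_long(text, voice, out_path, max_chars=2000):
    import edge_tts  # imported lazily; the helpers above do not need it

    with open(out_path, "wb") as out:
        for chunk in chunk_text(text, max_chars):
            communicate = edge_tts.Communicate(chunk, voice)
            async for message in communicate.stream():
                if message["type"] == "audio":
                    # Assumption: appending MP3 chunks back to back is acceptable for playback.
                    out.write(message["data"])


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python long_tts.py input.txt output.mp3  (needs network access)
    with open(sys.argv[1], encoding="utf-8") as f:
        asyncio.run(synthesize_long(f.read(), "zh-CN-XiaoxiaoNeural", sys.argv[2]))
```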