## 🐸Coqui.ai News

- 🚀 Pretrained models in 1100+ languages.
- 🛠️ Tools for training new models and fine-tuning existing models in any language.
- 📚 Utilities for dataset analysis and curation.
______________________________________________________________________
## 💬 Where to ask questions
Please use our dedicated channels for questions and discussion. Help is much more valuable if it's shared publicly so that more people can benefit from it.
| Type | Platforms |
| ------------------------------- | --------------------------------------- |
| 🚨 Bug Reports | [GitHub Issue Tracker] |
| 🎁 Feature Requests & Ideas | [GitHub Issue Tracker] |
| 👩‍💻 Usage Questions | [GitHub Discussions] |
| 🗯 General Discussion | [GitHub Discussions] or [Discord] |
[github issue tracker]: https://github.com/coqui-ai/tts/issues
[github discussions]: https://github.com/coqui-ai/TTS/discussions
[discord]: https://discord.gg/5eXr5seRrv
## 🔗 Links and Resources
| Type                | Links                                 |
| ------------------- | ------------------------------------- |
| 💼 Documentation    | ReadTheDocs                           |
| 💾 Installation     | TTS/README.md                         |
| 👩‍💻 Contributing    | CONTRIBUTING.md                       |
| 📌 Road Map         | Main Development Plans                |
| 🚀 Released Models  | TTS Releases and Experimental Models  |
| 📰 Papers           | TTS Papers                            |
## 🥇 TTS Performance
Underlined "TTS" and "Judy" are internal 🐸TTS models that are not released open-source; they are shown to demonstrate the potential. Models prefixed with a dot (.Jofish, .Abe, and .Janice) are real human voices.
## Features
- `Trainer API` for training new models and fine-tuning existing ones.
- Tools for dataset analysis and curation under `dataset_analysis`.
## Model Implementations
### Spectrogram models
### End-to-End Models
### Attention Methods
### Speaker Encoder
### Vocoders
### Voice Conversion
You can also help us implement more models.
## Installation

🐸TTS is tested on Ubuntu 18.04 with Python >= 3.9, < 3.12.
If you are only interested in synthesizing speech with the released 🐸TTS models, installing from PyPI is the easiest option.
```bash
pip install TTS
```
If you plan to code or train models, clone 🐸TTS and install it locally.
```bash
git clone https://github.com/coqui-ai/TTS
cd TTS
pip install -e .[all,dev,notebooks]  # Select the relevant extras
```
If you are on Ubuntu (Debian), you can also run the following commands to install.
```bash
$ make system-deps  # intended to be used on Ubuntu (Debian). Let us know if you have a different OS.
$ make install
```
If you are on Windows, 👑@GuyPaddock wrote installation instructions here.
### Docker Image

You can also try out 🐸TTS without installing it by using the Docker image. Simply run the following commands:
```bash
docker run --rm -it -p 5002:5002 --entrypoint /bin/bash ghcr.io/coqui-ai/tts-cpu
python3 TTS/server/server.py --list_models  # To get the list of available models
python3 TTS/server/server.py --model_name tts_models/en/vctk/vits  # To start a server
```
You can then enjoy the TTS demo server at http://localhost:5002 (the port published by the `docker run` command above).
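If you want to script against the demo server, you can fetch audio over HTTP. A minimal sketch in Python, assuming the server exposes its synthesis endpoint at `/api/tts` with a `text` query parameter (check the server page for the exact API):

```python
import urllib.parse
import urllib.request

# Query the demo server started above (assumed endpoint: /api/tts?text=...)
params = urllib.parse.urlencode({"text": "Hello from the TTS server!"})
url = f"http://localhost:5002/api/tts?{params}"

with urllib.request.urlopen(url) as response:
    audio = response.read()  # WAV bytes returned by the server

with open("server_output.wav", "wb") as f:
    f.write(audio)
```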
More details about the Docker images (like GPU support) can be found in the documentation.
## Synthesizing speech with 🐸TTS

### 🐍 Python API

#### Running a multi-speaker and multi-lingual model
```python
import torch
from TTS.api import TTS

# Get device
device = "cuda" if torch.cuda.is_available() else "cpu"

# List available 🐸TTS models
print(TTS().list_models())

# Init TTS
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

# Run TTS
# ❗ Since this is a multi-lingual voice cloning model, we must set the target speaker_wav and language
# Text to speech returns a list of amplitude values as output
wav = tts.tts(text="Hello world!", speaker_wav="my/cloning/audio.wav", language="en")
# Text to speech to a file
tts.tts_to_file(text="Hello world!", speaker_wav="my/cloning/audio.wav", language="en", file_path="output.wav")
```
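If you keep the raw amplitude list returned by `tts.tts()` instead of calling `tts_to_file`, you can write it to disk yourself. A minimal sketch, assuming a 24 kHz output rate for XTTS v2 (check your model's config if unsure):

```python
import numpy as np
import scipy.io.wavfile

# `wav` is the list of float amplitudes returned by tts.tts() above.
# The sample rate below is an assumption for XTTS v2; verify it for your model.
SAMPLE_RATE = 24000
scipy.io.wavfile.write("manual_output.wav", SAMPLE_RATE, np.array(wav, dtype=np.float32))
```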
#### Running a single speaker model
```python
# Init TTS with the target model name
tts = TTS(model_name="tts_models/de/thorsten/tacotron2-DDC", progress_bar=False).to(device)

# Run TTS
tts.tts_to_file(text="Ich bin eine Testnachricht.", file_path="output.wav")

# Example voice cloning with YourTTS in English, French and Portuguese
tts = TTS(model_name="tts_models/multilingual/multi-dataset/your_tts", progress_bar=False).to(device)
tts.tts_to_file("This is voice cloning.", speaker_wav="my/cloning/audio.wav", language="en", file_path="output.wav")
tts.tts_to_file("C'est le clonage de la voix.", speaker_wav="my/cloning/audio.wav", language="fr-fr", file_path="output.wav")
tts.tts_to_file("Isso é clonagem de voz.", speaker_wav="my/cloning/audio.wav", language="pt-br", file_path="output.wav")
```
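Multi-speaker models also ship with named speakers, so you can synthesize without a reference clip. A sketch, assuming the `TTS` object exposes `speakers` and `languages` lists for such models (as the CLI's `--list_speaker_idxs` flag suggests):

```python
# Pick a built-in speaker instead of cloning from a reference clip.
# Assumes multi-speaker models expose `speakers`/`languages` on the TTS object.
tts = TTS(model_name="tts_models/multilingual/multi-dataset/your_tts", progress_bar=False).to(device)
print(tts.speakers)   # named speakers bundled with the model
print(tts.languages)  # supported language codes
tts.tts_to_file(
    "Voice of a built-in speaker.",
    speaker=tts.speakers[0],
    language=tts.languages[0],
    file_path="builtin_speaker.wav",
)
```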
#### Example voice conversion

Converting the voice in `source_wav` to the voice of `target_wav`:
1tts = TTS(model_name="voice_conversion_models/multilingual/vctk/freevc24", progress_bar=False).to("cuda")
2tts.voice_conversion_to_file(source_wav="my/source.wav", target_wav="my/target.wav", file_path="output.wav")
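Since the conversion call takes plain file paths, it is easy to batch. A small sketch using only the call shown above (the `my/sources` folder is hypothetical):

```python
from pathlib import Path

# Convert every WAV in a folder to the same target voice.
for src in sorted(Path("my/sources").glob("*.wav")):
    tts.voice_conversion_to_file(
        source_wav=str(src),
        target_wav="my/target.wav",
        file_path=f"converted_{src.stem}.wav",
    )
```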
#### Example voice cloning together with the voice conversion model

This way, you can clone voices by using any model in 🐸TTS.
1tts = TTS("tts_models/de/thorsten/tacotron2-DDC")
2tts.tts_with_vc_to_file(
3 "Wie sage ich auf Italienisch, dass ich dich liebe?",
4 speaker_wav="target/speaker.wav",
5 file_path="output.wav"
6)
#### Example text to speech using Fairseq models in ~1100 languages 🤯

For Fairseq models, use the following name format: `tts_models/<lang-iso_code>/fairseq/vits` (as in the example below).
You can find the language ISO codes here and learn about the Fairseq models here.
```python
# TTS with on-the-fly voice conversion
api = TTS("tts_models/deu/fairseq/vits")
api.tts_with_vc_to_file(
    "Wie sage ich auf Italienisch, dass ich dich liebe?",
    speaker_wav="target/speaker.wav",
    file_path="output.wav"
)
```
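Voice conversion is optional; you can also synthesize directly with a Fairseq checkpoint via `tts_to_file`. A sketch using Spanish ("spa" is its ISO 639-3 code; the text and output path are illustrative):

```python
# Plain synthesis with a Fairseq VITS model, no voice conversion.
api = TTS("tts_models/spa/fairseq/vits")
api.tts_to_file("Hola, esto es una prueba.", file_path="hola.wav")
```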
### Command-line `tts`

Synthesize speech on the command line.
You can either use your trained model or choose a model from the provided list.
If you don't specify any models, the LJSpeech-based English model is used.
#### Single Speaker Models

- List provided models:

  ```bash
  $ tts --list_models
  ```

- Get model information (for both tts_models and vocoder_models):

  - Query by type/name. The model_info_by_name uses the name as it appears in the output of `--list_models`:

    ```bash
    $ tts --model_info_by_name "<model_type>/<language>/<dataset>/<model_name>"
    ```

    For example:

    ```bash
    $ tts --model_info_by_name tts_models/tr/common-voice/glow-tts
    $ tts --model_info_by_name vocoder_models/en/ljspeech/hifigan_v2
    ```

  - Query by type/idx. The model_query_idx uses the corresponding idx from `--list_models`:

    ```bash
    $ tts --model_info_by_idx "<model_type>/<model_query_idx>"
    ```

    For example:

    ```bash
    $ tts --model_info_by_idx tts_models/3
    ```

  - Query model info by full name:

    ```bash
    $ tts --model_info_by_name "<model_type>/<language>/<dataset>/<model_name>"
    ```

- Run TTS with the default models:

  ```bash
  $ tts --text "Text for TTS" --out_path output/path/speech.wav
  ```

- Run TTS and pipe out the generated TTS wav file data:

  ```bash
  $ tts --text "Text for TTS" --pipe_out --out_path output/path/speech.wav | aplay
  ```

- Run a TTS model with its default vocoder model:

  ```bash
  $ tts --text "Text for TTS" --model_name "<model_type>/<language>/<dataset>/<model_name>" --out_path output/path/speech.wav
  ```

  For example:

  ```bash
  $ tts --text "Text for TTS" --model_name "tts_models/en/ljspeech/glow-tts" --out_path output/path/speech.wav
  ```

- Run with specific TTS and vocoder models from the list:

  ```bash
  $ tts --text "Text for TTS" --model_name "<model_type>/<language>/<dataset>/<model_name>" --vocoder_name "<model_type>/<language>/<dataset>/<model_name>" --out_path output/path/speech.wav
  ```

  For example:

  ```bash
  $ tts --text "Text for TTS" --model_name "tts_models/en/ljspeech/glow-tts" --vocoder_name "vocoder_models/en/ljspeech/univnet" --out_path output/path/speech.wav
  ```

- Run your own TTS model (without a vocoder, Griffin-Lim is used):

  ```bash
  $ tts --text "Text for TTS" --model_path path/to/model.pth --config_path path/to/config.json --out_path output/path/speech.wav
  ```

- Run your own TTS and vocoder models:

  ```bash
  $ tts --text "Text for TTS" --model_path path/to/model.pth --config_path path/to/config.json --out_path output/path/speech.wav \
      --vocoder_path path/to/vocoder.pth --vocoder_config_path path/to/vocoder_config.json
  ```
#### Multi-speaker Models

- List the available speakers and choose a `<speaker_id>` among them:

  ```bash
  $ tts --model_name "<language>/<dataset>/<model_name>" --list_speaker_idxs
  ```

- Run the multi-speaker TTS model with the target speaker ID:

  ```bash
  $ tts --text "Text for TTS." --out_path output/path/speech.wav --model_name "<language>/<dataset>/<model_name>" --speaker_idx <speaker_id>
  ```

- Run your own multi-speaker TTS model:

  ```bash
  $ tts --text "Text for TTS" --out_path output/path/speech.wav --model_path path/to/model.pth --config_path path/to/config.json --speakers_file_path path/to/speaker.json --speaker_idx <speaker_id>
  ```
#### Voice Conversion Models

```bash
$ tts --out_path output/path/speech.wav --model_name "<language>/<dataset>/<model_name>" --source_wav <path/to/speaker/wav> --target_wav <path/to/reference/wav>
```
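If you prefer to drive the CLI from a script, any of the commands above can be wrapped with `subprocess`. A minimal sketch batching several lines of text (model name and output paths are just illustrative):

```python
import subprocess

lines = ["First sentence.", "Second sentence."]
for i, line in enumerate(lines):
    # Same flags as the single-speaker example above.
    subprocess.run(
        [
            "tts",
            "--text", line,
            "--model_name", "tts_models/en/ljspeech/glow-tts",
            "--out_path", f"output/path/speech_{i}.wav",
        ],
        check=True,
    )
```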
## Directory Structure

```
|- notebooks/           (Jupyter Notebooks for model evaluation, parameter selection and data analysis.)
|- utils/               (common utilities.)
|- TTS
    |- bin/             (folder for all the executables.)
        |- train*.py    (train your target model.)
        |- ...
    |- tts/             (text to speech models)
        |- layers/      (model layer definitions)
        |- models/      (model definitions)
        |- utils/       (model specific utilities.)
    |- speaker_encoder/ (Speaker Encoder models.)
        |- (same)
    |- vocoder/         (Vocoder models.)
        |- (same)
```