pinguy/kokoro-tts-addon

Kokoro TTS Add-on

🧠 Local Neural Text-to-Speech for Firefox — fast, private, offline.

Tested on a Xeon E3-1265L v3 (2013): it ran multiple TTS jobs in parallel with barely perceptible lag.
If it works on this, it'll fly on your machine.


🔍 What is This?

Kokoro TTS is a browser extension that lets you convert selected or pasted text into natural-sounding speech — without needing an internet connection.
It uses a lightweight Flask server and the Kokoro model running locally on your system.

  • ✅ No accounts or logins
  • ✅ No cloud APIs or telemetry
  • ✅ No GPU required: a GPU helps a lot, but without a usable one the server falls back to the CPU

🚀 Features

  • 🎙️ Neural TTS with multiple voice options
  • 🔒 Offline-first & privacy-respecting
  • 🧊 Lightweight: only 82M parameters
  • 🥔 Works on low-end CPUs
  • 🌍 Linux, macOS, and Windows support

⚙️ Installation

1. Download from Releases

Head to the Releases Page and grab:

  • latest kokoro-tts-addon.xpi
  • server.py

2. Install the Add-on in Firefox

  • Go to about:addons
  • Click the gear icon → Install Add-on From File...
  • Select the .xpi you downloaded

3. Start the Local Server

macOS / Linux:

nohup python3 /path/to/server.py &

Windows:

Create a .bat file like this:

cd /d C:\path\to\server
start python server.py

Drop a shortcut to it in the Startup folder (Win + R → shell:startup).

To install espeak-ng on Windows:

  1. Go to espeak-ng releases
  2. Click on Latest release
  3. Download the appropriate *.msi file (e.g. espeak-ng-20191129-b702b03-x64.msi)
  4. Run the downloaded installer

For advanced configuration and usage on Windows, see the official espeak-ng Windows guide.


🧪 How to Test

  1. Visit http://localhost:8000/health
  2. You should see a simple “healthy” JSON response
  3. Use the extension: paste text, pick a voice, click “Generate Speech” 🎉
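If you would rather script the check in step 1 than open a browser, the sketch below polls the same URL with the Python standard library. The function name `check_health` is hypothetical, and the code assumes only what the steps above state: the endpoint lives at http://localhost:8000/health and returns JSON. It does not assume any particular field names in the response.

```python
import json
import urllib.error
import urllib.request


def check_health(url="http://localhost:8000/health", timeout=3.0):
    """Return the parsed JSON body from the health endpoint, or None on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            if resp.status != 200:
                return None
            return json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Server not running, connection refused, timeout, or a non-JSON body.
        return None
```

Calling `check_health()` while server.py is running should return the decoded JSON. A result of `None` suggests the server is not up yet or is listening on a different port.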

📌 Notes

  • The first run downloads the model, so that one run needs an internet connection
  • Make sure Python 3.8+ is installed and in PATH
  • All processing is local — nothing leaves your machine
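A quick way to confirm the Python requirement from the notes above is a small preflight script. This is a minimal sketch: the helper names `version_ok` and `python_on_path` are hypothetical, and the launcher names `python3` and `python` are simply the ones used in the commands in this README.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in the notes above


def version_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def python_on_path():
    """Return the path of the first Python launcher found on PATH, or None."""
    for name in ("python3", "python"):
        found = shutil.which(name)
        if found:
            return found
    return None
```

Running `version_ok()` with the interpreter you plan to use for server.py tells you whether it is new enough, and `python_on_path()` shows which launcher your shell would pick up.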

🧩 Dependencies

You’ll need Python 3.8+ and pip installed. Most systems already have them.
To install all required Python packages (including some optional extras for extended model usage), run:

python3 -m pip install --upgrade pip setuptools
cat requirements.txt | xargs -n 1 pip3 install
pip3 install -U flask-cors
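After installing, you can check that the packages are importable before starting the server. The sketch below is an illustration only: `missing_modules` is a hypothetical helper, and the default list contains just the two import names this README mentions (Flask and flask-cors). The real requirements.txt may list more, so extend the list to match it.

```python
import importlib.util

# Import names assumed from this README: Flask and flask-cors.
# Extend this list to match the packages in requirements.txt.
REQUIRED = ["flask", "flask_cors"]


def missing_modules(names=REQUIRED):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

An empty list from `missing_modules()` means both packages were found. Any names it returns are the ones to reinstall with pip.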

📄 License

Licensed under the Apache License 2.0


❤️ Credits

Powered by the Kokoro TTS model


Feature Preview

  • Popup UI: select text and the popup appears
  • Playback in Action: shown after clicking "Generate Speech"
  • System Notifications: get notified when playback starts (not pictured)
  • Settings Panel: configuration options
  • Voice List: browse the available voice models
  • Accents Supported: 🇺🇸 American English, 🇬🇧 British English, 🇪🇸 Spanish, 🇫🇷 French, 🇮🇹 Italian, 🇧🇷 Portuguese (BR), 🇮🇳 Hindi, 🇯🇵 Japanese, 🇨🇳 Mandarin Chinese

Video - Kokoro Text-to-Speech - Local on a Potato Vs Hugging Face

Watch the video

Compares offline generation using MKLDNN with online generation using WASM/WebGPU.