A cutting-edge cascading voice assistant that combines speech recognition, AI reasoning, and neural text-to-speech. Built with real-time interaction and LLM tool calling in mind.
- 🎙️ Real-time speech recognition using Whisper + Silero VAD
- 🤖 Multimodal reasoning with Llama 3.1 8B through Agno agent
- 🌐 Web integration (Google Search, Wikipedia, Arxiv)
- 🗣️ Natural voice synthesis with Kokoro-82M ONNX
- ⚡ Low-latency audio processing pipeline
- 🔧 Extensible tool system for agent capabilities
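At a high level, the cascade wires three stages together: speech-to-text, agent reasoning, and text-to-speech. A minimal sketch of that flow, with stub functions standing in for Whisper, the Agno agent, and Kokoro (the real project wires in those libraries instead):

```python
# Minimal sketch of the cascaded pipeline: STT -> agent -> TTS.
# The three stage functions are stubs; they illustrate the data flow only.

def transcribe(audio: bytes) -> str:
    # Stub for Whisper + Silero VAD: turn raw audio into text.
    return "Who won the 2022 FIFA World Cup?"

def reason(text: str) -> str:
    # Stub for the Agno agent backed by Llama 3.1 8B (may call web tools).
    return f"Answering: {text}"

def synthesize(text: str) -> bytes:
    # Stub for Kokoro-82M ONNX: turn the reply into audio samples.
    return text.encode("utf-8")

def run_turn(audio: bytes) -> bytes:
    """One conversational turn through the cascade."""
    return synthesize(reason(transcribe(audio)))

if __name__ == "__main__":
    print(run_turn(b"\x00\x01"))
```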
| Component | Technology |
|---|---|
| Speech-to-Text | Whisper (large-v1) + Silero VAD |
| Language Model | Llama 3.1 8B via Ollama |
| Text-to-Speech | Kokoro-82M ONNX |
| Agent Framework | Agno LLM Agent |
- Python 3.9+
- Ollama running locally
Download and install Ollama from Ollama Mac download page.
Run the following command in your terminal:
curl -fsSL https://ollama.com/install.sh | sh
# Clone repository
git clone https://github.com/tarun7r/Vocal-Agent.git
# Install Python dependencies
pip3 install -r requirements.txt
pip3 install --no-deps kokoro-onnx==0.4.7
# Install system dependencies for linux
sudo apt-get install espeak-ng
# For Mac users use brew to install
brew install espeak-ng
To install precompiled binaries of eSpeak NG on Windows:
- Visit the eSpeak NG Releases page.
- Click on the latest release and download the appropriate .msi file, e.g., espeak-ng-20191129-b702b03-x64.msi
- Run the downloaded installer package to complete the installation.
- For advanced configuration and usage, refer to the eSpeak NG User Guide.
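After installing on any platform, you can confirm that the espeak-ng binary is discoverable on your PATH. A quick check (a hypothetical helper, not part of the project):

```python
# Sanity check: is the espeak-ng executable discoverable on PATH?
import shutil

def has_binary(name: str) -> bool:
    """Return True if an executable with this name is on PATH."""
    return shutil.which(name) is not None

if __name__ == "__main__":
    print("espeak-ng found" if has_binary("espeak-ng") else "espeak-ng missing")
```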
ollama pull llama3.1:8b (you can use any model that supports tool calling, depending on your requirements)
- Download kokoro-v1.0.onnx and voices-v1.0.bin from the kokoro-onnx releases
- Place them in the project directory (refer to the project structure)
Start Ollama service:
ollama serve
ollama run llama3.1:8b
In a separate terminal:
python3 main.py
Important: Ensure ollama serve is running before executing main.py
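Ollama serves its HTTP API on localhost:11434 by default, so you can probe that endpoint before launching the assistant. A small sketch of such a pre-flight check (`ollama_is_up` is a hypothetical helper, not part of the project):

```python
# Sketch: verify the Ollama server is reachable before starting main.py.
# Ollama listens on http://localhost:11434 by default.
import urllib.error
import urllib.request

def ollama_is_up(base_url: str = "http://localhost:11434",
                 timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

if __name__ == "__main__":
    print("Ollama is up" if ollama_is_up() else "Start it with: ollama serve")
```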
Listening... Press Ctrl+C to exit ⠋
speak now - Recording started ⠸
recording - Recording stopped
Transcribed: Who won the 2022 FIFA World Cup?
LLM Tool calls...
Response from the knowledge agent: The 2022 FIFA World Cup was won by Argentina, led by Lionel Messi. They defeated France in the final on December 18, 2022.
[Audio starts playing]
Key settings in main.py:
# Audio processing
SAMPLE_RATE = 16000
MAX_PHONEME_LENGTH = 500
# Voice synthesis
SPEED = 1.2 # Adjust speech rate
VOICE_PROFILE = "af_heart" # Choose from voices-v1.0.bin
# Agent settings
MAX_THREADS = 2 # Parallel processing threads
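A cap like MAX_PHONEME_LENGTH exists because long agent replies are synthesized in bounded pieces. The sketch below illustrates the idea with a hypothetical word-level splitter (the project's actual chunking may differ, e.g. it may count phonemes rather than characters):

```python
# Hypothetical sketch: split long agent replies into chunks no longer than
# MAX_PHONEME_LENGTH before handing them to the TTS stage.
MAX_PHONEME_LENGTH = 500

def chunk_text(text: str, limit: int = MAX_PHONEME_LENGTH) -> list[str]:
    """Greedily pack whole words into chunks of at most `limit` characters."""
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks

if __name__ == "__main__":
    long_reply = "word " * 300
    print([len(c) for c in chunk_text(long_reply)])  # → [499, 499, 499]
```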
.
├── main.py # Core application logic
├── agent_client.py # LLM agent integration
├── kokoro-v1.0.onnx # TTS model
├── voices-v1.0.bin # Voice profiles
├── requirements.txt # Python dependencies
└── README.md
The vocal_agent_mac.sh script automates the setup and execution of the Vocal-Agent application on macOS. It ensures all dependencies are installed, sets up the environment, and starts the required services.
Before running the script, ensure the following are installed on your system:
- Homebrew: Install Homebrew from https://brew.sh/
- espeak-ng: The script will install this using Homebrew if it's not already installed
- Ollama: Download and install Ollama from https://ollama.com/download/mac
- Kokoro Models: The script will download the ONNX model and voice bin using curl
- Clone this repository:
git clone https://github.com/tarun7r/Vocal-Agent.git
cd Vocal-Agent
MIT License - See LICENSE for details
- RealtimeSTT for STT + VAD integration
- Kokoro-ONNX for efficient TTS
- Agno for agent framework
- Ollama for local LLM serving
- Project inspiration from Weebo
- You can add more tools to the agent - see Agno Toolkits