A cutting-edge cascading voice assistant that combines speech recognition, AI reasoning, and neural text-to-speech. Built with real-time interaction and LLM tool calling in mind.
- 🎙️ Real-time speech recognition using Whisper + Silero VAD
- 🤖 Multimodal reasoning with Llama 3.1 8B through Agno agent
- 🌐 Web integration (Google Search, Wikipedia, Arxiv)
- 🗣️ Natural voice synthesis with Kokoro-82M ONNX
- ⚡ Low-latency audio processing pipeline
- 🔧 Extensible tool system for agent capabilities
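At a high level, the cascade wires three stages together: speech-to-text, agent reasoning, and text-to-speech. A minimal sketch of that flow, with stub functions standing in for Whisper, the Agno agent, and Kokoro (the real project wires in those libraries instead):

```python
# Minimal sketch of the cascaded pipeline: STT -> agent -> TTS.
# The three stage functions are stubs; they illustrate the data flow only.

def transcribe(audio: bytes) -> str:
    # Stub for Whisper + Silero VAD: turn raw audio into text.
    return "Who won the 2022 FIFA World Cup?"

def reason(text: str) -> str:
    # Stub for the Agno agent backed by Llama 3.1 8B (may call web tools).
    return f"Answering: {text}"

def synthesize(text: str) -> bytes:
    # Stub for Kokoro-82M ONNX: turn the reply into audio samples.
    return text.encode("utf-8")

def run_turn(audio: bytes) -> bytes:
    """One conversational turn through the cascade."""
    return synthesize(reason(transcribe(audio)))

if __name__ == "__main__":
    print(run_turn(b"\x00\x01"))
```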
| Component | Technology |
|---|---|
| Speech-to-Text | Whisper (large-v1) + Silero VAD |
| Language Model | Llama 3.1 8B via Ollama |
| Text-to-Speech | Kokoro-82M ONNX |
| Agent Framework | Agno LLM Agent |
- Python 3.9+
- Ollama running locally
Download and install Ollama from Ollama Mac download page.
Run the following command in your terminal:
curl -fsSL https://ollama.com/install.sh | sh
# Clone repository
git clone https://github.com/tarun7r/Vocal-Agent.git
# Install Python dependencies
pip3 install -r requirements.txt
pip3 install --no-deps kokoro-onnx==0.4.7
# Install system dependencies for linux
sudo apt-get install espeak-ng
# For Mac users use brew to install
brew install espeak-ng
To install precompiled binaries of eSpeak NG on Windows:
- Visit the eSpeak NG Releases page.
- Click on the latest release and download the appropriate .msi file, e.g., espeak-ng-20191129-b702b03-x64.msi
- Run the downloaded installer package to complete the installation.
- For advanced configuration and usage, refer to the eSpeak NG User Guide.
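After installing on any platform, you can confirm that the espeak-ng binary is discoverable on your PATH. A quick check (a hypothetical helper, not part of the project):

```python
# Sanity check: is the espeak-ng executable discoverable on PATH?
import shutil

def has_binary(name: str) -> bool:
    """Return True if an executable with this name is on PATH."""
    return shutil.which(name) is not None

if __name__ == "__main__":
    print("espeak-ng found" if has_binary("espeak-ng") else "espeak-ng missing")
```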
ollama pull llama3.1:8b (you can use any model that supports tool calling, depending on your requirements)
- Download kokoro-v1.0.onnx and voices-v1.0.bin from the kokoro-onnx releases
- Place them in the project directory (refer to the project structure)
Start Ollama service:
ollama serve
ollama run llama3.1:8b
In a separate terminal:
python3 main.py
Important: Ensure ollama serve is running before executing main.py
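Ollama serves its HTTP API on localhost:11434 by default, so you can probe that endpoint before launching the assistant. A small sketch of such a pre-flight check (`ollama_is_up` is a hypothetical helper, not part of the project):

```python
# Sketch: verify the Ollama server is reachable before starting main.py.
# Ollama listens on http://localhost:11434 by default.
import urllib.error
import urllib.request

def ollama_is_up(base_url: str = "http://localhost:11434",
                 timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

if __name__ == "__main__":
    print("Ollama is up" if ollama_is_up() else "Start it with: ollama serve")
```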
Listening... Press Ctrl+C to exit ⠋
speak now - Recording started ⠸
recording - Recording stopped
Transcribed: Who won the 2022 FIFA World Cup?
LLM Tool calls...
Response from the knowledge agent: The 2022 FIFA World Cup was won by Argentina, led by Lionel Messi. They defeated France in the final on December 18, 2022.
[Audio starts playing]
Key settings in main.py:
# Audio processing
SAMPLE_RATE = 16000
MAX_PHONEME_LENGTH = 500
# Voice synthesis
SPEED = 1.2 # Adjust speech rate
VOICE_PROFILE = "af_heart" # Choose from voices-v1.0.bin
# Agent settings
MAX_THREADS = 2 # Parallel processing threads
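A cap like MAX_PHONEME_LENGTH exists because long agent replies are synthesized in bounded pieces. The sketch below illustrates the idea with a hypothetical word-level splitter (the project's actual chunking may differ, e.g. it may count phonemes rather than characters):

```python
# Hypothetical sketch: split long agent replies into chunks no longer than
# MAX_PHONEME_LENGTH before handing them to the TTS stage.
MAX_PHONEME_LENGTH = 500

def chunk_text(text: str, limit: int = MAX_PHONEME_LENGTH) -> list[str]:
    """Greedily pack whole words into chunks of at most `limit` characters."""
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks

if __name__ == "__main__":
    long_reply = "word " * 300
    print([len(c) for c in chunk_text(long_reply)])  # → [499, 499, 499]
```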
.
├── main.py # Core application logic
├── agent_client.py # LLM agent integration
├── kokoro-v1.0.onnx # TTS model
├── voices-v1.0.bin # Voice profiles
├── requirements.txt # Python dependencies
└── README.md
The vocal_agent_mac.sh script automates the setup and execution of the Vocal-Agent application on macOS. It ensures all dependencies are installed, sets up the environment, and starts the required services.
Before running the script, ensure the following are installed on your system:
- Homebrew: Install Homebrew from https://brew.sh/
- espeak-ng: The script will install this using Homebrew if it's not already installed
- Ollama: Download and install Ollama from https://ollama.com/download/mac
- Kokoro Models: The script will download the ONNX model and voice bin using curl
- Clone this repository:
git clone https://github.com/tarun7r/Vocal-Agent.git
cd Vocal-Agent
MIT License - See LICENSE for details
- RealtimeSTT for STT + VAD integration
- Kokoro-ONNX for efficient TTS
- Agno for agent framework
- Ollama for local LLM serving
- Project inspiration from Weebo
- You can add more tools to the agent - see Agno Toolkits