
tarun7r/Vocal-Agent


Real-Time Cascading Speech-to-Speech Chatbot 🤖

A cascading voice assistant that combines speech recognition, AI reasoning, and neural text-to-speech. Built for real-time interaction, with support for LLM tool calls.

Features ✨

  • 🎙️ Real-time speech recognition using Whisper + Silero VAD
  • 🤖 Reasoning and tool use with Llama 3.1 8B through an Agno agent
  • 🌐 Web integration (Google Search, Wikipedia, Arxiv)
  • 🗣️ Natural voice synthesis with Kokoro-82M ONNX
  • ⚡ Low-latency audio processing pipeline
  • 🔧 Extensible tool system for agent capabilities

Tech Stack 🛠️

Component       | Technology
Speech-to-Text  | Whisper (large-v1) + Silero VAD
Language Model  | Llama 3.1 8B via Ollama
Text-to-Speech  | Kokoro-82M ONNX
Agent Framework | Agno LLM Agent

Installation 📦

Prerequisites

  • Python 3.9+
  • Ollama running locally

Install Ollama

On Mac:

Download and install Ollama from the Ollama Mac download page.

On Linux:

Run the following command in your terminal:

curl -fsSL https://ollama.com/install.sh | sh

On Windows:

Download and install Ollama from the Ollama Windows download page.

# Clone the repository and enter it
git clone https://github.com/tarun7r/Vocal-Agent.git
cd Vocal-Agent

# Install Python dependencies
pip3 install -r requirements.txt

# Install kokoro-onnx separately, without pulling in its dependencies
pip3 install --no-deps kokoro-onnx==0.4.7

# Install system dependencies on Linux
sudo apt-get install espeak-ng

# On macOS, install with Homebrew instead
brew install espeak-ng

Install system dependencies for Windows

To install precompiled binaries of eSpeak NG on Windows:

  1. Visit the eSpeak NG Releases page.
  2. Click on the Latest release and download the appropriate .msi file, e.g., espeak-ng-20191129-b702b03-x64.msi.
  3. Execute the downloaded installer package to complete the installation.
  4. For advanced configuration and usage, refer to the eSpeak NG User Guide.

Models Setup 🧠

Llama 3.1 8B:

ollama pull llama3.1:8b

Note: You can use any other model that supports tool calling, depending on your requirements.

Kokoro Models:

  • Download kokoro-v1.0.onnx and voices-v1.0.bin from kokoro-onnx releases.
  • Place them in the project directory (see Project Structure).
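
A quick pre-flight check can catch a missing model file before main.py starts. The sketch below is not part of the repository: `missing_model_files` is a hypothetical helper, and it assumes both files sit directly in the project root, as laid out under Project Structure.

```python
from pathlib import Path

# File names taken from the Kokoro Models step above.
REQUIRED_MODEL_FILES = ("kokoro-v1.0.onnx", "voices-v1.0.bin")


def missing_model_files(project_dir="."):
    """Return the required model files that are absent from project_dir."""
    root = Path(project_dir)
    return [name for name in REQUIRED_MODEL_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All Kokoro model files found.")
```

Run it from the project root; an empty result means both files are in place.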

Usage 🚀

Start Ollama service:

ollama serve

ollama run llama3.1:8b

In a separate terminal:

python3 main.py

Important: Ensure ollama serve is running before executing main.py
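
If you want to verify this from Python before launching, a minimal sketch follows. It assumes Ollama is listening on its default local address, http://localhost:11434; `is_ollama_running` is a hypothetical helper and is not part of main.py.

```python
import urllib.request

# Ollama's default local address; change it if your install listens elsewhere.
OLLAMA_URL = "http://localhost:11434"


def is_ollama_running(base_url=OLLAMA_URL, timeout=2.0):
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error statuses
        # (urllib.error.URLError and HTTPError are OSError subclasses).
        return False


if __name__ == "__main__":
    if is_ollama_running():
        print("Ollama is reachable; you can start main.py.")
    else:
        print("Ollama is not reachable; run `ollama serve` first.")
```

The check only confirms that something answers at that address. It does not confirm that the llama3.1:8b model has been pulled.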

Flow after running main.py:

Listening... Press Ctrl+C to exit ⠋
speak now - Recording started ⠸
recording - Recording stopped

Transcribed: Who won the 2022 FIFA World Cup?
LLM Tool calls...

Response from the knowledge agent: The 2022 FIFA World Cup was won by Argentina, led by Lionel Messi. They defeated France in the final on December 18, 2022.

[Audio starts playing]
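
The flow above is a cascade: each stage finishes and hands its output to the next. The sketch below illustrates only that ordering. It is not the code in main.py: `run_turn` and the three stage callables are hypothetical names, and the stand-in lambdas replace Whisper, the Agno agent, and Kokoro so the example runs without any models.

```python
from typing import Callable


def run_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    ask_agent: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Run one cascaded turn: speech -> text -> agent reply -> speech."""
    text = transcribe(audio)
    print(f"Transcribed: {text}")
    reply = ask_agent(text)
    print(f"Response from the knowledge agent: {reply}")
    return synthesize(reply)


if __name__ == "__main__":
    # Stand-in stages so the sketch runs without any models installed.
    spoken = run_turn(
        b"\x00\x01",
        transcribe=lambda audio: "Who won the 2022 FIFA World Cup?",
        ask_agent=lambda question: "Argentina won the 2022 FIFA World Cup.",
        synthesize=lambda reply: reply.encode("utf-8"),
    )
    print(f"{len(spoken)} bytes of (fake) audio ready to play")
```

Passing the stages in as callables keeps each one swappable, which matches the idea that any tool-calling model or voice can be substituted.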

Chatbot Demo

Configuration ⚙️

Key settings in main.py:

# Audio processing
SAMPLE_RATE = 16000
MAX_PHONEME_LENGTH = 500

# Voice synthesis
SPEED = 1.2  # Adjust speech rate
VOICE_PROFILE = "af_heart"  # Choose from voices-v1.0.bin

# Agent settings
MAX_THREADS = 2  # Parallel processing threads
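
How main.py applies MAX_PHONEME_LENGTH is not shown here. Assuming it caps how much text goes into one synthesis call, a long agent response would need to be split first. The sketch below shows one way to do that, using character count as a rough stand-in for phoneme count; `split_for_tts` is a hypothetical helper, not a function from the repository.

```python
import re

MAX_PHONEME_LENGTH = 500  # same value as in the settings above


def split_for_tts(text, max_len=MAX_PHONEME_LENGTH):
    """Split text into chunks of at most max_len characters.

    Prefers sentence boundaries; a single sentence longer than max_len
    is cut into fixed-size pieces.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # Hard-cut sentences that cannot fit in one chunk on their own.
        while len(sentence) > max_len:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_len])
            sentence = sentence[max_len:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_len:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be synthesized in turn with the configured VOICE_PROFILE and SPEED, so playback of the first chunk can begin while later ones are still being generated.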

Project Structure 📂

.
├── main.py               # Core application logic
├── agent_client.py       # LLM agent integration
├── kokoro-v1.0.onnx      # TTS model
├── voices-v1.0.bin       # Voice profiles
├── requirements.txt      # Python dependencies
└── README.md

Vocal-Agent Setup Script for macOS

The vocal_agent_mac.sh script automates the setup and execution of the Vocal-Agent application on macOS. It ensures all dependencies are installed, sets up the environment, and starts the required services.

Prerequisites

Before running the script, ensure the following are installed on your system:

  1. Homebrew: Install Homebrew from https://brew.sh/
  2. espeak-ng: The script will install this using Homebrew if it's not already installed
  3. Ollama: Download and install Ollama from https://ollama.com/download/mac
  4. Kokoro models: The script downloads the ONNX model and the voices file using curl

How to Use the Script

  1. Clone this repository:
    git clone https://github.com/tarun7r/Vocal-Agent.git
    cd Vocal-Agent
    

License 📄

MIT License - See LICENSE for details

Acknowledgements

