Unity package for using LiveTalk on-device models for real-time talking head generation and character animation.
LiveTalk is a unified, high-performance talking head generation system that combines the power of LivePortrait and MuseTalk open-source repositories. The PyTorch models from these projects have been ported to ONNX format and optimized for CoreML to enable efficient on-device inference in Unity.
LivePortrait provides facial animation and expression transfer capabilities, while MuseTalk handles real-time lip synchronization with audio. Together, they create a complete pipeline for generating natural-looking talking head videos from avatar images and audio input. Spark-TTS-Unity is a dependency package used for TTS generation.
- 🎮 Unity-Native Integration: Complete API designed specifically for Unity with singleton pattern
- 🎭 Dual-Pipeline Processing: LivePortrait for facial animation + MuseTalk for lip sync
- 👤 Advanced Character System: Create, save, and load characters with multiple expressions and voices
- 💻 Runs Offline: All processing happens on-device with ONNX Runtime
- ⚡ Real-time Performance: Optimized for real-time inference with frame streaming
- 🎨 Multiple Expression Support: 7 built-in expressions (talk-neutral, approve, disapprove, smile, sad, surprised, confused)
- 🔊 Integrated TTS: Built-in SparkTTS integration for voice generation
- 📦 Cross-Platform Character Format: Supports both folder and macOS bundle formats
- 🎥 Flexible Input: Supports images, videos, and directory-based driving frames
- AI-driven NPCs in games
- Virtual assistants and chatbots
- Real-time character animation
- Interactive storytelling applications
- Video content generation
- Accessibility features
- Virtual avatars and digital humans
- Open your Unity project
- Open the Package Manager (Window > Package Manager)
- Click the "+" button in the top-left corner
- Select "Add package from git URL..."
- Enter the repository URL:
https://github.com/arghyasur1991/LiveTalk-Unity.git
- Click "Add"
- Clone this repository
- Copy the contents into your Unity project's Packages folder
This package requires the following Unity packages:
- com.genesis.sparktts.unity
Some dependencies require additional scoped registry configuration. Add the following to your project's Packages/manifest.json file:
{
  "scopedRegistries": [
    {
      "name": "NPM",
      "url": "https://registry.npmjs.com",
      "scopes": [
        "com.github.asus4"
      ]
    }
  ],
  "dependencies": {
    "com.genesis.LiveTalk.unity": "https://github.com/arghyasur1991/LiveTalk-Unity.git",
    // ... other dependencies
  }
}
Note: The git URL https://github.com/arghyasur1991/LiveTalk-Unity.git
will automatically fetch the latest version of the package.
LiveTalk requires ONNX models from both LivePortrait and MuseTalk in the following location:
Assets/StreamingAssets/LiveTalk/
└── models/
    ├── LivePortrait/
    │   └── *.onnx
    └── MuseTalk/
        └── *.onnx
SparkTTS models are required for voice generation and should be placed in:
Assets/StreamingAssets/SparkTTS/
├── *.onnx
└── LLM/
    ├── model.onnx
    ├── model.onnx_data
    └── ...
LiveTalk includes a built-in Editor tool that automatically analyzes your codebase and copies only the required models from Assets/Models to StreamingAssets with the correct precision settings (FP16, FP32, etc.).
Access the tool: Window > LiveTalk > Model Deployment Tool
- Precision-Aware: Copies only the required precision variants (FP16/FP32) based on code analysis
- Size Optimization: Reduces build size by excluding unused models
- Folder Structure Preservation: Maintains the correct directory structure in StreamingAssets
- Backup Support: Creates backups of existing models before overwriting
- Dry Run Mode: Preview changes without actually copying files
- Open the tool: Go to Window > LiveTalk > Model Deployment Tool
- Configure paths:
  - Source: Assets/Models (automatically detected)
  - Destination: Assets/StreamingAssets/LiveTalk (automatically configured)
- Select components: Choose which model categories to deploy:
- ✅ SparkTTS Models (deployed via SparkTTS-Unity package)
- ✅ LivePortrait Models (deployed directly)
- ✅ MuseTalk Models (deployed directly)
- Review selection: The tool shows you exactly which LiveTalk models will be copied and their file sizes
- Deploy: Click "Deploy All Models" to copy both LiveTalk and SparkTTS models using their respective deployment systems
The tool selects the precision used for each model based on the LiveTalk codebase:
| Model Category | Precision | Execution Provider | Notes |
|---|---|---|---|
| LivePortrait | | | |
| warping_spade | FP16 | CoreML | GPU-accelerated rendering |
| Other LivePortrait | FP32 | CoreML | Full precision for facial features |
| MuseTalk | | | |
| unet, vae_encoder, vae_decoder | FP16 | CoreML | GPU-accelerated inference |
| whisper_encoder, positional_encoding | FP32 | CPU | Audio processing precision |
| SparkTTS | | | |
| Models deployed via SparkTTS-Unity package | See SparkTTS documentation | Various | Handled by SparkTTS deployment tool |
- Overwrite Existing: Replace existing models in StreamingAssets
- Create Backup: Keep .backup copies of replaced files (includes .onnx.data files)
- Dry Run: Preview operations without copying files
The tool automatically handles large models that use separate data files:
- MuseTalk UNet: unet.onnx (710KB) + unet.onnx.data (3.2GB) - uses dot notation
- SparkTTS LLM: handled by the SparkTTS-Unity deployment tool with model.onnx_data files
LiveTalk model and data files are copied together and included in size calculations and backup operations. SparkTTS models are handled by the SparkTTS-Unity package's own deployment system.
This tool ensures your Unity project includes only the models you actually need, significantly reducing build size while maintaining optimal performance.
SparkTTS models can also be deployed independently using the SparkTTS-Unity package's standalone tool:
Access: Window > SparkTTS > Model Deployment Tool
This allows you to:
- Deploy only SparkTTS models without LiveTalk models
- Use SparkTTS in projects that don't include LiveTalk
- Have fine-grained control over SparkTTS model deployment
Download the pre-exported ONNX models from Google Drive.
- Download the ZIP file from the link
- Extract the contents
- Copy the extracted LiveTalk folder with models to your Unity project's Assets/Models/ directory
- Use the Model Deployment Tool (recommended): go to Window > LiveTalk > Model Deployment Tool to automatically copy only the required models with optimal precision settings
Check the Model Setup section of Spark-TTS-Unity
Coming Soon - conversion scripts to export models from the original Python repositories:
- LivePortrait: https://github.com/KwaiVGI/LivePortrait
- MuseTalk: https://github.com/TMElyralab/MuseTalk
The export scripts will convert PyTorch models to ONNX format and apply CoreML optimizations for Unity integration.
using UnityEngine;
using LiveTalk.API;
using System.Collections;
public class LiveTalkExample : MonoBehaviour
{
void Start()
{
// Initialize the LiveTalk system
LiveTalkAPI.Instance.Initialize(
logLevel: LogLevel.INFO,
initializeModelsOnDemand: true, // Load models when needed (default: true)
characterSaveLocation: "", // Uses default location
parentModelPath: "" // Uses StreamingAssets
);
}
}
using UnityEngine;
using LiveTalk.API;
using System.Collections;
public class CharacterCreation : MonoBehaviour
{
[SerializeField] private Texture2D characterImage;
IEnumerator Start()
{
// Initialize API
LiveTalkAPI.Instance.Initialize();
// Create a new character
yield return LiveTalkAPI.Instance.CreateCharacterAsync(
name: "MyCharacter",
gender: Gender.Female,
image: characterImage,
pitch: Pitch.Moderate,
speed: Speed.Moderate,
intro: "Hello, I am your virtual assistant!",
onComplete: (character) => {
Debug.Log($"Character created: {character.Name}");
},
onError: (error) => {
Debug.LogError($"Character creation failed: {error.Message}");
}
);
}
}
using UnityEngine;
using LiveTalk.API;
using System.Collections;
public class CharacterSpeech : MonoBehaviour
{
private Character loadedCharacter;
IEnumerator Start()
{
// Initialize API
LiveTalkAPI.Instance.Initialize();
// Load an existing character
string characterId = "your-character-id";
yield return LiveTalkAPI.Instance.LoadCharacterAsync(
characterId,
onComplete: (character) => {
loadedCharacter = character;
Debug.Log($"Character loaded: {character.Name}");
// Make the character speak
StartCoroutine(MakeCharacterSpeak());
},
onError: (error) => {
Debug.LogError($"Character loading failed: {error.Message}");
}
);
}
IEnumerator MakeCharacterSpeak()
{
if (loadedCharacter == null) yield break;
yield return loadedCharacter.SpeakAsync(
text: "Hello! I can speak with realistic lip sync!",
expressionIndex: 0, // Use talk-neutral expression
onComplete: (frameStream, audioClip) => {
// Process the generated frames and audio
StartCoroutine(PlayGeneratedVideo(frameStream, audioClip));
},
onError: (error) => {
Debug.LogError($"Speech generation failed: {error.Message}");
}
);
}
IEnumerator PlayGeneratedVideo(FrameStream frameStream, AudioClip audioClip)
{
// Play the audio
GetComponent<AudioSource>().clip = audioClip;
GetComponent<AudioSource>().Play();
// Process video frames
while (frameStream.HasMoreFrames)
{
var frameAwaiter = frameStream.WaitForNext();
yield return frameAwaiter;
if (frameAwaiter.Texture != null)
{
// Display the frame (e.g., on a RawImage component)
GetComponent<UnityEngine.UI.RawImage>().texture = frameAwaiter.Texture;
}
}
}
}
using UnityEngine;
using LiveTalk.API;
using System.Collections;
using UnityEngine.Video;
public class FacialAnimation : MonoBehaviour
{
[SerializeField] private Texture2D sourceImage;
[SerializeField] private VideoPlayer drivingVideo;
IEnumerator Start()
{
// Initialize API
LiveTalkAPI.Instance.Initialize();
// Generate animated textures using LivePortrait
var animationStream = LiveTalkAPI.Instance.GenerateAnimatedTexturesAsync(
sourceImage,
drivingVideo,
maxFrames: -1 // Process all frames
);
// Process the animated frames
while (animationStream.HasMoreFrames)
{
var frameAwaiter = animationStream.WaitForNext();
yield return frameAwaiter;
if (frameAwaiter.Texture != null)
{
// Display the animated frame
GetComponent<UnityEngine.UI.RawImage>().texture = frameAwaiter.Texture;
}
}
}
}
Characters support 7 built-in expressions, each with its own index:
- 0: talk-neutral (default speaking)
- 1: approve (nodding, positive)
- 2: disapprove (negative reaction)
- 3: smile (happy expression)
- 4: sad (sorrowful expression)
- 5: surprised (shocked reaction)
- 6: confused (puzzled expression)
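The index maps directly to the `expressionIndex` parameter of `Character.SpeakAsync`. A minimal sketch (the constant names below are illustrative, not part of the API):

```csharp
using System.Collections;
using LiveTalk.API;
using UnityEngine;

public class ExpressionExample : MonoBehaviour
{
    // Illustrative constants mirroring the built-in expression indices
    private const int TalkNeutral = 0;
    private const int Smile = 3;

    private Character character; // assumed to be loaded elsewhere

    public IEnumerator SpeakWithSmile()
    {
        if (character == null) yield break;

        // Same call as SpeakAsync in the earlier example, but with the "smile" expression
        yield return character.SpeakAsync(
            text: "Great to see you!",
            expressionIndex: Smile,
            onComplete: (frameStream, audioClip) => Debug.Log("Frames ready"),
            onError: (error) => Debug.LogError(error.Message)
        );
    }
}
```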
Characters support two storage formats:
- Bundle format (macOS):
  - Character data stored in a .bundle directory
  - Appears as a single file in macOS Finder
  - Contains Info.plist for proper macOS package metadata
  - Automatically used on macOS platforms
- Folder format:
  - Character data stored in a regular directory
  - Works on all platforms (Windows, macOS, Linux)
  - Used on non-macOS platforms or when explicitly requested
Each character contains:
- character.json: Character configuration (name, gender, pitch, speed, intro)
- image.png: Character portrait image
- drivingFrames/: Expression data for each expression index
  - expression-N/: Folder for expression N
    - XXXXX.png: Generated driving frames
    - latents.bin: Precomputed latent representations
    - faces.json: Face detection and processing data
- textures/: Precomputed texture data
  - expression-N/: Folder for expression N
- voice/: Voice model and configuration
  - sample.wav: Reference voice sample
  - voice_config.json: Voice generation parameters
LiveTalkAPI.Instance.Initialize(
LogLevel logLevel = LogLevel.INFO,
bool initializeModelsOnDemand = true,
string characterSaveLocation = "",
string parentModelPath = ""
)
// Create character
IEnumerator CreateCharacterAsync(string name, Gender gender, Texture2D image,
Pitch pitch, Speed speed, string intro, Action<Character> onComplete, Action<Exception> onError)
// Load character
IEnumerator LoadCharacterAsync(string characterId, Action<Character> onComplete, Action<Exception> onError)
// Get available characters
string[] GetAvailableCharacterIds()
string GetCharacterPath(string characterId)
string GetCharacterFormat(string characterId)
bool IsCharacterBundle(string characterId)
bool IsCharacterFolder(string characterId)
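For example, the discovery methods above can be combined to enumerate saved characters and load one of them; a minimal sketch:

```csharp
using System.Collections;
using LiveTalk.API;
using UnityEngine;

public class CharacterBrowser : MonoBehaviour
{
    IEnumerator Start()
    {
        LiveTalkAPI.Instance.Initialize();

        // List every character saved at the configured save location
        string[] ids = LiveTalkAPI.Instance.GetAvailableCharacterIds();
        foreach (var id in ids)
        {
            string format = LiveTalkAPI.Instance.GetCharacterFormat(id);
            Debug.Log($"{id} ({format}) at {LiveTalkAPI.Instance.GetCharacterPath(id)}");
        }

        // Load the first character, if any exist
        if (ids.Length > 0)
        {
            yield return LiveTalkAPI.Instance.LoadCharacterAsync(
                ids[0],
                onComplete: (character) => Debug.Log($"Loaded {character.Name}"),
                onError: (error) => Debug.LogError(error.Message)
            );
        }
    }
}
```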
// LivePortrait animation
FrameStream GenerateAnimatedTexturesAsync(Texture2D sourceImage, List<Texture2D> drivingFrames)
FrameStream GenerateAnimatedTexturesAsync(Texture2D sourceImage, VideoPlayer videoPlayer, int maxFrames = -1)
FrameStream GenerateAnimatedTexturesAsync(Texture2D sourceImage, string drivingFramesPath, int maxFrames = -1)
// MuseTalk lip sync
FrameStream GenerateTalkingHeadAsync(Texture2D avatarTexture, string talkingHeadFolderPath, AudioClip audioClip)
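A minimal lip-sync sketch using `GenerateTalkingHeadAsync` with a pre-recorded `AudioClip` (the folder path below is illustrative):

```csharp
using System.Collections;
using LiveTalk.API;
using UnityEngine;

public class LipSyncExample : MonoBehaviour
{
    [SerializeField] private Texture2D avatarTexture;
    [SerializeField] private AudioClip speechClip;

    IEnumerator Start()
    {
        LiveTalkAPI.Instance.Initialize();

        // Folder containing precomputed talking-head data; path is illustrative
        string talkingHeadFolder = "Assets/StreamingAssets/TalkingHead";

        var stream = LiveTalkAPI.Instance.GenerateTalkingHeadAsync(
            avatarTexture, talkingHeadFolder, speechClip);

        // Consume frames as they are generated
        while (stream.HasMoreFrames)
        {
            var awaiter = stream.WaitForNext();
            yield return awaiter;
            if (awaiter.Texture != null)
                GetComponent<UnityEngine.UI.RawImage>().texture = awaiter.Texture;
        }
    }
}
```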
string Name { get; }
Gender Gender { get; }
Texture2D Image { get; }
Pitch Pitch { get; }
Speed Speed { get; }
string Intro { get; }
bool IsDataLoaded { get; }
// Create character avatar data
IEnumerator CreateAvatarAsync()
IEnumerator CreateAvatarAsync(bool useBundle)
// Make character speak
IEnumerator SpeakAsync(string text, int expressionIndex = 0,
Action<FrameStream, AudioClip> onComplete = null, Action<Exception> onError = null)
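`CreateAvatarAsync` is likewise a coroutine; a minimal sketch that regenerates a character's avatar data, assuming `useBundle: false` selects the cross-platform folder format:

```csharp
using System.Collections;
using LiveTalk.API;
using UnityEngine;

public class AvatarRebuild : MonoBehaviour
{
    private Character character; // assumed to be loaded or created elsewhere

    public IEnumerator RebuildAvatar()
    {
        if (character == null) yield break;

        // Assumption: false requests the folder format instead of a macOS .bundle
        yield return character.CreateAvatarAsync(useBundle: false);
        Debug.Log($"Avatar data rebuilt for {character.Name}");
    }
}
```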
int TotalExpectedFrames { get; set; }
bool HasMoreFrames { get; }
FrameAwaiter WaitForNext() // For use in coroutines
bool TryGetNext(out Texture2D texture) // Non-blocking retrieval
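`TryGetNext` enables polling from `Update` instead of waiting in a coroutine; a minimal sketch:

```csharp
using LiveTalk.API;
using UnityEngine;

public class FramePoller : MonoBehaviour
{
    private FrameStream stream; // assigned when generation starts elsewhere

    void Update()
    {
        if (stream == null) return;

        // Non-blocking: only swaps the texture when a new frame is ready
        if (stream.TryGetNext(out Texture2D texture))
        {
            GetComponent<UnityEngine.UI.RawImage>().texture = texture;
        }
    }
}
```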
- VERBOSE: Detailed debugging information
- INFO: General information messages
- WARNING: Warning messages only
- ERROR: Error messages only
- initializeModelsOnDemand: When true (default), models are loaded only when needed for inference, reducing startup time and memory usage. When false, all models are loaded immediately during initialization for faster first-time inference.
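For example, a bootstrap script could pass `false` to trade a longer startup for a stall-free first inference; a minimal sketch:

```csharp
using LiveTalk.API;
using UnityEngine;

public class PreloadingBootstrap : MonoBehaviour
{
    void Awake()
    {
        // Load all models up front so the first SpeakAsync call has no model-loading delay
        LiveTalkAPI.Instance.Initialize(
            logLevel: LogLevel.INFO,
            initializeModelsOnDemand: false
        );
    }
}
```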
- Gender: Male, Female
- Pitch: VeryLow, Low, Moderate, High, VeryHigh
- Speed: VeryLow, Low, Moderate, High, VeryHigh
- Unity 6000.0.46f1 or later
- Platforms: macOS (CPU/CoreML), Windows (Not tested)
- Minimum 32GB RAM recommended for character creation
- Storage space for models (~10GB total: ~7GB LiveTalk + ~3GB SparkTTS)
MacBook Pro M4 Max (ONNX Runtime with CoreML Execution Provider):
- Speech with lip sync generation: 10-11 FPS
- Character creation: 10 minutes per character

LivePortrait Pipeline - 4 FPS:
- motion_extractor (FP32): 30-60ms
- warping_spade (FP16): 180-250ms
- landmark_runner (FP32): 2-3ms

MuseTalk Pipeline - 11-12 FPS:
- vae_encoder (FP16): 20-30ms
- unet (FP16): 30-40ms
- vae_decoder (FP16): 30-50ms
This project is licensed under the MIT License, following the licensing of the underlying technologies:
- LivePortrait: Licensed under the MIT License
- MuseTalk: Licensed under the MIT License
- SparkTTS: Licensed under the Apache License 2.0
- Other dependencies: Licensed under their respective open-source licenses
See the LICENSE file for details.
This project incorporates code and models from several open-source projects:
- LivePortrait - Portrait animation technology
- MuseTalk - Real-time lip synchronization
- SparkTTS - Text-to-speech synthesis
- ONNX Runtime - Cross-platform ML inference
Contributions are welcome! Please read our contributing guidelines and submit pull requests for any improvements.
- LivePortrait Team at KwaiVGI for portrait animation technology
- MuseTalk Team at TMElyralab for lip synchronization technology
- SparkTTS Team for text-to-speech synthesis
- ONNX Runtime team for cross-platform ML inference
See CHANGELOG.md for a detailed history of changes.