
Speech-to-text, text-to-speech, and speaker recognition using next-gen Kaldi with onnxruntime without an Internet connection. Supports embedded systems, Android, iOS, Raspberry Pi, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift


yangb05/sherpa-onnx

 
 

Introduction

This repo is specifically designed for deploying multilingual ASR models, while remaining compatible with standard monolingual ASR models.

Installation (with CUDA)

If you have installed sherpa-onnx before, please uninstall it first.

git clone git@github.com:yangb05/sherpa-onnx.git
cd sherpa-onnx
mkdir build
cd build

cmake \
  -DSHERPA_ONNX_ENABLE_PYTHON=ON \
  -DBUILD_SHARED_LIBS=ON \
  -DSHERPA_ONNX_ENABLE_CHECK=OFF \
  -DSHERPA_ONNX_ENABLE_PORTAUDIO=OFF \
  -DSHERPA_ONNX_ENABLE_C_API=OFF \
  -DSHERPA_ONNX_ENABLE_WEBSOCKET=OFF \
  -DSHERPA_ONNX_ENABLE_GPU=ON \
  ..

make -j
export PYTHONPATH=$PWD/../sherpa-onnx/python/:$PWD/lib:$PYTHONPATH

To check that sherpa-onnx has been successfully installed, please use:

python3 -c "import sherpa_onnx; print(sherpa_onnx.__file__)"

It should print output similar to the following:

/Users/fangjun/py38/lib/python3.8/site-packages/sherpa_onnx/__init__.py
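If the import fails, the module is most likely not on PYTHONPATH. As a quick sketch (the helper name is my own, not part of sherpa-onnx), you can locate any module programmatically with the standard library:

```python
import importlib.util


def module_location(name: str):
    """Return the file path of an importable module, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None


# After the build, module_location("sherpa_onnx") should point into the build
# tree or site-packages; None means PYTHONPATH is not set correctly.
```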

If you want to install the CPU version of sherpa-onnx, please refer to this tutorial.

Deployment (with CUDA)

To deploy a model, you first need to export it to the ONNX format. Refer to __ for the export method. You also need the model's corresponding tokens.txt file, which is generated while training the BPE model. If you want to start a secure WebSocket server, run sherpa-onnx/python-api-examples/web/generate-certificate.py to generate the certificate cert.pem.

After deploying the model, you should verify that the deployment succeeded, so it is recommended to prepare an audio file in the corresponding language for testing.

After preparing all the required files, you can start a transducer-based streaming ASR service like this:

export CUDA_VISIBLE_DEVICES=6
python sherpa-onnx/python-api-examples/streaming_server.py \
  --encoder encoder-epoch-25-avg-15-chunk-16-left-128.onnx \
  --decoder decoder-epoch-25-avg-15-chunk-16-left-128.onnx \
  --joiner joiner-epoch-25-avg-15-chunk-16-left-128.onnx \
  --tokens tokens.txt \
  --doc-root sherpa-onnx/python-api-examples/web \
  --port 50351 \
  --provider cuda \
  --certificate sherpa-onnx/python-api-examples/web/cert.pem
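Before pointing a client at the service, you can check that the chosen port is accepting TCP connections. A minimal sketch using only the Python standard library (the host and port are the values from the example above; the helper itself is not part of sherpa-onnx):

```python
import socket


def server_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# e.g. server_listening("localhost", 50351) once streaming_server.py is up
```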

The started ASR service can be tested like this:

python sherpa-onnx/python-api-examples/online-websocket-client-decode-file-cert.py \
  --server-addr localhost \
  --server-port 50351 \
  --langtag '<VI>' \
  test_audio.wav
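Streaming ASR models typically expect 16 kHz, 16-bit mono PCM input, so it is worth checking the test file's format before sending it. A sketch using the standard-library wave module (test_audio.wav is just the placeholder file name from the command above):

```python
import wave


def wav_info(path: str):
    """Return (sample_rate_hz, channels, sample_width_bytes, num_frames)
    of a PCM WAV file."""
    with wave.open(path, "rb") as f:
        return f.getframerate(), f.getnchannels(), f.getsampwidth(), f.getnframes()


# e.g. wav_info("test_audio.wav") -> expect (16000, 1, 2, ...) for 16 kHz
# 16-bit mono input
```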

<VI> is the langtag for Vietnamese. The langtags for the other supported languages are:

  • <ZH>, Chinese
  • <EN>, English
  • <VI>, Vietnamese
  • <RU>, Russian
  • <JA>, Japanese
  • <AR>, Arabic
  • <TH>, Thai
  • <ID>, Indonesian

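To catch a mistyped tag before opening a connection, a client can validate it locally. A small sketch mirroring the list above (the mapping and helper are hypothetical conveniences, not part of the sherpa-onnx API):

```python
# Language tags accepted by the multilingual model, as listed above.
LANGTAGS = {
    "<ZH>": "Chinese",
    "<EN>": "English",
    "<VI>": "Vietnamese",
    "<RU>": "Russian",
    "<JA>": "Japanese",
    "<AR>": "Arabic",
    "<TH>": "Thai",
    "<ID>": "Indonesian",
}


def validate_langtag(tag: str) -> str:
    """Return the language name for a known tag, or raise ValueError."""
    if tag not in LANGTAGS:
        raise ValueError(f"unknown langtag {tag!r}; expected one of {sorted(LANGTAGS)}")
    return LANGTAGS[tag]
```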
