Decided on Parakeet with ONNX Runtime. Works well; real-time voice chat is now possible, though the UX is still lacking.
# Parakeet ASR with ONNX Runtime

Real-time Automatic Speech Recognition (ASR) system using NVIDIA's Parakeet TDT 0.6B V3 model via the `onnx-asr` library, optimized for NVIDIA GPUs (GTX 1660 and newer).

## Features

- ✅ **ONNX Runtime with GPU acceleration** (CUDA/TensorRT support)
- ✅ **Parakeet TDT 0.6B V3** multilingual model from Hugging Face
- ✅ **Real-time streaming** via WebSocket server
- ✅ **Voice Activity Detection** (Silero VAD)
- ✅ **Microphone client** for live transcription
- ✅ **Offline transcription** from audio files
- ✅ **Quantization support** (int8, fp16) for faster inference

## Model Information

This implementation uses:

- **Model**: `nemo-parakeet-tdt-0.6b-v3` (multilingual)
- **Source**: https://huggingface.co/istupakov/parakeet-tdt-0.6b-v3-onnx
- **Library**: https://github.com/istupakov/onnx-asr
- **Original model**: https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3

## System Requirements

- **GPU**: NVIDIA GPU with CUDA support (tested on a GTX 1660)
- **CUDA**: version 11.8 or 12.x
- **Python**: 3.10 or higher
- **Memory**: at least 4 GB of GPU memory recommended

## Installation

### 1. Navigate to the project directory

```bash
cd /home/koko210Serve/parakeet-test
```

### 2. Create virtual environment

```bash
python3 -m venv venv
source venv/bin/activate
```

### 3. Install CUDA dependencies

Make sure CUDA is installed. On Ubuntu:

```bash
# Check the installed CUDA version
nvcc --version

# If you need to install CUDA, follow NVIDIA's instructions:
# https://developer.nvidia.com/cuda-downloads
```

### 4. Install Python dependencies

```bash
pip install --upgrade pip
pip install -r requirements.txt
```

Or manually:

```bash
# With GPU support (recommended)
pip install "onnx-asr[gpu,hub]"

# Additional dependencies (quote "numpy<2.0" so the shell doesn't treat < as a redirect)
pip install "numpy<2.0" websockets sounddevice soundfile
```

### 5. Verify CUDA availability

```bash
python3 -c "import onnxruntime as ort; print('Available providers:', ort.get_available_providers())"
```

You should see `CUDAExecutionProvider` in the list.

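If `CUDAExecutionProvider` is missing, ONNX Runtime silently falls back to the CPU. A small helper like the following (hypothetical, not part of this repo) makes the fallback explicit by picking the first available provider from a preference order:

```python
def pick_provider(available,
                  preferred=("TensorrtExecutionProvider",
                             "CUDAExecutionProvider",
                             "CPUExecutionProvider")):
    """Return the first provider from `preferred` that ONNX Runtime reports as available."""
    for name in preferred:
        if name in available:
            return name
    raise RuntimeError("No usable ONNX Runtime execution provider found")

# Example: feed it the output of ort.get_available_providers()
print(pick_provider(["CUDAExecutionProvider", "CPUExecutionProvider"]))  # CUDAExecutionProvider
```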
## Usage

### Test Offline Transcription

Transcribe an audio file:

```bash
python3 tools/test_offline.py test.wav
```

With VAD (recommended for long audio files):

```bash
python3 tools/test_offline.py test.wav --use-vad
```

With quantization (faster, lower memory use):

```bash
python3 tools/test_offline.py test.wav --quantization int8
```

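Parakeet models expect 16 kHz mono 16-bit PCM input. If you don't have a `test.wav` handy, a compatible sine-tone file can be generated with the standard library (a convenience sketch; any 16 kHz mono WAV works):

```python
import math
import struct
import wave

def write_test_wav(path: str, seconds: float = 1.0,
                   rate: int = 16000, freq: float = 440.0) -> None:
    """Write a mono 16-bit PCM sine tone at the 16 kHz rate Parakeet expects."""
    n_frames = int(seconds * rate)
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.3 * math.sin(2 * math.pi * freq * i / rate)))
        for i in range(n_frames)
    )
    with wave.open(path, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(rate)
        w.writeframes(frames)

write_test_wav("test.wav")
```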
### Start WebSocket Server

Start the ASR server:

```bash
python3 server/ws_server.py
```

With options:

```bash
python3 server/ws_server.py --host 0.0.0.0 --port 8765 --use-vad
```

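The wire format is defined by `server/ws_server.py`; conceptually, a streaming client slices the microphone signal into fixed-size chunks before sending them over the socket. A minimal chunking helper (an illustrative sketch, not the repo's actual code):

```python
import numpy as np

def chunk_audio(samples: np.ndarray, chunk_size: int = 1600) -> np.ndarray:
    """Split a 1-D signal into equal chunks (1600 samples = 100 ms at 16 kHz),
    zero-padding the tail so the last chunk is full length."""
    pad = (-len(samples)) % chunk_size
    padded = np.pad(samples, (0, pad))
    return padded.reshape(-1, chunk_size)

chunks = chunk_audio(np.zeros(4000, dtype=np.float32))
print(chunks.shape)  # (3, 1600)
```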
### Start Microphone Client

In a separate terminal, start the microphone client:

```bash
python3 client/mic_stream.py
```

List available audio devices:

```bash
python3 client/mic_stream.py --list-devices
```

Connect to a specific device:

```bash
python3 client/mic_stream.py --device 0
```

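Libraries like `sounddevice` typically capture int16 PCM, while ASR models consume float32 in [-1, 1]. The usual normalization looks like this (a sketch; `client/mic_stream.py` may differ in detail):

```python
import numpy as np

def int16_to_float32(pcm: np.ndarray) -> np.ndarray:
    """Scale int16 PCM samples to float32 in [-1.0, 1.0)."""
    return pcm.astype(np.float32) / 32768.0

samples = int16_to_float32(np.array([0, 16384, -32768], dtype=np.int16))
# 0 -> 0.0, 16384 -> 0.5, -32768 -> -1.0
```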
## Project Structure

```
parakeet-test/
├── asr/
│   ├── __init__.py
│   └── asr_pipeline.py       # Main ASR pipeline using onnx-asr
├── client/
│   ├── __init__.py
│   └── mic_stream.py         # Microphone streaming client
├── server/
│   ├── __init__.py
│   └── ws_server.py          # WebSocket server for streaming ASR
├── vad/
│   ├── __init__.py
│   └── silero_vad.py         # VAD wrapper using onnx-asr
├── tools/
│   ├── test_offline.py       # Test offline transcription
│   └── diagnose.py           # System diagnostics
├── models/
│   └── parakeet/             # Model files (auto-downloaded)
├── requirements.txt          # Python dependencies
└── README.md                 # This file
```

## Model Files

The model files are automatically downloaded from Hugging Face on the first run to:

```
models/parakeet/
├── config.json
├── encoder-parakeet-tdt-0.6b-v3.onnx
├── decoder_joint-parakeet-tdt-0.6b-v3.onnx
└── vocab.txt
```

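A quick way to confirm the download succeeded is to check for the four files above. A small helper (hypothetical, not part of the repo; the filenames come from the listing above):

```python
from pathlib import Path

EXPECTED_FILES = [
    "config.json",
    "encoder-parakeet-tdt-0.6b-v3.onnx",
    "decoder_joint-parakeet-tdt-0.6b-v3.onnx",
    "vocab.txt",
]

def missing_model_files(model_dir: str = "models/parakeet") -> list:
    """Return the expected model files that are not yet present on disk."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]

print(missing_model_files())  # lists the missing files; empty once all are downloaded
```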
## Configuration

### GPU Settings

The ASR pipeline uses CUDA by default. You can customize the execution providers in `asr/asr_pipeline.py`:

```python
providers = [
    (
        "CUDAExecutionProvider",
        {
            "device_id": 0,
            "arena_extend_strategy": "kNextPowerOfTwo",
            "gpu_mem_limit": 6 * 1024 * 1024 * 1024,  # 6 GB
            "cudnn_conv_algo_search": "EXHAUSTIVE",
            "do_copy_in_default_stream": True,
        },
    ),
    "CPUExecutionProvider",
]
```

### TensorRT (Optional - Faster Inference)

For even better performance, you can use TensorRT:

```bash
pip install tensorrt tensorrt-cu12-libs
```

Then modify the providers:

```python
providers = [
    (
        "TensorrtExecutionProvider",
        {
            "trt_max_workspace_size": 6 * 1024**3,
            "trt_fp16_enable": True,
        },
    ),
]
```

## Troubleshooting

### CUDA Not Available

If CUDA is not detected:

1. Check the CUDA installation: `nvcc --version`
2. Verify the GPU is visible: `nvidia-smi`
3. Reinstall onnxruntime-gpu:
   ```bash
   pip uninstall onnxruntime onnxruntime-gpu
   pip install onnxruntime-gpu
   ```

### Memory Issues

If you run out of GPU memory:

1. Use quantization: `--quantization int8`
2. Reduce `gpu_mem_limit` in the configuration
3. Close other applications that use the GPU

### Audio Issues

If the microphone is not working:

1. List devices: `python3 client/mic_stream.py --list-devices`
2. Select the correct device: `--device <id>`
3. Check permissions: `sudo usermod -a -G audio $USER` (then log out and back in)

### Slow Performance

1. Ensure the GPU is being used (check the logs for "CUDAExecutionProvider")
2. Try quantization for faster inference
3. Consider the TensorRT provider
4. Check GPU utilization: `nvidia-smi`

## Performance

Expected performance on a GTX 1660 (6 GB):

- **Offline transcription**: ~50-100x realtime, depending on audio length
- **Streaming**: <100 ms latency
- **Memory usage**: ~2-3 GB of GPU memory
- **Quantized (int8)**: ~30% faster, ~50% less memory

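"50-100x realtime" is the inverse real-time factor: audio duration divided by processing time. Measuring it is a one-liner (hypothetical helper, not part of the repo):

```python
def realtime_speedup(audio_seconds: float, processing_seconds: float) -> float:
    """How many times faster than realtime the transcription ran."""
    return audio_seconds / processing_seconds

# e.g. 60 s of audio transcribed in 0.8 s:
print(realtime_speedup(60.0, 0.8))  # 75.0
```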
## License

This project uses:

- `onnx-asr`: MIT License
- Parakeet model: CC-BY-4.0 License

## References

- [onnx-asr GitHub](https://github.com/istupakov/onnx-asr)
- [Parakeet TDT 0.6B V3 ONNX](https://huggingface.co/istupakov/parakeet-tdt-0.6b-v3-onnx)
- [NVIDIA Parakeet](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3)
- [ONNX Runtime](https://onnxruntime.ai/)

## Credits

- Model conversion by [istupakov](https://github.com/istupakov)
- Original Parakeet model by NVIDIA