Soprano + RVC Docker Setup
Docker containerization of the dual-GPU Soprano TTS + RVC voice conversion pipeline.
Architecture
The system uses two containers for GPU runtime isolation:
HTTP Request → RVC Container (AMD RX 6800 + ROCm)
↓ ZMQ (tcp://soprano:5555)
Soprano Container (NVIDIA GTX 1660 + CUDA)
↓ Audio data
RVC Processing
↓
HTTP Stream Response
Why Two Containers?
CUDA and ROCm runtimes cannot coexist in a single container due to conflicting driver libraries and system dependencies. The dual-container approach provides:
- Clean GPU runtime separation (CUDA vs ROCm)
- Independent scaling and resource management
- Minimal latency overhead (~1-5ms Docker networking, negligible compared to ZMQ serialization)
Container Details
Soprano Container (miku-soprano-tts)
- Base Image: nvidia/cuda:11.8.0-runtime-ubuntu22.04
- GPU: NVIDIA GTX 1660 (CUDA 11.8)
- Python: 3.11
- Runtime: NVIDIA Docker runtime
- Port: 5555 (ZMQ, internal only)
- Model: ekwek1/soprano (installed from source)
- Backend: lmdeploy 0.11.1
RVC Container (miku-rvc-api)
- Base Image: rocm/pytorch:rocm6.2_ubuntu22.04_py3.10_pytorch_release_2.3.0
- GPU: AMD RX 6800 (ROCm 6.2)
- Python: 3.10
- Runtime: Native Docker with ROCm device passthrough
- Port: 8765 (HTTP, exposed externally)
- Model: MikuAI_e210_s6300.pth
- F0 Method: rmvpe
Prerequisites
Hardware Requirements
- CPU: Multi-core (AMD FX 6100 or better)
- GPU 1: NVIDIA GPU with CUDA support (tested: GTX 1660, 6GB VRAM)
- GPU 2: AMD GPU with ROCm support (tested: RX 6800, 16GB VRAM)
- RAM: 16GB minimum
- Disk: 20GB for containers and models
Software Requirements
- Docker Engine 20.10+
- Docker Compose 1.29+
- NVIDIA Container Toolkit (for CUDA GPU)
- ROCm drivers installed on host
Install NVIDIA Container Toolkit
# Ubuntu/Debian
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
Verify GPU Setup
# Test NVIDIA GPU
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
# Test AMD GPU
docker run --rm --device=/dev/kfd --device=/dev/dri rocm/pytorch:latest rocm-smi
Configuration
GPU Device IDs
The docker-compose.yml uses device IDs to assign GPUs. Verify your GPU order:
# NVIDIA GPUs
nvidia-smi -L
# Example output:
# GPU 0: NVIDIA GeForce GTX 1660 (UUID: GPU-...)
# GPU 1: NVIDIA GeForce RTX 3090 (UUID: GPU-...)
# AMD GPUs
rocm-smi --showproductname
# Example output:
# GPU[0]: Navi 21 [Radeon RX 6800]
Update device IDs in docker-compose.yml:
services:
  soprano:
    environment:
      - NVIDIA_VISIBLE_DEVICES=1  # <-- Set your NVIDIA GPU ID
  rvc:
    environment:
      - ROCR_VISIBLE_DEVICES=0  # <-- Set your AMD GPU ID
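To double-check that the ID you set matches the card you intend, the `nvidia-smi -L` output shown above can be parsed programmatically. A minimal sketch, assuming the standard `nvidia-smi -L` output format (the helper names are illustrative, not part of this repo):

```python
import re
import subprocess

def parse_nvidia_smi_l(output: str) -> dict:
    """Map GPU index -> product name from `nvidia-smi -L` output."""
    gpus = {}
    for line in output.splitlines():
        m = re.match(r"GPU (\d+): (.+?) \(UUID:", line)
        if m:
            gpus[int(m.group(1))] = m.group(2)
    return gpus

def find_gpu_id(name_fragment: str):
    """Return the device ID whose name contains name_fragment, or None."""
    output = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True
    ).stdout
    for gpu_id, name in parse_nvidia_smi_l(output).items():
        if name_fragment in name:
            return gpu_id
    return None
```

For example, `find_gpu_id("GTX 1660")` would return the index to put in NVIDIA_VISIBLE_DEVICES on a host like the one in the example output.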
Model Files
Ensure the following files exist before building:
soprano_to_rvc/
├── soprano/ # Soprano source (git submodule or clone)
├── Retrieval-based-Voice-Conversion-WebUI/ # RVC WebUI
├── models/
│ ├── MikuAI_e210_s6300.pth # RVC voice model
│ └── added_IVF512_Flat_nprobe_1_MikuAI_v2.index # RVC index
├── soprano_server.py # Soprano TTS server
└── soprano_rvc_api.py # RVC HTTP API
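A small preflight check can catch a missing model or index file before a long docker-compose build fails partway through. This is an illustrative sketch based on the tree above, not a script shipped with the repo:

```python
from pathlib import Path

# Required files, relative to soprano_to_rvc/ (paths taken from the tree above)
REQUIRED = [
    "models/MikuAI_e210_s6300.pth",
    "models/added_IVF512_Flat_nprobe_1_MikuAI_v2.index",
    "soprano_server.py",
    "soprano_rvc_api.py",
]

def missing_files(root: str) -> list:
    """Return the required paths that do not exist under root."""
    base = Path(root)
    return [p for p in REQUIRED if not (base / p).exists()]

if __name__ == "__main__":
    missing = missing_files(".")
    for path in missing:
        print(f"missing: {path}")
```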
Environment Variables
| Variable | Default | Description |
|---|---|---|
| SOPRANO_SERVER | tcp://soprano:5555 | ZMQ endpoint for Soprano container |
| NVIDIA_VISIBLE_DEVICES | 1 | NVIDIA GPU device ID |
| ROCR_VISIBLE_DEVICES | 0 | AMD GPU device ID |
| HSA_OVERRIDE_GFX_VERSION | 10.3.0 | ROCm architecture override for RX 6800 |
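The table's fallback behavior can be expressed as a simple environment-with-defaults lookup; this sketch mirrors the table above and is illustrative (the `load_config` helper is not part of the repo):

```python
import os

# Defaults copied from the environment-variable table above
DEFAULTS = {
    "SOPRANO_SERVER": "tcp://soprano:5555",
    "NVIDIA_VISIBLE_DEVICES": "1",
    "ROCR_VISIBLE_DEVICES": "0",
    "HSA_OVERRIDE_GFX_VERSION": "10.3.0",
}

def load_config(env=os.environ) -> dict:
    """Resolve each variable from the environment, falling back to the defaults."""
    return {key: env.get(key, default) for key, default in DEFAULTS.items()}
```

Anything set in docker-compose's `environment:` section wins; unset variables take the table defaults.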
Build and Deploy
Build Containers
cd soprano_to_rvc
# Build both containers
docker-compose build
# Or build individually
docker build -f Dockerfile.soprano -t miku-soprano:latest .
docker build -f Dockerfile.rvc -t miku-rvc:latest .
Start Services
# Start in foreground (see logs)
docker-compose up
# Start in background
docker-compose up -d
# View logs
docker-compose logs -f
# View logs for specific service
docker-compose logs -f soprano
docker-compose logs -f rvc
Stop Services
# Stop containers
docker-compose down
# Stop and remove volumes
docker-compose down -v
Testing
Health Checks
Both containers have health checks that run automatically:
# Check container health
docker ps
# Manual health check
curl http://localhost:8765/health
# Expected response:
# {
# "status": "healthy",
# "soprano_connected": true,
# "rvc_initialized": true,
# "pipeline_ready": true
# }
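When scripting against /health, it is safest to treat the pipeline as ready only when every flag in the response is set. A minimal sketch of such a check, keyed to the expected response shown above (the function name is illustrative):

```python
def pipeline_ready(payload: dict) -> bool:
    """True only when every readiness flag in a /health response is set."""
    flags = ("soprano_connected", "rvc_initialized", "pipeline_ready")
    return (
        payload.get("status") == "healthy"
        and all(payload.get(key) is True for key in flags)
    )
```

A caller would GET http://localhost:8765/health, parse the JSON body, and pass the resulting dict to this function.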
API Test
# Test full pipeline (TTS + RVC)
curl -X POST http://localhost:8765/api/speak \
-H "Content-Type: application/json" \
-d '{"text": "Hello, I am Miku!"}' \
-o test_output.wav
# Play the audio
ffplay test_output.wav
Test Soprano Only
# Test Soprano without RVC
curl -X POST http://localhost:8765/api/speak_soprano_only \
-H "Content-Type: application/json" \
-d '{"text": "Testing Soprano TTS"}' \
-o soprano_test.wav
View Status
# Pipeline status
curl http://localhost:8765/api/status
# Returns:
# {
# "initialized": true,
# "soprano_connected": true,
# "config": { ... },
# "timings": { ... }
# }
Performance
Expected Performance (from bare-metal testing)
| Metric | Value |
|---|---|
| Overall Realtime Factor | 0.95x (average) |
| Peak Performance | 1.12x realtime |
| Soprano (isolated) | 16.48x realtime |
| Soprano (via ZMQ) | ~7.10x realtime |
| RVC Processing | 166-196ms per 200ms block |
| ZMQ Transfer Overhead | ~0.7s for full audio |
Performance Targets:
- ✅ Short texts (1-2 sentences): 1.00-1.12x realtime
- ✅ Medium texts (3-5 sentences): 0.93-1.07x realtime
- ✅ Long texts (>5 sentences): 1.01-1.12x realtime
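"Realtime factor" in the tables above is the duration of the generated audio divided by the wall-clock time spent generating it; values at or above 1.0x keep up with playback. The arithmetic, as a quick sketch:

```python
def realtime_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Generated-audio duration divided by time spent generating it.
    Values >= 1.0 mean the pipeline keeps up with live playback."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds

# e.g. 10 s of audio produced in ~10.5 s of wall time gives ~0.95x,
# matching the overall average in the table above.
```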
Monitor Performance
# GPU utilization
watch -n 1 nvidia-smi # NVIDIA
watch -n 1 rocm-smi # AMD
# Container resource usage
docker stats miku-soprano-tts miku-rvc-api
# View detailed logs
docker-compose logs -f --tail=100
Troubleshooting
Container Won't Start
Soprano container fails:
# Check NVIDIA runtime
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
# Check NVIDIA device ID
nvidia-smi -L
# Rebuild without cache
docker-compose build --no-cache soprano
RVC container fails:
# Check ROCm devices
ls -la /dev/kfd /dev/dri
# Check ROCm drivers
rocm-smi
# Check GPU architecture
rocm-smi --showproductname
# Rebuild without cache
docker-compose build --no-cache rvc
Health Check Fails
Soprano not responding:
# Check Soprano logs
docker-compose logs soprano
# Common issues:
# - Model download in progress (first run takes time)
# - CUDA out of memory (GTX 1660 has 6GB)
# - lmdeploy compilation issues
# Restart container
docker-compose restart soprano
RVC health check fails:
# Check RVC logs
docker-compose logs rvc
# Test health endpoint manually
docker exec miku-rvc-api curl -f http://localhost:8765/health
# Common issues:
# - Soprano container not ready (wait for startup)
# - ZMQ connection timeout
# - ROCm initialization issues
Performance Issues
Slower than expected:
# Check GPU utilization
docker exec miku-soprano-tts nvidia-smi
docker exec miku-rvc-api rocm-smi
# Check CPU usage
docker stats
# Common causes:
# - CPU bottleneck (rmvpe f0method is CPU-bound)
# - Cold start kernel compilation (first 5 jobs slow)
# - Insufficient GPU memory
Out of memory:
# Check VRAM usage
docker exec miku-soprano-tts nvidia-smi
docker exec miku-rvc-api rocm-smi --showmeminfo vram
# Solutions:
# - Reduce batch size in config
# - Restart containers to clear VRAM
# - Use smaller model
Network Issues
ZMQ connection timeout:
# Verify network connectivity
docker exec miku-rvc-api ping soprano
# Test ZMQ port
docker exec miku-rvc-api nc -zv soprano 5555
# Check network
docker network inspect miku-voice-network
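If nc is not installed inside the RVC container, a stdlib-only Python equivalent of the `nc -zv soprano 5555` probe works just as well (the helper name is illustrative):

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Stdlib equivalent of `nc -zv host port`: try to open a TCP connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, running `tcp_reachable("soprano", 5555)` inside the RVC container tells you whether the ZMQ port is accepting connections, without saying anything about the ZMQ protocol on top of it.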
Logs and Debugging
# Enable debug logs
# Edit docker-compose.yml, add:
services:
  rvc:
    environment:
      - LOG_LEVEL=DEBUG
# Restart with debug logs
docker-compose down
docker-compose up
# Export logs
docker-compose logs > debug_logs.txt
# Shell into container
docker exec -it miku-soprano-tts bash
docker exec -it miku-rvc-api bash
Integration with Discord Bot
To integrate with the main Miku Discord bot:
- Add to main docker-compose.yml:
services:
  miku-voice:
    image: miku-rvc:latest
    container_name: miku-voice-api
    # ... copy config from soprano_to_rvc/docker-compose.yml
  miku-soprano:
    image: miku-soprano:latest
    # ...
- Update bot code to call voice API:
import requests

# In the voice channel handler
response = requests.post(
    "http://miku-voice-api:8765/api/speak",
    json={"text": "Hello from Discord!"},
    timeout=60,  # the full TTS + RVC pipeline can take several seconds
)
response.raise_for_status()

# Stream audio to the Discord voice channel
audio_data = response.content
# ... send to Discord voice connection
- Configure network:
networks:
  miku-network:
    name: miku-internal
    driver: bridge
Maintenance
Update Soprano
# Pull latest Soprano source
cd soprano
git pull origin main
cd ..
# Rebuild container
docker-compose build --no-cache soprano
docker-compose up -d soprano
Update RVC
# Update RVC WebUI
cd Retrieval-based-Voice-Conversion-WebUI
git pull origin main
cd ..
# Rebuild container
docker-compose build --no-cache rvc
docker-compose up -d rvc
Clean Up
# Remove stopped containers
docker-compose down
# Remove images
docker rmi miku-soprano:latest miku-rvc:latest
# Clean Docker system
docker system prune -a
Performance Tuning
Optimize for Speed
Edit soprano_rvc_config.json:
{
  "block_time": 0.15,      // Smaller blocks (more overhead)
  "crossfade_time": 0.03,  // Reduce crossfade
  "f0method": "rmvpe",     // Fast but CPU-bound
  "extra_time": 1.5        // Reduce context
}
Optimize for Quality
{
  "block_time": 0.25,      // Larger blocks (better quality)
  "crossfade_time": 0.08,  // Smoother transitions
  "f0method": "rmvpe",     // Best quality
  "extra_time": 2.0        // More context
}
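The block_time trade-off can be sanity-checked with simple arithmetic: a block only converts in realtime if RVC finishes it faster than the block's own duration. Using the RVC figure from the performance table (166-196ms per 200ms block), a hedged sketch:

```python
def block_realtime_factor(block_time_s: float, processing_ms: float) -> float:
    """Per-block realtime factor: block duration over processing time.
    >= 1.0 means each block is converted before the next one is due."""
    return (block_time_s * 1000.0) / processing_ms

# From the performance table: 200 ms blocks processed in 166-196 ms
# stay just above 1.0x. Shrinking block_time lowers latency but also
# shrinks the realtime budget each block must fit into.
```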
Known Issues
- First 5 jobs slow: ROCm MIOpen JIT kernel compilation causes initial slowdown. Subsequent jobs run at full speed.
- Cold start latency: Container startup takes 60-120s for model loading and CUDA/ROCm initialization.
- Python path shadowing: Soprano must be installed in editable mode with path fixes (handled in the Dockerfile).
- CPU bottleneck: The rmvpe f0method is CPU-bound. Faster GPU methods require kernel compilation and may be slower.
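Given the 60-120s cold start noted above, deployment scripts should poll /health rather than assume the containers are ready immediately after `docker-compose up -d`. A minimal polling sketch (the `check` callable would wrap a GET of http://localhost:8765/health):

```python
import time

def wait_until_ready(check, timeout_s: float = 120.0, interval_s: float = 2.0) -> bool:
    """Poll check() until it returns True or timeout_s elapses.
    The 120 s default matches the cold-start window noted above."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval_s)
    return False
```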
License
This setup uses:
- Soprano TTS: ekwek1/soprano - check the repository for license terms
- RVC WebUI: Retrieval-based-Voice-Conversion-WebUI - MIT License
Credits
- Soprano TTS by ekwek1
- RVC (Retrieval-based Voice Conversion) by RVC-Project
- Implementation and dual-GPU architecture by Miku Discord Bot team