# Soprano + RVC Docker Setup

Docker containerization of the dual-GPU Soprano TTS + RVC voice conversion pipeline.

## Architecture

The system uses **two containers** for GPU runtime isolation:

```
HTTP Request → RVC Container (AMD RX 6800 + ROCm)
                    ↓ ZMQ (tcp://soprano:5555)
               Soprano Container (NVIDIA GTX 1660 + CUDA)
                    ↓ Audio data
               RVC Processing
                    ↓
               HTTP Stream Response
```

### Why Two Containers?

**CUDA and ROCm runtimes cannot coexist in a single container** due to conflicting driver libraries and system dependencies. The dual-container approach provides:

- Clean GPU runtime separation (CUDA vs ROCm)
- Independent scaling and resource management
- Minimal latency overhead (~1-5ms Docker networking, negligible compared to ZMQ serialization)

## Container Details

### Soprano Container (`miku-soprano-tts`)
- **Base Image**: `nvidia/cuda:11.8.0-runtime-ubuntu22.04`
- **GPU**: NVIDIA GTX 1660 (CUDA 11.8)
- **Python**: 3.11
- **Runtime**: NVIDIA Docker runtime
- **Port**: 5555 (ZMQ, internal only)
- **Model**: ekwek1/soprano (installed from source)
- **Backend**: lmdeploy 0.11.1

### RVC Container (`miku-rvc-api`)
- **Base Image**: `rocm/pytorch:rocm6.2_ubuntu22.04_py3.10_pytorch_release_2.3.0`
- **GPU**: AMD RX 6800 (ROCm 6.2)
- **Python**: 3.10
- **Runtime**: Native Docker with ROCm device passthrough
- **Port**: 8765 (HTTP, exposed externally)
- **Model**: MikuAI_e210_s6300.pth
- **F0 Method**: rmvpe

## Prerequisites

### Hardware Requirements
- **CPU**: Multi-core (AMD FX 6100 or better)
- **GPU 1**: NVIDIA GPU with CUDA support (tested: GTX 1660, 6GB VRAM)
- **GPU 2**: AMD GPU with ROCm support (tested: RX 6800, 16GB VRAM)
- **RAM**: 16GB minimum
- **Disk**: 20GB for containers and models

### Software Requirements
- Docker Engine 20.10+
- Docker Compose 1.29+
- NVIDIA Container Toolkit (for CUDA GPU)
- ROCm drivers installed on host

### Install NVIDIA Container Toolkit
```bash
# Ubuntu/Debian
# Note: apt-key is deprecated on current Ubuntu/Debian releases. If the key
# import fails, follow the keyring-based steps in NVIDIA's install guide.
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
```

### Verify GPU Setup
```bash
# Test NVIDIA GPU
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

# Test AMD GPU
docker run --rm --device=/dev/kfd --device=/dev/dri rocm/pytorch:latest rocm-smi
```

## Configuration

### GPU Device IDs

The `docker-compose.yml` uses device IDs to assign GPUs. Verify your GPU order:

```bash
# NVIDIA GPUs
nvidia-smi -L
# Example output:
# GPU 0: NVIDIA GeForce GTX 1660 (UUID: GPU-...)
# GPU 1: NVIDIA GeForce RTX 3090 (UUID: GPU-...)

# AMD GPUs
rocm-smi --showproductname
# Example output:
# GPU[0]: Navi 21 [Radeon RX 6800]
```

**Update device IDs in `docker-compose.yml`:**

```yaml
services:
  soprano:
    environment:
      - NVIDIA_VISIBLE_DEVICES=1  # <-- Set your NVIDIA GPU ID

  rvc:
    environment:
      - ROCR_VISIBLE_DEVICES=0  # <-- Set your AMD GPU ID
```

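
Reading the index off the listing by hand is easy to get wrong when several cards are installed. As a rough sketch (not part of the original setup), the text printed by `nvidia-smi -L` can be parsed to look up a card's index by name. The helper name `find_gpu_index` is hypothetical, and the sample text mirrors the example output shown earlier in this section.

```python
import re
from typing import Optional

# Matches lines such as: "GPU 0: NVIDIA GeForce GTX 1660 (UUID: GPU-...)"
_GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: .*\)\s*$")


def find_gpu_index(listing: str, name_fragment: str) -> Optional[int]:
    """Return the index of the first GPU whose name contains name_fragment."""
    for line in listing.splitlines():
        match = _GPU_LINE.match(line.strip())
        if match and name_fragment.lower() in match.group(2).lower():
            return int(match.group(1))
    return None


# Sample text copied from the example output; on a real host it could come from
# subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True).stdout
sample = """GPU 0: NVIDIA GeForce GTX 1660 (UUID: GPU-...)
GPU 1: NVIDIA GeForce RTX 3090 (UUID: GPU-...)"""

index = find_gpu_index(sample, "GTX 1660")
print(f"NVIDIA_VISIBLE_DEVICES={index}")
```

The printed value is only a suggestion for the `NVIDIA_VISIBLE_DEVICES` entry; the index on your host depends on how the driver enumerates the cards.
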
### Model Files

Ensure the following files exist before building:

```
soprano_to_rvc/
├── soprano/                                    # Soprano source (git submodule or clone)
├── Retrieval-based-Voice-Conversion-WebUI/     # RVC WebUI
├── models/
│   ├── MikuAI_e210_s6300.pth                   # RVC voice model
│   └── added_IVF512_Flat_nprobe_1_MikuAI_v2.index  # RVC index
├── soprano_server.py                           # Soprano TTS server
└── soprano_rvc_api.py                          # RVC HTTP API
```

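
A missing model file usually surfaces only partway through a build or at container start. The following is a minimal preflight sketch, assuming it is run from inside `soprano_to_rvc/`; the function name `missing_paths` is hypothetical and the path list is copied from the layout above.

```python
from pathlib import Path
from typing import List

# Paths taken from the layout above, relative to the soprano_to_rvc/ directory.
REQUIRED_PATHS = [
    "soprano",
    "Retrieval-based-Voice-Conversion-WebUI",
    "models/MikuAI_e210_s6300.pth",
    "models/added_IVF512_Flat_nprobe_1_MikuAI_v2.index",
    "soprano_server.py",
    "soprano_rvc_api.py",
]


def missing_paths(root: Path) -> List[str]:
    """Return the required paths that do not exist under root."""
    return [rel for rel in REQUIRED_PATHS if not (root / rel).exists()]


missing = missing_paths(Path("."))
if missing:
    print("Missing before build:")
    for rel in missing:
        print(f"  {rel}")
else:
    print("All required files are present.")
```
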
### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `SOPRANO_SERVER` | `tcp://soprano:5555` | ZMQ endpoint for Soprano container |
| `NVIDIA_VISIBLE_DEVICES` | `1` | NVIDIA GPU device ID |
| `ROCR_VISIBLE_DEVICES` | `0` | AMD GPU device ID |
| `HSA_OVERRIDE_GFX_VERSION` | `10.3.0` | ROCm architecture override for RX 6800 |

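
How the servers actually read these variables is not shown here. As an illustration only, a Python process could merge the environment over the documented defaults as follows; `load_settings` is a hypothetical helper, not a function from `soprano_rvc_api.py`.

```python
import os
from typing import Dict, Mapping

# Defaults copied from the table above.
DEFAULTS = {
    "SOPRANO_SERVER": "tcp://soprano:5555",
    "NVIDIA_VISIBLE_DEVICES": "1",
    "ROCR_VISIBLE_DEVICES": "0",
    "HSA_OVERRIDE_GFX_VERSION": "10.3.0",
}


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the documented variables, preferring values set in env."""
    return {name: env.get(name, default) for name, default in DEFAULTS.items()}


settings = load_settings()
print(settings["SOPRANO_SERVER"])
```

Passing the environment as a parameter keeps the function easy to exercise with a plain dictionary, for example `load_settings({"ROCR_VISIBLE_DEVICES": "1"})`.
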
## Build and Deploy

### Build Containers

```bash
cd soprano_to_rvc

# Build both containers
docker-compose build

# Or build individually
docker build -f Dockerfile.soprano -t miku-soprano:latest .
docker build -f Dockerfile.rvc -t miku-rvc:latest .
```

### Start Services

```bash
# Start in foreground (see logs)
docker-compose up

# Start in background
docker-compose up -d

# View logs
docker-compose logs -f

# View logs for specific service
docker-compose logs -f soprano
docker-compose logs -f rvc
```

### Stop Services

```bash
# Stop containers
docker-compose down

# Stop and remove volumes
docker-compose down -v
```

## Testing

### Health Checks

Both containers have health checks that run automatically:

```bash
# Check container health
docker ps

# Manual health check
curl http://localhost:8765/health

# Expected response:
# {
#   "status": "healthy",
#   "soprano_connected": true,
#   "rvc_initialized": true,
#   "pipeline_ready": true
# }
```

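
Scripts that depend on the pipeline can poll the same endpoint instead of sleeping for a fixed time. The sketch below assumes the response fields shown above; the function names, the 180-second limit, and the 5-second interval are illustrative choices rather than values taken from the project.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

HEALTH_URL = "http://localhost:8765/health"


def fetch_health(url: str = HEALTH_URL, timeout: float = 5.0) -> Optional[dict]:
    """Return the parsed health payload, or None if the API is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.load(response)
    except (OSError, ValueError):  # connection errors, timeouts, invalid JSON
        return None


def is_ready(payload: Optional[dict]) -> bool:
    """True when the payload reports a healthy, fully initialised pipeline."""
    if not payload:
        return False
    return payload.get("status") == "healthy" and payload.get("pipeline_ready") is True


def wait_until_ready(
    fetch: Callable[[], Optional[dict]] = fetch_health,
    timeout_s: float = 180.0,   # assumed limit
    interval_s: float = 5.0,    # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll fetch() until the pipeline is ready or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if is_ready(fetch()):
            return True
        if time.monotonic() + interval_s > deadline:
            return False
        sleep(interval_s)
```

Calling `wait_until_ready()` after `docker-compose up -d` returns `True` once the API reports `pipeline_ready`, or `False` if the limit is reached first.
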
### API Test

```bash
# Test full pipeline (TTS + RVC)
curl -X POST http://localhost:8765/api/speak \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, I am Miku!"}' \
  -o test_output.wav

# Play the audio
ffplay test_output.wav
```

### Test Soprano Only

```bash
# Test Soprano without RVC
curl -X POST http://localhost:8765/api/speak_soprano_only \
  -H "Content-Type: application/json" \
  -d '{"text": "Testing Soprano TTS"}' \
  -o soprano_test.wav
```

### View Status

```bash
# Pipeline status
curl http://localhost:8765/api/status

# Returns:
# {
#   "initialized": true,
#   "soprano_connected": true,
#   "config": { ... },
#   "timings": { ... }
# }
```

## Performance

### Expected Performance (from bare metal testing)

| Metric | Value |
|--------|-------|
| Overall Realtime Factor | 0.95x (average) |
| Peak Performance | 1.12x realtime |
| Soprano (isolated) | 16.48x realtime |
| Soprano (via ZMQ) | ~7.10x realtime |
| RVC Processing | 166-196ms per 200ms block |
| ZMQ Transfer Overhead | ~0.7s for full audio |

**Performance Targets**:
- ✅ **Short texts (1-2 sentences)**: 1.00-1.12x realtime
- ✅ **Medium texts (3-5 sentences)**: 0.93-1.07x realtime
- ✅ **Long texts (>5 sentences)**: 1.01-1.12x realtime

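
The table does not define "x realtime" explicitly. Assuming it means seconds of audio produced per second of processing (so values above 1.0 are faster than playback), the RVC block timings can be converted to the same unit with a few lines of arithmetic:

```python
def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of audio produced per second of processing (>1.0 beats realtime)."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


block = 0.200  # seconds of audio per RVC block, from the table
for processing in (0.166, 0.196):
    factor = realtime_factor(block, processing)
    print(f"{processing * 1000:.0f} ms per block -> {factor:.2f}x")
# 166 ms per block -> 1.20x
# 196 ms per block -> 1.02x

# Under the same assumption, a 10 s utterance at the 0.95x average
# takes 10 / 0.95 seconds to render.
print(f"{10 / 0.95:.1f} s")  # 10.5 s
```

Under that reading, RVC alone runs between roughly 1.02x and 1.20x, which leaves little headroom and is consistent with the overall figure hovering around 1.0x.
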
### Monitor Performance

```bash
# GPU utilization
watch -n 1 nvidia-smi  # NVIDIA
watch -n 1 rocm-smi    # AMD

# Container resource usage
docker stats miku-soprano-tts miku-rvc-api

# View detailed logs
docker-compose logs -f --tail=100
```

## Troubleshooting

### Container Won't Start

**Soprano container fails**:
```bash
# Check NVIDIA runtime
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

# Check NVIDIA device ID
nvidia-smi -L

# Rebuild without cache
docker-compose build --no-cache soprano
```

**RVC container fails**:
```bash
# Check ROCm devices
ls -la /dev/kfd /dev/dri

# Check ROCm drivers
rocm-smi

# Check GPU architecture
rocm-smi --showproductname

# Rebuild without cache
docker-compose build --no-cache rvc
```

### Health Check Fails

**Soprano not responding**:
```bash
# Check Soprano logs
docker-compose logs soprano

# Common issues:
# - Model download in progress (first run takes time)
# - CUDA out of memory (GTX 1660 has 6GB)
# - lmdeploy compilation issues

# Restart container
docker-compose restart soprano
```

**RVC health check fails**:
```bash
# Check RVC logs
docker-compose logs rvc

# Test health endpoint manually
docker exec miku-rvc-api curl -f http://localhost:8765/health

# Common issues:
# - Soprano container not ready (wait for startup)
# - ZMQ connection timeout
# - ROCm initialization issues
```

### Performance Issues

**Slower than expected**:
```bash
# Check GPU utilization
docker exec miku-soprano-tts nvidia-smi
docker exec miku-rvc-api rocm-smi

# Check CPU usage
docker stats

# Common causes:
# - CPU bottleneck (rmvpe f0method is CPU-bound)
# - Cold start kernel compilation (first 5 jobs slow)
# - Insufficient GPU memory
```

**Out of memory**:
```bash
# Check VRAM usage
docker exec miku-soprano-tts nvidia-smi
docker exec miku-rvc-api rocm-smi --showmeminfo vram

# Solutions:
# - Reduce batch size in config
# - Restart containers to clear VRAM
# - Use smaller model
```

### Network Issues

**ZMQ connection timeout**:
```bash
# Verify network connectivity
docker exec miku-rvc-api ping soprano

# Test ZMQ port
docker exec miku-rvc-api nc -zv soprano 5555

# Check network
docker network inspect miku-voice-network
```

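
If `nc` or `ping` is not installed in the image, the same reachability test can be run with the Python interpreter that is already there. This is a minimal sketch with a hypothetical helper name; it only confirms that the TCP port accepts connections, not that the ZMQ endpoint behind it answers requests.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False


# Example, from inside the RVC container:
#   port_is_open("soprano", 5555)
```
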
### Logs and Debugging

```bash
# Enable debug logs by adding the following to docker-compose.yml:
#
#   services:
#     rvc:
#       environment:
#         - LOG_LEVEL=DEBUG

# Restart with debug logs
docker-compose down
docker-compose up

# Export logs
docker-compose logs > debug_logs.txt

# Shell into container
docker exec -it miku-soprano-tts bash
docker exec -it miku-rvc-api bash
```

## Integration with Discord Bot

To integrate with the main Miku Discord bot:

1. **Add to main docker-compose.yml**:
   ```yaml
   services:
     miku-voice:
       image: miku-rvc:latest
       container_name: miku-voice-api
       # ... copy config from soprano_to_rvc/docker-compose.yml

     miku-soprano:
       image: miku-soprano:latest
       # ...
   ```

2. **Update bot code** to call voice API:
   ```python
   import requests

   # In voice channel handler
   response = requests.post(
       "http://miku-voice-api:8765/api/speak",
       json={"text": "Hello from Discord!"},
       timeout=120,  # assumed value; without a timeout the call can block forever
   )
   response.raise_for_status()  # fail early on HTTP errors instead of playing an error body

   # Stream audio to Discord voice channel
   audio_data = response.content
   # ... send to Discord voice connection
   ```

3. **Configure network**:
   ```yaml
   networks:
     miku-network:
       name: miku-internal
       driver: bridge
   ```

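
Because the API streams its response, a bot may prefer to consume the audio in chunks rather than buffer the whole body. The following is a standard-library sketch under stated assumptions: the helper names and the 4096-byte chunk size are illustrative, and the hostname is the `container_name` from step 1.

```python
import io
import json
import urllib.request
from typing import BinaryIO, Iterator

SPEAK_URL = "http://miku-voice-api:8765/api/speak"


def build_speak_request(text: str, url: str = SPEAK_URL) -> urllib.request.Request:
    """Build the JSON POST request used by the API test earlier in this document."""
    body = json.dumps({"text": text}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def iter_chunks(stream: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield successive chunks from a binary stream until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk


def stream_speech(text: str, timeout: float = 120.0) -> Iterator[bytes]:
    """Yield audio bytes as they arrive; timeout is an assumed value."""
    with urllib.request.urlopen(build_speak_request(text), timeout=timeout) as response:
        yield from iter_chunks(response)


# Offline demonstration of the chunking with an in-memory stream.
demo = io.BytesIO(b"\x00" * 10_000)
sizes = [len(chunk) for chunk in iter_chunks(demo)]
print(sizes)  # [4096, 4096, 1808]
```

In the bot, each chunk from `stream_speech(...)` would be handed to whatever audio sink the Discord voice connection uses; that part depends on the bot's library and is not covered here.
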
## Maintenance

### Update Soprano

```bash
# Pull latest Soprano source
cd soprano
git pull origin main
cd ..

# Rebuild container
docker-compose build --no-cache soprano
docker-compose up -d soprano
```

### Update RVC

```bash
# Update RVC WebUI
cd Retrieval-based-Voice-Conversion-WebUI
git pull origin main
cd ..

# Rebuild container
docker-compose build --no-cache rvc
docker-compose up -d rvc
```

### Clean Up

```bash
# Remove stopped containers
docker-compose down

# Remove images
docker rmi miku-soprano:latest miku-rvc:latest

# Clean Docker system
docker system prune -a
```

## Performance Tuning

### Optimize for Speed

Edit `soprano_rvc_config.json`. The `//` annotations below are explanatory only; strict JSON does not allow comments, so leave them out of the actual file:

```json
{
  "block_time": 0.15,        // Smaller blocks (more overhead)
  "crossfade_time": 0.03,    // Reduce crossfade
  "f0method": "rmvpe",       // Fast but CPU-bound
  "extra_time": 1.5          // Reduce context
}
```

### Optimize for Quality

```json
{
  "block_time": 0.25,        // Larger blocks (better quality)
  "crossfade_time": 0.08,    // Smoother transitions
  "f0method": "rmvpe",       // Best quality
  "extra_time": 2.0          // More context
}
```

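
To make the trade-off concrete, the two profiles above can be converted into sample counts and RVC calls per second of audio. This is a rough sketch: the 48 kHz sample rate is an assumption (the pipeline's actual rate is not stated here), and the `BlockConfig` class is hypothetical.

```python
from dataclasses import dataclass

SAMPLE_RATE = 48_000  # assumed; substitute the rate your pipeline actually uses


@dataclass(frozen=True)
class BlockConfig:
    block_time: float      # seconds of new audio per block
    crossfade_time: float  # seconds blended between consecutive blocks
    extra_time: float      # seconds of extra context per block


def samples(seconds: float, sample_rate: int = SAMPLE_RATE) -> int:
    """Convert a duration in seconds to a whole number of samples."""
    return int(round(seconds * sample_rate))


profiles = {
    "speed": BlockConfig(block_time=0.15, crossfade_time=0.03, extra_time=1.5),
    "quality": BlockConfig(block_time=0.25, crossfade_time=0.08, extra_time=2.0),
}

for name, cfg in profiles.items():
    calls_per_second = 1.0 / cfg.block_time
    print(
        f"{name}: block={samples(cfg.block_time)} samples, "
        f"crossfade={samples(cfg.crossfade_time)} samples, "
        f"context={samples(cfg.extra_time)} samples, "
        f"{calls_per_second:.2f} RVC calls per second of audio"
    )
```

Smaller blocks mean more RVC calls for the same amount of audio, which is the "more overhead" noted in the speed profile, and each block must be processed in less than `block_time` for the pipeline to keep up with playback.
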
## Known Issues

1. **First 5 jobs slow**: ROCm MIOpen JIT kernel compilation causes initial slowdown. Subsequent jobs run at full speed.

2. **Cold start latency**: Container startup takes 60-120s for model loading and CUDA/ROCm initialization.

3. **Python path shadowing**: Soprano must be installed in editable mode with path fixes (handled in Dockerfile).

4. **CPU bottleneck**: The rmvpe f0method is CPU-bound. GPU-based F0 methods require kernel compilation and may end up slower overall.

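
One way to keep the slow first jobs away from real users is to run a few throwaway requests right after startup. The sketch below is an assumption about how that could be done, not something the containers do today; `warm_up` is a hypothetical helper, and `speak` stands for any function that sends text to `/api/speak`.

```python
import time
from typing import Callable, List

WARMUP_JOBS = 5  # the list above reports the first 5 jobs as slow


def warm_up(
    speak: Callable[[str], object],
    jobs: int = WARMUP_JOBS,
    clock: Callable[[], float] = time.perf_counter,
) -> List[float]:
    """Run throwaway synthesis jobs and return how long each one took (seconds)."""
    durations: List[float] = []
    for i in range(jobs):
        start = clock()
        speak(f"Warm-up sentence number {i + 1}.")
        durations.append(clock() - start)
    return durations


# Stand-in for a real client so the sketch runs offline.
def fake_speak(text: str) -> None:
    time.sleep(0.01)


for job, seconds in enumerate(warm_up(fake_speak), start=1):
    print(f"job {job}: {seconds * 1000:.0f} ms")
```

With a real client, the printed durations should drop once kernel compilation has finished, which gives a simple signal that the pipeline has reached full speed.
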
## License

This setup uses:
- **Soprano TTS**: [ekwek1/soprano](https://github.com/ekwek1/soprano) - check the repository for its license
- **RVC WebUI**: [Retrieval-based-Voice-Conversion-WebUI](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI) - MIT License

## Credits

- Soprano TTS by ekwek1
- RVC (Retrieval-based Voice Conversion) by RVC-Project
- Implementation and dual-GPU architecture by Miku Discord Bot team