# Soprano + RVC Docker Setup

Docker containerization of the dual-GPU Soprano TTS + RVC voice conversion pipeline.

## Architecture

The system uses **two containers** for GPU runtime isolation:

```
HTTP Request → RVC Container (AMD RX 6800 + ROCm)
                   ↓ ZMQ (tcp://soprano:5555)
               Soprano Container (NVIDIA GTX 1660 + CUDA)
                   ↓ Audio data
               RVC Processing
                   ↓
               HTTP Stream Response
```

### Why Two Containers?

**CUDA and ROCm runtimes cannot coexist in a single container** due to conflicting driver libraries and system dependencies. The dual-container approach provides:

- Clean GPU runtime separation (CUDA vs ROCm)
- Independent scaling and resource management
- Minimal latency overhead (~1-5ms Docker networking, negligible compared to ZMQ serialization)

## Container Details

### Soprano Container (`miku-soprano-tts`)

- **Base Image**: `nvidia/cuda:11.8.0-runtime-ubuntu22.04`
- **GPU**: NVIDIA GTX 1660 (CUDA 11.8)
- **Python**: 3.11
- **Runtime**: NVIDIA Docker runtime
- **Port**: 5555 (ZMQ, internal only)
- **Model**: ekwek1/soprano (installed from source)
- **Backend**: lmdeploy 0.11.1

### RVC Container (`miku-rvc-api`)

- **Base Image**: `rocm/pytorch:rocm6.2_ubuntu22.04_py3.10_pytorch_release_2.3.0`
- **GPU**: AMD RX 6800 (ROCm 6.2)
- **Python**: 3.10
- **Runtime**: Native Docker with ROCm device passthrough
- **Port**: 8765 (HTTP, exposed externally)
- **Model**: MikuAI_e210_s6300.pth
- **F0 Method**: rmvpe

## Prerequisites

### Hardware Requirements

- **CPU**: Multi-core (AMD FX 6100 or better)
- **GPU 1**: NVIDIA GPU with CUDA support (tested: GTX 1660, 6GB VRAM)
- **GPU 2**: AMD GPU with ROCm support (tested: RX 6800, 16GB VRAM)
- **RAM**: 16GB minimum
- **Disk**: 20GB for containers and models

### Software Requirements

- Docker Engine 20.10+
- Docker Compose 1.29+
- NVIDIA Container Toolkit (for CUDA GPU)
- ROCm drivers installed on host

### Install NVIDIA Container Toolkit

```bash
# Ubuntu/Debian
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
```

### Verify GPU Setup

```bash
# Test NVIDIA GPU
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

# Test AMD GPU
docker run --rm --device=/dev/kfd --device=/dev/dri rocm/pytorch:latest rocm-smi
```

## Configuration

### GPU Device IDs

The `docker-compose.yml` uses device IDs to assign GPUs. Verify your GPU order:

```bash
# NVIDIA GPUs
nvidia-smi -L
# Example output:
# GPU 0: NVIDIA GeForce GTX 1660 (UUID: GPU-...)
# GPU 1: NVIDIA GeForce RTX 3090 (UUID: GPU-...)

# AMD GPUs
rocm-smi --showproductname
# Example output:
# GPU[0]: Navi 21 [Radeon RX 6800]
```

**Update device IDs in `docker-compose.yml`:**

```yaml
services:
  soprano:
    environment:
      - NVIDIA_VISIBLE_DEVICES=1  # <-- Set your NVIDIA GPU ID
  rvc:
    environment:
      - ROCR_VISIBLE_DEVICES=0    # <-- Set your AMD GPU ID
```

### Model Files

Ensure the following files exist before building:

```
soprano_to_rvc/
├── soprano/                                        # Soprano source (git submodule or clone)
├── Retrieval-based-Voice-Conversion-WebUI/         # RVC WebUI
├── models/
│   ├── MikuAI_e210_s6300.pth                       # RVC voice model
│   └── added_IVF512_Flat_nprobe_1_MikuAI_v2.index  # RVC index
├── soprano_server.py                               # Soprano TTS server
└── soprano_rvc_api.py                              # RVC HTTP API
```

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `SOPRANO_SERVER` | `tcp://soprano:5555` | ZMQ endpoint for Soprano container |
| `NVIDIA_VISIBLE_DEVICES` | `1` | NVIDIA GPU device ID |
| `ROCR_VISIBLE_DEVICES` | `0` | AMD GPU device ID |
| `HSA_OVERRIDE_GFX_VERSION` | `10.3.0` | ROCm architecture override for RX 6800 |

## Build and Deploy

### Build Containers

```bash
cd soprano_to_rvc
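
# (optional) Fail fast if the model files from the "Model Files" section are
# missing, before starting a long build. File names are the ones listed above;
# adjust if your layout differs.
missing=0
for f in models/MikuAI_e210_s6300.pth \
         models/added_IVF512_Flat_nprobe_1_MikuAI_v2.index \
         soprano_server.py soprano_rvc_api.py; do
  [ -e "$f" ] || { echo "WARNING: missing $f"; missing=1; }
done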
# Build both containers
docker-compose build

# Or build individually
docker build -f Dockerfile.soprano -t miku-soprano:latest .
docker build -f Dockerfile.rvc -t miku-rvc:latest .
```

### Start Services

```bash
# Start in foreground (see logs)
docker-compose up

# Start in background
docker-compose up -d

# View logs
docker-compose logs -f

# View logs for specific service
docker-compose logs -f soprano
docker-compose logs -f rvc
```

### Stop Services

```bash
# Stop containers
docker-compose down

# Stop and remove volumes
docker-compose down -v
```

## Testing

### Health Checks

Both containers have health checks that run automatically:

```bash
# Check container health
docker ps

# Manual health check
curl http://localhost:8765/health

# Expected response:
# {
#   "status": "healthy",
#   "soprano_connected": true,
#   "rvc_initialized": true,
#   "pipeline_ready": true
# }
```

### API Test

```bash
# Test full pipeline (TTS + RVC)
curl -X POST http://localhost:8765/api/speak \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, I am Miku!"}' \
  -o test_output.wav

# Play the audio
ffplay test_output.wav
```

### Test Soprano Only

```bash
# Test Soprano without RVC
curl -X POST http://localhost:8765/api/speak_soprano_only \
  -H "Content-Type: application/json" \
  -d '{"text": "Testing Soprano TTS"}' \
  -o soprano_test.wav
```

### View Status

```bash
# Pipeline status
curl http://localhost:8765/api/status

# Returns:
# {
#   "initialized": true,
#   "soprano_connected": true,
#   "config": { ... },
#   "timings": { ... }
# }
```

## Performance

### Expected Performance (from bare metal testing)

| Metric | Value |
|--------|-------|
| Overall Realtime Factor | 0.95x (average) |
| Peak Performance | 1.12x realtime |
| Soprano (isolated) | 16.48x realtime |
| Soprano (via ZMQ) | ~7.10x realtime |
| RVC Processing | 166-196ms per 200ms block |
| ZMQ Transfer Overhead | ~0.7s for full audio |

**Performance Targets**:

- ✅ **Short texts (1-2 sentences)**: 1.00-1.12x realtime
- ✅ **Medium texts (3-5 sentences)**: 0.93-1.07x realtime
- ✅ **Long texts (>5 sentences)**: 1.01-1.12x realtime

### Monitor Performance

```bash
# GPU utilization
watch -n 1 nvidia-smi  # NVIDIA
watch -n 1 rocm-smi    # AMD

# Container resource usage
docker stats miku-soprano-tts miku-rvc-api

# View detailed logs
docker-compose logs -f --tail=100
```

## Troubleshooting

### Container Won't Start

**Soprano container fails**:

```bash
# Check NVIDIA runtime
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

# Check NVIDIA device ID
nvidia-smi -L

# Rebuild without cache
docker-compose build --no-cache soprano
```

**RVC container fails**:

```bash
# Check ROCm devices
ls -la /dev/kfd /dev/dri

# Check ROCm drivers
rocm-smi

# Check GPU architecture
rocm-smi --showproductname

# Rebuild without cache
docker-compose build --no-cache rvc
```

### Health Check Fails

**Soprano not responding**:

```bash
# Check Soprano logs
docker-compose logs soprano

# Common issues:
# - Model download in progress (first run takes time)
# - CUDA out of memory (GTX 1660 has 6GB)
# - lmdeploy compilation issues

# Restart container
docker-compose restart soprano
```

**RVC health check fails**:

```bash
# Check RVC logs
docker-compose logs rvc

# Test health endpoint manually
docker exec miku-rvc-api curl -f http://localhost:8765/health

# Common issues:
# - Soprano container not ready (wait for startup)
# - ZMQ connection timeout
# - ROCm initialization issues
```

### Performance Issues

**Slower than expected**:

```bash
# Check GPU utilization
docker exec miku-soprano-tts nvidia-smi
docker exec miku-rvc-api rocm-smi

# Check CPU usage
docker stats

# Common causes:
# - CPU bottleneck (rmvpe f0method is CPU-bound)
# - Cold start kernel compilation (first 5 jobs slow)
# - Insufficient GPU memory
```

**Out of memory**:

```bash
# Check VRAM usage
docker exec miku-soprano-tts nvidia-smi
docker exec miku-rvc-api rocm-smi --showmeminfo vram

# Solutions:
# - Reduce batch size in config
# - Restart containers to clear VRAM
# - Use smaller model
```

### Network Issues

**ZMQ connection timeout**:

```bash
# Verify network connectivity
docker exec miku-rvc-api ping soprano

# Test ZMQ port
docker exec miku-rvc-api nc -zv soprano 5555

# Check network
docker network inspect miku-voice-network
```

### Logs and Debugging

```bash
# Enable debug logs by editing docker-compose.yml, adding:
#   services:
#     rvc:
#       environment:
#         - LOG_LEVEL=DEBUG

# Restart with debug logs
docker-compose down
docker-compose up

# Export logs
docker-compose logs > debug_logs.txt

# Shell into container
docker exec -it miku-soprano-tts bash
docker exec -it miku-rvc-api bash
```

## Integration with Discord Bot

To integrate with the main Miku Discord bot:

1. **Add to main docker-compose.yml**:

```yaml
services:
  miku-voice:
    image: miku-rvc:latest
    container_name: miku-voice-api
    # ... copy config from soprano_to_rvc/docker-compose.yml
  miku-soprano:
    image: miku-soprano:latest
    # ...
```

2. **Update bot code** to call voice API:

```python
import requests

# In voice channel handler
response = requests.post(
    "http://miku-voice-api:8765/api/speak",
    json={"text": "Hello from Discord!"}
)

# Stream audio to Discord voice channel
audio_data = response.content
# ... send to Discord voice connection
```

3. **Configure network**:

```yaml
networks:
  miku-network:
    name: miku-internal
    driver: bridge
```

## Maintenance

### Update Soprano

```bash
# Pull latest Soprano source
cd soprano
git pull origin main
cd ..
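
# (optional, hypothetical safeguard) Record the commit now in use so a bad
# update can be identified and rolled back; the .soprano_rev file name is an
# arbitrary choice, not part of the pipeline.
git -C soprano rev-parse HEAD 2>/dev/null > .soprano_rev || echo "soprano: not a git checkout"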
# Rebuild container
docker-compose build --no-cache soprano
docker-compose up -d soprano
```

### Update RVC

```bash
# Update RVC WebUI
cd Retrieval-based-Voice-Conversion-WebUI
git pull origin main
cd ..

# Rebuild container
docker-compose build --no-cache rvc
docker-compose up -d rvc
```

### Clean Up

```bash
# Remove stopped containers
docker-compose down

# Remove images
docker rmi miku-soprano:latest miku-rvc:latest

# Clean Docker system
docker system prune -a
```

## Performance Tuning

### Optimize for Speed

Edit `soprano_rvc_config.json` (the `//` comments below are for illustration only; strict JSON does not allow comments, so remove them in the actual file):

```json
{
  "block_time": 0.15,      // Smaller blocks (more overhead)
  "crossfade_time": 0.03,  // Reduce crossfade
  "f0method": "rmvpe",     // Fast but CPU-bound
  "extra_time": 1.5        // Reduce context
}
```

### Optimize for Quality

```json
{
  "block_time": 0.25,      // Larger blocks (better quality)
  "crossfade_time": 0.08,  // Smoother transitions
  "f0method": "rmvpe",     // Best quality
  "extra_time": 2.0        // More context
}
```

## Known Issues

1. **First 5 jobs slow**: ROCm MIOpen JIT kernel compilation causes initial slowdown. Subsequent jobs run at full speed.
2. **Cold start latency**: Container startup takes 60-120s for model loading and CUDA/ROCm initialization.
3. **Python path shadowing**: Soprano must be installed in editable mode with path fixes (handled in Dockerfile).
4. **CPU bottleneck**: The rmvpe f0method is CPU-bound. Faster GPU methods require kernel compilation and may be slower in practice.

## License

This setup uses:

- **Soprano TTS**: [ekwek1/soprano](https://github.com/ekwek1/soprano) - check the repo for license terms
- **RVC WebUI**: [Retrieval-based-Voice-Conversion-WebUI](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI) - MIT License

## Credits

- Soprano TTS by ekwek1
- RVC (Retrieval-based Voice Conversion) by RVC-Project
- Implementation and dual-GPU architecture by Miku Discord Bot team
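
## Appendix: Waiting for Pipeline Readiness

Since cold start can take 60-120s (see Known Issues), clients should wait for `/health` to report readiness before sending `/api/speak` requests. A minimal sketch in Python, assuming the endpoint and field names shown in the Health Checks section; `API_BASE`, the timeout, and the polling interval are arbitrary choices:

```python
import json
import time
import urllib.request

API_BASE = "http://localhost:8765"  # assumed default port from this README


def is_ready(health: dict) -> bool:
    """Interpret the /health payload shown in the Health Checks section."""
    return health.get("status") == "healthy" and bool(health.get("pipeline_ready"))


def wait_for_pipeline(timeout_s: float = 180.0, interval_s: float = 5.0) -> bool:
    """Poll /health until the pipeline reports ready or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{API_BASE}/health", timeout=5) as resp:
                if is_ready(json.load(resp)):
                    return True
        except OSError:
            pass  # connection refused while containers are still starting
        time.sleep(interval_s)
    return False
```

Call `wait_for_pipeline()` once after `docker-compose up -d` (or in the Discord bot's startup path) before issuing the first TTS request.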