add: absorb soprano_to_rvc as regular subdirectory
Voice conversion pipeline (Soprano TTS → RVC) with Docker support. Previously tracked as a bare gitlink; removed the nested .git/ directories and absorbed the tree into the main repo for unified tracking. Includes Soprano TTS, the RVC WebUI integration, Docker configs, the WebSocket API, and benchmark scripts. Updated .gitignore to exclude large model weights (*.pth, *.pt, *.onnx, *.index): 287 files added, with 3.1 GB of ML weights excluded.
soprano_to_rvc/DOCKER_SETUP.md (new file, 527 lines)
# Soprano + RVC Docker Setup

Docker containerization of the dual-GPU Soprano TTS + RVC voice conversion pipeline.

## Architecture

The system uses **two containers** for GPU runtime isolation:

```
HTTP Request → RVC Container (AMD RX 6800 + ROCm)
                 ↓ ZMQ (tcp://soprano:5555)
               Soprano Container (NVIDIA GTX 1660 + CUDA)
                 ↓ Audio data
               RVC Processing
                 ↓
               HTTP Stream Response
```

### Why Two Containers?

**CUDA and ROCm runtimes cannot coexist in a single container** due to conflicting driver libraries and system dependencies. The dual-container approach provides:

- Clean GPU runtime separation (CUDA vs ROCm)
- Independent scaling and resource management
- Minimal latency overhead (~1-5ms of Docker networking, negligible next to ZMQ serialization)

## Container Details

### Soprano Container (`miku-soprano-tts`)
- **Base Image**: `nvidia/cuda:11.8.0-runtime-ubuntu22.04`
- **GPU**: NVIDIA GTX 1660 (CUDA 11.8)
- **Python**: 3.11
- **Runtime**: NVIDIA Docker runtime
- **Port**: 5555 (ZMQ, internal only)
- **Model**: ekwek1/soprano (installed from source)
- **Backend**: lmdeploy 0.11.1

### RVC Container (`miku-rvc-api`)
- **Base Image**: `rocm/pytorch:rocm6.2_ubuntu22.04_py3.10_pytorch_release_2.3.0`
- **GPU**: AMD RX 6800 (ROCm 6.2)
- **Python**: 3.10
- **Runtime**: Native Docker with ROCm device passthrough
- **Port**: 8765 (HTTP, exposed externally)
- **Model**: MikuAI_e210_s6300.pth
- **F0 Method**: rmvpe
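The two specs above map onto a compose file along these lines. This is a sketch only: the service and container names come from the container details above, but the exact keys, volumes, and healthchecks are left to the real `docker-compose.yml` in the repo.

```yaml
# Sketch — not the repo's actual compose file.
services:
  soprano:
    image: miku-soprano:latest
    container_name: miku-soprano-tts
    runtime: nvidia                      # NVIDIA Docker runtime
    environment:
      - NVIDIA_VISIBLE_DEVICES=1
    expose:
      - "5555"                           # ZMQ, internal only

  rvc:
    image: miku-rvc:latest
    container_name: miku-rvc-api
    devices:                             # ROCm device passthrough
      - /dev/kfd
      - /dev/dri
    environment:
      - ROCR_VISIBLE_DEVICES=0
      - HSA_OVERRIDE_GFX_VERSION=10.3.0
      - SOPRANO_SERVER=tcp://soprano:5555
    ports:
      - "8765:8765"                      # HTTP API, exposed externally
    depends_on:
      - soprano
```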
## Prerequisites

### Hardware Requirements
- **CPU**: Multi-core (AMD FX 6100 or better)
- **GPU 1**: NVIDIA GPU with CUDA support (tested: GTX 1660, 6GB VRAM)
- **GPU 2**: AMD GPU with ROCm support (tested: RX 6800, 16GB VRAM)
- **RAM**: 16GB minimum
- **Disk**: 20GB for containers and models

### Software Requirements
- Docker Engine 20.10+
- Docker Compose 1.29+
- NVIDIA Container Toolkit (for the CUDA GPU)
- ROCm drivers installed on the host

### Install NVIDIA Container Toolkit
```bash
# Ubuntu/Debian
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
```

### Verify GPU Setup
```bash
# Test NVIDIA GPU
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

# Test AMD GPU
docker run --rm --device=/dev/kfd --device=/dev/dri rocm/pytorch:latest rocm-smi
```
## Configuration

### GPU Device IDs

The `docker-compose.yml` uses device IDs to assign GPUs. Verify your GPU order:

```bash
# NVIDIA GPUs
nvidia-smi -L
# Example output:
# GPU 0: NVIDIA GeForce GTX 1660 (UUID: GPU-...)
# GPU 1: NVIDIA GeForce RTX 3090 (UUID: GPU-...)

# AMD GPUs
rocm-smi --showproductname
# Example output:
# GPU[0]: Navi 21 [Radeon RX 6800]
```

**Update device IDs in `docker-compose.yml`:**

```yaml
services:
  soprano:
    environment:
      - NVIDIA_VISIBLE_DEVICES=1  # <-- Set your NVIDIA GPU ID

  rvc:
    environment:
      - ROCR_VISIBLE_DEVICES=0  # <-- Set your AMD GPU ID
```
### Model Files

Ensure the following files exist before building:

```
soprano_to_rvc/
├── soprano/                                  # Soprano source (git submodule or clone)
├── Retrieval-based-Voice-Conversion-WebUI/   # RVC WebUI
├── models/
│   ├── MikuAI_e210_s6300.pth                 # RVC voice model
│   └── added_IVF512_Flat_nprobe_1_MikuAI_v2.index  # RVC index
├── soprano_server.py                         # Soprano TTS server
└── soprano_rvc_api.py                        # RVC HTTP API
```
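The presence check can be scripted before building. A small pre-flight sketch (the path list mirrors the tree above; the `missing` helper is illustrative, not part of the repo):

```python
import os

# Paths mirror the layout above; run from inside soprano_to_rvc/.
REQUIRED = [
    "soprano",
    "Retrieval-based-Voice-Conversion-WebUI",
    "models/MikuAI_e210_s6300.pth",
    "models/added_IVF512_Flat_nprobe_1_MikuAI_v2.index",
    "soprano_server.py",
    "soprano_rvc_api.py",
]

def missing(paths, root="."):
    """Return the subset of paths that do not exist under root."""
    return [p for p in paths if not os.path.exists(os.path.join(root, p))]

if __name__ == "__main__":
    gone = missing(REQUIRED)
    if gone:
        print("Missing before build:", *gone, sep="\n  ")
    else:
        print("All model files and sources present.")
```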
### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `SOPRANO_SERVER` | `tcp://soprano:5555` | ZMQ endpoint for the Soprano container |
| `NVIDIA_VISIBLE_DEVICES` | `1` | NVIDIA GPU device ID |
| `ROCR_VISIBLE_DEVICES` | `0` | AMD GPU device ID |
| `HSA_OVERRIDE_GFX_VERSION` | `10.3.0` | ROCm architecture override for the RX 6800 |
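These can also be overridden per machine without editing the main compose file: Docker Compose automatically merges a `docker-compose.override.yml` placed next to it. A sketch (the values shown are examples for a host with a different GPU order):

```yaml
# docker-compose.override.yml — merged automatically by docker-compose.
services:
  soprano:
    environment:
      - NVIDIA_VISIBLE_DEVICES=0   # e.g. if the GTX 1660 is GPU 0 on this host
  rvc:
    environment:
      - ROCR_VISIBLE_DEVICES=1     # e.g. if the RX 6800 is GPU 1
      - HSA_OVERRIDE_GFX_VERSION=10.3.0
```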
## Build and Deploy

### Build Containers

```bash
cd soprano_to_rvc

# Build both containers
docker-compose build

# Or build individually
docker build -f Dockerfile.soprano -t miku-soprano:latest .
docker build -f Dockerfile.rvc -t miku-rvc:latest .
```

### Start Services

```bash
# Start in foreground (see logs)
docker-compose up

# Start in background
docker-compose up -d

# View logs
docker-compose logs -f

# View logs for a specific service
docker-compose logs -f soprano
docker-compose logs -f rvc
```

### Stop Services

```bash
# Stop containers
docker-compose down

# Stop and remove volumes
docker-compose down -v
```
## Testing

### Health Checks

Both containers have health checks that run automatically:

```bash
# Check container health
docker ps

# Manual health check
curl http://localhost:8765/health

# Expected response:
# {
#   "status": "healthy",
#   "soprano_connected": true,
#   "rvc_initialized": true,
#   "pipeline_ready": true
# }
```
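For scripted monitoring, the readiness decision can be made directly from that JSON. A sketch using the example response above (the field names are taken from the documented payload; the helper itself is illustrative):

```python
import json

def pipeline_ready(payload: str) -> bool:
    """True only when every stage of the pipeline reports healthy."""
    data = json.loads(payload)
    return (data.get("status") == "healthy"
            and data.get("soprano_connected") is True
            and data.get("rvc_initialized") is True
            and data.get("pipeline_ready") is True)

# Example response from the health check above:
healthy = ('{"status": "healthy", "soprano_connected": true, '
           '"rvc_initialized": true, "pipeline_ready": true}')
print(pipeline_ready(healthy))  # True
```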
### API Test

```bash
# Test full pipeline (TTS + RVC)
curl -X POST http://localhost:8765/api/speak \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, I am Miku!"}' \
  -o test_output.wav

# Play the audio
ffplay test_output.wav
```

### Test Soprano Only

```bash
# Test Soprano without RVC
curl -X POST http://localhost:8765/api/speak_soprano_only \
  -H "Content-Type: application/json" \
  -d '{"text": "Testing Soprano TTS"}' \
  -o soprano_test.wav
```

### View Status

```bash
# Pipeline status
curl http://localhost:8765/api/status

# Returns:
# {
#   "initialized": true,
#   "soprano_connected": true,
#   "config": { ... },
#   "timings": { ... }
# }
```
## Performance

### Expected Performance (from bare-metal testing)

| Metric | Value |
|--------|-------|
| Overall Realtime Factor | 0.95x (average) |
| Peak Performance | 1.12x realtime |
| Soprano (isolated) | 16.48x realtime |
| Soprano (via ZMQ) | ~7.10x realtime |
| RVC Processing | 166-196ms per 200ms block |
| ZMQ Transfer Overhead | ~0.7s for full audio |

**Performance Targets**:
- ✅ **Short texts (1-2 sentences)**: 1.00-1.12x realtime
- ✅ **Medium texts (3-5 sentences)**: 0.93-1.07x realtime
- ✅ **Long texts (>5 sentences)**: 1.01-1.12x realtime
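Realtime factor here is audio duration divided by wall-clock processing time, so values above 1.0x mean faster than realtime. A quick sanity check against the RVC row in the table above:

```python
def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    # >1.0 means the stage produces audio faster than it plays back.
    return audio_seconds / processing_seconds

# RVC processes a 200ms block in 166-196ms (table above):
fast = realtime_factor(0.200, 0.166)
slow = realtime_factor(0.200, 0.196)
print(f"RVC block RTF: {slow:.2f}x-{fast:.2f}x")
```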
### Monitor Performance

```bash
# GPU utilization
watch -n 1 nvidia-smi   # NVIDIA
watch -n 1 rocm-smi     # AMD

# Container resource usage
docker stats miku-soprano-tts miku-rvc-api

# View detailed logs
docker-compose logs -f --tail=100
```
## Troubleshooting

### Container Won't Start

**Soprano container fails**:
```bash
# Check NVIDIA runtime
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

# Check NVIDIA device ID
nvidia-smi -L

# Rebuild without cache
docker-compose build --no-cache soprano
```

**RVC container fails**:
```bash
# Check ROCm devices
ls -la /dev/kfd /dev/dri

# Check ROCm drivers
rocm-smi

# Check GPU architecture
rocm-smi --showproductname

# Rebuild without cache
docker-compose build --no-cache rvc
```

### Health Check Fails

**Soprano not responding**:
```bash
# Check Soprano logs
docker-compose logs soprano

# Common issues:
# - Model download in progress (first run takes time)
# - CUDA out of memory (GTX 1660 has 6GB)
# - lmdeploy compilation issues

# Restart container
docker-compose restart soprano
```

**RVC health check fails**:
```bash
# Check RVC logs
docker-compose logs rvc

# Test health endpoint manually
docker exec miku-rvc-api curl -f http://localhost:8765/health

# Common issues:
# - Soprano container not ready (wait for startup)
# - ZMQ connection timeout
# - ROCm initialization issues
```
### Performance Issues

**Slower than expected**:
```bash
# Check GPU utilization
docker exec miku-soprano-tts nvidia-smi
docker exec miku-rvc-api rocm-smi

# Check CPU usage
docker stats

# Common causes:
# - CPU bottleneck (rmvpe f0method is CPU-bound)
# - Cold start kernel compilation (first 5 jobs slow)
# - Insufficient GPU memory
```

**Out of memory**:
```bash
# Check VRAM usage
docker exec miku-soprano-tts nvidia-smi
docker exec miku-rvc-api rocm-smi --showmeminfo vram

# Solutions:
# - Reduce batch size in config
# - Restart containers to clear VRAM
# - Use a smaller model
```

### Network Issues

**ZMQ connection timeout**:
```bash
# Verify network connectivity
docker exec miku-rvc-api ping soprano

# Test ZMQ port
docker exec miku-rvc-api nc -zv soprano 5555

# Check network
docker network inspect miku-voice-network
```
### Logs and Debugging

Enable debug logs by adding the variable to `docker-compose.yml`:

```yaml
services:
  rvc:
    environment:
      - LOG_LEVEL=DEBUG
```

```bash
# Restart with debug logs
docker-compose down
docker-compose up

# Export logs
docker-compose logs > debug_logs.txt

# Shell into a container
docker exec -it miku-soprano-tts bash
docker exec -it miku-rvc-api bash
```
## Integration with Discord Bot

To integrate with the main Miku Discord bot:

1. **Add to the main docker-compose.yml**:

   ```yaml
   services:
     miku-voice:
       image: miku-rvc:latest
       container_name: miku-voice-api
       # ... copy config from soprano_to_rvc/docker-compose.yml

     miku-soprano:
       image: miku-soprano:latest
       # ...
   ```

2. **Update the bot code** to call the voice API:

   ```python
   import requests

   # In the voice channel handler
   response = requests.post(
       "http://miku-voice-api:8765/api/speak",
       json={"text": "Hello from Discord!"},
   )

   # Stream audio to the Discord voice channel
   audio_data = response.content
   # ... send to the Discord voice connection
   ```

3. **Configure the network**:

   ```yaml
   networks:
     miku-network:
       name: miku-internal
       driver: bridge
   ```
## Maintenance

### Update Soprano

```bash
# Pull latest Soprano source
cd soprano
git pull origin main
cd ..

# Rebuild container
docker-compose build --no-cache soprano
docker-compose up -d soprano
```

### Update RVC

```bash
# Update RVC WebUI
cd Retrieval-based-Voice-Conversion-WebUI
git pull origin main
cd ..

# Rebuild container
docker-compose build --no-cache rvc
docker-compose up -d rvc
```

### Clean Up

```bash
# Remove stopped containers
docker-compose down

# Remove images
docker rmi miku-soprano:latest miku-rvc:latest

# Clean Docker system
docker system prune -a
```
## Performance Tuning

### Optimize for Speed

Edit `soprano_rvc_config.json` (the `//` comments below are annotations only; JSON does not allow comments, so strip them before saving):

```json
{
  "block_time": 0.15,      // Smaller blocks (lower latency, more overhead)
  "crossfade_time": 0.03,  // Reduce crossfade
  "f0method": "rmvpe",     // Fast but CPU-bound
  "extra_time": 1.5        // Reduce context
}
```

### Optimize for Quality

```json
{
  "block_time": 0.25,      // Larger blocks (better quality)
  "crossfade_time": 0.08,  // Smoother transitions
  "f0method": "rmvpe",     // Best quality
  "extra_time": 2.0        // More context
}
```
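Smaller blocks mean more per-block overhead (crossfade, ZMQ round trips) for the same amount of audio; counting blocks makes the tradeoff concrete. The helper below is illustrative, not taken from the codebase:

```python
import math

def blocks_needed(audio_seconds: float, block_time: float) -> int:
    # Number of fixed-size blocks covering the audio (ceiling division);
    # each block carries its own crossfade and transfer overhead.
    return math.ceil(audio_seconds / block_time)

# 10 seconds of speech under the two configs above:
print(blocks_needed(10.0, 0.15))  # speed config   -> 67 blocks
print(blocks_needed(10.0, 0.25))  # quality config -> 40 blocks
```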
## Known Issues
|
||||
|
||||
1. **First 5 jobs slow**: ROCm MIOpen JIT kernel compilation causes initial slowdown. Subsequent jobs run at full speed.
|
||||
|
||||
2. **Cold start latency**: Container startup takes 60-120s for model loading and CUDA/ROCm initialization.
|
||||
|
||||
3. **Python path shadowing**: Soprano must be installed in editable mode with path fixes (handled in Dockerfile).
|
||||
|
||||
4. **CPU bottleneck**: rmvpe f0method is CPU-bound. Faster GPU methods require kernel compilation and may be slower.
|
||||
|
||||
## License

This setup uses:
- **Soprano TTS**: [ekwek1/soprano](https://github.com/ekwek1/soprano) - check the repo for its license
- **RVC WebUI**: [Retrieval-based-Voice-Conversion-WebUI](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI) - MIT License

## Credits

- Soprano TTS by ekwek1
- RVC (Retrieval-based Voice Conversion) by RVC-Project
- Implementation and dual-GPU architecture by the Miku Discord Bot team