add: absorb soprano_to_rvc as regular subdirectory

Voice conversion pipeline (Soprano TTS → RVC) with Docker support. Previously tracked as bare gitlink; removed .git/ directories and absorbed into main repo for unified tracking. Includes: Soprano TTS, RVC WebUI integration, Docker configs, WebSocket API, and benchmark scripts. Updated .gitignore to exclude large model weights (*.pth, *.pt, *.onnx, *.index). 287 files (3.1GB of ML weights properly excluded via gitignore).
2026-03-04 00:24:53 +02:00
parent 34b184a05a
commit 8ca716029e
287 changed files with 47102 additions and 0 deletions
--- a/soprano_to_rvc/DOCKER_QUICK_REF.md
+++ b/soprano_to_rvc/DOCKER_QUICK_REF.md
@@ -0,0 +1,200 @@
+# Docker Quick Reference
+
+## Quick Commands
+
+```bash
+# Build containers
+./build_docker.sh
+
+# Start services (waits until they are ready)
+./start_docker.sh
+
+# Start manually
+docker-compose up -d
+
+# Stop services
+docker-compose down
+
+# View logs
+docker-compose logs -f
+
+# Restart a service
+docker-compose restart soprano
+docker-compose restart rvc
+
+# Rebuild and restart
+docker-compose up -d --build
+```
+
+## Health & Status
+
+```bash
+# Health check
+curl http://localhost:8765/health
+
+# Pipeline status
+curl http://localhost:8765/api/status
+
+# Container status
+docker-compose ps
+
+# Resource usage
+docker stats miku-soprano-tts miku-rvc-api
+```
+
+## Testing
+
+```bash
+# Test full pipeline (TTS + RVC)
+curl -X POST http://localhost:8765/api/speak \
+  -H "Content-Type: application/json" \
+  -d '{"text": "Hello, I am Miku!"}' \
+  -o test.wav && ffplay test.wav
+
+# Test Soprano only
+curl -X POST http://localhost:8765/api/speak_soprano_only \
+  -H "Content-Type: application/json" \
+  -d '{"text": "Testing Soprano"}' \
+  -o soprano.wav && ffplay soprano.wav
+```
+
+## Debugging
+
+```bash
+# View logs
+docker-compose logs soprano  # Soprano TTS logs
+docker-compose logs rvc      # RVC API logs
+docker-compose logs -f       # Follow all logs
+
+# Shell into container
+docker exec -it miku-soprano-tts bash
+docker exec -it miku-rvc-api bash
+
+# Check GPU usage
+docker exec miku-soprano-tts nvidia-smi
+docker exec miku-rvc-api rocm-smi
+
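+# Test that the Soprano ZMQ port is reachable from the RVC container.
+# zmq's connect() is asynchronous and succeeds even when no peer is
+# listening, so a plain TCP connection is used as the reachability check.
+docker exec miku-rvc-api python3 -c "
+import socket
+with socket.create_connection(('soprano', 5555), timeout=5):
+    print('Soprano port 5555 is reachable')
+"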
+```
+
+## Configuration
+
+Edit `docker-compose.yml` to change GPU device IDs:
+
+```yaml
+services:
+  soprano:
+    environment:
+      - NVIDIA_VISIBLE_DEVICES=1  # Your NVIDIA GPU ID
+
+  rvc:
+    environment:
+      - ROCR_VISIBLE_DEVICES=0    # Your AMD GPU ID
+```
+
+## Performance Tips
+
+- **First run is slow**: ROCm compiles kernels during roughly the first 5 jobs
+- **Warmup helps**: Throughput improves once those first few jobs have run
+- **Monitor VRAM**:
+  - Soprano needs ~4GB (GTX 1660 has 6GB)
+  - RVC needs ~8GB (RX 6800 has 16GB)
+- **CPU bottleneck**: The `rmvpe` f0method is CPU-bound (~50% load on an FX 6100)
+
+## Troubleshooting
+
+| Issue | Solution |
+|-------|----------|
+| Container won't start | Check `docker-compose logs <service>` |
+| GPU not detected | Verify device IDs with `nvidia-smi -L` and `rocm-smi` |
+| Health check fails | Wait 60-120s for the models to load, then retry |
+| ZMQ timeout | Check network: `docker network inspect miku-voice-network` |
+| Out of memory | Restart containers or reduce batch size |
+| Slow performance | Check GPU usage with `nvidia-smi` / `rocm-smi` |
+
+## File Structure
+
+```
+soprano_to_rvc/
+├── docker-compose.yml           # Container orchestration
+├── Dockerfile.soprano           # Soprano container (CUDA)
+├── Dockerfile.rvc               # RVC container (ROCm)
+├── build_docker.sh             # Build script
+├── start_docker.sh             # Start script with health check
+├── soprano_server.py           # Soprano TTS server
+├── soprano_rvc_api.py          # RVC HTTP API
+├── soprano_rvc_config.json     # Pipeline configuration
+├── soprano/                    # Soprano source code
+├── Retrieval-based-Voice-Conversion-WebUI/  # RVC WebUI
+└── models/                     # Voice models
+    ├── MikuAI_e210_s6300.pth
+    └── added_IVF512_Flat_nprobe_1_MikuAI_v2.index
+```
+
+## Architecture Diagram
+
+```
+┌─────────────────────────────────────────────────────┐
+│ Client (HTTP POST /api/speak)                       │
+└────────────────────┬────────────────────────────────┘
+                     │
+                     ▼
+┌─────────────────────────────────────────────────────┐
+│ RVC Container (AMD RX 6800 + ROCm)                  │
+│ - soprano_rvc_api.py                                │
+│ - Port: 8765 (HTTP)                                 │
+│ - Python 3.10                                       │
+└────────────────────┬────────────────────────────────┘
+                     │ ZMQ (tcp://soprano:5555)
+                     ▼
+┌─────────────────────────────────────────────────────┐
+│ Soprano Container (NVIDIA GTX 1660 + CUDA)          │
+│ - soprano_server.py                                 │
+│ - Port: 5555 (ZMQ, internal)                        │
+│ - Python 3.11                                       │
+└────────────────────┬────────────────────────────────┘
+                     │ Audio data (JSON/base64)
+                     ▼
+┌─────────────────────────────────────────────────────┐
+│ RVC Processing                                      │
+│ - Voice conversion                                  │
+│ - 200ms blocks with 50ms crossfade                  │
+└────────────────────┬────────────────────────────────┘
+                     │
+                     ▼
+┌─────────────────────────────────────────────────────┐
+│ Client (HTTP Response with WAV audio)               │
+└─────────────────────────────────────────────────────┘
+```
+
+## Performance Metrics
+
+Measured on bare metal (Docker overhead is negligible):
+
+| Metric | Value |
+|--------|-------|
+| Overall Realtime Factor | 0.95x average |
+| Peak Performance | 1.12x realtime |
+| Soprano (isolated) | 16.48x realtime |
+| Soprano (via ZMQ) | ~7.10x realtime |
+| RVC Processing | 166-196ms per 200ms block |
+| ZMQ Transfer | ~0.7s for full audio |
+
+## Next Steps
+
+1. **Integration**: Add to main Miku bot `docker-compose.yml`
+2. **Testing**: Test with LLM streaming (`stream_llm_to_voice.py`)
+3. **Production**: Monitor performance and tune configuration
+4. **Scaling**: Consider horizontal scaling for multiple users
+
+## Support
+
+For detailed setup instructions, see [DOCKER_SETUP.md](DOCKER_SETUP.md).
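
Note on the Performance Metrics table: the file does not define "realtime factor". The sketch below assumes it means audio duration divided by processing time, so values above 1.0 are faster than real time. That reading fits the "16.48x realtime" figure for isolated Soprano, but it is an assumption. The helper name `realtime_factor` is hypothetical and not part of the repository. It converts the per-block RVC timings (166-196ms per 200ms block) into the same unit as the other rows.

```python
def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Audio duration divided by processing time (assumed definition)."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


BLOCK_SECONDS = 0.200  # block length quoted in the architecture diagram

# Per-block RVC timings taken from the metrics table.
rvc_best = realtime_factor(BLOCK_SECONDS, 0.166)
rvc_worst = realtime_factor(BLOCK_SECONDS, 0.196)

print(f"RVC alone: {rvc_worst:.2f}x to {rvc_best:.2f}x realtime")
```

Under that assumption, RVC on its own runs only slightly faster than real time. That would be consistent with the reported 0.95x overall average if the other stages, such as the ZMQ transfer, add overhead on top.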