add: absorb soprano_to_rvc as regular subdirectory
Voice conversion pipeline (Soprano TTS → RVC) with Docker support. Previously tracked as a bare gitlink; the nested .git/ directories were removed and the contents absorbed into the main repo for unified tracking. Includes: Soprano TTS, RVC WebUI integration, Docker configs, WebSocket API, and benchmark scripts. Updated .gitignore to exclude large model weights (*.pth, *.pt, *.onnx, *.index). 287 files added; 3.1GB of ML weights are excluded via .gitignore.
200
soprano_to_rvc/DOCKER_QUICK_REF.md
Normal file
@@ -0,0 +1,200 @@
# Docker Quick Reference

## Quick Commands

```bash
# Build containers
./build_docker.sh

# Start services (with auto-wait for ready)
./start_docker.sh

# Start manually
docker-compose up -d

# Stop services
docker-compose down

# View logs
docker-compose logs -f

# Restart a service
docker-compose restart soprano
docker-compose restart rvc

# Rebuild and restart
docker-compose up -d --build
```

## Health & Status

```bash
# Health check
curl http://localhost:8765/health

# Pipeline status
curl http://localhost:8765/api/status

# Container status
docker-compose ps

# Resource usage
docker stats miku-soprano-tts miku-rvc-api
```

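Because model loading can take 60-120s (see Troubleshooting), a script that depends on the pipeline may want to poll the health endpoint before sending requests. The following is a minimal Python sketch, assuming `/health` answers with HTTP 200 once the service is ready (the response body is not inspected). `http_ok`, `wait_for_health`, and their parameters are illustrative names, not part of this project.

```python
import time
import urllib.error
import urllib.request


def http_ok(url, timeout=5.0):
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_for_health(url="http://localhost:8765/health", deadline_s=120.0,
                    interval_s=2.0, probe=http_ok, sleep=time.sleep,
                    clock=time.monotonic):
    """Poll `probe(url)` until it succeeds or `deadline_s` elapses.

    `probe`, `sleep`, and `clock` are injectable so the loop can be
    exercised without a running container.
    """
    start = clock()
    while True:
        if probe(url):
            return True
        if clock() - start >= deadline_s:
            return False
        sleep(interval_s)
```

Calling `wait_for_health()` with the defaults waits up to 120 seconds and returns `True` as soon as the endpoint responds, or `False` if the deadline passes first.
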
## Testing

```bash
# Test full pipeline (TTS + RVC)
curl -X POST http://localhost:8765/api/speak \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, I am Miku!"}' \
  -o test.wav && ffplay test.wav

# Test Soprano only
curl -X POST http://localhost:8765/api/speak_soprano_only \
  -H "Content-Type: application/json" \
  -d '{"text": "Testing Soprano"}' \
  -o soprano.wav && ffplay soprano.wav
```

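The same requests can be sent from Python when the pipeline is called from a script rather than a shell. This is a rough sketch using only the standard library; it mirrors the curl examples (JSON body with a `text` field, audio bytes in the response body). The helper names `build_speak_request` and `speak_to_file` and the 120-second timeout are assumptions made for illustration.

```python
import json
import urllib.request


def build_speak_request(text, base_url="http://localhost:8765",
                        endpoint="/api/speak"):
    """Build the POST request that the curl examples send."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        base_url + endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def speak_to_file(text, out_path, timeout_s=120.0, **kwargs):
    """Send `text` to the pipeline and write the response body to `out_path`.

    Assumes the response body is the WAV audio, as in the curl examples.
    Returns the number of bytes written.
    """
    request = build_speak_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        audio = resp.read()
    with open(out_path, "wb") as fh:
        fh.write(audio)
    return len(audio)
```

For example, `speak_to_file("Hello, I am Miku!", "test.wav")` corresponds to the full-pipeline test, and passing `endpoint="/api/speak_soprano_only"` corresponds to the Soprano-only test.
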
## Debugging

```bash
# View logs
docker-compose logs soprano  # Soprano TTS logs
docker-compose logs rvc      # RVC API logs
docker-compose logs -f       # Follow all logs

# Shell into container
docker exec -it miku-soprano-tts bash
docker exec -it miku-rvc-api bash

# Check GPU usage
docker exec miku-soprano-tts nvidia-smi
docker exec miku-rvc-api rocm-smi

# Test ZMQ connection from RVC container
# Note: zmq connect() returns immediately without waiting for the peer, so this
# confirms pyzmq works and the endpoint is well-formed, not that Soprano is listening.
docker exec miku-rvc-api python3 -c "
import zmq
ctx = zmq.Context()
sock = ctx.socket(zmq.REQ)
sock.connect('tcp://soprano:5555')
print('ZMQ connect issued to tcp://soprano:5555')
sock.close()
"
```

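A ZMQ `connect()` call returns without waiting for the peer, so a plain TCP connect is a more direct way to confirm that something is listening on the Soprano port. The sketch below uses only the standard library; `tcp_port_open` is a hypothetical helper, and it checks reachability only, not that the listener speaks ZMQ.

```python
import socket


def tcp_port_open(host, port, timeout_s=3.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Run inside the RVC container, `tcp_port_open("soprano", 5555)` should return `True` when the Soprano server is up and both containers share the Docker network.
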
## Configuration

Edit `docker-compose.yml` to change GPU device IDs:

```yaml
services:
  soprano:
    environment:
      - NVIDIA_VISIBLE_DEVICES=1  # Your NVIDIA GPU ID

  rvc:
    environment:
      - ROCR_VISIBLE_DEVICES=0  # Your AMD GPU ID
```

## Performance Tips

- **First run is slow**: ROCm compiles kernels during roughly the first 5 jobs
- **Warmup helps**: The first few jobs may be slower; throughput improves afterwards
- **Monitor VRAM**:
  - Soprano needs ~4GB (GTX 1660 has 6GB)
  - RVC needs ~8GB (RX 6800 has 16GB)
- **CPU bottleneck**: The `rmvpe` f0 method is CPU-bound (~50% load on FX 6100)

## Troubleshooting

| Issue | Solution |
|-------|----------|
| Container won't start | Check `docker-compose logs <service>` |
| GPU not detected | Verify device IDs with `nvidia-smi -L` and `rocm-smi` |
| Health check fails | Wait 60-120s for model loading |
| ZMQ timeout | Check network: `docker network inspect miku-voice-network` |
| Out of memory | Restart containers or reduce batch size |
| Slow performance | Check GPU usage with `nvidia-smi` / `rocm-smi` |

## File Structure

```
soprano_to_rvc/
├── docker-compose.yml          # Container orchestration
├── Dockerfile.soprano          # Soprano container (CUDA)
├── Dockerfile.rvc              # RVC container (ROCm)
├── build_docker.sh             # Build script
├── start_docker.sh             # Start script with health check
├── soprano_server.py           # Soprano TTS server
├── soprano_rvc_api.py          # RVC HTTP API
├── soprano_rvc_config.json     # Pipeline configuration
├── soprano/                    # Soprano source code
├── Retrieval-based-Voice-Conversion-WebUI/  # RVC WebUI
└── models/                     # Voice models
    ├── MikuAI_e210_s6300.pth
    └── added_IVF512_Flat_nprobe_1_MikuAI_v2.index
```

## Architecture Diagram

```
┌─────────────────────────────────────────────────────┐
│  Client (HTTP POST /api/speak)                      │
└────────────────────┬────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────┐
│  RVC Container (AMD RX 6800 + ROCm)                 │
│  - soprano_rvc_api.py                               │
│  - Port: 8765 (HTTP)                                │
│  - Python 3.10                                      │
└────────────────────┬────────────────────────────────┘
                     │ ZMQ (tcp://soprano:5555)
                     ▼
┌─────────────────────────────────────────────────────┐
│  Soprano Container (NVIDIA GTX 1660 + CUDA)         │
│  - soprano_server.py                                │
│  - Port: 5555 (ZMQ, internal)                       │
│  - Python 3.11                                      │
└────────────────────┬────────────────────────────────┘
                     │ Audio data (JSON/base64)
                     ▼
┌─────────────────────────────────────────────────────┐
│  RVC Processing                                     │
│  - Voice conversion                                 │
│  - 200ms blocks with 50ms crossfade                 │
└────────────────────┬────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────┐
│  Client (HTTP Response with WAV audio)              │
└─────────────────────────────────────────────────────┘
```

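To illustrate what "200ms blocks with 50ms crossfade" can mean in practice, here is a small pure-Python sketch of joining converted blocks with a linear crossfade. It is not taken from `soprano_rvc_api.py`: the linear fade shape, the assumption that each block's first 50 ms overlaps the previous block's last 50 ms, the 48 kHz sample rate, and the name `crossfade_concat` are all assumptions made for illustration.

```python
def crossfade_concat(blocks, overlap):
    """Join audio blocks, linearly crossfading `overlap` samples at each seam.

    Each block is a list of float samples. The first `overlap` samples of a
    block are assumed to cover the same time span as the last `overlap`
    samples of the audio accumulated so far.
    """
    if not blocks:
        return []
    out = list(blocks[0])
    for block in blocks[1:]:
        if overlap == 0:
            out.extend(block)
            continue
        if len(block) < overlap or len(out) < overlap:
            raise ValueError("block shorter than overlap")
        tail_start = len(out) - overlap
        for i in range(overlap):
            # Weight of the incoming block ramps up across the seam.
            fade_in = (i + 1) / (overlap + 1)
            out[tail_start + i] = (
                (1.0 - fade_in) * out[tail_start + i] + fade_in * block[i]
            )
        # Samples after the overlap region are appended unchanged.
        out.extend(block[overlap:])
    return out


# Block and overlap sizes in samples at an assumed 48 kHz sample rate
# (the source does not state the rate).
SAMPLE_RATE = 48_000
BLOCK_SAMPLES = SAMPLE_RATE * 200 // 1000
OVERLAP_SAMPLES = SAMPLE_RATE * 50 // 1000
```

Blending the overlapping region instead of butting blocks together avoids an audible click at each block boundary, which is the usual reason for crossfading in block-based voice conversion.
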
## Performance Metrics

From bare-metal testing (Docker overhead is negligible):

| Metric | Value |
|--------|-------|
| Overall Realtime Factor | 0.95x average |
| Peak Performance | 1.12x realtime |
| Soprano (isolated) | 16.48x realtime |
| Soprano (via ZMQ) | ~7.10x realtime |
| RVC Processing | 166-196ms per 200ms block |
| ZMQ Transfer | ~0.7s for full audio |

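The RVC row can be put on the same scale as the other rows with a little arithmetic. The sketch below assumes that "Nx realtime" means audio duration divided by processing time, so values above 1.0 are faster than realtime; the table does not define the term, and `realtime_factor` is a name chosen for illustration.

```python
def realtime_factor(audio_seconds, processing_seconds):
    """Audio duration divided by processing time; above 1.0 is faster than realtime."""
    return audio_seconds / processing_seconds


BLOCK_S = 0.200  # 200 ms block, from the table
best = realtime_factor(BLOCK_S, 0.166)   # fastest reported block time
worst = realtime_factor(BLOCK_S, 0.196)  # slowest reported block time
print(f"RVC stage alone: {worst:.2f}x to {best:.2f}x realtime")
# prints "RVC stage alone: 1.02x to 1.20x realtime"
```

Under that definition the RVC stage has little headroom on its own, which is consistent with the overall figure sitting close to 1.0x once Soprano and the ZMQ transfer are added.
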
## Next Steps

1. **Integration**: Add to the main Miku bot `docker-compose.yml`
2. **Testing**: Test with LLM streaming (`stream_llm_to_voice.py`)
3. **Production**: Monitor performance and tune configuration
4. **Scaling**: Consider horizontal scaling for multiple users

## Support

For detailed setup instructions, see [DOCKER_SETUP.md](DOCKER_SETUP.md).