# Docker Quick Reference
## Quick Commands
```bash
# Build containers
./build_docker.sh

# Start services (with auto-wait for ready)
./start_docker.sh

# Start manually
docker-compose up -d

# Stop services
docker-compose down

# View logs
docker-compose logs -f

# Restart a service
docker-compose restart soprano
docker-compose restart rvc

# Rebuild and restart
docker-compose up -d --build
```
## Health & Status
```bash
# Health check
curl http://localhost:8765/health

# Pipeline status
curl http://localhost:8765/api/status

# Container status
docker-compose ps

# Resource usage
docker stats miku-soprano-tts miku-rvc-api
```
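`start_docker.sh` waits for the health endpoint before returning, and the same pattern is easy to reuse in your own scripts. A minimal sketch (the `wait_ready` helper is ours; only the `/health` endpoint comes from this guide):

```shell
# wait_ready: retry a command once per second until it succeeds
# or the given number of attempts is exhausted
wait_ready() {
  local tries=$1; shift
  local i
  for i in $(seq "$tries"); do
    "$@" && return 0
    sleep 1
  done
  return 1
}

# Block for up to ~120s until the pipeline reports healthy:
# wait_ready 120 curl -sf http://localhost:8765/health
```

`curl -sf` exits non-zero on connection errors and HTTP error statuses, so the loop only stops once the service actually answers.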
## Testing
```bash
# Test full pipeline (TTS + RVC)
curl -X POST http://localhost:8765/api/speak \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, I am Miku!"}' \
  -o test.wav && ffplay test.wav

# Test Soprano only
curl -X POST http://localhost:8765/api/speak_soprano_only \
  -H "Content-Type: application/json" \
  -d '{"text": "Testing Soprano"}' \
  -o soprano.wav && ffplay soprano.wav
```
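If a request fails, the `-o` file will contain the server's error body instead of audio. A quick sanity check that the response is really RIFF/WAVE data (the `is_wav` helper is ours, not part of the API):

```shell
# is_wav: check the first 12 bytes of a file for the RIFF/WAVE magic
is_wav() {
  head -c 12 "$1" | grep -q 'RIFF' && head -c 12 "$1" | grep -q 'WAVE'
}

# is_wav test.wav || cat test.wav   # on failure, show the error body
```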
## Debugging
```bash
# View logs
docker-compose logs soprano   # Soprano TTS logs
docker-compose logs rvc       # RVC API logs
docker-compose logs -f        # Follow all logs

# Shell into container
docker exec -it miku-soprano-tts bash
docker exec -it miku-rvc-api bash

# Check GPU usage
docker exec miku-soprano-tts nvidia-smi
docker exec miku-rvc-api rocm-smi

# Test ZMQ connection from RVC container
docker exec miku-rvc-api python3 -c "
import zmq
ctx = zmq.Context()
sock = ctx.socket(zmq.REQ)
# connect() is asynchronous: it checks that the 'soprano' hostname
# resolves, but does not guarantee the server is accepting requests
sock.connect('tcp://soprano:5555')
print('Socket connected to tcp://soprano:5555')
sock.close()
ctx.term()
"
```
## Configuration
Edit `docker-compose.yml` to change GPU device IDs:
```yaml
services:
  soprano:
    environment:
      - NVIDIA_VISIBLE_DEVICES=1  # Your NVIDIA GPU ID
  rvc:
    environment:
      - ROCR_VISIBLE_DEVICES=0  # Your AMD GPU ID
```
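If you would rather not edit the tracked file, the same values can be overridden in a `docker-compose.override.yml`, which `docker-compose` merges automatically on top of `docker-compose.yml`. A sketch, reusing the service and variable names from this guide:

```yaml
# docker-compose.override.yml — merged automatically over docker-compose.yml
services:
  soprano:
    environment:
      - NVIDIA_VISIBLE_DEVICES=0  # e.g. move Soprano to a different NVIDIA GPU
  rvc:
    environment:
      - ROCR_VISIBLE_DEVICES=1  # e.g. move RVC to a different AMD GPU
```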
## Performance Tips
- **First run is slow**: ROCm kernel compilation takes time (first 5 jobs)
- **Warmup helps**: the first few jobs run slower, then throughput stabilizes
- **Monitor VRAM**:
- Soprano needs ~4GB (GTX 1660 has 6GB)
- RVC needs ~8GB (RX 6800 has 16GB)
- **CPU bottleneck**: rmvpe f0method is CPU-bound (~50% load on FX 6100)
## Troubleshooting
| Issue | Solution |
|-------|----------|
| Container won't start | Check `docker-compose logs <service>` |
| GPU not detected | Verify device IDs with `nvidia-smi -L` and `rocm-smi` |
| Health check fails | Wait 60-120s for model loading |
| ZMQ timeout | Check network: `docker network inspect miku-voice-network` |
| Out of memory | Restart containers or reduce batch size |
| Slow performance | Check GPU usage with `nvidia-smi` / `rocm-smi` |
## File Structure
```
soprano_to_rvc/
├── docker-compose.yml # Container orchestration
├── Dockerfile.soprano # Soprano container (CUDA)
├── Dockerfile.rvc # RVC container (ROCm)
├── build_docker.sh # Build script
├── start_docker.sh # Start script with health check
├── soprano_server.py # Soprano TTS server
├── soprano_rvc_api.py # RVC HTTP API
├── soprano_rvc_config.json # Pipeline configuration
├── soprano/ # Soprano source code
├── Retrieval-based-Voice-Conversion-WebUI/ # RVC WebUI
└── models/ # Voice models
├── MikuAI_e210_s6300.pth
└── added_IVF512_Flat_nprobe_1_MikuAI_v2.index
```
## Architecture Diagram
```
┌─────────────────────────────────────────────────────┐
│ Client (HTTP POST /api/speak)                       │
└────────────────────┬────────────────────────────────┘
                     ▼
┌─────────────────────────────────────────────────────┐
│ RVC Container (AMD RX 6800 + ROCm)                  │
│ - soprano_rvc_api.py                                │
│ - Port: 8765 (HTTP)                                 │
│ - Python 3.10                                       │
└────────────────────┬────────────────────────────────┘
                     │ ZMQ (tcp://soprano:5555)
                     ▼
┌─────────────────────────────────────────────────────┐
│ Soprano Container (NVIDIA GTX 1660 + CUDA)          │
│ - soprano_server.py                                 │
│ - Port: 5555 (ZMQ, internal)                        │
│ - Python 3.11                                       │
└────────────────────┬────────────────────────────────┘
                     │ Audio data (JSON/base64)
                     ▼
┌─────────────────────────────────────────────────────┐
│ RVC Processing                                      │
│ - Voice conversion                                  │
│ - 200ms blocks with 50ms crossfade                  │
└────────────────────┬────────────────────────────────┘
                     ▼
┌─────────────────────────────────────────────────────┐
│ Client (HTTP Response with WAV audio)               │
└─────────────────────────────────────────────────────┘
```
## Performance Metrics
From bare metal testing (Docker overhead is negligible):
| Metric | Value |
|--------|-------|
| Overall Realtime Factor | 0.95x average |
| Peak Performance | 1.12x realtime |
| Soprano (isolated) | 16.48x realtime |
| Soprano (via ZMQ) | ~7.10x realtime |
| RVC Processing | 166-196ms per 200ms block |
| ZMQ Transfer | ~0.7s for full audio |
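Realtime factor here means seconds of audio produced per second of wall-clock processing, so 0.95x is just under realtime. A one-liner for computing it from your own timings (the `rtf` helper name is ours):

```shell
# rtf: realtime factor = audio duration (s) / processing time (s)
rtf() { awk -v audio="$1" -v wall="$2" 'BEGIN { printf "%.2f\n", audio / wall }'; }

rtf 10 10.5   # 10s of audio in 10.5s of processing -> 0.95
```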
## Next Steps
1. **Integration**: Add to main Miku bot `docker-compose.yml`
2. **Testing**: Test with LLM streaming (`stream_llm_to_voice.py`)
3. **Production**: Monitor performance and tune configuration
4. **Scaling**: Consider horizontal scaling for multiple users
## Support
For detailed setup instructions, see [DOCKER_SETUP.md](DOCKER_SETUP.md)