# Docker Quick Reference

## Quick Commands

```bash
# Build containers
./build_docker.sh

# Start services (with auto-wait for ready)
./start_docker.sh

# Start manually
docker-compose up -d

# Stop services
docker-compose down

# View logs
docker-compose logs -f

# Restart a service
docker-compose restart soprano
docker-compose restart rvc

# Rebuild and restart
docker-compose up -d --build
```

## Health & Status

```bash
# Health check
curl http://localhost:8765/health

# Pipeline status
curl http://localhost:8765/api/status

# Container status
docker-compose ps

# Resource usage
docker stats miku-soprano-tts miku-rvc-api
```

## Testing

```bash
# Test full pipeline (TTS + RVC)
curl -X POST http://localhost:8765/api/speak \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, I am Miku!"}' \
  -o test.wav && ffplay test.wav

# Test Soprano only
curl -X POST http://localhost:8765/api/speak_soprano_only \
  -H "Content-Type: application/json" \
  -d '{"text": "Testing Soprano"}' \
  -o soprano.wav && ffplay soprano.wav
```

## Debugging

```bash
# View logs
docker-compose logs soprano   # Soprano TTS logs
docker-compose logs rvc       # RVC API logs
docker-compose logs -f        # Follow all logs

# Shell into container
docker exec -it miku-soprano-tts bash
docker exec -it miku-rvc-api bash

# Check GPU usage
docker exec miku-soprano-tts nvidia-smi
docker exec miku-rvc-api rocm-smi

# Test ZMQ connection from RVC container
docker exec miku-rvc-api python3 -c "
import zmq
ctx = zmq.Context()
sock = ctx.socket(zmq.REQ)
sock.connect('tcp://soprano:5555')
print('Connected to Soprano!')
sock.close()
"
```

## Configuration

Edit `docker-compose.yml` to change GPU device IDs:

```yaml
services:
  soprano:
    environment:
      - NVIDIA_VISIBLE_DEVICES=1  # Your NVIDIA GPU ID
  rvc:
    environment:
      - ROCR_VISIBLE_DEVICES=0    # Your AMD GPU ID
```

## Performance Tips

- **First run is slow**: ROCm kernel compilation takes time (first 5 jobs)
- **Warmup helps**: the first few jobs may be slower before throughput stabilizes
- **Monitor VRAM**:
  - Soprano needs ~4GB (GTX 1660 has 6GB)
  - RVC needs ~8GB (RX 6800 has 16GB)
- **CPU bottleneck**: the rmvpe f0 method is CPU-bound (~50% load on an FX 6100)

## Troubleshooting

| Issue | Solution |
|-------|----------|
| Container won't start | Check `docker-compose logs <service>` |
| GPU not detected | Verify device IDs with `nvidia-smi -L` and `rocm-smi` |
| Health check fails | Wait 60-120s for model loading |
| ZMQ timeout | Check network: `docker network inspect miku-voice-network` |
| Out of memory | Restart containers or reduce batch size |
| Slow performance | Check GPU usage with `nvidia-smi` / `rocm-smi` |

## File Structure

```
soprano_to_rvc/
├── docker-compose.yml        # Container orchestration
├── Dockerfile.soprano        # Soprano container (CUDA)
├── Dockerfile.rvc            # RVC container (ROCm)
├── build_docker.sh           # Build script
├── start_docker.sh           # Start script with health check
├── soprano_server.py         # Soprano TTS server
├── soprano_rvc_api.py        # RVC HTTP API
├── soprano_rvc_config.json   # Pipeline configuration
├── soprano/                  # Soprano source code
├── Retrieval-based-Voice-Conversion-WebUI/  # RVC WebUI
└── models/                   # Voice models
    ├── MikuAI_e210_s6300.pth
    └── added_IVF512_Flat_nprobe_1_MikuAI_v2.index
```

## Architecture Diagram

```
┌─────────────────────────────────────────────────────┐
│ Client (HTTP POST /api/speak)                       │
└────────────────────┬────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────┐
│ RVC Container (AMD RX 6800 + ROCm)                  │
│ - soprano_rvc_api.py                                │
│ - Port: 8765 (HTTP)                                 │
│ - Python 3.10                                       │
└────────────────────┬────────────────────────────────┘
                     │ ZMQ (tcp://soprano:5555)
                     ▼
┌─────────────────────────────────────────────────────┐
│ Soprano Container (NVIDIA GTX 1660 + CUDA)          │
│ - soprano_server.py                                 │
│ - Port: 5555 (ZMQ, internal)                        │
│ - Python 3.11                                       │
└────────────────────┬────────────────────────────────┘
                     │ Audio data (JSON/base64)
                     ▼
┌─────────────────────────────────────────────────────┐
│ RVC Processing                                      │
│ - Voice conversion                                  │
│ - 200ms blocks with 50ms crossfade                  │
└────────────────────┬────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────┐
│ Client (HTTP Response with WAV audio)               │
└─────────────────────────────────────────────────────┘
```

## Performance Metrics

From bare-metal testing (Docker overhead is negligible):

| Metric | Value |
|--------|-------|
| Overall realtime factor | 0.95x average |
| Peak performance | 1.12x realtime |
| Soprano (isolated) | 16.48x realtime |
| Soprano (via ZMQ) | ~7.10x realtime |
| RVC processing | 166-196ms per 200ms block |
| ZMQ transfer | ~0.7s for full audio |

## Next Steps

1. **Integration**: Add to the main Miku bot `docker-compose.yml`
2. **Testing**: Test with LLM streaming (`stream_llm_to_voice.py`)
3. **Production**: Monitor performance and tune configuration
4. **Scaling**: Consider horizontal scaling for multiple users

## Support

For detailed setup instructions, see [DOCKER_SETUP.md](DOCKER_SETUP.md).
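The curl examples in the Testing section can also be driven from Python. Below is a minimal standard-library sketch, not the project's actual client code: the endpoint path, port, and JSON shape are taken from the curl examples above, while the 180s timeout is an assumption to cover model warmup.

```python
import json
import urllib.request

API_URL = "http://localhost:8765"  # host port published in docker-compose.yml


def build_speak_request(text: str, url: str = API_URL) -> urllib.request.Request:
    """Build the same POST /api/speak request that the curl example sends."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url + "/api/speak",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def speak_to_file(text: str, out_path: str = "test.wav") -> None:
    """Send text through the full TTS + RVC pipeline and save the returned WAV."""
    req = build_speak_request(text)
    # Generous timeout (assumed): first requests may hit model warmup.
    with urllib.request.urlopen(req, timeout=180) as resp:
        with open(out_path, "wb") as f:
            f.write(resp.read())
```

`speak_to_file("Hello, I am Miku!")` mirrors the first curl example, writing the converted audio to `test.wav`.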
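The "200ms blocks with 50ms crossfade" step in the architecture diagram can be sketched as a simple overlap-add stitch. This is an illustrative reconstruction, not the pipeline's actual code, and the 40 kHz sample rate is an assumption; the idea is that each boundary between converted blocks is blended over the fade window to hide block-edge discontinuities.

```python
import numpy as np

SR = 40000                # sample rate (assumed)
BLOCK = int(0.200 * SR)   # 200 ms block
FADE = int(0.050 * SR)    # 50 ms crossfade

def crossfade_concat(blocks: list[np.ndarray]) -> np.ndarray:
    """Stitch converted blocks, overlapping each boundary by FADE samples
    with a linear fade-out of the old block and fade-in of the new one."""
    out = blocks[0].astype(np.float32).copy()
    ramp = np.linspace(0.0, 1.0, FADE, dtype=np.float32)
    for blk in blocks[1:]:
        blk = blk.astype(np.float32)
        # Blend the tail of what we have with the head of the next block.
        out[-FADE:] = out[-FADE:] * (1.0 - ramp) + blk[:FADE] * ramp
        out = np.concatenate([out, blk[FADE:]])
    return out
```

Because the fade-out and fade-in ramps sum to 1, a constant signal passes through unchanged; each stitched block contributes `BLOCK - FADE` new samples of output.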
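The per-block RVC timings in the metrics table map directly onto realtime factor: a 200 ms block processed in 166-196 ms yields a per-block RTF of roughly 1.02x-1.20x, which is consistent with the ~0.95x overall average once TTS and ZMQ transfer overhead are added. A quick check of that arithmetic:

```python
BLOCK_MS = 200.0  # audio duration per block, from the metrics table

def realtime_factor(processing_ms: float) -> float:
    """Audio produced per unit of processing time; > 1.0 means faster than realtime."""
    return BLOCK_MS / processing_ms

best = realtime_factor(166.0)   # fastest observed block, ~1.20x
worst = realtime_factor(196.0)  # slowest observed block, ~1.02x
```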