# Docker Quick Reference

## Quick Commands

```bash
# Build containers
./build_docker.sh

# Start services (with auto-wait for ready)
./start_docker.sh

# Start manually
docker-compose up -d

# Stop services
docker-compose down

# View logs
docker-compose logs -f

# Restart a service
docker-compose restart soprano
docker-compose restart rvc

# Rebuild and restart
docker-compose up -d --build
```

## Health & Status

```bash
# Health check
curl http://localhost:8765/health

# Pipeline status
curl http://localhost:8765/api/status

# Container status
docker-compose ps

# Resource usage
docker stats miku-soprano-tts miku-rvc-api
```
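
The health check above can also be polled from a script while the models load. The following is a minimal sketch, not a description of what `start_docker.sh` does: it assumes `/health` answers with HTTP 200 once the pipeline is ready, and the names `http_ok` and `wait_until_healthy`, the 120-second deadline, and the 2-second interval are illustrative choices.

```python
import http.client
import time
import urllib.error
import urllib.request

HEALTH_URL = "http://localhost:8765/health"  # endpoint from the commands above


def http_ok(url, timeout=5.0):
    """Return True if a GET on url answers with HTTP 200 (assumed to mean ready)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or an HTTP error status: not ready yet.
        return False


def wait_until_healthy(url=HEALTH_URL, deadline_s=120.0, interval_s=2.0,
                       probe=http_ok, sleep=time.sleep, clock=time.monotonic):
    """Poll probe(url) until it succeeds or deadline_s seconds have elapsed."""
    start = clock()
    while True:
        if probe(url):
            return True
        if clock() - start >= deadline_s:
            return False
        sleep(interval_s)
```

Calling `wait_until_healthy()` returns `True` as soon as the endpoint responds and `False` if the deadline passes first. The `probe`, `sleep`, and `clock` parameters exist only so the loop can be exercised without a running container.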

## Testing

```bash
# Test full pipeline (TTS + RVC)
curl -X POST http://localhost:8765/api/speak \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, I am Miku!"}' \
  -o test.wav && ffplay test.wav

# Test Soprano only
curl -X POST http://localhost:8765/api/speak_soprano_only \
  -H "Content-Type: application/json" \
  -d '{"text": "Testing Soprano"}' \
  -o soprano.wav && ffplay soprano.wav
```
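
The same requests can be sent from Python using only the standard library. This is a rough sketch under the assumption that the response body is the raw WAV file, as the `-o test.wav` usage suggests. The helper names `build_speak_request` and `speak_to_file` and the 120-second timeout are hypothetical and not part of the project.

```python
import json
import urllib.request

API_BASE = "http://localhost:8765"  # host and port from the curl examples


def build_speak_request(text, endpoint="/api/speak", base=API_BASE):
    """Build the same JSON POST request that the curl command sends."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        base + endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def speak_to_file(text, path, endpoint="/api/speak", timeout=120.0):
    """Send text to the pipeline and write the response body to path."""
    req = build_speak_request(text, endpoint)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        audio = resp.read()  # assumed to be WAV bytes
    with open(path, "wb") as fh:
        fh.write(audio)
    return len(audio)
```

For example, `speak_to_file("Hello, I am Miku!", "test.wav")` mirrors the first curl command, and passing `endpoint="/api/speak_soprano_only"` mirrors the second.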

## Debugging

```bash
# View logs
docker-compose logs soprano  # Soprano TTS logs
docker-compose logs rvc      # RVC API logs
docker-compose logs -f       # Follow all logs

# Shell into container
docker exec -it miku-soprano-tts bash
docker exec -it miku-rvc-api bash

# Check GPU usage
docker exec miku-soprano-tts nvidia-smi
docker exec miku-rvc-api rocm-smi

# Test ZMQ connection from RVC container
docker exec miku-rvc-api python3 -c "
import zmq
ctx = zmq.Context()
sock = ctx.socket(zmq.REQ)
# connect() is asynchronous: it returns before any TCP handshake,
# so success here does not prove that Soprano is reachable.
sock.connect('tcp://soprano:5555')
print('ZMQ socket created and connect to soprano:5555 queued')
sock.close()
"
```
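
ZMQ's `connect()` does not wait for the peer, so a plain TCP probe is a more direct way to confirm that the Soprano port is reachable from the RVC container. This is a minimal sketch. The function name `can_reach` and the 3-second timeout are illustrative, and the probe only checks that the port accepts connections, not that the ZMQ server behind it is healthy.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Run inside the RVC container, `can_reach("soprano", 5555)` should return `True` when the Soprano container is up and both services share the Docker network.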

## Configuration

Edit `docker-compose.yml` to change GPU device IDs:

```yaml
services:
  soprano:
    environment:
      - NVIDIA_VISIBLE_DEVICES=1  # Your NVIDIA GPU ID

  rvc:
    environment:
      - ROCR_VISIBLE_DEVICES=0  # Your AMD GPU ID
```

## Performance Tips

- **First run is slow**: ROCm compiles its kernels on first use, which slows down roughly the first 5 jobs
- **Warmup helps**: The first few jobs may be slower; throughput improves once the pipeline is warm
- **Monitor VRAM**:
  - Soprano needs ~4GB (GTX 1660 has 6GB)
  - RVC needs ~8GB (RX 6800 has 16GB)
- **CPU bottleneck**: The `rmvpe` f0method is CPU-bound (~50% load on an FX 6100)

## Troubleshooting

| Issue | Solution |
|-------|----------|
| Container won't start | Check `docker-compose logs <service>` |
| GPU not detected | Verify device IDs with `nvidia-smi -L` and `rocm-smi` |
| Health check fails | Wait 60-120s for model loading |
| ZMQ timeout | Check network: `docker network inspect miku-voice-network` |
| Out of memory | Restart containers or reduce batch size |
| Slow performance | Check GPU usage with `nvidia-smi` / `rocm-smi` |

## File Structure

```
soprano_to_rvc/
├── docker-compose.yml                       # Container orchestration
├── Dockerfile.soprano                       # Soprano container (CUDA)
├── Dockerfile.rvc                           # RVC container (ROCm)
├── build_docker.sh                          # Build script
├── start_docker.sh                          # Start script with health check
├── soprano_server.py                        # Soprano TTS server
├── soprano_rvc_api.py                       # RVC HTTP API
├── soprano_rvc_config.json                  # Pipeline configuration
├── soprano/                                 # Soprano source code
├── Retrieval-based-Voice-Conversion-WebUI/  # RVC WebUI
└── models/                                  # Voice models
    ├── MikuAI_e210_s6300.pth
    └── added_IVF512_Flat_nprobe_1_MikuAI_v2.index
```

## Architecture Diagram

```
┌─────────────────────────────────────────────────────┐
│  Client (HTTP POST /api/speak)                      │
└────────────────────┬────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────┐
│  RVC Container (AMD RX 6800 + ROCm)                 │
│  - soprano_rvc_api.py                               │
│  - Port: 8765 (HTTP)                                │
│  - Python 3.10                                      │
└────────────────────┬────────────────────────────────┘
                     │ ZMQ (tcp://soprano:5555)
                     ▼
┌─────────────────────────────────────────────────────┐
│  Soprano Container (NVIDIA GTX 1660 + CUDA)         │
│  - soprano_server.py                                │
│  - Port: 5555 (ZMQ, internal)                       │
│  - Python 3.11                                      │
└────────────────────┬────────────────────────────────┘
                     │ Audio data (JSON/base64)
                     ▼
┌─────────────────────────────────────────────────────┐
│  RVC Processing                                     │
│  - Voice conversion                                 │
│  - 200ms blocks with 50ms crossfade                 │
└────────────────────┬────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────┐
│  Client (HTTP Response with WAV audio)              │
└─────────────────────────────────────────────────────┘
```
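
The diagram mentions 200ms blocks joined with a 50ms crossfade but does not say how the pipeline implements it. The sketch below illustrates the general idea with a linear crossfade over plain Python lists. The function names, the linear fade curve, and the 48000 Hz sample rate in the usage note are assumptions for illustration, not details taken from `soprano_rvc_api.py`.

```python
def ms_to_samples(ms, sample_rate):
    """Convert a duration in milliseconds to a whole number of samples."""
    return sample_rate * ms // 1000


def crossfade_concat(blocks, overlap):
    """Join blocks, blending the first `overlap` samples of each block
    into the last `overlap` samples of the audio assembled so far."""
    if not blocks:
        return []
    out = list(blocks[0])
    for block in blocks[1:]:
        if overlap <= 0:
            out.extend(block)
            continue
        if overlap > len(block) or overlap > len(out):
            raise ValueError("overlap is longer than a block")
        tail_start = len(out) - overlap
        for i in range(overlap):
            # Linear fade: weight of the new block rises from near 0 to near 1.
            w = (i + 1) / (overlap + 1)
            out[tail_start + i] = out[tail_start + i] * (1.0 - w) + block[i] * w
        out.extend(block[overlap:])
    return out
```

At an assumed 48000 Hz, `ms_to_samples(200, 48000)` gives 9600 samples per block and `ms_to_samples(50, 48000)` gives a 2400-sample overlap. Because the two fade weights always sum to 1, a constant signal passes through the join unchanged.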

## Performance Metrics

From bare-metal testing (Docker overhead is negligible):

| Metric | Value |
|--------|-------|
| Overall Realtime Factor | 0.95x average |
| Peak Performance | 1.12x realtime |
| Soprano (isolated) | 16.48x realtime |
| Soprano (via ZMQ) | ~7.10x realtime |
| RVC Processing | 166-196ms per 200ms block |
| ZMQ Transfer | ~0.7s for full audio |
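
To relate the per-block timings to the realtime factors in the table, the arithmetic can be sketched as follows. This assumes "Nx realtime" means audio duration divided by processing time, so values above 1.0 are faster than playback. The table does not define the term, and the function names are illustrative.

```python
def realtime_factor(audio_ms, processing_ms):
    """Audio duration divided by processing time (assumed definition)."""
    if processing_ms <= 0:
        raise ValueError("processing_ms must be positive")
    return audio_ms / processing_ms


def processing_seconds(audio_s, factor):
    """Time needed to produce audio_s seconds of audio at a given factor."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return audio_s / factor


BLOCK_MS = 200  # block length from the table

# RVC stage alone, using the 166-196ms per-block range from the table
rvc_best = realtime_factor(BLOCK_MS, 166)   # about 1.20x
rvc_worst = realtime_factor(BLOCK_MS, 196)  # about 1.02x

# At the 0.95x overall average, 10s of audio takes about 10.5s to produce
ten_seconds_cost = processing_seconds(10.0, 0.95)
```

Under this reading, the RVC stage on its own runs slightly faster than realtime, while the 0.95x overall average means the full pipeline falls slightly behind playback.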

## Next Steps

1. **Integration**: Add to main Miku bot `docker-compose.yml`
2. **Testing**: Test with LLM streaming (`stream_llm_to_voice.py`)
3. **Production**: Monitor performance and tune configuration
4. **Scaling**: Consider horizontal scaling for multiple users

## Support

For detailed setup instructions, see [DOCKER_SETUP.md](DOCKER_SETUP.md).