soprano_to_rvc/DOCKER_QUICK_REF.md

# Docker Quick Reference

## Quick Commands

```bash
# Build containers
./build_docker.sh

# Start services (with auto-wait for ready)
./start_docker.sh

# Start manually
docker-compose up -d

# Stop services
docker-compose down

# View logs
docker-compose logs -f

# Restart a service
docker-compose restart soprano
docker-compose restart rvc

# Rebuild and restart
docker-compose up -d --build
```

## Health & Status

```bash
# Health check
curl http://localhost:8765/health

# Pipeline status
curl http://localhost:8765/api/status

# Container status
docker-compose ps

# Resource usage
docker stats miku-soprano-tts miku-rvc-api
```

## Testing

```bash
# Test full pipeline (TTS + RVC)
curl -X POST http://localhost:8765/api/speak \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, I am Miku!"}' \
  -o test.wav && ffplay test.wav

# Test Soprano only
curl -X POST http://localhost:8765/api/speak_soprano_only \
  -H "Content-Type: application/json" \
  -d '{"text": "Testing Soprano"}' \
  -o soprano.wav && ffplay soprano.wav
```
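
For scripted tests, the same request can be sent from Python. The following is a minimal sketch using only the standard library; it assumes, as the curl example does, that `/api/speak` accepts a JSON body with a `text` field and returns the audio as the response body. The helper names `build_speak_request` and `speak_to_file` are hypothetical and not part of the project.

```python
import json
import urllib.request

API_URL = "http://localhost:8765/api/speak"  # endpoint taken from the curl example


def build_speak_request(text: str, url: str = API_URL) -> urllib.request.Request:
    """Build the same POST request that the curl example sends."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def speak_to_file(text: str, path: str, timeout: float = 120.0) -> int:
    """Send text to the pipeline and write the response body to `path`.

    Returns the number of bytes written. urlopen raises HTTPError on an
    error status, so an error body is never saved as a .wav file.
    """
    request = build_speak_request(text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(path, "wb") as handle:
        handle.write(audio)
    return len(audio)
```

With the containers running, `speak_to_file("Hello, I am Miku!", "test.wav")` should produce the same file as the first curl command. The 120-second default timeout is an arbitrary choice meant to leave room for slow first runs.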

## Debugging

```bash
# View logs
docker-compose logs soprano  # Soprano TTS logs
docker-compose logs rvc      # RVC API logs
docker-compose logs -f       # Follow all logs

# Shell into container
docker exec -it miku-soprano-tts bash
docker exec -it miku-rvc-api bash

# Check GPU usage
docker exec miku-soprano-tts nvidia-smi
docker exec miku-rvc-api rocm-smi

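# Test that the Soprano ZMQ port is reachable from the RVC container.
# ZMQ connect() returns immediately even if nothing is listening,
# so a plain TCP connect gives a real pass/fail result.
docker exec miku-rvc-api python3 -c "
import socket
socket.create_connection(('soprano', 5555), timeout=5).close()
print('Soprano port 5555 is reachable')
"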
```

## Configuration

Edit `docker-compose.yml` to change GPU device IDs:

```yaml
services:
  soprano:
    environment:
      - NVIDIA_VISIBLE_DEVICES=1  # Your NVIDIA GPU ID
      
  rvc:
    environment:
      - ROCR_VISIBLE_DEVICES=0    # Your AMD GPU ID
```
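
To find the right value for `NVIDIA_VISIBLE_DEVICES`, it can help to turn the output of `nvidia-smi -L` into an index-to-model mapping. The sketch below assumes the usual one-line-per-GPU format (`GPU <index>: <name> (UUID: <uuid>)`); the function name `parse_nvidia_smi_list` is hypothetical. Note that the index printed by `nvidia-smi` may not match the order seen by every runtime, so treat the result as a starting point.

```python
import re

# One line of `nvidia-smi -L` is expected to look like:
#   GPU 0: NVIDIA GeForce GTX 1660 (UUID: GPU-xxxxxxxx-...)
_GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: (.+)\)\s*$")


def parse_nvidia_smi_list(output: str) -> dict[int, str]:
    """Map GPU index -> model name from the text printed by `nvidia-smi -L`."""
    gpus = {}
    for line in output.splitlines():
        match = _GPU_LINE.match(line.strip())
        if match:
            gpus[int(match.group(1))] = match.group(2)
    return gpus
```

Feeding it the captured output, for example via `subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True).stdout`, gives a dictionary in which the key for the GTX 1660 is the ID to put in `docker-compose.yml`.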

## Performance Tips

- **First run is slow**: ROCm compiles its kernels on first use, so roughly the first 5 jobs take longer
- **Warmup helps**: The first few jobs may be slower; later jobs speed up once the kernels are compiled
- **Monitor VRAM**:
  - Soprano needs ~4GB (GTX 1660 has 6GB)
  - RVC needs ~8GB (RX 6800 has 16GB)
- **CPU bottleneck**: The `rmvpe` pitch-extraction method (`f0method`) is CPU-bound (~50% load on FX 6100)

## Troubleshooting

| Issue | Solution |
|-------|----------|
| Container won't start | Check `docker-compose logs <service>` |
| GPU not detected | Verify device IDs with `nvidia-smi -L` and `rocm-smi` |
| Health check fails | Wait 60-120s for model loading |
| ZMQ timeout | Check network: `docker network inspect miku-voice-network` |
| Out of memory | Restart containers or reduce batch size |
| Slow performance | Check GPU usage with `nvidia-smi` / `rocm-smi` |
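
Because the health check can fail for the first 60-120s while models load, a script that depends on the pipeline should poll rather than check once. The following is a minimal sketch of such a loop; it assumes `/health` answers with HTTP 200 once the service is ready, which this reference does not state explicitly. The function name `wait_for_health` is hypothetical.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(url: str = "http://localhost:8765/health",
                    timeout: float = 120.0,
                    interval: float = 2.0) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not ready yet: connection refused, timeout, or error status
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

`wait_for_health()` returns `True` as soon as the endpoint responds and `False` if the deadline passes first, at which point the container logs are the next place to look.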

## File Structure

```
soprano_to_rvc/
├── docker-compose.yml           # Container orchestration
├── Dockerfile.soprano           # Soprano container (CUDA)
├── Dockerfile.rvc               # RVC container (ROCm)
├── build_docker.sh             # Build script
├── start_docker.sh             # Start script with health check
├── soprano_server.py           # Soprano TTS server
├── soprano_rvc_api.py          # RVC HTTP API
├── soprano_rvc_config.json     # Pipeline configuration
├── soprano/                    # Soprano source code
├── Retrieval-based-Voice-Conversion-WebUI/  # RVC WebUI
└── models/                     # Voice models
    ├── MikuAI_e210_s6300.pth
    └── added_IVF512_Flat_nprobe_1_MikuAI_v2.index
```

## Architecture Diagram

```
┌─────────────────────────────────────────────────────┐
│ Client (HTTP POST /api/speak)                       │
└────────────────────┬────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────┐
│ RVC Container (AMD RX 6800 + ROCm)                  │
│ - soprano_rvc_api.py                                │
│ - Port: 8765 (HTTP)                                 │
│ - Python 3.10                                       │
└────────────────────┬────────────────────────────────┘
                     │ ZMQ (tcp://soprano:5555)
                     ▼
┌─────────────────────────────────────────────────────┐
│ Soprano Container (NVIDIA GTX 1660 + CUDA)          │
│ - soprano_server.py                                 │
│ - Port: 5555 (ZMQ, internal)                        │
│ - Python 3.11                                       │
└────────────────────┬────────────────────────────────┘
                     │ Audio data (JSON/base64)
                     ▼
┌─────────────────────────────────────────────────────┐
│ RVC Processing                                      │
│ - Voice conversion                                  │
│ - 200ms blocks with 50ms crossfade                  │
└────────────────────┬────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────┐
│ Client (HTTP Response with WAV audio)               │
└─────────────────────────────────────────────────────┘
```
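
The "200ms blocks with 50ms crossfade" step in the diagram can be illustrated with a short sketch. This is not the project's implementation: it assumes a linear crossfade, assumes consecutive blocks overlap by exactly the fade length, and uses a hypothetical sample rate, since none of these details are given here. The names `ms_to_samples` and `crossfade_concat` are made up for the example.

```python
def ms_to_samples(ms: float, sample_rate: int) -> int:
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(ms * sample_rate / 1000))


def crossfade_concat(blocks, fade_len):
    """Join blocks of float samples, blending `fade_len` samples at each seam.

    The tail of the audio so far fades out linearly while the head of the
    next block fades in, so the two weights always sum to 1.
    """
    out = []
    for block in blocks:
        n = min(fade_len, len(out), len(block))
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming block, between 0 and 1
            out[start + i] = out[start + i] * (1.0 - w) + block[i] * w
        out.extend(block[n:])
    return out


# Hypothetical sample rate; the real pipeline's rate is not stated here.
SAMPLE_RATE = 48000
block_len = ms_to_samples(200, SAMPLE_RATE)  # 9600 samples
fade_len = ms_to_samples(50, SAMPLE_RATE)    # 2400 samples
```

Blending the seams this way avoids audible clicks where independently converted blocks meet, at the cost of each block contributing slightly less than its full length to the output.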

## Performance Metrics

From bare metal testing (Docker overhead is negligible):

| Metric | Value |
|--------|-------|
| Overall Realtime Factor | 0.95x average |
| Peak Performance | 1.12x realtime |
| Soprano (isolated) | 16.48x realtime |
| Soprano (via ZMQ) | ~7.10x realtime |
| RVC Processing | 166-196ms per 200ms block |
| ZMQ Transfer | ~0.7s for full audio |
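
As a worked example of how these numbers relate, the per-block RVC timings can be converted to a realtime factor. The sketch assumes the factor is defined as audio duration divided by processing time, so values above 1 mean faster than realtime; that definition is inferred from the table, not stated in it.

```python
def realtime_factor(audio_ms: float, processing_ms: float) -> float:
    """Audio duration divided by processing time (assumed definition)."""
    return audio_ms / processing_ms


BLOCK_MS = 200.0
for processing_ms in (166.0, 196.0):
    factor = realtime_factor(BLOCK_MS, processing_ms)
    print(f"{processing_ms:.0f} ms per block -> {factor:.2f}x realtime")
# prints:
# 166 ms per block -> 1.20x realtime
# 196 ms per block -> 1.02x realtime
```

Under that assumption, the RVC stage on its own runs at roughly 1.02x to 1.20x realtime, which would be consistent with an overall average just below 1x once the other stages are included.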

## Next Steps

1. **Integration**: Add to main Miku bot `docker-compose.yml`
2. **Testing**: Test with LLM streaming (`stream_llm_to_voice.py`)
3. **Production**: Monitor performance and tune configuration
4. **Scaling**: Consider horizontal scaling for multiple users

## Support

For detailed setup instructions, see [DOCKER_SETUP.md](DOCKER_SETUP.md).