
Docker Quick Reference

Quick Commands

# Build containers
./build_docker.sh

# Start services (with auto-wait for ready)
./start_docker.sh

# Start manually
docker-compose up -d

# Stop services
docker-compose down

# View logs
docker-compose logs -f

# Restart a service
docker-compose restart soprano
docker-compose restart rvc

# Rebuild and restart
docker-compose up -d --build

Health & Status

# Health check
curl http://localhost:8765/health

# Pipeline status
curl http://localhost:8765/api/status

# Container status
docker-compose ps

# Resource usage
docker stats miku-soprano-tts miku-rvc-api

Testing

# Test full pipeline (TTS + RVC)
curl -X POST http://localhost:8765/api/speak \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, I am Miku!"}' \
  -o test.wav && ffplay test.wav

# Test Soprano only
curl -X POST http://localhost:8765/api/speak_soprano_only \
  -H "Content-Type: application/json" \
  -d '{"text": "Testing Soprano"}' \
  -o soprano.wav && ffplay soprano.wav
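The same endpoint can be called from Python with only the standard library; the URL and JSON payload below mirror the curl examples above:

```python
import json
import urllib.request

def build_speak_request(text, base_url="http://localhost:8765"):
    """Build a POST /api/speak request matching the curl example above."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/api/speak",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# With the containers running, fetch and save the WAV:
# with urllib.request.urlopen(build_speak_request("Hello, I am Miku!")) as resp:
#     open("test.wav", "wb").write(resp.read())
```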

Debugging

# View logs
docker-compose logs soprano  # Soprano TTS logs
docker-compose logs rvc      # RVC API logs
docker-compose logs -f       # Follow all logs

# Shell into container
docker exec -it miku-soprano-tts bash
docker exec -it miku-rvc-api bash

# Check GPU usage
docker exec miku-soprano-tts nvidia-smi
docker exec miku-rvc-api rocm-smi

# Test ZMQ connection from RVC container
# (zmq's connect() is asynchronous and "succeeds" even with no server
# listening, so probe the TCP port directly to verify reachability)
docker exec miku-rvc-api python3 -c "
import socket
s = socket.create_connection(('soprano', 5555), timeout=5)
print('Soprano ZMQ port reachable')
s.close()
"

Configuration

Edit docker-compose.yml to change GPU device IDs:

services:
  soprano:
    environment:
      - NVIDIA_VISIBLE_DEVICES=1  # Your NVIDIA GPU ID
      
  rvc:
    environment:
      - ROCR_VISIBLE_DEVICES=0    # Your AMD GPU ID

Performance Tips

  • First runs are slow: ROCm compiles its kernels on demand, so the first ~5 jobs take longer
  • Warmup helps: throughput improves once the kernels are cached
  • Monitor VRAM:
    • Soprano needs ~4GB (GTX 1660 has 6GB)
    • RVC needs ~8GB (RX 6800 has 16GB)
  • CPU bottleneck: the rmvpe f0 method is CPU-bound (~50% load on an FX 6100)
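RVC converts audio in 200 ms blocks joined with a 50 ms crossfade (see the architecture diagram). A toy linear crossfade illustrates the general technique; this is a sketch, not the pipeline's actual code:

```python
def crossfade(block_a, block_b, overlap):
    """Join two audio blocks, fading block_a out and block_b in over `overlap` samples."""
    out = list(block_a[:-overlap])
    for i in range(overlap):
        t = (i + 1) / (overlap + 1)  # ramp from ~0 to ~1 across the overlap
        out.append(block_a[len(block_a) - overlap + i] * (1 - t) + block_b[i] * t)
    out.extend(block_b[overlap:])
    return out

# At 16 kHz, a 200 ms block is 3200 samples and a 50 ms overlap is 800 samples.
```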

Troubleshooting

| Issue | Solution |
|---|---|
| Container won't start | Check docker-compose logs <service> |
| GPU not detected | Verify device IDs with nvidia-smi -L and rocm-smi |
| Health check fails | Wait 60-120s for model loading |
| ZMQ timeout | Check network: docker network inspect miku-voice-network |
| Out of memory | Restart containers or reduce batch size |
| Slow performance | Check GPU usage with nvidia-smi / rocm-smi |

File Structure

soprano_to_rvc/
├── docker-compose.yml           # Container orchestration
├── Dockerfile.soprano           # Soprano container (CUDA)
├── Dockerfile.rvc               # RVC container (ROCm)
├── build_docker.sh             # Build script
├── start_docker.sh             # Start script with health check
├── soprano_server.py           # Soprano TTS server
├── soprano_rvc_api.py          # RVC HTTP API
├── soprano_rvc_config.json     # Pipeline configuration
├── soprano/                    # Soprano source code
├── Retrieval-based-Voice-Conversion-WebUI/  # RVC WebUI
└── models/                     # Voice models
    ├── MikuAI_e210_s6300.pth
    └── added_IVF512_Flat_nprobe_1_MikuAI_v2.index

Architecture Diagram

┌─────────────────────────────────────────────────────┐
│ Client (HTTP POST /api/speak)                       │
└────────────────────┬────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────┐
│ RVC Container (AMD RX 6800 + ROCm)                  │
│ - soprano_rvc_api.py                                │
│ - Port: 8765 (HTTP)                                 │
│ - Python 3.10                                       │
└────────────────────┬────────────────────────────────┘
                     │ ZMQ (tcp://soprano:5555)
                     ▼
┌─────────────────────────────────────────────────────┐
│ Soprano Container (NVIDIA GTX 1660 + CUDA)          │
│ - soprano_server.py                                 │
│ - Port: 5555 (ZMQ, internal)                        │
│ - Python 3.11                                       │
└────────────────────┬────────────────────────────────┘
                     │ Audio data (JSON/base64)
                     ▼
┌─────────────────────────────────────────────────────┐
│ RVC Processing                                      │
│ - Voice conversion                                  │
│ - 200ms blocks with 50ms crossfade                  │
└────────────────────┬────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────┐
│ Client (HTTP Response with WAV audio)               │
└─────────────────────────────────────────────────────┘
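Per the diagram, audio crosses the ZMQ link as base64 inside JSON. A minimal sketch of that encoding step (the field name "audio" is illustrative; the actual schema lives in soprano_server.py):

```python
import base64
import json

def pack_audio(wav_bytes):
    # Wrap raw WAV bytes in a JSON envelope for transport over ZMQ
    return json.dumps({"audio": base64.b64encode(wav_bytes).decode("ascii")})

def unpack_audio(message):
    # Recover the original bytes on the receiving side
    return base64.b64decode(json.loads(message)["audio"])
```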

Performance Metrics

From bare-metal testing (Docker overhead is negligible):

| Metric | Value |
|---|---|
| Overall Realtime Factor | 0.95x average |
| Peak Performance | 1.12x realtime |
| Soprano (isolated) | 16.48x realtime |
| Soprano (via ZMQ) | ~7.10x realtime |
| RVC Processing | 166-196ms per 200ms block |
| ZMQ Transfer | ~0.7s for full audio |
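Realtime factor is audio duration divided by processing time; values above 1.0 mean a stage produces audio faster than it plays back. Checking the RVC block figures above:

```python
def realtime_factor(audio_seconds, processing_seconds):
    # RTF > 1.0: output is produced faster than it plays back
    return audio_seconds / processing_seconds

# A 200 ms block processed in 166-196 ms:
# realtime_factor(0.200, 0.196) -> ~1.02x
# realtime_factor(0.200, 0.166) -> ~1.20x
```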

Next Steps

  1. Integration: Add to main Miku bot docker-compose.yml
  2. Testing: Test with LLM streaming (stream_llm_to_voice.py)
  3. Production: Monitor performance and tune configuration
  4. Scaling: Consider horizontal scaling for multiple users

Support

For detailed setup instructions, see DOCKER_SETUP.md