Soprano + RVC Docker Setup

Docker containerization of the dual-GPU Soprano TTS + RVC voice conversion pipeline.

Architecture

The system uses two containers for GPU runtime isolation:

HTTP Request → RVC Container (AMD RX 6800 + ROCm)
                ↓ ZMQ (tcp://soprano:5555)
         Soprano Container (NVIDIA GTX 1660 + CUDA)
                ↓ Audio data
         RVC Processing
                ↓
         HTTP Stream Response

Why Two Containers?

CUDA and ROCm runtimes cannot coexist in a single container due to conflicting driver libraries and system dependencies. The dual-container approach provides:

  • Clean GPU runtime separation (CUDA vs ROCm)
  • Independent scaling and resource management
  • Minimal latency overhead (~1-5ms Docker networking, negligible compared to ZMQ serialization)

Container Details

Soprano Container (miku-soprano-tts)

  • Base Image: nvidia/cuda:11.8.0-runtime-ubuntu22.04
  • GPU: NVIDIA GTX 1660 (CUDA 11.8)
  • Python: 3.11
  • Runtime: NVIDIA Docker runtime
  • Port: 5555 (ZMQ, internal only)
  • Model: ekwek1/soprano (installed from source)
  • Backend: lmdeploy 0.11.1

RVC Container (miku-rvc-api)

  • Base Image: rocm/pytorch:rocm6.2_ubuntu22.04_py3.10_pytorch_release_2.3.0
  • GPU: AMD RX 6800 (ROCm 6.2)
  • Python: 3.10
  • Runtime: Native Docker with ROCm device passthrough
  • Port: 8765 (HTTP, exposed externally)
  • Model: MikuAI_e210_s6300.pth
  • F0 Method: rmvpe

Prerequisites

Hardware Requirements

  • CPU: Multi-core (AMD FX 6100 or better)
  • GPU 1: NVIDIA GPU with CUDA support (tested: GTX 1660, 6GB VRAM)
  • GPU 2: AMD GPU with ROCm support (tested: RX 6800, 16GB VRAM)
  • RAM: 16GB minimum
  • Disk: 20GB for containers and models

Software Requirements

  • Docker Engine 20.10+
  • Docker Compose 1.29+
  • NVIDIA Container Toolkit (for CUDA GPU)
  • ROCm drivers installed on host

Install NVIDIA Container Toolkit

# Ubuntu/Debian
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker

Verify GPU Setup

# Test NVIDIA GPU
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

# Test AMD GPU
docker run --rm --device=/dev/kfd --device=/dev/dri rocm/pytorch:latest rocm-smi

Configuration

GPU Device IDs

The docker-compose.yml uses device IDs to assign GPUs. Verify your GPU order:

# NVIDIA GPUs
nvidia-smi -L
# Example output:
# GPU 0: NVIDIA GeForce GTX 1660 (UUID: GPU-...)
# GPU 1: NVIDIA GeForce RTX 3090 (UUID: GPU-...)

# AMD GPUs
rocm-smi --showproductname
# Example output:
# GPU[0]: Navi 21 [Radeon RX 6800]

Update device IDs in docker-compose.yml:

services:
  soprano:
    environment:
      - NVIDIA_VISIBLE_DEVICES=1  # <-- Set your NVIDIA GPU ID
      
  rvc:
    environment:
      - ROCR_VISIBLE_DEVICES=0  # <-- Set your AMD GPU ID

Model Files

Ensure the following files exist before building:

soprano_to_rvc/
├── soprano/                          # Soprano source (git submodule or clone)
├── Retrieval-based-Voice-Conversion-WebUI/  # RVC WebUI
├── models/
│   ├── MikuAI_e210_s6300.pth        # RVC voice model
│   └── added_IVF512_Flat_nprobe_1_MikuAI_v2.index  # RVC index
├── soprano_server.py                # Soprano TTS server
└── soprano_rvc_api.py               # RVC HTTP API

Environment Variables

| Variable | Default | Description |
|---|---|---|
| SOPRANO_SERVER | tcp://soprano:5555 | ZMQ endpoint for Soprano container |
| NVIDIA_VISIBLE_DEVICES | 1 | NVIDIA GPU device ID |
| ROCR_VISIBLE_DEVICES | 0 | AMD GPU device ID |
| HSA_OVERRIDE_GFX_VERSION | 10.3.0 | ROCm architecture override for RX 6800 |
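
How a consumer of these variables might pick up the documented defaults can be sketched with the standard library (`load_gpu_env` is a hypothetical helper for illustration; the actual variable handling lives in soprano_rvc_api.py and docker-compose.yml):

```python
import os

def load_gpu_env() -> dict:
    """Read the pipeline's environment variables, falling back to the
    documented defaults (hypothetical helper; the real API may differ)."""
    return {
        "soprano_server": os.environ.get("SOPRANO_SERVER", "tcp://soprano:5555"),
        "nvidia_device": os.environ.get("NVIDIA_VISIBLE_DEVICES", "1"),
        "rocm_device": os.environ.get("ROCR_VISIBLE_DEVICES", "0"),
        "gfx_override": os.environ.get("HSA_OVERRIDE_GFX_VERSION", "10.3.0"),
    }
```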

Build and Deploy

Build Containers

cd soprano_to_rvc

# Build both containers
docker-compose build

# Or build individually
docker build -f Dockerfile.soprano -t miku-soprano:latest .
docker build -f Dockerfile.rvc -t miku-rvc:latest .

Start Services

# Start in foreground (see logs)
docker-compose up

# Start in background
docker-compose up -d

# View logs
docker-compose logs -f

# View logs for specific service
docker-compose logs -f soprano
docker-compose logs -f rvc

Stop Services

# Stop containers
docker-compose down

# Stop and remove volumes
docker-compose down -v

Testing

Health Checks

Both containers have health checks that run automatically:

# Check container health
docker ps

# Manual health check
curl http://localhost:8765/health

# Expected response:
# {
#   "status": "healthy",
#   "soprano_connected": true,
#   "rvc_initialized": true,
#   "pipeline_ready": true
# }
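
A small client-side readiness check over that response (a sketch; the field names come from the documented /health response above, and in practice you would fetch the body with curl or urllib and loop until this returns True):

```python
import json

def pipeline_ready(health_json: str) -> bool:
    """Return True only when every readiness flag in the /health
    response is set (field names from the documented response)."""
    payload = json.loads(health_json)
    return bool(
        payload.get("status") == "healthy"
        and payload.get("soprano_connected", False)
        and payload.get("rvc_initialized", False)
        and payload.get("pipeline_ready", False)
    )
```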

API Test

# Test full pipeline (TTS + RVC)
curl -X POST http://localhost:8765/api/speak \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, I am Miku!"}' \
  -o test_output.wav

# Play the audio
ffplay test_output.wav
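
The same request the curl call makes can be built with Python's standard library (a sketch mirroring the curl example; endpoint and payload shape as documented above):

```python
import json
import urllib.request

def build_speak_request(text: str,
                        url: str = "http://localhost:8765/api/speak") -> urllib.request.Request:
    """Construct the POST request used by the /api/speak example.
    Sending it with urllib.request.urlopen returns WAV bytes on success."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually run it against a live container, pass the result to `urllib.request.urlopen` and write the response body to a .wav file, as in the curl example.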

Test Soprano Only

# Test Soprano without RVC
curl -X POST http://localhost:8765/api/speak_soprano_only \
  -H "Content-Type: application/json" \
  -d '{"text": "Testing Soprano TTS"}' \
  -o soprano_test.wav

View Status

# Pipeline status
curl http://localhost:8765/api/status

# Returns:
# {
#   "initialized": true,
#   "soprano_connected": true,
#   "config": { ... },
#   "timings": { ... }
# }

Performance

Expected Performance (from bare metal testing)

| Metric | Value |
|---|---|
| Overall Realtime Factor | 0.95x (average) |
| Peak Performance | 1.12x realtime |
| Soprano (isolated) | 16.48x realtime |
| Soprano (via ZMQ) | ~7.10x realtime |
| RVC Processing | 166-196ms per 200ms block |
| ZMQ Transfer Overhead | ~0.7s for full audio |
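
Realtime factor here means audio duration divided by wall-clock processing time, so the per-block numbers in the table can be sanity-checked with a trivial helper (illustrative arithmetic only):

```python
def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Realtime factor: >1.0 means audio is produced faster than it plays."""
    return audio_seconds / processing_seconds

# A 200 ms block processed in 166-196 ms (RVC row above) stays near realtime:
# realtime_factor(0.200, 0.196) is roughly 1.02
```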

Performance Targets:

  • Short texts (1-2 sentences): 1.00-1.12x realtime
  • Medium texts (3-5 sentences): 0.93-1.07x realtime
  • Long texts (>5 sentences): 1.01-1.12x realtime

Monitor Performance

# GPU utilization
watch -n 1 nvidia-smi  # NVIDIA
watch -n 1 rocm-smi    # AMD

# Container resource usage
docker stats miku-soprano-tts miku-rvc-api

# View detailed logs
docker-compose logs -f --tail=100

Troubleshooting

Container Won't Start

Soprano container fails:

# Check NVIDIA runtime
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

# Check NVIDIA device ID
nvidia-smi -L

# Rebuild without cache
docker-compose build --no-cache soprano

RVC container fails:

# Check ROCm devices
ls -la /dev/kfd /dev/dri

# Check ROCm drivers
rocm-smi

# Check GPU architecture
rocm-smi --showproductname

# Rebuild without cache
docker-compose build --no-cache rvc

Health Check Fails

Soprano not responding:

# Check Soprano logs
docker-compose logs soprano

# Common issues:
# - Model download in progress (first run takes time)
# - CUDA out of memory (GTX 1660 has 6GB)
# - lmdeploy compilation issues

# Restart container
docker-compose restart soprano

RVC health check fails:

# Check RVC logs
docker-compose logs rvc

# Test health endpoint manually
docker exec miku-rvc-api curl -f http://localhost:8765/health

# Common issues:
# - Soprano container not ready (wait for startup)
# - ZMQ connection timeout
# - ROCm initialization issues

Performance Issues

Slower than expected:

# Check GPU utilization
docker exec miku-soprano-tts nvidia-smi
docker exec miku-rvc-api rocm-smi

# Check CPU usage
docker stats

# Common causes:
# - CPU bottleneck (rmvpe f0method is CPU-bound)
# - Cold start kernel compilation (first 5 jobs slow)
# - Insufficient GPU memory

Out of memory:

# Check VRAM usage
docker exec miku-soprano-tts nvidia-smi
docker exec miku-rvc-api rocm-smi --showmeminfo vram

# Solutions:
# - Reduce batch size in config
# - Restart containers to clear VRAM
# - Use smaller model

Network Issues

ZMQ connection timeout:

# Verify network connectivity
docker exec miku-rvc-api ping soprano

# Test ZMQ port
docker exec miku-rvc-api nc -zv soprano 5555

# Check network
docker network inspect miku-voice-network
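
If nc is not available inside the container, the same port check can be done from Python's standard library (a sketch; host and port taken from the ZMQ endpoint above — note this only verifies TCP reachability, not the ZMQ handshake itself):

```python
import socket

def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds,
    roughly equivalent to `nc -zv host port`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: port_reachable("soprano", 5555)
```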

Logs and Debugging

# Enable debug logs
# Edit docker-compose.yml, add:
services:
  rvc:
    environment:
      - LOG_LEVEL=DEBUG

# Restart with debug logs
docker-compose down
docker-compose up

# Export logs
docker-compose logs > debug_logs.txt

# Shell into container
docker exec -it miku-soprano-tts bash
docker exec -it miku-rvc-api bash

Integration with Discord Bot

To integrate with the main Miku Discord bot:

  1. Add to main docker-compose.yml:
services:
  miku-voice:
    image: miku-rvc:latest
    container_name: miku-voice-api
    # ... copy config from soprano_to_rvc/docker-compose.yml
    
  miku-soprano:
    image: miku-soprano:latest
    # ...
  2. Update bot code to call voice API:
import requests

# In voice channel handler
response = requests.post(
    "http://miku-voice-api:8765/api/speak",
    json={"text": "Hello from Discord!"},
    timeout=60,  # first request after a cold start can take a while
)
response.raise_for_status()

# Stream audio to Discord voice channel
audio_data = response.content
# ... send to Discord voice connection
  3. Configure network:
networks:
  miku-network:
    name: miku-internal
    driver: bridge

Maintenance

Update Soprano

# Pull latest Soprano source
cd soprano
git pull origin main
cd ..

# Rebuild container
docker-compose build --no-cache soprano
docker-compose up -d soprano

Update RVC

# Update RVC WebUI
cd Retrieval-based-Voice-Conversion-WebUI
git pull origin main
cd ..

# Rebuild container
docker-compose build --no-cache rvc
docker-compose up -d rvc

Clean Up

# Remove stopped containers
docker-compose down

# Remove images
docker rmi miku-soprano:latest miku-rvc:latest

# Clean Docker system
docker system prune -a

Performance Tuning

Optimize for Speed

Edit soprano_rvc_config.json:

{
  "block_time": 0.15,        // Smaller blocks (more overhead)
  "crossfade_time": 0.03,    // Reduce crossfade
  "f0method": "rmvpe",       // Fast but CPU-bound
  "extra_time": 1.5          // Reduce context
}

Optimize for Quality

{
  "block_time": 0.25,        // Larger blocks (better quality)
  "crossfade_time": 0.08,    // Smoother transitions
  "f0method": "rmvpe",       // Best quality
  "extra_time": 2.0          // More context
}
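
To reason about the block_time tradeoff: total per-utterance overhead scales with the number of blocks, which a quick estimate makes concrete (illustrative arithmetic; `block_time` is the field from the config above):

```python
import math

def block_count(audio_seconds: float, block_time: float) -> int:
    """Number of processing blocks for a clip at a given block_time."""
    return math.ceil(audio_seconds / block_time)

# For a 3 s clip: block_time=0.15 -> 20 blocks; block_time=0.25 -> 12 blocks.
# Fewer blocks means less fixed per-block overhead, at higher per-block latency.
```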

Known Issues

  1. First 5 jobs slow: ROCm MIOpen JIT kernel compilation causes initial slowdown. Subsequent jobs run at full speed.

  2. Cold start latency: Container startup takes 60-120s for model loading and CUDA/ROCm initialization.

  3. Python path shadowing: Soprano must be installed in editable mode with path fixes (handled in Dockerfile).

  4. CPU bottleneck: The rmvpe f0 method is CPU-bound. GPU-based f0 methods are nominally faster but require kernel compilation and can end up slower in practice.

License

This setup uses:

Credits

  • Soprano TTS by ekwek1
  • RVC (Retrieval-based Voice Conversion) by RVC-Project
  • Implementation and dual-GPU architecture by Miku Discord Bot team