miku-discord/soprano_to_rvc/RVC_CONTAINER_FIXES.md
koko210Serve 8ca716029e add: absorb soprano_to_rvc as regular subdirectory
Voice conversion pipeline (Soprano TTS → RVC) with Docker support.
Previously tracked as bare gitlink; removed .git/ directories and
absorbed into main repo for unified tracking.

Includes: Soprano TTS, RVC WebUI integration, Docker configs,
WebSocket API, and benchmark scripts.
Updated .gitignore to exclude large model weights (*.pth, *.pt, *.onnx, *.index).
287 files (3.1GB of ML weights properly excluded via gitignore).
2026-03-04 00:24:53 +02:00

RVC Container Build Fixes

Summary

Successfully built the RVC Docker container (63.6GB) on ROCm 6.4 with GPU support for the AMD RX 6800.

Critical Issues and Solutions

1. PyTorch Version Override

Problem: Installing the requirements with pip replaced the base image's ROCm build, torch 2.5.1+git8420923, with the CUDA build of torch 2.8.0.

Root Cause: The base image rocm/pytorch:rocm6.4_ubuntu22.04_py3.10_pytorch_release_2.5.1 ships a custom torch build that is not published on PyPI.

Solution: Create a constraints.txt that pins the exact versions shipped in the base image:

torch==2.5.1+git8420923
torchvision==0.20.1a0+04d8fc4
torchaudio
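
The pins above can also be generated from whatever the base image already has installed, rather than typed by hand. The following is a minimal sketch under that assumption; `build_constraints` and `write_constraints` are hypothetical helper names, not part of the repository, and the sketch relies on the base image's package metadata reporting the same version strings shown above.

```python
from importlib import metadata


def build_constraints(packages, version_of=metadata.version):
    """Return constraints.txt lines pinning each package to its installed version."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version_of(name)}")
        except metadata.PackageNotFoundError:
            # Not installed in this environment: leave it unconstrained.
            continue
    return lines


def write_constraints(path="constraints.txt", packages=("torch", "torchvision")):
    """Write the pins to `path`; meant to run inside the base image before pip install."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(build_constraints(packages)) + "\n")
```

Running `write_constraints()` inside the unmodified base image, before any requirements are installed, would capture the ROCm builds as they ship.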

2. Torchaudio Compatibility

Problem: The standard torchaudio 2.5.1 wheel requires the CUDA libraries and crashes with "libtorch_cuda.so not found".

Root Cause: The PyTorch wheel repository has no torchaudio 2.5.1+rocm6.4 build.

Solution: Install torchaudio 2.5.1+rocm6.2, which proved ABI-compatible with the ROCm 6.4 torch build in this container:

pip install --no-cache-dir torchaudio==2.5.1+rocm6.2 --index-url https://download.pytorch.org/whl/rocm6.2

3. scipy/numpy/numba Version Conflicts

Problem:

  • scipy 1.10.1 was installed alongside numpy 1.21.2 → ABI mismatch
  • numba required numpy <1.23, while scipy needs numpy >=1.19.5
  • Upgrading scipy on its own broke numba

Root Cause: requirements-rvc.txt pinned versions that came from different dependency resolutions and did not match each other

Solution: Force-install the matching version set taken from the bare-metal install:

pip install --no-cache-dir numpy==1.23.5 scipy==1.15.3 numba==0.56.4
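
Because a later pip command can silently move any of these three packages again, it may help to confirm the pins after the build. This is a small sketch, not something from the repository: `PINS` copies the versions from the command above, and `find_mismatches` is a hypothetical helper.

```python
from importlib import metadata

# Versions copied from the pip command above.
PINS = {"numpy": "1.23.5", "scipy": "1.15.3", "numba": "0.56.4"}


def find_mismatches(pins, version_of=metadata.version):
    """Return {package: (expected, found)} for every pin that is not satisfied.

    `found` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = version_of(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches
```

An empty dict from `find_mismatches(PINS)` inside the container means all three pins held.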

4. apex C++ Extension Incompatibility

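Problem: The apex fused_layer_norm_cuda extension failed to load with an undefined-symbol error (the mangled name below is c10::detail::torchInternalAssertFail, a libtorch internal):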

ImportError: undefined symbol: _ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

Root Cause: apex was compiled for PyTorch 2.3, so its C++ extension is binary-incompatible with 2.5.1

Solution: Remove apex (not needed for inference):

pip uninstall -y apex || true
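
To confirm the removal without triggering the broken extension, Python can be asked whether it can locate the package without executing it. A minimal sketch, with `is_importable` as a hypothetical helper name:

```python
import importlib.util


def is_importable(module_name):
    """True if Python can locate the top-level module, without executing it."""
    return importlib.util.find_spec(module_name) is not None
```

After the uninstall, `is_importable("apex")` should return `False` inside the container.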

Final Dockerfile RUN Command

# The parentheses scope "|| true" to the apex uninstall only; without them,
# a failed requirements install would be masked and the build would continue.
RUN pip install --no-cache-dir pip==24.0 && \
    pip install --no-cache-dir -c constraints.txt -r requirements-rvc.txt && \
    (pip uninstall -y apex || true) && \
    pip install --no-cache-dir torchaudio==2.5.1+rocm6.2 --index-url https://download.pytorch.org/whl/rocm6.2 && \
    pip install --no-cache-dir numpy==1.23.5 scipy==1.15.3 numba==0.56.4
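
Since the original failure was pip swapping the ROCm torch for a CUDA one, a quick post-build check of which backend the installed torch was compiled for can catch a regression early. The sketch below uses `torch.version.hip` and `torch.version.cuda`, which are `None` when the build does not target that backend; the function names are hypothetical.

```python
def classify_torch_build(hip_version, cuda_version):
    """Classify a torch build from the values of torch.version.hip and torch.version.cuda."""
    if hip_version:
        return "rocm"
    if cuda_version:
        return "cuda"
    return "cpu"


def installed_torch_build():
    """Inspect the torch installed in the current environment (imported lazily)."""
    import torch

    return classify_torch_build(torch.version.hip, torch.version.cuda)
```

Inside the finished container, `installed_torch_build()` should return `"rocm"`. ROCm builds still expose the GPU through the `torch.cuda` API, which is why the device is reported as `cuda:0`.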

Docker Compose Configuration

GPU Passthrough (ROCm)

rvc:
  devices:
    - /dev/kfd:/dev/kfd
    - /dev/dri:/dev/dri
  group_add:
    - "989"  # render group GID on this host (numeric, since the group name may not exist inside the container)
    - "985"  # video group GID on this host
  environment:
    - HSA_OVERRIDE_GFX_VERSION=10.3.0  # RX 6800 (gfx1030)
    - HSA_FORCE_FINE_GRAIN_PCIE=1
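
The numeric IDs in `group_add` are specific to the host, so they will likely differ on another machine. A small sketch for looking them up on the host (not in the container) follows; `group_add_entries` is a hypothetical helper, and it assumes the host has `render` and `video` groups.

```python
import grp


def _host_gid(group_name):
    """Look up a group's numeric ID in the local group database."""
    return grp.getgrnam(group_name).gr_gid


def group_add_entries(group_names=("render", "video"), gid_of=_host_gid):
    """Return quoted numeric GIDs suitable for a compose `group_add:` list."""
    return [f'"{gid_of(name)}"' for name in group_names]
```

On the host described here, `group_add_entries()` would be expected to produce the two quoted values used in the compose file.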

Verification

Successful Startup Logs

2026-01-15 20:07:41 | INFO | configs.config | Found GPU AMD Radeon RX 6800
2026-01-15 20:07:41 | INFO | configs.config | Half-precision floating-point: True, device: cuda:0
2026-01-15 20:07:41 | INFO | __main__ | ✓ Connected to Soprano server at tcp://soprano:5555
2026-01-15 20:07:49 | INFO | __main__ | ✓ RVC model loaded (version: v2, target SR: 48000Hz)
2026-01-15 20:07:49 | INFO | __main__ | ✓ Pipeline ready! API accepting requests on port 8765
INFO:     Uvicorn running on http://0.0.0.0:8765 (Press CTRL+C to quit)

Health Check

$ curl http://localhost:8765/health
{
  "status": "healthy",
  "soprano_connected": true,
  "rvc_initialized": true,
  "pipeline_ready": true
}
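
The same check can be scripted, for example as a container healthcheck or a deployment gate. This is a sketch that assumes the response keeps exactly the four fields shown above; `is_healthy` and `check_health` are hypothetical names.

```python
import json
from urllib.request import urlopen

REQUIRED_FLAGS = ("soprano_connected", "rvc_initialized", "pipeline_ready")


def is_healthy(payload):
    """True only if status is "healthy" and every readiness flag is exactly True."""
    return payload.get("status") == "healthy" and all(
        payload.get(flag) is True for flag in REQUIRED_FLAGS
    )


def check_health(url="http://localhost:8765/health", timeout=5.0):
    """Fetch the health endpoint and evaluate its JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return is_healthy(json.load(response))
```

Requiring every flag rather than only `status` guards against a partially initialized pipeline being treated as ready.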

Container Stats

  • Base Image: rocm/pytorch:rocm6.4_ubuntu22.04_py3.10_pytorch_release_2.5.1 (~12GB)
  • Final Size: 63.6GB
  • Python: 3.10.x
  • pip: 24.0
  • PyTorch: 2.5.1+git8420923 (ROCm 6.4)
  • Torchaudio: 2.5.1+rocm6.2
  • GPU: AMD RX 6800 (16GB VRAM, gfx1030)
  • Status: Healthy and working

Build Time

  • Multi-stage build: ~75 minutes
  • Applying a single fix command in the running container: ~2 minutes

Lessons Learned

  1. Base image PyTorch versions are sacred - don't let pip "upgrade" them
  2. Constraints files are essential for complex PyTorch environments
  3. ROCm versions don't have to match exactly - 6.2 torchaudio works with 6.4 torch
  4. apex is problematic - remove it when not needed
  5. Numeric group IDs are required for GPU device access in containers
  6. Manual fixes in a running container can identify solutions before committing to a long rebuild
  7. Multi-stage builds don't save much space when the base image is large

Next Steps

  • Test GPU performance (target: >0.9x realtime)
  • Verify end-to-end synthesis pipeline
  • Archive builder stage to /4TB/Docker/
  • Document complete deployment process
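
For the GPU performance target in the list above, one way to make ">0.9x realtime" measurable is to divide the duration of the generated audio by the wall-clock time spent producing it. That definition is an assumption (the document does not define the metric), and the function names below are hypothetical.

```python
def realtime_factor(audio_seconds, wall_seconds):
    """Seconds of audio produced per second of wall-clock time (assumed definition)."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


def meets_target(audio_seconds, wall_seconds, target=0.9):
    """True if the run exceeds the target factor (0.9 in the list above)."""
    return realtime_factor(audio_seconds, wall_seconds) > target
```

Under this definition, a factor of 1.0 means synthesis keeps pace with playback.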