koko210Serve 8ca716029e add: absorb soprano_to_rvc as regular subdirectory

Voice conversion pipeline (Soprano TTS → RVC) with Docker support.
Previously tracked as bare gitlink; removed .git/ directories and
absorbed into main repo for unified tracking.

Includes: Soprano TTS, RVC WebUI integration, Docker configs,
WebSocket API, and benchmark scripts.
Updated .gitignore to exclude large model weights (*.pth, *.pt, *.onnx, *.index).
287 files (3.1GB of ML weights properly excluded via gitignore).

2026-03-04 00:24:53 +02:00

RVC Container Build Fixes

Summary

Successfully built RVC Docker container (63.6GB) with AMD RX 6800 GPU support and ROCm 6.4.

Critical Issues and Solutions

1. PyTorch Version Override

Problem: Installing the requirements with pip replaced the base image's torch 2.5.1+git8420923 (ROCm) with torch 2.8.0 (CUDA).

Root Cause: The base image rocm/pytorch:rocm6.4_ubuntu22.04_py3.10_pytorch_release_2.5.1 ships a custom torch build that is not available on PyPI, so pip resolved torch from PyPI instead.

Solution: Create a constraints.txt that pins torch and torchvision to the exact builds in the base image:

torch==2.5.1+git8420923
torchvision==0.20.1a0+04d8fc4
torchaudio
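
To confirm that the pins survived a later pip install, the constraints file can be compared against what is actually installed. The following is a minimal sketch, not part of the original build: the helper names are hypothetical, and the parser assumes the file only contains `name==version` pins or bare names, as in the file above.

```python
from importlib import metadata


def parse_constraints(text):
    """Map package name -> pinned version (None when the line has no '==' pin)."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, sep, version = line.partition("==")
        pins[name.strip().lower()] = version.strip() if sep else None
    return pins


def installed_versions(names):
    """Look up installed versions; packages that are missing map to None."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_drift(pins, installed):
    """Return {name: (expected, actual)} for every pinned package that differs."""
    drift = {}
    for name, expected in pins.items():
        if expected is None:
            continue  # unpinned entries cannot drift
        actual = installed.get(name)
        if actual != expected:
            drift[name] = (expected, actual)
    return drift
```

Inside the container, `find_drift(pins, installed_versions(pins))` with `pins = parse_constraints(open("constraints.txt").read())` should return an empty dict if torch and torchvision were left alone.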

2. Torchaudio Compatibility

Problem: The standard torchaudio 2.5.1 wheel requires CUDA libraries and crashes with "libtorch_cuda.so not found".

Root Cause: The PyTorch wheel repository has no torchaudio 2.5.1 build for ROCm 6.4.

Solution: Install torchaudio 2.5.1+rocm6.2, which is ABI-compatible with the ROCm 6.4 torch build used here:

pip install --no-cache-dir torchaudio==2.5.1+rocm6.2 --index-url https://download.pytorch.org/whl/rocm6.2
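
A quick way to see which backend a wheel was built for is the local version tag after the `+`. The sketch below is illustrative only, with hypothetical helper names. It assumes the wheels come from an index that tags builds (for example `+rocm6.2` or `+cu121`); a version with no tag is inconclusive.

```python
def local_tag(version):
    """Return the part after '+' in a version string, or '' if there is none."""
    return version.partition("+")[2]


def looks_like_cuda_wheel(version):
    """True when the local tag starts with 'cu' (e.g. '2.5.1+cu121')."""
    return local_tag(version).startswith("cu")


def describe_torch_stack():
    """Report torch/torchaudio versions; returns None if either import fails."""
    try:
        import torch
        import torchaudio
    except (ImportError, OSError) as exc:  # OSError: a shared library failed to load
        print(f"import failed: {exc}")
        return None
    return {
        "torch": str(torch.__version__),
        "torchaudio": str(torchaudio.__version__),
        "hip": torch.version.hip,  # None on builds without ROCm/HIP
    }
```

Running `describe_torch_stack()` inside the container is a cheap smoke test: a non-None `hip` value and a torchaudio tag that is not a `cu` tag suggest that the ROCm builds are the ones in use.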

3. scipy/numpy/numba Version Conflicts

Problem:

scipy 1.10.1 installed with numpy 1.21.2 → ABI mismatch
numba required numpy <1.23, but scipy needs >=1.19.5
Upgrading scipy caused numba to break

Root Cause: requirements-rvc.txt pinned versions that came from different dependency resolutions and were not mutually compatible.

Solution: Force-install the version set already known to work on the bare-metal install:

pip install --no-cache-dir numpy==1.23.5 scipy==1.15.3 numba==0.56.4
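
Before picking a numpy version by hand, it can help to read the numpy range each installed package declares in its own metadata. This is a rough sketch with hypothetical helper names; it uses simple string handling on the `Requires-Dist` entries rather than a full requirement parser.

```python
from importlib import metadata


def numpy_specs(requirements):
    """Pick the entries of a Requires-Dist list that constrain numpy."""
    specs = []
    for req in requirements or []:
        head = req.split(";", 1)[0].strip()  # drop environment markers
        name = head
        for stop in "<>=!~ ([":
            name = name.split(stop, 1)[0]  # cut at the first specifier character
        if name.lower() == "numpy":
            specs.append(head)
    return specs


def declared_numpy_ranges(packages=("scipy", "numba")):
    """Map each package to its declared numpy specifiers (None if not installed)."""
    ranges = {}
    for pkg in packages:
        try:
            ranges[pkg] = numpy_specs(metadata.requires(pkg))
        except metadata.PackageNotFoundError:
            ranges[pkg] = None
    return ranges
```

Calling `declared_numpy_ranges()` in the container prints the ranges side by side, which makes it easier to see whether a single numpy version can satisfy both scipy and numba.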

4. apex C++ Extension Incompatibility

Problem: The apex fused_layer_norm_cuda extension failed to import with an undefined-symbol error:

ImportError: undefined symbol: _ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

Root Cause: apex was compiled against PyTorch 2.3, so its C++ extension does not load under PyTorch 2.5.1.

Solution: Remove apex (not needed for inference):

pip uninstall -y apex || true
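
Because the uninstall is allowed to fail silently, it is worth checking afterwards that apex is really gone. A minimal sketch (the helper name is hypothetical) that looks the module up without importing it:

```python
import importlib.util


def is_importable(module_name):
    """True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None
```

In the container, `is_importable("apex")` should return False after the uninstall step.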

Final Dockerfile RUN Command

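The apex uninstall is wrapped in parentheses so that || true only swallows a failed uninstall; without the grouping, a failed requirements install would also be masked and the build would continue.

RUN pip install --no-cache-dir pip==24.0 && \
    pip install --no-cache-dir -c constraints.txt -r requirements-rvc.txt && \
    (pip uninstall -y apex || true) && \
    pip install --no-cache-dir torchaudio==2.5.1+rocm6.2 --index-url https://download.pytorch.org/whl/rocm6.2 && \
    pip install --no-cache-dir numpy==1.23.5 scipy==1.15.3 numba==0.56.4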

Docker Compose Configuration

GPU Passthrough (ROCm)

rvc:
  devices:
    - /dev/kfd:/dev/kfd
    - /dev/dri:/dev/dri
  group_add:
    - "989"  # render group (numeric for container compatibility)
    - "985"  # video group
  environment:
    - HSA_OVERRIDE_GFX_VERSION=10.3.0  # RX 6800 (gfx1030)
    - HSA_FORCE_FINE_GRAIN_PCIE=1
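
The numeric IDs 989 and 985 come from the host used here and will likely differ on other machines. The sketch below, with hypothetical helper names, looks up the GIDs on the host and checks device access from inside the container. It assumes a Linux system (the `grp` module is Unix-only) and groups named `render` and `video`.

```python
import grp
import os


def gid_of(group_name):
    """Return the numeric GID for a group name, or None if the group is absent."""
    try:
        return grp.getgrnam(group_name).gr_gid
    except KeyError:
        return None


def device_access(paths=("/dev/kfd", "/dev/dri")):
    """Map each device node to True if it exists and is readable and writable."""
    report = {}
    for path in paths:
        if os.path.isdir(path):
            # For a directory such as /dev/dri, check the entries inside it
            targets = [os.path.join(path, n) for n in sorted(os.listdir(path))]
        else:
            targets = [path]
        for target in targets:
            report[target] = os.path.exists(target) and os.access(
                target, os.R_OK | os.W_OK
            )
    return report
```

On the host, `gid_of("render")` and `gid_of("video")` give the values to put under group_add. Inside the container, any False entry in `device_access()` points to a missing device mapping or group.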

Verification

Successful Startup Logs

2026-01-15 20:07:41 | INFO | configs.config | Found GPU AMD Radeon RX 6800
2026-01-15 20:07:41 | INFO | configs.config | Half-precision floating-point: True, device: cuda:0
2026-01-15 20:07:41 | INFO | __main__ | ✓ Connected to Soprano server at tcp://soprano:5555
2026-01-15 20:07:49 | INFO | __main__ | ✓ RVC model loaded (version: v2, target SR: 48000Hz)
2026-01-15 20:07:49 | INFO | __main__ | ✓ Pipeline ready! API accepting requests on port 8765
INFO:     Uvicorn running on http://0.0.0.0:8765 (Press CTRL+C to quit)

Health Check

$ curl http://localhost:8765/health
{
  "status": "healthy",
  "soprano_connected": true,
  "rvc_initialized": true,
  "pipeline_ready": true
}
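
The same check can be scripted, for example as a readiness probe. This is a small sketch using only the standard library. The function names are hypothetical, and it assumes the response always has the four fields shown above.

```python
import json
import urllib.request

REQUIRED_FLAGS = ("soprano_connected", "rvc_initialized", "pipeline_ready")


def is_ready(payload):
    """True only if status is 'healthy' and every required flag is true."""
    return payload.get("status") == "healthy" and all(
        payload.get(flag) is True for flag in REQUIRED_FLAGS
    )


def fetch_health(url="http://localhost:8765/health", timeout=5.0):
    """GET the health endpoint and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the container running, `is_ready(fetch_health())` returns True for the response shown above.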

Container Stats

Base Image: rocm/pytorch:rocm6.4_ubuntu22.04_py3.10_pytorch_release_2.5.1 (~12GB)
Final Size: 63.6GB
Python: 3.10.x
pip: 24.0
PyTorch: 2.5.1+git8420923 (ROCm 6.4)
Torchaudio: 2.5.1+rocm6.2
GPU: AMD RX 6800 (16GB VRAM, gfx1030)
Status: ✅ Healthy and working

Build Time

Multi-stage build: ~75 minutes
Single-command fixes in a running container: ~2 minutes

Lessons Learned

Base image PyTorch versions are sacred: don't let pip "upgrade" them.
Constraints files are essential for complex PyTorch environments.
ROCm versions don't always have to match: the rocm6.2 torchaudio wheel works with the ROCm 6.4 torch build.
apex is problematic: remove it when it isn't needed.
Numeric group IDs are required for GPU device access in containers.
Manual fixes in a running container can identify solutions before committing to a long rebuild.
Multi-stage builds don't save much space when the base image is large.

Next Steps

Test GPU performance (target: >0.9x realtime)
Verify end-to-end synthesis pipeline
Archive builder stage to /4TB/Docker/
Document complete deployment process
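
For the GPU performance item, one possible way to measure the target is sketched below. It assumes "realtime" means seconds of audio produced per second of wall-clock time, and that the synthesis callable (hypothetical here) returns a sequence of mono samples at the 48000 Hz target rate seen in the startup logs.

```python
import time


def realtime_factor(audio_seconds, wall_seconds):
    """Seconds of audio produced per second of processing (assumed definition)."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


def measure(synthesize, *args, sample_rate=48000):
    """Time one call to synthesize(*args) and return its realtime factor."""
    start = time.perf_counter()
    samples = synthesize(*args)  # assumed to return a sequence of mono samples
    elapsed = time.perf_counter() - start
    return realtime_factor(len(samples) / sample_rate, elapsed)
```

Under this definition the target is met when `measure(...)` returns a value above 0.9. Averaging over several runs after a warm-up call would give a steadier number.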
