# RVC Container Build Fixes

## Summary

Successfully built the RVC Docker container (63.6GB) with AMD RX 6800 GPU support and ROCm 6.4.

## Critical Issues and Solutions

### 1. PyTorch Version Override

**Problem**: Installing the requirements with pip upgraded torch 2.5.1+git8420923 (ROCm) to 2.8.0 (CUDA).

**Root Cause**: The base image `rocm/pytorch:rocm6.4_ubuntu22.04_py3.10_pytorch_release_2.5.1` ships a custom torch build that is not available on PyPI.

**Solution**: Created `constraints.txt` to pin the exact torch version:

```text
torch==2.5.1+git8420923
torchvision==0.20.1a0+04d8fc4
torchaudio
```

### 2. Torchaudio Compatibility

**Problem**: Standard torchaudio 2.5.1 requires CUDA libraries and crashes with "libtorch_cuda.so not found".

**Root Cause**: No torchaudio 2.5.1+rocm6.4 build is available in the PyTorch repository.

**Solution**: Install torchaudio 2.5.1+rocm6.2 (ABI compatible with ROCm 6.4):

```dockerfile
RUN pip install --no-cache-dir torchaudio==2.5.1+rocm6.2 --index-url https://download.pytorch.org/whl/rocm6.2
```

### 3. scipy/numpy/numba Version Conflicts

**Problem**:
- scipy 1.10.1 installed with numpy 1.21.2 → ABI mismatch
- numba required numpy <1.23, but scipy needs >=1.19.5
- Upgrading scipy caused numba to break

**Root Cause**: `requirements-rvc.txt` had mismatched versions that came from different dependency resolutions.

**Solution**: Force-install the matching versions used on the bare-metal setup:

```bash
pip install --no-cache-dir numpy==1.23.5 scipy==1.15.3 numba==0.56.4
```

### 4. apex C++ Extension Incompatibility

**Problem**: The apex `fused_layer_norm_cuda` extension failed with an undefined symbol error:

```
ImportError: undefined symbol: _ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
```

**Root Cause**: apex was compiled for PyTorch 2.3 and is incompatible with 2.5.1.

**Solution**: Remove apex (not needed for inference):

```dockerfile
RUN pip uninstall -y apex || true
```

## Final Dockerfile RUN Command

```dockerfile
# The parentheses limit `|| true` to the apex uninstall, so a failure in an
# earlier pip install still aborts the build instead of being masked.
RUN pip install --no-cache-dir pip==24.0 && \
    pip install --no-cache-dir -c constraints.txt -r requirements-rvc.txt && \
    (pip uninstall -y apex || true) && \
    pip install --no-cache-dir torchaudio==2.5.1+rocm6.2 --index-url https://download.pytorch.org/whl/rocm6.2 && \
    pip install --no-cache-dir numpy==1.23.5 scipy==1.15.3 numba==0.56.4
```

## Docker Compose Configuration

### GPU Passthrough (ROCm)

```yaml
rvc:
  devices:
    - /dev/kfd:/dev/kfd
    - /dev/dri:/dev/dri
  group_add:
    - "989"  # render group (numeric for container compatibility)
    - "985"  # video group
  environment:
    - HSA_OVERRIDE_GFX_VERSION=10.3.0  # RX 6800 (gfx1030)
    - HSA_FORCE_FINE_GRAIN_PCIE=1
```

## Verification

### Successful Startup Logs

```
2026-01-15 20:07:41 | INFO | configs.config | Found GPU AMD Radeon RX 6800
2026-01-15 20:07:41 | INFO | configs.config | Half-precision floating-point: True, device: cuda:0
2026-01-15 20:07:41 | INFO | __main__ | ✓ Connected to Soprano server at tcp://soprano:5555
2026-01-15 20:07:49 | INFO | __main__ | ✓ RVC model loaded (version: v2, target SR: 48000Hz)
2026-01-15 20:07:49 | INFO | __main__ | ✓ Pipeline ready! API accepting requests on port 8765
INFO: Uvicorn running on http://0.0.0.0:8765 (Press CTRL+C to quit)
```

### Health Check

```bash
$ curl http://localhost:8765/health
{
  "status": "healthy",
  "soprano_connected": true,
  "rvc_initialized": true,
  "pipeline_ready": true
}
```

## Container Stats

- **Base Image**: rocm/pytorch:rocm6.4_ubuntu22.04_py3.10_pytorch_release_2.5.1 (~12GB)
- **Final Size**: 63.6GB
- **Python**: 3.10.x
- **pip**: 24.0
- **PyTorch**: 2.5.1+git8420923 (ROCm 6.4)
- **Torchaudio**: 2.5.1+rocm6.2
- **GPU**: AMD RX 6800 (16GB VRAM, gfx1030)
- **Status**: ✅ Healthy and working

## Build Time

- Multi-stage build: ~75 minutes
- Single command fixes in running container: ~2 minutes

## Lessons Learned

1. **Base image PyTorch versions are sacred** - Don't let pip "upgrade" them
2. **Constraints files are essential** for complex PyTorch environments
3. **ROCm versions don't always match** - 6.2 torchaudio works with 6.4 torch
4. **apex is problematic** - Remove it when not needed
5. **Numeric group IDs** are required for GPU device access in containers
6. **Manual container fixes** can identify solutions before long rebuilds
7. **Multi-stage builds** don't save much space when the base image is large

## Next Steps

- [ ] Test GPU performance (target: >0.9x realtime)
- [ ] Verify end-to-end synthesis pipeline
- [ ] Archive builder stage to /4TB/Docker/
- [ ] Document complete deployment process
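
To catch a repeat of the torch override from issue 1 before a container is deployed, the installed package versions can be compared against the pins recorded in this document. The following is a minimal sketch, assuming it runs with the container's Python 3.10 and that the distribution metadata reports the same version strings as listed under Container Stats. `EXPECTED_PINS`, `installed_version`, `check_pins`, and `main` are illustrative names, not part of the RVC codebase.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from this document; adjust them if the base image changes.
EXPECTED_PINS = {
    "torch": "2.5.1+git8420923",
    "torchaudio": "2.5.1+rocm6.2",
    "numpy": "1.23.5",
    "scipy": "1.15.3",
    "numba": "0.56.4",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def check_pins(expected, lookup=installed_version):
    """Return {package: (expected, found)} for every package that differs."""
    mismatches = {}
    for name, want in expected.items():
        found = lookup(name)
        if found != want:
            mismatches[name] = (want, found)
    return mismatches


def main():
    """Print each mismatch and return a shell-style exit code."""
    problems = check_pins(EXPECTED_PINS)
    for name, (want, found) in problems.items():
        print(f"{name}: expected {want}, found {found or 'not installed'}")
    return 1 if problems else 0
```

Calling `sys.exit(main())` at the end of such a script would make a build or startup step fail when any pin has drifted. The `lookup` parameter exists only so the comparison can be exercised without the real packages installed.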