add: absorb soprano_to_rvc as regular subdirectory

Voice conversion pipeline (Soprano TTS → RVC) with Docker support. Previously tracked as a bare gitlink; removed the .git/ directories and absorbed it into the main repo for unified tracking.

Includes: Soprano TTS, RVC WebUI integration, Docker configs, WebSocket API, and benchmark scripts.

Updated .gitignore to exclude large model weights (*.pth, *.pt, *.onnx, *.index). 287 files (3.1GB of ML weights excluded via .gitignore).
133
soprano_to_rvc/RVC_CONTAINER_FIXES.md
Normal file
@@ -0,0 +1,133 @@
# RVC Container Build Fixes

## Summary
Successfully built RVC Docker container (63.6GB) with AMD RX 6800 GPU support and ROCm 6.4.

## Critical Issues and Solutions

### 1. PyTorch Version Override
**Problem**: Installing the requirements with pip replaced the base image's torch 2.5.1+git8420923 (ROCm) with torch 2.8.0 (CUDA).

**Root Cause**: The base image `rocm/pytorch:rocm6.4_ubuntu22.04_py3.10_pytorch_release_2.5.1` ships a custom torch build that is not available on PyPI.

**Solution**: Created `constraints.txt` to pin the exact torch version:
```text
torch==2.5.1+git8420923
torchvision==0.20.1a0+04d8fc4
torchaudio
```
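
The pins above can be read from the base image itself instead of being typed by hand. The following is a minimal sketch, assuming it runs inside the base image before any `pip install`; the helper name `pinned_constraints` is hypothetical and not part of the repository.

```python
from importlib import metadata


def pinned_constraints(names, lookup=metadata.version):
    """Return 'name==version' lines for every package in names that is installed."""
    lines = []
    for name in names:
        try:
            lines.append(f"{name}=={lookup(name)}")
        except metadata.PackageNotFoundError:
            # Not shipped in the base image: leave it unpinned.
            continue
    return lines
```

Calling `print("\n".join(pinned_constraints(["torch", "torchvision"])))` in the base image and redirecting the output to `constraints.txt` would capture whatever versions that image actually ships.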

### 2. Torchaudio Compatibility
**Problem**: The standard torchaudio 2.5.1 wheel requires CUDA libraries and crashes with "libtorch_cuda.so not found".

**Root Cause**: No torchaudio 2.5.1+rocm6.4 wheel is available in the PyTorch repository.

**Solution**: Install torchaudio 2.5.1+rocm6.2, which is ABI compatible with the ROCm 6.4 torch build:
```dockerfile
RUN pip install --no-cache-dir torchaudio==2.5.1+rocm6.2 --index-url https://download.pytorch.org/whl/rocm6.2
```
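
A quick way to confirm that the installed torch is still a ROCm build is to inspect `torch.version.hip` and `torch.version.cuda`, which PyTorch sets depending on how it was compiled. The sketch below is illustrative only: `backend_of` is a hypothetical helper, and the two `SimpleNamespace` objects are stand-ins with placeholder version strings so the example runs without a GPU stack.

```python
from types import SimpleNamespace


def backend_of(torch_module):
    """Classify a torch build as 'rocm', 'cuda', or 'cpu' from its torch.version attributes."""
    if getattr(torch_module.version, "hip", None):
        return "rocm"
    if getattr(torch_module.version, "cuda", None):
        return "cuda"
    return "cpu"


# Stand-ins for `import torch`; the version strings are placeholders.
rocm_like = SimpleNamespace(version=SimpleNamespace(hip="6.4", cuda=None))
cuda_like = SimpleNamespace(version=SimpleNamespace(hip=None, cuda="12.8"))
print(backend_of(rocm_like), backend_of(cuda_like))  # prints: rocm cuda
```

In the container, `import torch, torchaudio` followed by `assert backend_of(torch) == "rocm"` would fail fast if pip had swapped in a CUDA wheel, and the `torchaudio` import itself would surface the `libtorch_cuda.so` error described above.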

### 3. scipy/numpy/numba Version Conflicts
**Problem**:
- scipy 1.10.1 was installed alongside numpy 1.21.2 → ABI mismatch
- numba required numpy <1.23, but scipy needs >=1.19.5
- Upgrading scipy caused numba to break

**Root Cause**: The versions in requirements-rvc.txt were mismatched because they came from different dependency resolutions.

**Solution**: Force-install the version set that matches the working bare-metal environment:
```bash
pip install --no-cache-dir numpy==1.23.5 scipy==1.15.3 numba==0.56.4
```
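
Binary mismatches between these packages usually surface at import time, so a small import smoke test after the install step can catch them before the container is started. This is a sketch under that assumption; `smoke_import` is a hypothetical helper, and some ABI problems only appear as warnings or when a deeper submodule is loaded.

```python
import importlib


def smoke_import(names):
    """Import each module and report its version, or the exception type on failure."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception as exc:  # ImportError, or other errors raised by broken binaries
            report[name] = f"FAILED: {type(exc).__name__}"
        else:
            report[name] = getattr(module, "__version__", "unknown")
    return report
```

Running `smoke_import(["numpy", "scipy", "numba"])` inside the image should list three version strings; any `FAILED:` entry points at the package to re-pin.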

### 4. apex C++ Extension Incompatibility
**Problem**: The apex `fused_layer_norm_cuda` extension failed to import with an undefined symbol error:
```
ImportError: undefined symbol: _ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
```

**Root Cause**: apex was compiled against PyTorch 2.3 and is binary incompatible with 2.5.1. The missing symbol demangles to `c10::detail::torchInternalAssertFail(...)`, a function in libtorch's c10 library, which points to a libtorch ABI mismatch rather than a missing package.

**Solution**: Remove apex (not needed for inference):
```dockerfile
RUN pip uninstall -y apex || true
```

## Final Dockerfile RUN Command

```dockerfile
# The apex removal is grouped in parentheses: `&&` and `||` have equal precedence,
# so an ungrouped `|| true` would also mask failures of the installs before it.
RUN pip install --no-cache-dir pip==24.0 && \
    pip install --no-cache-dir -c constraints.txt -r requirements-rvc.txt && \
    (pip uninstall -y apex || true) && \
    pip install --no-cache-dir torchaudio==2.5.1+rocm6.2 --index-url https://download.pytorch.org/whl/rocm6.2 && \
    pip install --no-cache-dir numpy==1.23.5 scipy==1.15.3 numba==0.56.4
```
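
Because the later `pip install` steps could still move a pinned package, it may be worth checking the finished environment against `constraints.txt`. The sketch below is one possible check, not part of the repository: `find_drift` is a hypothetical helper that only understands plain `name==version` lines and skips everything else.

```python
from importlib import metadata


def find_drift(constraints_text, lookup=metadata.version):
    """Return {name: (pinned, installed)} for every 'name==version' pin that no longer holds."""
    drift = {}
    for raw in constraints_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "==" not in line:
            continue  # blank line, comment, or an unpinned entry such as a bare 'torchaudio'
        name, pinned = (part.strip() for part in line.split("==", 1))
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None  # the package was removed entirely
        if installed != pinned:
            drift[name] = (pinned, installed)
    return drift
```

A final build step could read `constraints.txt`, call `find_drift` on its contents, and exit non-zero when the returned dictionary is not empty, so a silently replaced torch fails the build instead of the first inference request.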

## Docker Compose Configuration

### GPU Passthrough (ROCm)
```yaml
rvc:
  devices:
    - /dev/kfd:/dev/kfd
    - /dev/dri:/dev/dri
  group_add:
    - "989"  # render group (numeric for container compatibility)
    - "985"  # video group
  environment:
    - HSA_OVERRIDE_GFX_VERSION=10.3.0  # RX 6800 (gfx1030)
    - HSA_FORCE_FINE_GRAIN_PCIE=1
```
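
The numeric IDs `989` and `985` belong to the host this was built on and will likely differ elsewhere. A minimal sketch for looking them up on another host, assuming the groups are named `render` and `video` as in the comments above (`group_add_entries` is a hypothetical helper):

```python
def group_add_entries(names=("render", "video"), lookup=None):
    """Return the numeric GIDs of the named host groups as strings, ready for `group_add`."""
    if lookup is None:
        import grp  # Unix-only standard library module

        lookup = grp.getgrnam
    return [str(lookup(name).gr_gid) for name in names]
```

Run on the Docker host (not inside the container), `print(group_add_entries())` gives the values to paste into `group_add`. A `KeyError` means the host has no group with that name.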

## Verification

### Successful Startup Logs
```
2026-01-15 20:07:41 | INFO | configs.config | Found GPU AMD Radeon RX 6800
2026-01-15 20:07:41 | INFO | configs.config | Half-precision floating-point: True, device: cuda:0
2026-01-15 20:07:41 | INFO | __main__ | ✓ Connected to Soprano server at tcp://soprano:5555
2026-01-15 20:07:49 | INFO | __main__ | ✓ RVC model loaded (version: v2, target SR: 48000Hz)
2026-01-15 20:07:49 | INFO | __main__ | ✓ Pipeline ready! API accepting requests on port 8765
INFO: Uvicorn running on http://0.0.0.0:8765 (Press CTRL+C to quit)
```

### Health Check
```bash
$ curl http://localhost:8765/health
{
  "status": "healthy",
  "soprano_connected": true,
  "rvc_initialized": true,
  "pipeline_ready": true
}
```
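
Since the startup logs show several seconds of model loading before the API is ready, a script that depends on the service may want to poll the endpoint first. The following is a sketch using only the standard library; the URL and the `pipeline_ready` field come from the output above, while `fetch_health`, `wait_until_ready`, and the retry defaults are assumptions.

```python
import json
import time
import urllib.request


def fetch_health(url="http://localhost:8765/health", timeout=5.0):
    """Fetch the health endpoint and return the decoded JSON object."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def wait_until_ready(fetch=fetch_health, attempts=30, delay=2.0, sleep=time.sleep):
    """Poll until the payload reports pipeline_ready; return True on success, False on timeout."""
    for attempt in range(attempts):
        try:
            if fetch().get("pipeline_ready") is True:
                return True
        except (OSError, ValueError):
            # Connection refused while the server starts, or a non-JSON body.
            pass
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

`fetch` and `sleep` are injectable so the loop can be exercised without a running container.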

## Container Stats
- **Base Image**: rocm/pytorch:rocm6.4_ubuntu22.04_py3.10_pytorch_release_2.5.1 (~12GB)
- **Final Size**: 63.6GB
- **Python**: 3.10.x
- **pip**: 24.0
- **PyTorch**: 2.5.1+git8420923 (ROCm 6.4)
- **Torchaudio**: 2.5.1+rocm6.2
- **GPU**: AMD RX 6800 (16GB VRAM, gfx1030)
- **Status**: ✅ Healthy and working

## Build Time
- Multi-stage build: ~75 minutes
- Single-command fixes applied in a running container: ~2 minutes

## Lessons Learned

1. **Base image PyTorch versions are sacred** - Don't let pip "upgrade" them
2. **Constraints files are essential** for complex PyTorch environments
3. **ROCm versions don't have to match exactly** - the rocm6.2 torchaudio wheel works with the ROCm 6.4 torch build
4. **apex is problematic** - Remove it when it is not needed
5. **Numeric group IDs** are required for GPU device access in containers
6. **Manual fixes in a running container** can identify solutions before committing to a long rebuild
7. **Multi-stage builds** don't save much space when the base image is large

## Next Steps
- [ ] Test GPU performance (target: >0.9x realtime)
- [ ] Verify end-to-end synthesis pipeline
- [ ] Archive builder stage to /4TB/Docker/
- [ ] Document complete deployment process
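
For the performance item, one way to express the ">0.9x realtime" target is as seconds of audio produced per second of wall-clock time. That definition is an assumption, since the document does not define the metric, and both helper names are hypothetical.

```python
def realtime_factor(audio_seconds, wall_seconds):
    """Seconds of audio produced per second of wall-clock time (above 1.0 is faster than realtime)."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


def meets_target(audio_seconds, wall_seconds, target=0.9):
    """True when the measured factor is strictly above the target."""
    return realtime_factor(audio_seconds, wall_seconds) > target
```

For example, synthesizing 10 seconds of audio in 8 seconds of wall-clock time gives a factor of 1.25, which would meet the target under this definition.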