add: absorb soprano_to_rvc as regular subdirectory

Voice conversion pipeline (Soprano TTS → RVC) with Docker support. Previously tracked as a bare gitlink; removed .git/ directories and absorbed the contents into the main repo for unified tracking. Includes: Soprano TTS, RVC WebUI integration, Docker configs, WebSocket API, and benchmark scripts. Updated .gitignore to exclude large model weights (*.pth, *.pt, *.onnx, *.index). 287 files (3.1 GB of ML weights excluded via .gitignore).
2026-03-04 00:24:53 +02:00
parent 34b184a05a
commit 8ca716029e
287 changed files with 47102 additions and 0 deletions
--- a/soprano_to_rvc/DOCKER_DEPENDENCIES.md
+++ b/soprano_to_rvc/DOCKER_DEPENDENCIES.md
@@ -0,0 +1,283 @@
+# Docker Container Dependencies
+
+This document lists all Python packages explicitly installed in each Docker container, matching the exact versions from the working virtual environments.
+
+## Soprano Container (CUDA/Python 3.11.14)
+
+### Base Image
+- `nvidia/cuda:11.8.0-runtime-ubuntu22.04`
+- Python 3.11.14 (explicitly installed via deadsnakes PPA)
+- pip 25.3
+
+### PyTorch Stack (CUDA 11.8)
+Installed from PyTorch index (https://download.pytorch.org/whl/cu118):
+- `torch==2.7.1+cu118`
+- `torchaudio==2.7.1+cu118`
+- `torchvision==0.22.1+cu118`
+
+### Core Dependencies (from requirements-soprano.txt)
+```
+fastapi==0.128.0
+uvicorn==0.40.0
+pyzmq==27.1.0
+numpy==2.4.1
+pydantic==2.12.5
+python-multipart==0.0.21
+sounddevice==0.5.3
+pydub==0.25.1
+```
+
+### LMDeploy and ML Dependencies
+```
+lmdeploy==0.11.1
+transformers==4.57.5
+tokenizers==0.22.2
+huggingface-hub==0.36.0
+safetensors==0.7.0
+accelerate==1.12.0
+sentencepiece==0.2.1
+einops==0.8.1
+peft==0.14.0
+scipy==1.17.0
+```
+
+### Soprano TTS
+- Installed from source: `pip install -e '.[lmdeploy]'` from `/app/soprano/`
+- Version: `soprano-tts==0.1.0` (from local editable install)
+- Git commit: `f2e7c19b7de51e18d6d977dd0d1027c4b9967f50`
+- Features: Hallucination detection and automatic regeneration
+
+### Supporting Libraries
+```
+protobuf==6.33.4
+tiktoken==0.12.0
+requests==2.32.5
+tqdm==4.67.1
+PyYAML==6.0.3
+Jinja2==3.1.6
+click==8.3.1
+psutil==7.2.1
+packaging==25.0
+filelock==3.20.3
+fsspec==2026.1.0
+regex==2026.1.14
+certifi==2026.1.4
+charset-normalizer==3.4.4
+urllib3==2.6.3
+idna==3.11
+```
+
+---
+
+## RVC Container (ROCm/Python 3.10.19)
+
+### Base Image
+- `rocm/pytorch:rocm6.2_ubuntu22.04_py3.10_pytorch_release_2.3.0`
+- Python 3.10.x (from base image, targeting 3.10.19)
+- pip 24.0
+- PyTorch 2.5.1+rocm6.2 (pre-installed in base)
+- TorchAudio 2.5.1+rocm6.2 (pre-installed in base)
+- TorchVision 0.20.1+rocm6.2 (pre-installed in base)
+
+### Core Dependencies (from requirements-rvc.txt)
+```
+fastapi==0.128.0
+uvicorn==0.40.0
+pyzmq==27.1.0
+numpy==1.23.5
+pydantic==2.12.5
+python-multipart==0.0.21
+```
+
+### Audio Processing Stack
+```
+librosa==0.10.2
+soundfile==0.13.1
+sounddevice==0.5.3
+pydub==0.25.1
+audioread==3.1.0
+resampy==0.4.3
+soxr==1.0.0
+pyworld==0.3.2
+praat-parselmouth==0.4.7
+torchcrepe==0.0.23
+torchfcpe==0.0.4
+```
+
+### RVC Core Dependencies
+```
+fairseq==0.12.2
+faiss-cpu==1.7.3
+gradio==3.48.0
+gradio_client==0.6.1
+numba==0.56.4
+llvmlite==0.39.0
+local-attention==1.11.2
+```
+
+### LMDeploy and ML Dependencies
+```
+lmdeploy==0.11.1
+transformers==4.57.3
+tokenizers==0.22.2
+huggingface-hub==0.36.0
+safetensors==0.7.0
+accelerate==1.12.0
+sentencepiece==0.2.1
+einops==0.8.1
+peft==0.14.0
+```
+
+### Scientific Computing
+```
+scipy==1.15.3
+scikit-learn==1.7.2
+matplotlib==3.10.8
+pandas==2.3.3
+```
+
+### Additional Dependencies
+```
+av==16.1.0
+pillow==10.4.0
+omegaconf==2.0.6
+hydra-core==1.0.7
+python-dotenv==1.2.1
+ray==2.53.0
+```
+
+### Supporting Libraries
+```
+protobuf==6.33.4
+tiktoken==0.12.0
+requests==2.32.5
+tqdm==4.67.1
+PyYAML==6.0.3
+Jinja2==3.1.6
+click==8.3.1
+psutil==7.2.1
+packaging==25.0
+filelock==3.20.0
+fsspec==2025.10.0
+regex==2025.11.3
+certifi==2026.1.4
+charset-normalizer==3.4.4
+urllib3==2.6.3
+idna==3.11
+```
+
+---
+
+## Key Differences Between Containers
+
+### Python Versions
+- **Soprano**: Python 3.11.14 (installed via deadsnakes PPA)
+- **RVC**: Python 3.10.x (from ROCm base image, targeting 3.10.19)
+
+### pip Versions
+- **Soprano**: pip 25.3
+- **RVC**: pip 24.0
+
+### PyTorch Versions
+- **Soprano**: PyTorch 2.7.1 with CUDA 11.8
+- **RVC**: PyTorch 2.5.1 with ROCm 6.2
+
+### NumPy Versions
+- **Soprano**: NumPy 2.4.1 (latest compatible with PyTorch 2.7.1)
+- **RVC**: NumPy 1.23.5 (required for fairseq/RVC compatibility)
+
+### FastAPI Versions
+- **Both**: FastAPI 0.128.0 (updated from 0.88.0 in original requirements.txt)
+- **Both**: Starlette 0.50.0 (dependency of FastAPI 0.128.0)
+
+### Transformers Versions
+- **Soprano**: transformers==4.57.5
+- **RVC**: transformers==4.57.3
+- The two pins differ only at the patch level (4.57.5 vs. 4.57.3) and are compatible
+
+### Unique to Soprano
+- `soprano-tts` (editable install from source)
+- CUDA-specific NVIDIA packages (cublas, cudnn, etc.)
+
+### Unique to RVC
+- `fairseq` (sequence-to-sequence modeling)
+- `faiss-cpu` (similarity search)
+- `gradio` (web UI framework)
+- Audio processing: `librosa`, `soundfile`, `pyworld`, `praat-parselmouth`
+- Voice pitch: `torchcrepe`, `torchfcpe`
+- Performance: `numba`, `llvmlite`
+- Attention: `local-attention`
+- Configuration: `omegaconf`, `hydra-core`
+
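The differences listed in this section could also be derived mechanically from the two requirements files. The following is a minimal sketch, not part of the repository: it assumes the files contain plain `name==version` lines, and the helper names `parse_pins` and `diff_pins` are hypothetical. The sample pins are copied from the lists in this document.

```python
import re


def normalize(name: str) -> str:
    """Normalize a project name per PEP 503 (case and runs of -_. are insignificant)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_pins(text: str) -> dict[str, str]:
    """Return {normalized_name: version} for every `name==version` line."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if "==" not in line:
            continue  # skip blank lines, pip options, and unpinned entries
        name, version = line.split("==", 1)
        pins[normalize(name.strip())] = version.strip()
    return pins


def diff_pins(a: dict[str, str], b: dict[str, str]):
    """Return (version mismatches, names only in a, names only in b)."""
    mismatched = {n: (a[n], b[n]) for n in a.keys() & b.keys() if a[n] != b[n]}
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    return mismatched, only_a, only_b


# A few pins copied from the Soprano and RVC lists in this document.
soprano = parse_pins("numpy==2.4.1\ntransformers==4.57.5\nfastapi==0.128.0\n")
rvc = parse_pins("numpy==1.23.5\ntransformers==4.57.3\nfastapi==0.128.0\nfairseq==0.12.2\n")

mismatched, only_soprano, only_rvc = diff_pins(soprano, rvc)
for name in sorted(mismatched):
    print(f"{name}: soprano={mismatched[name][0]} rvc={mismatched[name][1]}")
print("only in rvc:", only_rvc)
```

In practice the two strings would be replaced with the contents of `requirements-soprano.txt` and `requirements-rvc.txt`. Entries with extras or environment markers are not handled by this sketch.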
+---
+
+## Installation Order
+
+### Soprano Container
+1. System packages (Python 3.11.14 via deadsnakes PPA, git, build tools)
+2. Upgrade pip to 25.3
+3. PyTorch with CUDA 11.8
+4. Dependencies from `requirements-soprano.txt`
+5. Soprano from source with `pip install -e '.[lmdeploy]'`
+
+### RVC Container
+1. System packages (ffmpeg, libsndfile1, curl)
+2. Set pip to 24.0 (downgrade if needed to match venv)
+3. PyTorch with ROCm 6.2 (pre-installed in base image)
+4. Dependencies from `requirements-rvc.txt`
+5. Models copied to `/app/models/`
+
+---
+
+## Verification
+
+After building containers, verify installations:
+
+### Soprano Container
+```bash
+docker exec miku-soprano-tts python3 -c "
+import torch
+import soprano
+import lmdeploy
+print(f'PyTorch: {torch.__version__}')
+print(f'CUDA available: {torch.cuda.is_available()}')
+print(f'Soprano: {soprano.__version__}')
+print(f'LMDeploy: {lmdeploy.__version__}')
+"
+```
+
+### RVC Container
+```bash
+docker exec miku-rvc-api python3 -c "
+import torch
+import fairseq
+import librosa
+import pyworld  # imported only to confirm the package loads
+print(f'PyTorch: {torch.__version__}')
+# ROCm builds of PyTorch expose the GPU through the torch.cuda API
+print(f'ROCm available: {torch.cuda.is_available()}')
+print(f'Fairseq: {fairseq.__version__}')
+print(f'Librosa: {librosa.__version__}')
+"
+```
+
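The commands above spot-check a handful of imports. To compare every pinned version against what is actually installed, something like the following could be run inside either container. It is a sketch under stated assumptions: `check_pins` is a hypothetical helper, not part of the repository, and the pin dictionary holds only a few entries copied from the Soprano lists above.

```python
from importlib import metadata


def check_pins(pins, get_version=metadata.version):
    """Compare pinned versions with installed ones and return a list of problems."""
    problems = []
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (expected {wanted})")
            continue
        if found != wanted:
            problems.append(f"{name}: installed {found}, expected {wanted}")
    return problems


# A few pins copied from the Soprano lists in this document.
SOPRANO_PINS = {"fastapi": "0.128.0", "lmdeploy": "0.11.1", "numpy": "2.4.1"}

for problem in check_pins(SOPRANO_PINS):
    print(problem)
```

An empty result means every listed package is installed at its pinned version. The `get_version` parameter exists only so the lookup can be swapped out in tests.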
+---
+
+## Maintenance
+
+When updating packages:
+
+1. **Update venv first**: Test in local virtual environment
+2. **Export packages**: `pip list --format=freeze > packages.txt`
+3. **Update requirements**: Update `requirements-soprano.txt` or `requirements-rvc.txt`
+4. **Rebuild container**: `docker-compose build --no-cache <service>`
+5. **Test**: Verify that the container behaves the same as the bare-metal setup
+
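Steps 2 and 3 above can be partly automated. The sketch below is illustrative only and assumes plain `name==version` lines: `refresh_pins` is a hypothetical helper that takes the existing requirements text and the freeze output, and rewrites only the packages already listed, so transitive dependencies from the freeze output are not added. Trailing comments on a pinned line are dropped.

```python
import re


def normalize(name: str) -> str:
    """Normalize a project name per PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def refresh_pins(requirements_text: str, freeze_text: str) -> str:
    """Update versions in requirements_text from freeze_text, keeping its layout."""
    frozen = {}
    for line in freeze_text.splitlines():
        name, sep, version = line.strip().partition("==")
        if sep:  # ignore editable installs and direct URL references
            frozen[normalize(name)] = version

    out = []
    for line in requirements_text.splitlines():
        name, sep, _old = line.strip().partition("==")
        if sep and normalize(name) in frozen:
            out.append(f"{name}=={frozen[normalize(name)]}")
        else:
            out.append(line)  # comments, options, and unknown packages pass through
    return "\n".join(out) + "\n"


old = "# core\nnumpy==1.23.5\nfastapi==0.88.0\n"
freeze = "fastapi==0.128.0\nnumpy==1.23.5\nstarlette==0.50.0\n"
result = refresh_pins(old, freeze)
print(result, end="")
```

Here `fastapi` moves from 0.88.0 to 0.128.0, `numpy` is unchanged, and `starlette` is ignored because it was never listed in the requirements text.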
+---
+
+## Notes
+
+- All version numbers are explicitly pinned for reproducibility
+- PyTorch builds are backend-specific: Soprano installs CUDA 11.8 wheels from the PyTorch index, while RVC uses the ROCm 6.2 build shipped in its base image
+- Soprano is installed as an editable package to match the bare-metal setup
+- ROCm base image comes with PyTorch pre-installed (not in requirements file)
+- CUDA packages are handled by PyTorch wheel (nvidia-cublas, nvidia-cudnn, etc.)