add: absorb soprano_to_rvc as regular subdirectory

Voice conversion pipeline (Soprano TTS → RVC) with Docker support.
Previously tracked as a bare gitlink; removed the .git/ directories and
absorbed the contents into the main repo for unified tracking.

Includes: Soprano TTS, RVC WebUI integration, Docker configs,
WebSocket API, and benchmark scripts.
Updated .gitignore to exclude large model weights (*.pth, *.pt, *.onnx, *.index).
287 files; 3.1 GB of ML weights are excluded via .gitignore.
2026-03-04 00:24:53 +02:00
parent 34b184a05a
commit 8ca716029e
287 changed files with 47102 additions and 0 deletions


@@ -0,0 +1,283 @@
# Docker Container Dependencies
This document lists all Python packages explicitly installed in each Docker container, matching the exact versions from the working virtual environments.
## Soprano Container (CUDA/Python 3.11.14)
### Base Image
- `nvidia/cuda:11.8.0-runtime-ubuntu22.04`
- Python 3.11.14 (explicitly installed via deadsnakes PPA)
- pip 25.3
### PyTorch Stack (CUDA 11.8)
Installed from PyTorch index (https://download.pytorch.org/whl/cu118):
- `torch==2.7.1+cu118`
- `torchaudio==2.7.1+cu118`
- `torchvision==0.22.1+cu118`
### Core Dependencies (from requirements-soprano.txt)
```
fastapi==0.128.0
uvicorn==0.40.0
pyzmq==27.1.0
numpy==2.4.1
pydantic==2.12.5
python-multipart==0.0.21
sounddevice==0.5.3
pydub==0.25.1
```
### LMDeploy and ML Dependencies
```
lmdeploy==0.11.1
transformers==4.57.5
tokenizers==0.22.2
huggingface-hub==0.36.0
safetensors==0.7.0
accelerate==1.12.0
sentencepiece==0.2.1
einops==0.8.1
peft==0.14.0
scipy==1.17.0
```
### Soprano TTS
- Installed from source: `pip install -e '.[lmdeploy]'` from `/app/soprano/`
- Version: `soprano-tts==0.1.0` (from local editable install)
- Git commit: `f2e7c19b7de51e18d6d977dd0d1027c4b9967f50`
- Features: Hallucination detection and automatic regeneration
### Supporting Libraries
```
protobuf==6.33.4
tiktoken==0.12.0
requests==2.32.5
tqdm==4.67.1
PyYAML==6.0.3
Jinja2==3.1.6
click==8.3.1
psutil==7.2.1
packaging==25.0
filelock==3.20.3
fsspec==2026.1.0
regex==2026.1.14
certifi==2026.1.4
charset-normalizer==3.4.4
urllib3==2.6.3
idna==3.11
```
---
## RVC Container (ROCm/Python 3.10.19)
### Base Image
- `rocm/pytorch:rocm6.2_ubuntu22.04_py3.10_pytorch_release_2.3.0`
- Python 3.10.x (from base image, targeting 3.10.19)
- pip 24.0
- PyTorch 2.5.1+rocm6.2 (pre-installed in base)
- TorchAudio 2.5.1+rocm6.2 (pre-installed in base)
- TorchVision 0.20.1+rocm6.2 (pre-installed in base)
### Core Dependencies (from requirements-rvc.txt)
```
fastapi==0.128.0
uvicorn==0.40.0
pyzmq==27.1.0
numpy==1.23.5
pydantic==2.12.5
python-multipart==0.0.21
```
### Audio Processing Stack
```
librosa==0.10.2
soundfile==0.13.1
sounddevice==0.5.3
pydub==0.25.1
audioread==3.1.0
resampy==0.4.3
soxr==1.0.0
pyworld==0.3.2
praat-parselmouth==0.4.7
torchcrepe==0.0.23
torchfcpe==0.0.4
```
### RVC Core Dependencies
```
fairseq==0.12.2
faiss-cpu==1.7.3
gradio==3.48.0
gradio_client==0.6.1
numba==0.56.4
llvmlite==0.39.0
local-attention==1.11.2
```
### LMDeploy and ML Dependencies
```
lmdeploy==0.11.1
transformers==4.57.3
tokenizers==0.22.2
huggingface-hub==0.36.0
safetensors==0.7.0
accelerate==1.12.0
sentencepiece==0.2.1
einops==0.8.1
peft==0.14.0
```
### Scientific Computing
```
scipy==1.15.3
scikit-learn==1.7.2
matplotlib==3.10.8
pandas==2.3.3
```
### Additional Dependencies
```
av==16.1.0
pillow==10.4.0
omegaconf==2.0.6
hydra-core==1.0.7
python-dotenv==1.2.1
ray==2.53.0
```
### Supporting Libraries
```
protobuf==6.33.4
tiktoken==0.12.0
requests==2.32.5
tqdm==4.67.1
PyYAML==6.0.3
Jinja2==3.1.6
click==8.3.1
psutil==7.2.1
packaging==25.0
filelock==3.20.0
fsspec==2025.10.0
regex==2025.11.3
certifi==2026.1.4
charset-normalizer==3.4.4
urllib3==2.6.3
idna==3.11
```
---
## Key Differences Between Containers
### Python Versions
- **Soprano**: Python 3.11.14 (installed via deadsnakes PPA)
- **RVC**: Python 3.10.x (from ROCm base image, targeting 3.10.19)
### pip Versions
- **Soprano**: pip 25.3
- **RVC**: pip 24.0
### PyTorch Versions
- **Soprano**: PyTorch 2.7.1 with CUDA 11.8
- **RVC**: PyTorch 2.5.1 with ROCm 6.2
### NumPy Versions
- **Soprano**: NumPy 2.4.1 (latest compatible with PyTorch 2.7.1)
- **RVC**: NumPy 1.23.5 (required for fairseq/RVC compatibility)
### FastAPI Versions
- **Both**: FastAPI 0.128.0 (updated from 0.88.0 in original requirements.txt)
- **Both**: Starlette 0.50.0 (dependency of FastAPI 0.128.0)
### Transformers Versions
- **Soprano**: transformers==4.57.5
- **RVC**: transformers==4.57.3
- Patch-level difference only (4.57.5 vs 4.57.3); treated as compatible
### Unique to Soprano
- `soprano-tts` (editable install from source)
- CUDA-specific NVIDIA packages (cublas, cudnn, etc.)
### Unique to RVC
- `fairseq` (sequence-to-sequence modeling)
- `faiss-cpu` (similarity search)
- `gradio` (web UI framework)
- Audio processing: `librosa`, `soundfile`, `pyworld`, `praat-parselmouth`
- Voice pitch: `torchcrepe`, `torchfcpe`
- Performance: `numba`, `llvmlite`
- Attention: `local-attention`
- Configuration: `omegaconf`, `hydra-core`
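The differences listed above can also be derived mechanically from the two requirements files. The following is a minimal sketch, assuming both files use plain `name==version` lines; the helper names `parse_pins` and `diff_pins` are hypothetical and not part of the repository, and the sample pins are copied from the lists in this document.

```python
def parse_pins(text):
    """Map lowercase package name -> pinned version for 'name==version' lines."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if "==" not in line:
            continue  # skip blanks, pip options, and entries without an exact pin
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def diff_pins(left, right):
    """Return {name: (left_version, right_version)} for packages pinned differently on both sides."""
    shared = left.keys() & right.keys()
    return {n: (left[n], right[n]) for n in sorted(shared) if left[n] != right[n]}


# Sample pins copied from this document; real use would read the two files instead.
soprano = parse_pins("numpy==2.4.1\ntransformers==4.57.5\nfastapi==0.128.0\n")
rvc = parse_pins("numpy==1.23.5\ntransformers==4.57.3\nfastapi==0.128.0\nfairseq==0.12.2\n")

for name, (a, b) in diff_pins(soprano, rvc).items():
    print(f"{name}: soprano={a} rvc={b}")
```

Packages present in only one container (such as `fairseq`) are intentionally left out of the result, since they belong in the "Unique to" lists rather than in the version comparison.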
---
## Installation Order
### Soprano Container
1. System packages (Python 3.11.14 via deadsnakes PPA, git, build tools)
2. Upgrade pip to 25.3
3. PyTorch with CUDA 11.8
4. Dependencies from `requirements-soprano.txt`
5. Soprano from source with `pip install -e '.[lmdeploy]'`
### RVC Container
1. System packages (ffmpeg, libsndfile1, curl)
2. Set pip to 24.0 (downgrade if needed to match venv)
3. PyTorch with ROCm 6.2 (pre-installed in base image)
4. Dependencies from `requirements-rvc.txt`
5. Models copied to `/app/models/`
---
## Verification
After building containers, verify installations:
### Soprano Container
```bash
docker exec miku-soprano-tts python3 -c "
import torch
import soprano
import lmdeploy
print(f'PyTorch: {torch.__version__}')
print(f'CUDA available: {torch.cuda.is_available()}')
print(f'Soprano: {soprano.__version__}')
print(f'LMDeploy: {lmdeploy.__version__}')
"
```
### RVC Container
```bash
docker exec miku-rvc-api python3 -c "
import torch
import fairseq
import librosa
import pyworld
print(f'PyTorch: {torch.__version__}')
# ROCm builds of PyTorch expose the GPU through the torch.cuda API
print(f'ROCm available: {torch.cuda.is_available()}')
print(f'Fairseq: {fairseq.__version__}')
print(f'Librosa: {librosa.__version__}')
"
```
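The commands above print versions for manual inspection. To compare them against the pins automatically, a small script based on the standard-library `importlib.metadata` module could be used. This is a sketch, not part of the repository: `check_versions` is a hypothetical helper, and it assumes the installed distribution metadata reports the same version strings as the pins (including local tags such as `+cu118`).

```python
from importlib.metadata import PackageNotFoundError, version


def check_versions(expected, lookup=version):
    """Return a list of mismatch messages; an empty list means every pin matched."""
    problems = []
    for name, wanted in expected.items():
        try:
            found = lookup(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed (expected {wanted})")
            continue
        if found != wanted:
            problems.append(f"{name}: found {found}, expected {wanted}")
    return problems


if __name__ == "__main__":
    # Pins taken from the Soprano lists in this document; extend as needed.
    soprano_pins = {"torch": "2.7.1+cu118", "lmdeploy": "0.11.1", "fastapi": "0.128.0"}
    for line in check_versions(soprano_pins) or ["all pinned versions match"]:
        print(line)
```

Saved under a hypothetical name such as `check_versions.py`, the script could be piped into a running container with `docker exec -i miku-soprano-tts python3 - < check_versions.py`. The `lookup` parameter exists only so the comparison logic can be exercised without the packages installed.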
---
## Maintenance
When updating packages:
1. **Update venv first**: Test in local virtual environment
2. **Export packages**: `pip list --format=freeze > packages.txt`
3. **Update requirements**: Update `requirements-soprano.txt` or `requirements-rvc.txt`
4. **Rebuild container**: `docker-compose build --no-cache <service>`
5. **Test**: Verify functionality matches bare metal
---
## Notes
- All version numbers are explicitly pinned for reproducibility
- PyTorch versions are installed from specific indexes (CUDA/ROCm)
- Soprano is installed as an editable package to match the bare metal setup
- The ROCm base image ships with PyTorch pre-installed, so PyTorch is not listed in the requirements file
- CUDA packages are handled by PyTorch wheel (nvidia-cublas, nvidia-cudnn, etc.)
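
The first note (every version explicitly pinned) can be checked before a rebuild. The sketch below flags requirement lines that lack an exact `==` pin. It assumes simple one-requirement-per-line files without environment markers or URLs; `unpinned_lines` and the sample text are hypothetical.

```python
import re

# A pinned line is "<name>[extras]==<version>" with nothing else on it.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?==[^\s=<>~!]+$")


def unpinned_lines(text):
    """Return (line_number, content) for requirement lines lacking an exact '==' pin."""
    bad = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # blank lines, comments, and pip options such as --index-url
        if not PINNED.match(line):
            bad.append((number, line))
    return bad


sample = "fastapi==0.128.0\nuvicorn>=0.40\n# comment\nnumpy\ntorch==2.7.1+cu118\n"
print(unpinned_lines(sample))
```

For the sample text, the second and fourth lines are reported because they use a range specifier and no specifier respectively.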