perf: reduce container sizes and build times
- miku-stt: switch PyTorch CUDA -> CPU-only (~2.5 GB savings)
  - Silero VAD already runs on CPU via ONNX (onnx=True); CUDA PyTorch was waste
  - faster-whisper/CTranslate2 uses CUDA directly, no PyTorch GPU needed
  - torch+torchaudio layer: 3.3 GB -> 796 MB; total image 9+ GB -> 6.83 GB
  - Tested: Silero VAD loads (ONNX), Whisper loads on cuda, server ready
- llama-swap-rocm: add root .dockerignore to fix 31 GB build context
  - Dockerfile clones all sources from git, never COPYs from context
  - 19 GB of GGUF model files were being transferred on every build
  - Now excludes everything (*), near-zero context transfer
- anime-face-detector: add .dockerignore to exclude accumulated outputs
  - api/outputs/ (56 accumulated detection files) no longer baked into image
  - api/__pycache__/ and images/ also excluded
- .gitignore: remove .dockerignore exclusion so these files are tracked
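The layer numbers above can be sanity-checked by measuring an installed package's on-disk footprint inside the container. A rough stdlib-only sketch (the dist-packages path is an assumption about the image layout; adjust for the image's Python):

```python
import os

def dir_size_mb(root: str) -> float:
    """Total size of all regular files under root, in MiB."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                total += os.path.getsize(path)
    return total / (1024 * 1024)

if __name__ == "__main__":
    # Hypothetical path inside the miku-stt image; the +cu128 build bundles
    # large CUDA/cuDNN shared libraries under torch/lib, the +cpu build does not.
    print(f"torch footprint: {dir_size_mb('/usr/lib/python3/dist-packages/torch'):.0f} MiB")
```

Running this against the torch directory before and after the switch should show roughly the 3.3 GB -> 796 MB drop reported above.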
.dockerignore (new file, +10)
@@ -0,0 +1,10 @@
+# .dockerignore for llama-swap-rocm (build context is project root)
+# The Dockerfile.llamaswap-rocm doesn't COPY anything from the build context —
+# everything is git-cloned in multi-stage builds. Exclude everything to avoid
+# sending ~31 GB of unnecessary build context (models, backups, etc.)
+
+# Exclude everything by default
+*
+
+# Only include what the Dockerfile actually needs (nothing from context currently)
+# If the Dockerfile changes to COPY files, add exceptions here with !filename
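The exclude-everything pattern works because .dockerignore is evaluated top to bottom with last-match-wins, so a bare `*` followed by `!` exceptions re-includes only what is listed. A rough Python simulation of that ordering (Docker's real matcher follows Go's filepath.Match semantics and handles `**`; this only illustrates the rule, and `entrypoint.sh` is a hypothetical example file):

```python
from fnmatch import fnmatch

def is_ignored(path: str, patterns: list[str]) -> bool:
    """Last matching pattern wins; a '!' prefix re-includes a path."""
    ignored = False
    for pat in patterns:
        negate = pat.startswith("!")
        if negate:
            pat = pat[1:]
        # A pattern excludes a path if it matches the path itself
        # or any of its leading directory components.
        parts = path.split("/")
        prefixes = ["/".join(parts[: i + 1]) for i in range(len(parts))]
        if any(fnmatch(p, pat) for p in prefixes):
            ignored = not negate
    return ignored

print(is_ignored("models/big.gguf", ["*"]))                   # True
print(is_ignored("entrypoint.sh", ["*", "!entrypoint.sh"]))   # False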
.gitignore (vendored)
@@ -37,9 +37,6 @@ models/*.bin
 *.log
 logs/
 
-# Docker
-.dockerignore
-
 # OS
 .DS_Store
 Thumbs.db
face-detector/.dockerignore (new file, +6)
@@ -0,0 +1,6 @@
+# Exclude accumulated detection outputs (volume-mounted at runtime anyway)
+api/outputs/
+api/__pycache__/
+__pycache__/
+*.pyc
+images/
miku-stt Dockerfile
@@ -1,12 +1,12 @@
 # RealtimeSTT Container
 # Uses Faster-Whisper with CUDA for GPU-accelerated inference
-# Includes dual VAD (WebRTC + Silero) for robust voice detection
+# Includes Silero VAD (ONNX, CPU-only) for robust voice detection
 #
 # Updated per RealtimeSTT PR #295:
 # - CUDA 12.8.1 (latest stable)
-# - PyTorch 2.7.1 with cu128 support
+# - PyTorch CPU-only (for Silero VAD tensor ops only - saves ~2.3 GB)
+# - Faster-Whisper/CTranslate2 uses CUDA directly, no PyTorch GPU needed
 # - Ubuntu 24.04 base
-# - Single Python 3.11 installation
 
 FROM nvidia/cuda:12.8.1-cudnn-runtime-ubuntu24.04
 
@@ -27,7 +27,7 @@ RUN apt-get update && apt-get install -y \
     curl \
     && rm -rf /var/lib/apt/lists/*
 
-# Install PyTorch with CUDA 12.8 support (installed first for layer caching)
+# Install PyTorch CPU-only (for Silero VAD tensor ops - GPU transcription uses CTranslate2 directly)
 COPY requirements-gpu-torch.txt .
 RUN python3 -m pip install --break-system-packages --no-cache-dir -r requirements-gpu-torch.txt
 
requirements-gpu-torch.txt
@@ -1,5 +1,7 @@
-# PyTorch with CUDA 12.8 support
-# Updated per RealtimeSTT PR #295 for better performance
-torch==2.7.1+cu128
-torchaudio==2.7.1+cu128
---index-url https://download.pytorch.org/whl/cu128
+# PyTorch CPU-only (used solely for Silero VAD which runs on CPU)
+# Silero VAD's OnnxWrapper uses torch tensors internally but does not need GPU.
+# Faster-Whisper/CTranslate2 handles GPU transcription via CUDA directly.
+# torchaudio is required by silero-vad's utils_vad.py top-level import.
+torch==2.7.1+cpu
+torchaudio==2.7.1+cpu
+--index-url https://download.pytorch.org/whl/cpu
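One way to keep the CUDA wheels from sneaking back in is to check the pins mechanically: a `+cpu` local version tag on the wheel means no bundled CUDA libraries. A minimal stdlib-only sketch (the inline requirements text mirrors the pins above; wiring this into CI is an assumption, not something this commit does):

```python
import re

def check_cpu_only(requirements_text: str) -> list[str]:
    """Return any torch/torchaudio pins that are not +cpu builds."""
    offenders = []
    for line in requirements_text.splitlines():
        # Matches pins like "torch==2.7.1+cpu" or "torchaudio==2.7.1+cu128".
        m = re.match(r"\s*(torch|torchaudio)==([\w.+]+)", line)
        if m and not m.group(2).endswith("+cpu"):
            offenders.append(line.strip())
    return offenders

reqs = """\
torch==2.7.1+cpu
torchaudio==2.7.1+cpu
--index-url https://download.pytorch.org/whl/cpu
"""
assert check_cpu_only(reqs) == []
assert check_cpu_only("torch==2.7.1+cu128") == ["torch==2.7.1+cu128"]
```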
|
|||||||
@@ -9,8 +9,8 @@ ctranslate2>=4.4.0
|
|||||||
# Audio processing
|
# Audio processing
|
||||||
soundfile>=0.12.0
|
soundfile>=0.12.0
|
||||||
|
|
||||||
# VAD - Silero (loaded via torch.hub)
|
# VAD - Silero (loaded via torch.hub, runs on CPU via ONNX)
|
||||||
# No explicit package needed, comes with torch
|
# Requires torch (CPU-only) - see requirements-gpu-torch.txt
|
||||||
|
|
||||||
# Utilities
|
# Utilities
|
||||||
aiohttp>=3.9.0
|
aiohttp>=3.9.0
|
||||||
|
|||||||