perf: reduce container sizes and build times

- miku-stt: switch PyTorch CUDA -> CPU-only (~2.5 GB savings)
  - Silero VAD already runs on CPU via ONNX (onnx=True), so the CUDA PyTorch build was dead weight
  - faster-whisper/CTranslate2 uses CUDA directly, no PyTorch GPU needed
  - torch+torchaudio layer: 3.3 GB -> 796 MB; total image 9+ GB -> 6.83 GB
  - Tested: Silero VAD loads (ONNX), Whisper loads on cuda, server ready

- llama-swap-rocm: add root .dockerignore to fix 31 GB build context
  - Dockerfile clones all sources from git, never COPYs from context
  - 19 GB of GGUF model files were being transferred on every build
  - Now excludes everything (*), near-zero context transfer

- anime-face-detector: add .dockerignore to exclude accumulated outputs
  - api/outputs/ (56 accumulated detection files) no longer baked into image
  - api/__pycache__/ and images/ also excluded

- .gitignore: remove .dockerignore exclusion so these files are tracked
2026-02-25 14:41:04 +02:00
parent 0edf1ef1c0
commit 9e5511da21
6 changed files with 29 additions and 14 deletions

.dockerignore (new file)

@@ -0,0 +1,10 @@
# .dockerignore for llama-swap-rocm (build context is project root)
# The Dockerfile.llamaswap-rocm doesn't COPY anything from the build context —
# everything is git-cloned in multi-stage builds. Exclude everything to avoid
# sending ~31 GB of unnecessary build context (models, backups, etc.)
# Exclude everything by default
*
# Only include what the Dockerfile actually needs (nothing from context currently)
# If the Dockerfile changes to COPY files, add exceptions here with !filename
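The exclude-everything-then-re-include pattern described in the comments above can be sketched with a minimal simulation (Python's `fnmatch` with simplified last-match-wins semantics; this is an illustration, not Docker's exact matcher, and the exception filename is only an example):

```python
from fnmatch import fnmatch

def is_ignored(path, patterns):
    """Simplified .dockerignore semantics: the last matching pattern
    wins, and a leading '!' re-includes a previously excluded path."""
    ignored = False
    for pat in patterns:
        if pat.startswith("!"):
            if fnmatch(path, pat[1:]):
                ignored = False
        elif fnmatch(path, pat):
            ignored = True
    return ignored

# '*' excludes everything; a '!' exception would re-include one file
patterns = ["*", "!Dockerfile.llamaswap-rocm"]
print(is_ignored("models/big.gguf", patterns))          # True: stays out of the context
print(is_ignored("Dockerfile.llamaswap-rocm", patterns))  # False: re-included
```

With only `*` in the file (as in this commit), every path stays excluded and the build context transfer is near zero.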

.gitignore (vendored)

@@ -37,9 +37,6 @@ models/*.bin
 *.log
 logs/
-# Docker
-.dockerignore
 # OS
 .DS_Store
 Thumbs.db

anime-face-detector .dockerignore (new file)

@@ -0,0 +1,6 @@
# Exclude accumulated detection outputs (volume-mounted at runtime anyway)
api/outputs/
api/__pycache__/
__pycache__/
*.pyc
images/

miku-stt Dockerfile

@@ -1,12 +1,12 @@
 # RealtimeSTT Container
 # Uses Faster-Whisper with CUDA for GPU-accelerated inference
-# Includes dual VAD (WebRTC + Silero) for robust voice detection
+# Includes Silero VAD (ONNX, CPU-only) for robust voice detection
 #
 # Updated per RealtimeSTT PR #295:
 # - CUDA 12.8.1 (latest stable)
-# - PyTorch 2.7.1 with cu128 support
+# - PyTorch CPU-only (for Silero VAD tensor ops only - saves ~2.3 GB)
+# - Faster-Whisper/CTranslate2 uses CUDA directly, no PyTorch GPU needed
 # - Ubuntu 24.04 base
 # - Single Python 3.11 installation
 FROM nvidia/cuda:12.8.1-cudnn-runtime-ubuntu24.04
@@ -27,7 +27,7 @@ RUN apt-get update && apt-get install -y \
     curl \
     && rm -rf /var/lib/apt/lists/*
-# Install PyTorch with CUDA 12.8 support (installed first for layer caching)
+# Install PyTorch CPU-only (for Silero VAD tensor ops - GPU transcription uses CTranslate2 directly)
 COPY requirements-gpu-torch.txt .
 RUN python3 -m pip install --break-system-packages --no-cache-dir -r requirements-gpu-torch.txt

requirements-gpu-torch.txt

@@ -1,5 +1,7 @@
-# PyTorch with CUDA 12.8 support
-# Updated per RealtimeSTT PR #295 for better performance
-torch==2.7.1+cu128
-torchaudio==2.7.1+cu128
---index-url https://download.pytorch.org/whl/cu128
+# PyTorch CPU-only (used solely for Silero VAD which runs on CPU)
+# Silero VAD's OnnxWrapper uses torch tensors internally but does not need GPU.
+# Faster-Whisper/CTranslate2 handles GPU transcription via CUDA directly.
+# torchaudio is required by silero-vad's utils_vad.py top-level import.
+torch==2.7.1+cpu
+torchaudio==2.7.1+cpu
+--index-url https://download.pytorch.org/whl/cpu
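The whole saving hinges on the `+cpu` local version tag in the pins above; a regression back to a `+cuXXX` wheel would silently re-add gigabytes. A hypothetical CI guard (not part of this commit) could catch that with a few lines of stdlib Python:

```python
import re

# Inline copy of the pinned requirements from this commit, for illustration
requirements = """\
torch==2.7.1+cpu
torchaudio==2.7.1+cpu
--index-url https://download.pytorch.org/whl/cpu
"""

def cuda_pins(text):
    # A CUDA wheel carries a '+cu<version>' local tag after the '==' pin
    # (e.g. torch==2.7.1+cu128); '+cpu' does not match the \d+ requirement.
    return [line for line in text.splitlines()
            if re.search(r"==.*\+cu\d+", line)]

assert cuda_pins(requirements) == []
print("all torch wheels are CPU-only")
```

Running this against the old file (`torch==2.7.1+cu128`) would return the offending pin instead of an empty list.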

miku-stt requirements file

@@ -9,8 +9,8 @@ ctranslate2>=4.4.0
 # Audio processing
 soundfile>=0.12.0
-# VAD - Silero (loaded via torch.hub)
-# No explicit package needed, comes with torch
+# VAD - Silero (loaded via torch.hub, runs on CPU via ONNX)
+# Requires torch (CPU-only) - see requirements-gpu-torch.txt
 # Utilities
 aiohttp>=3.9.0