Files
miku-discord/stt-realtime/Dockerfile
koko210Serve 9e5511da21 perf: reduce container sizes and build times
- miku-stt: switch PyTorch CUDA -> CPU-only (~2.5 GB savings)
  - Silero VAD already runs on CPU via ONNX (onnx=True), CUDA PyTorch was waste
  - faster-whisper/CTranslate2 uses CUDA directly, no PyTorch GPU needed
  - torch+torchaudio layer: 3.3 GB -> 796 MB; total image 9+ GB -> 6.83 GB
  - Tested: Silero VAD loads (ONNX), Whisper loads on cuda, server ready

- llama-swap-rocm: add root .dockerignore to fix 31 GB build context
  - Dockerfile clones all sources from git, never COPYs from context
  - 19 GB of GGUF model files were being transferred on every build
  - Now excludes everything (*), near-zero context transfer

- anime-face-detector: add .dockerignore to exclude accumulated outputs
  - api/outputs/ (56 accumulated detection files) no longer baked into image
  - api/__pycache__/ and images/ also excluded

- .gitignore: remove .dockerignore exclusion so these files are tracked
2026-02-25 14:41:04 +02:00

53 lines
1.7 KiB
Docker

# RealtimeSTT Container
# Uses Faster-Whisper with CUDA for GPU-accelerated inference
# Includes Silero VAD (ONNX, CPU-only) for robust voice detection
#
# Updated per RealtimeSTT PR #295:
# - CUDA 12.8.1 (latest stable)
# - PyTorch CPU-only (for Silero VAD tensor ops only - saves ~2.3 GB)
# - Faster-Whisper/CTranslate2 uses CUDA directly, no PyTorch GPU needed
# - Ubuntu 24.04 base
FROM nvidia/cuda:12.8.1-cudnn-runtime-ubuntu24.04
# Prevent interactive prompts during build
ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1
# Set working directory
WORKDIR /app
# Install system dependencies (Ubuntu 24.04 has Python 3.12 by default)
RUN apt-get update && apt-get install -y \
python3-pip \
ffmpeg \
libsndfile1 \
libportaudio2 \
git \
curl \
&& rm -rf /var/lib/apt/lists/*
# Install PyTorch CPU-only (for Silero VAD tensor ops - GPU transcription uses CTranslate2 directly)
COPY requirements-gpu-torch.txt .
RUN python3 -m pip install --break-system-packages --no-cache-dir -r requirements-gpu-torch.txt
# Copy and install other Python dependencies
COPY requirements.txt .
RUN python3 -m pip install --break-system-packages --no-cache-dir -r requirements.txt
# Copy application code
COPY stt_server.py .
# Create models directory (models will be downloaded on first run)
RUN mkdir -p /root/.cache/huggingface
# Expose WebSocket port
EXPOSE 8766
# Health check - use netcat to check if port is listening
HEALTHCHECK --interval=30s --timeout=10s --start-period=120s --retries=3 \
CMD python3 -c "import socket; s=socket.socket(); s.settimeout(2); s.connect(('localhost', 8766)); s.close()" || exit 1
# Run the server
CMD ["python3", "stt_server.py"]