- miku-stt: switch PyTorch CUDA -> CPU-only (~2.5 GB savings)
  - Silero VAD already runs on CPU via ONNX (onnx=True), CUDA PyTorch was waste
  - faster-whisper/CTranslate2 uses CUDA directly, no PyTorch GPU needed
  - torch+torchaudio layer: 3.3 GB -> 796 MB; total image 9+ GB -> 6.83 GB
  - Tested: Silero VAD loads (ONNX), Whisper loads on cuda, server ready
- llama-swap-rocm: add root .dockerignore to fix 31 GB build context (sketch below)
  - Dockerfile clones all sources from git, never COPYs from context
  - 19 GB of GGUF model files were being transferred on every build
  - Now excludes everything (*), near-zero context transfer
- anime-face-detector: add .dockerignore to exclude accumulated outputs (sketch below)
  - api/outputs/ (56 accumulated detection files) no longer baked into image
  - api/__pycache__/ and images/ also excluded
  - .gitignore: remove .dockerignore exclusion so these files are tracked
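A minimal sketch of the two .dockerignore files described above, reconstructed from the bullet points rather than copied from the repos:

    # llama-swap-rocm/.dockerignore - the Dockerfile clones all sources from git,
    # so nothing from the 31 GB build context is needed
    *

    # anime-face-detector/.dockerignore - keep accumulated outputs and caches
    # out of the build context and image
    api/outputs
    api/__pycache__
    images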
# RealtimeSTT Container
# Uses Faster-Whisper with CUDA for GPU-accelerated inference
# Includes Silero VAD (ONNX, CPU-only) for robust voice detection
#
# Updated per RealtimeSTT PR #295:
# - CUDA 12.8.1 (latest stable)
# - PyTorch CPU-only (for Silero VAD tensor ops only - saves ~2.3 GB)
# - Faster-Whisper/CTranslate2 uses CUDA directly, no PyTorch GPU needed
# - Ubuntu 24.04 base

FROM nvidia/cuda:12.8.1-cudnn-runtime-ubuntu24.04

# Prevent interactive prompts during build
ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1

# Set working directory
WORKDIR /app

# Install system dependencies (Ubuntu 24.04 has Python 3.12 by default)
RUN apt-get update && apt-get install -y \
    python3-pip \
    ffmpeg \
    libsndfile1 \
    libportaudio2 \
    git \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Install PyTorch CPU-only (for Silero VAD tensor ops - GPU transcription uses CTranslate2 directly)
COPY requirements-gpu-torch.txt .
RUN python3 -m pip install --break-system-packages --no-cache-dir -r requirements-gpu-torch.txt

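# requirements-gpu-torch.txt is not shown in this view; a minimal sketch of a
# CPU-only pin (an assumption about its contents, not the actual file) would
# point pip at the PyTorch CPU wheel index:
#   --index-url https://download.pytorch.org/whl/cpu
#   torch
#   torchaudio
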
# Copy and install other Python dependencies
COPY requirements.txt .
RUN python3 -m pip install --break-system-packages --no-cache-dir -r requirements.txt

# Copy application code
COPY stt_server.py .

# Create the Hugging Face cache directory (models are downloaded here on first run)
RUN mkdir -p /root/.cache/huggingface

# Expose WebSocket port
EXPOSE 8766

# Health check - open a TCP socket to verify the port is listening
HEALTHCHECK --interval=30s --timeout=10s --start-period=120s --retries=3 \
    CMD python3 -c "import socket; s=socket.socket(); s.settimeout(2); s.connect(('localhost', 8766)); s.close()" || exit 1

# Run the server
CMD ["python3", "stt_server.py"]