refactor: Implement low-latency STT pipeline with speculative transcription
Major architectural overhaul of the speech-to-text pipeline for real-time voice chat.

STT Server Rewrite:
- Replaced the RealtimeSTT dependency with direct Silero VAD + Faster-Whisper integration
- Achieved sub-second latency by eliminating unnecessary abstractions
- Uses the small.en Whisper model for fast transcription (~850ms)

Speculative Transcription (NEW):
- Start transcribing at 150ms of silence (speculative) while still listening
- If speech continues, discard the speculative result and keep buffering
- If 400ms of silence is confirmed, use the pre-computed speculative result immediately
- Reduces latency by ~250-850ms for typical utterances with clear pauses

VAD Implementation:
- Silero VAD with ONNX (CPU-efficient) for 32ms chunk processing
- Direct speech-boundary detection without RealtimeSTT overhead
- Configurable silence-detection thresholds (400ms final, 150ms speculative)

Architecture:
- Single Whisper model loaded once, shared across sessions
- VAD runs on every 512-sample chunk for immediate speech detection
- Background transcription worker thread for non-blocking processing
- Greedy decoding (beam_size=1) for maximum speed

Performance:
- Previous: 400ms silence wait + ~850ms transcription = ~1.25s total latency
- Current: 400ms silence wait + 0ms (speculative result already computed) = ~400ms (best case)
- Single shared model reduces VRAM usage and prevents OOM on a GTX 1660

Container Manager Updates:
- Updated health check logic to work with the new response format
- Changed from checking the 'warmed_up' flag to just 'status: ready'
- Improved terminology from 'warmup' to 'models loading'

Files Changed:
- stt-realtime/stt_server.py: Complete rewrite with Silero VAD + speculative transcription
- stt-realtime/requirements.txt: Removed RealtimeSTT; using torch.hub for Silero VAD
- bot/utils/container_manager.py: Updated health check for new STT response format
- bot/api.py: Updated docstring to reflect new architecture
- backups/: Archived old RealtimeSTT-based implementation

This addresses low-latency requirements while maintaining accuracy via configurable speech detection thresholds.
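The speculative-transcription flow described above can be sketched as a small state machine over per-chunk VAD decisions. This is a minimal illustration only: `Segmenter`, the threshold constants, and the action strings are hypothetical names, not the actual stt_server.py API.

```python
CHUNK_MS = 32            # 512 samples at 16 kHz, as in the VAD loop above
SPECULATIVE_MS = 150     # start a speculative transcription here
FINAL_MS = 400           # commit the utterance here


class Segmenter:
    """Tracks trailing silence and decides when to (speculatively) transcribe."""

    def __init__(self):
        self.silence_ms = 0
        self.speculative_started = False

    def feed(self, is_speech: bool) -> str:
        """Consume one VAD decision for a 32 ms chunk and return an action."""
        if is_speech:
            # Speech resumed: any in-flight speculative result is now stale.
            stale = self.speculative_started
            self.silence_ms = 0
            self.speculative_started = False
            return "discard_speculative" if stale else "buffer"

        self.silence_ms += CHUNK_MS
        if self.silence_ms >= FINAL_MS:
            # Silence confirmed: the pre-computed speculative result is used as-is.
            self.silence_ms = 0
            self.speculative_started = False
            return "finalize"
        if self.silence_ms >= SPECULATIVE_MS and not self.speculative_started:
            self.speculative_started = True
            return "start_speculative"
        return "wait"
```

With these thresholds the speculative job starts after five silent chunks (~160 ms) and the utterance is finalized after thirteen (~416 ms), so when the speculative transcription finishes before the final threshold, the total latency collapses to roughly the 400 ms silence wait quoted above.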
@@ -1,7 +1,7 @@
 # container_manager.py
 """
 Manages Docker containers for STT and TTS services.
-Handles startup, shutdown, and warmup detection.
+Handles startup, shutdown, and readiness detection.
 """
 
 import asyncio
@@ -18,12 +18,12 @@ class ContainerManager:
     STT_CONTAINER = "miku-stt"
     TTS_CONTAINER = "miku-rvc-api"
 
-    # Warmup check endpoints
+    # Health check endpoints
     STT_HEALTH_URL = "http://miku-stt:8767/health"  # HTTP health check endpoint
     TTS_HEALTH_URL = "http://miku-rvc-api:8765/health"
 
-    # Warmup timeouts
-    STT_WARMUP_TIMEOUT = 30  # seconds
+    # Startup timeouts (time to load models and become ready)
+    STT_WARMUP_TIMEOUT = 30  # seconds (Whisper model loading)
     TTS_WARMUP_TIMEOUT = 60  # seconds (RVC takes longer)
 
     @classmethod
@@ -65,17 +65,17 @@ class ContainerManager:
 
         logger.info(f"✓ {cls.TTS_CONTAINER} started")
 
-        # Wait for warmup
-        logger.info("⏳ Waiting for containers to warm up...")
+        # Wait for models to load and become ready
+        logger.info("⏳ Waiting for models to load...")
 
         stt_ready = await cls._wait_for_stt_warmup()
         if not stt_ready:
-            logger.error("STT failed to warm up")
+            logger.error("STT failed to become ready")
             return False
 
         tts_ready = await cls._wait_for_tts_warmup()
         if not tts_ready:
-            logger.error("TTS failed to warm up")
+            logger.error("TTS failed to become ready")
             return False
 
         logger.info("✅ All voice containers ready!")
@@ -130,7 +130,8 @@ class ContainerManager:
             async with session.get(cls.STT_HEALTH_URL, timeout=aiohttp.ClientTimeout(total=2)) as resp:
                 if resp.status == 200:
                     data = await resp.json()
-                    if data.get("status") == "ready" and data.get("warmed_up"):
+                    # New STT server returns {"status": "ready"} when models are loaded
+                    if data.get("status") == "ready":
                         logger.info("✓ STT is ready")
                         return True
         except Exception:
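The health-check change in the diff above boils down to polling /health until `{"status": "ready"}` appears, without consulting the old `warmed_up` flag. Below is a synchronous sketch of that polling pattern; the real container_manager.py does this with aiohttp inside an async method, and `wait_for_ready` with its parameters is an illustrative name, not the actual code.

```python
import time


def wait_for_ready(fetch_health, timeout_s=30, interval_s=0.5, sleep=time.sleep):
    """Poll fetch_health() until it reports readiness or the timeout expires.

    fetch_health returns the parsed /health JSON dict, and may raise while
    the container is still starting and not yet accepting connections.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            data = fetch_health()
            # New STT server returns {"status": "ready"} once models are loaded
            if data.get("status") == "ready":
                return True
        except Exception:
            pass  # server not up yet; keep polling until the deadline
        sleep(interval_s)
    return False
```

The injected `fetch_health` and `sleep` callables make the loop easy to exercise in tests with canned responses instead of a live container.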