Add restore_runtime_settings() to ConfigManager that reads config_runtime.yaml
on startup and restores persisted values into globals:
- LANGUAGE_MODE, AUTONOMOUS_DEBUG, VOICE_DEBUG_MODE
- USE_CHESHIRE_CAT, PREFER_AMD_GPU, DM_MOOD
Add missing persistence calls to API endpoints:
- POST /language/set now persists to config_runtime.yaml
- POST /voice/debug-mode now persists to config_runtime.yaml
- POST /memory/toggle now persists to config_runtime.yaml
Call restore_runtime_settings() in on_ready() after evil/bipolar restore.
Resolves#22
Replace the minimal sync-only shutdown (which only saved autonomous state)
with a comprehensive async graceful_shutdown() coroutine that:
1. Ends active voice sessions (disconnect, release GPU locks, cleanup audio)
2. Saves autonomous engine state
3. Stops the APScheduler
4. Cancels all tracked background tasks (from task_tracker)
5. Closes the Discord gateway connection
Signal handlers (SIGTERM/SIGINT) now schedule the async shutdown on the
running event loop. The atexit handler is kept as a last-resort sync fallback.
Resolves#5, also addresses #4 (voice cleanup at shutdown)
Major changes:
- Remove unused ML libraries: torch, scikit-learn, langchain-core, langchain-text-splitters, langchain-community, faiss-cpu
- Comment out unused langchain imports in utils/core.py (only used in commented-out code)
- Keep transformers (used in persona_dialogue.py for sentiment analysis)
Results:
- Container size reduced from 14.5GB to 2.6GB
- 82% reduction (11.9GB saved)
- Bot runs correctly without errors
- All functionality preserved
Removed packages:
- torch: ~1.0-1.5GB (not used, only in soprano_to_rvc/)
- scikit-learn: ~200-300MB (not used in bot/)
- langchain-core: ~50-100MB (not used, only in commented code)
- langchain-text-splitters: ~30-50MB (not used, only in commented code)
- langchain-community: ~50-80MB (not used, only in commented code)
- faiss-cpu: ~100-200MB (not used in bot/)
This is Phase 1 of container optimization (Quick Wins).
Further optimizations possible:
- OpenCV headless (150-200MB)
- Evaluate Playwright usage (500MB-1GB)
- Alpine base image (1-1.5GB)
- Multi-stage builds (200-400MB)
Major changes:
- Add Pydantic-based configuration system (bot/config.py, bot/config_manager.py)
- Add config.yaml with all service URLs, models, and feature flags
- Fix config.yaml path resolution in Docker (check /app/config.yaml first)
- Remove Fish Audio API integration (tested feature that didn't work)
- Remove hardcoded ERROR_WEBHOOK_URL, import from config instead
- Add missing Pydantic models (LogConfigUpdateRequest, LogFilterUpdateRequest)
- Enable Cheshire Cat memory system by default (USE_CHESHIRE_CAT=true)
- Add .env.example template with all required environment variables
- Add setup.sh script for user-friendly initialization
- Update docker-compose.yml with proper env file mounting
- Update .gitignore for config files and temporary files
Config system features:
- Static configuration from config.yaml
- Runtime overrides from config_runtime.yaml
- Environment variables for secrets (.env)
- Web UI integration via config_manager
- Graceful fallback to defaults
Secrets handling:
- Move ERROR_WEBHOOK_URL from hardcoded to .env
- Add .env.example with all placeholder values
- Document all required secrets
- Fish API key and voice ID removed from .env
Documentation:
- CONFIG_README.md - Configuration system guide
- CONFIG_SYSTEM_COMPLETE.md - Implementation summary
- FISH_API_REMOVAL_COMPLETE.md - Removal record
- SECRETS_CONFIGURED.md - Secrets setup record
- BOT_STARTUP_FIX.md - Pydantic model fixes
- MIGRATION_CHECKLIST.md - Setup checklist
- WEB_UI_INTEGRATION_COMPLETE.md - Web UI config guide
- Updated readmes/README.md with new features
- Added empty settings.json required by Cat plugin system
- Plugin now appears in ACTIVE PLUGINS list
- Enabled via /plugins/toggle API endpoint
- Ready to inject PFP descriptions when user asks about it
- Create profile_picture_context plugin to detect PFP queries via regex
- Inject current_description.txt only when user asks about profile picture
- Mount bot/memory directory in Cat container for PFP access
- Avoids context bloat by only adding PFP description when relevant
- Patterns match: 'what does your pfp look like', 'describe your avatar', etc.
- Works seamlessly with existing profile picture update system
- No manual sync needed - description auto-updates with PFP changes
MOOD SYSTEM FIX:
- Mount bot/moods directory in docker-compose.yml for Cat container access
- Update miku_personality plugin to load mood descriptions from .txt files
- Add Cat logger for debugging mood loading (replaces print statements)
- Moods now dynamically loaded from working_memory instead of hardcoded neutral
Root cause: The miku_personality plugin's agent_prompt_suffix hook was returning
an empty string, which wiped out the {declarative_memory} and {episodic_memory}
placeholders from the prompt template. This caused the LLM to never receive any
stored facts about users, resulting in hallucinated responses.
Changes:
- miku_personality: Changed agent_prompt_suffix to return the memory context
section with {episodic_memory}, {declarative_memory}, and {tools_output}
placeholders instead of empty string
- discord_bridge: Added before_cat_recalls_declarative_memories hook to increase
k-value from 3 to 10 and lower threshold from 0.7 to 0.5 for better fact
retrieval. Added agent_prompt_prefix to emphasize factual accuracy. Added
debug logging via before_agent_starts hook.
Result: Miku now correctly recalls user facts (favorite songs, games, etc.)
from declarative memory with 100% accuracy.
Tested with:
- 'What is my favorite song?' → Correctly answers 'Monitoring (Best Friend Remix) by DECO*27'
- 'Do you remember my favorite song?' → Correctly recalls the song
- 'What is my favorite video game?' → Correctly answers 'Sonic Adventure'
- Replaced Playwright browser scraping with direct API media extraction
- Both fetch_miku_tweets() and fetch_figurine_tweets_latest() now use twscrape's built-in media info
- Reduced tweet fetching from 10-15 minutes to ~5 seconds
- Eliminated browser timeout/hanging issues
- Relaxed autonomous tweet sharing conditions:
* Increased message threshold from 10 to 20 per hour
* Reduced cooldown from 3600s to 2400s (40 minutes)
* Increased energy threshold from 50% to 70%
* Added 'silly' and 'flirty' moods to allowed sharing moods
This makes both figurine notifications and tweet sharing much more reliable and responsive.
**Critical Bug Fixes:**
1. Per-user memory isolation bug
- Changed CatAdapter from HTTP POST to WebSocket /ws/{user_id}
- User_id now comes from URL path parameter (true per-user isolation)
- Verified: Different users can't see each other's memories
2. Memory API 405 errors
- Replaced non-existent Cat endpoint calls with Qdrant direct queries
- get_memory_points(): Now uses POST /collections/{collection}/points/scroll
- delete_memory_point(): Now uses POST /collections/{collection}/points/delete
3. Memory stats showing null counts
- Reimplemented get_memory_stats() to query Qdrant directly
- Now returns accurate counts: episodic: 20, declarative: 6, procedural: 4
4. Miku couldn't see usernames
- Modified discord_bridge before_cat_reads_message hook
- Prepends [Username says:] to every message text
- LLM now knows who is texting: [Alice says:] Hello Miku!
5. Web UI Memory tab layout
- Tab9 was positioned outside .tab-container div (showed to the right)
- Moved tab9 HTML inside container, before closing divs
- Memory tab now displays below tab buttons like other tabs
**Code Changes:**
bot/utils/cat_client.py:
- Line 25: Logger name changed to 'llm' (available component)
- get_memory_stats() (lines 256-285): Query Qdrant directly via HTTP GET
- get_memory_points() (lines 275-310): Use Qdrant POST /points/scroll
- delete_memory_point() (lines 350-370): Use Qdrant POST /points/delete
cat-plugins/discord_bridge/discord_bridge.py:
- Fixed .pop() → .get() (UserMessage is Pydantic BaseModelDict)
- Added before_cat_reads_message logic to prepend [Username says:]
- Message format: [Alice says:] message content
Dockerfile.llamaswap-rocm:
- Lines 37-44: Added conditional check for UI directory
- if [ -d ui ] before npm install && npm run build
- Fixes build failure when llama-swap UI dir doesn't exist
bot/static/index.html:
- Moved tab9 from lines 1554-1688 (outside container)
- To position before container closing divs (now inside)
- Memory tab button at line 673: 🧠 Memories
**Testing & Verification:**
✅ Per-user isolation verified (Docker exec test)
✅ Memory stats showing real counts (curl test)
✅ Memory API working (facts/episodic loading)
✅ Web UI layout fixed (tab displays correctly)
✅ All 5 services running (llama-swap, llama-swap-amd, qdrant, cat, bot)
✅ Username prepending working (message context for LLM)
**Result:** All Phase 3 critical bugs fixed and verified working.
Key changes:
- CatAdapter (bot/utils/cat_client.py): WebSocket /ws/{user_id} for chat
queries instead of HTTP POST (fixes per-user memory isolation when no
API keys are configured — HTTP defaults all users to user_id='user')
- Memory management API: 8 endpoints for status, stats, facts, episodic
memories, consolidation trigger, multi-step delete with confirmation
- Web UI: Memory tab (tab9) with collection stats, fact/episodic browser,
manual consolidation trigger, and 3-step delete flow requiring exact
confirmation string
- Bot integration: Cat-first response path with query_llama fallback for
both text and embed responses, server mood detection
- Discord bridge plugin: fixed .pop() to .get() (UserMessage is a Pydantic
BaseModelDict, not a raw dict), metadata extraction via extra attributes
- Unified docker-compose: Cat + Qdrant services merged into main compose,
bot depends_on Cat healthcheck
- All plugins (discord_bridge, memory_consolidation, miku_personality)
consolidated into cat-plugins/ for volume mount
- query_llama deprecated but functional for compatibility
Implements unified cross-server memory system for Miku bot:
**Core Changes:**
- discord_bridge plugin with 3 hooks for metadata enrichment
- Unified user identity: discord_user_{id} across servers and DMs
- Minimal filtering: skip only trivial messages (lol, k, 1-2 chars)
- Marks all memories as consolidated=False for Phase 2 processing
**Testing:**
- test_phase1.py validates cross-server memory recall
- PHASE1_TEST_RESULTS.md documents successful validation
- Cross-server test: User says 'blue' in Server A, Miku remembers in Server B ✅
**Documentation:**
- IMPLEMENTATION_PLAN.md - Complete architecture and roadmap
- Phase 2 (sleep consolidation) ready for implementation
This lays the foundation for human-like memory consolidation.
Major architectural overhaul of the speech-to-text pipeline for real-time voice chat:
STT Server Rewrite:
- Replaced RealtimeSTT dependency with direct Silero VAD + Faster-Whisper integration
- Achieved sub-second latency by eliminating unnecessary abstractions
- Uses small.en Whisper model for fast transcription (~850ms)
Speculative Transcription (NEW):
- Start transcribing at 150ms silence (speculative) while still listening
- If speech continues, discard speculative result and keep buffering
- If 400ms silence confirmed, use pre-computed speculative result immediately
- Reduces latency by ~250-850ms for typical utterances with clear pauses
VAD Implementation:
- Silero VAD with ONNX (CPU-efficient) for 32ms chunk processing
- Direct speech boundary detection without RealtimeSTT overhead
- Configurable thresholds for silence detection (400ms final, 150ms speculative)
Architecture:
- Single Whisper model loaded once, shared across sessions
- VAD runs on every 512-sample chunk for immediate speech detection
- Background transcription worker thread for non-blocking processing
- Greedy decoding (beam_size=1) for maximum speed
Performance:
- Previous: 400ms silence wait + ~850ms transcription = ~1.25s total latency
- Current: 400ms silence wait + 0ms (speculative ready) = ~400ms (best case)
- Single model reduces VRAM usage, prevents OOM on GTX 1660
Container Manager Updates:
- Updated health check logic to work with new response format
- Changed from checking 'warmed_up' flag to just 'status: ready'
- Improved terminology from 'warmup' to 'models loading'
Files Changed:
- stt-realtime/stt_server.py: Complete rewrite with Silero VAD + speculative transcription
- stt-realtime/requirements.txt: Removed RealtimeSTT, using torch.hub for Silero VAD
- bot/utils/container_manager.py: Updated health check for new STT response format
- bot/api.py: Updated docstring to reflect new architecture
- backups/: Archived old RealtimeSTT-based implementation
This addresses low latency requirements while maintaining accuracy with configurable
speech detection thresholds.
- Removed local 'import json' statement inside get_servers() function
- This was shadowing the module-level import and causing
'cannot access local variable' error
- json is already imported at the top of the file (line 44)
- Created new logging infrastructure with per-component filtering
- Added 6 log levels: DEBUG, INFO, API, WARNING, ERROR, CRITICAL
- Implemented non-hierarchical level control (any combination can be enabled)
- Migrated 917 print() statements across 31 files to structured logging
- Created web UI (system.html) for runtime configuration with dark theme
- Added global level controls to enable/disable levels across all components
- Added timestamp format control (off/time/date/datetime options)
- Implemented log rotation (10MB per file, 5 backups)
- Added API endpoints for dynamic log configuration
- Configured HTTP request logging with filtering via api.requests component
- Intercepted APScheduler logs with proper formatting
- Fixed persistence paths to use /app/memory for Docker volume compatibility
- Fixed checkbox display bug in web UI (enabled_levels now properly shown)
- Changed System Settings button to open in same tab instead of new window
Components: bot, api, api.requests, autonomous, persona, vision, llm,
conversation, mood, dm, scheduled, gpu, media, server, commands,
sentiment, core, apscheduler
All settings persist across container restarts via JSON config.
Features:
- Built custom ROCm container for AMD RX 6800 GPU
- Added GPU selection toggle in web UI (NVIDIA/AMD)
- Unified model names across both GPUs for seamless switching
- Vision model always uses NVIDIA GPU (optimal performance)
- Text models (llama3.1, darkidol) can use either GPU
- Added /gpu-status and /gpu-select API endpoints
- Implemented GPU state persistence in memory/gpu_state.json
Technical details:
- Multi-stage Dockerfile.llamaswap-rocm with ROCm 6.2.4
- llama.cpp compiled with GGML_HIP=ON for gfx1030 (RX 6800)
- Proper GPU permissions without root (groups 187/989)
- AMD container on port 8091, NVIDIA on port 8090
- Updated bot/utils/llm.py with get_current_gpu_url() and get_vision_gpu_url()
- Modified bot/utils/image_handling.py to always use NVIDIA for vision
- Enhanced web UI with GPU selector button (blue=NVIDIA, red=AMD)
Files modified:
- docker-compose.yml (added llama-swap-amd service)
- bot/globals.py (added LLAMA_AMD_URL)
- bot/api.py (added GPU selection endpoints and helper function)
- bot/utils/llm.py (GPU routing for text models)
- bot/utils/image_handling.py (GPU routing for vision models)
- bot/static/index.html (GPU selector UI)
- llama-swap-rocm-config.yaml (unified model names)
New files:
- Dockerfile.llamaswap-rocm
- bot/memory/gpu_state.json
- bot/utils/gpu_router.py (load balancing utility)
- setup-dual-gpu.sh (setup verification script)
- DUAL_GPU_*.md (documentation files)
ISSUE
=====
When using the manual webhook message feature via API, the following error occurred:
- 'Timeout context manager should be used inside a task'
- 'NoneType' object is not iterable (when sending without files)
The error happened because Discord.py's webhook operations were being awaited
directly in the FastAPI endpoint context, rather than within a task running in
the bot's event loop.
SOLUTION
========
Refactored /manual/send-webhook endpoint to properly handle async operations:
1. Moved webhook creation inside task function
- get_or_create_webhooks_for_channel() now runs in send_webhook_message()
- All Discord operations (webhook selection, sending) happen inside the task
- Follows same pattern as working /manual/send endpoint
2. Fixed file parameter handling
- Changed from 'files=discord_files if discord_files else None'
- To conditional: only pass files parameter when list is non-empty
- Discord.py's webhook.send() cannot iterate over None, requires list or omit
3. Maintained proper file reading
- File content still read in endpoint context (before form closes)
- File data passed to task as pre-read byte arrays
- Prevents form closure issues
TECHNICAL DETAILS
=================
- Discord.py HTTP operations use timeout context managers
- Context managers must run inside bot's event loop (via create_task)
- FastAPI endpoint context is separate from bot's event loop
- Solution: Wrap all Discord API calls in async task function
- Pattern: Read files → Create task → Task handles Discord operations
TESTING
=======
- Manual webhook sending now works without timeout errors
- Both personas (Miku/Evil) send correctly
- File attachments work properly
- Messages without files send correctly