Phase 3: Unified Cheshire Cat integration with WebSocket-based per-user isolation

Key changes:
- CatAdapter (bot/utils/cat_client.py): chat queries now go over the WebSocket endpoint /ws/{user_id} instead of HTTP POST. This fixes per-user memory isolation when no API keys are configured, because the HTTP endpoint assigns every request to user_id='user'.
- Memory management API: 8 endpoints covering status, stats, facts, episodic memories, a consolidation trigger, and a multi-step delete with confirmation.
- Web UI: Memory tab (tab9) with collection stats, a fact/episodic memory browser, a manual consolidation trigger, and a 3-step delete flow that requires an exact confirmation string.
- Bot integration: Cat-first response path with query_llama fallback for both text and embed responses, plus server mood detection.
- Discord bridge plugin: replaced .pop() with .get() (UserMessage is a Pydantic BaseModelDict, not a raw dict); metadata is now extracted via extra attributes.
- Unified docker-compose: Cat and Qdrant services merged into the main compose file; the bot depends_on the Cat healthcheck.
- All plugins (discord_bridge, memory_consolidation, miku_personality) consolidated into cat-plugins/ for the volume mount.
- query_llama is deprecated but remains functional for backward compatibility.
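The per-user WebSocket route and the Cat-first path with a query_llama fallback can be sketched roughly as follows. This is a minimal illustration, not the CatAdapter code: `cat_ws_url`, `respond`, and the `cat_query`/`llama_query` callables are hypothetical stand-ins, assuming the Cat is reachable at a base URL such as `ws://cat:80` (hypothetical host) and that both backends are async functions taking `(prompt, user_id)` and returning text, or None when they have no answer.

```python
import asyncio
from typing import Awaitable, Callable, Optional
from urllib.parse import quote

# Hypothetical signature shared by both backends: (prompt, user_id) -> reply text or None.
QueryFn = Callable[[str, str], Awaitable[Optional[str]]]


def cat_ws_url(base_url: str, user_id: str) -> str:
    """Build a per-user WebSocket URL of the form <base>/ws/<user_id>.

    The user_id is percent-encoded (including '/') so it always stays a single
    path segment, which keeps each user's Cat session and memory separate.
    """
    return f"{base_url.rstrip('/')}/ws/{quote(str(user_id), safe='')}"


async def respond(prompt: str, user_id: str, cat_query: QueryFn,
                  llama_query: QueryFn, timeout: float = 30.0) -> str:
    """Cat-first response path: ask the Cat, fall back to the direct LLM call."""
    try:
        reply = await asyncio.wait_for(cat_query(prompt, user_id), timeout)
        if reply:  # None or "" means the Cat produced nothing usable
            return reply
    except (asyncio.TimeoutError, OSError):
        # Cat too slow, or unreachable (OSError also covers ConnectionError).
        pass
    fallback = await llama_query(prompt, user_id)
    return fallback or ""


if __name__ == "__main__":
    async def _cat_offline(prompt: str, user_id: str) -> Optional[str]:
        raise ConnectionError("Cat is down")

    async def _llama(prompt: str, user_id: str) -> Optional[str]:
        return f"llama:{prompt}"

    print(cat_ws_url("ws://cat:80/", "1234"))  # ws://cat:80/ws/1234
    print(asyncio.run(respond("hi", "1234", _cat_offline, _llama)))  # llama:hi
```

Passing the backends in as callables keeps the fallback logic testable without a running Cat; the real adapter presumably opens its WebSocket connection at a URL like the one `cat_ws_url` builds.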
2026-02-07 20:22:03 +02:00
parent edb88e9ede
commit 14e1a8df51
14 changed files with 1382 additions and 70 deletions
--- a/bot/utils/llm.py
+++ b/bot/utils/llm.py
@@ -152,6 +152,13 @@ async def query_llama(user_prompt, user_id, guild_id=None, response_type="dm_res
    """
    Query llama.cpp server via llama-swap with OpenAI-compatible API.
    
+    .. deprecated:: Phase 3
+        For main conversation flow, prefer routing through the Cheshire Cat pipeline
+        (via cat_client.CatAdapter.query) which provides memory-augmented responses.
+        This function remains available for specialized use cases (vision, bipolar mode,
+        image generation, autonomous, sentiment analysis) and as a fallback when Cat
+        is unavailable.
+    
    Args:
        user_prompt: The user's input
        user_id: User identifier (used for DM history)