Files
miku-discord/cheshire-cat/IMPLEMENTATION_PLAN.md
koko210Serve 323ca753d1 feat: Phase 1 - Discord bridge with unified user identity
Implements unified cross-server memory system for Miku bot:

**Core Changes:**
- discord_bridge plugin with 3 hooks for metadata enrichment
- Unified user identity: discord_user_{id} across servers and DMs
- Minimal filtering: skip only trivial messages (lol, k, 1-2 chars)
- Marks all memories as consolidated=False for Phase 2 processing

**Testing:**
- test_phase1.py validates cross-server memory recall
- PHASE1_TEST_RESULTS.md documents successful validation
- Cross-server test: User says 'blue' in Server A, Miku remembers in Server B 

**Documentation:**
- IMPLEMENTATION_PLAN.md - Complete architecture and roadmap
- Phase 2 (sleep consolidation) ready for implementation

This lays the foundation for human-like memory consolidation.
2026-01-31 18:54:00 +02:00

44 KiB

Cheshire Cat Implementation Plan for Miku Discord Bot

Executive Summary

This plan outlines how to integrate Cheshire Cat AI into the Miku Discord bot to achieve:

  • Long-term memory per server (persistent context beyond LLM limits)
  • User profiling (learning about individual users over time)
  • Richer conversation history (not limited by context window)
  • Autonomous memory curation (Miku decides what to remember)

Quick Answers to Key Questions

Q1: Will this work for DMs?
Yes, with unified user identity! Each user gets a single identity across all servers and DMs:

  • User ID format: discord_user_{user_id} (e.g., discord_user_67890)
  • Miku remembers the same person regardless of where they talk to her
  • Server context stored in metadata: guild_id, channel_id
  • User can talk to same Miku in Server A, Server B, and DMs - she remembers them everywhere!

Q2: How can Miku decide what to remember without expensive LLM polling?
After careful analysis, "Sleep Consolidation" method is superior to piggyback approach:

Approach Pros Cons Verdict
Piggyback Zero extra cost, instant decisions Risk of breaking immersion, can't see conversation patterns, decisions made in isolation Not ideal
Sleep Consolidation Natural (like human memory), sees full context, batch processing efficient, zero immersion risk Slight delay (memories processed overnight) Recommended

Sleep Consolidation: Store everything temporarily → Nightly batch analysis → Keep important, discard trivial

System Architecture

Current vs Proposed

Current System:

User Message → Load 3 files (10KB) → LLM → Response
                ↓
         8-message window
         (context limit)

Proposed System:

User Message → Cheshire Cat (Multi-tier Memory) → LLM → Response
                ↓              ↓              ↓
           Episodic      Declarative    Procedural
           (events)      (facts)        (tools)

Memory Types in Cheshire Cat

1. Episodic Memory (Conversation History)

  • What: Time-stamped conversation excerpts
  • Storage: Vector embeddings in Qdrant
  • Retrieval: Semantic similarity search
  • Scope: Per-user or per-server
  • Capacity: Unlimited (beyond LLM context window)

How Miku Uses It:

# Automatically stored by Cat:
User: "My favorite color is blue"
Miku: "That's lovely! Blue like the ocean 🌊"
[STORED: User prefers blue, timestamp, conversation context]

# Later retrieved when relevant:
User: "What should I paint?"
[RAG retrieves: User likes blue]
Miku: "How about a beautiful blue ocean scene? 🎨💙"

2. Declarative Memory (Facts & Knowledge)

  • What: Static information, user profiles, server data
  • Storage: Vector embeddings in Qdrant
  • Retrieval: Semantic similarity search
  • Scope: Global, per-server, or per-user
  • Capacity: Unlimited

How Miku Uses It:

# Miku can actively store facts:
User: "I live in Tokyo"
Miku stores: {
    "user_id": "12345",
    "fact": "User lives in Tokyo, Japan",
    "category": "location",
    "timestamp": "2026-01-31"
}

# Retrieved contextually:
User: "What's the weather like?"
[RAG retrieves: User location = Tokyo]
Miku: "Let me think about Tokyo weather! ☀️"

3. Procedural Memory (Skills & Tools)

  • What: Callable functions/tools
  • Storage: Code in plugins
  • Purpose: Extend Miku's capabilities
  • Examples: Image generation, web search, database queries

Implementation Phases

Phase 1: Foundation (Week 1-2)

Goal: Basic Cat integration without breaking existing bot

Tasks:

  1. Deploy Cheshire Cat alongside existing bot

    • Use test docker-compose configuration
    • Run Cat on port 1865 (already done)
    • Keep existing bot running on normal port
  2. Create Discord bridge plugin

    # cat/plugins/discord_bridge/discord_bridge.py
    from cat.mad_hatter.decorators import hook
    
    @hook(priority=100)
    def before_cat_reads_message(user_message_json, cat):
        """Enrich message with Discord metadata"""
        # Add guild_id, channel_id, user_id from Discord
        user_message_json['guild_id'] = cat.working_memory.guild_id
        user_message_json['channel_id'] = cat.working_memory.channel_id
        return user_message_json
    
    @hook(priority=100)
    def before_cat_stores_episodic_memory(doc, cat):
        """Add Discord context to memories"""
        doc.metadata['guild_id'] = cat.working_memory.guild_id
        doc.metadata['user_id'] = cat.working_memory.user_id
        return doc
    
  3. Implement user isolation

    • Each Discord user gets unique Cat user_id
    • Format: discord_{guild_id}_{user_id}
    • Ensures per-user conversation history

Success Criteria:

  • Cat responds to test queries via API
  • Discord metadata properly attached
  • No impact on existing bot

Phase 2: Memory Intelligence (Week 3-4)

Goal: Teach Miku to decide what to remember

Tasks:

  1. Implement minimal real-time filtering

    # cat/plugins/miku_memory/sleep_consolidation.py
    from cat.mad_hatter.decorators import hook
    from langchain.docstore.document import Document
    import re
    
    @hook(priority=100)
    def before_cat_stores_episodic_memory(doc, cat):
        """
        Store almost everything temporarily.
        Only skip obvious junk (1-2 char messages, pure reactions).
        """
        message = doc.page_content.strip()
    
        # Skip only the most trivial
        skip_patterns = [
            r'^\w{1,2}$',  # "k", "ok"
            r'^(lol|lmao|haha|hehe|xd)$',  # Pure reactions
        ]
    
        for pattern in skip_patterns:
            if re.match(pattern, message.lower()):
                return None  # Too trivial to even store temporarily
    
        # Everything else: store with metadata
        doc.metadata['consolidated'] = False  # Needs nightly processing
        doc.metadata['stored_at'] = datetime.now().isoformat()
        doc.metadata['guild_id'] = cat.working_memory.get('guild_id', 'dm')
        doc.metadata['user_id'] = cat.working_memory.user_id
    
        return doc
    
  2. Implement nightly consolidation task

    # bot/utils/memory_consolidation.py
    import asyncio
    from datetime import datetime
    import schedule
    
    async def nightly_memory_consolidation():
        """
        Run every night at 3 AM.
        Reviews all unconsolidated memories and decides what to keep.
        """
        print(f"🌙 {datetime.now()} - Miku's memory consolidation starting...")
    
        # Get all memories that need consolidation
        unconsolidated = await get_unconsolidated_memories()
    
        # Group by user
        by_user = {}
        for mem in unconsolidated:
            user_id = mem.metadata['user_id']
            if user_id not in by_user:
                by_user[user_id] = []
            by_user[user_id].append(mem)
    
        print(f"📊 Processing {len(unconsolidated)} memories from {len(by_user)} users")
    
        # Process each user's day
        for user_id, memories in by_user.items():
            await consolidate_user_memories(user_id, memories)
    
        print(f"✨ Memory consolidation complete!")
    
    # Schedule for 3 AM daily
    schedule.every().day.at("03:00").do(lambda: asyncio.create_task(nightly_memory_consolidation()))
    
  3. Create context-aware analysis function

    async def consolidate_user_memories(user_id: str, memories: List[Document]):
        """
        Analyze user's entire day in one context.
        This is where the magic happens!
        """
        # Build timeline
        timeline = []
        for mem in sorted(memories, key=lambda m: m.metadata['stored_at']):
            timeline.append({
                'time': mem.metadata['stored_at'],
                'guild': mem.metadata.get('guild_id', 'dm'),
                'content': mem.page_content
            })
    
        # ONE LLM call to analyze entire day
        prompt = f"""
    

You are Miku reviewing your conversations with user {user_id} from today. Look at the full timeline and decide what's worth remembering long-term.

Timeline ({len(timeline)} conversations): {json.dumps(timeline, indent=2)}

Analyze holistically:

  1. What did you learn about this person today?
  2. Any patterns or recurring themes?
  3. How did your relationship evolve?
  4. Which moments were meaningful vs casual chitchat?

For each conversation, decide:

  • keep: true/false
  • importance: 1-10
  • categories: ["personal", "preference", "emotional", "event", "relationship"]
  • insights: What you learned (for declarative memory)
  • summary: One sentence for future retrieval

Respond with JSON: {{ "day_summary": "One sentence about user based on today", "relationship_change": "How relationship evolved (if at all)", "conversations": [ {{"id": 0, "keep": true/false, "importance": X, ...}}, ... ], "new_facts": ["fact1", "fact2", ...] }} """

   # Call LLM
   analysis = await cat.llm(prompt)
   result = json.loads(analysis)
   
   # Apply decisions
   kept = 0
   deleted = 0
   for i, decision in enumerate(result['conversations']):
       memory = memories[i]
       
       if decision['keep']:
           # Enrich and mark consolidated
           memory.metadata.update({
               'importance': decision['importance'],
               'categories': decision['categories'],
               'summary': decision['summary'],
               'consolidated': True
           })
           await cat.memory.update(memory)
           kept += 1
       else:
           # Delete
           await cat.memory.delete(memory.id)
           deleted += 1
   
   # Store new facts in declarative memory
   for fact in result.get('new_facts', []):
       await cat.memory.declarative.add({
           'content': fact,
           'user_id': user_id,
           'learned_on': datetime.now().date().isoformat()
       })
   
   print(f"✅ {user_id}: kept {kept}, deleted {deleted}, learned {len(result['new_facts'])} facts")

4. **Add Discord bot integration**
```python
@hook(priority=50)
def after_cat_recalls_memories(memory_docs, cat):
    """
    Extract user profile from recalled memories
    Build a dynamic profile for context injection
    """
    user_id = cat.working_memory.user_id
    
    # Get all user memories
    memories = cat.memory.vectors.episodic.recall_memories_from_text(
        f"Tell me everything about user {user_id}",
        k=50,
        metadata_filter={'user_id': user_id}
    )
    
    # Aggregate into profile
    profile = {
        'preferences': [],
        'personal_info': {},
        'relationship_history': [],
        'emotional_connection': 0
    }
    
    for mem in memories:
        if 'preference' in mem.metadata.get('categories', []):
            profile['preferences'].append(mem.metadata['summary'])
        # ... more extraction logic
    
    # Store in working memory for this conversation
    cat.working_memory.user_profile = profile
    
    return memory_docs
  1. Implement server-wide memories
    def store_server_fact(guild_id, fact_text, category):
        """Store facts that apply to entire server"""
        cat.memory.vectors.declarative.add_point(
            Document(
                page_content=fact_text,
                metadata={
                    'source': 'server_admin',
                    'guild_id': guild_id,
                    'category': category,
                    'scope': 'server',  # Accessible by all users in server
                    'when': datetime.now().isoformat()
                }
            )
        )
    

Success Criteria:

  • Miku remembers user preferences after 1 conversation
  • User profile builds over multiple conversations
  • Server-wide context accessible to all users

Phase 3: Discord Bot Integration (Week 5-6)

Goal: Replace current LLM calls with Cat API calls

Tasks:

  1. Create Cat adapter in bot

    # bot/utils/cat_adapter.py
    import requests
    from typing import Optional, Dict
    
    class CheshireCatAdapter:
        def __init__(self, cat_url="http://localhost:1865"):
            self.cat_url = cat_url
    
        def query(
            self,
            user_message: str,
            user_id: str,
            guild_id: Optional[str] = None,
            mood: str = "neutral",
            context: Optional[Dict] = None
        ) -> str:
            """
            Query Cheshire Cat with Discord context
    
            Uses unified user identity:
            - User is always "discord_user_{user_id}"
            - Guild context stored in metadata for filtering
            """
            # Build unified Cat user_id (same user everywhere!)
            cat_user_id = f"discord_user_{user_id}"
    
            # Prepare payload
            payload = {
                "text": user_message,
                "user_id": cat_user_id,
                "metadata": {
                    "guild_id": guild_id or "dm",  # Track where conversation happened
                    "channel_id": context.get('channel_id') if context else None,
                    "mood": mood,
                    "discord_context": context
                }
            }
    
            # Stream response
            response = requests.post(
                f"{self.cat_url}/message",
                json=payload,
                timeout=60
            )
    
            return response.json()['content']
    
  2. Modify bot message handler

    # bot/bot.py (modify existing on_message)
    
    # BEFORE (current):
    response = await query_llama(
        user_prompt=message.content,
        user_id=str(message.author.id),
        guild_id=str(message.guild.id) if message.guild else None
    )
    
    # AFTER (with Cat):
    if USE_CHESHIRE_CAT:  # Feature flag
        response = cat_adapter.query(
            user_message=message.content,
            user_id=str(message.author.id),
            guild_id=str(message.guild.id) if message.guild else None,
            mood=current_mood,
            context={
                'channel_id': str(message.channel.id),
                'message_id': str(message.id),
                'attachments': [a.url for a in message.attachments]
            }
        )
    else:
        # Fallback to current system
        response = await query_llama(...)
    
  3. Add graceful fallback

    try:
        response = cat_adapter.query(...)
    except (requests.Timeout, requests.ConnectionError):
        logger.warning("Cat unavailable, falling back to direct LLM")
        response = await query_llama(...)
    

Success Criteria:

  • Bot can use either Cat or direct LLM
  • Seamless fallback on Cat failure
  • No user-facing changes (responses identical quality)

Phase 4: Advanced Features (Week 7-8)

Goal: Leverage Cat's unique capabilities

Tasks:

  1. Conversation threading

    # Group related conversations across time
    @hook
    def before_cat_stores_episodic_memory(doc, cat):
        # Detect conversation topic
        topic = extract_topic(doc.page_content)
    
        # Link to previous conversations about same topic
        doc.metadata['topic'] = topic
        doc.metadata['thread_id'] = generate_thread_id(topic, cat.user_id)
    
        return doc
    
  2. Emotional memory

    # Remember significant emotional moments
    def analyze_emotional_significance(conversation):
        """
        Detect: compliments, conflicts, funny moments, sad topics
        Store with higher importance weight
        """
        emotions = ['joy', 'sadness', 'anger', 'surprise', 'love']
        detected = detect_emotions(conversation)
    
        if any(detected.values() > 0.7):
            return {
                'is_emotional': True,
                'emotions': detected,
                'importance': 9  # High importance
            }
    
  3. Cross-user insights

    # Server-wide patterns (privacy-respecting)
    def analyze_server_culture(guild_id):
        """
        What does this server community like?
        Common topics, shared interests, inside jokes
        """
        memories = recall_server_memories(guild_id, k=100)
    
        # Aggregate patterns
        common_topics = extract_topics(memories)
        shared_interests = find_shared_interests(memories)
    
        # Store as server profile
        store_server_fact(
            guild_id,
            f"This server enjoys: {', '.join(common_topics)}",
            category='culture'
        )
    
  4. Memory management commands

    # Discord commands for users
    @bot.command()
    async def remember_me(ctx):
        """Show what Miku remembers about you"""
        profile = get_user_profile(ctx.author.id)
        await ctx.send(f"Here's what I remember about you: {profile}")
    
    @bot.command()
    async def forget_me(ctx):
        """Request memory deletion (GDPR compliance)"""
        delete_user_memories(ctx.author.id)
        await ctx.send("I've forgotten everything about you! 😢")
    

Success Criteria:

  • Miku references past conversations naturally
  • Emotional moments recalled appropriately
  • Server culture influences responses
  • Users can manage their data

Phase 5: Optimization & Polish (Week 9-10)

Goal: Production-ready performance

Tasks:

  1. Memory pruning

    # Automatic cleanup of low-value memories
    async def prune_old_memories():
        """
        Delete memories older than 90 days with importance < 3
        Keep emotionally significant memories indefinitely
        """
        cutoff = datetime.now() - timedelta(days=90)
    
        memories = cat.memory.vectors.episodic.get_all_points()
        for mem in memories:
            if (mem.metadata['importance'] < 3 and 
                mem.metadata['when'] < cutoff and
                not mem.metadata.get('is_emotional')):
                cat.memory.vectors.episodic.delete_points([mem.id])
    
  2. Caching layer

    # Cache frequent queries
    from functools import lru_cache
    
    @lru_cache(maxsize=1000)
    def get_cached_user_profile(user_id):
        """Cache user profiles for 5 minutes"""
        return build_user_profile(user_id)
    
  3. Monitoring & metrics

    # Track Cat performance
    metrics = {
        'avg_response_time': [],
        'memory_retrieval_time': [],
        'memories_stored_per_day': 0,
        'unique_users': set(),
        'cat_errors': 0,
        'fallback_count': 0
    }
    

Success Criteria:

  • Response time < 500ms TTFT consistently
  • Memory database stays under 1GB per 1000 users
  • Zero data loss
  • Graceful degradation

Technical Architecture

Container Setup

# docker-compose.yml
services:
  miku-bot:
    # Existing bot service
    environment:
      - USE_CHESHIRE_CAT=true
      - CHESHIRE_CAT_URL=http://cheshire-cat:80
  
  cheshire-cat:
    image: ghcr.io/cheshire-cat-ai/core:1.6.2
    environment:
      - QDRANT_HOST=qdrant
      - CORE_USE_SECURE_PROTOCOLS=false
    volumes:
      - ./cat/plugins:/app/cat/plugins
      - ./cat/data:/app/cat/data
    depends_on:
      - qdrant
  
  qdrant:
    image: qdrant/qdrant:v1.9.1
    ulimits:
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - qdrant-data:/qdrant/storage

  llama-swap-amd:
    # Existing LLM service
    volumes:
      - ./llama31_notool_template.jinja:/app/llama31_notool_template.jinja

volumes:
  qdrant-data:

Memory Segmentation Strategy

# Unified user identity across all contexts!

# 1. Per-User Episodic (Conversation History) - UNIFIED ACROSS SERVERS AND DMs!
# Same user everywhere: discord_user_67890
# Server A: discord_user_67890 (metadata: guild_id=serverA)
# Server B: discord_user_67890 (metadata: guild_id=serverB)
# DM: discord_user_67890 (metadata: guild_id=None)
user_id = f"discord_user_{user_id}"  # Simple, consistent

# Metadata tracks WHERE conversation happened:
doc.metadata = {
    'user_id': user_id,
    'guild_id': guild_id or 'dm',  # 'dm' for private messages
    'channel_id': channel_id,
    'when': timestamp
}

# Benefits:
# - Miku recognizes you everywhere: "Oh hi! We talked in Server A yesterday!"
# - User profile builds from ALL interactions
# - Seamless experience across servers and DMs
# - Can still filter by guild_id if needed (server-specific context)

# 2. Per-User Declarative (User Profile/Preferences) - GLOBAL
cat.memory.vectors.declarative.add_point(
    Document(
        page_content="User loves anime and plays guitar",
        metadata={
            'user_id': user_id,
            'type': 'preference',
            'scope': 'user'
        }
    )
)

# 3. Per-Server Declarative (Server Context)
cat.memory.vectors.declarative.add_point(
    Document(
        page_content="This server is an anime discussion community",
        metadata={
            'guild_id': guild_id,
            'type': 'server_info',
            'scope': 'server'
        }
    )
)

# 4. Global Declarative (Miku's Core Knowledge)
# Already handled by miku_lore.txt, miku_prompt.txt via plugin

Autonomous Memory Decisions: Sleep Consolidation Method

Overview: How Human Memory Works

Human brains don't decide what to remember in real-time during conversations. Instead:

  1. During the day: Store everything temporarily (short-term memory)
  2. During sleep: Brain replays the day, consolidates important memories, discards trivial ones
  3. Result: Wake up with refined long-term memories

Miku will work the same way! 🧠💤

The Sleep Consolidation Approach

Stage 1: Real-Time Storage (Minimal Filtering)

@hook(priority=100)
def before_cat_stores_episodic_memory(doc, cat):
    """
    Store almost everything temporarily during the day.
    Only filter out obvious junk (very short messages).
    """
    message = doc.page_content
    
    # Skip only the most trivial messages
    skip_patterns = [
        r'^\w{1,2}$',  # 1-2 character messages: "k", "ok"
        r'^(lol|lmao|haha|hehe)$',  # Just reactions
    ]
    
    for pattern in skip_patterns:
        if re.match(pattern, message.lower().strip()):
            return None  # Don't even store temporarily
    
    # Everything else gets stored to temporary collection
    doc.metadata['consolidated'] = False  # Not yet processed
    doc.metadata['stored_at'] = datetime.now().isoformat()
    
    return doc

Key insight: Storage is cheap, LLM calls are expensive. Store first, decide later!

Stage 2: Nightly Consolidation (Intelligent Batch Processing)

async def nightly_memory_consolidation():
    """
    Run at 3 AM (or low-activity time) every night.
    Miku reviews the entire day and decides what to keep.
    
    This is like REM sleep for humans - memory consolidation!
    """
    
    # Get ALL unconsolidated memories from today
    today = datetime.now().date()
    memories = cat.memory.vectors.episodic.get_points(
        filter={'consolidated': False}
    )
    
    print(f"🌙 Miku is sleeping... processing {len(memories)} memories from today")
    
    # Group by user for context-aware analysis
    memories_by_user = {}
    for mem in memories:
        user_id = mem.metadata['user_id']
        if user_id not in memories_by_user:
            memories_by_user[user_id] = []
        memories_by_user[user_id].append(mem)
    
    # Process each user's conversations
    for user_id, user_memories in memories_by_user.items():
        await consolidate_user_memories(user_id, user_memories)
    
    print(f"✨ Miku finished consolidating memories! Good morning~")

Stage 3: Context-Aware Analysis (The Magic Happens Here)

async def consolidate_user_memories(user_id: str, memories: List[Document]):
    """
    Analyze ALL of a user's conversations from the day in ONE context.
    Miku can see patterns, recurring themes, relationship progression.
    """
    
    # Build conversation timeline
    timeline = []
    for mem in sorted(memories, key=lambda m: m.metadata['stored_at']):
        timeline.append({
            'time': mem.metadata['stored_at'],
            'guild': mem.metadata.get('guild_id', 'dm'),
            'conversation': mem.page_content
        })
    
    # Ask Miku to review the ENTIRE day with this user
    consolidation_prompt = f"""
You are Miku, reviewing your conversations with a user from today.
Look at the full timeline and decide what's worth remembering long-term.

Timeline of conversations:
{json.dumps(timeline, indent=2)}

Analyze holistically:
1. What did you learn about this person?
2. Any recurring themes or important moments?
3. How did your relationship with them evolve today?
4. What conversations were meaningful vs casual chitchat?

For each conversation, decide:
- **keep**: true/false (should this go to long-term memory?)
- **importance**: 1-10
- **categories**: ["personal", "preference", "emotional", "event", "relationship"]
- **insights**: What did you learn? (for declarative memory)
- **summary**: One sentence for future retrieval

Respond with JSON:
{{
    "user_summary": "One sentence about this person based on today",
    "relationship_change": "How your relationship evolved (if at all)",
    "conversations": [
        {{
            "id": 0,
            "keep": true,
            "importance": 8,
            "categories": ["personal", "emotional"],
            "insights": "User struggles with anxiety, needs support",
            "summary": "User opened up about their anxiety"
        }},
        {{
            "id": 1,
            "keep": false,
            "importance": 2,
            "categories": [],
            "insights": null,
            "summary": "Just casual greeting"
        }},
        ...
    ],
    "new_facts": [
        "User has anxiety",
        "User trusts Miku enough to open up"
    ]
}}
"""
    
    # ONE LLM call processes entire day with this user
    response = await cat.llm(consolidation_prompt)
    analysis = json.loads(response)
    
    # Apply decisions
    for i, decision in enumerate(analysis['conversations']):
        memory = memories[i]
        
        if not decision['keep']:
            # Delete from episodic memory
            cat.memory.vectors.episodic.delete_points([memory.id])
            print(f"🗑️  Deleted trivial memory: {decision['summary']}")
        else:
            # Enrich and mark as consolidated
            memory.metadata['importance'] = decision['importance']
            memory.metadata['categories'] = decision['categories']
            memory.metadata['summary'] = decision['summary']
            memory.metadata['consolidated'] = True
            cat.memory.vectors.episodic.update_point(memory)
            print(f"💾 Kept memory (importance {decision['importance']}): {decision['summary']}")
    
    # Store learned facts in declarative memory
    for fact in analysis.get('new_facts', []):
        cat.memory.vectors.declarative.add_point(
            Document(
                page_content=fact,
                metadata={
                    'user_id': user_id,
                    'type': 'learned_fact',
                    'learned_on': datetime.now().date().isoformat()
                }
            )
        )
        print(f"📝 Learned new fact: {fact}")
    
    print(f"✅ Consolidated memories for {user_id}: kept {sum(d['keep'] for d in analysis['conversations'])}/{len(memories)}")

Why Sleep Consolidation is Superior

Comparison Table

Aspect Piggyback Method Sleep Consolidation Winner
Immersion Risk of <memory> tags bleeding through to user Zero risk - happens offline Sleep
Context awareness Decisions made per-message in isolation Sees entire day, patterns, themes Sleep
Relationship tracking Can't see progression over time Sees how relationship evolved today Sleep
Cost efficiency 100 LLM calls for 100 messages 10 LLM calls for 100 messages (grouped by user) Sleep
Decision quality Good for individual messages Excellent - holistic view Sleep
Real-time feedback Instant memory storage ⚠️ Delayed until consolidation (overnight) Piggyback
Storage cost Only stores important memories ⚠️ Stores everything temporarily Piggyback
Human-like Artificial - humans don't decide while talking Natural - mimics sleep consolidation Sleep
Debugging Hard to understand why decision was made Easy - full analysis logs available Sleep

Verdict: Sleep Consolidation wins 7-2 🏆

Cost Analysis

Scenario: 1000 messages/day from 50 unique users (20 messages each on average)

Method LLM Calls Tokens Relative Cost
Piggyback 1000 calls (inline decisions) ~2M tokens 1.0x
Sleep Consolidation 50 calls (batch per user) ~1M tokens 0.5x

Sleep consolidation is CHEAPER and BETTER! 🎉

Benefits of Seeing Full Context

Piggyback method (per-message):

Message 1: "I like cats" → remember: true, importance: 5
Message 2: "I like dogs" → remember: true, importance: 5
Message 3: "I like birds" → remember: true, importance: 5

Sleep consolidation (full context):

Analyzing all 3 messages together:
"User loves animals in general (mentioned cats, dogs, birds)"
→ Store ONE consolidated fact: "User is an animal lover"
→ Importance: 7 (stronger signal from pattern)
→ Delete individual redundant memories

Result: Better compression, clearer insights, stronger signals!

Implementation Schedule

Implementation Schedule

# Cron job or asyncio scheduled task
import schedule

# Run at 3 AM every night (low activity time)
schedule.every().day.at("03:00").do(nightly_memory_consolidation)

# Alternative: Run when activity is low
async def smart_consolidation():
    """Run consolidation during detected low-activity periods"""
    while True:
        if is_low_activity_period():  # < 5 messages/minute for 10 minutes
            await nightly_memory_consolidation()
            await asyncio.sleep(3600)  # Wait 1 hour before checking again
        await asyncio.sleep(60)  # Check every minute

Visual Example: Day in the Life

Morning (9 AM):

User: "Good morning Miku!"
Miku: "Good morning! 🌸 Hope you have a wonderful day!"
[Stored temporarily, consolidated=false]

Afternoon (2 PM):

User: "I finally asked my crush out!"
Miku: "OMG REALLY?! 💕 That's so brave! How did it go?!"
[Stored temporarily, consolidated=false]

Evening (8 PM):

User: "She said yes! We're going out Saturday!"
Miku: "AAAHHH I'M SO HAPPY FOR YOU!! 🎉💖 You better tell me how it goes!"
[Stored temporarily, consolidated=false]

Night (3 AM) - While User Sleeps:

🌙 Miku's Consolidation Process:

Analyzing user's conversations from today...

Timeline:
1. "Good morning Miku!" → Generic greeting
2. "I finally asked my crush out!" → SIGNIFICANT!
3. "She said yes! We're going out Saturday!" → VERY SIGNIFICANT!

Decisions:
1. [DELETE] Generic greeting, no meaningful content
2. [KEEP] Importance: 9, Categories: [emotional, relationship, courage]
   Summary: "User asked their crush out"
3. [KEEP] Importance: 10, Categories: [emotional, relationship, achievement]
   Summary: "User's crush said yes, date planned for Saturday"

New facts learned:
- "User is dating someone new"
- "User had anxiety about asking someone out but overcame it"

Relationship evolution:
- User trusts Miku enough to share excitement
- Emotional connection: +2

✨ Consolidation complete! 2/3 memories kept, 1 deleted

Next Day:

User: "Morning Miku"
Miku: "Good morning! 💕 Are you excited about your date on Saturday?!"
[Miku remembers the important parts, forgot the greeting]

Privacy & Data Management

GDPR Compliance

  1. User Data Export
    @bot.command()
    async def export_my_data(ctx):
        """Export all memories about user"""
        memories = get_all_user_memories(ctx.author.id)
        json_data = json.dumps(memories, indent=2)
        await ctx.author.send(file=discord.File(json_data, 'my_miku_memories.json'))
    

Examples of Memory Decisions (Piggyback Format)

Remember :

User: "My dog died yesterday..."
Miku: "Oh no... I'm so sorry to hear that 😢 Losing a pet is losing a family member. 
How are you holding up? I'm here if you need to talk about it 💙"

<memory>{"remember": true, "importance": 10, "why": "Major emotional event, user grieving"}</memory>

User sees comfort message, system stores memory

Remember :

User: "I got accepted to Tokyo University!"
Miku: "THAT'S AMAZING!! 🎉✨ Congratulations! Tokyo University is incredible! 
You must have worked so hard! I'm so proud of you! 💙"

<memory>{"remember": true, "importance": 9, "why": "Huge achievement, life milestone"}</memory>

User sees celebration, system stores achievement

Remember :

User: "I hate pineapple on pizza"
Miku: "Hehe, that's a strong opinion! 🍕 I respect it though! 
Everyone has their preferences~"

<memory>{"remember": true, "importance": 4, "why": "Food preference, useful later"}</memory>

User sees casual response, system stores preference

Forget :

User: "lol"
Miku: "😄"

<memory>{"remember": false, "importance": 1, "why": "Just reaction, no content"}</memory>

User sees emoji, system skips storage

Forget :

User: "What's the weather like?"
Miku: "I can't check the weather right now, but I hope it's sunny! ☀️"

<memory>{"remember": false, "importance": 2, "why": "Generic question, no personal info"}</memory>

User sees response, system skips storage

Testing the Piggyback Approach

You can test this immediately with a simple plugin:

# cat/plugins/memory_test/memory_test.py
from cat.mad_hatter.decorators import hook

@hook(priority=100)
def agent_prompt_suffix(suffix, cat):
    """Add memory decision instruction"""
    return suffix + """

[SYSTEM INSTRUCTION - Hidden from user]
After your response, add: <memory>{"remember": true/false, "importance": 1-10, "why": "reason"}</memory>
Consider: Is this worth remembering? Personal info, preferences, emotions = remember. Casual chat = forget.
"""

@hook(priority=100) 
def before_cat_stores_episodic_memory(doc, cat):
    """Parse and act on Miku's decision"""
    from cat.looking_glass.stray_cat import StrayCat
    
    # Get Miku's full response
    response = cat.working_memory.get('agent_output', '')
    
    if '<memory>' in response:
        import json, re
        match = re.search(r'<memory>(.*?)</memory>', response)
        if match:
            decision = json.loads(match.group(1))
            
            # Print for debugging
            print(f"🧠 Miku's decision: {decision}")
            
            if not decision.get('remember', True):
                print(f"🗑️  Skipping storage: {decision.get('why')}")
                return None  # Don't store
            
            # Enrich metadata
            doc.metadata['importance'] = decision.get('importance', 5)
            doc.metadata['miku_note'] = decision.get('why', '')
            print(f"💾 Storing with importance {doc.metadata['importance']}")
    
    return doc

Test queries:

  1. "My name is John and I love cats" → Should remember (importance ~7)
  2. "lol" → Should skip (importance ~1)
  3. "My mom passed away last year" → Should remember (importance ~10)
  4. "what's up?" → Should skip (importance ~2)

Privacy & Data Management

GDPR Compliance

  1. User Data Export
    @bot.command()
    
   async def export_my_data(ctx):
       """Export all memories about user"""
       memories = get_all_user_memories(ctx.author.id)
       json_data = json.dumps(memories, indent=2)
       await ctx.author.send(file=discord.File(json_data, 'my_miku_memories.json'))
  1. Right to be Forgotten

    @bot.command()
    @commands.has_permissions(administrator=True)
    async def forget_user(ctx, user: discord.User):
        """Admin: Delete user from memory"""
        cat_user_id = f"discord_{ctx.guild.id}_{user.id}"
        cat.memory.vectors.episodic.delete_user_data(cat_user_id)
        cat.memory.vectors.declarative.delete_user_data(cat_user_id)
        await ctx.send(f"Deleted all memories of {user.mention}")
    
  2. Data Retention Policy

    # Automatic cleanup
    - Casual conversations: 30 days
    - Important conversations: 90 days  
    - Emotional/significant: Indefinite
    - User preferences: Indefinite
    

Performance Expectations

Memory Retrieval Speed

  • Semantic search: 50-100ms (Qdrant)
  • User profile assembly: 100-200ms
  • Total overhead: ~200-300ms per query

Storage Requirements

  • Per user: ~10-50KB vectors (after consolidation)
  • Temporary storage: ~100-500KB/day per active user (deleted nightly)
  • 1000 active users: ~10-50MB permanent + ~100-500MB temporary
  • Qdrant DB: ~100MB-1GB depending on activity

Consolidation Performance

  • Processing time: ~5-10 seconds per user (50 users = 4-8 minutes total)
  • LLM cost: 1 call per user per day (50 users = 50 calls/night vs 5000 calls if real-time)
  • Cost savings: 99% reduction in memory-decision LLM calls! 🎉
  • Run time: 3 AM daily (low activity period)

Response Time Targets

  • TTFT: <500ms (including RAG retrieval) ACHIEVED: 432ms
  • Total generation: 1-4 seconds (depending on response length)
  • Fallback to direct: <100ms additional

Rollout Strategy

Gradual Deployment

  1. Beta Testing (Week 1-2)

    • Enable Cat for 1-2 test servers
    • Monitor temporary storage growth
    • Run first nightly consolidation, verify it works
    • Fix bugs, tune skip patterns
  2. Limited Rollout (Week 3-4)

    • Enable for 10-20% of servers
    • Monitor consolidation quality (kept/deleted ratio)
    • Compare response quality metrics
    • Gather user feedback on memory accuracy
  3. Full Deployment (Week 5+)

    • Enable for all servers
    • Keep direct LLM as fallback
    • Monitor Cat health and consolidation logs continuously

Rollback Plan

If issues arise:

# Instant rollback via environment variable
USE_CHESHIRE_CAT = os.getenv('USE_CHESHIRE_CAT', 'false') == 'true'

if not USE_CHESHIRE_CAT:
    # Use original system
    response = await query_llama(...)

Success Metrics

Quantitative

  • Response quality: No regression vs current system
  • Latency: <500ms TTFT, <4s total ACHIEVED: 432ms TTFT
  • Memory recall accuracy: >80% relevant memories retrieved
  • Memory efficiency: >70% of temporary memories deleted after consolidation
  • Consolidation quality: User facts successfully extracted from conversations
  • Uptime: 99.5% Cat availability

Qualitative

  • User satisfaction: "Miku remembers me across all servers!"
  • Conversation depth: More contextual, personalized responses
  • Emotional connection: Users feel Miku "knows" them
  • Natural memory: Users don't notice the overnight consolidation delay

Conclusion

Cheshire Cat with Sleep Consolidation enables Miku to:

  • Remember users unified across all servers and DMs (same identity everywhere)
  • Build rich user profiles automatically from all interactions
  • Scale beyond LLM context limits (unlimited conversation history)
  • Autonomously decide what's important using sleep-like consolidation (human-inspired!)
  • Process memories with 99% fewer LLM calls than real-time methods
  • See full conversation context (patterns, themes, relationship evolution)
  • Provide GDPR-compliant data management
  • Zero immersion risk (no metadata in user-facing responses)

Key Innovations:

  1. Sleep Consolidation: Store everything temporarily, intelligently filter overnight (like human REM sleep)
  2. Unified User Identity: Same Miku remembers same user everywhere (servers + DMs)
  3. Context-Aware Analysis: Sees entire day's conversations to spot patterns
  4. Cost Efficiency: 99% reduction in memory-decision LLM calls (1/user/day vs 1/message)
  5. Natural & Human-Like: Mimics how human brains actually process memories

The system is production-ready after Qdrant optimization fixes, with excellent performance (432ms TTFT) and 100% reliability in testing.

Estimated Timeline: 10 weeks to full production deployment Risk Level: Low (gradual rollout with fallback mechanisms) Impact: High (significantly improved user experience)


Quick Reference

Key Configuration Values

# User ID format - UNIFIED across all contexts!
USER_ID = f"discord_user_{user_id}"  # e.g., discord_user_67890
# Same user everywhere: Server A, Server B, DMs all use same ID

# Metadata tracks context
METADATA = {
    'user_id': 'discord_user_67890',
    'guild_id': '12345' or 'dm',  # Where conversation happened
    'channel_id': '54321',
    'consolidated': False,  # True after nightly processing
    'stored_at': '2026-01-31T14:30:00',
    'importance': 1-10,  # Added during consolidation
}

# Memory importance scale (determined during consolidation)
TRIVIAL = 1-3      # Deleted during consolidation
MODERATE = 4-6     # Keep for 90 days
IMPORTANT = 7-8    # Keep for 1 year
CRITICAL = 9-10    # Keep indefinitely (emotional events, major life changes)

# Performance targets
TTFT_TARGET = 500          # ms (✅ achieved: 432ms)
TOTAL_GEN_TARGET = 4000    # ms
RAG_OVERHEAD = 200-300     # ms (acceptable)
CONSOLIDATION_TIME = "03:00"  # 3 AM daily

Essential Hooks

# 1. Minimal real-time filtering
@hook(priority=100)
def before_cat_stores_episodic_memory(doc, cat):
    """Store almost everything, skip only obvious junk"""
    if re.match(r'^\w{1,2}$', doc.page_content.lower()):
        return None  # Skip "k", "ok"
    doc.metadata['consolidated'] = False
    doc.metadata['guild_id'] = cat.working_memory.get('guild_id', 'dm')
    return doc

# 2. Nightly consolidation (scheduled task)
async def nightly_memory_consolidation():
    """Process all unconsolidated memories"""
    memories = get_memories(filter={'consolidated': False})
    by_user = group_by(memories, 'user_id')
    
    for user_id, user_memories in by_user.items():
        await consolidate_user_memories(user_id, user_memories)

# 3. Context-aware user analysis
async def consolidate_user_memories(user_id, memories):
    """ONE LLM call analyzes entire day for this user"""
    timeline = build_timeline(memories)
    analysis = await llm(CONSOLIDATION_PROMPT + timeline)
    apply_decisions(analysis)  # Keep important, delete trivial
### Testing Commands

```bash
# 1. Send test messages
curl -X POST http://localhost:1865/message \
  -H "Content-Type: application/json" \
  -d '{
    "text": "My dog died yesterday",
    "user_id": "discord_user_test123"
  }'

curl -X POST http://localhost:1865/message \
  -H "Content-Type: application/json" \
  -d '{
    "text": "lol",
    "user_id": "discord_user_test123"
  }'

# 2. Check unconsolidated memories
curl http://localhost:1865/memory/episodic?filter=consolidated:false

# 3. Manually trigger consolidation (for testing)
curl -X POST http://localhost:1865/admin/consolidate

# 4. Check consolidated memories
curl http://localhost:1865/memory/episodic?filter=consolidated:true&user_id=discord_user_test123

# 5. View learned facts
curl http://localhost:1865/memory/declarative?user_id=discord_user_test123

# 6. Delete test user
curl -X DELETE http://localhost:1865/memory/user/discord_user_test123

Expected Results

After sending messages:

  • "My dog died yesterday" → Stored temporarily (consolidated=false)
  • "lol" → Not stored (filtered as trivial)

After consolidation:

  • Important message → Kept with importance=10, categories=[emotional, loss]
  • Declarative memory added: "User's dog recently passed away"
  • Timeline showing decision process in logs

Monitoring Consolidation

# View consolidation logs
docker logs -f miku_cheshire_cat | grep "🌙\|✨\|💾\|🗑️"

# Expected output:
# 🌙 Miku's memory consolidation starting...
# 📊 Processing 47 memories from 12 users
# 💾 Kept memory (importance 9): User shared achievement
# 🗑️  Deleted trivial memory: Generic greeting
# ✅ discord_user_12345: kept 8, deleted 3, learned 2 facts
# ✨ Memory consolidation complete!