Cognee Long-Term Memory Integration Plan

Executive Summary

Goal: Add long-term memory capabilities to Miku using Cognee while keeping the existing fast, JSON-based short-term system.

Strategy: Hybrid two-tier memory architecture

  • Tier 1 (Hot): Current system - 8 messages in-memory, JSON configs (0-5ms latency)
  • Tier 2 (Cold): Cognee - Long-term knowledge graph + vectors (50-200ms latency)

Result: Best of both worlds - fast responses with deep memory when needed.


Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                     Discord Event                            │
│              (Message, Reaction, Presence)                   │
└──────────────────────┬──────────────────────────────────────┘
                       │
                       ▼
         ┌─────────────────────────────┐
         │   Short-Term Memory (Fast)   │
         │  - Last 8 messages          │
         │  - Current mood             │
         │  - Active context           │
         │  Latency: ~2-5ms            │
         └─────────────┬───────────────┘
                       │
                       ▼
              ┌────────────────┐
              │  LLM Response   │
              └────────┬───────┘
                       │
         ┌─────────────┴─────────────┐
         │                           │
         ▼                           ▼
┌────────────────┐         ┌─────────────────┐
│ Send to Discord│         │  Background Job  │
└────────────────┘         │  Async Ingestion │
                           │  to Cognee       │
                           │  Latency: N/A    │
                           │  (non-blocking)  │
                           └─────────┬────────┘
                                     │
                                     ▼
                           ┌──────────────────────┐
                           │  Long-Term Memory     │
                           │  (Cognee)            │
                           │  - Knowledge graph   │
                           │  - User preferences  │
                           │  - Entity relations  │
                           │  - Historical facts  │
                           │  Query: 50-200ms     │
                           └──────────────────────┘

Performance Analysis

Current System Baseline

# Short-term memory (in-memory)
conversation_history.add_message(...)      # ~0.1ms
messages = conversation_history.format()   # ~2ms
JSON config read/write                      # ~1-3ms
Total per response: ~5-10ms

Cognee Overhead (Estimated)

1. Write Operations (Background - Non-blocking)

# These run asynchronously AFTER Discord message is sent
await cognee.add(message_text)        # 20-50ms
await cognee.cognify()                # 100-500ms (graph processing)

Impact on user: NONE - Happens in background

2. Read Operations (When querying long-term memory)

# Only triggered when deep memory is needed
results = await cognee.search(query)  # 50-200ms

Impact on user: ⚠️ Adds 50-200ms to response time (only when used)

Mitigation Strategies

Strategy 1: Conditional Querying

def should_query_long_term_memory(user_prompt: str, context: dict) -> bool:
    """
    Decide if we need deep memory BEFORE querying Cognee.
    Fast heuristic checks (< 1ms).
    """
    # Triggers for long-term memory:
    triggers = [
        "remember when",
        "you said",
        "last week",
        "last month",
        "you told me",
        "what did i say about",
        "do you recall",
        "preference",
        "favorite",
    ]
    
    prompt_lower = user_prompt.lower()
    
    # 1. Explicit memory queries
    if any(trigger in prompt_lower for trigger in triggers):
        return True
    
    # 2. Conversation just started - too little history for deep recall to help
    if context.get('messages_in_history', 0) < 3:
        return False  # Not enough conversation yet to warrant a deep search
    
    # 3. Question about user preferences
    if '?' in user_prompt and any(word in prompt_lower for word in ['like', 'prefer', 'think']):
        return True
    
    return False

Strategy 2: Parallel Processing

async def query_with_hybrid_memory(prompt, user_id, guild_id):
    """Query both memory tiers in parallel when needed."""
    
    # Always get short-term (fast)
    short_term = conversation_history.format_for_llm(channel_id)
    
    # Decide if we need long-term (cheap signals for the heuristic gate)
    context = {"messages_in_history": len(short_term)}
    if should_query_long_term_memory(prompt, context):
        # Query both in parallel
        long_term_task = asyncio.create_task(cognee.search(prompt))
        
        # Don't wait - continue with short-term
        # Only await long-term if it's ready quickly
        try:
            long_term = await asyncio.wait_for(long_term_task, timeout=0.15)  # 150ms max
        except asyncio.TimeoutError:
            long_term = None  # Fallback - proceed without deep memory
    else:
        long_term = None
    
    # Combine contexts
    combined_context = merge_contexts(short_term, long_term)
    
    return await llm_query(combined_context)
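
The merge_contexts helper above is not defined elsewhere in this plan; a minimal sketch, assuming short-term context is already an LLM message list and long-term results can be rendered as plain text:

def merge_contexts(short_term: list, long_term) -> list:
    """Combine both memory tiers into a single message list for the LLM."""
    messages = list(short_term)  # copy so the shared history isn't mutated

    if long_term:
        # Frame long-term facts as a system message, mirroring the query_llama injection below
        messages.insert(0, {
            "role": "system",
            "content": f"[Long-term memory context]: {long_term}"
        })

    return messages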

Strategy 3: Caching Layer

from datetime import datetime, timedelta

# Cache frequent queries for 5 minutes
_cognee_cache = {}
_cache_ttl = timedelta(minutes=5)

async def cached_cognee_search(query: str):
    """Cache Cognee results to avoid repeated queries."""
    cache_key = query.lower().strip()
    now = datetime.now()
    
    if cache_key in _cognee_cache:
        result, timestamp = _cognee_cache[cache_key]
        if now - timestamp < _cache_ttl:
            print(f"🎯 Cache hit for: {query[:50]}...")
            return result
    
    # Cache miss - query Cognee
    result = await cognee.search(query)
    _cognee_cache[cache_key] = (result, now)
    
    return result

Strategy 4: Tiered Response Times

# Set different response strategies based on context
RESPONSE_MODES = {
    "instant": {
        "use_long_term": False,
        "max_latency": 100,  # ms
        "contexts": ["reactions", "quick_replies"]
    },
    "normal": {
        "use_long_term": "conditional",  # Only if triggers match
        "max_latency": 300,  # ms
        "contexts": ["server_messages", "dm_casual"]
    },
    "deep": {
        "use_long_term": True,
        "max_latency": 1000,  # ms
        "contexts": ["dm_deep_conversation", "user_questions"]
    }
}
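
A short sketch of how a mode could be picked and applied at call time (select_response_mode and the context_type labels are illustrative, not existing code):

def select_response_mode(context_type: str) -> dict:
    """Map an interaction type (e.g. 'reactions', 'dm_casual') to a response mode."""
    for mode in RESPONSE_MODES.values():
        if context_type in mode["contexts"]:
            return mode
    return RESPONSE_MODES["normal"]  # default when the context is unknown

# Usage sketch:
# mode = select_response_mode("dm_deep_conversation")
# timeout_s = mode["max_latency"] / 1000  # feed into asyncio.wait_for for the Cognee query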

Integration Points

1. Message Ingestion (Background - Non-blocking)

Location: bot/bot.py - on_message event

@globals.client.event
async def on_message(message):
    # ... existing message handling ...
    
    # After Miku responds, ingest to Cognee (non-blocking)
    asyncio.create_task(ingest_to_cognee(
        message=message,
        response=miku_response,
        guild_id=message.guild.id if message.guild else None
    ))
    
    # Continue immediately - don't wait

Implementation: New file bot/utils/cognee_integration.py

async def ingest_to_cognee(message, response, guild_id):
    """
    Background task to add conversation to long-term memory.
    Non-blocking - runs after Discord message is sent.
    """
    try:
        # Build rich context document
        doc = {
            "timestamp": datetime.now().isoformat(),
            "user_id": str(message.author.id),
            "user_name": message.author.display_name,
            "guild_id": str(guild_id) if guild_id else None,
            "message": message.content,
            "miku_response": response,
            "mood": get_current_mood(guild_id),
        }
        
        # Add to Cognee (async)
        await cognee.add([
            f"User {doc['user_name']} said: {doc['message']}",
            f"Miku responded: {doc['miku_response']}"
        ])
        
        # Process into knowledge graph
        await cognee.cognify()
        
        print(f"✅ Ingested to Cognee: {message.id}")
        
    except Exception as e:
        print(f"⚠️ Cognee ingestion failed (non-critical): {e}")

2. Query Enhancement (Conditional)

Location: bot/utils/llm.py - query_llama function

async def query_llama(user_prompt, user_id, guild_id=None, ...):
    # Get short-term context (always)
    short_term = conversation_history.format_for_llm(channel_id, max_messages=8)
    
    # Check if we need long-term memory
    long_term_context = None
    if should_query_long_term_memory(user_prompt, {"guild_id": guild_id}):
        try:
            # Query Cognee with timeout
            long_term_context = await asyncio.wait_for(
                cognee_integration.search_long_term_memory(user_prompt, user_id, guild_id),
                timeout=0.15  # 150ms max
            )
        except asyncio.TimeoutError:
            print("⏱️ Long-term memory query timeout - proceeding without")
        except Exception as e:
            print(f"⚠️ Long-term memory error: {e}")
    
    # Build messages for LLM
    messages = short_term  # Always use short-term
    
    # Inject long-term context if available
    if long_term_context:
        messages.insert(0, {
            "role": "system",
            "content": f"[Long-term memory context]: {long_term_context}"
        })
    
    # ... rest of existing LLM query code ...

3. Autonomous Actions Integration

Location: bot/utils/autonomous.py

async def autonomous_tick_v2(guild_id: int):
    """Enhanced with long-term memory awareness."""
    
    # Get decision from autonomous engine (existing fast logic)
    action_type = autonomous_engine.should_take_action(guild_id)
    
    if action_type is None:
        return
    
    # ENHANCEMENT: Check if action should use long-term context
    context = {}
    
    if action_type in ["engage_user", "join_conversation"]:
        # Get recent server activity from Cognee
        try:
            context["recent_topics"] = await asyncio.wait_for(
                cognee_integration.get_recent_topics(guild_id, hours=24),
                timeout=0.1  # 100ms max - this is background
            )
        except asyncio.TimeoutError:
            pass  # Proceed without - autonomous actions are best-effort
    
    # Execute action with enhanced context
    if action_type == "engage_user":
        await miku_engage_random_user_for_server(guild_id, context=context)
    
    # ... rest of existing action execution ...
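
The get_recent_topics call above is not defined yet; one possible sketch for bot/utils/cognee_integration.py, reusing the cached search from Strategy 3 (the phrasing of the query and the return shape are assumptions):

async def get_recent_topics(guild_id: int, hours: int = 24) -> list:
    """Ask long-term memory what a server has been discussing recently."""
    query = f"What topics were discussed in guild {guild_id} during the last {hours} hours?"
    result = await cached_cognee_search(query)

    # Keep it simple: return the raw text and let the prompt builder interpret it
    return [result] if result else []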

4. User Preference Tracking

New Feature: Learn user preferences over time

# bot/utils/cognee_integration.py

async def extract_and_store_preferences(message, response):
    """
    Extract user preferences from conversations and store in Cognee.
    Runs in background - doesn't block responses.
    """
    # Simple heuristic extraction (can be enhanced with LLM later)
    preferences = extract_preferences_simple(message.content)
    
    if preferences:
        for pref in preferences:
            await cognee.add([{
                "type": "user_preference",
                "user_id": str(message.author.id),
                "preference": pref["category"],
                "value": pref["value"],
                "context": message.content[:200],
                "timestamp": datetime.now().isoformat()
            }])

import re

def extract_preferences_simple(text: str) -> list:
    """Fast regex matching for common preference statements."""
    prefs = []
    text_lower = text.lower()

    # Pattern: "I love/like/prefer X"
    match = re.search(r"\bi (?:love|like|prefer) ([a-z0-9' ]{2,40})", text_lower)
    if match:
        prefs.append({"category": "likes", "value": match.group(1).strip()})

    # Pattern: "my favorite X is Y"
    match = re.search(r"\bmy favou?rite ([a-z0-9' ]{2,30}) is ([a-z0-9' ]{2,40})", text_lower)
    if match:
        prefs.append({
            "category": f"favorite_{match.group(1).strip().replace(' ', '_')}",
            "value": match.group(2).strip(),
        })

    return prefs

Docker Compose Integration

Add Cognee Services

# Add to docker-compose.yml

  cognee-db:
    image: postgres:15-alpine
    container_name: cognee-db
    environment:
      - POSTGRES_USER=cognee
      - POSTGRES_PASSWORD=cognee_pass
      - POSTGRES_DB=cognee
    volumes:
      - cognee_postgres_data:/var/lib/postgresql/data
    restart: unless-stopped
    profiles:
      - cognee  # Optional profile - enable with --profile cognee

  cognee-neo4j:
    image: neo4j:5-community
    container_name: cognee-neo4j
    environment:
      - NEO4J_AUTH=neo4j/cognee_pass
      - NEO4J_PLUGINS=["apoc"]
    ports:
      - "7474:7474"  # Neo4j Browser (optional)
      - "7687:7687"  # Bolt protocol
    volumes:
      - cognee_neo4j_data:/data
    restart: unless-stopped
    profiles:
      - cognee

volumes:
  cognee_postgres_data:
  cognee_neo4j_data:

Update Miku Bot Service

  miku-bot:
    # ... existing config ...
    environment:
      # ... existing env vars ...
      - COGNEE_ENABLED=true
      - COGNEE_DB_URL=postgresql://cognee:cognee_pass@cognee-db:5432/cognee
      - COGNEE_NEO4J_URL=bolt://cognee-neo4j:7687
      - COGNEE_NEO4J_USER=neo4j
      - COGNEE_NEO4J_PASSWORD=cognee_pass
    depends_on:
      - llama-swap
      - cognee-db
      - cognee-neo4j
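
On the bot side, these variables could be gathered into a small settings object at startup; a sketch (the CogneeSettings class is hypothetical - only the variable names come from the compose file above):

import os
from dataclasses import dataclass

@dataclass
class CogneeSettings:
    """Snapshot of the COGNEE_* environment at startup."""
    enabled: bool = os.getenv("COGNEE_ENABLED", "false").lower() == "true"
    db_url: str = os.getenv("COGNEE_DB_URL", "")
    neo4j_url: str = os.getenv("COGNEE_NEO4J_URL", "")
    neo4j_user: str = os.getenv("COGNEE_NEO4J_USER", "")
    neo4j_password: str = os.getenv("COGNEE_NEO4J_PASSWORD", "")

cognee_settings = CogneeSettings()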

Performance Benchmarks (Estimated)

Without Cognee (Current)

User message → Discord event → Short-term lookup (5ms) → LLM query (2000ms) → Response
Total: ~2005ms (LLM dominates)

With Cognee (Instant Mode - No long-term query)

User message → Discord event → Short-term lookup (5ms) → LLM query (2000ms) → Response
Background: Cognee ingestion (150ms) - non-blocking
Total: ~2005ms (no change - ingestion is background)

With Cognee (Deep Memory Mode - User asks about past)

User message → Discord event → Short-term (5ms) + Long-term query (150ms) → LLM query (2000ms) → Response
Total: ~2155ms (+150ms overhead, but only when explicitly needed)

Autonomous Actions (Background)

Autonomous tick → Decision (5ms) → Get topics from Cognee (100ms) → Generate message (2000ms) → Post
Total: ~2105ms (+100ms, but autonomous actions are already async)

Feature Enhancements Enabled by Cognee

1. User Memory

# User asks: "What's my favorite anime?"
# Cognee searches: All messages from user mentioning "favorite" + "anime"
# Returns: "You mentioned loving Steins;Gate in a conversation 3 weeks ago"

2. Topic Trend Analysis

# Autonomous action: Join conversation
# Cognee query: "What topics have been trending in this server this week?"
# Returns: ["gaming", "anime recommendations", "music production"]
# Miku: "I've noticed you all have been talking about anime a lot lately! Any good recommendations?"

3. Relationship Tracking

# Knowledge graph tracks:
# User A → likes → "cats"
# User B → dislikes → "cats"
# User A → friends_with → User B

# When Miku talks to both: Avoids cat topics to prevent friction

4. Event Recall

# User: "Remember when we talked about that concert?"
# Cognee searches: Conversations with this user + keyword "concert"
# Returns: "Yes! You were excited about the Miku Expo in Los Angeles in July!"

5. Mood Pattern Analysis

# Query Cognee: "When does this server get most active?"
# Returns: "Evenings between 7-10 PM, discussions about gaming"
# Autonomous engine: Schedule more engagement during peak times

Implementation Phases

Phase 1: Foundation (Week 1)

  • Add Cognee to requirements.txt
  • Create bot/utils/cognee_integration.py
  • Set up Docker services (PostgreSQL, Neo4j)
  • Basic initialization and health checks
  • Test ingestion in background (non-blocking)

Phase 2: Basic Integration (Week 2)

  • Add background ingestion to on_message
  • Implement should_query_long_term_memory() heuristics
  • Add conditional long-term queries to query_llama()
  • Add caching layer
  • Monitor latency impact

Phase 3: Advanced Features (Week 3)

  • User preference extraction
  • Topic trend analysis for autonomous actions
  • Relationship tracking between users
  • Event recall capabilities

Phase 4: Optimization (Week 4)

  • Fine-tune timeout thresholds
  • Implement smart caching strategies
  • Add Cognee query statistics to dashboard
  • Performance benchmarking and tuning

Configuration Management

Keep JSON Files (Hot Config)

# These remain JSON for instant access:
- servers_config.json       # Current mood, sleep state, settings
- autonomous_context.json   # Real-time autonomous state
- blocked_users.json        # Security/moderation
- figurine_subscribers.json # Active subscriptions

# Reason: Need instant read/write, changed frequently

Migrate to Cognee (Historical Data)

# These can move to Cognee over time:
- Full DM history (dms/*.json)  → Cognee knowledge graph
- Profile picture metadata      → Cognee (searchable by mood)
- Reaction logs                 → Cognee (analyze patterns)

# Reason: Historical, queried infrequently, benefit from graph relationships
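
A one-off backfill for the old DM logs could look roughly like this - a sketch that assumes each dms/*.json file holds a list of entries with "author" and "content" keys (adjust to the real schema):

import glob
import json

async def migrate_dm_history_to_cognee():
    """Backfill historical DM logs into long-term memory."""
    for path in glob.glob("dms/*.json"):
        with open(path, "r", encoding="utf-8") as f:
            history = json.load(f)

        # Flatten each logged message into a plain sentence Cognee can index
        lines = [f"{entry['author']} said: {entry['content']}" for entry in history]
        if lines:
            await cognee.add(lines)

    # Build the knowledge graph once after the backfill rather than per file
    await cognee.cognify()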

Hybrid Approach

// servers_config.json - Keep recent data
{
  "guild_id": 123,
  "current_mood": "bubbly",
  "is_sleeping": false,
  "recent_topics": ["cached", "from", "cognee"]  // Cache Cognee query results
}
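
That cached field could be refreshed by a periodic background task along these lines (refresh_recent_topics_cache and the JSON config helpers are illustrative names, not existing functions):

async def refresh_recent_topics_cache(guild_id: int):
    """Copy a Cognee topic query into the hot JSON config every few minutes."""
    topics = await get_recent_topics(guild_id, hours=24)

    config = load_server_config(guild_id)    # assumed existing JSON read helper
    config["recent_topics"] = topics
    save_server_config(guild_id, config)     # assumed existing JSON write helper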

Monitoring & Observability

Add Performance Tracking

# bot/utils/cognee_integration.py

import asyncio
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class CogneeMetrics:
    """Track Cognee performance."""
    total_queries: int = 0
    cache_hits: int = 0
    cache_misses: int = 0
    avg_query_time: float = 0.0
    timeouts: int = 0
    errors: int = 0
    background_ingestions: int = 0

cognee_metrics = CogneeMetrics()

async def search_long_term_memory(query: str, user_id: str, guild_id: Optional[int]) -> str:
    """Search with metrics tracking."""
    start = time.time()
    cognee_metrics.total_queries += 1
    
    try:
        result = await cached_cognee_search(query)
        
        elapsed = time.time() - start
        cognee_metrics.avg_query_time = (
            (cognee_metrics.avg_query_time * (cognee_metrics.total_queries - 1) + elapsed) 
            / cognee_metrics.total_queries
        )
        
        return result
        
    except asyncio.TimeoutError:
        cognee_metrics.timeouts += 1
        raise
    except Exception as e:
        cognee_metrics.errors += 1
        raise

Dashboard Integration

Add to bot/api.py:

@app.get("/cognee/metrics")
def get_cognee_metrics():
    """Get Cognee performance metrics."""
    from utils.cognee_integration import cognee_metrics
    
    return {
        "enabled": globals.COGNEE_ENABLED,
        "total_queries": cognee_metrics.total_queries,
        "cache_hit_rate": (
            cognee_metrics.cache_hits / cognee_metrics.total_queries 
            if cognee_metrics.total_queries > 0 else 0
        ),
        "avg_query_time_ms": cognee_metrics.avg_query_time * 1000,
        "timeouts": cognee_metrics.timeouts,
        "errors": cognee_metrics.errors,
        "background_ingestions": cognee_metrics.background_ingestions
    }

Risk Mitigation

Risk 1: Cognee Service Failure

Mitigation: Graceful degradation

if not cognee_available():
    # Fall back to short-term memory only
    # Bot continues functioning normally
    return short_term_context_only
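
The cognee_available() check is assumed above; a minimal sketch based on a module-level health flag (how the flag gets flipped is an implementation choice):

import os

# Set to False by the integration module when initialization or a query hard-fails
_cognee_healthy = True

def cognee_available() -> bool:
    """Cheap pre-flight check before any long-term memory call."""
    return os.getenv("COGNEE_ENABLED", "false").lower() == "true" and _cognee_healthy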

Risk 2: Increased Latency

Mitigation: Aggressive timeouts + caching

MAX_COGNEE_QUERY_TIME = 150  # ms
# If timeout, proceed without long-term context

Risk 3: Storage Growth

Mitigation: Data retention policies

# Auto-cleanup old data from Cognee
# Keep: Last 90 days of conversations
# Archive: Older data to cold storage

Risk 4: Context Pollution

Mitigation: Relevance scoring

# Only inject Cognee results if confidence > 0.7
if cognee_result.score < 0.7:
    # Too irrelevant - don't add to context
    pass
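
A sketch of that filter, assuming each search result exposes a score attribute as in the snippet above:

def filter_relevant(results, threshold: float = 0.7) -> list:
    """Drop low-confidence matches before they pollute the LLM context."""
    return [r for r in results if getattr(r, "score", 0.0) >= threshold]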

Cost-Benefit Analysis

Benefits

  • Deep Memory: Recall conversations from weeks/months ago
  • User Preferences: Remember what users like/dislike
  • Smarter Autonomous: Context-aware engagement
  • Relationship Graph: Understand user dynamics
  • No User Impact: Background ingestion, conditional queries
  • Scalable: Handles unlimited conversation history

Costs

  • ⚠️ Complexity: +2 services (PostgreSQL, Neo4j)
  • ⚠️ Storage: ~100MB-1GB per month (depending on activity)
  • ⚠️ Latency: +50-150ms when querying (conditional)
  • ⚠️ Memory: +500MB RAM for Neo4j, +200MB for PostgreSQL
  • ⚠️ Maintenance: Additional services to monitor

Verdict

Worth it if:

  • Your servers have active, long-running conversations
  • Users want Miku to remember personal details
  • You want smarter autonomous behavior based on trends

Skip it if:

  • Conversations are mostly one-off interactions
  • Current 8-message context is sufficient
  • Hardware resources are limited

Quick Start Commands

1. Enable Cognee

# Start with Cognee services
docker-compose --profile cognee up -d

# Check Cognee health
docker-compose logs cognee-neo4j
docker-compose logs cognee-db

2. Test Integration

# In Discord, test long-term memory:
User: "Remember that I love cats"
Miku: "Got it! I'll remember that you love cats! 🐱"

# Later...
User: "What do I love?"
Miku: "You told me you love cats! 🐱"

3. Monitor Performance

# Check metrics via API
curl http://localhost:3939/cognee/metrics

# View Cognee dashboard (optional)
# Open browser: http://localhost:7474 (Neo4j Browser)

Conclusion

Recommended Approach: Implement Phase 1-2 first, then evaluate based on real usage patterns.

Expected Latency Impact:

  • 95% of messages: 0ms (background ingestion only)
  • 5% of messages: +50-150ms (when long-term memory explicitly needed)

Key Success Factors:

  1. Keep JSON configs for hot data
  2. Background ingestion (non-blocking)
  3. Conditional long-term queries only
  4. Aggressive timeouts (150ms max)
  5. Caching layer for repeated queries
  6. Graceful degradation on failure

This hybrid approach gives you deep memory capabilities without sacrificing the snappy response times users expect from Discord bots.