Cognee Long-Term Memory Integration Plan

Executive Summary

Goal: Add long-term memory capabilities to Miku using Cognee while keeping the existing fast, JSON-based short-term system.

Strategy: Hybrid two-tier memory architecture

  • Tier 1 (Hot): Current system - 8 messages in-memory, JSON configs (0-5ms latency)
  • Tier 2 (Cold): Cognee - Long-term knowledge graph + vectors (50-200ms latency)

Result: Best of both worlds - fast responses with deep memory when needed.


Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                     Discord Event                            │
│              (Message, Reaction, Presence)                   │
└──────────────────────┬──────────────────────────────────────┘
                       │
                       ▼
         ┌─────────────────────────────┐
         │   Short-Term Memory (Fast)   │
         │  - Last 8 messages          │
         │  - Current mood             │
         │  - Active context           │
         │  Latency: ~2-5ms            │
         └─────────────┬───────────────┘
                       │
                       ▼
              ┌────────────────┐
              │  LLM Response   │
              └────────┬───────┘
                       │
         ┌─────────────┴─────────────┐
         │                           │
         ▼                           ▼
┌────────────────┐         ┌─────────────────┐
│ Send to Discord│         │  Background Job  │
└────────────────┘         │  Async Ingestion │
                           │  to Cognee       │
                           │  Latency: N/A    │
                           │  (non-blocking)  │
                           └─────────┬────────┘
                                     │
                                     ▼
                           ┌──────────────────────┐
                           │  Long-Term Memory     │
                           │  (Cognee)            │
                           │  - Knowledge graph   │
                           │  - User preferences  │
                           │  - Entity relations  │
                           │  - Historical facts  │
                           │  Query: 50-200ms     │
                           └──────────────────────┘

Performance Analysis

Current System Baseline

# Short-term memory (in-memory)
conversation_history.add_message(...)      # ~0.1ms
messages = conversation_history.format()   # ~2ms
JSON config read/write                      # ~1-3ms
Total per response: ~5-10ms

Cognee Overhead (Estimated)

1. Write Operations (Background - Non-blocking)

# These run asynchronously AFTER Discord message is sent
await cognee.add(message_text)        # 20-50ms
await cognee.cognify()                # 100-500ms (graph processing)

Impact on user: NONE - Happens in background

2. Read Operations (When querying long-term memory)

# Only triggered when deep memory is needed
results = await cognee.search(query)  # 50-200ms

Impact on user: ⚠️ Adds 50-200ms to response time (only when used)

Mitigation Strategies

Strategy 1: Conditional Querying

def should_query_long_term_memory(user_prompt: str, context: dict) -> bool:
    """
    Decide if we need deep memory BEFORE querying Cognee.
    Fast heuristic checks (< 1ms).
    """
    # Triggers for long-term memory:
    triggers = [
        "remember when",
        "you said",
        "last week",
        "last month",
        "you told me",
        "what did i say about",
        "do you recall",
        "preference",
        "favorite",
    ]
    
    prompt_lower = user_prompt.lower()
    
    # 1. Explicit memory queries
    if any(trigger in prompt_lower for trigger in triggers):
        return True
    
    # 2. Conversation just started - too little history for deep recall to help
    if context.get('messages_in_history', 0) < 3:
        return False  # Not enough conversation yet to warrant a deep search
    
    # 3. Question about user preferences
    if '?' in user_prompt and any(word in prompt_lower for word in ['like', 'prefer', 'think']):
        return True
    
    return False

Strategy 2: Parallel Processing

async def query_with_hybrid_memory(prompt, user_id, guild_id):
    """Query both memory tiers in parallel when needed."""
    
    # Always get short-term (fast)
    short_term = conversation_history.format_for_llm(channel_id)
    
    # Decide if we need long-term (cheap signals for the heuristic gate)
    context = {"messages_in_history": len(short_term)}
    if should_query_long_term_memory(prompt, context):
        # Query both in parallel
        long_term_task = asyncio.create_task(cognee.search(prompt))
        
        # Don't wait - continue with short-term
        # Only await long-term if it's ready quickly
        try:
            long_term = await asyncio.wait_for(long_term_task, timeout=0.15)  # 150ms max
        except asyncio.TimeoutError:
            long_term = None  # Fallback - proceed without deep memory
    else:
        long_term = None
    
    # Combine contexts
    combined_context = merge_contexts(short_term, long_term)
    
    return await llm_query(combined_context)
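
The merge_contexts helper above is not defined elsewhere in this plan; a minimal sketch, assuming short-term context is already an LLM message list and long-term results can be rendered as plain text:

def merge_contexts(short_term: list, long_term) -> list:
    """Combine both memory tiers into a single message list for the LLM."""
    messages = list(short_term)  # copy so the shared history isn't mutated

    if long_term:
        # Frame long-term facts as a system message, mirroring the query_llama injection below
        messages.insert(0, {
            "role": "system",
            "content": f"[Long-term memory context]: {long_term}"
        })

    return messages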

Strategy 3: Caching Layer

from datetime import datetime, timedelta

# Cache frequent queries for 5 minutes
_cognee_cache = {}
_cache_ttl = timedelta(minutes=5)

async def cached_cognee_search(query: str):
    """Cache Cognee results to avoid repeated queries."""
    cache_key = query.lower().strip()
    now = datetime.now()
    
    if cache_key in _cognee_cache:
        result, timestamp = _cognee_cache[cache_key]
        if now - timestamp < _cache_ttl:
            print(f"🎯 Cache hit for: {query[:50]}...")
            return result
    
    # Cache miss - query Cognee
    result = await cognee.search(query)
    _cognee_cache[cache_key] = (result, now)
    
    return result

Strategy 4: Tiered Response Times

# Set different response strategies based on context
RESPONSE_MODES = {
    "instant": {
        "use_long_term": False,
        "max_latency": 100,  # ms
        "contexts": ["reactions", "quick_replies"]
    },
    "normal": {
        "use_long_term": "conditional",  # Only if triggers match
        "max_latency": 300,  # ms
        "contexts": ["server_messages", "dm_casual"]
    },
    "deep": {
        "use_long_term": True,
        "max_latency": 1000,  # ms
        "contexts": ["dm_deep_conversation", "user_questions"]
    }
}
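
A short sketch of how a mode could be picked and applied at call time (select_response_mode and the context_type labels are illustrative, not existing code):

def select_response_mode(context_type: str) -> dict:
    """Map an interaction type (e.g. 'reactions', 'dm_casual') to a response mode."""
    for mode in RESPONSE_MODES.values():
        if context_type in mode["contexts"]:
            return mode
    return RESPONSE_MODES["normal"]  # default when the context is unknown

# Usage sketch:
# mode = select_response_mode("dm_deep_conversation")
# timeout_s = mode["max_latency"] / 1000  # feed into asyncio.wait_for for the Cognee query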

Integration Points

1. Message Ingestion (Background - Non-blocking)

Location: bot/bot.py - on_message event

@globals.client.event
async def on_message(message):
    # ... existing message handling ...
    
    # After Miku responds, ingest to Cognee (non-blocking)
    asyncio.create_task(ingest_to_cognee(
        message=message,
        response=miku_response,
        guild_id=message.guild.id if message.guild else None
    ))
    
    # Continue immediately - don't wait

Implementation: New file bot/utils/cognee_integration.py

async def ingest_to_cognee(message, response, guild_id):
    """
    Background task to add conversation to long-term memory.
    Non-blocking - runs after Discord message is sent.
    """
    try:
        # Build rich context document
        doc = {
            "timestamp": datetime.now().isoformat(),
            "user_id": str(message.author.id),
            "user_name": message.author.display_name,
            "guild_id": str(guild_id) if guild_id else None,
            "message": message.content,
            "miku_response": response,
            "mood": get_current_mood(guild_id),
        }
        
        # Add to Cognee (async)
        await cognee.add([
            f"User {doc['user_name']} said: {doc['message']}",
            f"Miku responded: {doc['miku_response']}"
        ])
        
        # Process into knowledge graph
        await cognee.cognify()
        
        print(f"✅ Ingested to Cognee: {message.id}")
        
    except Exception as e:
        print(f"⚠️ Cognee ingestion failed (non-critical): {e}")

2. Query Enhancement (Conditional)

Location: bot/utils/llm.py - query_llama function

async def query_llama(user_prompt, user_id, guild_id=None, ...):
    # Get short-term context (always)
    short_term = conversation_history.format_for_llm(channel_id, max_messages=8)
    
    # Check if we need long-term memory
    long_term_context = None
    if should_query_long_term_memory(user_prompt, {"guild_id": guild_id}):
        try:
            # Query Cognee with timeout
            long_term_context = await asyncio.wait_for(
                cognee_integration.search_long_term_memory(user_prompt, user_id, guild_id),
                timeout=0.15  # 150ms max
            )
        except asyncio.TimeoutError:
            print("⏱️ Long-term memory query timeout - proceeding without")
        except Exception as e:
            print(f"⚠️ Long-term memory error: {e}")
    
    # Build messages for LLM
    messages = short_term  # Always use short-term
    
    # Inject long-term context if available
    if long_term_context:
        messages.insert(0, {
            "role": "system",
            "content": f"[Long-term memory context]: {long_term_context}"
        })
    
    # ... rest of existing LLM query code ...

3. Autonomous Actions Integration

Location: bot/utils/autonomous.py

async def autonomous_tick_v2(guild_id: int):
    """Enhanced with long-term memory awareness."""
    
    # Get decision from autonomous engine (existing fast logic)
    action_type = autonomous_engine.should_take_action(guild_id)
    
    if action_type is None:
        return
    
    # ENHANCEMENT: Check if action should use long-term context
    context = {}
    
    if action_type in ["engage_user", "join_conversation"]:
        # Get recent server activity from Cognee
        try:
            context["recent_topics"] = await asyncio.wait_for(
                cognee_integration.get_recent_topics(guild_id, hours=24),
                timeout=0.1  # 100ms max - this is background
            )
        except asyncio.TimeoutError:
            pass  # Proceed without - autonomous actions are best-effort
    
    # Execute action with enhanced context
    if action_type == "engage_user":
        await miku_engage_random_user_for_server(guild_id, context=context)
    
    # ... rest of existing action execution ...
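
The get_recent_topics call above is not defined yet; one possible sketch for bot/utils/cognee_integration.py, reusing the cached search from Strategy 3 (the phrasing of the query and the return shape are assumptions):

async def get_recent_topics(guild_id: int, hours: int = 24) -> list:
    """Ask long-term memory what a server has been discussing recently."""
    query = f"What topics were discussed in guild {guild_id} during the last {hours} hours?"
    result = await cached_cognee_search(query)

    # Keep it simple: return the raw text and let the prompt builder interpret it
    return [result] if result else []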

4. User Preference Tracking

New Feature: Learn user preferences over time

# bot/utils/cognee_integration.py

async def extract_and_store_preferences(message, response):
    """
    Extract user preferences from conversations and store in Cognee.
    Runs in background - doesn't block responses.
    """
    # Simple heuristic extraction (can be enhanced with LLM later)
    preferences = extract_preferences_simple(message.content)
    
    if preferences:
        for pref in preferences:
            await cognee.add([{
                "type": "user_preference",
                "user_id": str(message.author.id),
                "preference": pref["category"],
                "value": pref["value"],
                "context": message.content[:200],
                "timestamp": datetime.now().isoformat()
            }])

import re

def extract_preferences_simple(text: str) -> list:
    """Fast regex matching for common preference statements."""
    prefs = []
    text_lower = text.lower()

    # Pattern: "I love/like/prefer X"
    match = re.search(r"\bi (?:love|like|prefer) ([a-z0-9' ]{2,40})", text_lower)
    if match:
        prefs.append({"category": "likes", "value": match.group(1).strip()})

    # Pattern: "my favorite X is Y"
    match = re.search(r"\bmy favou?rite ([a-z0-9' ]{2,30}) is ([a-z0-9' ]{2,40})", text_lower)
    if match:
        prefs.append({
            "category": f"favorite_{match.group(1).strip().replace(' ', '_')}",
            "value": match.group(2).strip(),
        })

    return prefs

Docker Compose Integration

Add Cognee Services

# Add to docker-compose.yml

  cognee-db:
    image: postgres:15-alpine
    container_name: cognee-db
    environment:
      - POSTGRES_USER=cognee
      - POSTGRES_PASSWORD=cognee_pass
      - POSTGRES_DB=cognee
    volumes:
      - cognee_postgres_data:/var/lib/postgresql/data
    restart: unless-stopped
    profiles:
      - cognee  # Optional profile - enable with --profile cognee

  cognee-neo4j:
    image: neo4j:5-community
    container_name: cognee-neo4j
    environment:
      - NEO4J_AUTH=neo4j/cognee_pass
      - NEO4J_PLUGINS=["apoc"]
    ports:
      - "7474:7474"  # Neo4j Browser (optional)
      - "7687:7687"  # Bolt protocol
    volumes:
      - cognee_neo4j_data:/data
    restart: unless-stopped
    profiles:
      - cognee

volumes:
  cognee_postgres_data:
  cognee_neo4j_data:

Update Miku Bot Service

  miku-bot:
    # ... existing config ...
    environment:
      # ... existing env vars ...
      - COGNEE_ENABLED=true
      - COGNEE_DB_URL=postgresql://cognee:cognee_pass@cognee-db:5432/cognee
      - COGNEE_NEO4J_URL=bolt://cognee-neo4j:7687
      - COGNEE_NEO4J_USER=neo4j
      - COGNEE_NEO4J_PASSWORD=cognee_pass
    depends_on:
      - llama-swap
      - cognee-db
      - cognee-neo4j
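
On the bot side, these variables could be gathered into a small settings object at startup; a sketch (the CogneeSettings class is hypothetical - only the variable names come from the compose file above):

import os
from dataclasses import dataclass

@dataclass
class CogneeSettings:
    """Snapshot of the COGNEE_* environment at startup."""
    enabled: bool = os.getenv("COGNEE_ENABLED", "false").lower() == "true"
    db_url: str = os.getenv("COGNEE_DB_URL", "")
    neo4j_url: str = os.getenv("COGNEE_NEO4J_URL", "")
    neo4j_user: str = os.getenv("COGNEE_NEO4J_USER", "")
    neo4j_password: str = os.getenv("COGNEE_NEO4J_PASSWORD", "")

cognee_settings = CogneeSettings()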

Performance Benchmarks (Estimated)

Without Cognee (Current)

User message → Discord event → Short-term lookup (5ms) → LLM query (2000ms) → Response
Total: ~2005ms (LLM dominates)

With Cognee (Instant Mode - No long-term query)

User message → Discord event → Short-term lookup (5ms) → LLM query (2000ms) → Response
Background: Cognee ingestion (150ms) - non-blocking
Total: ~2005ms (no change - ingestion is background)

With Cognee (Deep Memory Mode - User asks about past)

User message → Discord event → Short-term (5ms) + Long-term query (150ms) → LLM query (2000ms) → Response
Total: ~2155ms (+150ms overhead, but only when explicitly needed)

Autonomous Actions (Background)

Autonomous tick → Decision (5ms) → Get topics from Cognee (100ms) → Generate message (2000ms) → Post
Total: ~2105ms (+100ms, but autonomous actions are already async)

Feature Enhancements Enabled by Cognee

1. User Memory

# User asks: "What's my favorite anime?"
# Cognee searches: All messages from user mentioning "favorite" + "anime"
# Returns: "You mentioned loving Steins;Gate in a conversation 3 weeks ago"

2. Topic Trend Analysis

# Autonomous action: Join conversation
# Cognee query: "What topics have been trending in this server this week?"
# Returns: ["gaming", "anime recommendations", "music production"]
# Miku: "I've noticed you all have been talking about anime a lot lately! Any good recommendations?"

3. Relationship Tracking

# Knowledge graph tracks:
# User A → likes → "cats"
# User B → dislikes → "cats"
# User A → friends_with → User B

# When Miku talks to both: Avoids cat topics to prevent friction

4. Event Recall

# User: "Remember when we talked about that concert?"
# Cognee searches: Conversations with this user + keyword "concert"
# Returns: "Yes! You were excited about the Miku Expo in Los Angeles in July!"

5. Mood Pattern Analysis

# Query Cognee: "When does this server get most active?"
# Returns: "Evenings between 7-10 PM, discussions about gaming"
# Autonomous engine: Schedule more engagement during peak times

Implementation Phases

Phase 1: Foundation (Week 1)

  • Add Cognee to requirements.txt
  • Create bot/utils/cognee_integration.py
  • Set up Docker services (PostgreSQL, Neo4j)
  • Basic initialization and health checks
  • Test ingestion in background (non-blocking)

Phase 2: Basic Integration (Week 2)

  • Add background ingestion to on_message
  • Implement should_query_long_term_memory() heuristics
  • Add conditional long-term queries to query_llama()
  • Add caching layer
  • Monitor latency impact

Phase 3: Advanced Features (Week 3)

  • User preference extraction
  • Topic trend analysis for autonomous actions
  • Relationship tracking between users
  • Event recall capabilities

Phase 4: Optimization (Week 4)

  • Fine-tune timeout thresholds
  • Implement smart caching strategies
  • Add Cognee query statistics to dashboard
  • Performance benchmarking and tuning

Configuration Management

Keep JSON Files (Hot Config)

# These remain JSON for instant access:
- servers_config.json       # Current mood, sleep state, settings
- autonomous_context.json   # Real-time autonomous state
- blocked_users.json        # Security/moderation
- figurine_subscribers.json # Active subscriptions

# Reason: Need instant read/write, changed frequently

Migrate to Cognee (Historical Data)

# These can move to Cognee over time:
- Full DM history (dms/*.json)  → Cognee knowledge graph
- Profile picture metadata      → Cognee (searchable by mood)
- Reaction logs                 → Cognee (analyze patterns)

# Reason: Historical, queried infrequently, benefit from graph relationships
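
A one-off backfill for the old DM logs could look roughly like this - a sketch that assumes each dms/*.json file holds a list of entries with "author" and "content" keys (adjust to the real schema):

import glob
import json

async def migrate_dm_history_to_cognee():
    """Backfill historical DM logs into long-term memory."""
    for path in glob.glob("dms/*.json"):
        with open(path, "r", encoding="utf-8") as f:
            history = json.load(f)

        # Flatten each logged message into a plain sentence Cognee can index
        lines = [f"{entry['author']} said: {entry['content']}" for entry in history]
        if lines:
            await cognee.add(lines)

    # Build the knowledge graph once after the backfill rather than per file
    await cognee.cognify()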

Hybrid Approach

// servers_config.json - Keep recent data
{
  "guild_id": 123,
  "current_mood": "bubbly",
  "is_sleeping": false,
  "recent_topics": ["cached", "from", "cognee"]  // Cache Cognee query results
}
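
That cached field could be refreshed by a periodic background task along these lines (refresh_recent_topics_cache and the JSON config helpers are illustrative names, not existing functions):

async def refresh_recent_topics_cache(guild_id: int):
    """Copy a Cognee topic query into the hot JSON config every few minutes."""
    topics = await get_recent_topics(guild_id, hours=24)

    config = load_server_config(guild_id)    # assumed existing JSON read helper
    config["recent_topics"] = topics
    save_server_config(guild_id, config)     # assumed existing JSON write helper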

Monitoring & Observability

Add Performance Tracking

# bot/utils/cognee_integration.py

import asyncio
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class CogneeMetrics:
    """Track Cognee performance."""
    total_queries: int = 0
    cache_hits: int = 0
    cache_misses: int = 0
    avg_query_time: float = 0.0
    timeouts: int = 0
    errors: int = 0
    background_ingestions: int = 0

cognee_metrics = CogneeMetrics()

async def search_long_term_memory(query: str, user_id: str, guild_id: Optional[int]) -> str:
    """Search with metrics tracking."""
    start = time.time()
    cognee_metrics.total_queries += 1
    
    try:
        result = await cached_cognee_search(query)
        
        elapsed = time.time() - start
        cognee_metrics.avg_query_time = (
            (cognee_metrics.avg_query_time * (cognee_metrics.total_queries - 1) + elapsed) 
            / cognee_metrics.total_queries
        )
        
        return result
        
    except asyncio.TimeoutError:
        cognee_metrics.timeouts += 1
        raise
    except Exception as e:
        cognee_metrics.errors += 1
        raise

Dashboard Integration

Add to bot/api.py:

@app.get("/cognee/metrics")
def get_cognee_metrics():
    """Get Cognee performance metrics."""
    from utils.cognee_integration import cognee_metrics
    
    return {
        "enabled": globals.COGNEE_ENABLED,
        "total_queries": cognee_metrics.total_queries,
        "cache_hit_rate": (
            cognee_metrics.cache_hits / cognee_metrics.total_queries 
            if cognee_metrics.total_queries > 0 else 0
        ),
        "avg_query_time_ms": cognee_metrics.avg_query_time * 1000,
        "timeouts": cognee_metrics.timeouts,
        "errors": cognee_metrics.errors,
        "background_ingestions": cognee_metrics.background_ingestions
    }

Risk Mitigation

Risk 1: Cognee Service Failure

Mitigation: Graceful degradation

if not cognee_available():
    # Fall back to short-term memory only
    # Bot continues functioning normally
    return short_term_context_only
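
The cognee_available() check is assumed above; a minimal sketch based on a module-level health flag (how the flag gets flipped is an implementation choice):

import os

# Set to False by the integration module when initialization or a query hard-fails
_cognee_healthy = True

def cognee_available() -> bool:
    """Cheap pre-flight check before any long-term memory call."""
    return os.getenv("COGNEE_ENABLED", "false").lower() == "true" and _cognee_healthy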

Risk 2: Increased Latency

Mitigation: Aggressive timeouts + caching

MAX_COGNEE_QUERY_TIME = 150  # ms
# If timeout, proceed without long-term context

Risk 3: Storage Growth

Mitigation: Data retention policies

# Auto-cleanup old data from Cognee
# Keep: Last 90 days of conversations
# Archive: Older data to cold storage

Risk 4: Context Pollution

Mitigation: Relevance scoring

# Only inject Cognee results if confidence > 0.7
if cognee_result.score < 0.7:
    # Too irrelevant - don't add to context
    pass
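
A sketch of that filter, assuming each search result exposes a score attribute as in the snippet above:

def filter_relevant(results, threshold: float = 0.7) -> list:
    """Drop low-confidence matches before they pollute the LLM context."""
    return [r for r in results if getattr(r, "score", 0.0) >= threshold]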

Cost-Benefit Analysis

Benefits

  • Deep Memory: Recall conversations from weeks/months ago
  • User Preferences: Remember what users like/dislike
  • Smarter Autonomous: Context-aware engagement
  • Relationship Graph: Understand user dynamics
  • No User Impact: Background ingestion, conditional queries
  • Scalable: Handles unlimited conversation history

Costs

  • ⚠️ Complexity: +2 services (PostgreSQL, Neo4j)
  • ⚠️ Storage: ~100MB-1GB per month (depending on activity)
  • ⚠️ Latency: +50-150ms when querying (conditional)
  • ⚠️ Memory: +500MB RAM for Neo4j, +200MB for PostgreSQL
  • ⚠️ Maintenance: Additional services to monitor

Verdict

Worth it if:

  • Your servers have active, long-running conversations
  • Users want Miku to remember personal details
  • You want smarter autonomous behavior based on trends

Skip it if:

  • Conversations are mostly one-off interactions
  • Current 8-message context is sufficient
  • Hardware resources are limited

Quick Start Commands

1. Enable Cognee

# Start with Cognee services
docker-compose --profile cognee up -d

# Check Cognee health
docker-compose logs cognee-neo4j
docker-compose logs cognee-db

2. Test Integration

# In Discord, test long-term memory:
User: "Remember that I love cats"
Miku: "Got it! I'll remember that you love cats! 🐱"

# Later...
User: "What do I love?"
Miku: "You told me you love cats! 🐱"

3. Monitor Performance

# Check metrics via API
curl http://localhost:3939/cognee/metrics

# View Cognee dashboard (optional)
# Open browser: http://localhost:7474 (Neo4j Browser)

Conclusion

Recommended Approach: Implement Phase 1-2 first, then evaluate based on real usage patterns.

Expected Latency Impact:

  • 95% of messages: 0ms (background ingestion only)
  • 5% of messages: +50-150ms (when long-term memory explicitly needed)

Key Success Factors:

  1. Keep JSON configs for hot data
  2. Background ingestion (non-blocking)
  3. Conditional long-term queries only
  4. Aggressive timeouts (150ms max)
  5. Caching layer for repeated queries
  6. Graceful degradation on failure

This hybrid approach gives you deep memory capabilities without sacrificing the snappy response times users expect from Discord bots.