add: cheshire-cat configuration, tooling, tests, and documentation

Configuration:
- .env.example, .gitignore, compose.yml (main docker compose)
- docker-compose-amd.yml (ROCm), docker-compose-macos.yml
- start.sh, stop.sh convenience scripts
- LICENSE (Apache 2.0, from upstream Cheshire Cat)

Memory management utilities:
- analyze_consolidation.py, manual_consolidation.py, verify_consolidation.py
- check_memories.py, extract_declarative_facts.py, store_declarative_facts.py
- compare_systems.py (system comparison tool)
- benchmark_cat.py, streaming_benchmark.py, streaming_benchmark_v2.py

Test suite:
- quick_test.py, test_setup.py, test_setup_simple.py
- test_consolidation_direct.py, test_declarative_recall.py, test_recall.py
- test_end_to_end.py, test_full_pipeline.py
- test_phase2.py, test_phase2_comprehensive.py

Documentation:
- README.md, QUICK_START.txt, TEST_README.md, SETUP_COMPLETE.md
- PHASE2_IMPLEMENTATION_NOTES.md, PHASE2_TEST_RESULTS.md
- POST_OPTIMIZATION_ANALYSIS.md
Commit ae1e0aa144 (parent eafab336b4), 2026-03-04 00:51:14 +02:00
35 changed files with 6055 additions and 0 deletions

cheshire-cat/TEST_README.md (new file, 202 lines):
# Cheshire Cat Test Environment for Miku Bot
This is a standalone test environment for evaluating Cheshire Cat AI as a potential memory/context system for the Miku Discord bot.
## 🎯 Goals
1. **Test performance** - Measure latency, overhead, and real-time viability
2. **Evaluate memory** - Compare RAG-based context retrieval vs full context loading
3. **Benchmark CPU impact** - Assess performance on AMD FX-6100
4. **Make informed decision** - Data-driven choice on integration
## 📁 Directory Structure
```
cheshire-cat/
├── cat/ # Cat data (created on first run)
│ ├── data/ # Cat's internal data
│ ├── plugins/ # Custom plugins
│ ├── static/ # Static assets
│ └── long_term_memory/ # Qdrant vector storage
├── .env # Environment configuration
├── docker-compose.test.yml # Docker setup
├── test_setup.py # Initial setup script
├── benchmark_cat.py # Comprehensive benchmarks
├── compare_systems.py # Compare Cat vs current system
└── TEST_README.md # This file
```
## 🚀 Quick Start
### 1. Prerequisites
- Docker and Docker Compose installed
- Miku bot's llama-swap service running
- Python 3.8+ with requests library
```bash
pip3 install requests
```
### 2. Start Cheshire Cat
```bash
# From the cheshire-cat directory
docker-compose -f docker-compose.test.yml up -d
```
Wait ~30 seconds for services to start.
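Instead of a fixed sleep, you can poll the health endpoint until Cat answers. A minimal sketch (assumes Cat is on port 1865 as configured here; the helper name is illustrative):

```python
import time

import requests


def wait_for_cat(url="http://localhost:1865/", timeout=120, interval=2):
    """Poll Cat's health endpoint until it responds with 200, or give up."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if requests.get(url, timeout=5).status_code == 200:
                return True
        except requests.exceptions.RequestException:
            pass  # container still starting; retry after a short pause
        time.sleep(interval)
    return False
```

`test_setup.py` already does this wait for you; the sketch is only useful if you script against Cat directly.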
### 3. Configure and Test
```bash
# Run setup script (configures LLM, uploads knowledge base)
python3 test_setup.py
```
This will:
- ✅ Wait for Cat to be ready
- ✅ Configure Cat to use llama-swap
- ✅ Upload miku_lore.txt, miku_prompt.txt, miku_lyrics.txt
- ✅ Run test queries
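The upload step boils down to a multipart POST per file. A hedged sketch of what `test_setup.py` does — the `/rabbithole/` route is taken from Cat's API docs, so verify it against http://localhost:1865/docs for your version:

```python
import requests

CAT_URL = "http://localhost:1865"  # default port from this compose setup


def upload_knowledge(path):
    """Send one text file to Cat's Rabbit Hole ingestion endpoint.

    Route and field name ("file") are assumptions based on Cat's API docs;
    test_setup.py is the authoritative implementation.
    """
    with open(path, "rb") as f:
        resp = requests.post(
            f"{CAT_URL}/rabbithole/",
            files={"file": (path, f, "text/plain")},
        )
    resp.raise_for_status()
    return resp.json()
```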
### 4. Run Benchmarks
```bash
# Comprehensive performance benchmark
python3 benchmark_cat.py
```
This tests:
- Simple greetings (low complexity)
- Factual queries (medium complexity)
- Memory recall (high complexity)
- Voice chat simulation (rapid-fire queries)
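The core of each benchmark category is the same timing loop: send a prompt, measure the round trip, collect milliseconds. A simplified sketch — the endpoint path and payload shape are assumptions, `benchmark_cat.py` is the authoritative script:

```python
import time

import requests


def benchmark(prompts, endpoint="http://localhost:1865/message"):
    """Time one HTTP round trip per prompt; returns latencies in ms."""
    latencies = []
    for text in prompts:
        t0 = time.perf_counter()
        requests.post(endpoint, json={"text": text}, timeout=30)
        latencies.append((time.perf_counter() - t0) * 1000)
    return latencies
```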
### 5. Compare with Current System
```bash
# Side-by-side comparison
python3 compare_systems.py
```
Compares latency between:
- 🐱 Cheshire Cat (RAG-based context)
- 📦 Current system (full context loading)
## 🔍 What to Look For
### ✅ Good Signs (Proceed with Integration)
- Mean latency < 1500ms
- P95 latency < 2000ms
- Consistent performance across query types
- RAG retrieves relevant context accurately
### ⚠️ Warning Signs (Reconsider)
- Mean latency > 2000ms
- High variance (large stdev)
- RAG misses important context
- Frequent errors or timeouts
### ❌ Stop Signs (Don't Use)
- Mean latency > 3000ms
- P95 latency > 5000ms
- RAG retrieval quality is poor
- System crashes or hangs
## 📊 Understanding the Results
### Latency Metrics
- **Mean**: Average response time
- **Median**: Middle value (less affected by outliers)
- **P95**: 95% of queries are faster than this
- **P99**: 99% of queries are faster than this
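These four statistics can be computed from the raw latency list; a minimal sketch using nearest-rank percentiles (helper name is illustrative):

```python
import statistics


def latency_stats(samples_ms):
    """Summarise per-query latencies (ms): mean, median, P95, P99."""
    ordered = sorted(samples_ms)

    def pct(p):  # nearest-rank percentile
        idx = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
        return ordered[idx]

    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p95": pct(95),
        "p99": pct(99),
    }
```

The median and P95/P99 matter more than the mean for voice chat: a single slow outlier barely moves the mean but is very noticeable mid-conversation.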
### Voice Chat Target
For real-time voice chat:
- Target: < 2000ms total latency
- Acceptable: 1000-1500ms mean
- Borderline: 1500-2000ms mean
- Too slow: > 2000ms mean
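The bands above translate directly into a small classifier you can run over benchmark output (function name is illustrative):

```python
def voice_chat_verdict(mean_ms):
    """Map mean latency (ms) to the voice-chat bands listed above."""
    if mean_ms <= 1500:
        return "acceptable"
    if mean_ms <= 2000:
        return "borderline"
    return "too slow"
```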
### FX-6100 Considerations
Your CPU may add overhead:
- Embedding generation: ~600ms
- Vector search: ~100-200ms
- Total Cat overhead: ~800ms
**With GPU embeddings**, this drops to ~250ms.
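These overheads compose additively on top of LLM generation time. A trivial budget check — only the ~800ms CPU and ~250ms GPU totals come from the estimates above; the GPU split between embedding and search is an assumption:

```python
def cat_overhead_ms(embed_ms, search_ms):
    """Per-query Cat overhead added on top of LLM generation."""
    return embed_ms + search_ms


cpu_overhead = cat_overhead_ms(600, 200)  # FX-6100 estimate from above
gpu_overhead = cat_overhead_ms(150, 100)  # assumed split; ~250ms total
```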
## 🛠️ Troubleshooting
### Cat won't start
```bash
# Check logs
docker logs miku_cheshire_cat_test
# Check if ports are in use
sudo netstat -tlnp | grep 1865
```
### Can't connect to llama-swap
The compose file tries to connect via:
1. External network: `miku-discord_default`
2. Host network: `host.docker.internal`
If both fail, check the llama-swap URL in test_setup.py and adjust it.
### Embeddings are slow
Try GPU acceleration in docker-compose.test.yml (requires spare VRAM).
### Knowledge upload fails
Upload files manually via admin panel:
- http://localhost:1865/admin
- Go to "Rabbit Hole" tab
- Drag and drop files
## 🔗 Useful Endpoints
- **Admin Panel**: http://localhost:1865/admin
- **API Docs**: http://localhost:1865/docs
- **Qdrant Dashboard**: http://localhost:6333/dashboard
- **Health Check**: http://localhost:1865/
## 📝 Decision Criteria
After running benchmarks, consider:
| Metric | Target | Your Result |
|--------|--------|-------------|
| Mean latency | < 1500ms | _____ ms |
| P95 latency | < 2000ms | _____ ms |
| Success rate | > 95% | _____ % |
| RAG accuracy | Good | _____ |
**Decision:**
- ✅ All targets met → **Integrate with bot**
- ⚠️ Some targets met → **Try GPU embeddings or hybrid approach**
- ❌ Targets not met → **Stick with current system**
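The three quantitative rows of the table map to a simple rule (RAG accuracy is qualitative, so it is left to human judgment here; function name is illustrative):

```python
def decision(mean_ms, p95_ms, success_rate):
    """Apply the decision table: all targets met, some met, or none."""
    met = [mean_ms < 1500, p95_ms < 2000, success_rate > 0.95]
    if all(met):
        return "integrate with bot"
    if any(met):
        return "try GPU embeddings or hybrid approach"
    return "stick with current system"
```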
## 🧹 Cleanup
```bash
# Stop services
docker-compose -f docker-compose.test.yml down
# Remove volumes (deletes all data)
docker-compose -f docker-compose.test.yml down -v
```
---
**Remember**: This is a test environment. Don't integrate with production bot until you're confident in the results!