add: cheshire-cat configuration, tooling, tests, and documentation
Configuration:
- .env.example, .gitignore, compose.yml (main docker compose)
- docker-compose-amd.yml (ROCm), docker-compose-macos.yml
- start.sh, stop.sh convenience scripts
- LICENSE (Apache 2.0, from upstream Cheshire Cat)

Memory management utilities:
- analyze_consolidation.py, manual_consolidation.py, verify_consolidation.py
- check_memories.py, extract_declarative_facts.py, store_declarative_facts.py
- compare_systems.py (system comparison tool)
- benchmark_cat.py, streaming_benchmark.py, streaming_benchmark_v2.py

Test suite:
- quick_test.py, test_setup.py, test_setup_simple.py
- test_consolidation_direct.py, test_declarative_recall.py, test_recall.py
- test_end_to_end.py, test_full_pipeline.py
- test_phase2.py, test_phase2_comprehensive.py

Documentation:
- README.md, QUICK_START.txt, TEST_README.md, SETUP_COMPLETE.md
- PHASE2_IMPLEMENTATION_NOTES.md, PHASE2_TEST_RESULTS.md
- POST_OPTIMIZATION_ANALYSIS.md
cheshire-cat/SETUP_COMPLETE.md (new file, 226 lines)

# 🎉 Cheshire Cat Test Environment Setup Complete!

## 📦 What Was Created

A complete standalone testing environment for evaluating Cheshire Cat AI as a memory system for Miku Bot.

### Files Created:

1. **docker-compose.test.yml** - Docker services configuration
   - Cheshire Cat Core (connected to llama-swap)
   - Qdrant vector database
   - Connected to your existing bot network

2. **.env** - Environment configuration
   - Core settings
   - Qdrant settings
   - Debug mode enabled

3. **test_setup.py** - Automated setup script
   - Configures Cat to use llama-swap
   - Uploads the Miku knowledge base
   - Runs test queries

4. **benchmark_cat.py** - Comprehensive performance testing
   - Tests various query types
   - Measures latency statistics
   - Simulates voice chat
   - Generates detailed reports

5. **compare_systems.py** - Side-by-side comparison
   - Compares Cat against the current system
   - Direct performance comparison
   - Latency analysis

6. **start.sh** - Quick start script
7. **stop.sh** - Quick stop script
8. **TEST_README.md** - Full documentation

## 🚀 Next Steps

### Step 1: Start Services

```bash
./start.sh
```

Or manually:

```bash
docker-compose -f docker-compose.test.yml up -d
```

### Step 2: Configure and Upload Knowledge

```bash
python3 test_setup.py
```

This will:
- Wait for Cat to be ready
- Configure it to use your llama-swap
- Upload miku_lore.txt, miku_prompt.txt, miku_lyrics.txt
- Run initial test queries

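The "wait for Cat to be ready" step can be sketched as a simple poll loop. This is a minimal illustration, not the exact check `test_setup.py` performs; the probe shown in the comment (an HTTP GET against the admin port) is an assumption:

```python
import time

def wait_for_service(probe, timeout=120, interval=2):
    """Poll `probe` (any zero-argument callable) until it returns True
    or `timeout` seconds elapse. Returns True if the service came up."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval)
    return False

# Hypothetical probe: treat any HTTP 200 from the Cat port as "ready".
# def cat_is_up():
#     import urllib.request
#     try:
#         return urllib.request.urlopen("http://localhost:1865/").status == 200
#     except OSError:
#         return False
# wait_for_service(cat_is_up)
```
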
### Step 3: Run Benchmarks

```bash
python3 benchmark_cat.py
```

Expected runtime: ~10-15 minutes.

Look for:
- Mean latency < 1500ms = good for voice chat
- P95 latency < 2000ms = acceptable
- Success rate > 95% = reliable

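The thresholds above can be checked from the benchmark's raw per-query timings. A minimal sketch; the nearest-rank P95 is an assumption about how `benchmark_cat.py` computes percentiles, not a description of its code:

```python
import math
import statistics

def latency_report(latencies_ms):
    """Mean and nearest-rank P95 over a list of per-query latencies (ms)."""
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(0.95 * len(ordered)))  # nearest-rank percentile
    return {
        "mean_ms": statistics.fmean(latencies_ms),
        "p95_ms": ordered[rank - 1],
    }
```
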
### Step 4: Compare Systems

```bash
python3 compare_systems.py
```

This compares Cat directly against your current `query_llama()` system.

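At its core, a side-by-side comparison only needs the same prompt timed against both backends. A sketch of that idea; `ask_cat` and `query_llama` in the comment are placeholders for the real entry points, not the script's actual function names:

```python
import time

def time_call(fn, *args, **kwargs):
    """Run one call and return (result, elapsed_ms)."""
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, (time.perf_counter() - t0) * 1000.0

# Hypothetical usage, both backends answering the same prompt:
# _, cat_ms = time_call(ask_cat, "Who is Miku?")
# _, base_ms = time_call(query_llama, "Who is Miku?")
# print(f"Cat: {cat_ms:.0f}ms  current: {base_ms:.0f}ms  delta: {cat_ms - base_ms:+.0f}ms")
```
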
### Step 5: Analyze Results

Review the output to decide:

✅ **Proceed with integration** if:
- Latency is acceptable (< 1500ms mean)
- RAG retrieval is accurate
- Performance is consistent

⚠️ **Try optimizations** if:
- Latency is borderline (1500-2000ms)

In that case:
- Consider GPU embeddings
- Try a hybrid approach

❌ **Stick with the current system** if:
- Latency is too high (> 2000ms)
- RAG quality is poor
- There are too many errors

## 🔍 Monitoring

### Check Service Status

```bash
docker ps | grep miku
```

### View Logs

```bash
docker logs miku_cheshire_cat_test -f
docker logs miku_qdrant_test -f
```

### Access Interfaces

- Admin Panel: http://localhost:1865/admin
- API Docs: http://localhost:1865/docs
- Qdrant: http://localhost:6333/dashboard

## 📊 Key Metrics to Watch

### From FX-6100 Analysis:

Expected Cat overhead on your CPU:
- **Embedding generation**: ~600ms (CPU-based)
- **Vector search**: ~100-200ms
- **Total overhead**: ~800ms

With GPU embeddings (if there is spare VRAM):
- **Total overhead**: ~250ms (much better!)

### Voice Chat Viability

Your current system: ~500-1500ms
Target with Cat: < 1500ms mean latency

If Cat adds ~800ms overhead:
- Simple queries: 500ms + 800ms = 1300ms ✅ OK
- Complex queries: 1500ms + 800ms = 2300ms ⚠️ Borderline

**GPU embeddings would bring this into the acceptable range.**

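The budget arithmetic above can be captured in a tiny helper. The 800ms/250ms overheads are the estimates from this section, not measured values:

```python
CPU_OVERHEAD_MS = 800  # ~600ms embeddings + ~100-200ms vector search (estimate)
GPU_OVERHEAD_MS = 250  # estimate, assuming spare VRAM for the embedder

def projected_ms(base_ms, gpu_embeddings=False):
    """Add the estimated Cat overhead to the current system's latency."""
    return base_ms + (GPU_OVERHEAD_MS if gpu_embeddings else CPU_OVERHEAD_MS)
```
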
## 🛠️ Troubleshooting

### Can't connect to llama-swap?

Edit `test_setup.py` line 10:

```python
# Try one of these:
LLAMA_SWAP_URL = "http://llama-swap:8080/v1"            # Docker network
LLAMA_SWAP_URL = "http://host.docker.internal:8080/v1"  # Host access
LLAMA_SWAP_URL = "http://YOUR_IP:8080/v1"               # Direct IP
```

### Embeddings too slow?

Try GPU acceleration:
1. Edit `docker-compose.test.yml` to add GPU support
2. Configure the embedder to use CUDA in `test_setup.py`

### Knowledge upload fails?

Upload manually:
- Go to http://localhost:1865/admin
- Click the "Rabbit Hole" tab
- Drag and drop: miku_lore.txt, miku_prompt.txt, miku_lyrics.txt

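A scripted fallback for the manual upload can also be sketched. The `/rabbithole/` endpoint path is an assumption inferred from the admin URL; confirm it against the API docs at http://localhost:1865/docs before relying on it:

```python
import pathlib

CAT_URL = "http://localhost:1865"
KNOWLEDGE_FILES = ["miku_lore.txt", "miku_prompt.txt", "miku_lyrics.txt"]

def upload_plan(base_url=CAT_URL, files=KNOWLEDGE_FILES):
    """Pair each knowledge file with the (assumed) Rabbit Hole upload URL."""
    endpoint = base_url.rstrip("/") + "/rabbithole/"
    return [(endpoint, pathlib.Path(f).name) for f in files]

# Sending each file (requires the third-party `requests` package):
# import requests
# for endpoint, name in upload_plan():
#     with open(name, "rb") as fh:
#         requests.post(endpoint, files={"file": (name, fh, "text/plain")})
```
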
## 🧹 Cleanup

### Stop services (keep data):

```bash
./stop.sh
```

### Stop and remove all data:

```bash
docker-compose -f docker-compose.test.yml down -v
```

## 📈 Expected Results

Based on your FX-6100 CPU:

### Pessimistic (CPU embeddings):
- Mean latency: 1600-2200ms
- Suitable for text chat: ✅
- Suitable for voice chat: ⚠️ Borderline

### Optimistic (GPU embeddings):
- Mean latency: 900-1400ms
- Suitable for text chat: ✅
- Suitable for voice chat: ✅

## 🎯 Decision Matrix

After benchmarking:

| Scenario | Action |
|----------|--------|
| Mean < 1500ms, RAG accurate | ✅ **Integrate fully** |
| Mean 1500-2000ms | ⚠️ **Try GPU embeddings** |
| Mean 2000-3000ms | ⚠️ **Hybrid approach only** |
| Mean > 3000ms | ❌ **Don't use** |

## 📚 Documentation

- Full guide: `TEST_README.md`
- Original local-cat docs: `README.md`
- Cheshire Cat docs: https://cheshire-cat-ai.github.io/docs/

---

## ✨ Summary

You now have a complete, isolated testing environment to:
1. ✅ Measure real performance on your FX-6100
2. ✅ Compare against your current system
3. ✅ Test RAG accuracy with Miku's knowledge
4. ✅ Simulate voice chat workloads
5. ✅ Make a data-driven decision

**Ready to test? Run:** `./start.sh`

Good luck! 🚀