add: cheshire-cat configuration, tooling, tests, and documentation

Configuration:
- .env.example, .gitignore, compose.yml (main docker compose)
- docker-compose-amd.yml (ROCm), docker-compose-macos.yml
- start.sh, stop.sh convenience scripts
- LICENSE (Apache 2.0, from upstream Cheshire Cat)

Memory management utilities:
- analyze_consolidation.py, manual_consolidation.py, verify_consolidation.py
- check_memories.py, extract_declarative_facts.py, store_declarative_facts.py
- compare_systems.py (system comparison tool)
- benchmark_cat.py, streaming_benchmark.py, streaming_benchmark_v2.py

Test suite:
- quick_test.py, test_setup.py, test_setup_simple.py
- test_consolidation_direct.py, test_declarative_recall.py, test_recall.py
- test_end_to_end.py, test_full_pipeline.py
- test_phase2.py, test_phase2_comprehensive.py

Documentation:
- README.md, QUICK_START.txt, TEST_README.md, SETUP_COMPLETE.md
- PHASE2_IMPLEMENTATION_NOTES.md, PHASE2_TEST_RESULTS.md
- POST_OPTIMIZATION_ANALYSIS.md
Commit ae1e0aa144 (parent eafab336b4), 2026-03-04 00:51:14 +02:00
35 changed files with 6055 additions and 0 deletions

cheshire-cat/TEST_README.md (new file, 202 lines):
# Cheshire Cat Test Environment for Miku Bot
This is a standalone test environment for evaluating Cheshire Cat AI as a potential memory/context system for the Miku Discord bot.
## 🎯 Goals
1. **Test performance** - Measure latency, overhead, and real-time viability
2. **Evaluate memory** - Compare RAG-based context retrieval vs full context loading
3. **Benchmark CPU impact** - Assess performance on AMD FX-6100
4. **Make informed decision** - Data-driven choice on integration
## 📁 Directory Structure
```
cheshire-cat/
├── cat/ # Cat data (created on first run)
│ ├── data/ # Cat's internal data
│ ├── plugins/ # Custom plugins
│ ├── static/ # Static assets
│ └── long_term_memory/ # Qdrant vector storage
├── .env # Environment configuration
├── docker-compose.test.yml # Docker setup
├── test_setup.py # Initial setup script
├── benchmark_cat.py # Comprehensive benchmarks
├── compare_systems.py # Compare Cat vs current system
└── TEST_README.md # This file
```
## 🚀 Quick Start
### 1. Prerequisites
- Docker and Docker Compose installed
- Miku bot's llama-swap service running
- Python 3.8+ with requests library
```bash
pip3 install requests
```
### 2. Start Cheshire Cat
```bash
# From the cheshire-cat directory
docker-compose -f docker-compose.test.yml up -d
```
Wait ~30 seconds for services to start.
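Instead of a fixed sleep, you can poll the health endpoint until Cat answers. A minimal sketch (assumes Cat is on port 1865 as configured here; the helper name is illustrative):

```python
import time

import requests


def wait_for_cat(url="http://localhost:1865/", timeout=120, interval=2):
    """Poll Cat's health endpoint until it responds with 200, or give up."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if requests.get(url, timeout=5).status_code == 200:
                return True
        except requests.exceptions.RequestException:
            pass  # container still starting; retry after a short pause
        time.sleep(interval)
    return False
```

`test_setup.py` already does this wait for you; the sketch is only useful if you script against Cat directly.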
### 3. Configure and Test
```bash
# Run setup script (configures LLM, uploads knowledge base)
python3 test_setup.py
```
This will:
- ✅ Wait for Cat to be ready
- ✅ Configure Cat to use llama-swap
- ✅ Upload miku_lore.txt, miku_prompt.txt, miku_lyrics.txt
- ✅ Run test queries
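The upload step boils down to a multipart POST per file. A hedged sketch of what `test_setup.py` does — the `/rabbithole/` route is taken from Cat's API docs, so verify it against http://localhost:1865/docs for your version:

```python
import requests

CAT_URL = "http://localhost:1865"  # default port from this compose setup


def upload_knowledge(path):
    """Send one text file to Cat's Rabbit Hole ingestion endpoint.

    Route and field name ("file") are assumptions based on Cat's API docs;
    test_setup.py is the authoritative implementation.
    """
    with open(path, "rb") as f:
        resp = requests.post(
            f"{CAT_URL}/rabbithole/",
            files={"file": (path, f, "text/plain")},
        )
    resp.raise_for_status()
    return resp.json()
```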
### 4. Run Benchmarks
```bash
# Comprehensive performance benchmark
python3 benchmark_cat.py
```
This tests:
- Simple greetings (low complexity)
- Factual queries (medium complexity)
- Memory recall (high complexity)
- Voice chat simulation (rapid-fire queries)
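The core of each benchmark category is the same timing loop: send a prompt, measure the round trip, collect milliseconds. A simplified sketch — the endpoint path and payload shape are assumptions, `benchmark_cat.py` is the authoritative script:

```python
import time

import requests


def benchmark(prompts, endpoint="http://localhost:1865/message"):
    """Time one HTTP round trip per prompt; returns latencies in ms."""
    latencies = []
    for text in prompts:
        t0 = time.perf_counter()
        requests.post(endpoint, json={"text": text}, timeout=30)
        latencies.append((time.perf_counter() - t0) * 1000)
    return latencies
```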
### 5. Compare with Current System
```bash
# Side-by-side comparison
python3 compare_systems.py
```
Compares latency between:
- 🐱 Cheshire Cat (RAG-based context)
- 📦 Current system (full context loading)
## 🔍 What to Look For
### ✅ Good Signs (Proceed with Integration)
- Mean latency < 1500ms
- P95 latency < 2000ms
- Consistent performance across query types
- RAG retrieves relevant context accurately
### ⚠️ Warning Signs (Reconsider)
- Mean latency > 2000ms
- High variance (large stdev)
- RAG misses important context
- Frequent errors or timeouts
### ❌ Stop Signs (Don't Use)
- Mean latency > 3000ms
- P95 latency > 5000ms
- RAG retrieval quality is poor
- System crashes or hangs
## 📊 Understanding the Results
### Latency Metrics
- **Mean**: Average response time
- **Median**: Middle value (less affected by outliers)
- **P95**: 95% of queries are faster than this
- **P99**: 99% of queries are faster than this
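These four statistics can be computed from the raw latency list; a minimal sketch using nearest-rank percentiles (helper name is illustrative):

```python
import statistics


def latency_stats(samples_ms):
    """Summarise per-query latencies (ms): mean, median, P95, P99."""
    ordered = sorted(samples_ms)

    def pct(p):  # nearest-rank percentile
        idx = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
        return ordered[idx]

    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p95": pct(95),
        "p99": pct(99),
    }
```

The median and P95/P99 matter more than the mean for voice chat: a single slow outlier barely moves the mean but is very noticeable mid-conversation.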
### Voice Chat Target
For real-time voice chat:
- Target: < 2000ms total latency
- Acceptable: 1000-1500ms mean
- Borderline: 1500-2000ms mean
- Too slow: > 2000ms mean
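The bands above translate directly into a small classifier you can run over benchmark output (function name is illustrative):

```python
def voice_chat_verdict(mean_ms):
    """Map mean latency (ms) to the voice-chat bands listed above."""
    if mean_ms <= 1500:
        return "acceptable"
    if mean_ms <= 2000:
        return "borderline"
    return "too slow"
```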
### FX-6100 Considerations
Your CPU may add overhead:
- Embedding generation: ~600ms
- Vector search: ~100-200ms
- Total Cat overhead: ~800ms
**With GPU embeddings**, this drops to ~250ms.
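These overheads compose additively on top of LLM generation time. A trivial budget check — only the ~800ms CPU and ~250ms GPU totals come from the estimates above; the GPU split between embedding and search is an assumption:

```python
def cat_overhead_ms(embed_ms, search_ms):
    """Per-query Cat overhead added on top of LLM generation."""
    return embed_ms + search_ms


cpu_overhead = cat_overhead_ms(600, 200)  # FX-6100 estimate from above
gpu_overhead = cat_overhead_ms(150, 100)  # assumed split; ~250ms total
```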
## 🛠️ Troubleshooting
### Cat won't start
```bash
# Check logs
docker logs miku_cheshire_cat_test
# Check if ports are in use
sudo netstat -tlnp | grep 1865
```
### Can't connect to llama-swap
The compose file tries to connect via:
1. External network: `miku-discord_default`
2. Host network: `host.docker.internal`
If both fail, check the llama-swap URL in test_setup.py and adjust it.
### Embeddings are slow
Try GPU acceleration in docker-compose.test.yml (requires spare VRAM).
### Knowledge upload fails
Upload files manually via admin panel:
- http://localhost:1865/admin
- Go to "Rabbit Hole" tab
- Drag and drop files
## 🔗 Useful Endpoints
- **Admin Panel**: http://localhost:1865/admin
- **API Docs**: http://localhost:1865/docs
- **Qdrant Dashboard**: http://localhost:6333/dashboard
- **Health Check**: http://localhost:1865/
## 📝 Decision Criteria
After running benchmarks, consider:
| Metric | Target | Your Result |
|--------|--------|-------------|
| Mean latency | < 1500ms | _____ ms |
| P95 latency | < 2000ms | _____ ms |
| Success rate | > 95% | _____ % |
| RAG accuracy | Good | _____ |
**Decision:**
- ✅ All targets met → **Integrate with bot**
- ⚠️ Some targets met → **Try GPU embeddings or hybrid approach**
- ❌ Targets not met → **Stick with current system**
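The three quantitative rows of the table map to a simple rule (RAG accuracy is qualitative, so it is left to human judgment here; function name is illustrative):

```python
def decision(mean_ms, p95_ms, success_rate):
    """Apply the decision table: all targets met, some met, or none."""
    met = [mean_ms < 1500, p95_ms < 2000, success_rate > 0.95]
    if all(met):
        return "integrate with bot"
    if any(met):
        return "try GPU embeddings or hybrid approach"
    return "stick with current system"
```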
## 🧹 Cleanup
```bash
# Stop services
docker-compose -f docker-compose.test.yml down
# Remove volumes (deletes all data)
docker-compose -f docker-compose.test.yml down -v
```
---
**Remember**: This is a test environment. Don't integrate with production bot until you're confident in the results!