add: cheshire-cat configuration, tooling, tests, and documentation
Configuration:
- .env.example, .gitignore, compose.yml (main docker compose)
- docker-compose-amd.yml (ROCm), docker-compose-macos.yml
- start.sh, stop.sh convenience scripts
- LICENSE (Apache 2.0, from upstream Cheshire Cat)

Memory management utilities:
- analyze_consolidation.py, manual_consolidation.py, verify_consolidation.py
- check_memories.py, extract_declarative_facts.py, store_declarative_facts.py
- compare_systems.py (system comparison tool)
- benchmark_cat.py, streaming_benchmark.py, streaming_benchmark_v2.py

Test suite:
- quick_test.py, test_setup.py, test_setup_simple.py
- test_consolidation_direct.py, test_declarative_recall.py, test_recall.py
- test_end_to_end.py, test_full_pipeline.py
- test_phase2.py, test_phase2_comprehensive.py

Documentation:
- README.md, QUICK_START.txt, TEST_README.md, SETUP_COMPLETE.md
- PHASE2_IMPLEMENTATION_NOTES.md, PHASE2_TEST_RESULTS.md
- POST_OPTIMIZATION_ANALYSIS.md
cheshire-cat/TEST_README.md (new file, 202 lines added)

@@ -0,0 +1,202 @@
# Cheshire Cat Test Environment for Miku Bot

This is a standalone test environment for evaluating Cheshire Cat AI as a potential memory/context system for the Miku Discord bot.

## 🎯 Goals

1. **Test performance** - Measure latency, overhead, and real-time viability
2. **Evaluate memory** - Compare RAG-based context retrieval vs full context loading
3. **Benchmark CPU impact** - Assess performance on the AMD FX-6100
4. **Make an informed decision** - A data-driven choice on integration

## 📁 Directory Structure

```
cheshire-cat/
├── cat/                        # Cat data (created on first run)
│   ├── data/                   # Cat's internal data
│   ├── plugins/                # Custom plugins
│   ├── static/                 # Static assets
│   └── long_term_memory/       # Qdrant vector storage
├── .env                        # Environment configuration
├── docker-compose.test.yml     # Docker setup
├── test_setup.py               # Initial setup script
├── benchmark_cat.py            # Comprehensive benchmarks
├── compare_systems.py          # Compare Cat vs current system
└── TEST_README.md              # This file
```

## 🚀 Quick Start

### 1. Prerequisites

- Docker and Docker Compose installed
- Miku bot's llama-swap service running
- Python 3.8+ with the requests library

```bash
pip3 install requests
```
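
The test scripts assume Python 3.8+; a one-liner sanity check before going further:

```python
import sys

# The test and benchmark scripts in this directory assume Python 3.8+.
assert sys.version_info >= (3, 8), "Python 3.8+ required"
print("Python OK:", sys.version.split()[0])
```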

### 2. Start Cheshire Cat

```bash
# From the cheshire-cat directory
docker-compose -f docker-compose.test.yml up -d
```

Wait ~30 seconds for the services to start.
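
Instead of sleeping a fixed 30 seconds, you can poll the health endpoint (the Cat answers on http://localhost:1865/ once ready, per the endpoints listed below). A minimal sketch:

```python
import time
import urllib.request
from urllib.error import URLError

def wait_for_cat(url: str = "http://localhost:1865/", timeout: float = 60.0) -> bool:
    """Poll the Cat's health endpoint until it responds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (URLError, OSError):
            pass  # not up yet; retry shortly
        time.sleep(1)
    return False
```

Call `wait_for_cat()` before running the setup script; it returns `False` if the Cat never comes up within the timeout.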

### 3. Configure and Test

```bash
# Run the setup script (configures the LLM, uploads the knowledge base)
python3 test_setup.py
```

This will:
- ✅ Wait for Cat to be ready
- ✅ Configure Cat to use llama-swap
- ✅ Upload miku_lore.txt, miku_prompt.txt, miku_lyrics.txt
- ✅ Run test queries

### 4. Run Benchmarks

```bash
# Comprehensive performance benchmark
python3 benchmark_cat.py
```

This tests:
- Simple greetings (low complexity)
- Factual queries (medium complexity)
- Memory recall (high complexity)
- Voice chat simulation (rapid-fire queries)

### 5. Compare with Current System

```bash
# Side-by-side comparison
python3 compare_systems.py
```

Compares latency between:
- 🐱 Cheshire Cat (RAG-based context retrieval)
- 📦 Current system (full context loading)
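
The core of any such comparison is the same per-request timer applied to both backends. A sketch of the idea, using a stub in place of the real HTTP call (the function names here are illustrative, not compare_systems.py's actual API):

```python
import time
import statistics

def time_requests(send, queries):
    """Time a callable `send(query)` over a list of queries; returns latencies in ms."""
    latencies = []
    for q in queries:
        t0 = time.perf_counter()
        send(q)                     # e.g. an HTTP POST to the backend under test
        latencies.append((time.perf_counter() - t0) * 1000.0)
    return latencies

# Example with a stub standing in for a real backend call:
stub = lambda q: time.sleep(0.005)
lat = time_requests(stub, ["hi", "who is Miku?", "what did I say earlier?"])
print(f"mean: {statistics.mean(lat):.1f} ms over {len(lat)} queries")
```

Running the same query list through both backends keeps the comparison apples-to-apples.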

## 🔍 What to Look For

### ✅ Good Signs (Proceed with Integration)

- Mean latency < 1500ms
- P95 latency < 2000ms
- Consistent performance across query types
- RAG retrieves relevant context accurately

### ⚠️ Warning Signs (Reconsider)

- Mean latency > 2000ms
- High variance (large standard deviation)
- RAG misses important context
- Frequent errors or timeouts

### ❌ Stop Signs (Don't Use)

- Mean latency > 3000ms
- P95 latency > 5000ms
- Poor RAG retrieval quality
- System crashes or hangs

## 📊 Understanding the Results

### Latency Metrics

- **Mean**: Average response time
- **Median**: Middle value (less affected by outliers)
- **P95**: 95% of queries are faster than this
- **P99**: 99% of queries are faster than this
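
All four metrics can be computed from raw latency samples with the standard library alone (a sketch, not necessarily how benchmark_cat.py does it):

```python
import statistics

def latency_report(samples_ms):
    """Summarize a list of latencies (in ms) into the metrics used above."""
    # quantiles(n=100) returns 99 cut points; index 94 is P95, index 98 is P99.
    q = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "mean": statistics.mean(samples_ms),
        "median": statistics.median(samples_ms),
        "p95": q[94],
        "p99": q[98],
    }
```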

### Voice Chat Target

For real-time voice chat:
- Target: < 2000ms total latency
- Acceptable: 1000-1500ms mean
- Borderline: 1500-2000ms mean
- Too slow: > 2000ms mean

### FX-6100 Considerations

Your CPU may add overhead:
- Embedding generation: ~600ms
- Vector search: ~100-200ms
- Total Cat overhead: ~800ms

**With GPU embeddings**, this drops to ~250ms.
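
The overhead budget above composes as simple addition (figures copied from this document; the vector-search term takes the upper end of its range):

```python
# Rough CPU-side overhead budget on the FX-6100.
embedding_ms = 600        # CPU embedding generation
vector_search_ms = 200    # upper end of the 100-200ms range
total_cpu_ms = embedding_ms + vector_search_ms
print(total_cpu_ms)  # 800, matching the ~800ms estimate above
```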

## 🛠️ Troubleshooting

### Cat won't start

```bash
# Check the logs
docker logs miku_cheshire_cat_test

# Check whether the port is already in use
sudo netstat -tlnp | grep 1865
```

### Can't connect to llama-swap

The compose file tries to connect via:
1. The external network: `miku-discord_default`
2. The host network: `host.docker.internal`

If both fail, check the llama-swap URL in test_setup.py and adjust it.

### Embeddings are slow

Try enabling GPU acceleration in docker-compose.test.yml (requires spare VRAM).

### Knowledge upload fails

Upload the files manually via the admin panel:
- Open http://localhost:1865/admin
- Go to the "Rabbit Hole" tab
- Drag and drop the files

## 🔗 Useful Endpoints

- **Admin Panel**: http://localhost:1865/admin
- **API Docs**: http://localhost:1865/docs
- **Qdrant Dashboard**: http://localhost:6333/dashboard
- **Health Check**: http://localhost:1865/

## 📝 Decision Criteria

After running the benchmarks, consider:

| Metric | Target | Your Result |
|--------|--------|-------------|
| Mean latency | < 1500ms | _____ ms |
| P95 latency | < 2000ms | _____ ms |
| Success rate | > 95% | _____ % |
| RAG accuracy | Good | _____ |

**Decision:**
- ✅ All targets met → **Integrate with the bot**
- ⚠️ Some targets met → **Try GPU embeddings or a hybrid approach**
- ❌ Targets not met → **Stick with the current system**
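
These buckets can be encoded mechanically once the numbers are in; a sketch with thresholds copied from this document (the function name is made up):

```python
def decide(mean_ms, p95_ms, success_rate):
    """Map measured results to the go / maybe / no-go buckets above."""
    if mean_ms > 3000 or p95_ms > 5000:
        return "stop"          # ❌ stick with the current system
    if mean_ms < 1500 and p95_ms < 2000 and success_rate > 0.95:
        return "integrate"     # ✅ all targets met
    return "reconsider"        # ⚠️ try GPU embeddings or a hybrid approach
```

RAG accuracy is left out because it is a subjective judgment, not a single number.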

## 🧹 Cleanup

```bash
# Stop the services
docker-compose -f docker-compose.test.yml down

# Remove the volumes (deletes all data)
docker-compose -f docker-compose.test.yml down -v
```

---

**Remember**: This is a test environment. Don't integrate it with the production bot until you're confident in the results!