# Cheshire Cat Test Environment for Miku Bot

This is a standalone test environment for evaluating Cheshire Cat AI as a potential memory/context system for the Miku Discord bot.

## 🎯 Goals

1. **Test performance** - Measure latency, overhead, and real-time viability
2. **Evaluate memory** - Compare RAG-based context retrieval against full context loading
3. **Benchmark CPU impact** - Assess performance on the AMD FX-6100
4. **Make an informed decision** - Base the integration choice on data

## 📁 Directory Structure

```
cheshire-cat/
├── cat/                      # Cat data (created on first run)
│   ├── data/                 # Cat's internal data
│   ├── plugins/              # Custom plugins
│   ├── static/               # Static assets
│   └── long_term_memory/     # Qdrant vector storage
├── .env                      # Environment configuration
├── docker-compose.test.yml   # Docker setup
├── test_setup.py             # Initial setup script
├── benchmark_cat.py          # Comprehensive benchmarks
├── compare_systems.py        # Compare Cat vs current system
└── TEST_README.md            # This file
```

## 🚀 Quick Start

### 1. Prerequisites

- Docker and Docker Compose installed
- Miku bot's llama-swap service running
- Python 3.8+ with the `requests` library

```bash
pip3 install requests
```

### 2. Start Cheshire Cat

```bash
# From the cheshire-cat directory
docker-compose -f docker-compose.test.yml up -d
```

Wait about 30 seconds for the services to start.

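Rather than relying on a fixed 30-second wait, a script can poll the health check endpoint (http://localhost:1865/, listed under Useful Endpoints) until it answers. The sketch below is a minimal, stdlib-only version; it assumes that any 2xx response means Cat is ready, and `wait_until_ready` and `http_ok` are hypothetical helper names, not functions taken from `test_setup.py`.

```python
import http.client
import time
import urllib.request
from typing import Callable

# Health Check URL from "Useful Endpoints"; a 2xx reply is assumed to mean "ready".
CAT_HEALTH_URL = "http://localhost:1865/"


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if `url` answers with a 2xx status, False on any error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError (non-2xx), refused connections and timeouts are all OSError
        return False


def wait_until_ready(
    url: str = CAT_HEALTH_URL,
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    probe: Callable[[str], bool] = http_ok,
) -> bool:
    """Poll `url` with `probe` until it succeeds (True) or `timeout_s` runs out (False)."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, `wait_until_ready(timeout_s=90)` returns `True` once Cat responds and `False` if it never does. Taking `probe` as a parameter keeps the polling loop testable without a running container.
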
### 3. Configure and Test

```bash
# Run the setup script (configures the LLM, uploads the knowledge base)
python3 test_setup.py
```

This will:

- ✅ Wait for Cat to be ready
- ✅ Configure Cat to use llama-swap
- ✅ Upload `miku_lore.txt`, `miku_prompt.txt`, `miku_lyrics.txt`
- ✅ Run test queries

### 4. Run Benchmarks

```bash
# Comprehensive performance benchmark
python3 benchmark_cat.py
```

This tests:

- Simple greetings (low complexity)
- Factual queries (medium complexity)
- Memory recall (high complexity)
- Voice chat simulation (rapid-fire queries)

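The core of such a benchmark is timing each query and counting failures. The sketch below shows one way to do that; `send` is a hypothetical callable standing in for whatever request `benchmark_cat.py` actually makes to Cat, and the sample queries are illustrative only. Queries are sent back to back with no pause, which matches the rapid-fire pattern of the voice chat simulation.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RunResult:
    latencies_ms: List[float] = field(default_factory=list)  # successful queries only
    errors: int = 0  # exceptions raised by `send` (errors, timeouts)

    @property
    def success_rate(self) -> float:
        total = len(self.latencies_ms) + self.errors
        return 100.0 * len(self.latencies_ms) / total if total else 0.0


def time_queries(send: Callable[[str], str], queries: List[str]) -> RunResult:
    """Send each query in turn (no pause between them) and record wall-clock latency."""
    result = RunResult()
    for query in queries:
        start = time.perf_counter()
        try:
            send(query)
        except Exception:  # count any failure; a real harness would also log it
            result.errors += 1
            continue
        result.latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return result


# Illustrative (hypothetical) query mix, one list per query type from the list above.
SAMPLE_QUERIES = {
    "greeting": ["Hi Miku!"],
    "factual": ["What is your favorite food?"],
    "recall": ["What did I tell you about my cat yesterday?"],
}
```

The recorded latencies can then be summarized with the statistics described under Latency Metrics, and `success_rate` maps directly onto the "Success rate" row of the Decision Criteria table.
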
### 5. Compare with Current System

```bash
# Side-by-side comparison
python3 compare_systems.py
```

Compares latency between:

- 🐱 Cheshire Cat (RAG-based context)
- 📦 Current system (full context loading)

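If both systems are pointed at the same llama-swap model, the difference in mean latency is a rough estimate of the overhead Cat adds. A minimal sketch of that comparison, assuming two lists of per-query latencies in milliseconds; `compare_latency` is a hypothetical name, and `compare_systems.py` may report its results differently.

```python
from statistics import mean
from typing import Dict, Sequence


def compare_latency(cat_ms: Sequence[float], current_ms: Sequence[float]) -> Dict[str, float]:
    """Summarize how much slower (or faster) Cat is than the current system."""
    cat_mean = mean(cat_ms)
    current_mean = mean(current_ms)
    return {
        "cat_mean_ms": cat_mean,
        "current_mean_ms": current_mean,
        "delta_ms": cat_mean - current_mean,  # > 0 means Cat is slower
        "ratio": cat_mean / current_mean,  # e.g. 1.4 means 40% slower
    }
```

A positive `delta_ms` close to the overhead figures in FX-6100 Considerations would suggest the extra time is spent on embeddings and vector search rather than in the LLM.
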
## 🔍 What to Look For

### ✅ Good Signs (Proceed with Integration)

- Mean latency < 1500ms
- P95 latency < 2000ms
- Consistent performance across query types
- RAG retrieves relevant context accurately

### ⚠️ Warning Signs (Reconsider)

- Mean latency > 2000ms
- High variance (large stdev)
- RAG misses important context
- Frequent errors or timeouts

### ❌ Stop Signs (Don't Use)

- Mean latency > 3000ms
- P95 latency > 5000ms
- RAG retrieval quality is poor
- System crashes or hangs

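The numeric thresholds in these lists can be checked mechanically; the qualitative ones (RAG quality, crashes, consistency) still need a human look. A sketch, assuming the checks are applied from most to least severe and labelling anything that fits none of the three lists (for example a mean of 1500-2000ms) as "borderline"; `latency_verdict` is a hypothetical name.

```python
def latency_verdict(mean_ms: float, p95_ms: float) -> str:
    """Map mean and P95 latency onto the Good / Warning / Stop lists above."""
    if mean_ms > 3000 or p95_ms > 5000:
        return "stop"  # Stop Signs
    if mean_ms > 2000:
        return "warning"  # Warning Signs
    if mean_ms < 1500 and p95_ms < 2000:
        return "good"  # Good Signs
    return "borderline"  # covered by none of the three lists
```

Checking the stop thresholds first means a run with a good mean but a very long tail (P95 above 5000ms) is still flagged.
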
## 📊 Understanding the Results

### Latency Metrics

- **Mean**: Average response time
- **Median**: Middle value (less affected by outliers)
- **P95**: 95% of queries are faster than this
- **P99**: 99% of queries are faster than this

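These statistics can be computed from a list of per-query latencies with the standard library. The sketch below uses the nearest-rank definition of a percentile, one of several common definitions (`statistics.quantiles`, for instance, interpolates between samples), so its P95/P99 may differ slightly from what the benchmark scripts print. Standard deviation is included because high variance is one of the warning signs. `percentile` and `summarize` are hypothetical names.

```python
import statistics
from typing import Dict, Sequence


def percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    ordered = sorted(samples)
    rank = (p * len(ordered) + 99) // 100  # integer ceil(p * n / 100), avoids float rounding
    return ordered[rank - 1]


def summarize(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Mean, median, stdev, P95 and P99 of per-query latencies in milliseconds."""
    return {
        "mean": statistics.mean(samples_ms),
        "median": statistics.median(samples_ms),
        "stdev": statistics.stdev(samples_ms) if len(samples_ms) > 1 else 0.0,
        "p95": percentile(samples_ms, 95),
        "p99": percentile(samples_ms, 99),
    }
```

Feed it the latencies of successful queries from a benchmark run, then compare `mean` and `p95` against the thresholds in What to Look For.
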
### Voice Chat Target

For real-time voice chat:

- Target: < 2000ms total latency
- Acceptable: 1000-1500ms mean
- Borderline: 1500-2000ms mean
- Too slow: > 2000ms mean

### FX-6100 Considerations

On your CPU (FX-6100), Cat may add roughly this much overhead per query:

- Embedding generation: ~600ms
- Vector search: ~100-200ms
- Total Cat overhead: ~800ms

**With GPU embeddings**, the total overhead drops to ~250ms.

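Because Cat's overhead comes on top of the LLM's own response time, it eats into the 2000ms voice-chat target. A quick worked check using the approximate figures above (these are this README's estimates, not measurements; `llm_budget_ms` is a hypothetical helper):

```python
VOICE_TARGET_MS = 2000  # "Target: < 2000ms total latency" from Voice Chat Target

# Approximate Cat overhead per query, from FX-6100 Considerations.
CPU_EMBEDDING_MS = 600
CPU_SEARCH_MS = (100, 200)  # low and high estimate
CPU_TOTAL_MS = 800
GPU_TOTAL_MS = 250


def llm_budget_ms(cat_overhead_ms: float, target_ms: float = VOICE_TARGET_MS) -> float:
    """Time left for the LLM itself once Cat's retrieval overhead is paid."""
    return target_ms - cat_overhead_ms


# Embedding + search on the CPU comes to 700-800ms, consistent with the ~800ms total.
cpu_range = (CPU_EMBEDDING_MS + CPU_SEARCH_MS[0], CPU_EMBEDDING_MS + CPU_SEARCH_MS[1])
print("CPU embedding + search:", cpu_range)  # (700, 800)
print("LLM budget, CPU embeddings:", llm_budget_ms(CPU_TOTAL_MS))  # 1200
print("LLM budget, GPU embeddings:", llm_budget_ms(GPU_TOTAL_MS))  # 1750
```

So with CPU embeddings the model itself has to answer in about 1.2 seconds to stay under the target, versus about 1.75 seconds with GPU embeddings.
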
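## 🛠️ Troubleshooting

### Cat won't start

```bash
# Check logs
docker logs miku_cheshire_cat_test

# Check whether the Cat (1865) or Qdrant (6333) ports are already in use
sudo netstat -tlnp | grep -E '1865|6333'
# Where netstat is not installed, ss accepts the same flags:
sudo ss -tlnp | grep -E '1865|6333'
```
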
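The same port check can be done from Python, which is handy on machines without `netstat` or `ss`. This sketch only detects a TCP listener that accepts connections on the given address (it simply tries to connect), which is what the `grep` on listening sockets above looks for; `port_in_use` is a hypothetical helper, and 127.0.0.1 is assumed as the address to check.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a TCP server is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            return sock.connect_ex((host, port)) == 0  # 0 means the connect succeeded
        except OSError:  # e.g. a timeout raised instead of returned as an error code
            return False
```

For example, calling `port_in_use(1865)` before `docker-compose ... up` shows whether another process already holds Cat's port.
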
### Can't connect to llama-swap

The compose file tries to reach llama-swap in two ways:

1. External Docker network: `miku-discord_default`
2. Host machine: `host.docker.internal` (on Linux, this name typically resolves only if the service maps it to `host-gateway` under `extra_hosts`)

If both fail, check the llama-swap URL in `test_setup.py` and adjust it.

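To see which of the two routes works, a script can try each candidate in order and report the first one that answers. In the sketch below, the host names and port in `CANDIDATE_URLS` are placeholders (substitute the llama-swap URL that `test_setup.py` uses), and `reachable` treats any HTTP response, even a 404, as proof that the network route works. It has to run where Cat runs (for example inside its container), since these names generally resolve only inside containers. `reachable` and `first_reachable` are hypothetical helpers.

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional

# Placeholders: service name and port are assumptions, not taken from the compose file.
CANDIDATE_URLS = [
    "http://llama-swap:8080/",  # via the miku-discord_default network
    "http://host.docker.internal:8080/",  # via the host machine
]


def reachable(url: str, timeout: float = 2.0) -> bool:
    """True if any HTTP server answers at `url`, whatever the status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server replied (e.g. 404), so the route itself works
    except (OSError, http.client.HTTPException):
        return False  # DNS failure, refused connection, timeout, garbled reply


def first_reachable(urls: Iterable[str], probe: Callable[[str], bool] = reachable) -> Optional[str]:
    """Return the first URL that `probe` accepts, or None if none do."""
    for url in urls:
        if probe(url):
            return url
    return None
```

`first_reachable(CANDIDATE_URLS)` returns the working base URL, or `None` if neither route answers.
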
### Embeddings are slow

Try enabling GPU acceleration for embeddings in `docker-compose.test.yml` (requires spare VRAM).

### Knowledge upload fails

Upload the files manually via the admin panel:

1. Open http://localhost:1865/admin
2. Go to the "Rabbit Hole" tab
3. Drag and drop the files

## 🔗 Useful Endpoints

- **Admin Panel**: http://localhost:1865/admin
- **API Docs**: http://localhost:1865/docs
- **Qdrant Dashboard**: http://localhost:6333/dashboard
- **Health Check**: http://localhost:1865/

## 📝 Decision Criteria

After running the benchmarks, compare your results with the targets:

| Metric | Target | Your Result |
|--------|--------|-------------|
| Mean latency | < 1500ms | _____ ms |
| P95 latency | < 2000ms | _____ ms |
| Success rate | > 95% | _____ % |
| RAG accuracy | Good | _____ |

**Decision:**

- ✅ All targets met → **Integrate with bot**
- ⚠️ Some targets met → **Try GPU embeddings or hybrid approach**
- ❌ Targets not met → **Stick with current system**

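The decision rule can be written down directly. This sketch reads "Targets not met" as "no targets met" so that the three outcomes do not overlap, which is an interpretation rather than something the table states; RAG accuracy has to be judged by hand and is passed in as a boolean. `decide` is a hypothetical helper.

```python
from typing import List, Tuple


def decide(mean_ms: float, p95_ms: float, success_pct: float, rag_good: bool) -> Tuple[str, List[str]]:
    """Apply the Decision Criteria table; return the decision and the missed targets."""
    checks = {
        "mean latency < 1500ms": mean_ms < 1500,
        "P95 latency < 2000ms": p95_ms < 2000,
        "success rate > 95%": success_pct > 95,
        "RAG accuracy good": rag_good,
    }
    missed = [name for name, ok in checks.items() if not ok]
    if not missed:
        return "Integrate with bot", missed
    if len(missed) < len(checks):
        return "Try GPU embeddings or hybrid approach", missed
    return "Stick with current system", missed
```

Returning the missed targets alongside the decision makes it clear what to work on, for example whether GPU embeddings are likely to help (latency misses) or not (RAG accuracy misses).
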
## 🧹 Cleanup

```bash
# Stop services (data is kept)
docker-compose -f docker-compose.test.yml down

# Or: stop services and remove their volumes
# (-v does not delete bind-mounted host folders; remove cat/ by hand if it is one)
docker-compose -f docker-compose.test.yml down -v
```

---

**Remember**: This is a test environment. Don't integrate it with the production bot until you're confident in the results!