miku-discord/cheshire-cat/SETUP_COMPLETE.md
koko210Serve ae1e0aa144 add: cheshire-cat configuration, tooling, tests, and documentation
Configuration:
- .env.example, .gitignore, compose.yml (main docker compose)
- docker-compose-amd.yml (ROCm), docker-compose-macos.yml
- start.sh, stop.sh convenience scripts
- LICENSE (Apache 2.0, from upstream Cheshire Cat)

Memory management utilities:
- analyze_consolidation.py, manual_consolidation.py, verify_consolidation.py
- check_memories.py, extract_declarative_facts.py, store_declarative_facts.py
- compare_systems.py (system comparison tool)
- benchmark_cat.py, streaming_benchmark.py, streaming_benchmark_v2.py

Test suite:
- quick_test.py, test_setup.py, test_setup_simple.py
- test_consolidation_direct.py, test_declarative_recall.py, test_recall.py
- test_end_to_end.py, test_full_pipeline.py
- test_phase2.py, test_phase2_comprehensive.py

Documentation:
- README.md, QUICK_START.txt, TEST_README.md, SETUP_COMPLETE.md
- PHASE2_IMPLEMENTATION_NOTES.md, PHASE2_TEST_RESULTS.md
- POST_OPTIMIZATION_ANALYSIS.md
2026-03-04 00:51:14 +02:00


# 🎉 Cheshire Cat Test Environment Setup Complete!
## 📦 What Was Created
A complete standalone testing environment for evaluating Cheshire Cat AI as a memory system for Miku Bot.
### Files Created:
1. **docker-compose.test.yml** - Docker services configuration
- Cheshire Cat Core (connected to llama-swap)
- Qdrant vector database
- Connected to your existing bot network
2. **.env** - Environment configuration
- Core settings
- Qdrant settings
- Debug mode enabled
3. **test_setup.py** - Automated setup script
- Configures Cat to use llama-swap
- Uploads Miku knowledge base
- Runs test queries
4. **benchmark_cat.py** - Comprehensive performance testing
- Tests various query types
- Measures latency statistics
- Voice chat simulation
- Generates detailed reports
5. **compare_systems.py** - Side-by-side comparison
- Compares Cat vs current system
- Direct performance comparison
- Latency analysis
6. **start.sh** - Quick start script
7. **stop.sh** - Quick stop script
8. **TEST_README.md** - Full documentation
## 🚀 Next Steps
### Step 1: Start Services
```bash
./start.sh
```
Or manually:
```bash
docker-compose -f docker-compose.test.yml up -d
```
### Step 2: Configure and Upload Knowledge
```bash
python3 test_setup.py
```
This will:
- Wait for Cat to be ready
- Configure it to use your llama-swap
- Upload miku_lore.txt, miku_prompt.txt, miku_lyrics.txt
- Run initial test queries
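The "wait for Cat to be ready" step boils down to a simple poll loop. A minimal, generic sketch (the function name and parameters are illustrative, not necessarily what `test_setup.py` uses):

```python
import time

def wait_until_ready(check, timeout=120.0, interval=2.0):
    """Poll check() until it returns True or the timeout expires.

    check: zero-argument callable, e.g. one that hits the Cat API
    at http://localhost:1865/ and returns True on HTTP 200.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False
```

Passing the probe in as a callable keeps the loop easy to reuse for Qdrant too, and lets the script fail with a clear error if a service never comes up.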
### Step 3: Run Benchmarks
```bash
python3 benchmark_cat.py
```
Expected runtime: ~10-15 minutes
Look for:
- Mean latency < 1500ms = Good for voice chat
- P95 latency < 2000ms = Acceptable
- Success rate > 95% = Reliable
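Those three numbers are easy to recompute yourself from raw samples if you want to sanity-check the report. A minimal sketch (assumes at least one successful sample; `benchmark_cat.py` may compute percentiles differently):

```python
import math
import statistics

def summarize(samples):
    """Summarize benchmark samples given as (latency_ms, success) tuples."""
    latencies = sorted(ms for ms, ok in samples if ok)
    mean = statistics.fmean(latencies)
    # nearest-rank 95th percentile of the successful requests
    p95 = latencies[max(0, math.ceil(0.95 * len(latencies)) - 1)]
    success_rate = sum(1 for _, ok in samples if ok) / len(samples)
    return {"mean_ms": mean, "p95_ms": p95, "success_rate": success_rate}
```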
### Step 4: Compare Systems
```bash
python3 compare_systems.py
```
This compares Cat directly against your current `query_llama()` system.
### Step 5: Analyze Results
Review the output to decide:
✅ **Proceed with integration** if:
- Latency is acceptable (< 1500ms mean)
- RAG retrieval is accurate
- Performance is consistent
⚠️ **Try optimizations** if latency is borderline (1500-2000ms):
- Consider GPU embeddings
- Try a hybrid approach
❌ **Stick with current system** if:
- Latency is too high (> 2000ms)
- RAG quality is poor
- Too many errors
## 🔍 Monitoring
### Check Service Status
```bash
docker ps | grep miku
```
### View Logs
```bash
docker logs miku_cheshire_cat_test -f
docker logs miku_qdrant_test -f
```
### Access Interfaces
- Admin Panel: http://localhost:1865/admin
- API Docs: http://localhost:1865/docs
- Qdrant: http://localhost:6333/dashboard
## 📊 Key Metrics to Watch
### From FX-6100 Analysis:
Expected Cat overhead on your CPU:
- **Embedding generation**: ~600ms (CPU-based)
- **Vector search**: ~100-200ms
- **Total overhead**: ~800ms
With GPU embeddings (if you have spare VRAM):
- **Total overhead**: ~250ms (much better!)
### Voice Chat Viability
Your current system: ~500-1500ms
Target with Cat: < 1500ms mean latency
If Cat adds ~800ms overhead:
- Simple queries: 500ms + 800ms = 1300ms ✅ OK
- Complex queries: 1500ms + 800ms = 2300ms ⚠️ Borderline
**GPU embeddings would bring this to acceptable range.**
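The back-of-envelope math above can be captured in a tiny helper; the thresholds here (≤1500ms OK, ≤3000ms borderline) are taken loosely from this document's own targets:

```python
def voice_chat_verdict(base_ms, overhead_ms):
    """Classify base latency plus Cat overhead against the voice-chat budget."""
    total = base_ms + overhead_ms
    if total <= 1500:
        return total, "OK"
    if total <= 3000:
        return total, "borderline"
    return total, "too slow"
```

For example, `voice_chat_verdict(500, 800)` reproduces the simple-query case above, and `voice_chat_verdict(1500, 800)` the complex-query one.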
## 🛠️ Troubleshooting
### Can't connect to llama-swap?
Edit `test_setup.py` line 10:
```python
# Try one of these:
LLAMA_SWAP_URL = "http://llama-swap:8080/v1" # Docker network
LLAMA_SWAP_URL = "http://host.docker.internal:8080/v1" # Host access
LLAMA_SWAP_URL = "http://YOUR_IP:8080/v1" # Direct IP
```
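A quick way to find which candidate URL actually answers is to probe each one in order. A sketch with the probe passed in as a parameter (assumes llama-swap exposes the OpenAI-compatible `/v1/models` endpoint; the helper names are illustrative):

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

def http_probe(url, timeout=3.0):
    """Return True if anything answers at url + '/models'."""
    try:
        urlopen(url + "/models", timeout=timeout)
        return True
    except HTTPError:
        return True   # an HTTP error status still means something answered
    except (URLError, OSError):
        return False

def first_reachable(urls, probe=http_probe):
    """Return the first URL the probe accepts, or None."""
    for url in urls:
        if probe(url):
            return url
    return None
```

Run `first_reachable` over the three candidates above from inside the Cat container (`docker exec`) so you test the same network the Cat sees.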
### Embeddings too slow?
Try GPU acceleration:
1. Edit `docker-compose.test.yml` to add GPU support
2. Configure the embedder to use CUDA in `test_setup.py`
### Knowledge upload fails?
Upload manually:
- Go to http://localhost:1865/admin
- Click "Rabbit Hole" tab
- Drag and drop: miku_lore.txt, miku_prompt.txt, miku_lyrics.txt
## 🧹 Cleanup
### Stop services (keep data):
```bash
./stop.sh
```
### Stop and remove all data:
```bash
docker-compose -f docker-compose.test.yml down -v
```
## 📈 Expected Results
Based on your FX-6100 CPU:
### Pessimistic (CPU embeddings):
- Mean latency: 1600-2200ms
- Suitable for text chat: ✅
- Suitable for voice chat: ⚠️ Borderline
### Optimistic (GPU embeddings):
- Mean latency: 900-1400ms
- Suitable for text chat: ✅
- Suitable for voice chat: ✅
## 🎯 Decision Matrix
After benchmarking:
| Scenario | Action |
|----------|--------|
| Mean < 1500ms, RAG accurate | ✅ **Integrate fully** |
| Mean 1500-2000ms | ⚠️ **Try GPU embeddings** |
| Mean > 2000ms | ⚠️ **Hybrid approach only** |
| Mean > 3000ms | ❌ **Don't use** |
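The matrix translates directly into code. A toy helper mirroring the table (thresholds from the table; `rag_accurate` is the boolean judgment you make in Step 5, and the poor-RAG fallback follows the "stick with current system" criteria above):

```python
def decide(mean_ms, rag_accurate):
    """Map benchmark results onto the decision matrix."""
    if not rag_accurate:
        return "stick with current system"
    if mean_ms > 3000:
        return "don't use"
    if mean_ms > 2000:
        return "hybrid approach only"
    if mean_ms > 1500:
        return "try GPU embeddings"
    return "integrate fully"
```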
## 📚 Documentation
- Full guide: `TEST_README.md`
- Original local-cat docs: `README.md`
- Cheshire Cat docs: https://cheshire-cat-ai.github.io/docs/
---
## ✨ Summary
You now have a complete, isolated testing environment to:
1. ✅ Measure real performance on your FX-6100
2. ✅ Compare against your current system
3. ✅ Test RAG accuracy with Miku's knowledge
4. ✅ Simulate voice chat workloads
5. ✅ Make a data-driven decision
**Ready to test? Run:** `./start.sh`
Good luck! 🚀