add: cheshire-cat configuration, tooling, tests, and documentation
Configuration:
- .env.example, .gitignore, compose.yml (main docker compose)
- docker-compose-amd.yml (ROCm), docker-compose-macos.yml
- start.sh, stop.sh convenience scripts
- LICENSE (Apache 2.0, from upstream Cheshire Cat)

Memory management utilities:
- analyze_consolidation.py, manual_consolidation.py, verify_consolidation.py
- check_memories.py, extract_declarative_facts.py, store_declarative_facts.py
- compare_systems.py (system comparison tool)
- benchmark_cat.py, streaming_benchmark.py, streaming_benchmark_v2.py

Test suite:
- quick_test.py, test_setup.py, test_setup_simple.py
- test_consolidation_direct.py, test_declarative_recall.py, test_recall.py
- test_end_to_end.py, test_full_pipeline.py
- test_phase2.py, test_phase2_comprehensive.py

Documentation:
- README.md, QUICK_START.txt, TEST_README.md, SETUP_COMPLETE.md
- PHASE2_IMPLEMENTATION_NOTES.md, PHASE2_TEST_RESULTS.md
- POST_OPTIMIZATION_ANALYSIS.md
cheshire-cat/SETUP_COMPLETE.md (new file, 226 lines)

# 🎉 Cheshire Cat Test Environment Setup Complete!

## 📦 What Was Created

A complete standalone testing environment for evaluating Cheshire Cat AI as a memory system for Miku Bot.

### Files Created:

1. **docker-compose.test.yml** - Docker services configuration
   - Cheshire Cat Core (connected to llama-swap)
   - Qdrant vector database
   - Connected to your existing bot network

2. **.env** - Environment configuration
   - Core settings
   - Qdrant settings
   - Debug mode enabled

3. **test_setup.py** - Automated setup script
   - Configures Cat to use llama-swap
   - Uploads the Miku knowledge base
   - Runs test queries

4. **benchmark_cat.py** - Comprehensive performance testing
   - Tests various query types
   - Measures latency statistics
   - Simulates voice chat
   - Generates detailed reports

5. **compare_systems.py** - Side-by-side comparison
   - Compares Cat against the current system
   - Direct performance comparison
   - Latency analysis

6. **start.sh** - Quick start script
7. **stop.sh** - Quick stop script
8. **TEST_README.md** - Full documentation

## 🚀 Next Steps

### Step 1: Start Services

```bash
./start.sh
```

Or manually:

```bash
docker-compose -f docker-compose.test.yml up -d
```

### Step 2: Configure and Upload Knowledge

```bash
python3 test_setup.py
```

This will:
- Wait for Cat to be ready
- Configure it to use your llama-swap
- Upload miku_lore.txt, miku_prompt.txt, miku_lyrics.txt
- Run initial test queries

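The "wait for Cat to be ready" step can be sketched as a simple poll loop. This is a minimal illustration, not the exact check `test_setup.py` performs; the probe shown in the comment (an HTTP GET against the admin port) is an assumption:

```python
import time

def wait_for_service(probe, timeout=120, interval=2):
    """Poll `probe` (any zero-argument callable) until it returns True
    or `timeout` seconds elapse. Returns True if the service came up."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval)
    return False

# Hypothetical probe: treat any HTTP 200 from the Cat port as "ready".
# def cat_is_up():
#     import urllib.request
#     try:
#         return urllib.request.urlopen("http://localhost:1865/").status == 200
#     except OSError:
#         return False
# wait_for_service(cat_is_up)
```
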
### Step 3: Run Benchmarks

```bash
python3 benchmark_cat.py
```

Expected runtime: ~10-15 minutes.

Look for:
- Mean latency < 1500ms = good for voice chat
- P95 latency < 2000ms = acceptable
- Success rate > 95% = reliable

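The thresholds above can be checked from the benchmark's raw per-query timings. A minimal sketch; the nearest-rank P95 is an assumption about how `benchmark_cat.py` computes percentiles, not a description of its code:

```python
import math
import statistics

def latency_report(latencies_ms):
    """Mean and nearest-rank P95 over a list of per-query latencies (ms)."""
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(0.95 * len(ordered)))  # nearest-rank percentile
    return {
        "mean_ms": statistics.fmean(latencies_ms),
        "p95_ms": ordered[rank - 1],
    }
```
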
### Step 4: Compare Systems

```bash
python3 compare_systems.py
```

This compares Cat directly against your current `query_llama()` system.

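At its core, a side-by-side comparison only needs the same prompt timed against both backends. A sketch of that idea; `ask_cat` and `query_llama` in the comment are placeholders for the real entry points, not the script's actual function names:

```python
import time

def time_call(fn, *args, **kwargs):
    """Run one call and return (result, elapsed_ms)."""
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, (time.perf_counter() - t0) * 1000.0

# Hypothetical usage, both backends answering the same prompt:
# _, cat_ms = time_call(ask_cat, "Who is Miku?")
# _, base_ms = time_call(query_llama, "Who is Miku?")
# print(f"Cat: {cat_ms:.0f}ms  current: {base_ms:.0f}ms  delta: {cat_ms - base_ms:+.0f}ms")
```
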
### Step 5: Analyze Results

Review the output to decide:

✅ **Proceed with integration** if:
- Latency is acceptable (< 1500ms mean)
- RAG retrieval is accurate
- Performance is consistent

⚠️ **Try optimizations** if:
- Latency is borderline (1500-2000ms)

In that case:
- Consider GPU embeddings
- Try a hybrid approach

❌ **Stick with the current system** if:
- Latency is too high (> 2000ms)
- RAG quality is poor
- There are too many errors

## 🔍 Monitoring

### Check Service Status

```bash
docker ps | grep miku
```

### View Logs

```bash
docker logs miku_cheshire_cat_test -f
docker logs miku_qdrant_test -f
```

### Access Interfaces

- Admin Panel: http://localhost:1865/admin
- API Docs: http://localhost:1865/docs
- Qdrant: http://localhost:6333/dashboard

## 📊 Key Metrics to Watch

### From FX-6100 Analysis:

Expected Cat overhead on your CPU:
- **Embedding generation**: ~600ms (CPU-based)
- **Vector search**: ~100-200ms
- **Total overhead**: ~800ms

With GPU embeddings (if there is spare VRAM):
- **Total overhead**: ~250ms (much better!)

### Voice Chat Viability

Your current system: ~500-1500ms
Target with Cat: < 1500ms mean latency

If Cat adds ~800ms overhead:
- Simple queries: 500ms + 800ms = 1300ms ✅ OK
- Complex queries: 1500ms + 800ms = 2300ms ⚠️ Borderline

**GPU embeddings would bring this into the acceptable range.**

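The budget arithmetic above can be captured in a tiny helper. The 800ms/250ms overheads are the estimates from this section, not measured values:

```python
CPU_OVERHEAD_MS = 800  # ~600ms embeddings + ~100-200ms vector search (estimate)
GPU_OVERHEAD_MS = 250  # estimate, assuming spare VRAM for the embedder

def projected_ms(base_ms, gpu_embeddings=False):
    """Add the estimated Cat overhead to the current system's latency."""
    return base_ms + (GPU_OVERHEAD_MS if gpu_embeddings else CPU_OVERHEAD_MS)
```
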
## 🛠️ Troubleshooting

### Can't connect to llama-swap?

Edit `test_setup.py` line 10:

```python
# Try one of these:
LLAMA_SWAP_URL = "http://llama-swap:8080/v1"            # Docker network
LLAMA_SWAP_URL = "http://host.docker.internal:8080/v1"  # Host access
LLAMA_SWAP_URL = "http://YOUR_IP:8080/v1"               # Direct IP
```

### Embeddings too slow?

Try GPU acceleration:
1. Edit `docker-compose.test.yml` to add GPU support
2. Configure the embedder to use CUDA in `test_setup.py`

### Knowledge upload fails?

Upload manually:
- Go to http://localhost:1865/admin
- Click the "Rabbit Hole" tab
- Drag and drop: miku_lore.txt, miku_prompt.txt, miku_lyrics.txt

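A scripted fallback for the manual upload can also be sketched. The `/rabbithole/` endpoint path is an assumption inferred from the admin URL; confirm it against the API docs at http://localhost:1865/docs before relying on it:

```python
import pathlib

CAT_URL = "http://localhost:1865"
KNOWLEDGE_FILES = ["miku_lore.txt", "miku_prompt.txt", "miku_lyrics.txt"]

def upload_plan(base_url=CAT_URL, files=KNOWLEDGE_FILES):
    """Pair each knowledge file with the (assumed) Rabbit Hole upload URL."""
    endpoint = base_url.rstrip("/") + "/rabbithole/"
    return [(endpoint, pathlib.Path(f).name) for f in files]

# Sending each file (requires the third-party `requests` package):
# import requests
# for endpoint, name in upload_plan():
#     with open(name, "rb") as fh:
#         requests.post(endpoint, files={"file": (name, fh, "text/plain")})
```
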
## 🧹 Cleanup

### Stop services (keep data):

```bash
./stop.sh
```

### Stop and remove all data:

```bash
docker-compose -f docker-compose.test.yml down -v
```

## 📈 Expected Results

Based on your FX-6100 CPU:

### Pessimistic (CPU embeddings):
- Mean latency: 1600-2200ms
- Suitable for text chat: ✅
- Suitable for voice chat: ⚠️ Borderline

### Optimistic (GPU embeddings):
- Mean latency: 900-1400ms
- Suitable for text chat: ✅
- Suitable for voice chat: ✅

## 🎯 Decision Matrix

After benchmarking:

| Scenario | Action |
|----------|--------|
| Mean < 1500ms, RAG accurate | ✅ **Integrate fully** |
| Mean 1500-2000ms | ⚠️ **Try GPU embeddings** |
| Mean 2000-3000ms | ⚠️ **Hybrid approach only** |
| Mean > 3000ms | ❌ **Don't use** |

## 📚 Documentation

- Full guide: `TEST_README.md`
- Original local-cat docs: `README.md`
- Cheshire Cat docs: https://cheshire-cat-ai.github.io/docs/

---

## ✨ Summary

You now have a complete, isolated testing environment to:
1. ✅ Measure real performance on your FX-6100
2. ✅ Compare against your current system
3. ✅ Test RAG accuracy with Miku's knowledge
4. ✅ Simulate voice chat workloads
5. ✅ Make a data-driven decision

**Ready to test? Run:** `./start.sh`

Good luck! 🚀