# 🎉 Cheshire Cat Test Environment Setup Complete!
## 📦 What Was Created

A complete standalone testing environment for evaluating Cheshire Cat AI as a memory system for Miku Bot.

### Files Created:

1. **docker-compose.test.yml** - Docker services configuration
   - Cheshire Cat Core (connected to llama-swap)
   - Qdrant vector database
   - Connected to your existing bot network
2. **.env** - Environment configuration
   - Core settings
   - Qdrant settings
   - Debug mode enabled
3. **test_setup.py** - Automated setup script
   - Configures Cat to use llama-swap
   - Uploads the Miku knowledge base
   - Runs test queries
4. **benchmark_cat.py** - Comprehensive performance testing
   - Tests various query types
   - Measures latency statistics
   - Simulates voice chat
   - Generates detailed reports
5. **compare_systems.py** - Side-by-side comparison
   - Compares Cat vs. the current system
   - Direct performance comparison
   - Latency analysis
6. **start.sh** - Quick start script
7. **stop.sh** - Quick stop script
8. **TEST_README.md** - Full documentation
## 🚀 Next Steps

### Step 1: Start Services

```bash
./start.sh
```

Or manually:

```bash
docker-compose -f docker-compose.test.yml up -d
```

### Step 2: Configure and Upload Knowledge

```bash
python3 test_setup.py
```

This will:
- Wait for Cat to be ready
- Configure it to use your llama-swap
- Upload miku_lore.txt, miku_prompt.txt, miku_lyrics.txt
- Run initial test queries
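The "wait for Cat to be ready" step is essentially a polling loop. A minimal sketch of that pattern, with the `probe` callable as a stand-in (in `test_setup.py` it would be something like an HTTP GET against `http://localhost:1865` that returns `True` on a 200 response):

```python
import time

def wait_for_ready(probe, timeout=60.0, interval=1.0):
    """Poll `probe` until it returns True or `timeout` seconds elapse.

    `probe` is any zero-argument callable; the HTTP check mentioned in
    the lead-in is one possible implementation.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval)
    return False

# Illustration with a stub probe that succeeds on its third attempt:
attempts = {"n": 0}

def stub_probe():
    attempts["n"] += 1
    return attempts["n"] >= 3

print(wait_for_ready(stub_probe, timeout=10.0, interval=0.01))  # True
```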
### Step 3: Run Benchmarks

```bash
python3 benchmark_cat.py
```

Expected runtime: ~10-15 minutes.

Look for:
- Mean latency < 1500ms = Good for voice chat
- P95 latency < 2000ms = Acceptable
- Success rate > 95% = Reliable
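Those three thresholds can be checked mechanically from raw timings. A sketch of the kind of summary the benchmark report contains (the `summarize` function and its nearest-rank P95 are illustrative, not `benchmark_cat.py`'s actual API):

```python
import statistics

def summarize(latencies_ms, errors=0):
    """Compute mean, P95 (nearest-rank), and success rate from raw timings."""
    ordered = sorted(latencies_ms)
    p95 = ordered[max(0, round(0.95 * len(ordered)) - 1)]
    total = len(latencies_ms) + errors
    return {
        "mean_ms": statistics.mean(latencies_ms),
        "p95_ms": p95,
        "success_rate": len(latencies_ms) / total,
    }

stats = summarize([900, 1100, 1200, 1300, 1250, 1400, 1000, 950, 1100, 1150])
print(stats)
# Voice-chat-viable per the thresholds above when:
# mean_ms < 1500 and p95_ms < 2000 and success_rate > 0.95
```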
### Step 4: Compare Systems

```bash
python3 compare_systems.py
```

This compares Cat directly against your current `query_llama()` system.
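Conceptually the comparison is just timing the same queries through both paths. A hedged sketch of that harness (the callables and queries are stand-ins, not `compare_systems.py`'s real interface):

```python
import statistics
import time

def time_system(answer_fn, queries):
    """Return per-query latencies in ms for one system."""
    latencies = []
    for q in queries:
        start = time.perf_counter()
        answer_fn(q)  # e.g. Cat's chat endpoint, or query_llama()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies

def compare(cat_fn, current_fn, queries):
    """Run both systems over the same queries and report mean latency."""
    return {
        "cat_mean_ms": statistics.mean(time_system(cat_fn, queries)),
        "current_mean_ms": statistics.mean(time_system(current_fn, queries)),
    }

# Stub "systems" purely for illustration:
result = compare(lambda q: q.upper(), lambda q: q.lower(),
                 ["who is miku?", "favourite song?"])
print(result)
```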
### Step 5: Analyze Results

Review the output to decide:

✅ **Proceed with integration** if:
- Latency is acceptable (< 1500ms mean)
- RAG retrieval is accurate
- Performance is consistent

⚠️ **Try optimizations** if:
- Latency is borderline (1500-2000ms)
- In that case, consider GPU embeddings or a hybrid approach

❌ **Stick with the current system** if:
- Latency is too high (> 2000ms)
- RAG quality is poor
- Errors are frequent
## 🔍 Monitoring

### Check Service Status

```bash
docker ps | grep miku
```

### View Logs

```bash
docker logs miku_cheshire_cat_test -f
docker logs miku_qdrant_test -f
```

### Access Interfaces

- Admin Panel: http://localhost:1865/admin
- API Docs: http://localhost:1865/docs
- Qdrant: http://localhost:6333/dashboard
## 📊 Key Metrics to Watch

### From FX-6100 Analysis:

Expected Cat overhead on your CPU:
- **Embedding generation**: ~600ms (CPU-based)
- **Vector search**: ~100-200ms
- **Total overhead**: ~800ms

With GPU embeddings (if there is spare VRAM):
- **Total overhead**: ~250ms (much better!)

### Voice Chat Viability

Your current system: ~500-1500ms.
Target with Cat: < 1500ms mean latency.

If Cat adds ~800ms overhead:
- Simple queries: 500ms + 800ms = 1300ms ✅ OK
- Complex queries: 1500ms + 800ms = 2300ms ⚠️ Borderline

**GPU embeddings would bring this into the acceptable range.**
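The projections above are simple additions and easy to sanity-check. A sketch using the rough estimates quoted in this section (these are estimates, not measurements):

```python
# Rough per-request Cat overhead estimates from the analysis above, in ms.
CPU_OVERHEAD_MS = 800  # ~600 embedding + ~100-200 vector search
GPU_OVERHEAD_MS = 250  # estimate with GPU embeddings

def projected_latency(base_ms, overhead_ms):
    """Project total latency as current-system latency plus Cat overhead."""
    return base_ms + overhead_ms

simple_cpu = projected_latency(500, CPU_OVERHEAD_MS)    # 1300 -> OK
complex_cpu = projected_latency(1500, CPU_OVERHEAD_MS)  # 2300 -> borderline
complex_gpu = projected_latency(1500, GPU_OVERHEAD_MS)  # 1750 -> under the 2000ms bar
print(simple_cpu, complex_cpu, complex_gpu)
```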
## 🛠️ Troubleshooting

### Can't connect to llama-swap?

Edit `test_setup.py` line 10:

```python
# Try one of these:
LLAMA_SWAP_URL = "http://llama-swap:8080/v1"            # Docker network
LLAMA_SWAP_URL = "http://host.docker.internal:8080/v1"  # Host access
LLAMA_SWAP_URL = "http://YOUR_IP:8080/v1"               # Direct IP
```

### Embeddings too slow?

Try GPU acceleration:
1. Edit `docker-compose.test.yml` to add GPU support
2. Configure the embedder to use CUDA in `test_setup.py`

### Knowledge upload fails?

Upload manually:
- Go to http://localhost:1865/admin
- Click the "Rabbit Hole" tab
- Drag and drop miku_lore.txt, miku_prompt.txt, and miku_lyrics.txt
## 🧹 Cleanup

### Stop services (keep data):

```bash
./stop.sh
```

### Stop and remove all data:

```bash
docker-compose -f docker-compose.test.yml down -v
```
## 📈 Expected Results

Based on your FX-6100 CPU:

### Pessimistic (CPU embeddings):
- Mean latency: 1600-2200ms
- Suitable for text chat: ✅
- Suitable for voice chat: ⚠️ Borderline

### Optimistic (GPU embeddings):
- Mean latency: 900-1400ms
- Suitable for text chat: ✅
- Suitable for voice chat: ✅
## 🎯 Decision Matrix

After benchmarking:

| Scenario | Action |
|----------|--------|
| Mean < 1500ms, RAG accurate | ✅ **Integrate fully** |
| Mean 1500-2000ms | ⚠️ **Try GPU embeddings** |
| Mean 2000-3000ms | ⚠️ **Hybrid approach only** |
| Mean > 3000ms | ❌ **Don't use** |
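The matrix maps directly onto a small decision function. A sketch (the `rag_accurate` flag, and the fallback for the fast-but-inaccurate case the matrix doesn't cover, are assumptions):

```python
def decide(mean_ms, rag_accurate=True):
    """Map benchmark results onto the decision matrix above."""
    if mean_ms > 3000:
        return "don't use"
    if mean_ms > 2000:
        return "hybrid approach only"
    if mean_ms > 1500:
        return "try GPU embeddings"
    if rag_accurate:
        return "integrate fully"
    # Fast but inaccurate is not in the matrix; treat it as unresolved.
    return "investigate RAG quality"

print(decide(1200))  # integrate fully
```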
## 📚 Documentation

- Full guide: `TEST_README.md`
- Original local-cat docs: `README.md`
- Cheshire Cat docs: https://cheshire-cat-ai.github.io/docs/

---
## ✨ Summary

You now have a complete, isolated testing environment to:

1. ✅ Measure real performance on your FX-6100
2. ✅ Compare against your current system
3. ✅ Test RAG accuracy with Miku's knowledge
4. ✅ Simulate voice chat workloads
5. ✅ Make a data-driven decision

**Ready to test? Run:** `./start.sh`

Good luck! 🚀