# 🎉 Cheshire Cat Test Environment Setup Complete!

## 📦 What Was Created

A complete standalone testing environment for evaluating Cheshire Cat AI as a memory system for Miku Bot.

### Files Created:

1. **docker-compose.test.yml** - Docker services configuration
   - Cheshire Cat Core (connected to llama-swap)
   - Qdrant vector database
   - Connected to your existing bot network

2. **.env** - Environment configuration
   - Core settings
   - Qdrant settings
   - Debug mode enabled

3. **test_setup.py** - Automated setup script
   - Configures Cat to use llama-swap
   - Uploads the Miku knowledge base
   - Runs test queries

4. **benchmark_cat.py** - Comprehensive performance testing
   - Tests various query types
   - Measures latency statistics
   - Simulates voice chat workloads
   - Generates detailed reports

5. **compare_systems.py** - Side-by-side comparison
   - Compares Cat against the current system
   - Direct performance comparison
   - Latency analysis

6. **start.sh** - Quick start script
7. **stop.sh** - Quick stop script
8. **TEST_README.md** - Full documentation

## 🚀 Next Steps

### Step 1: Start Services

```bash
./start.sh
```

Or manually:

```bash
docker-compose -f docker-compose.test.yml up -d
```

### Step 2: Configure and Upload Knowledge

```bash
python3 test_setup.py
```

This will:
- Wait for Cat to be ready
- Configure it to use your llama-swap
- Upload miku_lore.txt, miku_prompt.txt, and miku_lyrics.txt
- Run initial test queries
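
The readiness wait in that first step can be sketched as a simple poll loop. This is a minimal sketch, not the actual `test_setup.py` code: `wait_for_cat` and the injectable `probe` parameter are illustrative names, and the only assumption taken from this document is that Cat answers HTTP on port 1865.

```python
import time
import urllib.error
import urllib.request

def wait_for_cat(url="http://localhost:1865/", timeout=120, interval=2, probe=None):
    """Poll the Cat's base URL until it answers, or raise after `timeout` seconds."""
    def http_probe():
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                return resp.status == 200
        except (urllib.error.URLError, OSError):
            return False

    probe = probe or http_probe  # `probe` is injectable so the loop is testable offline
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval)
    raise TimeoutError(f"Cheshire Cat not reachable at {url} within {timeout}s")
```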

### Step 3: Run Benchmarks

```bash
python3 benchmark_cat.py
```

Expected runtime: ~10-15 minutes.

Look for:
- Mean latency < 1500ms = good for voice chat
- P95 latency < 2000ms = acceptable
- Success rate > 95% = reliable
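
For reference, the mean and P95 figures above can be computed from raw per-request timings as follows. This is a minimal sketch; `latency_report` is an illustrative helper, not necessarily how `benchmark_cat.py` aggregates its numbers.

```python
import math
import statistics

def latency_report(samples_ms):
    """Summarise per-request latencies (milliseconds)."""
    ordered = sorted(samples_ms)
    # Nearest-rank P95: the smallest sample at or above the 95th percentile rank.
    p95 = ordered[math.ceil(0.95 * len(ordered)) - 1]
    return {"mean": statistics.mean(ordered), "p95": p95, "max": ordered[-1]}

# A single slow outlier is enough to push P95 far above the mean.
print(latency_report([420, 530, 550, 580, 610, 610, 640, 700, 720, 1900]))
# → {'mean': 726, 'p95': 1900, 'max': 1900}
```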

### Step 4: Compare Systems

```bash
python3 compare_systems.py
```

This compares Cat directly against your current `query_llama()` system.

### Step 5: Analyze Results

Review the output to decide:

✅ **Proceed with integration** if:
- Latency is acceptable (< 1500ms mean)
- RAG retrieval is accurate
- Performance is consistent

⚠️ **Try optimizations** if latency is borderline (1500-2000ms):
- Consider GPU embeddings
- Try a hybrid approach

❌ **Stick with the current system** if:
- Latency is too high (> 2000ms)
- RAG quality is poor
- Errors are frequent

## 🔍 Monitoring

### Check Service Status

```bash
docker ps | grep miku
```

### View Logs

```bash
docker logs miku_cheshire_cat_test -f
docker logs miku_qdrant_test -f
```

### Access Interfaces

- Admin Panel: http://localhost:1865/admin
- API Docs: http://localhost:1865/docs
- Qdrant Dashboard: http://localhost:6333/dashboard

## 📊 Key Metrics to Watch

### From FX-6100 Analysis:

Expected Cat overhead on your CPU:
- **Embedding generation**: ~600ms (CPU-based)
- **Vector search**: ~100-200ms
- **Total overhead**: ~800ms

With GPU embeddings (if there is spare VRAM):
- **Total overhead**: ~250ms (much better!)

### Voice Chat Viability

- Your current system: ~500-1500ms
- Target with Cat: < 1500ms mean latency

If Cat adds ~800ms of overhead:
- Simple queries: 500ms + 800ms = 1300ms ✅ OK
- Complex queries: 1500ms + 800ms = 2300ms ⚠️ Borderline

**GPU embeddings would bring this into the acceptable range.**
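
The arithmetic above can be captured in a small budget helper. The thresholds come from the benchmark targets earlier in this document; the function name is illustrative.

```python
def latency_verdict(base_ms, overhead_ms, good=1500, acceptable=2000):
    """Classify end-to-end latency against the voice-chat targets above."""
    total = base_ms + overhead_ms
    if total <= good:
        return total, "good"
    if total <= acceptable:
        return total, "acceptable"
    return total, "borderline"

print(latency_verdict(500, 800))   # simple query, CPU embeddings → (1300, 'good')
print(latency_verdict(1500, 800))  # complex query, CPU embeddings → (2300, 'borderline')
print(latency_verdict(1500, 250))  # complex query, GPU embeddings → (1750, 'acceptable')
```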

## 🛠️ Troubleshooting

### Can't connect to llama-swap?

Edit `test_setup.py` line 10:

```python
# Try one of these:
LLAMA_SWAP_URL = "http://llama-swap:8080/v1"            # Docker network
LLAMA_SWAP_URL = "http://host.docker.internal:8080/v1"  # Host access
LLAMA_SWAP_URL = "http://YOUR_IP:8080/v1"               # Direct IP
```

### Embeddings too slow?

Try GPU acceleration:
1. Edit `docker-compose.test.yml` to add GPU support
2. Configure the embedder to use CUDA in `test_setup.py`

### Knowledge upload fails?

Upload manually:
- Go to http://localhost:1865/admin
- Click the "Rabbit Hole" tab
- Drag and drop miku_lore.txt, miku_prompt.txt, and miku_lyrics.txt

## 🧹 Cleanup

### Stop services (keep data):

```bash
./stop.sh
```

### Stop and remove all data:

```bash
docker-compose -f docker-compose.test.yml down -v
```

## 📈 Expected Results

Based on your FX-6100 CPU:

### Pessimistic (CPU embeddings):
- Mean latency: 1600-2200ms
- Suitable for text chat: ✅
- Suitable for voice chat: ⚠️ Borderline

### Optimistic (GPU embeddings):
- Mean latency: 900-1400ms
- Suitable for text chat: ✅
- Suitable for voice chat: ✅

## 🎯 Decision Matrix

After benchmarking:

| Scenario | Action |
|----------|--------|
| Mean < 1500ms, RAG accurate | ✅ **Integrate fully** |
| Mean 1500-2000ms | ⚠️ **Try GPU embeddings** |
| Mean > 2000ms | ⚠️ **Hybrid approach only** |
| Mean > 3000ms | ❌ **Don't use** |

## 📚 Documentation

- Full guide: `TEST_README.md`
- Original local-cat docs: `README.md`
- Cheshire Cat docs: https://cheshire-cat-ai.github.io/docs/

---

## ✨ Summary

You now have a complete, isolated testing environment to:

1. ✅ Measure real performance on your FX-6100
2. ✅ Compare against your current system
3. ✅ Test RAG accuracy with Miku's knowledge
4. ✅ Simulate voice chat workloads
5. ✅ Make a data-driven decision

**Ready to test? Run:** `./start.sh`

Good luck! 🚀