# 🎉 Cheshire Cat Test Environment Setup Complete!

## 📦 What Was Created
A complete standalone testing environment for evaluating Cheshire Cat AI as a memory system for Miku Bot.
**Files Created:**

- `docker-compose.test.yml` - Docker services configuration
  - Cheshire Cat Core (connected to llama-swap)
  - Qdrant vector database
  - Connected to your existing bot network
- `.env` - Environment configuration
  - Core settings
  - Qdrant settings
  - Debug mode enabled
- `test_setup.py` - Automated setup script
  - Configures Cat to use llama-swap
  - Uploads the Miku knowledge base
  - Runs test queries
- `benchmark_cat.py` - Comprehensive performance testing
  - Tests various query types
  - Measures latency statistics
  - Simulates voice chat
  - Generates detailed reports
- `compare_systems.py` - Side-by-side comparison
  - Compares Cat against the current system
  - Direct performance comparison
  - Latency analysis
- `start.sh` - Quick start script
- `stop.sh` - Quick stop script
- `TEST_README.md` - Full documentation
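For orientation, `docker-compose.test.yml` plausibly looks something like the sketch below. This is an illustration only, not the actual file: the image tags, the internal port mapping, and the external network name `miku-net` are all assumptions.

```yaml
services:
  cheshire-cat-core:
    image: ghcr.io/cheshire-cat-ai/core:latest   # assumed image tag
    ports:
      - "1865:80"            # admin/API exposed on localhost:1865
    environment:
      - CCAT_DEBUG=true      # debug mode, as configured in .env
    networks:
      - miku-net
  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"          # dashboard on localhost:6333
    networks:
      - miku-net

networks:
  miku-net:                  # assumed name of your existing bot network
    external: true
```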
## 🚀 Next Steps

### Step 1: Start Services

```bash
./start.sh
```

Or manually:

```bash
docker-compose -f docker-compose.test.yml up -d
```
### Step 2: Configure and Upload Knowledge

```bash
python3 test_setup.py
```

This will:
- Wait for Cat to be ready
- Configure it to use your llama-swap
- Upload miku_lore.txt, miku_prompt.txt, miku_lyrics.txt
- Run initial test queries
### Step 3: Run Benchmarks

```bash
python3 benchmark_cat.py
```

Expected runtime: ~10-15 minutes.

Look for:
- Mean latency < 1500ms = good for voice chat
- P95 latency < 2000ms = acceptable
- Success rate > 95% = reliable
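The benchmark script's internals aren't shown here, but the three numbers it reports can be computed along these lines (a hedged sketch, not `benchmark_cat.py`'s actual code; the `summarize` helper is hypothetical):

```python
import statistics

def summarize(latencies_ms, failures=0):
    """Summarize successful-request latencies against the thresholds above."""
    total = len(latencies_ms) + failures
    ordered = sorted(latencies_ms)
    # P95 via the nearest-rank method: the value at the 95th-percentile rank
    p95 = ordered[max(0, int(round(0.95 * len(ordered))) - 1)]
    return {
        "mean_ms": statistics.mean(latencies_ms),
        "p95_ms": p95,
        "success_rate": len(latencies_ms) / total,
    }

stats = summarize([900, 1100, 1200, 1300, 1250, 1400, 1000, 950, 1150, 1350])
print(stats)
```

A run passes the voice-chat bar when `mean_ms < 1500`, `p95_ms < 2000`, and `success_rate > 0.95` all hold at once.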
### Step 4: Compare Systems

```bash
python3 compare_systems.py
```

This compares Cat directly against your current `query_llama()` system.
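The core of any such side-by-side comparison is timing the same prompt through both paths. A minimal sketch, with the two `time.sleep` stand-ins playing the role of the real Cat API call and your real `query_llama()` (both stand-ins are assumptions, not the script's code):

```python
import time

def time_call(fn, *args, repeats=5):
    """Return the mean wall-clock latency of fn(*args) in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1000)
    return sum(samples) / len(samples)

# Hypothetical stand-ins for the two systems under test:
def query_cat(prompt):      # would call Cat's HTTP API
    time.sleep(0.01)

def query_llama(prompt):    # would call your current direct path
    time.sleep(0.005)

cat_ms = time_call(query_cat, "Who is Miku?")
direct_ms = time_call(query_llama, "Who is Miku?")
print(f"Cat: {cat_ms:.0f}ms  Direct: {direct_ms:.0f}ms  Overhead: {cat_ms - direct_ms:.0f}ms")
```

Using the same prompt and repeat count for both sides keeps the overhead figure an apples-to-apples difference.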
### Step 5: Analyze Results

Review the output to decide:

**✅ Proceed with integration if:**
- Latency is acceptable (< 1500ms mean)
- RAG retrieval is accurate
- Performance is consistent

**⚠️ Try optimizations if:**
- Latency is borderline (1500-2000ms): consider GPU embeddings or a hybrid approach

**❌ Stick with the current system if:**
- Latency is too high (> 2000ms)
- RAG quality is poor
- Errors are too frequent
## 🔍 Monitoring

### Check Service Status

```bash
docker ps | grep miku
```

### View Logs

```bash
docker logs miku_cheshire_cat_test -f
docker logs miku_qdrant_test -f
```

### Access Interfaces

- Admin Panel: http://localhost:1865/admin
- API Docs: http://localhost:1865/docs
- Qdrant: http://localhost:6333/dashboard
## 📊 Key Metrics to Watch

### From FX-6100 Analysis

Expected Cat overhead on your CPU:
- Embedding generation: ~600ms (CPU-based)
- Vector search: ~100-200ms
- Total overhead: ~800ms

With GPU embeddings (if spare VRAM is available):
- Total overhead: ~250ms (much better!)

### Voice Chat Viability

- Your current system: ~500-1500ms
- Target with Cat: < 1500ms mean latency

If Cat adds ~800ms overhead:
- Simple queries: 500ms + 800ms = 1300ms ✅ OK
- Complex queries: 1500ms + 800ms = 2300ms ⚠️ Borderline

GPU embeddings would bring this into the acceptable range.
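The arithmetic above can be sanity-checked in a few lines. All the numbers below are the document's estimates, not measurements, and the verdict thresholds are this sketch's own reading of the targets:

```python
# Estimated overhead components from the analysis above
CAT_OVERHEAD_CPU_MS = 600 + 200   # embedding generation + vector search (upper bound)
CAT_OVERHEAD_GPU_MS = 250         # with GPU embeddings

def total_latency(base_ms, overhead_ms):
    """End-to-end latency: the current system's base plus Cat's overhead."""
    return base_ms + overhead_ms

for label, base in [("simple", 500), ("complex", 1500)]:
    for mode, overhead in [("CPU", CAT_OVERHEAD_CPU_MS), ("GPU", CAT_OVERHEAD_GPU_MS)]:
        total = total_latency(base, overhead)
        verdict = "OK" if total <= 1500 else "borderline" if total <= 2500 else "too slow"
        print(f"{label}/{mode}: {total}ms ({verdict})")
```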
## 🛠️ Troubleshooting

### Can't connect to llama-swap?

Edit `test_setup.py` line 10:

```python
# Try one of these:
LLAMA_SWAP_URL = "http://llama-swap:8080/v1"            # Docker network
LLAMA_SWAP_URL = "http://host.docker.internal:8080/v1"  # Host access
LLAMA_SWAP_URL = "http://YOUR_IP:8080/v1"               # Direct IP
```

### Embeddings too slow?

Try GPU acceleration:
- Edit `docker-compose.test.yml` to add GPU support
- Configure the embedder to use CUDA in `test_setup.py`

### Knowledge upload fails?

Upload manually:
1. Go to http://localhost:1865/admin
2. Click the "Rabbit Hole" tab
3. Drag and drop: miku_lore.txt, miku_prompt.txt, miku_lyrics.txt
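When picking between the `LLAMA_SWAP_URL` candidates, a quick TCP probe tells you which one is reachable from wherever the check runs (run it inside the Cat container for a faithful answer). This `reachable` helper is a hypothetical utility, not part of the shipped scripts:

```python
import socket
from urllib.parse import urlparse

def reachable(url, timeout=2.0):
    """Return True if a TCP connection to the URL's host:port succeeds."""
    parsed = urlparse(url)
    host = parsed.hostname
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False

for candidate in [
    "http://llama-swap:8080/v1",
    "http://host.docker.internal:8080/v1",
]:
    print(candidate, "->", "reachable" if reachable(candidate) else "unreachable")
```

Note that `llama-swap` only resolves from inside the shared Docker network, so an "unreachable" result from the host is expected for that candidate.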
## 🧹 Cleanup

Stop services (keep data):

```bash
./stop.sh
```

Stop and remove all data:

```bash
docker-compose -f docker-compose.test.yml down -v
```
## 📈 Expected Results

Based on your FX-6100 CPU:

**Pessimistic (CPU embeddings):**
- Mean latency: 1600-2200ms
- Suitable for text chat: ✅
- Suitable for voice chat: ⚠️ Borderline

**Optimistic (GPU embeddings):**
- Mean latency: 900-1400ms
- Suitable for text chat: ✅
- Suitable for voice chat: ✅
## 🎯 Decision Matrix

After benchmarking:
| Scenario | Action |
|---|---|
| Mean < 1500ms, RAG accurate | ✅ Integrate fully |
| Mean 1500-2000ms | ⚠️ Try GPU embeddings |
| Mean > 2000ms | ⚠️ Hybrid approach only |
| Mean > 3000ms | ❌ Don't use |
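The matrix above maps directly to a small decision helper. One case the matrix leaves open — fast but inaccurate RAG — is handled here with an assumed fallback ("revisit RAG setup first"); the function name and that fallback are this sketch's additions:

```python
def decide(mean_ms, rag_accurate=True):
    """Turn a measured mean latency into the matrix's recommended action."""
    if mean_ms > 3000:
        return "don't use"
    if mean_ms > 2000:
        return "hybrid approach only"
    if mean_ms > 1500:
        return "try GPU embeddings"
    if rag_accurate:
        return "integrate fully"
    # Not covered by the matrix: latency is fine but retrieval is off.
    return "revisit RAG setup first"

print(decide(1300))
print(decide(2400))
```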
## 📚 Documentation

- Full guide: `TEST_README.md`
- Original local-cat docs: `README.md`
- Cheshire Cat docs: https://cheshire-cat-ai.github.io/docs/
## ✨ Summary

You now have a complete, isolated testing environment to:
- ✅ Measure real performance on your FX-6100
- ✅ Compare against your current system
- ✅ Test RAG accuracy with Miku's knowledge
- ✅ Simulate voice chat workloads
- ✅ Make a data-driven decision

Ready to test? Run:

```bash
./start.sh
```

Good luck! 🚀