# Cheshire Cat Test Environment for Miku Bot
This is a standalone test environment for evaluating Cheshire Cat AI as a potential memory/context system for the Miku Discord bot.
## 🎯 Goals
- **Test performance** - measure latency, overhead, and real-time viability
- **Evaluate memory** - compare RAG-based context retrieval vs. full context loading
- **Benchmark CPU impact** - assess performance on an AMD FX-6100
- **Make an informed decision** - a data-driven choice on integration
## 📁 Directory Structure
```
cheshire-cat/
├── cat/                        # Cat data (created on first run)
│   ├── data/                   # Cat's internal data
│   ├── plugins/                # Custom plugins
│   ├── static/                 # Static assets
│   └── long_term_memory/       # Qdrant vector storage
├── .env                        # Environment configuration
├── docker-compose.test.yml     # Docker setup
├── test_setup.py               # Initial setup script
├── benchmark_cat.py            # Comprehensive benchmarks
├── compare_systems.py          # Compare Cat vs. the current system
└── TEST_README.md              # This file
```
## 🚀 Quick Start
### 1. Prerequisites
- Docker and Docker Compose installed
- Miku bot's llama-swap service running
- Python 3.8+ with requests library
```bash
pip3 install requests
```
### 2. Start Cheshire Cat
```bash
# From the cheshire-cat directory
docker-compose -f docker-compose.test.yml up -d
```
Wait ~30 seconds for services to start.
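Instead of a fixed sleep, you can poll until the Cat answers. A minimal sketch, assuming the root endpoint (http://localhost:1865/) responds once the service is up:

```python
import time
import urllib.request

def wait_until_ready(probe, timeout=60.0, interval=2.0):
    """Poll `probe` (a zero-arg callable returning True when the
    service is up) until it succeeds or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval)
    return False

def cat_is_up(url="http://localhost:1865/"):
    """Probe the Cat's root endpoint; any HTTP response counts as 'up'."""
    try:
        urllib.request.urlopen(url, timeout=2)
        return True
    except OSError:
        return False

# Example: wait up to 60 s for the Cat after `docker-compose ... up -d`
# ready = wait_until_ready(cat_is_up)
```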
### 3. Configure and Test
```bash
# Run setup script (configures LLM, uploads knowledge base)
python3 test_setup.py
```
This will:
- ✅ Wait for Cat to be ready
- ✅ Configure Cat to use llama-swap
- ✅ Upload miku_lore.txt, miku_prompt.txt, miku_lyrics.txt
- ✅ Run test queries
### 4. Run Benchmarks
```bash
# Comprehensive performance benchmark
python3 benchmark_cat.py
```
This tests:
- Simple greetings (low complexity)
- Factual queries (medium complexity)
- Memory recall (high complexity)
- Voice chat simulation (rapid-fire queries)
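If you want to extend the suite with your own query sets, the timing loop is straightforward to reproduce. A sketch, with `send_query` standing in for whatever client call the script actually uses (a hypothetical stand-in, not the script's real API):

```python
import statistics
import time

def benchmark(send_query, prompts, warmup=1):
    """Time `send_query(prompt)` for each prompt and return latencies in ms.

    A warmup call is discarded first so model loading doesn't skew results.
    """
    for _ in range(warmup):
        send_query(prompts[0])
    latencies = []
    for prompt in prompts:
        start = time.perf_counter()
        send_query(prompt)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies

# Example with a stand-in query function:
# lat = benchmark(lambda p: time.sleep(0.05), ["hi", "who is Miku?"])
# print(f"mean = {statistics.mean(lat):.0f} ms")
```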
### 5. Compare with Current System
```bash
# Side-by-side comparison
python3 compare_systems.py
```
Compares latency between:
- 🐱 Cheshire Cat (RAG-based context)
- 📦 Current system (full context loading)
## 🔍 What to Look For
### ✅ Good Signs (Proceed with Integration)
- Mean latency < 1500ms
- P95 latency < 2000ms
- Consistent performance across query types
- RAG retrieves relevant context accurately
### ⚠️ Warning Signs (Reconsider)
- Mean latency > 2000ms
- High variance (large stdev)
- RAG misses important context
- Frequent errors or timeouts
### ❌ Stop Signs (Don't Use)
- Mean latency > 3000ms
- P95 latency > 5000ms
- RAG retrieval quality is poor
- System crashes or hangs
## 📊 Understanding the Results
### Latency Metrics
- **Mean**: average response time
- **Median**: middle value (less affected by outliers)
- **P95**: 95% of queries are faster than this
- **P99**: 99% of queries are faster than this
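These metrics can be recomputed from raw latency samples with the standard library alone; a minimal sketch:

```python
import statistics

def summarize(latencies_ms):
    """Compute the latency metrics above from a list of samples (ms)."""
    q = statistics.quantiles(latencies_ms, n=100)  # 99 percentile cut points
    return {
        "mean": statistics.mean(latencies_ms),
        "median": statistics.median(latencies_ms),
        "p95": q[94],  # 95th percentile
        "p99": q[98],  # 99th percentile
    }

# e.g. summarize(benchmark_results) -> {"mean": ..., "median": ..., ...}
```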
### Voice Chat Target
For real-time voice chat:
- Target: < 2000ms total latency
- Acceptable: 1000-1500ms mean
- Borderline: 1500-2000ms mean
- Too slow: > 2000ms mean
### FX-6100 Considerations
Your CPU may add overhead:
- Embedding generation: ~600ms
- Vector search: ~100-200ms
- Total Cat overhead: ~800ms
With GPU embeddings, this drops to ~250ms.
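To see what that overhead leaves for actual generation under the 2000 ms voice target, a quick budget check (the per-component split is an assumption consistent with the estimates above, not a measurement):

```python
# Rough latency budget for voice chat on the FX-6100.
TARGET_MS = 2000             # voice chat target from "Voice Chat Target"

cat_overhead_ms = {
    "embedding": 600,        # CPU embedding generation (estimate above)
    "vector_search": 200,    # upper end of the 100-200 ms range
}

overhead = sum(cat_overhead_ms.values())   # ~800 ms, matching the total above
llm_budget = TARGET_MS - overhead          # time left for LLM generation
print(f"Cat overhead ~{overhead} ms; ~{llm_budget} ms left for the LLM")
```

With GPU embeddings (~250 ms total overhead), the LLM budget grows to roughly 1750 ms.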
## 🛠️ Troubleshooting
### Cat won't start

```bash
# Check logs
docker logs miku_cheshire_cat_test

# Check whether port 1865 is already in use
sudo netstat -tlnp | grep 1865
```
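If `netstat` isn't available, the same port check can be done from Python; a sketch:

```python
import socket

def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) == 0

# print(port_in_use(1865))  # True while the Cat container is running
```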
### Can't connect to llama-swap
The compose file tries to connect via:
- External network: `miku-discord_default`
- Host network: `host.docker.internal`
If both fail, check llama-swap URL in test_setup.py and adjust.
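One way to make that fallback explicit is a small helper that tries each candidate in order; a sketch (the hostnames and port in the commented example are illustrative assumptions, not values from the compose file):

```python
def first_reachable(urls, is_up):
    """Return the first URL for which `is_up(url)` is True, else None.

    `is_up` is injected so the logic is easy to test; in practice it
    would make a short HTTP request to each candidate.
    """
    return next((u for u in urls if is_up(u)), None)

# candidates = [
#     "http://llama-swap:8080",            # via the external Docker network
#     "http://host.docker.internal:8080",  # via the host gateway
# ]
# base_url = first_reachable(candidates, my_http_probe)
```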
### Embeddings are slow

Try GPU acceleration in `docker-compose.test.yml` (requires spare VRAM).
### Knowledge upload fails

Upload files manually via the admin panel:
1. Open http://localhost:1865/admin
2. Go to the "Rabbit Hole" tab
3. Drag and drop the files
## 🔗 Useful Endpoints
- Admin Panel: http://localhost:1865/admin
- API Docs: http://localhost:1865/docs
- Qdrant Dashboard: http://localhost:6333/dashboard
- Health Check: http://localhost:1865/
## 📝 Decision Criteria
After running benchmarks, consider:
| Metric | Target | Your Result |
|---|---|---|
| Mean latency | < 1500ms | _____ ms |
| P95 latency | < 2000ms | _____ ms |
| Success rate | > 95% | _____ % |
| RAG accuracy | Good | _____ |
**Decision:**
- ✅ All targets met → Integrate with bot
- ⚠️ Some targets met → Try GPU embeddings or hybrid approach
- ❌ Targets not met → Stick with current system
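The numeric rows of the table are mechanical enough to script; a sketch (RAG accuracy stays a judgment call, so it's omitted, and success rate is taken as a fraction):

```python
def decide(mean_ms, p95_ms, success_rate):
    """Map benchmark results onto the decision table above."""
    targets_met = [
        mean_ms < 1500,       # mean latency target
        p95_ms < 2000,        # P95 latency target
        success_rate > 0.95,  # success rate target (fraction, not percent)
    ]
    if all(targets_met):
        return "integrate"
    if any(targets_met):
        return "try GPU embeddings / hybrid"
    return "stick with current system"

# decide(1200, 1800, 0.99) -> "integrate"
```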
## 🧹 Cleanup
```bash
# Stop services
docker-compose -f docker-compose.test.yml down

# Remove volumes (deletes all data)
docker-compose -f docker-compose.test.yml down -v
```
**Remember:** this is a test environment. Don't integrate with the production bot until you're confident in the results!