# Cheshire Cat Test Environment for Miku Bot

This is a standalone test environment for evaluating Cheshire Cat AI as a potential memory/context system for the Miku Discord bot.

## 🎯 Goals

1. **Test performance** - Measure latency, overhead, and real-time viability
2. **Evaluate memory** - Compare RAG-based context retrieval vs full context loading
3. **Benchmark CPU impact** - Assess performance on AMD FX-6100
4. **Make informed decision** - Data-driven choice on integration

## 📁 Directory Structure

```
cheshire-cat/
├── cat/                      # Cat data (created on first run)
│   ├── data/                 # Cat's internal data
│   ├── plugins/              # Custom plugins
│   ├── static/               # Static assets
│   └── long_term_memory/     # Qdrant vector storage
├── .env                      # Environment configuration
├── docker-compose.test.yml   # Docker setup
├── test_setup.py             # Initial setup script
├── benchmark_cat.py          # Comprehensive benchmarks
├── compare_systems.py        # Compare Cat vs current system
└── TEST_README.md            # This file
```

## 🚀 Quick Start

### 1. Prerequisites

- Docker and Docker Compose installed
- Miku bot's llama-swap service running
- Python 3.8+ with the requests library

```bash
pip3 install requests
```

### 2. Start Cheshire Cat

```bash
# From the cheshire-cat directory
docker-compose -f docker-compose.test.yml up -d
```

Wait ~30 seconds for services to start.

### 3. Configure and Test

```bash
# Run setup script (configures LLM, uploads knowledge base)
python3 test_setup.py
```

This will:

- ✅ Wait for Cat to be ready
- ✅ Configure Cat to use llama-swap
- ✅ Upload miku_lore.txt, miku_prompt.txt, miku_lyrics.txt
- ✅ Run test queries

### 4. Run Benchmarks

```bash
# Comprehensive performance benchmark
python3 benchmark_cat.py
```

This tests:

- Simple greetings (low complexity)
- Factual queries (medium complexity)
- Memory recall (high complexity)
- Voice chat simulation (rapid-fire queries)

### 5. Compare with Current System

```bash
# Side-by-side comparison
python3 compare_systems.py
```

Compares latency between:

- 🐱 Cheshire Cat (RAG-based context)
- 📦 Current system (full context loading)

## 🔍 What to Look For

### ✅ Good Signs (Proceed with Integration)

- Mean latency < 1500ms
- P95 latency < 2000ms
- Consistent performance across query types
- RAG retrieves relevant context accurately

### ⚠️ Warning Signs (Reconsider)

- Mean latency > 2000ms
- High variance (large stdev)
- RAG misses important context
- Frequent errors or timeouts

### ❌ Stop Signs (Don't Use)

- Mean latency > 3000ms
- P95 latency > 5000ms
- RAG retrieval quality is poor
- System crashes or hangs

## 📊 Understanding the Results

### Latency Metrics

- **Mean**: Average response time
- **Median**: Middle value (less affected by outliers)
- **P95**: 95% of queries are faster than this
- **P99**: 99% of queries are faster than this

### Voice Chat Target

For real-time voice chat:

- Target: < 2000ms total latency
- Acceptable: 1000-1500ms mean
- Borderline: 1500-2000ms mean
- Too slow: > 2000ms mean

### FX-6100 Considerations

Your CPU may add overhead:

- Embedding generation: ~600ms
- Vector search: ~100-200ms
- Total Cat overhead: ~800ms

**With GPU embeddings**, this drops to ~250ms.

## 🛠️ Troubleshooting

### Cat won't start

```bash
# Check logs
docker logs miku_cheshire_cat_test

# Check if ports are in use
sudo netstat -tlnp | grep 1865
```

### Can't connect to llama-swap

The compose file tries to connect via:

1. External network: `miku-discord_default`
2. Host network: `host.docker.internal`

If both fail, check the llama-swap URL in test_setup.py and adjust it.

### Embeddings are slow

Try GPU acceleration in docker-compose.test.yml (requires spare VRAM).
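While debugging slow queries, it can help to recompute the latency summary from raw per-query timings yourself rather than trusting a single run. A minimal sketch using only the standard library, consistent with the metric definitions in "Understanding the Results" (the function name `summarize` is illustrative and not part of benchmark_cat.py):

```python
import statistics

def summarize(latencies_ms):
    """Summarize per-query latencies (in ms) into mean/median/stdev/P95/P99."""
    ordered = sorted(latencies_ms)

    # Nearest-rank percentile: the value that p% of queries are at or below.
    def pct(p):
        idx = max(0, round(p / 100 * len(ordered)) - 1)
        return ordered[idx]

    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "stdev": statistics.stdev(ordered) if len(ordered) > 1 else 0.0,
        "p95": pct(95),
        "p99": pct(99),
    }
```

Note how a single 5000ms outlier in 100 queries inflates the mean well past the median while leaving P95 untouched, which is why the targets above track both.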
### Knowledge upload fails

Upload files manually via the admin panel:

- http://localhost:1865/admin
- Go to the "Rabbit Hole" tab
- Drag and drop files

## 🔗 Useful Endpoints

- **Admin Panel**: http://localhost:1865/admin
- **API Docs**: http://localhost:1865/docs
- **Qdrant Dashboard**: http://localhost:6333/dashboard
- **Health Check**: http://localhost:1865/

## 📝 Decision Criteria

After running benchmarks, consider:

| Metric | Target | Your Result |
|--------|--------|-------------|
| Mean latency | < 1500ms | _____ ms |
| P95 latency | < 2000ms | _____ ms |
| Success rate | > 95% | _____ % |
| RAG accuracy | Good | _____ |

**Decision:**

- ✅ All targets met → **Integrate with bot**
- ⚠️ Some targets met → **Try GPU embeddings or hybrid approach**
- ❌ Targets not met → **Stick with current system**

## 🧹 Cleanup

```bash
# Stop services
docker-compose -f docker-compose.test.yml down

# Remove volumes (deletes all data)
docker-compose -f docker-compose.test.yml down -v
```

---

**Remember**: This is a test environment. Don't integrate with the production bot until you're confident in the results!
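The quantitative rows of the decision criteria table can be folded into a small helper so the verdict is reproducible across benchmark runs. A sketch under the thresholds stated above (`decide` is a hypothetical helper, not part of compare_systems.py; RAG accuracy is qualitative and left out):

```python
def decide(mean_ms, p95_ms, success_rate):
    """Map benchmark numbers onto the decision criteria table.

    success_rate is a fraction in [0, 1]. RAG accuracy must still be
    judged by hand; this only covers the measurable targets.
    """
    targets_met = [
        mean_ms < 1500,       # mean latency target
        p95_ms < 2000,        # P95 latency target
        success_rate > 0.95,  # success rate target
    ]
    if all(targets_met):
        return "Integrate with bot"
    if any(targets_met):
        return "Try GPU embeddings or hybrid approach"
    return "Stick with current system"
```

For example, a run with a 1200ms mean, 1800ms P95, and 99% success rate maps to "Integrate with bot", while missing only the mean-latency target drops it to the hybrid recommendation.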