miku-discord/cheshire-cat/TEST_README.md (commit ae1e0aa144 by koko210Serve, 2026-03-04): add cheshire-cat configuration, tooling, tests, and documentation

Cheshire Cat Test Environment for Miku Bot

This is a standalone test environment for evaluating Cheshire Cat AI as a potential memory/context system for the Miku Discord bot.

🎯 Goals

  1. Test performance - Measure latency, overhead, and real-time viability
  2. Evaluate memory - Compare RAG-based context retrieval vs full context loading
  3. Benchmark CPU impact - Assess performance on AMD FX-6100
  4. Make informed decision - Data-driven choice on integration

📁 Directory Structure

cheshire-cat/
├── cat/                    # Cat data (created on first run)
│   ├── data/              # Cat's internal data
│   ├── plugins/           # Custom plugins
│   ├── static/            # Static assets
│   └── long_term_memory/  # Qdrant vector storage
├── .env                    # Environment configuration
├── docker-compose.test.yml # Docker setup
├── test_setup.py          # Initial setup script
├── benchmark_cat.py       # Comprehensive benchmarks
├── compare_systems.py     # Compare Cat vs current system
└── TEST_README.md         # This file

🚀 Quick Start

1. Prerequisites

  • Docker and Docker Compose installed
  • Miku bot's llama-swap service running
  • Python 3.8+ with the requests library:

pip3 install requests

2. Start Cheshire Cat

# From the cheshire-cat directory
docker-compose -f docker-compose.test.yml up -d

Wait ~30 seconds for services to start.
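
Rather than sleeping a fixed 30 seconds, you can poll until the Cat answers. A minimal sketch (the helper names are illustrative, not part of the shipped scripts; the probe is injected, so the wait logic itself needs no network):

```python
import time
import urllib.request


def wait_for_service(probe, timeout=60.0, interval=2.0):
    """Call probe() until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval)
    return False


def cat_is_up(url="http://localhost:1865/"):
    """True once the Cat's base endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=3) as resp:
            return resp.status == 200
    except OSError:
        return False
```

Call wait_for_service(cat_is_up) before running test_setup.py; it returns False if the Cat is still down after a minute.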

3. Configure and Test

# Run setup script (configures LLM, uploads knowledge base)
python3 test_setup.py

This will:

  • Wait for Cat to be ready
  • Configure Cat to use llama-swap
  • Upload miku_lore.txt, miku_prompt.txt, miku_lyrics.txt
  • Run test queries

4. Run Benchmarks

# Comprehensive performance benchmark
python3 benchmark_cat.py

This tests:

  • Simple greetings (low complexity)
  • Factual queries (medium complexity)
  • Memory recall (high complexity)
  • Voice chat simulation (rapid-fire queries)
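
At its core, a benchmark like this times each query and collects the latencies. A hedged sketch (the send callable is injected, so an HTTP client, a websocket client, or a stub can all be timed the same way; this is not the actual benchmark_cat.py API):

```python
import time


def time_queries(send, queries):
    """Return per-query latencies in milliseconds.

    `send` is any callable taking a prompt string that blocks until
    the full response has arrived (e.g. a POST to the Cat's API).
    """
    latencies_ms = []
    for query in queries:
        start = time.perf_counter()
        send(query)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms
```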

5. Compare with Current System

# Side-by-side comparison
python3 compare_systems.py

Compares latency between:

  • 🐱 Cheshire Cat (RAG-based context)
  • 📦 Current system (full context loading)

🔍 What to Look For

✅ Good Signs (Proceed with Integration)

  • Mean latency < 1500ms
  • P95 latency < 2000ms
  • Consistent performance across query types
  • RAG retrieves relevant context accurately

⚠️ Warning Signs (Reconsider)

  • Mean latency > 2000ms
  • High variance (large stdev)
  • RAG misses important context
  • Frequent errors or timeouts

🛑 Stop Signs (Don't Use)

  • Mean latency > 3000ms
  • P95 latency > 5000ms
  • RAG retrieval quality is poor
  • System crashes or hangs

📊 Understanding the Results

Latency Metrics

  • Mean: Average response time
  • Median: Middle value (less affected by outliers)
  • P95: 95% of queries are faster than this
  • P99: 99% of queries are faster than this
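
These summary statistics are easy to recompute from raw samples if you want to sanity-check a report. A small sketch using nearest-rank percentiles (benchmark_cat.py may compute them differently):

```python
import statistics


def latency_stats(latencies_ms):
    """Summarize a list of latency samples given in milliseconds."""
    ordered = sorted(latencies_ms)

    def percentile(p):
        # Nearest-rank method: the smallest sample >= p% of all samples.
        k = max(0, round(p / 100.0 * len(ordered)) - 1)
        return ordered[k]

    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p95": percentile(95),
        "p99": percentile(99),
        "stdev": statistics.stdev(ordered) if len(ordered) > 1 else 0.0,
    }
```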

Voice Chat Target

For real-time voice chat:

  • Target: < 2000ms total latency
  • Acceptable: 1000-1500ms mean
  • Borderline: 1500-2000ms mean
  • Too slow: > 2000ms mean
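
These bands translate directly into code, which is handy for a scripted pass/fail summary. A sketch hard-coding the thresholds above (the function name is illustrative):

```python
def rate_voice_latency(mean_ms):
    """Map a mean latency in milliseconds onto the voice-chat bands."""
    if mean_ms <= 1500:
        return "acceptable"
    if mean_ms <= 2000:
        return "borderline"
    return "too slow"
```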

FX-6100 Considerations

Your CPU may add overhead:

  • Embedding generation: ~600ms
  • Vector search: ~100-200ms
  • Total Cat overhead: ~800ms

With GPU embeddings, this drops to ~250ms.

🛠️ Troubleshooting

Cat won't start

# Check logs
docker logs miku_cheshire_cat_test

# Check if ports are in use
sudo netstat -tlnp | grep 1865

Can't connect to llama-swap

The compose file tries to connect via:

  1. External network: miku-discord_default
  2. Host network: host.docker.internal

If both fail, check llama-swap URL in test_setup.py and adjust.

Embeddings are slow

Try GPU acceleration in docker-compose.test.yml (requires spare VRAM).
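
For an NVIDIA card, the standard Compose device reservation looks like the fragment below. The service name here is a placeholder; match it to the service actually defined in docker-compose.test.yml (for AMD cards, see the ROCm setup in docker-compose-amd.yml instead):

```yaml
services:
  cheshire-cat-core:   # placeholder; use your actual service name
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```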

Knowledge upload fails

Upload the files manually via the admin panel at http://localhost:1865/admin.

🔗 Useful Endpoints

  • Admin panel: http://localhost:1865/admin
  • Interactive API docs: http://localhost:1865/docs
  • Chat WebSocket: ws://localhost:1865/ws

These are the Cheshire Cat defaults on port 1865; adjust if the compose file remaps them.
📝 Decision Criteria

After running benchmarks, consider:

Metric          Target      Your Result
Mean latency    < 1500ms    _____ ms
P95 latency     < 2000ms    _____ ms
Success rate    > 95%       _____ %
RAG accuracy    Good        _____

Decision:

  • ✅ All targets met → Integrate with bot
  • ⚠️ Some targets met → Try GPU embeddings or hybrid approach
  • ❌ Targets not met → Stick with current system
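
The decision rule itself can be scripted over the benchmark output. A sketch with the table's targets hard-coded (illustrative, not part of the shipped tools; success_rate is a fraction, not a percentage):

```python
def decide(mean_ms, p95_ms, success_rate, rag_accurate):
    """Apply the decision criteria from the table above."""
    checks = [
        mean_ms < 1500,        # mean latency target
        p95_ms < 2000,         # P95 latency target
        success_rate > 0.95,   # success-rate target (fraction, not percent)
        bool(rag_accurate),    # RAG retrieval judged accurate
    ]
    if all(checks):
        return "integrate with bot"
    if any(checks):
        return "try GPU embeddings or hybrid approach"
    return "stick with current system"
```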

🧹 Cleanup

# Stop services
docker-compose -f docker-compose.test.yml down

# Remove volumes (deletes all data)
docker-compose -f docker-compose.test.yml down -v

Remember: This is a test environment. Don't integrate with production bot until you're confident in the results!