
🎉 Cheshire Cat Test Environment Setup Complete!

📦 What Was Created

A complete standalone testing environment for evaluating Cheshire Cat AI as a memory system for Miku Bot.

Files Created:

  1. docker-compose.test.yml - Docker services configuration

    • Cheshire Cat Core (connected to llama-swap)
    • Qdrant vector database
    • Connected to your existing bot network
  2. .env - Environment configuration

    • Core settings
    • Qdrant settings
    • Debug mode enabled
  3. test_setup.py - Automated setup script

    • Configures Cat to use llama-swap
    • Uploads Miku knowledge base
    • Runs test queries
  4. benchmark_cat.py - Comprehensive performance testing

    • Tests various query types
    • Measures latency statistics
    • Voice chat simulation
    • Generates detailed reports
  5. compare_systems.py - Side-by-side comparison

    • Compares Cat vs current system
    • Direct performance comparison
    • Latency analysis
  6. start.sh - Quick start script

  7. stop.sh - Quick stop script

  8. TEST_README.md - Full documentation

🚀 Next Steps

Step 1: Start Services

```bash
./start.sh
```

Or manually:

```bash
docker-compose -f docker-compose.test.yml up -d
```

Step 2: Configure and Upload Knowledge

```bash
python3 test_setup.py
```

This will:

  • Wait for Cat to be ready
  • Configure it to use your llama-swap
  • Upload miku_lore.txt, miku_prompt.txt, miku_lyrics.txt
  • Run initial test queries
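The first of those steps — waiting for the Cat to come up — can be sketched as below. The URL and port are assumptions (check .env and the compose file for the actual mapping), and the configure/upload steps that follow are driven through the Cat's admin API by test_setup.py itself.

```python
import time
import urllib.error
import urllib.request

# Assumed Cat address; verify against docker-compose.test.yml / .env.
CAT_URL = "http://localhost:1865"

# The knowledge files test_setup.py uploads after the Cat is ready.
KNOWLEDGE_FILES = ["miku_lore.txt", "miku_prompt.txt", "miku_lyrics.txt"]

def wait_for_cat(timeout: float = 60.0) -> bool:
    """Poll the Cat's root endpoint until it answers or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(CAT_URL, timeout=2):
                return True
        except (urllib.error.URLError, OSError):
            time.sleep(1)
    return False
```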

Step 3: Run Benchmarks

```bash
python3 benchmark_cat.py
```

Expected runtime: ~10-15 minutes

Look for:

  • Mean latency < 1500ms = Good for voice chat
  • P95 latency < 2000ms = Acceptable
  • Success rate > 95% = Reliable
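Those three thresholds can be checked against the raw benchmark samples like this. Each sample is a (latency_ms, succeeded) pair — an illustrative shape, not necessarily benchmark_cat.py's actual output format:

```python
import statistics

def summarize(samples: list[tuple[float, bool]]) -> dict:
    """Mean and P95 latency over successful queries, plus overall success rate."""
    latencies = sorted(ms for ms, ok in samples if ok)
    p95_index = max(0, int(len(latencies) * 0.95) - 1)
    return {
        "mean_ms": statistics.mean(latencies),
        "p95_ms": latencies[p95_index],
        "success_rate": sum(ok for _, ok in samples) / len(samples),
    }
```

A run passes the criteria above when `mean_ms < 1500`, `p95_ms < 2000`, and `success_rate > 0.95`.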

Step 4: Compare Systems

```bash
python3 compare_systems.py
```

This compares Cat directly against your current query_llama() system.
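The core of such a comparison is simple: run the same prompts through both backends and report the per-query overhead Cat adds. A minimal sketch (the function name and input shape are illustrative, not compare_systems.py's real API):

```python
import statistics

def overhead_report(cat_ms: list[float], current_ms: list[float]) -> dict:
    """Mean latency of each system plus the mean per-query delta Cat adds."""
    return {
        "cat_mean_ms": statistics.mean(cat_ms),
        "current_mean_ms": statistics.mean(current_ms),
        "mean_overhead_ms": statistics.mean(c - b for c, b in zip(cat_ms, current_ms)),
    }
```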

Step 5: Analyze Results

Review the output to decide:

✅ Proceed with integration if:

  • Latency is acceptable (< 1500ms mean)
  • RAG retrieval is accurate
  • Performance is consistent

⚠️ Try optimizations if:

  • Latency is borderline (1500-2000ms)
  • Consider GPU embeddings
  • Try hybrid approach

❌ Stick with current system if:

  • Latency is too high (> 2000ms)
  • RAG quality is poor
  • Too many errors

🔍 Monitoring

Check Service Status

```bash
docker ps | grep miku
```

View Logs

```bash
docker logs miku_cheshire_cat_test -f
docker logs miku_qdrant_test -f
```

Access Interfaces

The Cat admin UI and Qdrant dashboard URLs depend on the ports mapped in docker-compose.test.yml — check that file for the host-side port numbers.

📊 Key Metrics to Watch

From FX-6100 Analysis:

Expected Cat overhead on your CPU:

  • Embedding generation: ~600ms (CPU-based)
  • Vector search: ~100-200ms
  • Total overhead: ~800ms

With GPU embeddings (if spare VRAM):

  • Total overhead: ~250ms (much better!)

Voice Chat Viability

Your current system: ~500-1500ms
Target with Cat: < 1500ms mean latency

If Cat adds ~800ms overhead:

  • Simple queries: 500ms + 800ms = 1300ms OK
  • Complex queries: 1500ms + 800ms = 2300ms ⚠️ Borderline

GPU embeddings would bring this to acceptable range.
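The budget math above, wired into a tiny helper (the constants mirror the estimates in this section, nothing more):

```python
CAT_OVERHEAD_MS = 800   # CPU-embedding overhead estimate from the analysis above
TARGET_MEAN_MS = 1500   # voice-chat target

def total_latency_ms(base_ms: float, overhead_ms: float = CAT_OVERHEAD_MS) -> float:
    """Base query latency plus Cat's estimated overhead."""
    return base_ms + overhead_ms

def viable_for_voice(base_ms: float) -> bool:
    """True if the total stays under the voice-chat target."""
    return total_latency_ms(base_ms) < TARGET_MEAN_MS
```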

🛠️ Troubleshooting

Can't connect to llama-swap?

Edit test_setup.py line 10:

```python
# Try one of these:
LLAMA_SWAP_URL = "http://llama-swap:8080/v1"            # Docker network
LLAMA_SWAP_URL = "http://host.docker.internal:8080/v1"  # Host access
LLAMA_SWAP_URL = "http://YOUR_IP:8080/v1"               # Direct IP
```

Embeddings too slow?

Try GPU acceleration:

  1. Edit docker-compose.test.yml to add GPU support
  2. Configure embedder to use CUDA in test_setup.py

Knowledge upload fails?

Upload manually:
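A hedged stdlib-only sketch of a manual upload. The port 1865 and the `/rabbithole/` ingestion path are Cheshire Cat defaults as I understand them — verify both against your compose file and the Cat API docs before relying on this:

```python
import mimetypes
import urllib.request
import uuid

def build_multipart(filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode one file as multipart/form-data; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {ctype}\r\n\r\n"
    ).encode() + data + f"\r\n--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"

def upload(filename: str, base_url: str = "http://localhost:1865") -> int:
    """POST one knowledge file to the Cat's ingestion endpoint (assumed path)."""
    with open(filename, "rb") as f:
        body, ctype = build_multipart(filename, f.read())
    req = urllib.request.Request(
        f"{base_url}/rabbithole/", data=body,
        headers={"Content-Type": ctype}, method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```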

🧹 Cleanup

Stop services (keep data):

```bash
./stop.sh
```

Stop and remove all data:

```bash
docker-compose -f docker-compose.test.yml down -v
```

📈 Expected Results

Based on your FX-6100 CPU:

Pessimistic (CPU embeddings):

  • Mean latency: 1600-2200ms
  • Suitable for text chat: ✅
  • Suitable for voice chat: ⚠️ Borderline

Optimistic (GPU embeddings):

  • Mean latency: 900-1400ms
  • Suitable for text chat: ✅
  • Suitable for voice chat: ✅

🎯 Decision Matrix

After benchmarking:

| Scenario | Action |
| --- | --- |
| Mean < 1500ms, RAG accurate | ✅ Integrate fully |
| Mean 1500-2000ms | ⚠️ Try GPU embeddings |
| Mean > 2000ms | ⚠️ Hybrid approach only |
| Mean > 3000ms | ❌ Don't use |
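The matrix translates directly into code if you want the benchmark report to print a recommendation. The thresholds come straight from the table; the sub-1500ms-but-inaccurate case isn't in the table, so it falls back to the "RAG quality is poor" criterion from Step 5:

```python
def decide(mean_ms: float, rag_accurate: bool) -> str:
    """Map benchmark results to the decision matrix above."""
    if mean_ms > 3000:
        return "Don't use"
    if mean_ms > 2000:
        return "Hybrid approach only"
    if mean_ms >= 1500:
        return "Try GPU embeddings"
    if rag_accurate:
        return "Integrate fully"
    return "Stick with current system"  # fast but RAG quality is poor
```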

📚 Documentation

See TEST_README.md for the full documentation of this test environment.
Summary

You now have a complete, isolated testing environment to:

  1. Measure real performance on your FX-6100
  2. Compare against your current system
  3. Test RAG accuracy with Miku's knowledge
  4. Simulate voice chat workloads
  5. Make a data-driven decision

Ready to test? Run: ./start.sh

Good luck! 🚀