
🎉 Cheshire Cat Test Environment Setup Complete!

📦 What Was Created

A complete standalone testing environment for evaluating Cheshire Cat AI as a memory system for Miku Bot.

Files Created:

  1. docker-compose.test.yml - Docker services configuration

    • Cheshire Cat Core (connected to llama-swap)
    • Qdrant vector database
    • Connected to your existing bot network
  2. .env - Environment configuration

    • Core settings
    • Qdrant settings
    • Debug mode enabled
  3. test_setup.py - Automated setup script

    • Configures Cat to use llama-swap
    • Uploads Miku knowledge base
    • Runs test queries
  4. benchmark_cat.py - Comprehensive performance testing

    • Tests various query types
    • Measures latency statistics
    • Voice chat simulation
    • Generates detailed reports
  5. compare_systems.py - Side-by-side comparison

    • Compares Cat vs current system
    • Direct performance comparison
    • Latency analysis
  6. start.sh - Quick start script

  7. stop.sh - Quick stop script

  8. TEST_README.md - Full documentation

🚀 Next Steps

Step 1: Start Services

```bash
./start.sh
```

Or manually:

```bash
docker-compose -f docker-compose.test.yml up -d
```

Step 2: Configure and Upload Knowledge

```bash
python3 test_setup.py
```

This will:

  • Wait for Cat to be ready
  • Configure it to use your llama-swap
  • Upload miku_lore.txt, miku_prompt.txt, miku_lyrics.txt
  • Run initial test queries
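The first of those steps — waiting for the Cat to come up — can be sketched as below. The URL and port are assumptions (check .env and the compose file for the actual mapping), and the configure/upload steps that follow are driven through the Cat's admin API by test_setup.py itself.

```python
import time
import urllib.error
import urllib.request

# Assumed Cat address; verify against docker-compose.test.yml / .env.
CAT_URL = "http://localhost:1865"

# The knowledge files test_setup.py uploads after the Cat is ready.
KNOWLEDGE_FILES = ["miku_lore.txt", "miku_prompt.txt", "miku_lyrics.txt"]

def wait_for_cat(timeout: float = 60.0) -> bool:
    """Poll the Cat's root endpoint until it answers or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(CAT_URL, timeout=2):
                return True
        except (urllib.error.URLError, OSError):
            time.sleep(1)
    return False
```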

Step 3: Run Benchmarks

```bash
python3 benchmark_cat.py
```

Expected runtime: ~10-15 minutes

Look for:

  • Mean latency < 1500ms = Good for voice chat
  • P95 latency < 2000ms = Acceptable
  • Success rate > 95% = Reliable
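Those three thresholds can be checked against the raw benchmark samples like this. Each sample is a (latency_ms, succeeded) pair — an illustrative shape, not necessarily benchmark_cat.py's actual output format:

```python
import statistics

def summarize(samples: list[tuple[float, bool]]) -> dict:
    """Mean and P95 latency over successful queries, plus overall success rate."""
    latencies = sorted(ms for ms, ok in samples if ok)
    p95_index = max(0, int(len(latencies) * 0.95) - 1)
    return {
        "mean_ms": statistics.mean(latencies),
        "p95_ms": latencies[p95_index],
        "success_rate": sum(ok for _, ok in samples) / len(samples),
    }
```

A run passes the criteria above when `mean_ms < 1500`, `p95_ms < 2000`, and `success_rate > 0.95`.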

Step 4: Compare Systems

```bash
python3 compare_systems.py
```

This compares Cat directly against your current query_llama() system.
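The core of such a comparison is simple: run the same prompts through both backends and report the per-query overhead Cat adds. A minimal sketch (the function name and input shape are illustrative, not compare_systems.py's real API):

```python
import statistics

def overhead_report(cat_ms: list[float], current_ms: list[float]) -> dict:
    """Mean latency of each system plus the mean per-query delta Cat adds."""
    return {
        "cat_mean_ms": statistics.mean(cat_ms),
        "current_mean_ms": statistics.mean(current_ms),
        "mean_overhead_ms": statistics.mean(c - b for c, b in zip(cat_ms, current_ms)),
    }
```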

Step 5: Analyze Results

Review the output to decide:

✅ Proceed with integration if:

  • Latency is acceptable (< 1500ms mean)
  • RAG retrieval is accurate
  • Performance is consistent

⚠️ Try optimizations if:

  • Latency is borderline (1500-2000ms)
  • Consider GPU embeddings
  • Try hybrid approach

❌ Stick with current system if:

  • Latency is too high (> 2000ms)
  • RAG quality is poor
  • Too many errors

🔍 Monitoring

Check Service Status

```bash
docker ps | grep miku
```

View Logs

```bash
docker logs miku_cheshire_cat_test -f
docker logs miku_qdrant_test -f
```

Access Interfaces

The Cat admin UI and Qdrant dashboard URLs depend on the ports mapped in docker-compose.test.yml — check that file for the host-side port numbers.

📊 Key Metrics to Watch

From FX-6100 Analysis:

Expected Cat overhead on your CPU:

  • Embedding generation: ~600ms (CPU-based)
  • Vector search: ~100-200ms
  • Total overhead: ~800ms

With GPU embeddings (if spare VRAM):

  • Total overhead: ~250ms (much better!)

Voice Chat Viability

Your current system: ~500-1500ms
Target with Cat: < 1500ms mean latency

If Cat adds ~800ms overhead:

  • Simple queries: 500ms + 800ms = 1300ms OK
  • Complex queries: 1500ms + 800ms = 2300ms ⚠️ Borderline

GPU embeddings would bring this to acceptable range.
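The budget math above, wired into a tiny helper (the constants mirror the estimates in this section, nothing more):

```python
CAT_OVERHEAD_MS = 800   # CPU-embedding overhead estimate from the analysis above
TARGET_MEAN_MS = 1500   # voice-chat target

def total_latency_ms(base_ms: float, overhead_ms: float = CAT_OVERHEAD_MS) -> float:
    """Base query latency plus Cat's estimated overhead."""
    return base_ms + overhead_ms

def viable_for_voice(base_ms: float) -> bool:
    """True if the total stays under the voice-chat target."""
    return total_latency_ms(base_ms) < TARGET_MEAN_MS
```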

🛠️ Troubleshooting

Can't connect to llama-swap?

Edit test_setup.py line 10:

```python
# Try one of these:
LLAMA_SWAP_URL = "http://llama-swap:8080/v1"            # Docker network
LLAMA_SWAP_URL = "http://host.docker.internal:8080/v1"  # Host access
LLAMA_SWAP_URL = "http://YOUR_IP:8080/v1"               # Direct IP
```

Embeddings too slow?

Try GPU acceleration:

  1. Edit docker-compose.test.yml to add GPU support
  2. Configure embedder to use CUDA in test_setup.py

Knowledge upload fails?

Upload manually:
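A hedged stdlib-only sketch of a manual upload. The port 1865 and the `/rabbithole/` ingestion path are Cheshire Cat defaults as I understand them — verify both against your compose file and the Cat API docs before relying on this:

```python
import mimetypes
import urllib.request
import uuid

def build_multipart(filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode one file as multipart/form-data; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {ctype}\r\n\r\n"
    ).encode() + data + f"\r\n--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"

def upload(filename: str, base_url: str = "http://localhost:1865") -> int:
    """POST one knowledge file to the Cat's ingestion endpoint (assumed path)."""
    with open(filename, "rb") as f:
        body, ctype = build_multipart(filename, f.read())
    req = urllib.request.Request(
        f"{base_url}/rabbithole/", data=body,
        headers={"Content-Type": ctype}, method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```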

🧹 Cleanup

Stop services (keep data):

```bash
./stop.sh
```

Stop and remove all data:

```bash
docker-compose -f docker-compose.test.yml down -v
```

📈 Expected Results

Based on your FX-6100 CPU:

Pessimistic (CPU embeddings):

  • Mean latency: 1600-2200ms
  • Suitable for text chat: ✅
  • Suitable for voice chat: ⚠️ Borderline

Optimistic (GPU embeddings):

  • Mean latency: 900-1400ms
  • Suitable for text chat: ✅
  • Suitable for voice chat: ✅

🎯 Decision Matrix

After benchmarking:

| Scenario | Action |
| --- | --- |
| Mean < 1500ms, RAG accurate | ✅ Integrate fully |
| Mean 1500-2000ms | ⚠️ Try GPU embeddings |
| Mean > 2000ms | ⚠️ Hybrid approach only |
| Mean > 3000ms | ❌ Don't use |
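The matrix translates directly into code if you want the benchmark report to print a recommendation. The thresholds come straight from the table; the sub-1500ms-but-inaccurate case isn't in the table, so it falls back to the "RAG quality is poor" criterion from Step 5:

```python
def decide(mean_ms: float, rag_accurate: bool) -> str:
    """Map benchmark results to the decision matrix above."""
    if mean_ms > 3000:
        return "Don't use"
    if mean_ms > 2000:
        return "Hybrid approach only"
    if mean_ms >= 1500:
        return "Try GPU embeddings"
    if rag_accurate:
        return "Integrate fully"
    return "Stick with current system"  # fast but RAG quality is poor
```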

📚 Documentation

See TEST_README.md for the full documentation of this test environment.
Summary

You now have a complete, isolated testing environment to:

  1. Measure real performance on your FX-6100
  2. Compare against your current system
  3. Test RAG accuracy with Miku's knowledge
  4. Simulate voice chat workloads
  5. Make a data-driven decision

Ready to test? Run: ./start.sh

Good luck! 🚀