miku-discord/cheshire-cat/TEST_README.md (commit ae1e0aa144 by koko210Serve, 2026-03-04): add cheshire-cat configuration, tooling, tests, and documentation

Cheshire Cat Test Environment for Miku Bot

This is a standalone test environment for evaluating Cheshire Cat AI as a potential memory/context system for the Miku Discord bot.

🎯 Goals

  1. Test performance - Measure latency, overhead, and real-time viability
  2. Evaluate memory - Compare RAG-based context retrieval vs full context loading
  3. Benchmark CPU impact - Assess performance on AMD FX-6100
  4. Make informed decision - Data-driven choice on integration

📁 Directory Structure

cheshire-cat/
├── cat/                    # Cat data (created on first run)
│   ├── data/              # Cat's internal data
│   ├── plugins/           # Custom plugins
│   ├── static/            # Static assets
│   └── long_term_memory/  # Qdrant vector storage
├── .env                    # Environment configuration
├── docker-compose.test.yml # Docker setup
├── test_setup.py          # Initial setup script
├── benchmark_cat.py       # Comprehensive benchmarks
├── compare_systems.py     # Compare Cat vs current system
└── TEST_README.md         # This file

🚀 Quick Start

1. Prerequisites

  • Docker and Docker Compose installed
  • Miku bot's llama-swap service running
  • Python 3.8+ with the requests library:

pip3 install requests

2. Start Cheshire Cat

# From the cheshire-cat directory
docker-compose -f docker-compose.test.yml up -d

Wait ~30 seconds for services to start.
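
Rather than sleeping a fixed 30 seconds, you can poll until the Cat answers. A minimal sketch (the helper names are illustrative, not part of the shipped scripts; the probe is injected, so the wait logic itself needs no network):

```python
import time
import urllib.request


def wait_for_service(probe, timeout=60.0, interval=2.0):
    """Call probe() until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval)
    return False


def cat_is_up(url="http://localhost:1865/"):
    """True once the Cat's base endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=3) as resp:
            return resp.status == 200
    except OSError:
        return False
```

Call wait_for_service(cat_is_up) before running test_setup.py; it returns False if the Cat is still down after a minute.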

3. Configure and Test

# Run setup script (configures LLM, uploads knowledge base)
python3 test_setup.py

This will:

  • Wait for Cat to be ready
  • Configure Cat to use llama-swap
  • Upload miku_lore.txt, miku_prompt.txt, miku_lyrics.txt
  • Run test queries

4. Run Benchmarks

# Comprehensive performance benchmark
python3 benchmark_cat.py

This tests:

  • Simple greetings (low complexity)
  • Factual queries (medium complexity)
  • Memory recall (high complexity)
  • Voice chat simulation (rapid-fire queries)
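
At its core, a benchmark like this times each query and collects the latencies. A hedged sketch (the send callable is injected, so an HTTP client, a websocket client, or a stub can all be timed the same way; this is not the actual benchmark_cat.py API):

```python
import time


def time_queries(send, queries):
    """Return per-query latencies in milliseconds.

    `send` is any callable taking a prompt string that blocks until
    the full response has arrived (e.g. a POST to the Cat's API).
    """
    latencies_ms = []
    for query in queries:
        start = time.perf_counter()
        send(query)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms
```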

5. Compare with Current System

# Side-by-side comparison
python3 compare_systems.py

Compares latency between:

  • 🐱 Cheshire Cat (RAG-based context)
  • 📦 Current system (full context loading)

🔍 What to Look For

✅ Good Signs (Proceed with Integration)

  • Mean latency < 1500ms
  • P95 latency < 2000ms
  • Consistent performance across query types
  • RAG retrieves relevant context accurately

⚠️ Warning Signs (Reconsider)

  • Mean latency > 2000ms
  • High variance (large stdev)
  • RAG misses important context
  • Frequent errors or timeouts

🛑 Stop Signs (Don't Use)

  • Mean latency > 3000ms
  • P95 latency > 5000ms
  • RAG retrieval quality is poor
  • System crashes or hangs

📊 Understanding the Results

Latency Metrics

  • Mean: Average response time
  • Median: Middle value (less affected by outliers)
  • P95: 95% of queries are faster than this
  • P99: 99% of queries are faster than this
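
These summary statistics are easy to recompute from raw samples if you want to sanity-check a report. A small sketch using nearest-rank percentiles (benchmark_cat.py may compute them differently):

```python
import statistics


def latency_stats(latencies_ms):
    """Summarize a list of latency samples given in milliseconds."""
    ordered = sorted(latencies_ms)

    def percentile(p):
        # Nearest-rank method: the smallest sample >= p% of all samples.
        k = max(0, round(p / 100.0 * len(ordered)) - 1)
        return ordered[k]

    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p95": percentile(95),
        "p99": percentile(99),
        "stdev": statistics.stdev(ordered) if len(ordered) > 1 else 0.0,
    }
```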

Voice Chat Target

For real-time voice chat:

  • Target: < 2000ms total latency
  • Acceptable: 1000-1500ms mean
  • Borderline: 1500-2000ms mean
  • Too slow: > 2000ms mean
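
These bands translate directly into code, which is handy for a scripted pass/fail summary. A sketch hard-coding the thresholds above (the function name is illustrative):

```python
def rate_voice_latency(mean_ms):
    """Map a mean latency in milliseconds onto the voice-chat bands."""
    if mean_ms <= 1500:
        return "acceptable"
    if mean_ms <= 2000:
        return "borderline"
    return "too slow"
```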

FX-6100 Considerations

Your CPU may add overhead:

  • Embedding generation: ~600ms
  • Vector search: ~100-200ms
  • Total Cat overhead: ~800ms

With GPU embeddings, this drops to ~250ms.

🛠️ Troubleshooting

Cat won't start

# Check logs
docker logs miku_cheshire_cat_test

# Check if ports are in use
sudo netstat -tlnp | grep 1865

Can't connect to llama-swap

The compose file tries to connect via:

  1. External network: miku-discord_default
  2. Host network: host.docker.internal

If both fail, check llama-swap URL in test_setup.py and adjust.

Embeddings are slow

Try GPU acceleration in docker-compose.test.yml (requires spare VRAM).
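
For an NVIDIA card, the standard Compose device reservation looks like the fragment below. The service name here is a placeholder; match it to the service actually defined in docker-compose.test.yml (for AMD cards, see the ROCm setup in docker-compose-amd.yml instead):

```yaml
services:
  cheshire-cat-core:   # placeholder; use your actual service name
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```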

Knowledge upload fails

Upload the files manually via the admin panel at http://localhost:1865/admin.

🔗 Useful Endpoints

  • Admin panel: http://localhost:1865/admin
  • Interactive API docs: http://localhost:1865/docs
  • Chat WebSocket: ws://localhost:1865/ws

These are the Cheshire Cat defaults on port 1865; adjust if the compose file remaps them.
📝 Decision Criteria

After running benchmarks, consider:

Metric          Target      Your Result
Mean latency    < 1500ms    _____ ms
P95 latency     < 2000ms    _____ ms
Success rate    > 95%       _____ %
RAG accuracy    Good        _____

Decision:

  • ✅ All targets met → Integrate with bot
  • ⚠️ Some targets met → Try GPU embeddings or hybrid approach
  • ❌ Targets not met → Stick with current system
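
The decision rule itself can be scripted over the benchmark output. A sketch with the table's targets hard-coded (illustrative, not part of the shipped tools; success_rate is a fraction, not a percentage):

```python
def decide(mean_ms, p95_ms, success_rate, rag_accurate):
    """Apply the decision criteria from the table above."""
    checks = [
        mean_ms < 1500,        # mean latency target
        p95_ms < 2000,         # P95 latency target
        success_rate > 0.95,   # success-rate target (fraction, not percent)
        bool(rag_accurate),    # RAG retrieval judged accurate
    ]
    if all(checks):
        return "integrate with bot"
    if any(checks):
        return "try GPU embeddings or hybrid approach"
    return "stick with current system"
```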

🧹 Cleanup

# Stop services
docker-compose -f docker-compose.test.yml down

# Remove volumes (deletes all data)
docker-compose -f docker-compose.test.yml down -v

Remember: This is a test environment. Don't integrate with production bot until you're confident in the results!