Commit Graph

9 Commits

Author SHA1 Message Date
14e1a8df51 Phase 3: Unified Cheshire Cat integration with WebSocket-based per-user isolation
Key changes:
- CatAdapter (bot/utils/cat_client.py): WebSocket /ws/{user_id} for chat
  queries instead of HTTP POST (fixes per-user memory isolation when no
  API keys are configured — HTTP defaults all users to user_id='user')
- Memory management API: 8 endpoints for status, stats, facts, episodic
  memories, consolidation trigger, multi-step delete with confirmation
- Web UI: Memory tab (tab9) with collection stats, fact/episodic browser,
  manual consolidation trigger, and 3-step delete flow requiring exact
  confirmation string
- Bot integration: Cat-first response path with query_llama fallback for
  both text and embed responses, server mood detection
- Discord bridge plugin: fixed .pop() to .get() (UserMessage is a Pydantic
  BaseModelDict, not a raw dict), metadata extraction via extra attributes
- Unified docker-compose: Cat + Qdrant services merged into main compose,
  bot depends_on Cat healthcheck
- All plugins (discord_bridge, memory_consolidation, miku_personality)
  consolidated into cat-plugins/ for volume mount
- query_llama deprecated but functional for compatibility
2026-02-07 20:22:03 +02:00
0f1c30f757 Added verbose logging to llama-swap-rocm. Not sure if does anything... 2026-01-27 19:57:04 +02:00
2934efba22 Implemented experimental real production ready voice chat, relegated old flow to voice debug mode. New Web UI panel for Voice Chat. 2026-01-20 23:06:17 +02:00
362108f4b0 Decided on Parakeet ONNX Runtime. Works pretty great. Realtime voice chat possible now. UX lacking. 2026-01-19 00:29:44 +02:00
d1e6b21508 Phase 4 STT pipeline implemented — Silero VAD + faster-whisper — still not working well at all 2026-01-17 03:14:40 +02:00
9943cecdec Phase 2 implemented and tested. Added warmup to pipeline and Miku queues tokens while the pipeline is warming up 2026-01-16 23:37:34 +02:00
353c9c9583 Face Detector container now able to be created, started and stopped from within miku-bot container 2026-01-11 02:01:41 +02:00
1fc3d74a5b Add dual GPU support with web UI selector
Features:
- Built custom ROCm container for AMD RX 6800 GPU
- Added GPU selection toggle in web UI (NVIDIA/AMD)
- Unified model names across both GPUs for seamless switching
- Vision model always uses NVIDIA GPU (optimal performance)
- Text models (llama3.1, darkidol) can use either GPU
- Added /gpu-status and /gpu-select API endpoints
- Implemented GPU state persistence in memory/gpu_state.json

Technical details:
- Multi-stage Dockerfile.llamaswap-rocm with ROCm 6.2.4
- llama.cpp compiled with GGML_HIP=ON for gfx1030 (RX 6800)
- Proper GPU permissions without root (groups 187/989)
- AMD container on port 8091, NVIDIA on port 8090
- Updated bot/utils/llm.py with get_current_gpu_url() and get_vision_gpu_url()
- Modified bot/utils/image_handling.py to always use NVIDIA for vision
- Enhanced web UI with GPU selector button (blue=NVIDIA, red=AMD)

Files modified:
- docker-compose.yml (added llama-swap-amd service)
- bot/globals.py (added LLAMA_AMD_URL)
- bot/api.py (added GPU selection endpoints and helper function)
- bot/utils/llm.py (GPU routing for text models)
- bot/utils/image_handling.py (GPU routing for vision models)
- bot/static/index.html (GPU selector UI)
- llama-swap-rocm-config.yaml (unified model names)

New files:
- Dockerfile.llamaswap-rocm
- bot/memory/gpu_state.json
- bot/utils/gpu_router.py (load balancing utility)
- setup-dual-gpu.sh (setup verification script)
- DUAL_GPU_*.md (documentation files)
2026-01-09 00:03:59 +02:00
8c74ad5260 Initial commit: Miku Discord Bot 2025-12-07 17:15:09 +02:00