Commit Graph

3 Commits

b9d1f67d70 llama-swap-rocm now uses the official image; config adjusted accordingly 2026-02-07 23:43:01 +02:00
11b90ebb46 fix: Phase 3 bug fixes - memory APIs, username visibility, web UI layout, Docker
**Critical Bug Fixes:**

1. Per-user memory isolation bug
   - Changed CatAdapter from HTTP POST to WebSocket /ws/{user_id}
   - user_id now comes from the URL path parameter (true per-user isolation)
   - Verified: different users can't see each other's memories
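The isolation fix above hinges on putting the user_id in the connection path. A minimal sketch of what that looks like (CAT_HOST and the helper name are assumptions; the real logic lives in CatAdapter in bot/utils/cat_client.py):

```python
# Hypothetical sketch of the per-user WebSocket URL construction.
# CAT_HOST is an assumed in-compose hostname:port for the Cat service.
CAT_HOST = "cat:1865"

def ws_url_for(user_id: str) -> str:
    """Build the per-user WebSocket URL; embedding user_id in the
    path is what scopes each connection to its own memory space."""
    return f"ws://{CAT_HOST}/ws/{user_id}"
```

Because each user connects on a distinct URL, the Cat keys their memories separately instead of pooling everything under one HTTP client identity.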

2. Memory API 405 errors
   - Replaced non-existent Cat endpoint calls with Qdrant direct queries
   - get_memory_points(): Now uses POST /collections/{collection}/points/scroll
   - delete_memory_point(): Now uses POST /collections/{collection}/points/delete
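The replacement calls above hit Qdrant's HTTP API directly. A sketch of the request shapes, with the service URL and the payload filter field as assumptions (the real code in bot/utils/cat_client.py may filter differently):

```python
# Hypothetical builders for the two Qdrant calls that replaced the
# non-existent Cat endpoints. QDRANT is an assumed compose-network URL.
QDRANT = "http://qdrant:6333"

def scroll_request(collection: str, user_id: str, limit: int = 100):
    """POST /collections/{c}/points/scroll lists points page by page;
    the metadata filter key shown here is an assumption."""
    url = f"{QDRANT}/collections/{collection}/points/scroll"
    body = {
        "limit": limit,
        "with_payload": True,
        "filter": {"must": [{"key": "metadata.source",
                             "match": {"value": user_id}}]},
    }
    return url, body

def delete_request(collection: str, point_id: str):
    """POST /collections/{c}/points/delete removes points by ID."""
    url = f"{QDRANT}/collections/{collection}/points/delete"
    return url, {"points": [point_id]}
```

Both are plain POSTs, which is why the earlier GET-style calls to Cat paths returned 405s once the endpoints didn't exist.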

3. Memory stats showing null counts
   - Reimplemented get_memory_stats() to query Qdrant directly
   - Now returns accurate counts: episodic: 20, declarative: 6, procedural: 4
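One way the stats reimplementation could work is via Qdrant's count endpoint per collection, then aggregating. A sketch under those assumptions (collection names come from the commit; the aggregation helper is hypothetical):

```python
# Hypothetical stats aggregation over Qdrant's count endpoint.
# QDRANT is an assumed compose-network URL.
QDRANT = "http://qdrant:6333"

def count_request(collection: str):
    """POST /collections/{c}/points/count with exact=true returns a
    precise point count rather than an estimate."""
    return f"{QDRANT}/collections/{collection}/points/count", {"exact": True}

def memory_stats(counts: dict) -> dict:
    """Fold per-collection counts into the stats payload; the three
    collection names mirror the commit message."""
    return {name: counts.get(name, 0)
            for name in ("episodic", "declarative", "procedural")}
```

Querying Qdrant directly sidesteps whatever Cat-side API previously returned nulls.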

4. Miku couldn't see usernames
   - Modified discord_bridge before_cat_reads_message hook
   - Prepends [Username says:] to every message text
   - LLM now knows who is texting: [Alice says:] Hello Miku!
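The prepend step in the hook is simple string work. A sketch, treating the message as a plain dict for brevity (the real hook receives Cheshire Cat's UserMessage object):

```python
# Sketch of the prepend logic inside before_cat_reads_message.
# Plain-dict message is a simplification of the real UserMessage.
def prepend_username(message: dict, username: str) -> dict:
    """Prefix the text so the LLM can tell speakers apart."""
    message["text"] = f"[{username} says:] {message['text']}"
    return message
```

Without this, every Discord message reached the LLM as anonymous text, so it had no way to address users by name.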

5. Web UI Memory tab layout
   - Tab9 was positioned outside the .tab-container div, so it rendered off to the right
   - Moved the tab9 HTML inside the container, before its closing divs
   - The Memory tab now displays below the tab buttons like the other tabs

**Code Changes:**

bot/utils/cat_client.py:
- Line 25: Logger name changed to 'llm' (available component)
- get_memory_stats() (lines 256-285): Query Qdrant directly via HTTP GET
- get_memory_points() (lines 275-310): Use Qdrant POST /points/scroll
- delete_memory_point() (lines 350-370): Use Qdrant POST /points/delete

cat-plugins/discord_bridge/discord_bridge.py:
- Fixed .pop() → .get() (UserMessage is a Pydantic BaseModelDict)
- Added before_cat_reads_message logic to prepend [Username says:]
- Message format: [Alice says:] message content
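The .pop() → .get() swap matters because of mutation semantics. A plain-dict illustration (the commit attributes the original bug to UserMessage being a Pydantic BaseModelDict; this sketch only shows the behavioral difference):

```python
# .pop() removes the key (and raises KeyError when it's absent with no
# default); .get() reads without mutating and returns None if missing.
msg = {"text": "[Alice says:] hi", "user_id": "alice"}

user_id = msg.get("user_id")     # safe read, msg is left intact
missing = msg.get("channel_id")  # absent key -> None, no exception
```

Using .get() keeps the message object intact for any hooks that run after this one.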

Dockerfile.llamaswap-rocm:
- Lines 37-44: Added conditional check for UI directory
- if [ -d ui ] before npm install && npm run build
- Fixes build failure when llama-swap UI dir doesn't exist
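A sketch of what that guarded build step could look like (the exact surrounding context in Dockerfile.llamaswap-rocm may differ; the echo message is illustrative):

```dockerfile
# Guard the frontend build: some llama-swap checkouts ship no ui/ dir.
RUN if [ -d ui ]; then \
        cd ui && npm install && npm run build; \
    else \
        echo "no ui/ directory, skipping frontend build"; \
    fi
```

The conditional makes the image build succeed on source trees with or without the web frontend.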

bot/static/index.html:
- Moved tab9 (lines 1554-1688) from outside the container to just before the container's closing divs (now inside)
- Memory tab button at line 673: 🧠 Memories

**Testing & Verification:**
- Per-user isolation verified (Docker exec test)
- Memory stats showing real counts (curl test)
- Memory API working (facts/episodic loading)
- Web UI layout fixed (tab displays correctly)
- All 5 services running (llama-swap, llama-swap-amd, qdrant, cat, bot)
- Username prepending working (message context for LLM)

**Result:** All Phase 3 critical bugs fixed and verified working.
2026-02-07 23:27:15 +02:00
1fc3d74a5b Add dual GPU support with web UI selector
Features:
- Built custom ROCm container for AMD RX 6800 GPU
- Added GPU selection toggle in web UI (NVIDIA/AMD)
- Unified model names across both GPUs for seamless switching
- Vision model always uses NVIDIA GPU (optimal performance)
- Text models (llama3.1, darkidol) can use either GPU
- Added /gpu-status and /gpu-select API endpoints
- Implemented GPU state persistence in memory/gpu_state.json
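The persistence piece can be sketched as a tiny JSON read/write pair (the file path comes from the commit; the key name and the nvidia default are assumptions):

```python
import json
from pathlib import Path

STATE_FILE = Path("memory/gpu_state.json")  # path from the commit message

def save_gpu_state(gpu: str, path: Path = STATE_FILE) -> None:
    """Persist the selected GPU so it survives bot restarts."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"selected_gpu": gpu}))

def load_gpu_state(path: Path = STATE_FILE, default: str = "nvidia") -> str:
    """Read the selection back; the nvidia fallback is an assumption."""
    try:
        return json.loads(path.read_text()).get("selected_gpu", default)
    except FileNotFoundError:
        return default
```

A file-backed flag is enough here because the selection changes rarely and only via the web UI.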

Technical details:
- Multi-stage Dockerfile.llamaswap-rocm with ROCm 6.2.4
- llama.cpp compiled with GGML_HIP=ON for gfx1030 (RX 6800)
- Proper GPU permissions without root (groups 187/989)
- AMD container on port 8091, NVIDIA on port 8090
- Updated bot/utils/llm.py with get_current_gpu_url() and get_vision_gpu_url()
- Modified bot/utils/image_handling.py to always use NVIDIA for vision
- Enhanced web UI with GPU selector button (blue=NVIDIA, red=AMD)

Files modified:
- docker-compose.yml (added llama-swap-amd service)
- bot/globals.py (added LLAMA_AMD_URL)
- bot/api.py (added GPU selection endpoints and helper function)
- bot/utils/llm.py (GPU routing for text models)
- bot/utils/image_handling.py (GPU routing for vision models)
- bot/static/index.html (GPU selector UI)
- llama-swap-rocm-config.yaml (unified model names)

New files:
- Dockerfile.llamaswap-rocm
- bot/memory/gpu_state.json
- bot/utils/gpu_router.py (load balancing utility)
- setup-dual-gpu.sh (setup verification script)
- DUAL_GPU_*.md (documentation files)
2026-01-09 00:03:59 +02:00