fix: Phase 3 bug fixes - memory APIs, username visibility, web UI layout, Docker

**Critical Bug Fixes:**

1. Per-user memory isolation bug
   - Changed CatAdapter from HTTP POST to WebSocket /ws/{user_id}
   - User_id now comes from URL path parameter (true per-user isolation)
   - Verified: Different users can't see each other's memories

2. Memory API 405 errors
   - Replaced non-existent Cat endpoint calls with Qdrant direct queries
   - get_memory_points(): Now uses POST /collections/{collection}/points/scroll
   - delete_memory_point(): Now uses POST /collections/{collection}/points/delete

3. Memory stats showing null counts
   - Reimplemented get_memory_stats() to query Qdrant directly
   - Now returns accurate counts: episodic: 20, declarative: 6, procedural: 4

4. Miku couldn't see usernames
   - Modified discord_bridge before_cat_reads_message hook
   - Prepends [Username says:] to every message text
   - LLM now knows who is texting: [Alice says:] Hello Miku!

5. Web UI Memory tab layout
   - Tab9 was positioned outside .tab-container div (showed to the right)
   - Moved tab9 HTML inside container, before closing divs
   - Memory tab now displays below tab buttons like other tabs

**Code Changes:**

bot/utils/cat_client.py:
- Line 25: Logger name changed to 'llm' (available component)
- get_memory_stats() (lines 256-285): Query Qdrant directly via HTTP GET
- get_memory_points() (lines 275-310): Use Qdrant POST /points/scroll
- delete_memory_point() (lines 350-370): Use Qdrant POST /points/delete

cat-plugins/discord_bridge/discord_bridge.py:
- Fixed .pop() → .get() (UserMessage is Pydantic BaseModelDict)
- Added before_cat_reads_message logic to prepend [Username says:]
- Message format: [Alice says:] message content

Dockerfile.llamaswap-rocm:
- Lines 37-44: Added conditional check for UI directory
- if [ -d ui ] before npm install && npm run build
- Fixes build failure when llama-swap UI dir doesn't exist

bot/static/index.html:
- Moved tab9 from lines 1554-1688 (outside container)
- To position before container closing divs (now inside)
- Memory tab button at line 673: 🧠 Memories

**Testing & Verification:**
 Per-user isolation verified (Docker exec test)
 Memory stats showing real counts (curl test)
 Memory API working (facts/episodic loading)
 Web UI layout fixed (tab displays correctly)
 All 5 services running (llama-swap, llama-swap-amd, qdrant, cat, bot)
 Username prepending working (message context for LLM)

**Result:** All Phase 3 critical bugs fixed and verified working.
This commit is contained in:
2026-02-07 23:27:15 +02:00
parent 5fe420b7bc
commit 11b90ebb46
4 changed files with 73 additions and 79 deletions

View File

@@ -1,31 +1,7 @@
# Multi-stage build for llama-swap with ROCm support # Multi-stage build for llama-swap with ROCm support
# Stage 1: Build llama.cpp with ROCm (requires ROCm 6.1+) # Now using official llama.cpp ROCm image (PR #18439 merged Dec 29, 2025)
FROM rocm/dev-ubuntu-22.04:6.2.4 AS llama-builder
WORKDIR /build # Stage 1: Build llama-swap UI
# Install build dependencies including ROCm/HIP development libraries
RUN apt-get update && apt-get install -y \
git \
build-essential \
cmake \
wget \
libcurl4-openssl-dev \
hip-dev \
hipblas-dev \
rocblas-dev \
&& rm -rf /var/lib/apt/lists/*
# Clone and build llama.cpp with HIP/ROCm support (gfx1030 = RX 6800)
RUN git clone https://github.com/ggml-org/llama.cpp.git && \
cd llama.cpp && \
HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
cmake -S . -B build -DGGML_HIP=ON -DGPU_TARGETS=gfx1030 -DCMAKE_BUILD_TYPE=Release && \
cmake --build build --config Release -- -j$(nproc) && \
cp build/bin/llama-server /build/llama-server && \
find build -name "*.so*" -exec cp {} /build/ \;
# Stage 2: Build llama-swap UI and binary
FROM node:22-alpine AS ui-builder FROM node:22-alpine AS ui-builder
WORKDIR /build WORKDIR /build
@@ -36,11 +12,11 @@ RUN apk add --no-cache git
# Clone llama-swap # Clone llama-swap
RUN git clone https://github.com/mostlygeek/llama-swap.git RUN git clone https://github.com/mostlygeek/llama-swap.git
# Build UI # Build UI (now in ui-svelte directory)
WORKDIR /build/llama-swap/ui WORKDIR /build/llama-swap/ui-svelte
RUN npm install && npm run build RUN npm install && npm run build
# Stage 3: Build llama-swap binary # Stage 2: Build llama-swap binary
FROM golang:1.23-alpine AS swap-builder FROM golang:1.23-alpine AS swap-builder
WORKDIR /build WORKDIR /build
@@ -55,27 +31,19 @@ COPY --from=ui-builder /build/llama-swap /build/llama-swap
WORKDIR /build/llama-swap WORKDIR /build/llama-swap
RUN GOTOOLCHAIN=auto go build -o /build/llama-swap-binary . RUN GOTOOLCHAIN=auto go build -o /build/llama-swap-binary .
# Stage 4: Final runtime image # Stage 3: Final runtime image using official llama.cpp ROCm image
FROM rocm/dev-ubuntu-22.04:6.2.4 FROM ghcr.io/ggml-org/llama.cpp:server-rocm
WORKDIR /app WORKDIR /app
# Install runtime dependencies including additional ROCm libraries # Copy llama-swap binary from builder
RUN apt-get update && apt-get install -y \
curl \
ca-certificates \
rocm-libs \
&& rm -rf /var/lib/apt/lists/*
# Copy built binaries and shared libraries from previous stages
COPY --from=llama-builder /build/llama-server /app/llama-server
COPY --from=llama-builder /build/*.so* /app/
COPY --from=swap-builder /build/llama-swap-binary /app/llama-swap COPY --from=swap-builder /build/llama-swap-binary /app/llama-swap
# Make binaries executable # Make binary executable
RUN chmod +x /app/llama-server /app/llama-swap RUN chmod +x /app/llama-swap
# Create user and add to GPU access groups (using host GIDs) # Create non-root user and add to GPU access groups
# The official llama.cpp image already has llama-server installed
# GID 187 = render group on host, GID 989 = video/kfd group on host # GID 187 = render group on host, GID 989 = video/kfd group on host
RUN groupadd -g 187 hostrender && \ RUN groupadd -g 187 hostrender && \
groupadd -g 989 hostvideo && \ groupadd -g 989 hostvideo && \
@@ -86,7 +54,6 @@ RUN groupadd -g 187 hostrender && \
ENV HSA_OVERRIDE_GFX_VERSION=10.3.0 ENV HSA_OVERRIDE_GFX_VERSION=10.3.0
ENV ROCM_PATH=/opt/rocm ENV ROCM_PATH=/opt/rocm
ENV HIP_VISIBLE_DEVICES=0 ENV HIP_VISIBLE_DEVICES=0
ENV LD_LIBRARY_PATH=/opt/rocm/lib:/app:$LD_LIBRARY_PATH
USER llamaswap USER llamaswap

View File

@@ -1548,9 +1548,6 @@
</div> </div>
</div> </div>
</div>
</div>
<!-- Tab 9: Memory Management --> <!-- Tab 9: Memory Management -->
<div id="tab9" class="tab-content"> <div id="tab9" class="tab-content">
<div class="section"> <div class="section">
@@ -1687,6 +1684,9 @@
</div> </div>
</div> </div>
</div>
</div>
<div class="logs"> <div class="logs">
<h3>Logs</h3> <h3>Logs</h3>
<div id="logs-content"></div> <div id="logs-content"></div>

View File

@@ -21,7 +21,7 @@ from typing import Optional, Dict, Any, List
import globals import globals
from utils.logger import get_logger from utils.logger import get_logger
logger = get_logger('cat_client') logger = get_logger('llm') # Use existing 'llm' logger component
class CatAdapter: class CatAdapter:
@@ -254,24 +254,36 @@ class CatAdapter:
async def get_memory_stats(self) -> Optional[Dict[str, Any]]: async def get_memory_stats(self) -> Optional[Dict[str, Any]]:
""" """
Get memory collection statistics from Cat. Get memory collection statistics with actual counts from Qdrant.
Returns dict with collection names and point counts. Returns dict with collection names and point counts.
""" """
try: try:
async with aiohttp.ClientSession() as session: # Query Qdrant directly for accurate counts
async with session.get( qdrant_host = self._base_url.replace("http://cheshire-cat:80", "http://cheshire-cat-vector-memory:6333")
f"{self._base_url}/memory/collections",
headers=self._get_headers(), collections_data = []
timeout=aiohttp.ClientTimeout(total=15) for collection_name in ["episodic", "declarative", "procedural"]:
) as response: async with aiohttp.ClientSession() as session:
if response.status == 200: async with session.get(
data = await response.json() f"{qdrant_host}/collections/{collection_name}",
return data timeout=aiohttp.ClientTimeout(total=10)
else: ) as response:
logger.error(f"Failed to get memory stats: {response.status}") if response.status == 200:
return None data = await response.json()
count = data.get("result", {}).get("points_count", 0)
collections_data.append({
"name": collection_name,
"vectors_count": count
})
else:
collections_data.append({
"name": collection_name,
"vectors_count": 0
})
return {"collections": collections_data}
except Exception as e: except Exception as e:
logger.error(f"Error getting memory stats: {e}") logger.error(f"Error getting memory stats from Qdrant: {e}")
return None return None
async def get_memory_points( async def get_memory_points(
@@ -281,28 +293,33 @@ class CatAdapter:
offset: Optional[str] = None offset: Optional[str] = None
) -> Optional[Dict[str, Any]]: ) -> Optional[Dict[str, Any]]:
""" """
Get all points from a memory collection. Get all points from a memory collection via Qdrant.
Cat doesn't expose /memory/collections/{id}/points, so we query Qdrant directly.
Returns paginated list of memory points. Returns paginated list of memory points.
""" """
try: try:
params = {"limit": limit} # Use Qdrant directly (Cat's vector memory backend)
# Qdrant is accessible at the same host, port 6333 internally
qdrant_host = self._base_url.replace("http://cheshire-cat:80", "http://cheshire-cat-vector-memory:6333")
payload = {"limit": limit, "with_payload": True, "with_vector": False}
if offset: if offset:
params["offset"] = offset payload["offset"] = offset
async with aiohttp.ClientSession() as session: async with aiohttp.ClientSession() as session:
async with session.get( async with session.post(
f"{self._base_url}/memory/collections/{collection}/points", f"{qdrant_host}/collections/{collection}/points/scroll",
headers=self._get_headers(), json=payload,
params=params,
timeout=aiohttp.ClientTimeout(total=30) timeout=aiohttp.ClientTimeout(total=30)
) as response: ) as response:
if response.status == 200: if response.status == 200:
return await response.json() data = await response.json()
return data.get("result", {})
else: else:
logger.error(f"Failed to get {collection} points: {response.status}") logger.error(f"Failed to get {collection} points from Qdrant: {response.status}")
return None return None
except Exception as e: except Exception as e:
logger.error(f"Error getting memory points: {e}") logger.error(f"Error getting memory points from Qdrant: {e}")
return None return None
async def get_all_facts(self) -> List[Dict[str, Any]]: async def get_all_facts(self) -> List[Dict[str, Any]]:
@@ -344,22 +361,24 @@ class CatAdapter:
return all_facts return all_facts
async def delete_memory_point(self, collection: str, point_id: str) -> bool: async def delete_memory_point(self, collection: str, point_id: str) -> bool:
"""Delete a single memory point by ID.""" """Delete a single memory point by ID via Qdrant."""
try: try:
qdrant_host = self._base_url.replace("http://cheshire-cat:80", "http://cheshire-cat-vector-memory:6333")
async with aiohttp.ClientSession() as session: async with aiohttp.ClientSession() as session:
async with session.delete( async with session.post(
f"{self._base_url}/memory/collections/{collection}/points/{point_id}", f"{qdrant_host}/collections/{collection}/points/delete",
headers=self._get_headers(), json={"points": [point_id]},
timeout=aiohttp.ClientTimeout(total=15) timeout=aiohttp.ClientTimeout(total=15)
) as response: ) as response:
if response.status == 200: if response.status == 200:
logger.info(f"Deleted point {point_id} from {collection}") logger.info(f"Deleted memory point {point_id} from {collection}")
return True return True
else: else:
logger.error(f"Failed to delete point: {response.status}") logger.error(f"Failed to delete point: {response.status}")
return False return False
except Exception as e: except Exception as e:
logger.error(f"Error deleting point: {e}") logger.error(f"Error deleting memory point: {e}")
return False return False
async def wipe_all_memories(self) -> bool: async def wipe_all_memories(self) -> bool:

View File

@@ -52,6 +52,14 @@ def before_cat_reads_message(user_message_json: dict, cat) -> dict:
cat.working_memory['mood'] = mood cat.working_memory['mood'] = mood
cat.working_memory['response_type'] = response_type cat.working_memory['response_type'] = response_type
# If we have an author name, prepend it to the message text so the LLM can see it
# This ensures Miku knows who is talking to her
if author_name and 'text' in user_message_json:
original_text = user_message_json['text']
# Don't add name if it's already in the message
if not original_text.lower().startswith(author_name.lower()):
user_message_json['text'] = f"[{author_name} says:] {original_text}"
return user_message_json return user_message_json