Fix vision pipeline: route images through Cat, pass user question to vision model

- Fix silent None return in analyze_image_with_vision exception handler
- Add None/empty guards after vision analysis in bot.py (image, video, GIF, Tenor)
- Route all image/video/GIF responses through Cheshire Cat pipeline (was
  calling query_llama directly), enabling episodic memory storage for media
  interactions and correct Last Prompt display in Web UI
- Add media_type parameter to cat_adapter.query() and forward as
  discord_media_type in WebSocket payload
- Update discord_bridge plugin to read media_type from payload and inject
  MEDIA NOTE into system prefix in before_agent_starts hook
- Add _extract_vision_question() helper to strip Discord mentions and bot-name
  triggers from user message; pass cleaned question to vision model so specific
  questions (e.g. 'what is the person wearing?') go directly to the vision model
  instead of the generic 'Describe this image in detail.' fallback
- Pass user_prompt to all analyze_image_with_qwen / analyze_video_with_vision
  call sites in bot.py (image, video, GIF, Tenor, embed paths)
- Fix autonomous reaction loops to skip messages that @mention the bot or have
  media attachments in DMs, preventing duplicate vision model calls for images
  already being processed by the main message handler
- Increase vision max_tokens: images 300->800, video/GIF 400->1000 (no VRAM
  impact; KV cache is pre-allocated at model load time)
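
For illustration, the question-extraction step described above could look roughly like the sketch below. This is not the code from bot.py: the trigger word "examplebot", the mention regex, and the build_vision_prompt wrapper are assumptions made for the example. Only the fallback prompt string is taken from this message.

```python
import re
from typing import Optional, Sequence

# Hypothetical trigger list; the real bot-name triggers are not shown in this commit page.
BOT_NAME_TRIGGERS = ("examplebot",)

# Discord renders mentions as <@id>, <@!id>, <@&id> (roles) and <#id> (channels).
MENTION_RE = re.compile(r"<(?:@[!&]?|#)\d+>")

# Generic fallback prompt quoted in the commit message.
DEFAULT_VISION_PROMPT = "Describe this image in detail."


def extract_vision_question(
    message_text: str, triggers: Sequence[str] = BOT_NAME_TRIGGERS
) -> Optional[str]:
    """Strip mentions and bot-name triggers; return None if nothing is left."""
    text = MENTION_RE.sub(" ", message_text or "")
    for name in triggers:
        # Remove the trigger word plus an optional trailing punctuation mark.
        text = re.sub(rf"\b{re.escape(name)}\b[,:!]?", " ", text, flags=re.IGNORECASE)
    # Collapse whitespace left behind by the removals.
    text = re.sub(r"\s+", " ", text).strip(" ,:")
    return text or None


def build_vision_prompt(message_text: str) -> str:
    """Use the cleaned user question if there is one, else the generic prompt."""
    return extract_vision_question(message_text) or DEFAULT_VISION_PROMPT
```

With this sketch, a message such as '<@123456> what is the person wearing?' yields the bare question, while a message containing only a mention falls back to the generic prompt.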
2026-03-05 21:59:27 +02:00
parent ae1e0aa144
commit d5b9964ce7
5 changed files with 144 additions and 20 deletions
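
As a rough sketch of how the media_type value might travel from the adapter to the prompt, assuming a plain dict payload: build_payload and media_note are hypothetical helper names, the "text" key is assumed, and the MEDIA NOTE wording is invented for the example. Only the discord_media_type and discord_response_type keys come from this commit.

```python
from typing import Any, Dict, Optional


def build_payload(
    text: str,
    response_type: str = "dm_response",
    media_type: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a WebSocket payload; the media key is added only when a type is given."""
    payload: Dict[str, Any] = {
        "text": text,  # assumed message key
        "discord_response_type": response_type,
    }
    if media_type:
        payload["discord_media_type"] = media_type
    return payload


def media_note(payload: Dict[str, Any]) -> str:
    """Return a prompt-prefix note for media messages, or an empty string."""
    media_type = payload.get("discord_media_type")
    if not media_type:
        return ""
    # Hypothetical wording; the plugin's actual MEDIA NOTE text is not shown here.
    return f"MEDIA NOTE: the user sent a {media_type}; the text is a description of it."
```

Leaving the key out when media_type is None lets the receiving side treat a missing key as a text-only message, which matches the `if media_type:` guard in the diff.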

@@ -107,6 +107,7 @@ class CatAdapter:
         author_name: Optional[str] = None,
         mood: Optional[str] = None,
         response_type: str = "dm_response",
+        media_type: Optional[str] = None,
     ) -> Optional[tuple]:
         """
         Send a message through the Cat pipeline via WebSocket and get a response.
@@ -123,6 +124,7 @@ class CatAdapter:
             author_name: Display name of the user
             mood: Current mood name (passed as metadata for Cat hooks)
             response_type: Type of response context
+            media_type: Type of media attachment ("image", "video", "gif", "tenor_gif")
 
         Returns:
             Tuple of (response_text, full_prompt) on success, or None if Cat
@@ -156,6 +158,9 @@ class CatAdapter:
         payload["discord_response_type"] = response_type
         # Pass evil mode flag so discord_bridge stores it in working_memory
         payload["discord_evil_mode"] = globals.EVIL_MODE
+        # Pass media type so discord_bridge can add MEDIA NOTE to the prompt
+        if media_type:
+            payload["discord_media_type"] = media_type
 
         try:
             # Build WebSocket URL from HTTP base URL