
# Vision Model Dual-GPU Fix - Summary

## Problem

The vision model (MiniCPM-V) wasn't working when the AMD GPU was set as the primary GPU for text inference.

## Root Cause

While `get_vision_gpu_url()` was correctly hardcoded to always use NVIDIA, the vision path had:

  1. No health checking before attempting requests
  2. No detailed error logging to understand failures
  3. No timeout specification (could hang indefinitely)
  4. No verification that NVIDIA GPU was actually responsive

When AMD became the primary GPU, any problem on the NVIDIA side caused vision requests to fail silently, with poor error reporting.

## Solution Implemented

### 1. Enhanced GPU Routing (`bot/utils/llm.py`)

```python
def get_vision_gpu_url():
    """Always use NVIDIA for vision, even when AMD is primary for text"""
    # Added clear documentation
    # Added debug logging when switching occurs
    # Returns NVIDIA URL unconditionally
```
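A minimal sketch of what this routing might look like after the change, assuming the `http://llama-swap:8080` endpoint and the `gpu_state.json` path shown elsewhere in this document; constant names are illustrative and the actual code in `bot/utils/llm.py` may differ:

```python
import json
import logging

logger = logging.getLogger(__name__)

# Assumed values, taken from elsewhere in this document.
NVIDIA_LLAMA_URL = "http://llama-swap:8080"
GPU_STATE_FILE = "bot/memory/gpu_state.json"


def get_vision_gpu_url() -> str:
    """Always use NVIDIA for vision, even when AMD is primary for text."""
    try:
        with open(GPU_STATE_FILE) as f:
            current_gpu = json.load(f).get("current_gpu", "nvidia")
    except (OSError, ValueError):
        current_gpu = "nvidia"

    if current_gpu == "amd":
        logger.info("Primary GPU is AMD for text, but using NVIDIA for vision model")

    # The NVIDIA URL is returned unconditionally; vision never runs on AMD.
    return NVIDIA_LLAMA_URL
```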

### 2. Added Health Check (`bot/utils/llm.py`)

```python
async def check_vision_endpoint_health():
    """Verify NVIDIA vision endpoint is responsive before use"""
    # Pings http://llama-swap:8080/health
    # Returns (is_healthy: bool, error_message: Optional[str])
    # Logs status for debugging
```
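A sketch of the health check, assuming the bot uses `aiohttp` and a short (here 5-second) probe timeout; the return shape matches the docstring above:

```python
import asyncio
from typing import Optional, Tuple

import aiohttp  # assumption: the bot's async HTTP client


async def check_vision_endpoint_health(
    base_url: str = "http://llama-swap:8080",
    timeout_seconds: float = 5.0,
) -> Tuple[bool, Optional[str]]:
    """Verify the NVIDIA vision endpoint is responsive before use."""
    timeout = aiohttp.ClientTimeout(total=timeout_seconds)
    try:
        async with aiohttp.ClientSession(timeout=timeout) as session:
            async with session.get(f"{base_url}/health") as resp:
                if resp.status == 200:
                    return True, None
                return False, f"Status {resp.status}"
    except asyncio.TimeoutError:
        return False, "Endpoint timeout"
    except aiohttp.ClientError as exc:
        return False, f"Connection error: {exc}"
```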

### 3. Improved Image Analysis (`bot/utils/image_handling.py`)

**Before the request:**

- Health check
- Detailed logging of endpoint, model, image size

**During the request:**

- 60-second timeout (was unlimited)
- Endpoint URL in error messages

**After an error:**

- Full exception traceback in logs
- Endpoint information in error response
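To make these before/during/after steps concrete, here is a rough sketch of how `analyze_image_with_vision()` could tie them together, reusing the two functions sketched above; the payload shape, model name, and `/v1/chat/completions` path are assumptions, and the video path follows the same pattern with a 120-second timeout:

```python
import logging

import aiohttp  # assumption: the bot's async HTTP client

logger = logging.getLogger(__name__)


async def analyze_image_with_vision(image_b64: str, prompt: str) -> str:
    """Sketch: health check, detailed logging, and a hard timeout around the request."""
    endpoint = get_vision_gpu_url()  # always the NVIDIA endpoint (see above)

    # Before the request: health check + detailed logging.
    healthy, error = await check_vision_endpoint_health(endpoint)
    if not healthy:
        logger.warning("Vision endpoint unhealthy: %s", error)
        return f"Vision service currently unavailable: {error}"
    logger.info("Sending vision request to %s using model: vision (image: %d b64 chars)",
                endpoint, len(image_b64))

    # Hypothetical OpenAI-compatible chat payload for the vision model.
    payload = {
        "model": "vision",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    }

    # During the request: 60-second timeout instead of waiting forever.
    timeout = aiohttp.ClientTimeout(total=60)
    try:
        async with aiohttp.ClientSession(timeout=timeout) as session:
            async with session.post(f"{endpoint}/v1/chat/completions",
                                    json=payload) as resp:
                resp.raise_for_status()
                data = await resp.json()
        logger.info("Vision analysis completed successfully")
        return data["choices"][0]["message"]["content"]
    except Exception:
        # After an error: full traceback plus the endpoint in the message.
        logger.exception("Vision request to %s failed", endpoint)
        return f"Vision analysis failed (endpoint: {endpoint})"
```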

### 4. Improved Video Analysis (`bot/utils/image_handling.py`)

**Before the request:**

- Health check
- Logging of media type, frame count

**During the request:**

- 120-second timeout (longer for multiple frames)
- Endpoint URL in error messages

**After an error:**

- Full exception traceback in logs
- Endpoint information in error response

## Key Changes

| File | Function | Changes |
|------|----------|---------|
| `bot/utils/llm.py` | `get_vision_gpu_url()` | Added documentation, debug logging |
| `bot/utils/llm.py` | `check_vision_endpoint_health()` | **NEW**: health check function |
| `bot/utils/image_handling.py` | `analyze_image_with_vision()` | Added health check, timeouts, detailed logging |
| `bot/utils/image_handling.py` | `analyze_video_with_vision()` | Added health check, timeouts, detailed logging |

## Testing

A quick test to verify the vision model works when AMD is primary:

```bash
# 1. Check GPU state is AMD
cat bot/memory/gpu_state.json
# Should show: {"current_gpu": "amd", ...}

# 2. Send an image to Discord
# (the bot should analyze it with the vision model)

# 3. Check logs for success
docker compose logs miku-bot 2>&1 | grep -i "vision"
# Should see: "Vision analysis completed successfully"
```

## Expected Log Output

### When Working Correctly

```text
[INFO] Primary GPU is AMD for text, but using NVIDIA for vision model
[INFO] Vision endpoint (http://llama-swap:8080) health check: OK
[INFO] Sending vision request to http://llama-swap:8080 using model: vision
[INFO] Vision analysis completed successfully
```

### If the NVIDIA Vision Endpoint Is Down

```text
[WARNING] Vision endpoint (http://llama-swap:8080) health check failed: status 503
[WARNING] Vision endpoint unhealthy: Status 503
[ERROR] Vision service currently unavailable: Status 503
```

### If a Network Timeout Occurs

```text
[ERROR] Vision endpoint (http://llama-swap:8080) health check: timeout
[WARNING] Vision endpoint unhealthy: Endpoint timeout
[ERROR] Vision service currently unavailable: Endpoint timeout
```

## Architecture Reminder

- **NVIDIA GPU (port 8090):** vision + text models
- **AMD GPU (port 8091):** text models ONLY
- When AMD is primary: text goes to AMD, vision goes to NVIDIA
- When NVIDIA is primary: everything goes to NVIDIA
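The routing rule fits in a few lines; the helper name and hostnames below are purely illustrative, not the actual code in `bot/utils/llm.py`:

```python
# Illustrative only -- hostnames and helper name are assumptions.
NVIDIA_URL = "http://llama-swap:8090"      # vision + text
AMD_URL = "http://llama-swap-amd:8091"     # text only


def route_request(kind: str, current_gpu: str) -> str:
    """Pick an endpoint: vision is pinned to NVIDIA, text follows the primary GPU."""
    if kind == "vision":
        return NVIDIA_URL
    return AMD_URL if current_gpu == "amd" else NVIDIA_URL
```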

## Files Modified

  1. `/home/koko210Serve/docker/miku-discord/bot/utils/llm.py`
  2. `/home/koko210Serve/docker/miku-discord/bot/utils/image_handling.py`

## Files Created

  1. `/home/koko210Serve/docker/miku-discord/VISION_MODEL_DEBUG.md` - Complete debugging guide

## Deployment Notes

No changes needed to:

- Docker containers
- Environment variables
- Configuration files
- Database or state files

Just update the code and restart the bot:

```bash
docker compose restart miku-bot
```

## Success Criteria

- Images are analyzed when the AMD GPU is primary
- Detailed error messages if the vision endpoint fails
- Health check prevents hanging requests
- Logs show NVIDIA is correctly used for vision
- No performance degradation compared to before