
# Vision Model Dual-GPU Fix - Summary

## Problem

The vision model (MiniCPM-V) wasn't working when the AMD GPU was set as the primary GPU for text inference.

## Root Cause

While `get_vision_gpu_url()` was correctly hardcoded to always use NVIDIA, the vision path had:

  1. No health checking before attempting requests
  2. No detailed error logging to understand failures
  3. No timeout specification (could hang indefinitely)
  4. No verification that NVIDIA GPU was actually responsive

When AMD became the primary GPU, any problem on the NVIDIA side caused vision requests to fail silently, with poor error reporting.

## Solution Implemented

### 1. Enhanced GPU Routing (`bot/utils/llm.py`)

```python
def get_vision_gpu_url():
    """Always use NVIDIA for vision, even when AMD is primary for text"""
    # Added clear documentation
    # Added debug logging when switching occurs
    # Returns NVIDIA URL unconditionally
```
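A minimal sketch of what this routing might look like after the change, assuming the `http://llama-swap:8080` endpoint and the `gpu_state.json` path shown elsewhere in this document; constant names are illustrative and the actual code in `bot/utils/llm.py` may differ:

```python
import json
import logging

logger = logging.getLogger(__name__)

# Assumed values, taken from elsewhere in this document.
NVIDIA_LLAMA_URL = "http://llama-swap:8080"
GPU_STATE_FILE = "bot/memory/gpu_state.json"


def get_vision_gpu_url() -> str:
    """Always use NVIDIA for vision, even when AMD is primary for text."""
    try:
        with open(GPU_STATE_FILE) as f:
            current_gpu = json.load(f).get("current_gpu", "nvidia")
    except (OSError, ValueError):
        current_gpu = "nvidia"

    if current_gpu == "amd":
        logger.info("Primary GPU is AMD for text, but using NVIDIA for vision model")

    # The NVIDIA URL is returned unconditionally; vision never runs on AMD.
    return NVIDIA_LLAMA_URL
```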

### 2. Added Health Check (`bot/utils/llm.py`)

```python
async def check_vision_endpoint_health():
    """Verify NVIDIA vision endpoint is responsive before use"""
    # Pings http://llama-swap:8080/health
    # Returns (is_healthy: bool, error_message: Optional[str])
    # Logs status for debugging
```
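A sketch of the health check, assuming the bot uses `aiohttp` and a short (here 5-second) probe timeout; the return shape matches the docstring above:

```python
import asyncio
from typing import Optional, Tuple

import aiohttp  # assumption: the bot's async HTTP client


async def check_vision_endpoint_health(
    base_url: str = "http://llama-swap:8080",
    timeout_seconds: float = 5.0,
) -> Tuple[bool, Optional[str]]:
    """Verify the NVIDIA vision endpoint is responsive before use."""
    timeout = aiohttp.ClientTimeout(total=timeout_seconds)
    try:
        async with aiohttp.ClientSession(timeout=timeout) as session:
            async with session.get(f"{base_url}/health") as resp:
                if resp.status == 200:
                    return True, None
                return False, f"Status {resp.status}"
    except asyncio.TimeoutError:
        return False, "Endpoint timeout"
    except aiohttp.ClientError as exc:
        return False, f"Connection error: {exc}"
```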

### 3. Improved Image Analysis (`bot/utils/image_handling.py`)

**Before the request:**

- Health check
- Detailed logging of endpoint, model, image size

**During the request:**

- 60-second timeout (was unlimited)
- Endpoint URL in error messages

**After an error:**

- Full exception traceback in logs
- Endpoint information in error response
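To make these before/during/after steps concrete, here is a rough sketch of how `analyze_image_with_vision()` could tie them together, reusing the two functions sketched above; the payload shape, model name, and `/v1/chat/completions` path are assumptions, and the video path follows the same pattern with a 120-second timeout:

```python
import logging

import aiohttp  # assumption: the bot's async HTTP client

logger = logging.getLogger(__name__)


async def analyze_image_with_vision(image_b64: str, prompt: str) -> str:
    """Sketch: health check, detailed logging, and a hard timeout around the request."""
    endpoint = get_vision_gpu_url()  # always the NVIDIA endpoint (see above)

    # Before the request: health check + detailed logging.
    healthy, error = await check_vision_endpoint_health(endpoint)
    if not healthy:
        logger.warning("Vision endpoint unhealthy: %s", error)
        return f"Vision service currently unavailable: {error}"
    logger.info("Sending vision request to %s using model: vision (image: %d b64 chars)",
                endpoint, len(image_b64))

    # Hypothetical OpenAI-compatible chat payload for the vision model.
    payload = {
        "model": "vision",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    }

    # During the request: 60-second timeout instead of waiting forever.
    timeout = aiohttp.ClientTimeout(total=60)
    try:
        async with aiohttp.ClientSession(timeout=timeout) as session:
            async with session.post(f"{endpoint}/v1/chat/completions",
                                    json=payload) as resp:
                resp.raise_for_status()
                data = await resp.json()
        logger.info("Vision analysis completed successfully")
        return data["choices"][0]["message"]["content"]
    except Exception:
        # After an error: full traceback plus the endpoint in the message.
        logger.exception("Vision request to %s failed", endpoint)
        return f"Vision analysis failed (endpoint: {endpoint})"
```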

### 4. Improved Video Analysis (`bot/utils/image_handling.py`)

**Before the request:**

- Health check
- Logging of media type, frame count

**During the request:**

- 120-second timeout (longer for multiple frames)
- Endpoint URL in error messages

**After an error:**

- Full exception traceback in logs
- Endpoint information in error response

## Key Changes

| File | Function | Changes |
|------|----------|---------|
| `bot/utils/llm.py` | `get_vision_gpu_url()` | Added documentation, debug logging |
| `bot/utils/llm.py` | `check_vision_endpoint_health()` | **NEW**: health check function |
| `bot/utils/image_handling.py` | `analyze_image_with_vision()` | Added health check, timeouts, detailed logging |
| `bot/utils/image_handling.py` | `analyze_video_with_vision()` | Added health check, timeouts, detailed logging |

## Testing

A quick test to verify the vision model works when AMD is primary:

```bash
# 1. Check GPU state is AMD
cat bot/memory/gpu_state.json
# Should show: {"current_gpu": "amd", ...}

# 2. Send an image to Discord
# (the bot should analyze it with the vision model)

# 3. Check logs for success
docker compose logs miku-bot 2>&1 | grep -i "vision"
# Should see: "Vision analysis completed successfully"
```

## Expected Log Output

### When Working Correctly

```text
[INFO] Primary GPU is AMD for text, but using NVIDIA for vision model
[INFO] Vision endpoint (http://llama-swap:8080) health check: OK
[INFO] Sending vision request to http://llama-swap:8080 using model: vision
[INFO] Vision analysis completed successfully
```

### If the NVIDIA Vision Endpoint Is Down

```text
[WARNING] Vision endpoint (http://llama-swap:8080) health check failed: status 503
[WARNING] Vision endpoint unhealthy: Status 503
[ERROR] Vision service currently unavailable: Status 503
```

### If a Network Timeout Occurs

```text
[ERROR] Vision endpoint (http://llama-swap:8080) health check: timeout
[WARNING] Vision endpoint unhealthy: Endpoint timeout
[ERROR] Vision service currently unavailable: Endpoint timeout
```

## Architecture Reminder

- **NVIDIA GPU (port 8090):** vision + text models
- **AMD GPU (port 8091):** text models ONLY
- When AMD is primary: text goes to AMD, vision goes to NVIDIA
- When NVIDIA is primary: everything goes to NVIDIA
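The routing rule fits in a few lines; the helper name and hostnames below are purely illustrative, not the actual code in `bot/utils/llm.py`:

```python
# Illustrative only -- hostnames and helper name are assumptions.
NVIDIA_URL = "http://llama-swap:8090"      # vision + text
AMD_URL = "http://llama-swap-amd:8091"     # text only


def route_request(kind: str, current_gpu: str) -> str:
    """Pick an endpoint: vision is pinned to NVIDIA, text follows the primary GPU."""
    if kind == "vision":
        return NVIDIA_URL
    return AMD_URL if current_gpu == "amd" else NVIDIA_URL
```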

## Files Modified

  1. `/home/koko210Serve/docker/miku-discord/bot/utils/llm.py`
  2. `/home/koko210Serve/docker/miku-discord/bot/utils/image_handling.py`

## Files Created

  1. `/home/koko210Serve/docker/miku-discord/VISION_MODEL_DEBUG.md` - Complete debugging guide

## Deployment Notes

No changes needed to:

- Docker containers
- Environment variables
- Configuration files
- Database or state files

Just update the code and restart the bot:

```bash
docker compose restart miku-bot
```

## Success Criteria

- Images are analyzed when the AMD GPU is primary
- Detailed error messages if the vision endpoint fails
- Health check prevents hanging requests
- Logs show NVIDIA is correctly used for vision
- No performance degradation compared to before