Vision Model Debugging Guide

Issue Summary

The vision model does not work when the AMD GPU is set as the primary GPU for text inference.

Root Cause Analysis

The vision model (MiniCPM-V) should always run on the NVIDIA GPU, even when AMD is the primary GPU for text models. This is because:

  1. Separate GPU design: Each GPU has its own llama-swap instance

    • llama-swap (NVIDIA) on port 8090 → handles vision, llama3.1, darkidol
    • llama-swap-amd (AMD) on port 8091 → handles llama3.1, darkidol (text models only)
  2. Vision model location: The vision model is ONLY configured on NVIDIA

    • Check: llama-swap-config.yaml (has vision model)
    • Check: llama-swap-rocm-config.yaml (does NOT have vision model)
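
For reference, the two instances and the models they serve can be summarized as follows. This is an illustrative Python sketch only: the dictionary name and the AMD in-network URL are assumptions; the ports and model lists come from the configuration files above.

# Illustrative summary of the two llama-swap instances.
LLAMA_SWAP_INSTANCES = {
    "nvidia": {
        "url": "http://llama-swap:8080",       # published on host port 8090
        "models": ["vision", "llama3.1", "darkidol"],
    },
    "amd": {
        "url": "http://llama-swap-amd:8080",   # published on host port 8091 (assumed URL)
        "models": ["llama3.1", "darkidol"],    # text models only, no vision
    },
}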

Fixes Applied

1. Improved GPU Routing (bot/utils/llm.py)

Function: get_vision_gpu_url()

  • Now explicitly returns the NVIDIA URL regardless of which GPU is primary for text
  • Added debug logging when the text GPU is AMD
  • Added clear documentation about the routing strategy (sketched below)
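
A minimal sketch of this routing, assuming the constant and logger names below; the real implementation lives in bot/utils/llm.py and may differ in detail.

import json
import logging
from pathlib import Path

logger = logging.getLogger("miku.llm")                 # assumed logger name

NVIDIA_LLAMA_URL = "http://llama-swap:8080"            # assumed constant name
GPU_STATE_FILE = Path("bot/memory/gpu_state.json")

def _current_text_gpu() -> str:
    # Read which GPU is currently primary for text inference.
    try:
        return json.loads(GPU_STATE_FILE.read_text()).get("current_gpu", "nvidia")
    except (OSError, ValueError):
        return "nvidia"

def get_vision_gpu_url() -> str:
    """Return the endpoint for vision requests.

    The vision model (MiniCPM-V) is only configured on the NVIDIA
    llama-swap instance, so this always returns the NVIDIA URL, even
    when AMD is the primary GPU for text models.
    """
    if _current_text_gpu() == "amd":
        logger.debug("Text GPU is AMD; vision request still routed to NVIDIA")
    return NVIDIA_LLAMA_URL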

New Function: check_vision_endpoint_health()

  • Pings the NVIDIA vision endpoint before attempting requests
  • Provides detailed error messages if endpoint is unreachable
  • Logs health status for troubleshooting
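
A hedged sketch of such a health check, using requests for brevity; the real function may be async and use a different HTTP client.

import logging
import requests

logger = logging.getLogger("miku.llm")                 # assumed logger name

NVIDIA_LLAMA_URL = "http://llama-swap:8080"            # assumed constant name

def check_vision_endpoint_health(timeout: float = 5.0) -> bool:
    """Ping the NVIDIA llama-swap /health endpoint before a vision request."""
    url = f"{NVIDIA_LLAMA_URL}/health"
    try:
        resp = requests.get(url, timeout=timeout)
        logger.info("Vision endpoint %s returned HTTP %s", url, resp.status_code)
        return resp.status_code == 200
    except requests.RequestException as exc:
        logger.error("Vision endpoint %s unreachable: %s", url, exc)
        return False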

2. Enhanced Vision Analysis (bot/utils/image_handling.py)

Function: analyze_image_with_vision()

  • Added health check before processing
  • Increased timeout to 60 seconds (from default)
  • Logs endpoint URL, model name, and detailed error messages
  • Added exception info logging for better debugging

Function: analyze_video_with_vision()

  • Added health check before processing
  • Increased timeout to 120 seconds (from default)
  • Logs media type, frame count, and detailed error messages
  • Added exception info logging for better debugging
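
Both functions share the same pattern. Below is a condensed, hypothetical sketch for the image case, reusing check_vision_endpoint_health from the sketch above; the signature, payload shape, and HTTP client are assumptions, not the actual bot code. analyze_video_with_vision() follows the same shape with the 120-second timeout and multiple frames in the content list.

import base64
import logging
import requests

logger = logging.getLogger("miku.vision")              # assumed logger name

NVIDIA_LLAMA_URL = "http://llama-swap:8080"            # assumed constant name

def analyze_image_with_vision(image_bytes: bytes,
                              prompt: str = "Describe this image.",
                              timeout: float = 60.0) -> str | None:
    # 1. Health check before processing (see check_vision_endpoint_health above).
    if not check_vision_endpoint_health():
        logger.error("Vision service currently unavailable: health check failed")
        return None

    # 2. Build an OpenAI-compatible chat request with the image as a data URL.
    data_url = "data:image/jpeg;base64," + base64.b64encode(image_bytes).decode()
    payload = {
        "model": "vision",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }],
    }

    # 3. Send to the NVIDIA endpoint with the longer timeout and log details.
    logger.info("Sending vision request to %s (model=vision)", NVIDIA_LLAMA_URL)
    try:
        resp = requests.post(f"{NVIDIA_LLAMA_URL}/v1/chat/completions",
                             json=payload, timeout=timeout)
        resp.raise_for_status()
        logger.info("Vision analysis completed successfully")
        return resp.json()["choices"][0]["message"]["content"]
    except requests.RequestException:
        logger.exception("Vision analysis failed")     # includes exception info
        return None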

Testing the Fix

1. Verify Docker Containers

# Check both llama-swap services are running
docker compose ps

# Expected output:
# llama-swap      (port 8090)
# llama-swap-amd  (port 8091)

2. Test NVIDIA Endpoint Health

# Check that the NVIDIA vision endpoint is responsive
# (the llama-swap hostname resolves only inside the compose network;
#  run this from a container on that network)
curl -f http://llama-swap:8080/health

# Should return 200 OK

3. Test Vision Request to NVIDIA

# Send a simple vision request directly
curl -X POST http://llama-swap:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vision",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}
      ]
    }],
    "max_tokens": 100
  }'
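
The same request can be scripted in Python, which makes it easier to fill in the base64 placeholder (test.jpg is a hypothetical local file):

import base64
import json
import requests

# Encode a local test image as a data URL (test.jpg is a placeholder filename).
with open("test.jpg", "rb") as f:
    data_url = "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

payload = {
    "model": "vision",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }],
    "max_tokens": 100,
}

resp = requests.post("http://llama-swap:8080/v1/chat/completions",
                     json=payload, timeout=60)
resp.raise_for_status()
print(json.dumps(resp.json(), indent=2))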

4. Check GPU State File

# Verify which GPU is primary
cat bot/memory/gpu_state.json

# Should show:
# {"current_gpu": "amd", "reason": "..."} when AMD is primary
# {"current_gpu": "nvidia", "reason": "..."} when NVIDIA is primary

5. Monitor Logs During Vision Request

# Watch bot logs during image analysis
docker compose logs -f miku-bot 2>&1 | grep -i vision

# Should see:
# "Sending vision request to http://llama-swap:8080"
# "Vision analysis completed successfully"
# OR detailed error messages if something is wrong

Troubleshooting Steps

Issue: Vision endpoint health check fails

Symptoms: "Vision service currently unavailable: Endpoint timeout"

Solutions:

  1. Verify NVIDIA container is running: docker compose ps llama-swap
  2. Check NVIDIA GPU memory: nvidia-smi (should have free VRAM)
  3. Check if vision model is loaded: docker compose logs llama-swap
  4. Increase timeout if model is loading slowly

Issue: Vision requests time out (status 408/504)

Symptoms: Requests hang or return timeout errors

Solutions:

  1. Check NVIDIA GPU is not overloaded: nvidia-smi
  2. Check if vision model is already running: Look for MiniCPM processes
  3. Restart llama-swap if model is stuck: docker compose restart llama-swap
  4. Check available VRAM: MiniCPM-V needs ~4-6GB

Issue: Vision model returns "No description"

Symptoms: Image analysis returns empty or generic responses

Solutions:

  1. Check if vision model loaded correctly: docker compose logs llama-swap
  2. Verify model file exists: /models/MiniCPM-V-4_5-Q3_K_S.gguf
  3. Check if mmproj loaded: /models/MiniCPM-V-4_5-mmproj-f16.gguf
  4. Test with direct curl to ensure model works

Issue: AMD GPU affects vision performance

Symptoms: Vision requests are slower when AMD is primary

Solutions:

  1. Some slowdown is expected: vision is still processed on the NVIDIA GPU
  2. Persistent slowness may indicate NVIDIA GPU memory pressure
  3. Monitor both GPUs: rocm-smi (AMD) and nvidia-smi (NVIDIA)

Architecture Diagram

┌─────────────────────────────────────────────────────────────┐
│                         Miku Bot                            │
│                                                             │
│  Discord Messages with Images/Videos                       │
└─────────────────────────────────────────────────────────────┘
                    │
                    ▼
        ┌──────────────────────────────┐
        │  Vision Analysis Handler     │
        │  (image_handling.py)         │
        │                              │
        │ 1. Check NVIDIA health       │
        │ 2. Send to NVIDIA vision     │
        └──────────────────────────────┘
                    │
                    ▼
        ┌──────────────────────────────┐
        │    NVIDIA GPU (llama-swap)   │
        │    Port: 8090                │
        │                              │
        │  Available Models:           │
        │  • vision (MiniCPM-V)        │
        │  • llama3.1                  │
        │  • darkidol                  │
        └──────────────────────────────┘

Vision requests always terminate at the NVIDIA instance above. In
dual-GPU mode, text requests are routed to the AMD instance instead:

        ┌──────────────────────────────┐
        │   AMD GPU (llama-swap-amd)   │
        │   Port: 8091                 │
        │                              │
        │  Available Models:           │
        │  • llama3.1                  │
        │  • darkidol                  │
        │  (NO vision model)           │
        └──────────────────────────────┘

Key Files Changed

  1. bot/utils/llm.py

    • Enhanced get_vision_gpu_url() with documentation
    • Added check_vision_endpoint_health() function
  2. bot/utils/image_handling.py

    • analyze_image_with_vision() - added health check and logging
    • analyze_video_with_vision() - added health check and logging

Expected Behavior After Fix

When NVIDIA is Primary (default)

Image received
→ Check NVIDIA health: OK
→ Send to NVIDIA vision model
→ Analysis complete
✓ Works as before

When AMD is Primary (voice session active)

Image received
→ Check NVIDIA health: OK
→ Send to NVIDIA vision model (even though text uses AMD)
→ Analysis complete
✓ Vision now works correctly!

Next Steps if Issues Persist

  1. Enable debug logging: Set AUTONOMOUS_DEBUG=true in docker-compose
  2. Check Docker networking: docker network inspect miku-discord_default
  3. Verify environment variables: docker compose exec miku-bot env | grep LLAMA
  4. Check model file integrity: ls -lah models/MiniCPM*
  5. Review llama-swap logs: docker compose logs llama-swap -n 100