Vision Model Debugging Guide

Issue Summary

The vision model does not work when the AMD GPU is set as the primary GPU for text inference.

Root Cause Analysis

The vision model (MiniCPM-V) should always run on the NVIDIA GPU, even when AMD is the primary GPU for text models. This is because:

  1. Separate GPU design: Each GPU has its own llama-swap instance

    • llama-swap (NVIDIA) on port 8090 → handles vision, llama3.1, darkidol
    • llama-swap-amd (AMD) on port 8091 → handles llama3.1, darkidol (text models only)
  2. Vision model location: The vision model is ONLY configured on NVIDIA

    • Check: llama-swap-config.yaml (has vision model)
    • Check: llama-swap-rocm-config.yaml (does NOT have vision model)
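
For reference, the two instances and the models they serve can be summarized as follows. This is an illustrative Python sketch only: the dictionary name and the AMD in-network URL are assumptions; the ports and model lists come from the configuration files above.

# Illustrative summary of the two llama-swap instances.
LLAMA_SWAP_INSTANCES = {
    "nvidia": {
        "url": "http://llama-swap:8080",       # published on host port 8090
        "models": ["vision", "llama3.1", "darkidol"],
    },
    "amd": {
        "url": "http://llama-swap-amd:8080",   # published on host port 8091 (assumed URL)
        "models": ["llama3.1", "darkidol"],    # text models only, no vision
    },
}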

Fixes Applied

1. Improved GPU Routing (bot/utils/llm.py)

Function: get_vision_gpu_url()

  • Now explicitly returns the NVIDIA URL regardless of which GPU is primary for text
  • Added debug logging when the text GPU is AMD
  • Added clear documentation about the routing strategy (sketched below)
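
A minimal sketch of this routing, assuming the constant and logger names below; the real implementation lives in bot/utils/llm.py and may differ in detail.

import json
import logging
from pathlib import Path

logger = logging.getLogger("miku.llm")                 # assumed logger name

NVIDIA_LLAMA_URL = "http://llama-swap:8080"            # assumed constant name
GPU_STATE_FILE = Path("bot/memory/gpu_state.json")

def _current_text_gpu() -> str:
    # Read which GPU is currently primary for text inference.
    try:
        return json.loads(GPU_STATE_FILE.read_text()).get("current_gpu", "nvidia")
    except (OSError, ValueError):
        return "nvidia"

def get_vision_gpu_url() -> str:
    """Return the endpoint for vision requests.

    The vision model (MiniCPM-V) is only configured on the NVIDIA
    llama-swap instance, so this always returns the NVIDIA URL, even
    when AMD is the primary GPU for text models.
    """
    if _current_text_gpu() == "amd":
        logger.debug("Text GPU is AMD; vision request still routed to NVIDIA")
    return NVIDIA_LLAMA_URL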

New Function: check_vision_endpoint_health()

  • Pings the NVIDIA vision endpoint before attempting requests
  • Provides detailed error messages if endpoint is unreachable
  • Logs health status for troubleshooting
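
A hedged sketch of such a health check, using requests for brevity; the real function may be async and use a different HTTP client.

import logging
import requests

logger = logging.getLogger("miku.llm")                 # assumed logger name

NVIDIA_LLAMA_URL = "http://llama-swap:8080"            # assumed constant name

def check_vision_endpoint_health(timeout: float = 5.0) -> bool:
    """Ping the NVIDIA llama-swap /health endpoint before a vision request."""
    url = f"{NVIDIA_LLAMA_URL}/health"
    try:
        resp = requests.get(url, timeout=timeout)
        logger.info("Vision endpoint %s returned HTTP %s", url, resp.status_code)
        return resp.status_code == 200
    except requests.RequestException as exc:
        logger.error("Vision endpoint %s unreachable: %s", url, exc)
        return False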

2. Enhanced Vision Analysis (bot/utils/image_handling.py)

Function: analyze_image_with_vision()

  • Added health check before processing
  • Increased timeout to 60 seconds (from default)
  • Logs endpoint URL, model name, and detailed error messages
  • Added exception info logging for better debugging

Function: analyze_video_with_vision()

  • Added health check before processing
  • Increased timeout to 120 seconds (from default)
  • Logs media type, frame count, and detailed error messages
  • Added exception info logging for better debugging
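
Both functions share the same pattern. Below is a condensed, hypothetical sketch for the image case, reusing check_vision_endpoint_health from the sketch above; the signature, payload shape, and HTTP client are assumptions, not the actual bot code. analyze_video_with_vision() follows the same shape with the 120-second timeout and multiple frames in the content list.

import base64
import logging
import requests

logger = logging.getLogger("miku.vision")              # assumed logger name

NVIDIA_LLAMA_URL = "http://llama-swap:8080"            # assumed constant name

def analyze_image_with_vision(image_bytes: bytes,
                              prompt: str = "Describe this image.",
                              timeout: float = 60.0) -> str | None:
    # 1. Health check before processing (see check_vision_endpoint_health above).
    if not check_vision_endpoint_health():
        logger.error("Vision service currently unavailable: health check failed")
        return None

    # 2. Build an OpenAI-compatible chat request with the image as a data URL.
    data_url = "data:image/jpeg;base64," + base64.b64encode(image_bytes).decode()
    payload = {
        "model": "vision",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }],
    }

    # 3. Send to the NVIDIA endpoint with the longer timeout and log details.
    logger.info("Sending vision request to %s (model=vision)", NVIDIA_LLAMA_URL)
    try:
        resp = requests.post(f"{NVIDIA_LLAMA_URL}/v1/chat/completions",
                             json=payload, timeout=timeout)
        resp.raise_for_status()
        logger.info("Vision analysis completed successfully")
        return resp.json()["choices"][0]["message"]["content"]
    except requests.RequestException:
        logger.exception("Vision analysis failed")     # includes exception info
        return None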

Testing the Fix

1. Verify Docker Containers

# Check both llama-swap services are running
docker compose ps

# Expected output:
# llama-swap      (port 8090)
# llama-swap-amd  (port 8091)

2. Test NVIDIA Endpoint Health

# Check that the NVIDIA vision endpoint is responsive
# (the llama-swap hostname resolves only inside the compose network;
#  run this from a container on that network)
curl -f http://llama-swap:8080/health

# Should return 200 OK

3. Test Vision Request to NVIDIA

# Send a simple vision request directly
curl -X POST http://llama-swap:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vision",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}
      ]
    }],
    "max_tokens": 100
  }'
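
The same request can be scripted in Python, which makes it easier to fill in the base64 placeholder (test.jpg is a hypothetical local file):

import base64
import json
import requests

# Encode a local test image as a data URL (test.jpg is a placeholder filename).
with open("test.jpg", "rb") as f:
    data_url = "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

payload = {
    "model": "vision",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }],
    "max_tokens": 100,
}

resp = requests.post("http://llama-swap:8080/v1/chat/completions",
                     json=payload, timeout=60)
resp.raise_for_status()
print(json.dumps(resp.json(), indent=2))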

4. Check GPU State File

# Verify which GPU is primary
cat bot/memory/gpu_state.json

# Should show:
# {"current_gpu": "amd", "reason": "..."} when AMD is primary
# {"current_gpu": "nvidia", "reason": "..."} when NVIDIA is primary

5. Monitor Logs During Vision Request

# Watch bot logs during image analysis
docker compose logs -f miku-bot 2>&1 | grep -i vision

# Should see:
# "Sending vision request to http://llama-swap:8080"
# "Vision analysis completed successfully"
# OR detailed error messages if something is wrong

Troubleshooting Steps

Issue: Vision endpoint health check fails

Symptoms: "Vision service currently unavailable: Endpoint timeout"

Solutions:

  1. Verify NVIDIA container is running: docker compose ps llama-swap
  2. Check NVIDIA GPU memory: nvidia-smi (should have free VRAM)
  3. Check if vision model is loaded: docker compose logs llama-swap
  4. Increase timeout if model is loading slowly

Issue: Vision requests time out (status 408/504)

Symptoms: Requests hang or return timeout errors

Solutions:

  1. Check NVIDIA GPU is not overloaded: nvidia-smi
  2. Check if vision model is already running: Look for MiniCPM processes
  3. Restart llama-swap if model is stuck: docker compose restart llama-swap
  4. Check available VRAM: MiniCPM-V needs ~4-6GB

Issue: Vision model returns "No description"

Symptoms: Image analysis returns empty or generic responses

Solutions:

  1. Check if vision model loaded correctly: docker compose logs llama-swap
  2. Verify model file exists: /models/MiniCPM-V-4_5-Q3_K_S.gguf
  3. Check if mmproj loaded: /models/MiniCPM-V-4_5-mmproj-f16.gguf
  4. Test with direct curl to ensure model works

Issue: AMD GPU affects vision performance

Symptoms: Vision requests are slower when AMD is primary

Solutions:

  1. Some slowdown is expected: vision is still processed on the NVIDIA GPU
  2. Persistent slowness may indicate NVIDIA GPU memory pressure
  3. Monitor both GPUs: rocm-smi (AMD) and nvidia-smi (NVIDIA)

Architecture Diagram

┌─────────────────────────────────────────────────────────────┐
│                         Miku Bot                            │
│                                                             │
│  Discord Messages with Images/Videos                       │
└─────────────────────────────────────────────────────────────┘
                    │
                    ▼
        ┌──────────────────────────────┐
        │  Vision Analysis Handler     │
        │  (image_handling.py)         │
        │                              │
        │ 1. Check NVIDIA health       │
        │ 2. Send to NVIDIA vision     │
        └──────────────────────────────┘
                    │
                    ▼
        ┌──────────────────────────────┐
        │    NVIDIA GPU (llama-swap)   │
        │    Port: 8090                │
        │                              │
        │  Available Models:           │
        │  • vision (MiniCPM-V)        │
        │  • llama3.1                  │
        │  • darkidol                  │
        └──────────────────────────────┘

Vision requests always terminate at the NVIDIA instance above. In
dual-GPU mode, text requests are routed to the AMD instance instead:

        ┌──────────────────────────────┐
        │   AMD GPU (llama-swap-amd)   │
        │   Port: 8091                 │
        │                              │
        │  Available Models:           │
        │  • llama3.1                  │
        │  • darkidol                  │
        │  (NO vision model)           │
        └──────────────────────────────┘

Key Files Changed

  1. bot/utils/llm.py

    • Enhanced get_vision_gpu_url() with documentation
    • Added check_vision_endpoint_health() function
  2. bot/utils/image_handling.py

    • analyze_image_with_vision() - added health check and logging
    • analyze_video_with_vision() - added health check and logging

Expected Behavior After Fix

When NVIDIA is Primary (default)

Image received
→ Check NVIDIA health: OK
→ Send to NVIDIA vision model
→ Analysis complete
✓ Works as before

When AMD is Primary (voice session active)

Image received
→ Check NVIDIA health: OK
→ Send to NVIDIA vision model (even though text uses AMD)
→ Analysis complete
✓ Vision now works correctly!

Next Steps if Issues Persist

  1. Enable debug logging: Set AUTONOMOUS_DEBUG=true in docker-compose
  2. Check Docker networking: docker network inspect miku-discord_default
  3. Verify environment variables: docker compose exec miku-bot env | grep LLAMA
  4. Check model file integrity: ls -lah models/MiniCPM*
  5. Review llama-swap logs: docker compose logs llama-swap -n 100