# Vision Model Dual-GPU Fix - Summary

## Problem

The vision model (MiniCPM-V) wasn't working when the AMD GPU was set as the primary GPU for text inference.

## Root Cause

Although `get_vision_gpu_url()` was correctly hardcoded to always use NVIDIA, there was:

1. No health checking before attempting requests
2. No detailed error logging to understand failures
3. No timeout specification (requests could hang indefinitely)
4. No verification that the NVIDIA GPU was actually responsive

When AMD became primary, if the NVIDIA GPU had issues, vision requests would fail silently with poor error reporting.

## Solution Implemented

### 1. Enhanced GPU Routing (`bot/utils/llm.py`)

```python
def get_vision_gpu_url():
    """Always use NVIDIA for vision, even when AMD is primary for text"""
    # Added clear documentation
    # Added debug logging when switching occurs
    # Returns the NVIDIA URL unconditionally
```

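As a rough illustration of the routing rule described above — the constant name and the `get_current_gpu()` helper are assumptions for this sketch, not the actual implementation:

```python
import logging

logger = logging.getLogger(__name__)

# Assumed endpoint constant; the real module may define this differently.
NVIDIA_LLAMA_URL = "http://llama-swap:8080"

def get_current_gpu() -> str:
    """Placeholder for however the bot tracks the primary GPU ('amd' or 'nvidia')."""
    return "amd"

def get_vision_gpu_url() -> str:
    """Always use NVIDIA for vision, even when AMD is primary for text."""
    if get_current_gpu() == "amd":
        # Surface the switch in the logs so routing is visible when debugging.
        logger.debug("Primary GPU is AMD for text, but using NVIDIA for vision model")
    return NVIDIA_LLAMA_URL  # unconditional: the AMD backend has no vision model
```

The point of the unconditional return is that no GPU-state change can ever route a vision request to a backend that cannot serve it.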
### 2. Added Health Check (`bot/utils/llm.py`)

```python
async def check_vision_endpoint_health():
    """Verify NVIDIA vision endpoint is responsive before use"""
    # Pings http://llama-swap:8080/health
    # Returns (is_healthy: bool, error_message: Optional[str])
    # Logs status for debugging
```

### 3. Improved Image Analysis (`bot/utils/image_handling.py`)

**Before request:**
- Health check
- Detailed logging of endpoint, model, and image size

**During request:**
- 60-second timeout (previously unlimited)
- Endpoint URL included in error messages

**After an error:**
- Full exception traceback in logs
- Endpoint information in the error response

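The during/after pattern above can be sketched as a request wrapper; the endpoint path and payload shape are assumptions based on an OpenAI-compatible server, not the actual code:

```python
import json
import logging
import urllib.request

logger = logging.getLogger(__name__)

VISION_TIMEOUT_S = 60  # bounded, where the request was previously unlimited

def post_vision_request(endpoint: str, payload: dict,
                        timeout: float = VISION_TIMEOUT_S) -> dict:
    """POST a vision request with a hard timeout and endpoint-tagged errors."""
    logger.info("Sending vision request to %s using model: %s",
                endpoint, payload.get("model"))
    req = urllib.request.Request(
        f"{endpoint}/v1/chat/completions",  # assumed OpenAI-compatible route
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.loads(resp.read())
    except Exception as exc:
        # Log the full traceback and keep the endpoint in the raised error
        # so failures are attributable to a specific backend.
        logger.exception("Vision request to %s failed", endpoint)
        raise RuntimeError(f"Vision request to {endpoint} failed: {exc}") from exc
```

Embedding the endpoint in both the log line and the raised error is what turns a silent failure into an attributable one.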
### 4. Improved Video Analysis (`bot/utils/image_handling.py`)

**Before request:**
- Health check
- Logging of media type and frame count

**During request:**
- 120-second timeout (longer, to allow for multiple frames)
- Endpoint URL included in error messages

**After an error:**
- Full exception traceback in logs
- Endpoint information in the error response

## Key Changes

| File | Function | Changes |
|------|----------|---------|
| `bot/utils/llm.py` | `get_vision_gpu_url()` | Added documentation and debug logging |
| `bot/utils/llm.py` | `check_vision_endpoint_health()` | NEW: health check function |
| `bot/utils/image_handling.py` | `analyze_image_with_vision()` | Added health check, timeout, detailed logging |
| `bot/utils/image_handling.py` | `analyze_video_with_vision()` | Added health check, timeout, detailed logging |

## Testing

Quick test to verify the vision model works when AMD is primary:

```bash
# 1. Check that the GPU state is AMD
cat bot/memory/gpu_state.json
# Should show: {"current_gpu": "amd", ...}

# 2. Send an image to Discord
# (the bot should analyze it with the vision model)

# 3. Check the logs for success
docker compose logs miku-bot 2>&1 | grep -i "vision"
# Should see: "Vision analysis completed successfully"
```

## Expected Log Output

### When Working Correctly

```
[INFO] Primary GPU is AMD for text, but using NVIDIA for vision model
[INFO] Vision endpoint (http://llama-swap:8080) health check: OK
[INFO] Sending vision request to http://llama-swap:8080 using model: vision
[INFO] Vision analysis completed successfully
```

### If the NVIDIA Vision Endpoint Is Down

```
[WARNING] Vision endpoint (http://llama-swap:8080) health check failed: status 503
[WARNING] Vision endpoint unhealthy: Status 503
[ERROR] Vision service currently unavailable: Status 503
```

### If a Network Timeout Occurs

```
[ERROR] Vision endpoint (http://llama-swap:8080) health check: timeout
[WARNING] Vision endpoint unhealthy: Endpoint timeout
[ERROR] Vision service currently unavailable: Endpoint timeout
```

## Architecture Reminder

- **NVIDIA GPU** (port 8090): vision + text models
- **AMD GPU** (port 8091): text models ONLY
- When AMD is primary: text goes to AMD, vision goes to NVIDIA
- When NVIDIA is primary: everything goes to NVIDIA

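The routing rules above can be expressed as a small pure function. The endpoint URLs here are assumptions (the AMD container name in particular is hypothetical); only the routing rule itself comes from the summary:

```python
# Assumed internal endpoints; the AMD container name is hypothetical.
NVIDIA_URL = "http://llama-swap:8080"
AMD_URL = "http://llama-swap-amd:8080"

def route_request(kind: str, current_gpu: str) -> str:
    """Pick a backend: text follows the primary GPU, vision is pinned to NVIDIA."""
    if kind == "vision":
        return NVIDIA_URL  # the AMD backend serves text models only
    return AMD_URL if current_gpu == "amd" else NVIDIA_URL
```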
## Files Modified

1. `/home/koko210Serve/docker/miku-discord/bot/utils/llm.py`
2. `/home/koko210Serve/docker/miku-discord/bot/utils/image_handling.py`

## Files Created

1. `/home/koko210Serve/docker/miku-discord/VISION_MODEL_DEBUG.md` - complete debugging guide

## Deployment Notes

No changes are needed to:

- Docker containers
- Environment variables
- Configuration files
- Database or state files

Just update the code and restart the bot:

```bash
docker compose restart miku-bot
```

## Success Criteria

- ✅ Images are analyzed when the AMD GPU is primary
- ✅ Detailed error messages if the vision endpoint fails
- ✅ Health check prevents hanging requests
- ✅ Logs show that NVIDIA is correctly used for vision
- ✅ No performance degradation compared to before