# Vision Model Debugging Guide

## Issue Summary

The vision model stops working when the AMD GPU is set as the primary GPU for text inference.

## Root Cause Analysis

The vision model (MiniCPM-V) should **always run on the NVIDIA GPU**, even when AMD is the primary GPU for text models. This is because:

1. **Separate GPU design**: Each GPU has its own llama-swap instance
   - `llama-swap` (NVIDIA) on port 8090 → handles `vision`, `llama3.1`, `darkidol`
   - `llama-swap-amd` (AMD) on port 8091 → handles `llama3.1`, `darkidol` (text models only)

2. **Vision model location**: The vision model is **ONLY configured on NVIDIA**
   - Check: `llama-swap-config.yaml` (has the vision model)
   - Check: `llama-swap-rocm-config.yaml` (does NOT have the vision model)
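
You can confirm this split at runtime by asking each instance which models it serves. This sketch assumes both instances expose the OpenAI-compatible `/v1/models` route, that it runs inside the Docker network, and that the AMD instance also listens on 8080 internally:

```python
# Quick check: which models does each llama-swap instance serve?
# Hostnames/ports are in-network assumptions based on this guide.
import json
import urllib.request

for name, url in [
    ("NVIDIA", "http://llama-swap:8080/v1/models"),
    ("AMD", "http://llama-swap-amd:8080/v1/models"),
]:
    with urllib.request.urlopen(url, timeout=10) as resp:
        models = [m["id"] for m in json.load(resp)["data"]]
    print(f"{name}: {models}")

# "vision" should appear only in the NVIDIA list.
```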

## Fixes Applied

### 1. Improved GPU Routing (`bot/utils/llm.py`)

**Function**: `get_vision_gpu_url()`
- Now explicitly returns the NVIDIA URL regardless of the primary text GPU
- Added debug logging when the text GPU is AMD
- Added clear documentation of the routing strategy

**New Function**: `check_vision_endpoint_health()`
- Pings the NVIDIA vision endpoint before attempting requests
- Provides detailed error messages if the endpoint is unreachable
- Logs health status for troubleshooting (a sketch of both helpers follows)
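
A minimal sketch of what these two helpers could look like. The environment variable name and the `httpx` dependency are assumptions for illustration, not the actual code:

```python
# Hypothetical sketch of the routing helpers in bot/utils/llm.py.
import logging
import os

import httpx

logger = logging.getLogger(__name__)

# NVIDIA llama-swap endpoint as used elsewhere in this guide
# (8090 is presumably the host-published port; 8080 the container port).
NVIDIA_URL = os.getenv("LLAMA_NVIDIA_URL", "http://llama-swap:8080")


def get_vision_gpu_url(text_gpu: str = "nvidia") -> str:
    """Vision always routes to NVIDIA, even when AMD is the primary text GPU."""
    if text_gpu == "amd":
        logger.debug("Primary text GPU is AMD; routing vision to NVIDIA anyway")
    return NVIDIA_URL


async def check_vision_endpoint_health(timeout: float = 5.0) -> bool:
    """Ping the NVIDIA vision endpoint before sending a real request."""
    try:
        async with httpx.AsyncClient(timeout=timeout) as client:
            resp = await client.get(f"{NVIDIA_URL}/health")
            resp.raise_for_status()
        logger.info("Vision endpoint healthy: %s", NVIDIA_URL)
        return True
    except httpx.HTTPError as exc:
        logger.error("Vision endpoint unreachable at %s: %s", NVIDIA_URL, exc)
        return False
```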

### 2. Enhanced Vision Analysis (`bot/utils/image_handling.py`)

**Function**: `analyze_image_with_vision()`
- Added health check before processing
- Increased timeout to 60 seconds (from the default)
- Logs endpoint URL, model name, and detailed error messages
- Added exception info logging for better debugging

**Function**: `analyze_video_with_vision()`
- Added health check before processing
- Increased timeout to 120 seconds (from the default)
- Logs media type, frame count, and detailed error messages
- Added exception info logging for better debugging
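
The image path then looks roughly like the sketch below. The payload shape matches the curl example in the testing section; the exact signature, helper names, and `httpx` usage are assumptions:

```python
# Hypothetical sketch of the check-then-request flow in
# bot/utils/image_handling.py; not the actual implementation.
import base64
import logging

import httpx

from bot.utils.llm import check_vision_endpoint_health, get_vision_gpu_url

logger = logging.getLogger(__name__)


async def analyze_image_with_vision(image_bytes: bytes, prompt: str) -> str:
    # Fail fast with a clear message instead of hanging on a dead endpoint.
    if not await check_vision_endpoint_health():
        raise RuntimeError("Vision service currently unavailable: Endpoint timeout")

    url = get_vision_gpu_url()
    logger.info("Sending vision request to %s (model=vision)", url)

    data_url = "data:image/jpeg;base64," + base64.b64encode(image_bytes).decode()
    payload = {
        "model": "vision",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }],
        "max_tokens": 300,
    }

    # 60 s timeout: long enough to cover llama-swap loading the model on demand.
    async with httpx.AsyncClient(timeout=60.0) as client:
        resp = await client.post(f"{url}/v1/chat/completions", json=payload)
        resp.raise_for_status()

    logger.info("Vision analysis completed successfully")
    return resp.json()["choices"][0]["message"]["content"]
```

`analyze_video_with_vision()` presumably follows the same pattern, with the longer 120-second timeout and one image part per extracted frame.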

## Testing the Fix

### 1. Verify Docker Containers

```bash
# Check both llama-swap services are running
docker compose ps

# Expected output includes:
#   llama-swap      (port 8090)
#   llama-swap-amd  (port 8091)
```

### 2. Test NVIDIA Endpoint Health

```bash
# Check if the NVIDIA vision endpoint is responsive.
# Run this from a container on the Docker network (the hostname
# "llama-swap" does not resolve from the host).
curl -f http://llama-swap:8080/health

# Should return 200 OK
```

### 3. Test Vision Request to NVIDIA

```bash
# Send a simple vision request directly
curl -X POST http://llama-swap:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vision",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}
      ]
    }],
    "max_tokens": 100
  }'
```
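
If you prefer to test from Python (and avoid hand-assembling the base64 payload), a script along these lines should work. The image path is a placeholder, and it must run on the same Docker network:

```python
# Standalone test: send a local image to the NVIDIA vision endpoint.
import base64
import json
import urllib.request

with open("test.jpg", "rb") as f:  # placeholder path
    b64 = base64.b64encode(f.read()).decode()

payload = {
    "model": "vision",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }],
    "max_tokens": 100,
}

req = urllib.request.Request(
    "http://llama-swap:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Generous timeout in case llama-swap has to load the model first.
with urllib.request.urlopen(req, timeout=120) as resp:
    print(json.load(resp)["choices"][0]["message"]["content"])
```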

### 4. Check GPU State File

```bash
# Verify which GPU is primary
cat bot/memory/gpu_state.json

# Should show:
#   {"current_gpu": "amd", "reason": "..."}      when AMD is primary
#   {"current_gpu": "nvidia", "reason": "..."}   when NVIDIA is primary
```
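
For context, text routing might consume this file roughly as sketched below, while vision routing deliberately ignores it. The URL map and helper name are assumptions:

```python
# Hypothetical sketch: text routing follows gpu_state.json; vision does not.
import json
from pathlib import Path

STATE_FILE = Path("bot/memory/gpu_state.json")

TEXT_URLS = {
    "nvidia": "http://llama-swap:8080",
    "amd": "http://llama-swap-amd:8080",  # in-network AMD port is an assumption
}


def get_text_gpu_url() -> str:
    """Pick the text endpoint from the persisted GPU state."""
    state = json.loads(STATE_FILE.read_text())
    return TEXT_URLS[state["current_gpu"]]
```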

### 5. Monitor Logs During Vision Request

```bash
# Watch bot logs during image analysis
docker compose logs -f miku-bot 2>&1 | grep -i vision

# Should see:
#   "Sending vision request to http://llama-swap:8080"
#   "Vision analysis completed successfully"
# or detailed error messages if something is wrong
```

## Troubleshooting Steps

### Issue: Vision endpoint health check fails

**Symptoms**: "Vision service currently unavailable: Endpoint timeout"

**Solutions**:
1. Verify the NVIDIA container is running: `docker compose ps llama-swap`
2. Check NVIDIA GPU memory: `nvidia-smi` (there should be free VRAM)
3. Check if the vision model is loaded: `docker compose logs llama-swap`
4. Increase the timeout if the model is loading slowly (one retry approach is sketched below)
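
For item 4, instead of one very large timeout, the health check could be retried with exponential backoff while llama-swap loads the model. A minimal sketch using the `check_vision_endpoint_health()` helper described above; this retry policy is an assumption, not the bot's current behavior:

```python
# Hypothetical retry-with-backoff wrapper around the health check,
# tolerating slow model loads without a single enormous timeout.
import asyncio

from bot.utils.llm import check_vision_endpoint_health


async def wait_for_vision_endpoint(retries: int = 5, base_delay: float = 2.0) -> bool:
    for attempt in range(retries):
        if await check_vision_endpoint_health():
            return True
        await asyncio.sleep(base_delay * 2 ** attempt)  # 2 s, 4 s, 8 s, ...
    return False
```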

### Issue: Vision requests time out (status 408/504)

**Symptoms**: Requests hang or return timeout errors

**Solutions**:
1. Check the NVIDIA GPU is not overloaded: `nvidia-smi`
2. Check if the vision model is already running: look for MiniCPM processes
3. Restart llama-swap if the model is stuck: `docker compose restart llama-swap`
4. Check available VRAM: MiniCPM-V needs roughly 4-6 GB

### Issue: Vision model returns "No description"

**Symptoms**: Image analysis returns empty or generic responses

**Solutions**:
1. Check that the vision model loaded correctly: `docker compose logs llama-swap`
2. Verify the model file exists: `/models/MiniCPM-V-4_5-Q3_K_S.gguf`
3. Check that the mmproj file loaded: `/models/MiniCPM-V-4_5-mmproj-f16.gguf`
4. Test with a direct request (see "Test Vision Request to NVIDIA" above) to confirm the model works

### Issue: AMD GPU affects vision performance

**Symptoms**: Vision requests are slower when AMD is primary

**Solutions**:
1. Some slowdown is expected behavior: NVIDIA still processes every vision request
2. Persistent slowness could indicate NVIDIA GPU memory pressure
3. Monitor both GPUs: `rocm-smi` (AMD) and `nvidia-smi` (NVIDIA)

## Architecture Diagram

```
┌────────────────────────────────────────────────┐
│                    Miku Bot                    │
│                                                │
│      Discord Messages with Images/Videos       │
└────────────────────────────────────────────────┘
                        │
                        ▼
         ┌──────────────────────────────┐
         │   Vision Analysis Handler    │
         │     (image_handling.py)      │
         │                              │
         │  1. Check NVIDIA health      │
         │  2. Send to NVIDIA vision    │
         └──────────────────────────────┘
                        │
               ┌────────┴────────────────────────┐
               │                                 │
               ▼ (Vision only)                   ▼ (Text only in dual-GPU mode)
┌──────────────────────────────┐  ┌──────────────────────────────┐
│  NVIDIA GPU (llama-swap)     │  │  AMD GPU (llama-swap-amd)    │
│  Port: 8090                  │  │  Port: 8091                  │
│                              │  │                              │
│  Available Models:           │  │  Available Models:           │
│  • vision (MiniCPM-V)        │  │  • llama3.1                  │
│  • llama3.1                  │  │  • darkidol                  │
│  • darkidol                  │  │  (NO vision model)           │
└──────────────────────────────┘  └──────────────────────────────┘
```

## Key Files Changed

1. **bot/utils/llm.py**
   - Enhanced `get_vision_gpu_url()` with documentation
   - Added `check_vision_endpoint_health()` function

2. **bot/utils/image_handling.py**
   - `analyze_image_with_vision()`: added health check and logging
   - `analyze_video_with_vision()`: added health check and logging

## Expected Behavior After Fix

### When NVIDIA is Primary (default)

```
Image received
→ Check NVIDIA health: OK
→ Send to NVIDIA vision model
→ Analysis complete
✓ Works as before
```

### When AMD is Primary (voice session active)

```
Image received
→ Check NVIDIA health: OK
→ Send to NVIDIA vision model (even though text uses AMD)
→ Analysis complete
✓ Vision now works correctly!
```
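
A one-liner sanity check for this invariant, using the hypothetical `get_vision_gpu_url()` signature from the routing sketch earlier:

```python
# The vision URL must not depend on which GPU is primary for text.
from bot.utils.llm import get_vision_gpu_url

assert get_vision_gpu_url(text_gpu="amd") == get_vision_gpu_url(text_gpu="nvidia")
print("Vision always routes to:", get_vision_gpu_url(text_gpu="amd"))
```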

## Next Steps if Issues Persist

1. Enable debug logging: set `AUTONOMOUS_DEBUG=true` in docker-compose
2. Check Docker networking: `docker network inspect miku-discord_default`
3. Verify environment variables: `docker compose exec miku-bot env | grep LLAMA`
4. Check model file integrity: `ls -lah models/MiniCPM*`
5. Review llama-swap logs: `docker compose logs llama-swap -n 100`
|