# Dual GPU Quick Reference

## Quick Start

```bash
# 1. Run setup check
./setup-dual-gpu.sh

# 2. Build AMD container
docker compose build llama-swap-amd

# 3. Start both GPUs
docker compose up -d llama-swap llama-swap-amd

# 4. Verify
curl http://localhost:8090/health   # NVIDIA
curl http://localhost:8091/health   # AMD RX 6800
```

## Endpoints

| GPU | Container | Port | Internal URL |
|-----|-----------|------|--------------|
| NVIDIA | llama-swap | 8090 | http://llama-swap:8080 |
| AMD RX 6800 | llama-swap-amd | 8091 | http://llama-swap-amd:8080 |

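Both services expose an OpenAI-compatible API, so reachability is easy to script. A minimal sketch using the `requests` library (host ports taken from the table above; this helper is illustrative, not part of the repo):

```python
import requests

# Quick reachability check for both llama-swap endpoints (host ports from the table above).
ENDPOINTS = {
    "NVIDIA": "http://localhost:8090",
    "AMD RX 6800": "http://localhost:8091",
}

for name, base in ENDPOINTS.items():
    try:
        health = requests.get(f"{base}/health", timeout=5)
        models = requests.get(f"{base}/v1/models", timeout=5).json()
        ids = [m["id"] for m in models.get("data", [])]
        print(f"{name}: health={health.status_code}, models={ids}")
    except requests.RequestException as exc:
        print(f"{name}: unreachable ({exc})")
```
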
## Models

### NVIDIA GPU (Primary)
- `llama3.1` - Llama 3.1 8B Instruct
- `darkidol` - DarkIdol Uncensored 8B
- `vision` - MiniCPM-V-4.5 (4K context)

### AMD RX 6800 (Secondary)
- `llama3.1-amd` - Llama 3.1 8B Instruct
- `darkidol-amd` - DarkIdol Uncensored 8B
- `moondream-amd` - Moondream2 Vision (2K context)

## Commands

### Start/Stop
```bash
# Start both
docker compose up -d llama-swap llama-swap-amd

# Start only AMD
docker compose up -d llama-swap-amd

# Stop AMD
docker compose stop llama-swap-amd

# Restart AMD with logs
docker compose restart llama-swap-amd && docker compose logs -f llama-swap-amd
```

### Monitoring
```bash
# Container status
docker compose ps

# Logs
docker compose logs -f llama-swap-amd

# GPU usage
watch -n 1 nvidia-smi   # NVIDIA
watch -n 1 rocm-smi     # AMD

# Resource usage
docker stats llama-swap llama-swap-amd
```

### Testing
```bash
# List available models
curl http://localhost:8091/v1/models | jq

# Test text generation (AMD)
curl -X POST http://localhost:8091/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1-amd",
    "messages": [{"role": "user", "content": "Say hello!"}],
    "max_tokens": 20
  }' | jq

# Test vision model (AMD)
curl -X POST http://localhost:8091/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moondream-amd",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image"},
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}
      ]
    }],
    "max_tokens": 100
  }' | jq
```

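For use in scripts, the same call can be made from Python; a minimal sketch with `requests` that mirrors the text-generation curl above:

```python
import requests

# Mirrors the curl text-generation test above, against the AMD endpoint.
resp = requests.post(
    "http://localhost:8091/v1/chat/completions",
    json={
        "model": "llama3.1-amd",
        "messages": [{"role": "user", "content": "Say hello!"}],
        "max_tokens": 20,
    },
    timeout=120,  # first request may trigger a model load
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```
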
## Bot Integration

### Using GPU Router
```python
from bot.utils.gpu_router import get_llama_url_with_load_balancing, get_endpoint_for_model

# Load-balanced text generation
url, model = get_llama_url_with_load_balancing(task_type="text")

# Specific model
url = get_endpoint_for_model("darkidol-amd")

# Vision on AMD
url, model = get_llama_url_with_load_balancing(task_type="vision", prefer_amd=True)
```

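The actual routing logic lives in `bot/utils/gpu_router.py`. As a rough sketch of the idea only (a guess at the behavior, not the real implementation), a round-robin text router with a vision override might look like:

```python
import itertools

# Hypothetical sketch of the router's behavior; the real gpu_router.py may differ.
_TEXT_BACKENDS = itertools.cycle([
    ("http://llama-swap:8080", "llama3.1"),          # NVIDIA
    ("http://llama-swap-amd:8080", "llama3.1-amd"),  # AMD RX 6800
])

def get_llama_url_with_load_balancing(task_type="text", prefer_amd=False):
    """Return a (url, model) pair, alternating text requests across GPUs."""
    if task_type == "vision":
        if prefer_amd:
            return "http://llama-swap-amd:8080", "moondream-amd"
        return "http://llama-swap:8080", "vision"
    return next(_TEXT_BACKENDS)
```
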
### Direct Access
```python
import globals

# AMD GPU
amd_url = globals.LLAMA_AMD_URL     # http://llama-swap-amd:8080

# NVIDIA GPU
nvidia_url = globals.LLAMA_URL      # http://llama-swap:8080
```

## Troubleshooting

### AMD Container Won't Start
```bash
# Check ROCm
rocm-smi

# Check permissions
ls -l /dev/kfd /dev/dri

# Check logs
docker compose logs llama-swap-amd

# Rebuild
docker compose build --no-cache llama-swap-amd
```

### Model Won't Load
```bash
# Check VRAM
rocm-smi --showmeminfo vram

# Lower GPU layers in llama-swap-rocm-config.yaml
# Change: -ngl 99
# To:     -ngl 50
```

### GFX Version Error
```bash
# RX 6800 is gfx1030
# Ensure in docker-compose.yml:
HSA_OVERRIDE_GFX_VERSION=10.3.0
```

## Environment Variables

Add to `docker-compose.yml` under the `miku-bot` service:

```yaml
environment:
  - PREFER_AMD_GPU=true       # Prefer AMD for load balancing
  - AMD_MODELS_ENABLED=true   # Enable AMD models
  - LLAMA_AMD_URL=http://llama-swap-amd:8080
```

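Inside the bot these would typically be read once at startup; a minimal sketch (the variable names come from the YAML above, the defaults are assumptions):

```python
import os

# Hypothetical startup wiring; variable names match the compose environment above.
PREFER_AMD_GPU = os.environ.get("PREFER_AMD_GPU", "false").lower() == "true"
AMD_MODELS_ENABLED = os.environ.get("AMD_MODELS_ENABLED", "false").lower() == "true"
LLAMA_AMD_URL = os.environ.get("LLAMA_AMD_URL", "http://llama-swap-amd:8080")
```
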
## Files

- `Dockerfile.llamaswap-rocm` - ROCm container
- `llama-swap-rocm-config.yaml` - AMD model config
- `bot/utils/gpu_router.py` - Load-balancing utility
- `DUAL_GPU_SETUP.md` - Full documentation
- `setup-dual-gpu.sh` - Setup verification script

## Performance Tips

1. **Model Selection**: Use Q4_K quantization for the best size/quality balance
2. **VRAM**: The RX 6800 has 16 GB, enough for 2-3 Q4 models (see the estimate below)
3. **TTL**: Adjust the model unload timeout in the config files (default 1800 s = 30 min)
4. **Context**: Lower the context size (e.g. `-c 8192`) to save VRAM
5. **GPU Layers**: `-ngl 99` offloads all layers to the GPU; lower it if models don't fit

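A rough back-of-envelope for tip 2, treating Q4_K as roughly 4.5 bits per weight and adding an assumed allowance for the KV cache and buffers:

```python
# Back-of-envelope VRAM estimate for one 8B model at Q4_K (all figures approximate).
params = 8e9                  # 8B parameters
bits_per_weight = 4.5         # Q4_K averages ~4.5 bits/weight
weights_gb = params * bits_per_weight / 8 / 1e9   # ~4.5 GB of weights
overhead_gb = 1.5             # assumed KV cache + buffers at moderate context
per_model_gb = weights_gb + overhead_gb           # ~6 GB total
print(f"~{per_model_gb:.0f} GB per model; {int(16 / per_model_gb)} fit comfortably in 16 GB")
```
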
## Support

- ROCm Docs: https://rocmdocs.amd.com/
- llama.cpp: https://github.com/ggml-org/llama.cpp
- llama-swap: https://github.com/mostlygeek/llama-swap