moved AI generated readmes to readme folder (may delete)

2026-01-27 19:57:48 +02:00
parent 0f1c30f757
commit c58b941587
34 changed files with 8709 additions and 770 deletions
--- a/readmes/STT_FIX_COMPLETE.md
+++ b/readmes/STT_FIX_COMPLETE.md
@@ -0,0 +1,192 @@
+# STT Fix Applied - Ready for Testing
+
+## Summary
+
+Fixed all three issues preventing the ONNX-based Parakeet STT from working:
+
+1. ✅ **CUDA Support**: Updated Docker base image to include cuDNN 9
+2. ✅ **Port Configuration**: Fixed bot to connect to port 8766 (the old port was hardcoded in two places)
+3. ✅ **Protocol Compatibility**: Updated event handler for new ONNX format
+
+---
+
+## Files Modified
+
+### 1. `stt-parakeet/Dockerfile`
+```diff
+- FROM nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04
+ FROM nvidia/cuda:12.6.2-cudnn-runtime-ubuntu22.04
+```
+
+### 2. `bot/utils/stt_client.py`
+```diff
+- stt_url: str = "ws://miku-stt:8000/ws/stt"
+ stt_url: str = "ws://miku-stt:8766/ws/stt"
+```
+
+Added new methods:
+- `send_final()` - Request final transcription
+- `send_reset()` - Clear audio buffer
+
+Updated `_handle_event()` to support:
+- New ONNX protocol: `{"type": "transcript", "is_final": true/false}`
+- Legacy protocol: `{"type": "partial"}`, `{"type": "final"}` (backward compatibility)
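Annotation on the `_handle_event()` change: the dual-protocol handling listed above could be sketched as a small normalizing function. This is only an illustration, not the code from `bot/utils/stt_client.py`. The `Transcript` type, the `parse_stt_event` name, and the `text` field are assumptions; only the `type` values and the `is_final` flag come from the README.

```python
import json
from typing import NamedTuple, Optional


class Transcript(NamedTuple):
    """Protocol-independent view of one transcription event (hypothetical type)."""
    text: str
    is_final: bool


def parse_stt_event(raw: str) -> Optional[Transcript]:
    """Normalize a JSON event from either protocol into a Transcript.

    Returns None for event types this sketch does not recognize.
    """
    event = json.loads(raw)
    kind = event.get("type")
    text = event.get("text", "")  # assumed field name, not confirmed by the README

    if kind == "transcript":  # new ONNX protocol: finality is carried in a flag
        return Transcript(text, bool(event.get("is_final", False)))
    if kind == "partial":  # legacy protocol: finality is carried in the type
        return Transcript(text, False)
    if kind == "final":  # legacy protocol
        return Transcript(text, True)
    return None
```

Mapping both formats onto one shape keeps the rest of the handler free of protocol checks, which is one way to get the backward compatibility the README mentions.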
+
+### 3. `bot/utils/voice_receiver.py` ⚠️ **KEY FIX**
+```diff
+- def __init__(self, voice_manager, stt_url: str = "ws://miku-stt:8000/ws/stt"):
+ def __init__(self, voice_manager, stt_url: str = "ws://miku-stt:8766/ws/stt"):
+```
+
+**This was the missing piece.** `voice_receiver` supplies its own `stt_url` default, which overrode the default in `stt_client.py`, so fixing the client alone was not enough.
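Annotation on the key fix: a minimal sketch of how a caller's default argument can shadow the callee's default, which is presumably what happened here. Both class names are hypothetical stand-ins, and the assumption that the receiver passes its URL straight to the client is inferred from the README, not taken from the actual code.

```python
class STTClient:  # hypothetical stand-in for the client in bot/utils/stt_client.py
    def __init__(self, stt_url: str = "ws://miku-stt:8766/ws/stt"):
        self.stt_url = stt_url


class VoiceReceiver:  # hypothetical stand-in for bot/utils/voice_receiver.py
    # Pre-fix signature: the default still points at the old port.
    def __init__(self, voice_manager, stt_url: str = "ws://miku-stt:8000/ws/stt"):
        self.voice_manager = voice_manager
        # The URL is passed explicitly, so STTClient's own default is never used.
        self.client = STTClient(stt_url=stt_url)


receiver = VoiceReceiver(voice_manager=None)
print(receiver.client.stt_url)  # ws://miku-stt:8000/ws/stt
```

Defining the URL once (for example as a shared module constant that both signatures reference) would avoid having two defaults drift apart again.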
+
+---
+
+## Container Status
+
+### STT Container ✅
+```bash
+$ docker logs miku-stt 2>&1 | tail -10
+```
+```
+CUDA Version 12.6.2
+INFO:asr.asr_pipeline:Providers: [('CUDAExecutionProvider', ...)]
+INFO:asr.asr_pipeline:Model loaded successfully
+INFO:__main__:Server running on ws://0.0.0.0:8766
+INFO:__main__:Active connections: 0
+```
+
+**Status**: ✅ Running with CUDA acceleration
+
+### Bot Container ✅
+- Files copied directly into running container (faster than rebuild)
+- Python bytecode cache cleared
+- Container restarted
+
+---
+
+## Testing Instructions
+
+### Test 1: Basic Connection
+1. Join a voice channel in Discord
+2. Run `!miku listen`
+3. **Expected**: Bot connects without "Connection Refused" error
+4. **Check logs**: `docker logs miku-bot 2>&1 | grep "STT"`
+
+### Test 2: Transcription
+1. After running `!miku listen`, speak into your microphone
+2. **Expected**: Your speech is transcribed
+3. **Check STT logs**: `docker logs miku-stt 2>&1 | tail -20`
+4. **Check bot logs**: Look for "Partial transcript" or "Final transcript" messages
+
+### Test 3: Performance
+1. Monitor GPU usage: `nvidia-smi -l 1`
+2. **Expected**: GPU utilization increases when transcribing
+3. **Expected**: Transcription completes in ~0.5-1 second
+
+---
+
+## Monitoring Commands
+
+### Check Both Containers
+```bash
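+# `docker logs` accepts exactly one container, so follow each in its own terminal
+docker logs -f --tail=50 miku-bot
+docker logs -f --tail=50 miku-stt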
+```
+
+### Check STT Service Health
+```bash
+docker ps | grep miku-stt
+docker logs miku-stt 2>&1 | grep "CUDA\|Providers\|Server running"
+```
+
+### Check for Errors
+```bash
+# Bot errors
+docker logs miku-bot 2>&1 | grep -i "error\|failed" | tail -20
+
+# STT errors
+docker logs miku-stt 2>&1 | grep -i "error\|failed" | tail -20
+```
+
+### Test WebSocket Connection
+```bash
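+# From the host machine. A healthy server should reply "HTTP/1.1 101 Switching Protocols".
+# The key must be a base64-encoded 16-byte value; this one is the RFC 6455 sample nonce.
+curl -i -N \
+  -H "Connection: Upgrade" \
+  -H "Upgrade: websocket" \
+  -H "Sec-WebSocket-Version: 13" \
+  -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
+  http://localhost:8766/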
+```
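Annotation on the WebSocket test: the same handshake check can be scripted with the Python standard library, which also verifies the `Sec-WebSocket-Accept` value that `curl` only prints. This is a sketch under assumptions: the function names are made up for illustration, the host, port, and path mirror the `curl` example, and a single `recv` is assumed to return the whole response header.

```python
import base64
import hashlib
import os
import socket

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed GUID defined by RFC 6455


def expected_accept(key: str) -> str:
    """Compute the Sec-WebSocket-Accept value a compliant server must return."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def check_handshake(host: str = "localhost", port: int = 8766, path: str = "/") -> bool:
    """Send an upgrade request; True if the server answers 101 with a matching accept key."""
    key = base64.b64encode(os.urandom(16)).decode("ascii")  # random 16-byte nonce
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Key: {key}\r\n"
        "Sec-WebSocket-Version: 13\r\n\r\n"
    )
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(request.encode("ascii"))
        response = sock.recv(4096).decode("latin-1")

    # Split the status line from the headers and index headers case-insensitively.
    lines = response.split("\r\n\r\n", 1)[0].split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    is_101 = lines[0].split(" ")[1:2] == ["101"]
    return is_101 and headers.get("sec-websocket-accept") == expected_accept(key)
```

Calling `check_handshake()` on the host raises `ConnectionRefusedError` if nothing is listening on the port, and returns `False` if something answers without completing a valid WebSocket upgrade.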
+
+---
+
+## Known Issues & Workarounds
+
+### Issue: Bot Still Shows Old Errors
+**Symptom**: After restart, logs still show port 8000 errors
+
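+**Cause**: `docker logs` also replays entries written before the restart, or stale Python bytecode is still cached
+
+**Solution**:
+```bash
+# Clear cache and restart
+docker exec miku-bot find /app -name "*.pyc" -delete
+docker restart miku-bot
+
+# Wait 10 seconds for full restart
+sleep 10
+
+# Show only entries written after the restart
+docker logs --since 1m miku-bot 2>&1 | grep -i "error\|failed"
+```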
+
+### Issue: Container Rebuild Takes 15+ Minutes
+**Cause**: `playwright install` downloads the Chromium and Firefox browsers (~500MB)
+
+**Workaround**: Instead of full rebuild, use `docker cp`:
+```bash
+docker cp bot/utils/stt_client.py miku-bot:/app/utils/stt_client.py
+docker cp bot/utils/voice_receiver.py miku-bot:/app/utils/voice_receiver.py
+docker restart miku-bot
+```
+
+---
+
+## Next Steps
+
+### For Full Deployment (after testing)
+1. Rebuild bot container properly:
+   ```bash
+   docker-compose build miku-bot
+   docker-compose up -d miku-bot
+   ```
+
+2. Retire the old STT directory by renaming it (this keeps a backup):
+   ```bash
+   mv stt stt.backup
+   ```
+
+3. Update documentation to reflect new architecture
+
+### Optional Enhancements
+1. Add `send_final()` call when user stops speaking (VAD integration)
+2. Implement progressive transcription display
+3. Add transcription quality metrics/logging
+4. Test with multiple simultaneous users
+
+---
+
+## Quick Reference
+
+| Property | Old (NeMo) | New (ONNX) |
+|----------|------------|------------|
+| **Port** | 8000 | 8766 |
+| **VRAM** | 4-5 GB | 2-3 GB |
+| **Transcription time** | 2-3 s | 0.5-1 s |
+| **cuDNN** | 8 | 9 |
+| **CUDA** | 12.1 | 12.6.2 |
+| **Protocol** | Auto VAD | Manual control |
+
+---
+
+**Status**: ✅ **ALL FIXES APPLIED - READY FOR USER TESTING**
+
+Last Updated: January 18, 2026 20:47 EET