moved AI generated readmes to readme folder (may delete)
readmes/STT_FIX_COMPLETE.md
@@ -0,0 +1,192 @@
# STT Fix Applied - Ready for Testing

## Summary

Fixed all three issues preventing the ONNX-based Parakeet STT from working:

1. ✅ **CUDA Support**: Updated Docker base image to include cuDNN 9
2. ✅ **Port Configuration**: Fixed bot to connect to port 8766 (found TWO places)
3. ✅ **Protocol Compatibility**: Updated event handler for new ONNX format

---

## Files Modified

### 1. `stt-parakeet/Dockerfile`
```diff
- FROM nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04
+ FROM nvidia/cuda:12.6.2-cudnn-runtime-ubuntu22.04
```

### 2. `bot/utils/stt_client.py`
```diff
- stt_url: str = "ws://miku-stt:8000/ws/stt"
+ stt_url: str = "ws://miku-stt:8766/ws/stt"
```

Added new methods:
- `send_final()` - Request final transcription
- `send_reset()` - Clear audio buffer

Updated `_handle_event()` to support:
- New ONNX protocol: `{"type": "transcript", "is_final": true/false}`
- Legacy protocol: `{"type": "partial"}`, `{"type": "final"}` (backward compatibility)
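
As a rough illustration of handling both message shapes, the sketch below maps each event to a `(text, is_final)` pair. It is not the actual `_handle_event()` code: the function name `parse_stt_event` is hypothetical, and the `text` field is an assumption, since only the `type` and `is_final` keys are documented here.

```python
import json
from typing import Optional, Tuple


def parse_stt_event(raw: str) -> Optional[Tuple[str, bool]]:
    """Return (text, is_final) for a transcript event, or None for other events."""
    event = json.loads(raw)
    event_type = event.get("type")
    text = event.get("text", "")  # assumed field name, not confirmed by the source

    if event_type == "transcript":
        # New ONNX protocol: one event type, finality carried by "is_final"
        return text, bool(event.get("is_final", False))
    if event_type in ("partial", "final"):
        # Legacy protocol: finality carried by the event type itself
        return text, event_type == "final"
    return None  # unrelated event (errors, status messages, ...)


print(parse_stt_event('{"type": "transcript", "text": "hello", "is_final": true}'))
print(parse_stt_event('{"type": "partial", "text": "hel"}'))
```

Collapsing both protocols into one return shape keeps the rest of the bot independent of which STT server it is talking to.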

### 3. `bot/utils/voice_receiver.py` ⚠️ **KEY FIX**
```diff
- def __init__(self, voice_manager, stt_url: str = "ws://miku-stt:8000/ws/stt"):
+ def __init__(self, voice_manager, stt_url: str = "ws://miku-stt:8766/ws/stt"):
```
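
**This was the missing piece!** `voice_receiver.py` carries its own default `stt_url`, which overrode the default in `stt_client.py`, so the bot kept connecting to port 8000 until both defaults were updated.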
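
The effect can be reproduced with two simplified stand-in classes. `ClientSketch` and `ReceiverSketch` are hypothetical names, not the real classes; the only assumption is that the receiver passes its own `stt_url` down to the client it creates.

```python
NEW_URL = "ws://miku-stt:8766/ws/stt"
OLD_URL = "ws://miku-stt:8000/ws/stt"


class ClientSketch:
    """Stand-in for the STT client: its default already points at the new port."""

    def __init__(self, stt_url: str = NEW_URL):
        self.stt_url = stt_url


class ReceiverSketch:
    """Stand-in for the voice receiver: it still carries the stale default."""

    def __init__(self, stt_url: str = OLD_URL):
        self.stt_url = stt_url

    def make_client(self) -> ClientSketch:
        # Passing stt_url explicitly means the client's own default is never used.
        return ClientSketch(stt_url=self.stt_url)


print(ClientSketch().stt_url)                  # client default: port 8766
print(ReceiverSketch().make_client().stt_url)  # receiver default wins: port 8000
```

Keeping the URL in a single shared constant or environment variable would avoid having two defaults that can drift apart.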

---

## Container Status

### STT Container ✅
```bash
$ docker logs miku-stt 2>&1 | tail -10
```
```
CUDA Version 12.6.2
INFO:asr.asr_pipeline:Providers: [('CUDAExecutionProvider', ...)]
INFO:asr.asr_pipeline:Model loaded successfully
INFO:__main__:Server running on ws://0.0.0.0:8766
INFO:__main__:Active connections: 0
```

**Status**: ✅ Running with CUDA acceleration

### Bot Container ✅
- Files copied directly into the running container (faster than a rebuild; the changes are lost if the container is recreated)
- Python bytecode cache cleared
- Container restarted

---

## Testing Instructions

### Test 1: Basic Connection
1. Join a voice channel in Discord
2. Run `!miku listen`
3. **Expected**: Bot connects without "Connection Refused" error
4. **Check logs**: `docker logs miku-bot 2>&1 | grep "STT"`
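
If "Connection Refused" still appears, a plain TCP check can show whether anything is listening on the STT port before looking at the bot itself. This is a minimal sketch using only the standard library; the helper name `port_is_open` is hypothetical, and `localhost` assumes port 8766 is published to the host (inside the Docker network the host name would be `miku-stt`).

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures
        return False


# Assumed host and port; adjust to where the STT container is reachable.
print(port_is_open("localhost", 8766))
```

A `True` result only shows that the port accepts connections; it does not confirm that the WebSocket handshake succeeds.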

### Test 2: Transcription
1. After running `!miku listen`, speak into your microphone
2. **Expected**: Your speech is transcribed
3. **Check STT logs**: `docker logs miku-stt 2>&1 | tail -20`
4. **Check bot logs**: Look for "Partial transcript" or "Final transcript" messages

### Test 3: Performance
1. Monitor GPU usage: `nvidia-smi -l 1`
2. **Expected**: GPU utilization increases when transcribing
3. **Expected**: Transcription completes in ~0.5-1 second

---

## Monitoring Commands

### Check Both Containers
```bash
docker logs -f --tail=50 miku-bot   # docker logs accepts one container; run each in its own terminal
docker logs -f --tail=50 miku-stt
```

### Check STT Service Health
```bash
docker ps | grep miku-stt
docker logs miku-stt 2>&1 | grep "CUDA\|Providers\|Server running"
```

### Check for Errors
```bash
# Bot errors
docker logs miku-bot 2>&1 | grep -i "error\|failed" | tail -20

# STT errors
docker logs miku-stt 2>&1 | grep -i "error\|failed" | tail -20
```

### Test WebSocket Connection
```bash
# From host machine (requires port 8766 to be published to the host)
# A successful upgrade returns "HTTP/1.1 101 Switching Protocols"
curl -i -N \
  -H "Connection: Upgrade" \
  -H "Upgrade: websocket" \
  -H "Sec-WebSocket-Version: 13" \
  -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
  http://localhost:8766/
```
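
A server that validates the handshake expects `Sec-WebSocket-Key` to be the base64 encoding of 16 bytes, and replies with a `Sec-WebSocket-Accept` header derived from it as defined in RFC 6455. The sketch below generates a valid key and computes the accept value to compare against the server's reply; the helper names `make_key` and `expected_accept` are hypothetical.

```python
import base64
import hashlib
import os

# Fixed GUID defined by RFC 6455 for the opening handshake
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def make_key() -> str:
    """Generate a valid Sec-WebSocket-Key: base64 of 16 random bytes."""
    return base64.b64encode(os.urandom(16)).decode("ascii")


def expected_accept(key: str) -> str:
    """Compute the Sec-WebSocket-Accept value a compliant server returns for key."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


key = make_key()
print("Sec-WebSocket-Key:", key)
print("Expected Sec-WebSocket-Accept:", expected_accept(key))
```

For the sample key `dGhlIHNhbXBsZSBub25jZQ==` from RFC 6455, the expected accept value is `s3pPLMBiTxaQ9kYGzzhZRbK+xOo=`.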

---

## Known Issues & Workarounds

### Issue: Bot Still Shows Old Errors
**Symptom**: After restart, logs still show port 8000 errors

**Cause**: Python module caching or log entries from before restart

**Solution**:
```bash
# Clear cache and restart
docker exec miku-bot find /app -name "*.pyc" -delete
docker restart miku-bot

# Wait 10 seconds for full restart
sleep 10
```

### Issue: Container Rebuild Takes 15+ Minutes
**Cause**: `playwright install` downloads Chromium/Firefox browsers (~500MB)

**Workaround**: Instead of full rebuild, use `docker cp`:
```bash
docker cp bot/utils/stt_client.py miku-bot:/app/utils/stt_client.py
docker cp bot/utils/voice_receiver.py miku-bot:/app/utils/voice_receiver.py
docker restart miku-bot
```

---

## Next Steps

### For Full Deployment (after testing)
1. Rebuild bot container properly:
   ```bash
   docker-compose build miku-bot
   docker-compose up -d miku-bot
   ```

2. Retire the old STT directory (renamed here as a backup rather than deleted):
   ```bash
   mv stt stt.backup
   ```

3. Update documentation to reflect new architecture

### Optional Enhancements
1. Add `send_final()` call when user stops speaking (VAD integration)
2. Implement progressive transcription display
3. Add transcription quality metrics/logging
4. Test with multiple simultaneous users

---

## Quick Reference

| Component | Old (NeMo) | New (ONNX) |
|-----------|------------|------------|
| **Port** | 8000 | 8766 |
| **VRAM** | 4-5GB | 2-3GB |
| **Speed** | 2-3s | 0.5-1s |
| **cuDNN** | 8 | 9 |
| **CUDA** | 12.1 | 12.6.2 |
| **Protocol** | Auto VAD | Manual control |

---

**Status**: ✅ **ALL FIXES APPLIED - READY FOR USER TESTING**

Last Updated: January 18, 2026 20:47 EET