Implemented experimental, production-ready voice chat; relegated the old flow to voice debug mode. Added a new Web UI panel for Voice Chat.
261
VOICE_CALL_AUTOMATION.md
Normal file
@@ -0,0 +1,261 @@
# Voice Call Automation System

## Overview

Miku now has an automated voice call system that can be triggered from the web UI. This replaces the manual command-based voice chat flow with a seamless, immersive experience.

## Features

### 1. Voice Debug Mode Toggle
- **Environment Variable**: `VOICE_DEBUG_MODE` (default: `false`)
- When `true`: Shows manual commands, text notifications, transcripts in chat
- When `false` (field deployment): Silent operation, no command notifications

### 2. Automated Voice Call Flow

#### Initiation (Web UI → API)
```
POST /api/voice/call
{
  "user_id": 123456789,
  "voice_channel_id": 987654321
}
```

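A minimal client-side sketch of the request above, using only the Python standard library. The base URL `http://localhost:8000` and the helper names `build_call_request` and `send_call_request` are assumptions for illustration; this document does not say where the API is served or whether the request blocks during container warmup.

```python
import json
import urllib.request

# Assumed base URL; the source does not state where the bot's API listens.
API_BASE = "http://localhost:8000"


def build_call_request(user_id: int, voice_channel_id: int) -> urllib.request.Request:
    """Build (but do not send) the POST /api/voice/call request."""
    payload = {"user_id": user_id, "voice_channel_id": voice_channel_id}
    return urllib.request.Request(
        url=f"{API_BASE}/api/voice/call",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_call_request(user_id: int, voice_channel_id: int, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON reply.

    The generous default timeout is an assumption, in case the endpoint
    only answers once the containers have finished warming up.
    """
    request = build_call_request(user_id, voice_channel_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())
```

Calling `send_call_request(123456789, 987654321)` against a running bot would return the decoded response body as a dictionary.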
#### What Happens:
1. **Container Startup**: Starts `miku-stt` and `miku-rvc-api` containers
2. **Warmup Wait**: Monitors containers until fully warmed up
   - STT: WebSocket connection check (30s timeout)
   - TTS: Health endpoint check for `warmed_up: true` (60s timeout)
3. **Join Voice Channel**: Creates voice session with full resource locking
4. **Send DM**: Generates personalized LLM invitation and sends with voice channel invite link
5. **Auto-Listen**: Automatically starts listening when user joins

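The ordering of the steps above can be sketched as a small coroutine. This is an illustration, not the code in `bot/api.py`: the function name `run_call_flow` and its three callable parameters are hypothetical stand-ins for the real container, voice and DM logic. Treating the DM as best-effort follows the testing checklist item "Test DM failure (should continue anyway)".

```python
import asyncio
from typing import Awaitable, Callable


async def run_call_flow(
    start_containers: Callable[[], Awaitable[bool]],
    join_voice_channel: Callable[[], Awaitable[None]],
    send_invite_dm: Callable[[], Awaitable[None]],
) -> dict:
    """Run steps 1-4 in order and return an API-style result dict."""
    # Steps 1-2: start containers and wait for warmup; abort on failure.
    if not await start_containers():
        return {"success": False, "error": "Failed to start voice containers"}

    # Step 3: join the voice channel (session creation, resource locking).
    await join_voice_channel()

    # Step 4: the invitation DM is best-effort; a blocked DM does not end the call.
    try:
        await send_invite_dm()
    except Exception:
        pass

    # Step 5 (auto-listen) happens later, when the user actually joins.
    return {"success": True}
```

Passing the steps in as callables keeps the ordering and failure handling testable without Docker or Discord.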
#### User Join Detection:
- Monitors `on_voice_state_update` events
- When target user joins:
  - Marks `user_has_joined = True`
  - Cancels 30min timeout
  - Auto-starts STT for that user

#### Auto-Leave After User Disconnect:
- **45 second timer** starts when user leaves voice channel
- If user doesn't rejoin within 45s:
  - Ends voice session
  - Stops STT and TTS containers
  - Releases all resources
  - Returns to normal operation
- If user rejoins before 45s, timer is cancelled

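The cancel-on-rejoin behaviour described above maps naturally onto a cancellable `asyncio` task. The sketch below is an assumption about how such a timer could be structured; the class name `AutoLeaveTimer` and its methods are hypothetical and are not the actual `voice_manager.py` code. The demo uses a 0.05 s delay in place of the 45 s used in the real flow.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class AutoLeaveTimer:
    """Cancellable timer: runs `on_expire` unless the user rejoins in time."""

    def __init__(self, delay: float, on_expire: Callable[[], Awaitable[None]]) -> None:
        self.delay = delay
        self.on_expire = on_expire
        self._task: Optional[asyncio.Task] = None

    def user_left(self) -> None:
        # Restart the countdown each time the user leaves.
        self.user_joined()
        self._task = asyncio.create_task(self._run())

    def user_joined(self) -> None:
        # Rejoining before the deadline cancels the pending auto-leave.
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = None

    async def _run(self) -> None:
        await asyncio.sleep(self.delay)
        await self.on_expire()


async def demo() -> list:
    events = []

    async def end_session() -> None:
        events.append("left")

    timer = AutoLeaveTimer(delay=0.05, on_expire=end_session)
    timer.user_left()
    await asyncio.sleep(0.01)
    timer.user_joined()        # rejoined in time: nothing fires
    await asyncio.sleep(0.1)
    fired_after_rejoin = list(events)
    timer.user_left()
    await asyncio.sleep(0.2)   # no rejoin: the auto-leave callback runs
    return [fired_after_rejoin, events]
```

The same pattern would cover the 30-minute join timeout by using a longer delay and cancelling when the user first joins.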
#### 30-Minute Join Timeout:
- If user never joins within 30 minutes:
  - Ends voice session
  - Stops containers
  - Sends timeout DM: "Aww, I guess you couldn't make it to voice chat... Maybe next time! 💙"

### 3. Container Management

**File**: `bot/utils/container_manager.py`

#### Methods:
- `start_voice_containers()`: Starts STT & TTS, waits for warmup
- `stop_voice_containers()`: Stops both containers
- `are_containers_running()`: Checks container status
- `_wait_for_stt_warmup()`: WebSocket connection check
- `_wait_for_tts_warmup()`: Health endpoint check

#### Warmup Detection:
```
# STT warmup: try a WebSocket connection
ws://miku-stt:8765

# TTS warmup: check the health endpoint
GET http://miku-rvc-api:8765/health
Response: {"status": "ready", "warmed_up": true}
```

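The TTS check above amounts to polling the health endpoint until `warmed_up` is `true` or a deadline passes. A minimal synchronous sketch, assuming the health body has the shape shown above; the names `is_tts_warmed_up`, `wait_for_warmup` and the injected `fetch_health` callable are hypothetical, and the real `_wait_for_tts_warmup()` may well be asynchronous.

```python
import json
import time
from typing import Callable


def is_tts_warmed_up(health_body: str) -> bool:
    """Return True only when the health JSON reports warmed_up: true."""
    try:
        return json.loads(health_body).get("warmed_up") is True
    except (json.JSONDecodeError, AttributeError):
        return False


def wait_for_warmup(
    fetch_health: Callable[[], str],
    timeout: float = 60.0,
    interval: float = 1.0,
) -> bool:
    """Poll fetch_health until warmed up or until `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if is_tts_warmed_up(fetch_health()):
                return True
        except OSError:
            pass  # container not accepting connections yet; retry
        time.sleep(interval)
    return False
```

In practice `fetch_health` would issue the `GET` request, for example with `urllib.request.urlopen`, whose connection errors are subclasses of `OSError` and are therefore retried here.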
### 4. Voice Session Tracking

**File**: `bot/utils/voice_manager.py`

#### New VoiceSession Fields:
```python
call_user_id: Optional[int]                # User ID that was called
call_timeout_task: Optional[asyncio.Task]  # 30min timeout
user_has_joined: bool                      # Track if user joined
auto_leave_task: Optional[asyncio.Task]    # 45s auto-leave
user_leave_time: Optional[float]           # When user left
```

#### Methods:
- `on_user_join(user_id)`: Handle user joining voice channel
- `on_user_leave(user_id)`: Start 45s auto-leave timer
- `_auto_leave_after_user_disconnect()`: Execute auto-leave

### 5. LLM Context Update

Miku's voice chat prompt now includes:
```
NOTE: You will automatically disconnect 45 seconds after {user.name} leaves the voice channel,
so you can mention this if asked about leaving
```

### 6. Debug Mode Integration

#### With `VOICE_DEBUG_MODE=true`:
- Shows "🎤 User said: ..." in text chat
- Shows "💬 Miku: ..." responses
- Shows interruption messages
- Manual commands work (`!miku join`, `!miku listen`, etc.)

#### With `VOICE_DEBUG_MODE=false` (field deployment):
- No text notifications
- No command outputs
- Silent operation
- Only log files show activity

## API Endpoint

### POST `/api/voice/call`

**Request Body**:
```json
{
  "user_id": 123456789,
  "voice_channel_id": 987654321
}
```

**Success Response**:
```json
{
  "success": true,
  "user_id": 123456789,
  "channel_id": 987654321,
  "invite_url": "https://discord.gg/abc123"
}
```

**Error Response**:
```json
{
  "success": false,
  "error": "Failed to start voice containers"
}
```

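A caller can tell the two response shapes above apart by checking the `success` flag. The sketch below assumes exactly the fields shown in those examples; `VoiceCallError` and `parse_call_response` are hypothetical names introduced for illustration.

```python
import json


class VoiceCallError(Exception):
    """Raised when the API reports success: false (hypothetical name)."""


def parse_call_response(body: str) -> str:
    """Return the invite URL on success; raise VoiceCallError otherwise."""
    data = json.loads(body)
    if not data.get("success"):
        # Fall back to a generic message if the error field is absent.
        raise VoiceCallError(data.get("error", "unknown error"))
    return data["invite_url"]
```

Raising an exception on failure lets a web UI handler surface the `error` string directly in its status display.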
## File Changes

### New Files:
1. `bot/utils/container_manager.py` - Docker container management
2. `VOICE_CALL_AUTOMATION.md` - This documentation

### Modified Files:
1. `bot/globals.py` - Added `VOICE_DEBUG_MODE` flag
2. `bot/api.py` - Added `/api/voice/call` endpoint and timeout handler
3. `bot/bot.py` - Added `on_voice_state_update` event handler
4. `bot/utils/voice_manager.py`:
   - Added call tracking fields to VoiceSession
   - Added `on_user_join()` and `on_user_leave()` methods
   - Added `_auto_leave_after_user_disconnect()` method
   - Updated LLM prompt with auto-disconnect context
   - Gated debug messages behind `VOICE_DEBUG_MODE`
5. `bot/utils/voice_receiver.py` - Removed Discord VAD events (rely on RealtimeSTT only)

## Testing Checklist

### Web UI Integration:
- [ ] Create voice call trigger UI with user ID and channel ID inputs
- [ ] Display call status (starting containers, waiting for warmup, joined VC, waiting for user)
- [ ] Show timeout countdown
- [ ] Handle errors gracefully

### Flow Testing:
- [ ] Test successful call flow (containers start → warmup → join → DM → user joins → conversation → user leaves → 45s timer → auto-leave → containers stop)
- [ ] Test 30min timeout (user never joins)
- [ ] Test user rejoin within 45s (cancels auto-leave)
- [ ] Test container failure handling
- [ ] Test warmup timeout handling
- [ ] Test DM failure (should continue anyway)

### Debug Mode:
- [ ] Test with `VOICE_DEBUG_MODE=true` (should see all notifications)
- [ ] Test with `VOICE_DEBUG_MODE=false` (should be silent)

## Environment Variables

Add to `.env` or `docker-compose.yml`:
```bash
VOICE_DEBUG_MODE=false  # Set to true for debugging
```

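On the Python side, the flag has to be converted from a string to a boolean. How `bot/globals.py` actually parses it is not shown in this document; the sketch below is one conservative interpretation, where only the literal value `true` (case-insensitive) enables debug mode. The function name `read_voice_debug_mode` is hypothetical.

```python
import os


def read_voice_debug_mode(environ=os.environ) -> bool:
    """Interpret VOICE_DEBUG_MODE; unset or anything but "true" means False."""
    return environ.get("VOICE_DEBUG_MODE", "false").strip().lower() == "true"
```

Defaulting to `False` matches the documented default, so a missing or misspelled value keeps the bot in silent field-deployment mode.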
## Next Steps

1. **Web UI**: Create voice call interface with:
   - User ID input
   - Voice channel ID dropdown (fetch from Discord)
   - "Call User" button
   - Status display
   - Active call management

2. **Monitoring**: Add voice call metrics:
   - Call duration
   - User join time
   - Auto-leave triggers
   - Container startup times

3. **Enhancements**:
   - Multiple simultaneous calls (different channels)
   - Call history logging
   - User preferences (auto-answer, DND mode)
   - Scheduled voice calls

## Technical Notes

### Container Warmup Times:
- **STT** (`miku-stt`): ~5-15 seconds (model loading)
- **TTS** (`miku-rvc-api`): ~30-60 seconds (RVC model loading, synthesis warmup)
- **Total**: ~35-75 seconds from API call to ready

### Resource Management:
- Voice sessions use `VoiceSessionManager` singleton
- Only one voice session active at a time
- Full resource locking during voice:
  - AMD GPU for text inference
  - Vision model blocked
  - Image generation disabled
  - Bipolar mode disabled
  - Autonomous engine paused

### Cleanup Guarantees:
- 45s auto-leave ensures no orphaned sessions
- 30min timeout prevents indefinite container running
- All cleanup paths stop containers
- Voice session end releases all resources

## Troubleshooting

### Containers won't start:
- Check Docker daemon status
- Check `docker compose ps` for existing containers
- Check logs: `docker logs miku-stt` / `docker logs miku-rvc-api`

### Warmup timeout:
- STT: Check WebSocket is accepting connections on port 8765
- TTS: Check health endpoint returns `{"warmed_up": true}`
- Increase timeout values if needed (slow hardware)

### User never joins:
- Verify invite URL is valid
- Check user has permission to join voice channel
- Verify DM was delivered (may be blocked)

### Auto-leave not triggering:
- Check `on_voice_state_update` events are firing
- Verify user ID matches `call_user_id`
- Check logs for timer creation/cancellation

### Containers not stopping:
- Manual stop: `docker compose stop miku-stt miku-rvc-api`
- Check for orphaned containers: `docker ps`
- Force remove: `docker rm -f miku-stt miku-rvc-api`