# Voice Call Automation System
## Overview
Miku now has an automated voice call system that can be triggered from the web UI. It replaces the manual command-based voice chat flow: a single API call starts the containers, joins the voice channel, invites the user, and cleans up once the call is over.
## Features
### 1. Voice Debug Mode Toggle
- **Environment Variable**: `VOICE_DEBUG_MODE` (default: `false`)
- When `true`: Shows manual commands, text notifications, transcripts in chat
- When `false` (field deployment): Silent operation, no command notifications
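
A minimal sketch of how such a flag might be read, assuming a case-insensitive comparison against the string `true`. The helper name `read_voice_debug_mode` is hypothetical; the actual parsing in `bot/globals.py` may differ.

```python
import os
from typing import Mapping, Optional


def read_voice_debug_mode(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when VOICE_DEBUG_MODE is set to "true" (hypothetical helper)."""
    env = os.environ if env is None else env
    # The documented default is "false", so an unset variable means silent operation.
    return env.get("VOICE_DEBUG_MODE", "false").strip().lower() == "true"
```

Passing a plain dict instead of `os.environ` keeps the helper easy to test without touching the real environment.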
### 2. Automated Voice Call Flow
#### Initiation (Web UI → API)
```
POST /api/voice/call
{
  "user_id": 123456789,
  "voice_channel_id": 987654321
}
```
#### What Happens:
1. **Container Startup**: Starts the `miku-stt` and `miku-rvc-api` containers
2. **Warmup Wait**: Monitors both containers until they are fully warmed up
   - STT: WebSocket connection check (30s timeout)
   - TTS: Health endpoint check for `warmed_up: true` (60s timeout)
3. **Join Voice Channel**: Creates a voice session with full resource locking
4. **Send DM**: Generates a personalized invitation with the LLM and sends it together with a voice channel invite link
5. **Auto-Listen**: Automatically starts listening when the user joins
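
The ordering of these steps can be sketched as a small coroutine. This is an illustration only: `run_call_flow` and its three callables are hypothetical stand-ins, not the functions in `bot/api.py`. It assumes a container failure aborts the call while a failed DM does not, in line with the testing checklist.

```python
import asyncio
from typing import Awaitable, Callable


async def run_call_flow(
    start_containers: Callable[[], Awaitable[bool]],
    join_channel: Callable[[], Awaitable[None]],
    send_invite_dm: Callable[[], Awaitable[None]],
) -> dict:
    """Run the documented steps in order (all callables are hypothetical)."""
    # Steps 1-2: start the containers and wait for warmup; abort if that fails.
    if not await start_containers():
        return {"success": False, "error": "Failed to start voice containers"}
    # Step 3: join the voice channel.
    await join_channel()
    # Step 4: send the invitation DM; a failure here does not abort the call.
    try:
        await send_invite_dm()
    except Exception:
        pass
    # Step 5 (auto-listen) is event-driven and happens later, when the user joins.
    return {"success": True}
```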
#### User Join Detection:
- Monitors `on_voice_state_update` events
- When target user joins:
  - Marks `user_has_joined = True`
  - Cancels 30min timeout
  - Auto-starts STT for that user
#### Auto-Leave After User Disconnect:
- **45 second timer** starts when user leaves voice channel
- If user doesn't rejoin within 45s:
  - Ends voice session
  - Stops STT and TTS containers
  - Releases all resources
  - Returns to normal operation
- If user rejoins before 45s, timer is cancelled
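
The cancellable countdown described above could look roughly like this with `asyncio`. `AutoLeaveTimer` is a hypothetical helper written for illustration; the real logic lives in `on_user_leave()` and `_auto_leave_after_user_disconnect()` and may be structured differently.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class AutoLeaveTimer:
    """Cancellable countdown that runs a cleanup callback (hypothetical helper)."""

    def __init__(self, on_expire: Callable[[], Awaitable[None]], delay: float = 45.0):
        self._on_expire = on_expire
        self._delay = delay
        self._task: Optional[asyncio.Task] = None

    def start(self) -> None:
        # Called when the user leaves; restarts the countdown if one is pending.
        self.cancel()
        self._task = asyncio.create_task(self._run())

    def cancel(self) -> None:
        # Called when the user rejoins before the delay has elapsed.
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = None

    async def _run(self) -> None:
        await asyncio.sleep(self._delay)
        await self._on_expire()
```

`start()` must be called from inside a running event loop, because it relies on `asyncio.create_task`.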
#### 30-Minute Join Timeout:
- If user never joins within 30 minutes:
  - Ends voice session
  - Stops containers
  - Sends timeout DM: "Aww, I guess you couldn't make it to voice chat... Maybe next time! 💙"
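
One way to express a join timeout is to wait on an `asyncio.Event` with a deadline. This is a sketch under assumptions: the document tracks the timeout as a task (`call_timeout_task`), so the event-based form and the name `wait_for_join` are illustrative alternatives, not the bot's actual code.

```python
import asyncio


async def wait_for_join(joined: asyncio.Event, timeout: float = 30 * 60) -> bool:
    """Return True if the user joined in time, False once the timeout elapses."""
    try:
        # joined.set() would be called from the voice-state handler on join.
        await asyncio.wait_for(joined.wait(), timeout)
        return True
    except asyncio.TimeoutError:
        # Caller is expected to end the session, stop containers, and send the DM.
        return False
```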
### 3. Container Management
**File**: `bot/utils/container_manager.py`
#### Methods:
- `start_voice_containers()`: Starts STT & TTS, waits for warmup
- `stop_voice_containers()`: Stops both containers
- `are_containers_running()`: Checks whether both containers are running
- `_wait_for_stt_warmup()`: WebSocket connection check
- `_wait_for_tts_warmup()`: Health endpoint check
#### Warmup Detection:
```
# STT Warmup: Try WebSocket connection
ws://miku-stt:8765
# TTS Warmup: Check health endpoint
GET http://miku-rvc-api:8765/health
Response: {"status": "ready", "warmed_up": true}
```
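
Both warmup checks boil down to polling until a probe succeeds or a deadline passes. The sketch below shows that pattern in isolation. `wait_until_ready` and `tts_is_warm` are hypothetical names, and the probe is passed in as a callable so no particular WebSocket or HTTP client is assumed.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def wait_until_ready(
    check: Callable[[], Awaitable[bool]],
    timeout: float,
    interval: float = 1.0,
) -> bool:
    """Poll `check` until it returns True or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if await check():
                return True
        except Exception:
            # Connection errors while the container is still booting mean "not ready yet".
            pass
        await asyncio.sleep(interval)
    return False


def tts_is_warm(health: dict) -> bool:
    """Interpret the health payload shown above."""
    return health.get("warmed_up") is True
```

With the documented limits, the STT probe would be called with `timeout=30` and the TTS probe with `timeout=60`.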
### 4. Voice Session Tracking
**File**: `bot/utils/voice_manager.py`
#### New VoiceSession Fields:
```python
call_user_id: Optional[int] # User ID that was called
call_timeout_task: Optional[asyncio.Task] # 30min timeout
user_has_joined: bool # Track if user joined
auto_leave_task: Optional[asyncio.Task] # 45s auto-leave
user_leave_time: Optional[float] # When user left
```
#### Methods:
- `on_user_join(user_id)`: Handle user joining voice channel
- `on_user_leave(user_id)`: Start 45s auto-leave timer
- `_auto_leave_after_user_disconnect()`: Execute auto-leave
### 5. LLM Context Update
Miku's voice chat prompt now includes:
```
NOTE: You will automatically disconnect 45 seconds after {user.name} leaves the voice channel,
so you can mention this if asked about leaving
```
### 6. Debug Mode Integration
#### With `VOICE_DEBUG_MODE=true`:
- Shows "🎤 User said: ..." in text chat
- Shows "💬 Miku: ..." responses
- Shows interruption messages
- Manual commands work (`!miku join`, `!miku listen`, etc.)
#### With `VOICE_DEBUG_MODE=false` (field deployment):
- No text notifications
- No command outputs
- Silent operation
- Only log files show activity
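
Gating chat output on the flag can be as simple as wrapping the send function. A minimal sketch, assuming notifications are plain strings; `make_notifier` is a hypothetical helper, not a function from `voice_manager.py`.

```python
from typing import Callable


def make_notifier(debug_mode: bool, send: Callable[[str], None]) -> Callable[[str], None]:
    """Return a function that forwards chat notifications only in debug mode."""

    def notify(message: str) -> None:
        if debug_mode:
            send(message)
        # With debug mode off the message is dropped; logging is handled elsewhere.

    return notify
```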
## API Endpoint
### POST `/api/voice/call`
**Request Body**:
```json
{
  "user_id": 123456789,
  "voice_channel_id": 987654321
}
```
**Success Response**:
```json
{
  "success": true,
  "user_id": 123456789,
  "channel_id": 987654321,
  "invite_url": "https://discord.gg/abc123"
}
```
**Error Response**:
```json
{
  "success": false,
  "error": "Failed to start voice containers"
}
```
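
A client for this endpoint might be sketched with the standard library as follows. The base URL is an assumption (the document does not state the API host or port), and both helper names are hypothetical.

```python
import json
import urllib.request


def build_call_request(base_url: str, user_id: int, voice_channel_id: int) -> urllib.request.Request:
    """Build the POST request for /api/voice/call without sending it."""
    body = json.dumps({"user_id": user_id, "voice_channel_id": voice_channel_id}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/voice/call",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_call_response(raw: bytes) -> str:
    """Return the invite URL, or raise RuntimeError carrying the reported error."""
    payload = json.loads(raw)
    if not payload.get("success"):
        raise RuntimeError(payload.get("error", "unknown error"))
    return payload["invite_url"]
```

Sending the request is then `urllib.request.urlopen(req)`, with the response body passed to `parse_call_response`. Because startup and warmup can take over a minute, a generous client timeout is advisable.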
## File Changes
### New Files:
1. `bot/utils/container_manager.py` - Docker container management
2. `VOICE_CALL_AUTOMATION.md` - This documentation
### Modified Files:
1. `bot/globals.py` - Added `VOICE_DEBUG_MODE` flag
2. `bot/api.py` - Added `/api/voice/call` endpoint and timeout handler
3. `bot/bot.py` - Added `on_voice_state_update` event handler
4. `bot/utils/voice_manager.py`:
   - Added call tracking fields to VoiceSession
   - Added `on_user_join()` and `on_user_leave()` methods
   - Added `_auto_leave_after_user_disconnect()` method
   - Updated LLM prompt with auto-disconnect context
   - Gated debug messages behind `VOICE_DEBUG_MODE`
5. `bot/utils/voice_receiver.py` - Removed Discord VAD events (rely on RealtimeSTT only)
## Testing Checklist
### Web UI Integration:
- [ ] Create voice call trigger UI with user ID and channel ID inputs
- [ ] Display call status (starting containers, waiting for warmup, joined VC, waiting for user)
- [ ] Show timeout countdown
- [ ] Handle errors gracefully
### Flow Testing:
- [ ] Test successful call flow (containers start → warmup → join → DM → user joins → conversation → user leaves → 45s timer → auto-leave → containers stop)
- [ ] Test 30min timeout (user never joins)
- [ ] Test user rejoin within 45s (cancels auto-leave)
- [ ] Test container failure handling
- [ ] Test warmup timeout handling
- [ ] Test DM failure (should continue anyway)
### Debug Mode:
- [ ] Test with `VOICE_DEBUG_MODE=true` (should see all notifications)
- [ ] Test with `VOICE_DEBUG_MODE=false` (should be silent)
## Environment Variables
Add to `.env` or `docker-compose.yml`:
```bash
VOICE_DEBUG_MODE=false # Set to true for debugging
```
## Next Steps
1. **Web UI**: Create voice call interface with:
   - User ID input
   - Voice channel ID dropdown (fetch from Discord)
   - "Call User" button
   - Status display
   - Active call management
2. **Monitoring**: Add voice call metrics:
   - Call duration
   - User join time
   - Auto-leave triggers
   - Container startup times
3. **Enhancements**:
   - Multiple simultaneous calls (different channels)
   - Call history logging
   - User preferences (auto-answer, DND mode)
   - Scheduled voice calls
## Technical Notes
### Container Warmup Times:
- **STT** (`miku-stt`): ~5-15 seconds (model loading)
- **TTS** (`miku-rvc-api`): ~30-60 seconds (RVC model loading, synthesis warmup)
- **Total**: ~35-75 seconds from API call to ready
### Resource Management:
- Voice sessions use `VoiceSessionManager` singleton
- Only one voice session active at a time
- Full resource locking during voice:
  - AMD GPU for text inference
  - Vision model blocked
  - Image generation disabled
  - Bipolar mode disabled
  - Autonomous engine paused
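
The one-session-at-a-time rule can be pictured as a simple guard object. This is a sketch only: `SingleSessionGuard` is a hypothetical name and does not reflect how `VoiceSessionManager` is actually implemented.

```python
from typing import Optional


class SingleSessionGuard:
    """Allow at most one active voice session (hypothetical illustration)."""

    def __init__(self) -> None:
        self._active_channel_id: Optional[int] = None

    def acquire(self, channel_id: int) -> bool:
        # Refuse a second session while one is still active.
        if self._active_channel_id is not None:
            return False
        self._active_channel_id = channel_id
        return True

    def release(self) -> None:
        # Called when the session ends, after resources have been unlocked.
        self._active_channel_id = None
```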
### Cleanup Guarantees:
- 45s auto-leave ensures no orphaned sessions
- 30min timeout prevents indefinite container running
- All cleanup paths stop containers
- Voice session end releases all resources
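
A common way to make every exit path run the same cleanup is a `try`/`finally` around the session body. The sketch below assumes cleanup consists of two awaitable steps; `run_session` and its parameters are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable


async def run_session(
    body: Callable[[], Awaitable[None]],
    stop_containers: Callable[[], Awaitable[None]],
    release_resources: Callable[[], Awaitable[None]],
) -> None:
    """Run a voice session and always clean up afterwards."""
    try:
        await body()
    finally:
        # Runs on normal exit, on exceptions, and when the task is cancelled.
        await stop_containers()
        await release_resources()
```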
## Troubleshooting
### Containers won't start:
- Check Docker daemon status
- Check `docker compose ps` for existing containers
- Check logs: `docker logs miku-stt` / `docker logs miku-rvc-api`
### Warmup timeout:
- STT: Check WebSocket is accepting connections on port 8765
- TTS: Check health endpoint returns `{"warmed_up": true}`
- Increase timeout values if needed (slow hardware)
### User never joins:
- Verify invite URL is valid
- Check user has permission to join voice channel
- Verify DM was delivered (may be blocked)
### Auto-leave not triggering:
- Check `on_voice_state_update` events are firing
- Verify user ID matches `call_user_id`
- Check logs for timer creation/cancellation
### Containers not stopping:
- Manual stop: `docker compose stop miku-stt miku-rvc-api`
- Check for orphaned containers: `docker ps`
- Force remove: `docker rm -f miku-stt miku-rvc-api`