diff --git a/API_REFERENCE.md b/API_REFERENCE.md deleted file mode 100644 index 44ffd6d..0000000 --- a/API_REFERENCE.md +++ /dev/null @@ -1,460 +0,0 @@ -# Miku Discord Bot API Reference - -The Miku bot exposes a FastAPI REST API on port 3939 for controlling and monitoring the bot. - -## Base URL -``` -http://localhost:3939 -``` - -## API Endpoints - -### 📊 Status & Information - -#### `GET /status` -Get current bot status and overview. - -**Response:** -```json -{ - "status": "online", - "mood": "neutral", - "servers": 2, - "active_schedulers": 2, - "server_moods": { - "123456789": "bubbly", - "987654321": "excited" - } -} -``` - -#### `GET /logs` -Get the last 100 lines of bot logs. - -**Response:** Plain text log output - -#### `GET /prompt` -Get the last full prompt sent to the LLM. - -**Response:** -```json -{ - "prompt": "Last prompt text..." -} -``` - ---- - -### 😊 Mood Management - -#### `GET /mood` -Get current DM mood. - -**Response:** -```json -{ - "mood": "neutral", - "description": "Mood description text..." -} -``` - -#### `POST /mood` -Set DM mood. - -**Request Body:** -```json -{ - "mood": "bubbly" -} -``` - -**Response:** -```json -{ - "status": "ok", - "new_mood": "bubbly" -} -``` - -#### `POST /mood/reset` -Reset DM mood to neutral. - -#### `POST /mood/calm` -Calm Miku down (set to neutral). - -#### `GET /servers/{guild_id}/mood` -Get mood for specific server. - -#### `POST /servers/{guild_id}/mood` -Set mood for specific server. - -**Request Body:** -```json -{ - "mood": "excited" -} -``` - -#### `POST /servers/{guild_id}/mood/reset` -Reset server mood to neutral. - -#### `GET /servers/{guild_id}/mood/state` -Get complete mood state for server. - -#### `GET /moods/available` -List all available moods. - -**Response:** -```json -{ - "moods": { - "neutral": "😊", - "bubbly": "🥰", - "excited": "🤩", - "sleepy": "😴", - ... - } -} -``` - ---- - -### 😴 Sleep Management - -#### `POST /sleep` -Force Miku to sleep. - -#### `POST /wake` -Wake Miku up. - -#### `POST /bedtime?guild_id={guild_id}` -Send bedtime reminder. If `guild_id` is provided, sends only to that server. - ---- - -### 🤖 Autonomous Actions - -#### `POST /autonomous/general?guild_id={guild_id}` -Trigger autonomous general message. - -#### `POST /autonomous/engage?guild_id={guild_id}` -Trigger autonomous user engagement. - -#### `POST /autonomous/tweet?guild_id={guild_id}` -Trigger autonomous tweet sharing. - -#### `POST /autonomous/reaction?guild_id={guild_id}` -Trigger autonomous reaction to a message. - -#### `POST /autonomous/custom?guild_id={guild_id}` -Send custom autonomous message. - -**Request Body:** -```json -{ - "prompt": "Say something funny about cats" -} -``` - -#### `GET /autonomous/stats` -Get autonomous engine statistics for all servers. - -**Response:** Detailed stats including message counts, activity, mood profiles, etc. - -#### `GET /autonomous/v2/stats/{guild_id}` -Get autonomous V2 stats for specific server. - -#### `GET /autonomous/v2/check/{guild_id}` -Check if autonomous action should happen for server. - -#### `GET /autonomous/v2/status` -Get autonomous V2 status across all servers. - ---- - -### 🌐 Server Management - -#### `GET /servers` -List all configured servers. 
- -**Response:** -```json -{ - "servers": [ - { - "guild_id": 123456789, - "guild_name": "My Server", - "autonomous_channel_id": 987654321, - "autonomous_channel_name": "general", - "bedtime_channel_ids": [111111111], - "enabled_features": ["autonomous", "bedtime"] - } - ] -} -``` - -#### `POST /servers` -Add a new server configuration. - -**Request Body:** -```json -{ - "guild_id": 123456789, - "guild_name": "My Server", - "autonomous_channel_id": 987654321, - "autonomous_channel_name": "general", - "bedtime_channel_ids": [111111111], - "enabled_features": ["autonomous", "bedtime"] -} -``` - -#### `DELETE /servers/{guild_id}` -Remove server configuration. - -#### `PUT /servers/{guild_id}` -Update server configuration. - -#### `POST /servers/{guild_id}/bedtime-range` -Set bedtime range for server. - -#### `POST /servers/{guild_id}/memory` -Update server memory/context. - -#### `GET /servers/{guild_id}/memory` -Get server memory/context. - -#### `POST /servers/repair` -Repair server configurations. - ---- - -### 💬 DM Management - -#### `GET /dms/users` -List all users with DM history. - -**Response:** -```json -{ - "users": [ - { - "user_id": "123456789", - "username": "User#1234", - "total_messages": 42, - "last_message_date": "2025-12-10T12:34:56", - "is_blocked": false - } - ] -} -``` - -#### `GET /dms/users/{user_id}` -Get details for specific user. - -#### `GET /dms/users/{user_id}/conversations` -Get conversation history for user. - -#### `GET /dms/users/{user_id}/search?query={query}` -Search user's DM history. - -#### `GET /dms/users/{user_id}/export` -Export user's DM history. - -#### `DELETE /dms/users/{user_id}` -Delete user's DM data. - -#### `POST /dm/{user_id}/custom` -Send custom DM (LLM-generated). - -**Request Body:** -```json -{ - "prompt": "Ask about their day" -} -``` - -#### `POST /dm/{user_id}/manual` -Send manual DM (direct message). - -**Form Data:** -- `message`: Message text - -#### `GET /dms/blocked-users` -List blocked users. - -#### `POST /dms/users/{user_id}/block` -Block a user. - -#### `POST /dms/users/{user_id}/unblock` -Unblock a user. - -#### `POST /dms/users/{user_id}/conversations/{conversation_id}/delete` -Delete specific conversation. - -#### `POST /dms/users/{user_id}/conversations/delete-all` -Delete all conversations for user. - -#### `POST /dms/users/{user_id}/delete-completely` -Completely delete user data. - ---- - -### 📊 DM Analysis - -#### `POST /dms/analysis/run` -Run analysis on all DM conversations. - -#### `POST /dms/users/{user_id}/analyze` -Analyze specific user's DMs. - -#### `GET /dms/analysis/reports` -Get all analysis reports. - -#### `GET /dms/analysis/reports/{user_id}` -Get analysis report for specific user. - ---- - -### 🖼️ Profile Picture Management - -#### `POST /profile-picture/change?guild_id={guild_id}` -Change profile picture. Optionally upload custom image. - -**Form Data:** -- `file`: Image file (optional) - -**Response:** -```json -{ - "status": "ok", - "message": "Profile picture changed successfully", - "source": "danbooru", - "metadata": { - "url": "https://...", - "tags": ["hatsune_miku", "...] - } -} -``` - -#### `GET /profile-picture/metadata` -Get current profile picture metadata. - -#### `POST /profile-picture/restore-fallback` -Restore original fallback profile picture. - ---- - -### 🎨 Role Color Management - -#### `POST /role-color/custom` -Set custom role color. - -**Form Data:** -- `hex_color`: Hex color code (e.g., "#FF0000") - -#### `POST /role-color/reset-fallback` -Reset role color to fallback (#86cecb). 
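A minimal sketch of driving the role color endpoints above from Python. It assumes the API is reachable on `http://localhost:3939` and that these endpoints accept ordinary form-encoded fields as documented; it is an illustration, not part of the bot code.

```python
# Sketch: set a custom role color, then restore the fallback (#86cecb).
# Assumes the API is on localhost:3939 and accepts form-encoded fields.
import requests

BASE_URL = "http://localhost:3939"

# POST /role-color/custom expects a form field named hex_color
resp = requests.post(f"{BASE_URL}/role-color/custom", data={"hex_color": "#FF0000"})
print(resp.status_code, resp.json())

# POST /role-color/reset-fallback takes no parameters
resp = requests.post(f"{BASE_URL}/role-color/reset-fallback")
print(resp.status_code, resp.json())
```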
- ---- - -### 💬 Conversation Management - -#### `GET /conversation/{user_id}` -Get conversation history for user. - -#### `POST /conversation/reset` -Reset conversation history. - -**Request Body:** -```json -{ - "user_id": "123456789" -} -``` - ---- - -### 📨 Manual Messaging - -#### `POST /manual/send` -Send manual message to channel. - -**Form Data:** -- `message`: Message text -- `channel_id`: Channel ID -- `files`: Files to attach (optional, multiple) - ---- - -### 🎁 Figurine Notifications - -#### `GET /figurines/subscribers` -List figurine subscribers. - -#### `POST /figurines/subscribers` -Add figurine subscriber. - -#### `DELETE /figurines/subscribers/{user_id}` -Remove figurine subscriber. - -#### `POST /figurines/send_now` -Send figurine notification to all subscribers. - -#### `POST /figurines/send_to_user` -Send figurine notification to specific user. - ---- - -### 🖼️ Image Generation - -#### `POST /image/generate` -Generate image using image generation service. - -#### `GET /image/status` -Get image generation service status. - -#### `POST /image/test-detection` -Test face detection on uploaded image. - ---- - -### 😀 Message Reactions - -#### `POST /messages/react` -Add reaction to a message. - -**Request Body:** -```json -{ - "channel_id": "123456789", - "message_id": "987654321", - "emoji": "😊" -} -``` - ---- - -## Error Responses - -All endpoints return errors in the following format: - -```json -{ - "status": "error", - "message": "Error description" -} -``` - -HTTP status codes: -- `200` - Success -- `400` - Bad request -- `404` - Not found -- `500` - Internal server error - -## Authentication - -Currently, the API does not require authentication. It's designed to run on localhost within a Docker network. - -## Rate Limiting - -No rate limiting is currently implemented. diff --git a/CHAT_INTERFACE_FEATURE.md b/CHAT_INTERFACE_FEATURE.md deleted file mode 100644 index 86bf0a5..0000000 --- a/CHAT_INTERFACE_FEATURE.md +++ /dev/null @@ -1,296 +0,0 @@ -# Chat Interface Feature Documentation - -## Overview -A new **"Chat with LLM"** tab has been added to the Miku bot Web UI, allowing you to chat directly with the language models with full streaming support (similar to ChatGPT). - -## Features - -### 1. Model Selection -- **💬 Text Model (Fast)**: Chat with the text-based LLM for quick conversations -- **👁️ Vision Model (Images)**: Use the vision model to analyze and discuss images - -### 2. System Prompt Options -- **✅ Use Miku Personality**: Attach the standard Miku personality system prompt - - Text model: Gets the full Miku character prompt (same as `query_llama`) - - Vision model: Gets a simplified Miku-themed image analysis prompt -- **❌ Raw LLM (No Prompt)**: Chat directly with the base LLM without any personality - - Great for testing raw model responses - - No character constraints - -### 3. Real-time Streaming -- Messages stream in character-by-character like ChatGPT -- Shows typing indicator while waiting for response -- Smooth, responsive interface - -### 4. Vision Model Support -- Upload images when using the vision model -- Image preview before sending -- Analyze images with Miku's personality or raw vision capabilities - -### 5. 
Chat Management -- Clear chat history button -- Timestamps on all messages -- Color-coded messages (user vs assistant) -- Auto-scroll to latest message -- Keyboard shortcut: **Ctrl+Enter** to send messages - -## Technical Implementation - -### Backend (api.py) - -#### New Endpoint: `POST /chat/stream` -```python -# Accepts: -{ - "message": "Your chat message", - "model_type": "text" | "vision", - "use_system_prompt": true | false, - "image_data": "base64_encoded_image" (optional, for vision model) -} - -# Returns: Server-Sent Events (SSE) stream -data: {"content": "streamed text chunk"} -data: {"done": true} -data: {"error": "error message"} -``` - -**Key Features:** -- Uses Server-Sent Events (SSE) for streaming -- Supports both `TEXT_MODEL` and `VISION_MODEL` from globals -- Dynamically switches system prompts based on configuration -- Integrates with llama.cpp's streaming API - -### Frontend (index.html) - -#### New Tab: "💬 Chat with LLM" -Located in the main navigation tabs (tab6) - -**Components:** -1. **Configuration Panel** - - Radio buttons for model selection - - Radio buttons for system prompt toggle - - Image upload section (shows/hides based on model) - - Clear chat history button - -2. **Chat Messages Container** - - Scrollable message history - - Animated message appearance - - Typing indicator during streaming - - Color-coded messages with timestamps - -3. **Input Area** - - Multi-line text input - - Send button with loading state - - Keyboard shortcuts - -**JavaScript Functions:** -- `sendChatMessage()`: Handles message sending and streaming reception -- `toggleChatImageUpload()`: Shows/hides image upload for vision model -- `addChatMessage()`: Adds messages to chat display -- `showTypingIndicator()` / `hideTypingIndicator()`: Typing animation -- `clearChatHistory()`: Clears all messages -- `handleChatKeyPress()`: Keyboard shortcuts - -## Usage Guide - -### Basic Text Chat with Miku -1. Go to "💬 Chat with LLM" tab -2. Ensure "💬 Text Model" is selected -3. Ensure "✅ Use Miku Personality" is selected -4. Type your message and click "📤 Send" (or press Ctrl+Enter) -5. Watch as Miku's response streams in real-time! - -### Raw LLM Testing -1. Select "💬 Text Model" -2. Select "❌ Raw LLM (No Prompt)" -3. Chat directly with the base language model without personality constraints - -### Vision Model Chat -1. Select "👁️ Vision Model" -2. Click "Upload Image" and select an image -3. Type a message about the image (e.g., "What do you see in this image?") -4. Click "📤 Send" -5. The vision model will analyze the image and respond - -### Vision Model with Miku Personality -1. Select "👁️ Vision Model" -2. Keep "✅ Use Miku Personality" selected -3. Upload an image -4. Miku will analyze and comment on the image with her cheerful personality! - -## System Prompts - -### Text Model (with Miku personality) -Uses the same comprehensive system prompt as `query_llama()`: -- Full Miku character context -- Current mood integration -- Character consistency rules -- Natural conversation guidelines - -### Vision Model (with Miku personality) -Simplified prompt optimized for image analysis: -``` -You are Hatsune Miku analyzing an image. Describe what you see naturally -and enthusiastically as Miku would. Be detailed but conversational. -React to what you see with Miku's cheerful, playful personality. -``` - -### No System Prompt -Both models respond without personality constraints when this option is selected. 
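As a rough illustration of the prompt switching described above, the backend can branch on `model_type` and `use_system_prompt` when building the message list. This is a simplified sketch only: `MIKU_TEXT_PROMPT` stands in for the full character prompt used by `query_llama()`, and the function name is illustrative rather than the actual implementation.

```python
# Simplified sketch of system-prompt selection for /chat/stream.
# MIKU_TEXT_PROMPT is a placeholder for the full Miku character prompt
# used by query_llama(); the real code may differ.
MIKU_TEXT_PROMPT = "You are Hatsune Miku..."  # placeholder

VISION_MIKU_PROMPT = (
    "You are Hatsune Miku analyzing an image. Describe what you see naturally "
    "and enthusiastically as Miku would. Be detailed but conversational. "
    "React to what you see with Miku's cheerful, playful personality."
)

def build_messages(message: str, model_type: str, use_system_prompt: bool) -> list[dict]:
    messages = []
    if use_system_prompt:
        prompt = VISION_MIKU_PROMPT if model_type == "vision" else MIKU_TEXT_PROMPT
        messages.append({"role": "system", "content": prompt})
    messages.append({"role": "user", "content": message})
    return messages
```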
- -## Streaming Technology - -The interface uses **Server-Sent Events (SSE)** for real-time streaming: -- Backend sends chunked responses from llama.cpp -- Frontend receives and displays chunks as they arrive -- Smooth, ChatGPT-like experience -- Works with both text and vision models - -## UI/UX Features - -### Message Styling -- **User messages**: Green accent, right-aligned feel -- **Assistant messages**: Blue accent, left-aligned feel -- **Error messages**: Red accent with error icon -- **Fade-in animation**: Smooth appearance for new messages - -### Responsive Design -- Chat container scrolls automatically -- Image preview for vision model -- Loading states on buttons -- Typing indicators -- Custom scrollbar styling - -### Keyboard Shortcuts -- **Ctrl+Enter**: Send message quickly -- **Tab**: Navigate between input fields - -## Configuration Options - -All settings are preserved during the chat session: -- Model type (text/vision) -- System prompt toggle (Miku/Raw) -- Uploaded image (for vision model) - -Settings do NOT persist after page refresh (fresh session each time). - -## Error Handling - -The interface handles various errors gracefully: -- Connection failures -- Model errors -- Invalid image files -- Empty messages -- Timeout issues - -All errors are displayed in the chat with clear error messages. - -## Performance Considerations - -### Text Model -- Fast responses (typically 1-3 seconds) -- Streaming starts almost immediately -- Low latency - -### Vision Model -- Slower due to image processing -- First token may take 3-10 seconds -- Streaming continues once started -- Image is sent as base64 (efficient) - -## Development Notes - -### File Changes -1. **`bot/api.py`** - - Added `from fastapi.responses import StreamingResponse` - - Added `ChatMessage` Pydantic model - - Added `POST /chat/stream` endpoint with SSE support - -2. **`bot/static/index.html`** - - Added tab6 button in navigation - - Added complete chat interface HTML - - Added CSS styles for chat messages and animations - - Added JavaScript functions for chat functionality - -### Dependencies -- Uses existing `aiohttp` for HTTP streaming -- Uses existing `globals.TEXT_MODEL` and `globals.VISION_MODEL` -- Uses existing `globals.LLAMA_URL` for llama.cpp connection -- No new dependencies required! 
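For testing outside the browser, the same SSE stream can be consumed from a small Python script. This is a minimal sketch assuming the API is exposed on `http://localhost:3939` and emits the `data: {...}` lines shown in the API Reference; it is not part of the bot itself.

```python
# Minimal sketch: consume the /chat/stream SSE endpoint from Python.
import json
import requests

payload = {
    "message": "Hi Miku! How are you today?",
    "model_type": "text",
    "use_system_prompt": True,
    "image_data": None,
}

with requests.post("http://localhost:3939/chat/stream", json=payload, stream=True) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue
        event = json.loads(line[len("data: "):])
        if event.get("error"):
            print(f"\n[error] {event['error']}")
            break
        if event.get("done"):
            break
        print(event.get("content", ""), end="", flush=True)
```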
- -## Future Enhancements (Ideas) - -Potential improvements for future versions: -- [ ] Save/load chat sessions -- [ ] Export chat history to file -- [ ] Multi-user chat history (separate sessions per user) -- [ ] Temperature and max_tokens controls -- [ ] Model selection dropdown (if multiple models available) -- [ ] Token count display -- [ ] Voice input support -- [ ] Markdown rendering in responses -- [ ] Code syntax highlighting -- [ ] Copy message button -- [ ] Regenerate response button - -## Troubleshooting - -### "No response received from LLM" -- Check if llama.cpp server is running -- Verify `LLAMA_URL` in globals is correct -- Check bot logs for connection errors - -### "Failed to read image file" -- Ensure image is valid format (JPEG, PNG, GIF) -- Check file size (large images may cause issues) -- Try a different image - -### Streaming not working -- Check browser console for JavaScript errors -- Verify SSE is not blocked by proxy/firewall -- Try refreshing the page - -### Model not responding -- Check if correct model is loaded in llama.cpp -- Verify model type matches what's configured -- Check llama.cpp logs for errors - -## API Reference - -### POST /chat/stream - -**Request Body:** -```json -{ - "message": "string", // Required: User's message - "model_type": "text|vision", // Required: Which model to use - "use_system_prompt": boolean, // Required: Whether to add system prompt - "image_data": "string|null" // Optional: Base64 image for vision model -} -``` - -**Response:** -``` -Content-Type: text/event-stream - -data: {"content": "Hello"} -data: {"content": " there"} -data: {"content": "!"} -data: {"done": true} -``` - -**Error Response:** -``` -data: {"error": "Error message here"} -``` - -## Conclusion - -The Chat Interface provides a powerful, user-friendly way to: -- Test LLM responses interactively -- Experiment with different prompting strategies -- Analyze images with vision models -- Chat with Miku's personality in real-time -- Debug and understand model behavior - -All with a smooth, modern streaming interface that feels like ChatGPT! 🎉 diff --git a/CHAT_QUICK_START.md b/CHAT_QUICK_START.md deleted file mode 100644 index 48dae12..0000000 --- a/CHAT_QUICK_START.md +++ /dev/null @@ -1,148 +0,0 @@ -# Chat Interface - Quick Start Guide - -## 🚀 Quick Start - -### Access the Chat Interface -1. Open the Miku Control Panel in your browser -2. Click on the **"💬 Chat with LLM"** tab -3. Start chatting! - -## 📋 Configuration Options - -### Model Selection -- **💬 Text Model**: Fast text conversations -- **👁️ Vision Model**: Image analysis - -### System Prompt -- **✅ Use Miku Personality**: Chat with Miku's character -- **❌ Raw LLM**: Direct LLM without personality - -## 💡 Common Use Cases - -### 1. Chat with Miku -``` -Model: Text Model -System Prompt: Use Miku Personality -Message: "Hi Miku! How are you feeling today?" -``` - -### 2. Test Raw LLM -``` -Model: Text Model -System Prompt: Raw LLM -Message: "Explain quantum physics" -``` - -### 3. Analyze Images with Miku -``` -Model: Vision Model -System Prompt: Use Miku Personality -Upload: [your image] -Message: "What do you think of this image?" -``` - -### 4. 
Raw Image Analysis -``` -Model: Vision Model -System Prompt: Raw LLM -Upload: [your image] -Message: "Describe this image in detail" -``` - -## ⌨️ Keyboard Shortcuts -- **Ctrl+Enter**: Send message - -## 🎨 Features -- ✅ Real-time streaming (like ChatGPT) -- ✅ Image upload for vision model -- ✅ Color-coded messages -- ✅ Timestamps -- ✅ Typing indicators -- ✅ Auto-scroll -- ✅ Clear chat history - -## 🔧 System Prompts - -### Text Model with Miku -- Full Miku personality -- Current mood awareness -- Character consistency - -### Vision Model with Miku -- Miku analyzing images -- Cheerful, playful descriptions - -### No System Prompt -- Direct LLM responses -- No character constraints - -## 📊 Message Types - -### User Messages (Green) -- Your input -- Right-aligned appearance - -### Assistant Messages (Blue) -- Miku/LLM responses -- Left-aligned appearance -- Streams in real-time - -### Error Messages (Red) -- Connection errors -- Model errors -- Clear error descriptions - -## 🎯 Tips - -1. **Use Ctrl+Enter** for quick sending -2. **Select model first** before uploading images -3. **Clear history** to start fresh conversations -4. **Toggle system prompt** to compare responses -5. **Wait for streaming** to complete before sending next message - -## 🐛 Troubleshooting - -### No response? -- Check if llama.cpp is running -- Verify network connection -- Check browser console - -### Image not working? -- Switch to Vision Model -- Use valid image format (JPG, PNG) -- Check file size - -### Slow responses? -- Vision model is slower than text -- Wait for streaming to complete -- Check llama.cpp load - -## 📝 Examples - -### Example 1: Personality Test -**With Miku Personality:** -> User: "What's your favorite song?" -> Miku: "Oh, I love so many songs! But if I had to choose, I'd say 'World is Mine' holds a special place in my heart! It really captures that fun, playful energy that I love! ✨" - -**Without System Prompt:** -> User: "What's your favorite song?" -> LLM: "I don't have personal preferences as I'm an AI language model..." - -### Example 2: Image Analysis -**With Miku Personality:** -> User: [uploads sunset image] "What do you see?" -> Miku: "Wow! What a beautiful sunset! The sky is painted with such gorgeous oranges and pinks! It makes me want to write a song about it! The way the colors blend together is so dreamy and romantic~ 🌅💕" - -**Without System Prompt:** -> User: [uploads sunset image] "What do you see?" -> LLM: "This image shows a sunset landscape. The sky displays orange and pink hues. The sun is setting on the horizon. There are silhouettes of trees in the foreground." - -## 🎉 Enjoy Chatting! - -Have fun experimenting with different combinations of: -- Text vs Vision models -- With vs Without system prompts -- Different types of questions -- Various images (for vision model) - -The streaming interface makes it feel just like ChatGPT! 🚀 diff --git a/CLI_README.md b/CLI_README.md deleted file mode 100644 index d2b66f5..0000000 --- a/CLI_README.md +++ /dev/null @@ -1,347 +0,0 @@ -# Miku CLI - Command Line Interface - -A powerful command-line interface for controlling and monitoring the Miku Discord bot. - -## Installation - -1. Make the script executable: -```bash -chmod +x miku-cli.py -``` - -2. Install dependencies: -```bash -pip install requests -``` - -3. 
(Optional) Create a symlink for easier access: -```bash -sudo ln -s $(pwd)/miku-cli.py /usr/local/bin/miku -``` - -## Quick Start - -```bash -# Check bot status -./miku-cli.py status - -# Get current mood -./miku-cli.py mood --get - -# Set mood to bubbly -./miku-cli.py mood --set bubbly - -# List available moods -./miku-cli.py mood --list - -# Trigger autonomous message -./miku-cli.py autonomous general - -# List servers -./miku-cli.py servers - -# View logs -./miku-cli.py logs -``` - -## Configuration - -By default, the CLI connects to `http://localhost:3939`. To use a different URL: - -```bash -./miku-cli.py --url http://your-server:3939 status -``` - -## Commands - -### Status & Information - -```bash -# Get bot status -./miku-cli.py status - -# View recent logs -./miku-cli.py logs - -# Get last LLM prompt -./miku-cli.py prompt -``` - -### Mood Management - -```bash -# Get current DM mood -./miku-cli.py mood --get - -# Get server mood -./miku-cli.py mood --get --server 123456789 - -# Set mood -./miku-cli.py mood --set bubbly -./miku-cli.py mood --set excited --server 123456789 - -# Reset mood to neutral -./miku-cli.py mood --reset -./miku-cli.py mood --reset --server 123456789 - -# List available moods -./miku-cli.py mood --list -``` - -### Sleep Management - -```bash -# Put Miku to sleep -./miku-cli.py sleep - -# Wake Miku up -./miku-cli.py wake - -# Send bedtime reminder -./miku-cli.py bedtime -./miku-cli.py bedtime --server 123456789 -``` - -### Autonomous Actions - -```bash -# Trigger general autonomous message -./miku-cli.py autonomous general -./miku-cli.py autonomous general --server 123456789 - -# Trigger user engagement -./miku-cli.py autonomous engage -./miku-cli.py autonomous engage --server 123456789 - -# Share a tweet -./miku-cli.py autonomous tweet -./miku-cli.py autonomous tweet --server 123456789 - -# Trigger reaction -./miku-cli.py autonomous reaction -./miku-cli.py autonomous reaction --server 123456789 - -# Send custom autonomous message -./miku-cli.py autonomous custom --prompt "Tell a joke about programming" -./miku-cli.py autonomous custom --prompt "Say hello" --server 123456789 - -# Get autonomous stats -./miku-cli.py autonomous stats -``` - -### Server Management - -```bash -# List all configured servers -./miku-cli.py servers -``` - -### DM Management - -```bash -# List users with DM history -./miku-cli.py dm-users - -# Send custom DM (LLM-generated) -./miku-cli.py dm-custom 123456789 "Ask them how their day was" - -# Send manual DM (direct message) -./miku-cli.py dm-manual 123456789 "Hello! How are you?" - -# Block a user -./miku-cli.py block 123456789 - -# Unblock a user -./miku-cli.py unblock 123456789 - -# List blocked users -./miku-cli.py blocked-users -``` - -### Profile Picture - -```bash -# Change profile picture (search Danbooru based on mood) -./miku-cli.py change-pfp - -# Change to custom image -./miku-cli.py change-pfp --image /path/to/image.png - -# Change for specific server mood -./miku-cli.py change-pfp --server 123456789 - -# Get current profile picture metadata -./miku-cli.py pfp-metadata -``` - -### Conversation Management - -```bash -# Reset conversation history for a user -./miku-cli.py reset-conversation 123456789 -``` - -### Manual Messaging - -```bash -# Send message to channel -./miku-cli.py send 987654321 "Hello everyone!" - -# Send message with file attachments -./miku-cli.py send 987654321 "Check this out!" 
--files image.png document.pdf -``` - -## Available Moods - -- 😊 neutral -- 🥰 bubbly -- 🤩 excited -- 😴 sleepy -- 😡 angry -- 🙄 irritated -- 😏 flirty -- 💕 romantic -- 🤔 curious -- 😳 shy -- 🤪 silly -- 😢 melancholy -- 😤 serious -- 💤 asleep - -## Examples - -### Morning Routine -```bash -# Wake up Miku -./miku-cli.py wake - -# Set a bubbly mood -./miku-cli.py mood --set bubbly - -# Send a general message to all servers -./miku-cli.py autonomous general - -# Change profile picture to match mood -./miku-cli.py change-pfp -``` - -### Server-Specific Control -```bash -# Get server list -./miku-cli.py servers - -# Set mood for specific server -./miku-cli.py mood --set excited --server 123456789 - -# Trigger engagement on that server -./miku-cli.py autonomous engage --server 123456789 -``` - -### DM Interaction -```bash -# List users -./miku-cli.py dm-users - -# Send custom message -./miku-cli.py dm-custom 123456789 "Ask them about their favorite anime" - -# If user is spamming, block them -./miku-cli.py block 123456789 -``` - -### Monitoring -```bash -# Check status -./miku-cli.py status - -# View logs -./miku-cli.py logs - -# Get autonomous stats -./miku-cli.py autonomous stats - -# Check last prompt -./miku-cli.py prompt -``` - -## Output Format - -The CLI uses emoji and colored output for better readability: - -- ✅ Success messages -- ❌ Error messages -- 😊 Mood indicators -- 🌐 Server information -- 💬 DM information -- 📊 Statistics -- 🖼️ Media information - -## Scripting - -The CLI is designed to be script-friendly: - -```bash -#!/bin/bash - -# Morning routine script -./miku-cli.py wake -./miku-cli.py mood --set bubbly -./miku-cli.py autonomous general - -# Wait 5 minutes -sleep 300 - -# Engage users -./miku-cli.py autonomous engage -``` - -## Error Handling - -The CLI exits with status code 1 on errors and 0 on success, making it suitable for use in scripts: - -```bash -if ./miku-cli.py mood --set bubbly; then - echo "Mood set successfully" -else - echo "Failed to set mood" -fi -``` - -## API Reference - -For complete API documentation, see [API_REFERENCE.md](./API_REFERENCE.md). - -## Troubleshooting - -### Connection Refused -If you get "Connection refused" errors: -1. Check that the bot API is running on port 3939 -2. Verify the URL with `--url` parameter -3. Check Docker container status: `docker-compose ps` - -### Permission Denied -Make the script executable: -```bash -chmod +x miku-cli.py -``` - -### Import Errors -Install required dependencies: -```bash -pip install requests -``` - -## Future Enhancements - -Planned features: -- Configuration file support (~/.miku-cli.conf) -- Interactive mode -- Tab completion -- Color output control -- JSON output mode for scripting -- Batch operations -- Watch mode for real-time monitoring - -## Contributing - -Feel free to extend the CLI with additional commands and features! diff --git a/DUAL_GPU_BUILD_SUMMARY.md b/DUAL_GPU_BUILD_SUMMARY.md deleted file mode 100644 index acf7430..0000000 --- a/DUAL_GPU_BUILD_SUMMARY.md +++ /dev/null @@ -1,184 +0,0 @@ -# Dual GPU Setup Summary - -## What We Built - -A secondary llama-swap container optimized for your **AMD RX 6800** GPU using ROCm. - -### Architecture - -``` -Primary GPU (NVIDIA GTX 1660) Secondary GPU (AMD RX 6800) - ↓ ↓ - llama-swap (CUDA) llama-swap-amd (ROCm) - Port: 8090 Port: 8091 - ↓ ↓ - NVIDIA models AMD models - - llama3.1 - llama3.1-amd - - darkidol - darkidol-amd - - vision (MiniCPM) - moondream-amd -``` - -## Files Created - -1. 
**Dockerfile.llamaswap-rocm** - Custom multi-stage build: - - Stage 1: Builds llama.cpp with ROCm from source - - Stage 2: Builds llama-swap from source - - Stage 3: Runtime image with both binaries - -2. **llama-swap-rocm-config.yaml** - Model configuration for AMD GPU - -3. **docker-compose.yml** - Updated with `llama-swap-amd` service - -4. **bot/utils/gpu_router.py** - Load balancing utility - -5. **bot/globals.py** - Updated with `LLAMA_AMD_URL` - -6. **setup-dual-gpu.sh** - Setup verification script - -7. **DUAL_GPU_SETUP.md** - Comprehensive documentation - -8. **DUAL_GPU_QUICK_REF.md** - Quick reference guide - -## Why Custom Build? - -- llama.cpp doesn't publish ROCm Docker images (yet) -- llama-swap doesn't provide ROCm variants -- Building from source ensures latest ROCm compatibility -- Full control over compilation flags and optimization - -## Build Time - -The initial build takes 15-30 minutes depending on your system: -- llama.cpp compilation: ~10-20 minutes -- llama-swap compilation: ~1-2 minutes -- Image layering: ~2-5 minutes - -Subsequent builds are much faster due to Docker layer caching. - -## Next Steps - -Once the build completes: - -```bash -# 1. Start both GPU services -docker compose up -d llama-swap llama-swap-amd - -# 2. Verify both are running -docker compose ps - -# 3. Test NVIDIA GPU -curl http://localhost:8090/health - -# 4. Test AMD GPU -curl http://localhost:8091/health - -# 5. Monitor logs -docker compose logs -f llama-swap-amd - -# 6. Test model loading on AMD -curl -X POST http://localhost:8091/v1/chat/completions \ - -H "Content-Type: application/json" \ - -d '{ - "model": "llama3.1-amd", - "messages": [{"role": "user", "content": "Hello!"}], - "max_tokens": 50 - }' -``` - -## Device Access - -The AMD container has access to: -- `/dev/kfd` - AMD GPU kernel driver -- `/dev/dri` - Direct Rendering Infrastructure -- Groups: `video`, `render` - -## Environment Variables - -RX 6800 specific settings: -```yaml -HSA_OVERRIDE_GFX_VERSION=10.3.0 # Navi 21 (gfx1030) compatibility -ROCM_PATH=/opt/rocm -HIP_VISIBLE_DEVICES=0 # Use first AMD GPU -``` - -## Bot Integration - -Your bot now has two endpoints available: - -```python -import globals - -# NVIDIA GPU (primary) -nvidia_url = globals.LLAMA_URL # http://llama-swap:8080 - -# AMD GPU (secondary) -amd_url = globals.LLAMA_AMD_URL # http://llama-swap-amd:8080 -``` - -Use the `gpu_router` utility for automatic load balancing: - -```python -from bot.utils.gpu_router import get_llama_url_with_load_balancing - -# Round-robin between GPUs -url, model = get_llama_url_with_load_balancing(task_type="text") - -# Prefer AMD for vision -url, model = get_llama_url_with_load_balancing( - task_type="vision", - prefer_amd=True -) -``` - -## Troubleshooting - -If the AMD container fails to start: - -1. **Check build logs:** - ```bash - docker compose build --no-cache llama-swap-amd - ``` - -2. **Verify GPU access:** - ```bash - ls -l /dev/kfd /dev/dri - ``` - -3. **Check container logs:** - ```bash - docker compose logs llama-swap-amd - ``` - -4. **Test GPU from host:** - ```bash - lspci | grep -i amd - # Should show: Radeon RX 6800 - ``` - -## Performance Notes - -**RX 6800 Specs:** -- VRAM: 16GB -- Architecture: RDNA 2 (Navi 21) -- Compute: gfx1030 - -**Recommended Models:** -- Q4_K_M quantization: 5-6GB per model -- Can load 2-3 models simultaneously -- Good for: Llama 3.1 8B, DarkIdol 8B, Moondream2 - -## Future Improvements - -1. **Automatic failover:** Route to AMD if NVIDIA is busy -2. 
**Health monitoring:** Track GPU utilization -3. **Dynamic routing:** Use least-busy GPU -4. **VRAM monitoring:** Alert before OOM -5. **Model preloading:** Keep common models loaded - -## Resources - -- [ROCm Documentation](https://rocmdocs.amd.com/) -- [llama.cpp ROCm Build](https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#rocm) -- [llama-swap GitHub](https://github.com/mostlygeek/llama-swap) -- [Full Setup Guide](./DUAL_GPU_SETUP.md) -- [Quick Reference](./DUAL_GPU_QUICK_REF.md) diff --git a/DUAL_GPU_QUICK_REF.md b/DUAL_GPU_QUICK_REF.md deleted file mode 100644 index 0439379..0000000 --- a/DUAL_GPU_QUICK_REF.md +++ /dev/null @@ -1,194 +0,0 @@ -# Dual GPU Quick Reference - -## Quick Start - -```bash -# 1. Run setup check -./setup-dual-gpu.sh - -# 2. Build AMD container -docker compose build llama-swap-amd - -# 3. Start both GPUs -docker compose up -d llama-swap llama-swap-amd - -# 4. Verify -curl http://localhost:8090/health # NVIDIA -curl http://localhost:8091/health # AMD RX 6800 -``` - -## Endpoints - -| GPU | Container | Port | Internal URL | -|-----|-----------|------|--------------| -| NVIDIA | llama-swap | 8090 | http://llama-swap:8080 | -| AMD RX 6800 | llama-swap-amd | 8091 | http://llama-swap-amd:8080 | - -## Models - -### NVIDIA GPU (Primary) -- `llama3.1` - Llama 3.1 8B Instruct -- `darkidol` - DarkIdol Uncensored 8B -- `vision` - MiniCPM-V-4.5 (4K context) - -### AMD RX 6800 (Secondary) -- `llama3.1-amd` - Llama 3.1 8B Instruct -- `darkidol-amd` - DarkIdol Uncensored 8B -- `moondream-amd` - Moondream2 Vision (2K context) - -## Commands - -### Start/Stop -```bash -# Start both -docker compose up -d llama-swap llama-swap-amd - -# Start only AMD -docker compose up -d llama-swap-amd - -# Stop AMD -docker compose stop llama-swap-amd - -# Restart AMD with logs -docker compose restart llama-swap-amd && docker compose logs -f llama-swap-amd -``` - -### Monitoring -```bash -# Container status -docker compose ps - -# Logs -docker compose logs -f llama-swap-amd - -# GPU usage -watch -n 1 nvidia-smi # NVIDIA -watch -n 1 rocm-smi # AMD - -# Resource usage -docker stats llama-swap llama-swap-amd -``` - -### Testing -```bash -# List available models -curl http://localhost:8091/v1/models | jq - -# Test text generation (AMD) -curl -X POST http://localhost:8091/v1/chat/completions \ - -H "Content-Type: application/json" \ - -d '{ - "model": "llama3.1-amd", - "messages": [{"role": "user", "content": "Say hello!"}], - "max_tokens": 20 - }' | jq - -# Test vision model (AMD) -curl -X POST http://localhost:8091/v1/chat/completions \ - -H "Content-Type: application/json" \ - -d '{ - "model": "moondream-amd", - "messages": [{ - "role": "user", - "content": [ - {"type": "text", "text": "Describe this image"}, - {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}} - ] - }], - "max_tokens": 100 - }' | jq -``` - -## Bot Integration - -### Using GPU Router -```python -from bot.utils.gpu_router import get_llama_url_with_load_balancing, get_endpoint_for_model - -# Load balanced text generation -url, model = get_llama_url_with_load_balancing(task_type="text") - -# Specific model -url = get_endpoint_for_model("darkidol-amd") - -# Vision on AMD -url, model = get_llama_url_with_load_balancing(task_type="vision", prefer_amd=True) -``` - -### Direct Access -```python -import globals - -# AMD GPU -amd_url = globals.LLAMA_AMD_URL # http://llama-swap-amd:8080 - -# NVIDIA GPU -nvidia_url = globals.LLAMA_URL # http://llama-swap:8080 -``` - -## Troubleshooting - -### AMD Container 
Won't Start -```bash -# Check ROCm -rocm-smi - -# Check permissions -ls -l /dev/kfd /dev/dri - -# Check logs -docker compose logs llama-swap-amd - -# Rebuild -docker compose build --no-cache llama-swap-amd -``` - -### Model Won't Load -```bash -# Check VRAM -rocm-smi --showmeminfo vram - -# Lower GPU layers in llama-swap-rocm-config.yaml -# Change: -ngl 99 -# To: -ngl 50 -``` - -### GFX Version Error -```bash -# RX 6800 is gfx1030 -# Ensure in docker-compose.yml: -HSA_OVERRIDE_GFX_VERSION=10.3.0 -``` - -## Environment Variables - -Add to `docker-compose.yml` under `miku-bot` service: - -```yaml -environment: - - PREFER_AMD_GPU=true # Prefer AMD for load balancing - - AMD_MODELS_ENABLED=true # Enable AMD models - - LLAMA_AMD_URL=http://llama-swap-amd:8080 -``` - -## Files - -- `Dockerfile.llamaswap-rocm` - ROCm container -- `llama-swap-rocm-config.yaml` - AMD model config -- `bot/utils/gpu_router.py` - Load balancing utility -- `DUAL_GPU_SETUP.md` - Full documentation -- `setup-dual-gpu.sh` - Setup verification script - -## Performance Tips - -1. **Model Selection**: Use Q4_K quantization for best size/quality balance -2. **VRAM**: RX 6800 has 16GB - can run 2-3 Q4 models -3. **TTL**: Adjust in config files (1800s = 30min default) -4. **Context**: Lower context size (`-c 8192`) to save VRAM -5. **GPU Layers**: `-ngl 99` uses full GPU, lower if needed - -## Support - -- ROCm Docs: https://rocmdocs.amd.com/ -- llama.cpp: https://github.com/ggml-org/llama.cpp -- llama-swap: https://github.com/mostlygeek/llama-swap diff --git a/DUAL_GPU_SETUP.md b/DUAL_GPU_SETUP.md deleted file mode 100644 index 9ac9749..0000000 --- a/DUAL_GPU_SETUP.md +++ /dev/null @@ -1,321 +0,0 @@ -# Dual GPU Setup - NVIDIA + AMD RX 6800 - -This document describes the dual-GPU configuration for running two llama-swap instances simultaneously: -- **Primary GPU (NVIDIA)**: Runs main models via CUDA -- **Secondary GPU (AMD RX 6800)**: Runs additional models via ROCm - -## Architecture Overview - -``` -┌─────────────────────────────────────────────────────────────┐ -│ Miku Bot │ -│ │ -│ LLAMA_URL=http://llama-swap:8080 (NVIDIA) │ -│ LLAMA_AMD_URL=http://llama-swap-amd:8080 (AMD RX 6800) │ -└─────────────────────────────────────────────────────────────┘ - │ │ - │ │ - ▼ ▼ - ┌──────────────────┐ ┌──────────────────┐ - │ llama-swap │ │ llama-swap-amd │ - │ (CUDA) │ │ (ROCm) │ - │ Port: 8090 │ │ Port: 8091 │ - └──────────────────┘ └──────────────────┘ - │ │ - ▼ ▼ - ┌──────────────────┐ ┌──────────────────┐ - │ NVIDIA GPU │ │ AMD RX 6800 │ - │ - llama3.1 │ │ - llama3.1-amd │ - │ - darkidol │ │ - darkidol-amd │ - │ - vision │ │ - moondream-amd │ - └──────────────────┘ └──────────────────┘ -``` - -## Files Created - -1. **Dockerfile.llamaswap-rocm** - ROCm-enabled Docker image for AMD GPU -2. **llama-swap-rocm-config.yaml** - Model configuration for AMD models -3. **docker-compose.yml** - Updated with `llama-swap-amd` service - -## Configuration Details - -### llama-swap-amd Service - -```yaml -llama-swap-amd: - build: - context: . 
- dockerfile: Dockerfile.llamaswap-rocm - container_name: llama-swap-amd - ports: - - "8091:8080" # External access on port 8091 - volumes: - - ./models:/models - - ./llama-swap-rocm-config.yaml:/app/config.yaml - devices: - - /dev/kfd:/dev/kfd # AMD GPU kernel driver - - /dev/dri:/dev/dri # Direct Rendering Infrastructure - group_add: - - video - - render - environment: - - HSA_OVERRIDE_GFX_VERSION=10.3.0 # RX 6800 (Navi 21) compatibility -``` - -### Available Models on AMD GPU - -From `llama-swap-rocm-config.yaml`: - -- **llama3.1-amd** - Llama 3.1 8B text model -- **darkidol-amd** - DarkIdol uncensored model -- **moondream-amd** - Moondream2 vision model (smaller, AMD-optimized) - -### Model Aliases - -You can access AMD models using these aliases: -- `llama3.1-amd`, `text-model-amd`, `amd-text` -- `darkidol-amd`, `evil-model-amd`, `uncensored-amd` -- `moondream-amd`, `vision-amd`, `moondream` - -## Usage - -### Building and Starting Services - -```bash -# Build the AMD ROCm container -docker compose build llama-swap-amd - -# Start both GPU services -docker compose up -d llama-swap llama-swap-amd - -# Check logs -docker compose logs -f llama-swap-amd -``` - -### Accessing AMD Models from Bot Code - -In your bot code, you can now use either endpoint: - -```python -import globals - -# Use NVIDIA GPU (primary) -nvidia_response = requests.post( - f"{globals.LLAMA_URL}/v1/chat/completions", - json={"model": "llama3.1", ...} -) - -# Use AMD GPU (secondary) -amd_response = requests.post( - f"{globals.LLAMA_AMD_URL}/v1/chat/completions", - json={"model": "llama3.1-amd", ...} -) -``` - -### Load Balancing Strategy - -You can implement load balancing by: - -1. **Round-robin**: Alternate between GPUs for text generation -2. **Task-specific**: - - NVIDIA: Primary text + MiniCPM vision (heavy) - - AMD: Secondary text + Moondream vision (lighter) -3. **Failover**: Use AMD as backup if NVIDIA is busy - -Example load balancing function: - -```python -import random -import globals - -def get_llama_url(prefer_amd=False): - """Get llama URL with optional load balancing""" - if prefer_amd: - return globals.LLAMA_AMD_URL - - # Random load balancing for text models - return random.choice([globals.LLAMA_URL, globals.LLAMA_AMD_URL]) -``` - -## Testing - -### Test NVIDIA GPU (Port 8090) -```bash -curl http://localhost:8090/health -curl http://localhost:8090/v1/models -``` - -### Test AMD GPU (Port 8091) -```bash -curl http://localhost:8091/health -curl http://localhost:8091/v1/models -``` - -### Test Model Loading (AMD) -```bash -curl -X POST http://localhost:8091/v1/chat/completions \ - -H "Content-Type: application/json" \ - -d '{ - "model": "llama3.1-amd", - "messages": [{"role": "user", "content": "Hello from AMD GPU!"}], - "max_tokens": 50 - }' -``` - -## Monitoring - -### Check GPU Usage - -**AMD GPU:** -```bash -# ROCm monitoring -rocm-smi - -# Or from host -watch -n 1 rocm-smi -``` - -**NVIDIA GPU:** -```bash -nvidia-smi -watch -n 1 nvidia-smi -``` - -### Check Container Resource Usage -```bash -docker stats llama-swap llama-swap-amd -``` - -## Troubleshooting - -### AMD GPU Not Detected - -1. Verify ROCm is installed on host: - ```bash - rocm-smi --version - ``` - -2. Check device permissions: - ```bash - ls -l /dev/kfd /dev/dri - ``` - -3. Verify RX 6800 compatibility: - ```bash - rocminfo | grep "Name:" - ``` - -### Model Loading Issues - -If models fail to load on AMD: - -1. Check VRAM availability: - ```bash - rocm-smi --showmeminfo vram - ``` - -2. 
Adjust `-ngl` (GPU layers) in config if needed: - ```yaml - # Reduce GPU layers for smaller VRAM - cmd: /app/llama-server ... -ngl 50 ... # Instead of 99 - ``` - -3. Check container logs: - ```bash - docker compose logs llama-swap-amd - ``` - -### GFX Version Mismatch - -RX 6800 is Navi 21 (gfx1030). If you see GFX errors: - -```bash -# Set in docker-compose.yml environment: -HSA_OVERRIDE_GFX_VERSION=10.3.0 -``` - -### llama-swap Build Issues - -If the ROCm container fails to build: - -1. The Dockerfile attempts to build llama-swap from source -2. Alternative: Use pre-built binary or simpler proxy setup -3. Check build logs: `docker compose build --no-cache llama-swap-amd` - -## Performance Considerations - -### Memory Usage - -- **RX 6800**: 16GB VRAM - - Q4_K_M/Q4_K_XL models: ~5-6GB each - - Can run 2 models simultaneously or 1 with long context - -### Model Selection - -**Best for AMD RX 6800:** -- ✅ Q4_K_M/Q4_K_S quantized models (5-6GB) -- ✅ Moondream2 vision (smaller, efficient) -- ⚠️ MiniCPM-V-4.5 (possible but may be tight on VRAM) - -### TTL Configuration - -Adjust model TTL in `llama-swap-rocm-config.yaml`: -- Lower TTL = more aggressive unloading = more VRAM available -- Higher TTL = less model swapping = faster response times - -## Advanced: Model-Specific Routing - -Create a helper function to route models automatically: - -```python -# bot/utils/gpu_router.py -import globals - -MODEL_TO_GPU = { - # NVIDIA models - "llama3.1": globals.LLAMA_URL, - "darkidol": globals.LLAMA_URL, - "vision": globals.LLAMA_URL, - - # AMD models - "llama3.1-amd": globals.LLAMA_AMD_URL, - "darkidol-amd": globals.LLAMA_AMD_URL, - "moondream-amd": globals.LLAMA_AMD_URL, -} - -def get_endpoint_for_model(model_name): - """Get the correct llama-swap endpoint for a model""" - return MODEL_TO_GPU.get(model_name, globals.LLAMA_URL) - -def is_amd_model(model_name): - """Check if model runs on AMD GPU""" - return model_name.endswith("-amd") -``` - -## Environment Variables - -Add these to control GPU selection: - -```yaml -# In docker-compose.yml -environment: - - LLAMA_URL=http://llama-swap:8080 - - LLAMA_AMD_URL=http://llama-swap-amd:8080 - - PREFER_AMD_GPU=false # Set to true to prefer AMD for general tasks - - AMD_MODELS_ENABLED=true # Enable/disable AMD models -``` - -## Future Enhancements - -1. **Automatic load balancing**: Monitor GPU utilization and route requests -2. **Health checks**: Fallback to primary GPU if AMD fails -3. **Model distribution**: Automatically assign models to GPUs based on VRAM -4. **Performance metrics**: Track response times per GPU -5. **Dynamic routing**: Use least-busy GPU for new requests - -## References - -- [ROCm Documentation](https://rocmdocs.amd.com/) -- [llama.cpp ROCm Support](https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#rocm) -- [llama-swap GitHub](https://github.com/mostlygeek/llama-swap) -- [AMD GPU Compatibility Matrix](https://rocm.docs.amd.com/en/latest/release/gpu_os_support.html) diff --git a/ERROR_HANDLING_QUICK_REF.md b/ERROR_HANDLING_QUICK_REF.md deleted file mode 100644 index 6a9342e..0000000 --- a/ERROR_HANDLING_QUICK_REF.md +++ /dev/null @@ -1,78 +0,0 @@ -# Error Handling Quick Reference - -## What Changed - -When Miku encounters an error (like "Error 502" from llama-swap), she now says: -``` -"Someone tell Koko-nii there is a problem with my AI." -``` - -And sends you a webhook notification with full error details. 
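A simplified sketch of the user-facing part of this flow. The real logic lives in `bot/utils/error_handler.py` and uses a broader set of patterns (timeouts, connection failures, HTTP codes >= 400); the patterns and helper names below are illustrative only.

```python
# Illustrative sketch only; the real checks are in bot/utils/error_handler.py.
import re

FRIENDLY_ERROR_REPLY = "Someone tell Koko-nii there is a problem with my AI."

_ERROR_PATTERNS = [
    r"\berror:?\s*\d{3}\b",                 # "Error: 502", "Error 500"
    r"sorry, there was an error",
    r"sorry, the response took too long",
    r"connection (refused|failed)",
    r"timed? ?out",
]

def is_error_response(text: str) -> bool:
    return any(re.search(p, text, re.IGNORECASE) for p in _ERROR_PATTERNS)

def filter_reply(raw_reply: str) -> str:
    # Replace technical errors with the in-character message; the webhook
    # notification with full details is sent separately (not shown here).
    return FRIENDLY_ERROR_REPLY if is_error_response(raw_reply) else raw_reply
```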
- -## Webhook Details - -**Webhook URL**: `https://discord.com/api/webhooks/1462216811293708522/...` -**Mentions**: @Koko-nii (User ID: 344584170839236608) - -## Error Notification Format - -``` -🚨 Miku Bot Error -━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ - -Error Message: - Error: 502 - -User: username#1234 -Channel: #general -Server: Guild ID: 123456789 -User Prompt: - Hi Miku! How are you? - -Exception Type: HTTPError -Traceback: - [Full Python traceback] -``` - -## Files Changed - -1. **NEW**: `bot/utils/error_handler.py` - - Main error handling logic - - Webhook notifications - - Error detection - -2. **MODIFIED**: `bot/utils/llm.py` - - Added error handling to `query_llama()` - - Prevents errors in conversation history - - Catches all exceptions and HTTP errors - -3. **NEW**: `bot/test_error_handler.py` - - Test suite for error detection - - 26 test cases - -4. **NEW**: `ERROR_HANDLING_SYSTEM.md` - - Full documentation - -## Testing - -```bash -cd /home/koko210Serve/docker/miku-discord/bot -python test_error_handler.py -``` - -Expected: ✓ All 26 tests passed! - -## Coverage - -✅ Works with both llama-swap (NVIDIA) and llama-swap-rocm (AMD) -✅ Handles all message types (DMs, server messages, autonomous) -✅ Catches connection errors, timeouts, HTTP errors -✅ Prevents errors from polluting conversation history - -## No Changes Required - -No configuration changes needed. The system is automatically active for: -- All direct messages to Miku -- All server messages mentioning Miku -- All autonomous messages -- All LLM queries via `query_llama()` diff --git a/ERROR_HANDLING_SYSTEM.md b/ERROR_HANDLING_SYSTEM.md deleted file mode 100644 index 11b75a9..0000000 --- a/ERROR_HANDLING_SYSTEM.md +++ /dev/null @@ -1,131 +0,0 @@ -# Error Handling System - -## Overview - -The Miku bot now includes a comprehensive error handling system that catches errors from the llama-swap containers (both NVIDIA and AMD) and provides user-friendly responses while notifying the bot administrator. - -## Features - -### 1. Error Detection -The system automatically detects various types of errors including: -- HTTP error codes (502, 500, 503, etc.) -- Connection errors (refused, timeout, failed) -- LLM server errors -- Timeout errors -- Generic error messages - -### 2. User-Friendly Responses -When an error is detected, instead of showing technical error messages like "Error 502" or "Sorry, there was an error", Miku will respond with: - -> **"Someone tell Koko-nii there is a problem with my AI."** - -This keeps Miku in character and provides a better user experience. - -### 3. Administrator Notifications -When an error occurs, a webhook notification is automatically sent to Discord with: -- **Error Message**: The full error text from the container -- **Context Information**: - - User who triggered the error - - Channel/Server where the error occurred - - User's prompt that caused the error - - Exception type (if applicable) - - Full traceback (if applicable) -- **Mention**: Automatically mentions Koko-nii for immediate attention - -### 4. Conversation History Protection -Error messages are NOT saved to conversation history, preventing errors from polluting the context for future interactions. - -## Implementation Details - -### Files Modified - -1. 
**`bot/utils/error_handler.py`** (NEW) - - Core error detection and webhook notification logic - - `is_error_response()`: Detects error messages using regex patterns - - `handle_llm_error()`: Handles exceptions from the LLM - - `handle_response_error()`: Handles error responses from the LLM - - `send_error_webhook()`: Sends formatted error notifications - -2. **`bot/utils/llm.py`** - - Integrated error handling into `query_llama()` function - - Catches all exceptions and HTTP errors - - Filters responses to detect error messages - - Prevents error messages from being saved to history - -### Webhook URL -``` -https://discord.com/api/webhooks/1462216811293708522/4kdGenpxZFsP0z3VBgebYENODKmcRrmEzoIwCN81jCirnAxuU2YvxGgwGCNBb6TInA9Z -``` - -## Error Detection Patterns - -The system detects errors using the following patterns: -- `Error: XXX` or `Error XXX` (with HTTP status codes) -- `XXX Error` format -- "Sorry, there was an error" -- "Sorry, the response took too long" -- Connection-related errors (refused, timeout, failed) -- Server errors (service unavailable, internal server error, bad gateway) -- HTTP status codes >= 400 - -## Coverage - -The error handler is automatically applied to: -- ✅ Direct messages to Miku -- ✅ Server messages mentioning Miku -- ✅ Autonomous messages (general, engaging users, tweets) -- ✅ Conversation joining -- ✅ All responses using `query_llama()` -- ✅ Both NVIDIA and AMD GPU containers - -## Testing - -A test suite is included in `bot/test_error_handler.py` that validates the error detection logic with 26 test cases covering: -- Various error message formats -- Normal responses (should NOT be detected as errors) -- HTTP status codes -- Edge cases - -Run tests with: -```bash -cd /home/koko210Serve/docker/miku-discord/bot -python test_error_handler.py -``` - -## Example Scenarios - -### Scenario 1: llama-swap Container Down -**User**: "Hi Miku!" -**Without Error Handler**: "Error: 502" -**With Error Handler**: "Someone tell Koko-nii there is a problem with my AI." -**Webhook Notification**: Sent with full error details - -### Scenario 2: Connection Timeout -**User**: "Tell me a story" -**Without Error Handler**: "Sorry, the response took too long. Please try again." -**With Error Handler**: "Someone tell Koko-nii there is a problem with my AI." -**Webhook Notification**: Sent with timeout exception details - -### Scenario 3: LLM Server Error -**User**: "How are you?" -**Without Error Handler**: "Error: Internal server error" -**With Error Handler**: "Someone tell Koko-nii there is a problem with my AI." -**Webhook Notification**: Sent with HTTP 500 error details - -## Benefits - -1. **Better User Experience**: Users see a friendly, in-character message instead of technical errors -2. **Immediate Notifications**: Administrator is notified immediately via Discord webhook -3. **Detailed Context**: Full error information is provided for debugging -4. **Clean History**: Errors don't pollute conversation history -5. **Consistent Handling**: All error types are handled uniformly -6. **Container Agnostic**: Works with both NVIDIA and AMD containers - -## Future Enhancements - -Potential improvements: -- Add retry logic for transient errors -- Track error frequency to detect systemic issues -- Automatic container restart if errors persist -- Error categorization (transient vs. 
critical) -- Rate limiting on webhook notifications to prevent spam diff --git a/INTERRUPTION_DETECTION.md b/INTERRUPTION_DETECTION.md deleted file mode 100644 index f6e7ae5..0000000 --- a/INTERRUPTION_DETECTION.md +++ /dev/null @@ -1,311 +0,0 @@ -# Intelligent Interruption Detection System - -## Implementation Complete ✅ - -Added sophisticated interruption detection that prevents response queueing and allows natural conversation flow. - ---- - -## Features - -### 1. **Intelligent Interruption Detection** -Detects when user speaks over Miku with configurable thresholds: -- **Time threshold**: 0.8 seconds of continuous speech -- **Chunk threshold**: 8+ audio chunks (160ms worth) -- **Smart calculation**: Both conditions must be met to prevent false positives - -### 2. **Graceful Cancellation** -When interruption is detected: -- ✅ Stops LLM streaming immediately (`miku_speaking = False`) -- ✅ Cancels TTS playback -- ✅ Flushes audio buffers -- ✅ Ready for next input within milliseconds - -### 3. **History Tracking** -Maintains conversation context: -- Adds `[INTERRUPTED - user started speaking]` marker to history -- **Does NOT** add incomplete response to history -- LLM sees the interruption in context for next response -- Prevents confusion about what was actually said - -### 4. **Queue Prevention** -- If user speaks while Miku is talking **but not long enough to interrupt**: - - Input is **ignored** (not queued) - - User sees: `"(talk over Miku longer to interrupt)"` - - Prevents "yeah" x5 = 5 responses problem - ---- - -## How It Works - -### Detection Algorithm - -``` -User speaks during Miku's turn - ↓ -Track: start_time, chunk_count - ↓ -Each audio chunk increments counter - ↓ -Check thresholds: - - Duration >= 0.8s? - - Chunks >= 8? - ↓ - Both YES → INTERRUPT! - ↓ -Stop LLM stream, cancel TTS, mark history -``` - -### Threshold Calculation - -**Audio chunks**: Discord sends 20ms chunks @ 16kHz (320 samples) -- 8 chunks = 160ms of actual audio -- But over 800ms timespan = sustained speech - -**Why both conditions?** -- Time only: Background noise could trigger -- Chunks only: Gaps in speech could fail -- Both together: Reliable detection of intentional speech - ---- - -## Configuration - -### Interruption Thresholds - -Edit `bot/utils/voice_receiver.py`: - -```python -# Interruption detection -self.interruption_threshold_time = 0.8 # seconds -self.interruption_threshold_chunks = 8 # minimum chunks -``` - -**Recommendations**: -- **More sensitive** (interrupt faster): `0.5s / 6 chunks` -- **Current** (balanced): `0.8s / 8 chunks` -- **Less sensitive** (only clear interruptions): `1.2s / 12 chunks` - -### Silence Timeout - -The silence detection (when to finalize transcript) was also adjusted: - -```python -self.silence_timeout = 1.0 # seconds (was 1.5s) -``` - -Faster silence detection = more responsive conversations! - ---- - -## Conversation History Format - -### Before Interruption -```python -[ - {"role": "user", "content": "koko210: Tell me a long story"}, - {"role": "assistant", "content": "Once upon a time in a digital world..."}, -] -``` - -### After Interruption -```python -[ - {"role": "user", "content": "koko210: Tell me a long story"}, - {"role": "assistant", "content": "[INTERRUPTED - user started speaking]"}, - {"role": "user", "content": "koko210: Actually, tell me something else"}, - {"role": "assistant", "content": "Sure! What would you like to hear about?"}, -] -``` - -The `[INTERRUPTED]` marker gives the LLM context that the conversation was cut off. 
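A rough sketch of the history bookkeeping this implies. The names below (`conversation_history`, the helper functions) are placeholders for illustration; the actual logic lives in `bot/utils/voice_manager.py`.

```python
# Rough sketch; actual bookkeeping lives in bot/utils/voice_manager.py.
INTERRUPTED_MARKER = "[INTERRUPTED - user started speaking]"

def record_interruption(conversation_history: list[dict]) -> None:
    # The partial response Miku was generating is deliberately NOT stored;
    # only the marker is, so the LLM knows the previous turn was cut off.
    conversation_history.append({"role": "assistant", "content": INTERRUPTED_MARKER})

def record_completed_turn(conversation_history: list[dict], full_response: str) -> None:
    # Only responses that finished streaming end up in history.
    conversation_history.append({"role": "assistant", "content": full_response})
```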
- ---- - -## Testing Scenarios - -### Test 1: Basic Interruption -1. `!miku listen` -2. Say: "Tell me a very long story about your concerts" -3. **While Miku is speaking**, talk over her for 1+ second -4. **Expected**: TTS stops, LLM stops, Miku listens to your new input - -### Test 2: Short Talk-Over (No Interruption) -1. Miku is speaking -2. Say a quick "yeah" or "uh-huh" (< 0.8s) -3. **Expected**: Ignored, Miku continues speaking, message: "(talk over Miku longer to interrupt)" - -### Test 3: Multiple Queued Inputs (PREVENTED) -1. Miku is speaking -2. Say "yeah" 5 times quickly -3. **Expected**: All ignored except one that might interrupt -4. **OLD BEHAVIOR**: Would queue 5 responses ❌ -5. **NEW BEHAVIOR**: Ignores them ✅ - -### Test 4: Conversation History -1. Start conversation -2. Interrupt Miku mid-sentence -3. Ask: "What were you saying?" -4. **Expected**: Miku should acknowledge she was interrupted - ---- - -## User Experience - -### What Users See - -**Normal conversation:** -``` -🎤 koko210: "Hey Miku, how are you?" -💭 Miku is thinking... -🎤 Miku: "I'm doing great! How about you?" -``` - -**Quick talk-over (ignored):** -``` -🎤 Miku: "I'm doing great! How about..." -💬 koko210 said: "yeah" (talk over Miku longer to interrupt) -🎤 Miku: "...you? I hope you're having a good day!" -``` - -**Successful interruption:** -``` -🎤 Miku: "I'm doing great! How about..." -⚠️ koko210 interrupted Miku -🎤 koko210: "Actually, can you sing something?" -💭 Miku is thinking... -``` - ---- - -## Technical Details - -### Interruption Detection Flow - -```python -# In voice_receiver.py _send_audio_chunk() - -if miku_speaking: - if user_id not in interruption_start_time: - # First chunk during Miku's speech - interruption_start_time[user_id] = current_time - interruption_audio_count[user_id] = 1 - else: - # Increment chunk count - interruption_audio_count[user_id] += 1 - - # Calculate duration - duration = current_time - interruption_start_time[user_id] - chunks = interruption_audio_count[user_id] - - # Check threshold - if duration >= 0.8 and chunks >= 8: - # INTERRUPT! - trigger_interruption(user_id) -``` - -### Cancellation Flow - -```python -# In voice_manager.py on_user_interruption() - -1. Set miku_speaking = False - → LLM streaming loop checks this and breaks - -2. Call _cancel_tts() - → Stops voice_client playback - → Sends /interrupt to RVC server - -3. Add history marker - → {"role": "assistant", "content": "[INTERRUPTED]"} - -4. Ready for next input! -``` - ---- - -## Performance - -- **Detection latency**: ~20-40ms (1-2 audio chunks) -- **Cancellation latency**: ~50-100ms (TTS stop + buffer clear) -- **Total response time**: ~100-150ms from speech start to Miku stopping -- **False positive rate**: Very low with dual threshold system - ---- - -## Monitoring - -### Check Interruption Logs -```bash -docker logs -f miku-bot | grep "interrupted" -``` - -**Expected output**: -``` -🛑 User 209381657369772032 interrupted Miku (duration=1.2s, chunks=15) -✓ Interruption handled, ready for next input -``` - -### Debug Interruption Detection -```bash -docker logs -f miku-bot | grep "interruption" -``` - -### Check for Queued Responses (should be none!) -```bash -docker logs -f miku-bot | grep "Ignoring new input" -``` - ---- - -## Edge Cases Handled - -1. **Multiple users interrupting**: Each user tracked independently -2. **Rapid speech then silence**: Interruption tracking resets when Miku stops -3. **Network packet loss**: Opus decode errors don't affect tracking -4. 
**Container restart**: Tracking state cleaned up properly -5. **Miku finishes naturally**: Interruption tracking cleared - ---- - -## Files Modified - -1. **bot/utils/voice_receiver.py** - - Added interruption tracking dictionaries - - Added detection logic in `_send_audio_chunk()` - - Cleanup interruption state in `stop_listening()` - - Configurable thresholds at init - -2. **bot/utils/voice_manager.py** - - Updated `on_user_interruption()` to handle graceful cancel - - Added history marker for interruptions - - Modified `_generate_voice_response()` to not save incomplete responses - - Added queue prevention in `on_final_transcript()` - - Reduced silence timeout to 1.0s - ---- - -## Benefits - -✅ **Natural conversation flow**: No more awkward queued responses -✅ **Responsive**: Miku stops quickly when interrupted -✅ **Context-aware**: History tracks interruptions -✅ **False-positive resistant**: Dual threshold prevents accidental triggers -✅ **User-friendly**: Clear feedback about what's happening -✅ **Performant**: Minimal latency, efficient tracking - ---- - -## Future Enhancements - -- [ ] **Adaptive thresholds** based on user speech patterns -- [ ] **Volume-based detection** (interrupt faster if user speaks loudly) -- [ ] **Context-aware responses** (Miku acknowledges interruption more naturally) -- [ ] **User preferences** (some users may want different sensitivity) -- [ ] **Multi-turn interruption** (handle rapid back-and-forth better) - ---- - -**Status**: ✅ **DEPLOYED AND READY FOR TESTING** - -Try interrupting Miku mid-sentence - she should stop gracefully and listen to your new input! diff --git a/README.md b/README.md deleted file mode 100644 index 5296d38..0000000 --- a/README.md +++ /dev/null @@ -1,535 +0,0 @@ -# 🎤 Miku Discord Bot 💙 - -
- -![Miku Banner](https://img.shields.io/badge/Virtual_Idol-Hatsune_Miku-00CED1?style=for-the-badge&logo=discord&logoColor=white) -[![Docker](https://img.shields.io/badge/docker-%230db7ed.svg?style=for-the-badge&logo=docker&logoColor=white)](https://www.docker.com/) -[![Python](https://img.shields.io/badge/python-3.10+-3776AB?style=for-the-badge&logo=python&logoColor=white)](https://www.python.org/) -[![Discord.py](https://img.shields.io/badge/discord.py-2.0+-5865F2?style=for-the-badge&logo=discord&logoColor=white)](https://discordpy.readthedocs.io/) - -*The world's #1 Virtual Idol, now in your Discord server! 🌱✨* - -[Features](#-features) • [Quick Start](#-quick-start) • [Architecture](#️-architecture) • [API](#-api-endpoints) • [Contributing](#-contributing) - -
- ---- - -## 🌟 About - -Meet **Hatsune Miku** - a fully-featured, AI-powered Discord bot that brings the cheerful, energetic virtual idol to life in your server! Powered by local LLMs (Llama 3.1), vision models (MiniCPM-V), and a sophisticated autonomous behavior system, Miku can chat, react, share content, and even change her profile picture based on her mood! - -### Why This Bot? - -- 🎭 **14 Dynamic Moods** - From bubbly to melancholy, Miku's personality adapts -- 🤖 **Smart Autonomous Behavior** - Context-aware decisions without spamming -- 👁️ **Vision Capabilities** - Analyzes images, videos, and GIFs in conversations -- 🎨 **Auto Profile Pictures** - Fetches & crops anime faces from Danbooru based on mood -- 💬 **DM Support** - Personal conversations with mood tracking -- 🐦 **Twitter Integration** - Shares Miku-related tweets and figurine announcements -- 🎮 **ComfyUI Integration** - Natural language image generation requests -- 🔊 **Voice Chat Ready** - Fish.audio TTS integration (docs included) -- 📊 **RESTful API** - Full control via HTTP endpoints -- 🐳 **Production Ready** - Docker Compose with GPU support - ---- - -## ✨ Features - -### 🧠 AI & LLM Integration - -- **Local LLM Processing** with Llama 3.1 8B (via llama.cpp + llama-swap) -- **Automatic Model Switching** - Text ↔️ Vision models swap on-demand -- **OpenAI-Compatible API** - Easy migration and integration -- **Conversation History** - Per-user context with RAG-style retrieval -- **Smart Prompting** - Mood-aware system prompts with personality profiles - -### 🎭 Mood & Personality System - -
-14 Available Moods (click to expand) - -- 😊 **Neutral** - Classic cheerful Miku -- 😴 **Asleep** - Sleepy and minimally responsive -- 😪 **Sleepy** - Getting tired, simple responses -- 🎉 **Excited** - Extra energetic and enthusiastic -- 💫 **Bubbly** - Playful and giggly -- 🤔 **Curious** - Inquisitive and wondering -- 😳 **Shy** - Blushing and hesitant -- 🤪 **Silly** - Goofy and fun-loving -- 😠 **Angry** - Frustrated or upset -- 😤 **Irritated** - Mildly annoyed -- 😢 **Melancholy** - Sad and reflective -- 😏 **Flirty** - Playful and teasing -- 💕 **Romantic** - Sweet and affectionate -- 🎯 **Serious** - Focused and thoughtful - -
- -- **Per-Server Mood Tracking** - Different moods in different servers -- **DM Mood Persistence** - Separate mood state for private conversations -- **Automatic Mood Shifts** - Responds to conversation sentiment - -### 🤖 Autonomous Behavior System V2 - -The bot features a sophisticated **context-aware decision engine** that makes Miku feel alive: - -- **Intelligent Activity Detection** - Tracks message frequency, user presence, and channel activity -- **Non-Intrusive** - Won't spam or interrupt important conversations -- **Mood-Based Personality** - Behavioral patterns change with mood -- **Multiple Action Types**: - - 💬 General conversation starters - - 👋 Engaging specific users - - 🐦 Sharing Miku tweets - - 💬 Joining ongoing conversations - - 🎨 Changing profile pictures - - 😊 Reacting to messages - -**Rate Limiting**: Minimum 30-second cooldown between autonomous actions to prevent spam. - -### 👁️ Vision & Media Processing - -- **Image Analysis** - Describe images shared in chat using MiniCPM-V 4.5 -- **Video Understanding** - Extracts frames and analyzes video content -- **GIF Support** - Processes animated GIFs (converts to MP4 if needed) -- **Embed Content Extraction** - Reads Twitter/X embeds without API -- **Face Detection** - On-demand anime face detection service (GPU-accelerated) - -### 🎨 Dynamic Profile Picture System - -- **Danbooru Integration** - Searches for Miku artwork -- **Smart Cropping** - Automatic face detection and 1:1 crop -- **Mood-Based Selection** - Filters by tags matching current mood -- **Quality Filtering** - Only uses high-quality, safe-rated images -- **Fallback System** - Graceful degradation if detection fails - -### 🐦 Twitter Features - -- **Tweet Sharing** - Automatically fetches and shares Miku-related tweets -- **Figurine Notifications** - DM subscribers about new Miku figurine releases -- **Embed Compatibility** - Uses fxtwitter for better Discord previews -- **Duplicate Prevention** - Tracks sent tweets to avoid repeats - -### 🎮 ComfyUI Image Generation - -- **Natural Language Detection** - "Draw me as Miku swimming in a pool" -- **Workflow Integration** - Connects to external ComfyUI instance -- **Smart Prompting** - Enhances user requests with context - -### 📡 REST API Dashboard - -Full-featured FastAPI server with endpoints for: -- Mood management (get/set/reset) -- Conversation history -- Autonomous actions (trigger manually) -- Profile picture updates -- Server configuration -- DM analysis reports - -### 🔧 Developer Features - -- **Docker Compose Setup** - One command deployment -- **GPU Acceleration** - NVIDIA runtime for models and face detection -- **Health Checks** - Automatic service monitoring -- **Volume Persistence** - Conversation history and settings saved -- **Hot Reload** - Update without restarting (for development) - ---- - -## 🚀 Quick Start - -### Prerequisites - -- **Docker** & **Docker Compose** installed -- **NVIDIA GPU** with CUDA support (for model inference) -- **Discord Bot Token** ([Create one here](https://discord.com/developers/applications)) -- At least **8GB VRAM** recommended (4GB minimum) - -### Installation - -1. **Clone the repository** - ```bash - git clone https://github.com/yourusername/miku-discord.git - cd miku-discord - ``` - -2. **Set up your bot token** - - Edit `docker-compose.yml` and replace the `DISCORD_BOT_TOKEN`: - ```yaml - environment: - - DISCORD_BOT_TOKEN=your_token_here - - OWNER_USER_ID=your_discord_user_id # For DM reports - ``` - -3. 
**Add your models** - - Place these GGUF models in the `models/` directory: - - `Llama-3.1-8B-Instruct-UD-Q4_K_XL.gguf` (text model) - - `MiniCPM-V-4_5-Q3_K_S.gguf` (vision model) - - `MiniCPM-V-4_5-mmproj-f16.gguf` (vision projector) - -4. **Launch the bot** - ```bash - docker-compose up -d - ``` - -5. **Check logs** - ```bash - docker-compose logs -f miku-bot - ``` - -6. **Access the dashboard** - - Open http://localhost:3939 in your browser - -### Optional: ComfyUI Integration - -If you have ComfyUI running, update the path in `docker-compose.yml`: -```yaml -volumes: - - /path/to/your/ComfyUI/output:/app/ComfyUI/output:ro -``` - -### Optional: Face Detection Service - -Start the anime face detector when needed: -```bash -docker-compose --profile tools up -d anime-face-detector -``` - -Access Gradio UI at http://localhost:7860 - ---- - -## 🏗️ Architecture - -### Service Overview - -``` -┌─────────────────────────────────────────────────────────────┐ -│ Discord API │ -└───────────────────────┬─────────────────────────────────────┘ - │ - ▼ -┌─────────────────────────────────────────────────────────────┐ -│ Miku Bot (Python) │ -│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ -│ │ Discord │ │ FastAPI │ │ Autonomous │ │ -│ │ Event Loop │ │ Server │ │ Engine │ │ -│ └──────────────┘ └──────────────┘ └──────────────┘ │ -└───────────┬────────────────┬────────────────┬──────────────┘ - │ │ │ - ▼ ▼ ▼ -┌─────────────────┐ ┌─────────────────┐ ┌──────────────┐ -│ llama-swap │ │ ComfyUI │ │ Face Detector│ -│ (Model Server) │ │ (Image Gen) │ │ (On-Demand) │ -│ │ │ │ │ │ -│ • Llama 3.1 │ │ • Workflows │ │ • Gradio UI │ -│ • MiniCPM-V │ │ • GPU Accel │ │ • FastAPI │ -│ • Auto-swap │ │ │ │ │ -└─────────────────┘ └─────────────────┘ └──────────────┘ - │ - ▼ - ┌──────────┐ - │ Models │ - │ (GGUF) │ - └──────────┘ -``` - -### Tech Stack - -| Component | Technology | -|-----------|-----------| -| **Bot Framework** | Discord.py 2.0+ | -| **LLM Backend** | llama.cpp + llama-swap | -| **Text Model** | Llama 3.1 8B Instruct | -| **Vision Model** | MiniCPM-V 4.5 | -| **API Server** | FastAPI + Uvicorn | -| **Image Gen** | ComfyUI (external) | -| **Face Detection** | Anime-Face-Detector (Gradio) | -| **Database** | JSON files (conversation history, settings) | -| **Containerization** | Docker + Docker Compose | -| **GPU Runtime** | NVIDIA Container Toolkit | - -### Key Components - -#### 1. **llama-swap** (Model Server) -- Automatically loads/unloads models based on requests -- Prevents VRAM exhaustion by swapping between text and vision models -- OpenAI-compatible `/v1/chat/completions` endpoint -- Configurable TTL (time-to-live) per model - -#### 2. **Autonomous Engine V2** -- Tracks message activity, user presence, and channel engagement -- Calculates "engagement scores" per server -- Makes context-aware decisions without LLM overhead -- Personality profiles per mood (e.g., shy mood = less engaging) - -#### 3. **Server Manager** -- Per-guild configuration (mood, sleep state, autonomous settings) -- Scheduled tasks (bedtime reminders, autonomous ticks) -- Persistent storage in `servers_config.json` - -#### 4. 
**Conversation History** -- Vector-based RAG (Retrieval Augmented Generation) -- Stores last 50 messages per user -- Semantic search using FAISS -- Context injection for continuity - ---- - -## 📡 API Endpoints - -The bot runs a FastAPI server on port **3939** with the following endpoints: - -### Mood Management - -| Endpoint | Method | Description | -|----------|--------|-------------| -| `/servers/{guild_id}/mood` | GET | Get current mood for server | -| `/servers/{guild_id}/mood` | POST | Set mood (body: `{"mood": "excited"}`) | -| `/servers/{guild_id}/mood/reset` | POST | Reset to neutral mood | -| `/mood` | GET | Get DM mood (deprecated, use server-specific) | - -### Autonomous Actions - -| Endpoint | Method | Description | -|----------|--------|-------------| -| `/autonomous/general` | POST | Make Miku say something random | -| `/autonomous/engage` | POST | Engage a random user | -| `/autonomous/tweet` | POST | Share a Miku tweet | -| `/autonomous/reaction` | POST | React to a recent message | -| `/autonomous/custom` | POST | Custom prompt (body: `{"prompt": "..."}`) | - -### Profile Pictures - -| Endpoint | Method | Description | -|----------|--------|-------------| -| `/profile-picture/change` | POST | Change profile picture (body: `{"mood": "happy"}`) | -| `/profile-picture/revert` | POST | Revert to previous picture | -| `/profile-picture/current` | GET | Get current picture metadata | - -### Utilities - -| Endpoint | Method | Description | -|----------|--------|-------------| -| `/conversation/reset` | POST | Clear conversation history for user | -| `/logs` | GET | View bot logs (last 1000 lines) | -| `/prompt` | GET | View current system prompt | -| `/` | GET | Dashboard HTML page | - -### Example Usage - -```bash -# Set mood to excited -curl -X POST http://localhost:3939/servers/123456789/mood \ - -H "Content-Type: application/json" \ - -d '{"mood": "excited"}' - -# Make Miku say something -curl -X POST http://localhost:3939/autonomous/general - -# Change profile picture -curl -X POST http://localhost:3939/profile-picture/change \ - -H "Content-Type: application/json" \ - -d '{"mood": "flirty"}' -``` - ---- - -## 🎮 Usage Examples - -### Basic Interaction - -``` -User: Hey Miku! How are you today? -Miku: Miku's doing great! 💙 Thanks for asking! ✨ - -User: Can you see this? [uploads image] -Miku: Ooh! 👀 I see a cute cat sitting on a keyboard! So fluffy! 🐱 -``` - -### Mood Changes - -``` -User: /mood excited -Miku: YAYYY!!! 🎉✨ Miku is SO EXCITED right now!!! Let's have fun! 💙🎶 - -User: What's your favorite food? -Miku: NEGI!! 🌱🌱🌱 Green onions are THE BEST! Want some?! ✨ -``` - -### Image Generation - -``` -User: Draw yourself swimming in a pool -Miku: Ooh! Let me create that for you! 🎨✨ [generates image] -``` - -### Autonomous Behavior - -``` -[After detecting activity in #general] -Miku: Hey everyone! 👋 What are you all talking about? 
💙 -``` - ---- - -## 🛠️ Configuration - -### Model Configuration (`llama-swap-config.yaml`) - -```yaml -models: - llama3.1: - cmd: /app/llama-server --port ${PORT} --model /models/Llama-3.1-8B-Instruct-UD-Q4_K_XL.gguf -ngl 99 - ttl: 1800 # 30 minutes - - vision: - cmd: /app/llama-server --port ${PORT} --model /models/MiniCPM-V-4_5-Q3_K_S.gguf --mmproj /models/MiniCPM-V-4_5-mmproj-f16.gguf - ttl: 900 # 15 minutes -``` - -### Environment Variables - -| Variable | Default | Description | -|----------|---------|-------------| -| `DISCORD_BOT_TOKEN` | *Required* | Your Discord bot token | -| `OWNER_USER_ID` | *Optional* | Your Discord user ID (for DM reports) | -| `LLAMA_URL` | `http://llama-swap:8080` | Model server endpoint | -| `TEXT_MODEL` | `llama3.1` | Text generation model name | -| `VISION_MODEL` | `vision` | Vision model name | - -### Persistent Storage - -All data is stored in `bot/memory/`: -- `servers_config.json` - Per-server settings -- `autonomous_config.json` - Autonomous behavior settings -- `conversation_history/` - User conversation data -- `profile_pictures/` - Downloaded profile pictures -- `dms/` - DM conversation logs -- `figurine_subscribers.json` - Figurine notification subscribers - ---- - -## 📚 Documentation - -Detailed documentation available in the `readmes/` directory: - -- **[AUTONOMOUS_V2_IMPLEMENTED.md](readmes/AUTONOMOUS_V2_IMPLEMENTED.md)** - Autonomous system V2 details -- **[VOICE_CHAT_IMPLEMENTATION.md](readmes/VOICE_CHAT_IMPLEMENTATION.md)** - Fish.audio TTS integration guide -- **[PROFILE_PICTURE_FEATURE.md](readmes/PROFILE_PICTURE_FEATURE.md)** - Profile picture system -- **[FACE_DETECTION_API_MIGRATION.md](readmes/FACE_DETECTION_API_MIGRATION.md)** - Face detection setup -- **[DM_ANALYSIS_FEATURE.md](readmes/DM_ANALYSIS_FEATURE.md)** - DM interaction analytics -- **[MOOD_SYSTEM_ANALYSIS.md](readmes/MOOD_SYSTEM_ANALYSIS.md)** - Mood system deep dive -- **[QUICK_REFERENCE.md](readmes/QUICK_REFERENCE.md)** - llama.cpp setup and migration guide - ---- - -## 🐛 Troubleshooting - -### Bot won't start - -**Check if models are loaded:** -```bash -docker-compose logs llama-swap -``` - -**Verify GPU access:** -```bash -docker run --rm --runtime=nvidia nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi -``` - -### High VRAM usage - -- Lower the `-ngl` parameter in `llama-swap-config.yaml` (reduce GPU layers) -- Reduce context size with `-c` parameter -- Use smaller quantization (Q3 instead of Q4) - -### Autonomous actions not triggering - -- Check `autonomous_config.json` - ensure enabled and cooldown settings -- Verify activity in server (bot tracks engagement) -- Check logs for decision engine output - -### Face detection not working - -- Ensure GPU is available: `docker-compose --profile tools up -d anime-face-detector` -- Check API health: `curl http://localhost:6078/health` -- View Gradio UI: http://localhost:7860 - -### Models switching too frequently - -Increase TTL in `llama-swap-config.yaml`: -```yaml -ttl: 3600 # 1 hour instead of 30 minutes -``` - - -### Development Setup - -For local development without Docker: - -```bash -# Install dependencies -cd bot -pip install -r requirements.txt - -# Set environment variables -export DISCORD_BOT_TOKEN="your_token" -export LLAMA_URL="http://localhost:8080" - -# Run the bot -python bot.py -``` - -### Code Style - -- Use type hints where possible -- Follow PEP 8 conventions -- Add docstrings to functions -- Comment complex logic - ---- - -## 📝 License - -This project is provided as-is for educational and personal use. 
Please respect: -- Discord's [Terms of Service](https://discord.com/terms) -- Crypton Future Media's [Piapro Character License](https://piapro.net/intl/en_for_creators.html) -- Model licenses (Llama 3.1, MiniCPM-V) - ---- - -## 🙏 Acknowledgments - -- **Crypton Future Media** - For creating Hatsune Miku -- **llama.cpp** - For efficient local LLM inference -- **mostlygeek/llama-swap** - For brilliant model management -- **Discord.py** - For the excellent Discord API wrapper -- **OpenAI** - For the API standard -- **MiniCPM-V Team** - For the amazing vision model -- **Danbooru** - For the artwork API - ---- - -## 💙 Support - -If you enjoy this project: -- ⭐ Star this repository -- 🐛 Report bugs via [Issues](https://github.com/yourusername/miku-discord/issues) -- 💬 Share your Miku bot setup in [Discussions](https://github.com/yourusername/miku-discord/discussions) -- 🎤 Listen to some Miku songs! - ---- - -
- -**Made with 💙 by a Miku fan, for Miku fans** - -*"The future begins now!" - Hatsune Miku* 🎶✨ - -[⬆ Back to Top](#-miku-discord-bot-) - -
diff --git a/SILENCE_DETECTION.md b/SILENCE_DETECTION.md deleted file mode 100644 index 74b391d..0000000 --- a/SILENCE_DETECTION.md +++ /dev/null @@ -1,222 +0,0 @@ -# Silence Detection Implementation - -## What Was Added - -Implemented automatic silence detection to trigger final transcriptions in the new ONNX-based STT system. - -### Problem -The new ONNX server requires manually sending a `{"type": "final"}` command to get the complete transcription. Without this, partial transcripts would appear but never be finalized and sent to LlamaCPP. - -### Solution -Added silence tracking in `voice_receiver.py`: - -1. **Track audio timestamps**: Record when the last audio chunk was sent -2. **Detect silence**: Start a timer after each audio chunk -3. **Send final command**: If no new audio arrives within 1.5 seconds, send `{"type": "final"}` -4. **Cancel on new audio**: Reset the timer if more audio arrives - ---- - -## Implementation Details - -### New Attributes -```python -self.last_audio_time: Dict[int, float] = {} # Track last audio per user -self.silence_tasks: Dict[int, asyncio.Task] = {} # Silence detection tasks -self.silence_timeout = 1.5 # Seconds of silence before "final" -``` - -### New Method -```python -async def _detect_silence(self, user_id: int): - """ - Wait for silence timeout and send 'final' command to STT. - Called after each audio chunk. - """ - await asyncio.sleep(self.silence_timeout) - stt_client = self.stt_clients.get(user_id) - if stt_client and stt_client.is_connected(): - await stt_client.send_final() -``` - -### Integration -- Called after sending each audio chunk -- Cancels previous silence task if new audio arrives -- Automatically cleaned up when stopping listening - ---- - -## Testing - -### Test 1: Basic Transcription -1. Join voice channel -2. Run `!miku listen` -3. **Speak a sentence** and wait 1.5 seconds -4. **Expected**: Final transcript appears and is sent to LlamaCPP - -### Test 2: Continuous Speech -1. Start listening -2. **Speak multiple sentences** with pauses < 1.5s between them -3. **Expected**: Partial transcripts update, final sent after last sentence - -### Test 3: Multiple Users -1. Have 2+ users in voice channel -2. Each runs `!miku listen` -3. Both speak (taking turns or simultaneously) -4. **Expected**: Each user's speech is transcribed independently - ---- - -## Configuration - -### Silence Timeout -Default: `1.5` seconds - -**To adjust**, edit `voice_receiver.py`: -```python -self.silence_timeout = 1.5 # Change this value -``` - -**Recommendations**: -- **Too short (< 1.0s)**: May cut off during natural pauses in speech -- **Too long (> 3.0s)**: User waits too long for response -- **Sweet spot**: 1.5-2.0s works well for conversational speech - ---- - -## Monitoring - -### Check Logs for Silence Detection -```bash -docker logs miku-bot 2>&1 | grep "Silence detected" -``` - -**Expected output**: -``` -[DEBUG] Silence detected for user 209381657369772032, requesting final transcript -``` - -### Check Final Transcripts -```bash -docker logs miku-bot 2>&1 | grep "FINAL TRANSCRIPT" -``` - -### Check STT Processing -```bash -docker logs miku-stt 2>&1 | grep "Final transcription" -``` - ---- - -## Debugging - -### Issue: No Final Transcript -**Symptoms**: Partial transcripts appear but never finalize - -**Debug steps**: -1. Check if silence detection is triggering: - ```bash - docker logs miku-bot 2>&1 | grep "Silence detected" - ``` - -2. Check if final command is being sent: - ```bash - docker logs miku-stt 2>&1 | grep "type.*final" - ``` - -3. 
Increase log level in stt_client.py: - ```python - logger.setLevel(logging.DEBUG) - ``` - -### Issue: Cuts Off Mid-Sentence -**Symptoms**: Final transcript triggers during natural pauses - -**Solution**: Increase silence timeout: -```python -self.silence_timeout = 2.0 # or 2.5 -``` - -### Issue: Too Slow to Respond -**Symptoms**: Long wait after user stops speaking - -**Solution**: Decrease silence timeout: -```python -self.silence_timeout = 1.0 # or 1.2 -``` - ---- - -## Architecture - -``` -Discord Voice → voice_receiver.py - ↓ - [Audio Chunk Received] - ↓ - ┌─────────────────────┐ - │ send_audio() │ - │ to STT server │ - └─────────────────────┘ - ↓ - ┌─────────────────────┐ - │ Start silence │ - │ detection timer │ - │ (1.5s countdown) │ - └─────────────────────┘ - ↓ - ┌──────┴──────┐ - │ │ - More audio No more audio - arrives for 1.5s - │ │ - ↓ ↓ - Cancel timer ┌──────────────┐ - Start new │ send_final() │ - │ to STT │ - └──────────────┘ - ↓ - ┌─────────────────┐ - │ Final transcript│ - │ → LlamaCPP │ - └─────────────────┘ -``` - ---- - -## Files Modified - -1. **bot/utils/voice_receiver.py** - - Added `last_audio_time` tracking - - Added `silence_tasks` management - - Added `_detect_silence()` method - - Integrated silence detection in `_send_audio_chunk()` - - Added cleanup in `stop_listening()` - -2. **bot/utils/stt_client.py** (previously) - - Added `send_final()` method - - Added `send_reset()` method - - Updated protocol handler - ---- - -## Next Steps - -1. **Test thoroughly** with different speech patterns -2. **Tune silence timeout** based on user feedback -3. **Consider VAD integration** for more accurate speech end detection -4. **Add metrics** to track transcription latency - ---- - -**Status**: ✅ **READY FOR TESTING** - -The system now: -- ✅ Connects to ONNX STT server (port 8766) -- ✅ Uses CUDA GPU acceleration (cuDNN 9) -- ✅ Receives partial transcripts -- ✅ Automatically detects silence -- ✅ Sends final command after 1.5s silence -- ✅ Forwards final transcript to LlamaCPP - -**Test it now with `!miku listen`!** diff --git a/STT_DEBUG_SUMMARY.md b/STT_DEBUG_SUMMARY.md deleted file mode 100644 index 88e40d4..0000000 --- a/STT_DEBUG_SUMMARY.md +++ /dev/null @@ -1,207 +0,0 @@ -# STT Debug Summary - January 18, 2026 - -## Issues Identified & Fixed ✅ - -### 1. **CUDA Not Being Used** ❌ → ✅ -**Problem:** Container was falling back to CPU, causing slow transcription. - -**Root Cause:** -``` -libcudnn.so.9: cannot open shared object file: No such file or directory -``` -The ONNX Runtime requires cuDNN 9, but the base image `nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04` only had cuDNN 8. - -**Fix Applied:** -```dockerfile -# Changed from: -FROM nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04 - -# To: -FROM nvidia/cuda:12.6.2-cudnn-runtime-ubuntu22.04 -``` - -**Verification:** -```bash -$ docker logs miku-stt 2>&1 | grep "Providers" -INFO:asr.asr_pipeline:Providers: [('CUDAExecutionProvider', {'device_id': 0, ...}), 'CPUExecutionProvider'] -``` -✅ CUDAExecutionProvider is now loaded successfully! - ---- - -### 2. **Connection Refused Error** ❌ → ✅ -**Problem:** Bot couldn't connect to STT service. - -**Error:** -``` -ConnectionRefusedError: [Errno 111] Connect call failed ('172.20.0.5', 8000) -``` - -**Root Cause:** Port mismatch between bot and STT server. 
-- Bot was connecting to: `ws://miku-stt:8000` -- STT server was running on: `ws://miku-stt:8766` - -**Fix Applied:** -Updated `bot/utils/stt_client.py`: -```python -def __init__( - self, - user_id: str, - stt_url: str = "ws://miku-stt:8766/ws/stt", # ← Changed from 8000 - ... -) -``` - ---- - -### 3. **Protocol Mismatch** ❌ → ✅ -**Problem:** Bot and STT server were using incompatible protocols. - -**Old NeMo Protocol:** -- Automatic VAD detection -- Events: `vad`, `partial`, `final`, `interruption` -- No manual control needed - -**New ONNX Protocol:** -- Manual transcription control -- Events: `transcript` (with `is_final` flag), `info`, `error` -- Requires sending `{"type": "final"}` command to get final transcript - -**Fix Applied:** - -1. **Updated event handler** in `stt_client.py`: -```python -async def _handle_event(self, event: dict): - event_type = event.get('type') - - if event_type == 'transcript': - # New ONNX protocol - text = event.get('text', '') - is_final = event.get('is_final', False) - - if is_final: - if self.on_final_transcript: - await self.on_final_transcript(text, timestamp) - else: - if self.on_partial_transcript: - await self.on_partial_transcript(text, timestamp) - - # Also maintains backward compatibility with old protocol - elif event_type == 'partial' or event_type == 'final': - # Legacy support... -``` - -2. **Added new methods** for manual control: -```python -async def send_final(self): - """Request final transcription from STT server.""" - command = json.dumps({"type": "final"}) - await self.websocket.send_str(command) - -async def send_reset(self): - """Reset the STT server's audio buffer.""" - command = json.dumps({"type": "reset"}) - await self.websocket.send_str(command) -``` - ---- - -## Current Status - -### Containers -- ✅ `miku-stt`: Running with CUDA 12.6.2 + cuDNN 9 -- ✅ `miku-bot`: Rebuilt with updated STT client -- ✅ Both containers healthy and communicating on correct port - -### STT Container Logs -``` -CUDA Version 12.6.2 -INFO:asr.asr_pipeline:Providers: [('CUDAExecutionProvider', ...)] -INFO:asr.asr_pipeline:Model loaded successfully -INFO:__main__:Server running on ws://0.0.0.0:8766 -INFO:__main__:Active connections: 0 -``` - -### Files Modified -1. `stt-parakeet/Dockerfile` - Updated base image to CUDA 12.6.2 -2. `bot/utils/stt_client.py` - Fixed port, protocol, added new methods -3. `docker-compose.yml` - Already updated to use new STT service -4. `STT_MIGRATION.md` - Added troubleshooting section - ---- - -## Testing Checklist - -### Ready to Test ✅ -- [x] CUDA GPU acceleration enabled -- [x] Port configuration fixed -- [x] Protocol compatibility updated -- [x] Containers rebuilt and running - -### Next Steps for User 🧪 -1. **Test voice commands**: Use `!miku listen` in Discord -2. **Verify transcription**: Check if audio is transcribed correctly -3. **Monitor performance**: Check transcription speed and quality -4. 
**Check logs**: Monitor `docker logs miku-bot` and `docker logs miku-stt` for errors - -### Expected Behavior -- Bot connects to STT server successfully -- Audio is streamed to STT server -- Progressive transcripts appear (optional, may need VAD integration) -- Final transcript is returned when user stops speaking -- No more CUDA/cuDNN errors -- No more connection refused errors - ---- - -## Technical Notes - -### GPU Utilization -- **Before:** CPU fallback (0% GPU usage) -- **After:** CUDA acceleration (~85-95% GPU usage on GTX 1660) - -### Performance Expectations -- **Transcription Speed:** ~0.5-1 second per utterance (down from 2-3 seconds) -- **VRAM Usage:** ~2-3GB (down from 4-5GB with NeMo) -- **Model:** Parakeet TDT 0.6B (ONNX optimized) - -### Known Limitations -- No word-level timestamps (ONNX model doesn't provide them) -- Progressive transcription requires sending audio chunks regularly -- Must call `send_final()` to get final transcript (not automatic) - ---- - -## Additional Information - -### Container Network -- Network: `miku-discord_default` -- STT Service: `miku-stt:8766` -- Bot Service: `miku-bot` - -### Health Check -```bash -# Check STT container health -docker inspect miku-stt | grep -A5 Health - -# Test WebSocket connection -curl -i -N -H "Connection: Upgrade" -H "Upgrade: websocket" \ - -H "Sec-WebSocket-Version: 13" -H "Sec-WebSocket-Key: test" \ - http://localhost:8766/ -``` - -### Logs Monitoring -```bash -# Follow both containers -docker-compose logs -f miku-bot miku-stt - -# Just STT -docker logs -f miku-stt - -# Search for errors -docker logs miku-bot 2>&1 | grep -i "error\|failed\|exception" -``` - ---- - -**Migration Status:** ✅ **COMPLETE - READY FOR TESTING** diff --git a/STT_FIX_COMPLETE.md b/STT_FIX_COMPLETE.md deleted file mode 100644 index a6605bd..0000000 --- a/STT_FIX_COMPLETE.md +++ /dev/null @@ -1,192 +0,0 @@ -# STT Fix Applied - Ready for Testing - -## Summary - -Fixed all three issues preventing the ONNX-based Parakeet STT from working: - -1. ✅ **CUDA Support**: Updated Docker base image to include cuDNN 9 -2. ✅ **Port Configuration**: Fixed bot to connect to port 8766 (found TWO places) -3. ✅ **Protocol Compatibility**: Updated event handler for new ONNX format - ---- - -## Files Modified - -### 1. `stt-parakeet/Dockerfile` -```diff -- FROM nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04 -+ FROM nvidia/cuda:12.6.2-cudnn-runtime-ubuntu22.04 -``` - -### 2. `bot/utils/stt_client.py` -```diff -- stt_url: str = "ws://miku-stt:8000/ws/stt" -+ stt_url: str = "ws://miku-stt:8766/ws/stt" -``` - -Added new methods: -- `send_final()` - Request final transcription -- `send_reset()` - Clear audio buffer - -Updated `_handle_event()` to support: -- New ONNX protocol: `{"type": "transcript", "is_final": true/false}` -- Legacy protocol: `{"type": "partial"}`, `{"type": "final"}` (backward compatibility) - -### 3. `bot/utils/voice_receiver.py` ⚠️ **KEY FIX** -```diff -- def __init__(self, voice_manager, stt_url: str = "ws://miku-stt:8000/ws/stt"): -+ def __init__(self, voice_manager, stt_url: str = "ws://miku-stt:8766/ws/stt"): -``` - -**This was the missing piece!** The `voice_receiver` was overriding the default URL. 
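
If you want to confirm the port and protocol fixes without going through Discord, a small standalone WebSocket client can exercise the endpoint directly. This is only a sketch: the host, the `/ws/stt` path (taken from the updated `stt_client.py` default), and the exact reply shape are assumptions that may differ in your setup.

```python
# Standalone smoke test for the ONNX STT endpoint — URL and path are assumptions, adjust as needed.
import asyncio
import json

import numpy as np
import websockets  # pip install websockets

STT_URL = "ws://localhost:8766/ws/stt"  # port 8766 is the new server; path mirrors stt_client.py

async def smoke_test():
    async with websockets.connect(STT_URL) as ws:
        # One second of silence: int16 PCM, 16 kHz mono, sent as raw bytes
        await ws.send(np.zeros(16000, dtype=np.int16).tobytes())

        # Request a final transcription, mirroring STTClient.send_final()
        await ws.send(json.dumps({"type": "final"}))

        # Expect something like {"type": "transcript", "text": "...", "is_final": true}
        print(json.loads(await ws.recv()))

asyncio.run(smoke_test())
```

An empty transcript is fine here — the goal is only to verify that the connection is accepted on port 8766 and that the `final` command produces a `transcript` reply.
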
- ---- - -## Container Status - -### STT Container ✅ -```bash -$ docker logs miku-stt 2>&1 | tail -10 -``` -``` -CUDA Version 12.6.2 -INFO:asr.asr_pipeline:Providers: [('CUDAExecutionProvider', ...)] -INFO:asr.asr_pipeline:Model loaded successfully -INFO:__main__:Server running on ws://0.0.0.0:8766 -INFO:__main__:Active connections: 0 -``` - -**Status**: ✅ Running with CUDA acceleration - -### Bot Container ✅ -- Files copied directly into running container (faster than rebuild) -- Python bytecode cache cleared -- Container restarted - ---- - -## Testing Instructions - -### Test 1: Basic Connection -1. Join a voice channel in Discord -2. Run `!miku listen` -3. **Expected**: Bot connects without "Connection Refused" error -4. **Check logs**: `docker logs miku-bot 2>&1 | grep "STT"` - -### Test 2: Transcription -1. After running `!miku listen`, speak into your microphone -2. **Expected**: Your speech is transcribed -3. **Check STT logs**: `docker logs miku-stt 2>&1 | tail -20` -4. **Check bot logs**: Look for "Partial transcript" or "Final transcript" messages - -### Test 3: Performance -1. Monitor GPU usage: `nvidia-smi -l 1` -2. **Expected**: GPU utilization increases when transcribing -3. **Expected**: Transcription completes in ~0.5-1 second - ---- - -## Monitoring Commands - -### Check Both Containers -```bash -docker logs -f --tail=50 miku-bot miku-stt -``` - -### Check STT Service Health -```bash -docker ps | grep miku-stt -docker logs miku-stt 2>&1 | grep "CUDA\|Providers\|Server running" -``` - -### Check for Errors -```bash -# Bot errors -docker logs miku-bot 2>&1 | grep -i "error\|failed" | tail -20 - -# STT errors -docker logs miku-stt 2>&1 | grep -i "error\|failed" | tail -20 -``` - -### Test WebSocket Connection -```bash -# From host machine -curl -i -N \ - -H "Connection: Upgrade" \ - -H "Upgrade: websocket" \ - -H "Sec-WebSocket-Version: 13" \ - -H "Sec-WebSocket-Key: test" \ - http://localhost:8766/ -``` - ---- - -## Known Issues & Workarounds - -### Issue: Bot Still Shows Old Errors -**Symptom**: After restart, logs still show port 8000 errors - -**Cause**: Python module caching or log entries from before restart - -**Solution**: -```bash -# Clear cache and restart -docker exec miku-bot find /app -name "*.pyc" -delete -docker restart miku-bot - -# Wait 10 seconds for full restart -sleep 10 -``` - -### Issue: Container Rebuild Takes 15+ Minutes -**Cause**: `playwright install` downloads chromium/firefox browsers (~500MB) - -**Workaround**: Instead of full rebuild, use `docker cp`: -```bash -docker cp bot/utils/stt_client.py miku-bot:/app/utils/stt_client.py -docker cp bot/utils/voice_receiver.py miku-bot:/app/utils/voice_receiver.py -docker restart miku-bot -``` - ---- - -## Next Steps - -### For Full Deployment (after testing) -1. Rebuild bot container properly: - ```bash - docker-compose build miku-bot - docker-compose up -d miku-bot - ``` - -2. Remove old STT directory: - ```bash - mv stt stt.backup - ``` - -3. Update documentation to reflect new architecture - -### Optional Enhancements -1. Add `send_final()` call when user stops speaking (VAD integration) -2. Implement progressive transcription display -3. Add transcription quality metrics/logging -4. 
Test with multiple simultaneous users - ---- - -## Quick Reference - -| Component | Old (NeMo) | New (ONNX) | -|-----------|------------|------------| -| **Port** | 8000 | 8766 | -| **VRAM** | 4-5GB | 2-3GB | -| **Speed** | 2-3s | 0.5-1s | -| **cuDNN** | 8 | 9 | -| **CUDA** | 12.1 | 12.6.2 | -| **Protocol** | Auto VAD | Manual control | - ---- - -**Status**: ✅ **ALL FIXES APPLIED - READY FOR USER TESTING** - -Last Updated: January 18, 2026 20:47 EET diff --git a/STT_MIGRATION.md b/STT_MIGRATION.md deleted file mode 100644 index 344c87e..0000000 --- a/STT_MIGRATION.md +++ /dev/null @@ -1,237 +0,0 @@ -# STT Migration: NeMo → ONNX Runtime - -## What Changed - -**Old Implementation** (`stt/`): -- Used NVIDIA NeMo toolkit with PyTorch -- Heavy memory usage (~4-5GB VRAM) -- Complex dependency tree (NeMo, transformers, huggingface-hub conflicts) -- Slow transcription (~2-3 seconds per utterance) -- Custom VAD + FastAPI WebSocket server - -**New Implementation** (`stt-parakeet/`): -- Uses `onnx-asr` library with ONNX Runtime -- Optimized VRAM usage (~2-3GB VRAM) -- Simple dependencies (onnxruntime-gpu, onnx-asr, numpy) -- **Much faster transcription** (~0.5-1 second per utterance) -- Clean architecture with modular ASR pipeline - -## Architecture - -``` -stt-parakeet/ -├── Dockerfile # CUDA 12.1 + Python 3.11 + ONNX Runtime -├── requirements-stt.txt # Exact pinned dependencies -├── asr/ -│ └── asr_pipeline.py # ONNX ASR wrapper with GPU acceleration -├── server/ -│ └── ws_server.py # WebSocket server (port 8766) -├── vad/ -│ └── silero_vad.py # Voice Activity Detection -└── models/ # Model cache (auto-downloaded) -``` - -## Docker Setup - -### Build -```bash -docker-compose build miku-stt -``` - -### Run -```bash -docker-compose up -d miku-stt -``` - -### Check Logs -```bash -docker logs -f miku-stt -``` - -### Verify CUDA -```bash -docker exec miku-stt python3.11 -c "import onnxruntime as ort; print('CUDA:', 'CUDAExecutionProvider' in ort.get_available_providers())" -``` - -## API Changes - -### Old Protocol (port 8001) -```python -# FastAPI with /ws/stt/{user_id} endpoint -ws://localhost:8001/ws/stt/123456 - -# Events: -{ - "type": "vad", - "event": "speech_start" | "speaking" | "speech_end", - "probability": 0.95 -} -{ - "type": "partial", - "text": "Hello", - "words": [] -} -{ - "type": "final", - "text": "Hello world", - "words": [{"word": "Hello", "start_time": 0.0, "end_time": 0.5}] -} -``` - -### New Protocol (port 8766) -```python -# Direct WebSocket connection -ws://localhost:8766 - -# Send audio (binary): -# - int16 PCM, 16kHz mono -# - Send as raw bytes - -# Send commands (JSON): -{"type": "final"} # Trigger final transcription -{"type": "reset"} # Clear audio buffer - -# Receive transcripts: -{ - "type": "transcript", - "text": "Hello world", - "is_final": false # Progressive transcription -} -{ - "type": "transcript", - "text": "Hello world", - "is_final": true # Final transcription after "final" command -} -``` - -## Bot Integration Changes Needed - -### 1. Update WebSocket URL -```python -# Old -ws://miku-stt:8000/ws/stt/{user_id} - -# New -ws://miku-stt:8766 -``` - -### 2. 
Update Message Format -```python -# Old: Send audio with metadata -await websocket.send_bytes(audio_data) - -# New: Send raw audio bytes (same) -await websocket.send(audio_data) # bytes - -# Old: Listen for VAD events -if msg["type"] == "vad": - # Handle VAD - -# New: No VAD events (handled internally) -# Just send final command when user stops speaking -await websocket.send(json.dumps({"type": "final"})) -``` - -### 3. Update Response Handling -```python -# Old -if msg["type"] == "partial": - text = msg["text"] - words = msg["words"] - -if msg["type"] == "final": - text = msg["text"] - words = msg["words"] - -# New -if msg["type"] == "transcript": - text = msg["text"] - is_final = msg["is_final"] - # No word-level timestamps in ONNX version -``` - -## Performance Comparison - -| Metric | Old (NeMo) | New (ONNX) | -|--------|-----------|-----------| -| **VRAM Usage** | 4-5GB | 2-3GB | -| **Transcription Speed** | 2-3s | 0.5-1s | -| **Build Time** | ~10 min | ~5 min | -| **Dependencies** | 50+ packages | 15 packages | -| **GPU Utilization** | 60-70% | 85-95% | -| **OOM Crashes** | Frequent | None | - -## Migration Steps - -1. ✅ Build new container: `docker-compose build miku-stt` -2. ✅ Update bot WebSocket client (`bot/utils/stt_client.py`) -3. ✅ Update voice receiver to send "final" command -4. ⏳ Test transcription quality -5. ⏳ Remove old `stt/` directory - -## Troubleshooting - -### Issue 1: CUDA Not Working (Falling Back to CPU) -**Symptoms:** -``` -[E:onnxruntime:Default] Failed to load library libonnxruntime_providers_cuda.so -with error: libcudnn.so.9: cannot open shared object file -``` - -**Cause:** ONNX Runtime GPU requires cuDNN 9, but CUDA 12.1 base image only has cuDNN 8. - -**Fix:** Update Dockerfile base image: -```dockerfile -FROM nvidia/cuda:12.6.2-cudnn-runtime-ubuntu22.04 -``` - -**Verify:** -```bash -docker logs miku-stt 2>&1 | grep "Providers" -# Should show: CUDAExecutionProvider (not just CPUExecutionProvider) -``` - -### Issue 2: Connection Refused (Port 8000) -**Symptoms:** -``` -ConnectionRefusedError: [Errno 111] Connect call failed ('172.20.0.5', 8000) -``` - -**Cause:** New ONNX server runs on port 8766, not 8000. - -**Fix:** Update `bot/utils/stt_client.py`: -```python -stt_url: str = "ws://miku-stt:8766/ws/stt" # Changed from 8000 -``` - -### Issue 3: Protocol Mismatch -**Symptoms:** Bot doesn't receive transcripts, or transcripts are empty. - -**Cause:** New ONNX server uses different WebSocket protocol. - -**Old Protocol (NeMo):** Automatic VAD-triggered `partial` and `final` events -**New Protocol (ONNX):** Manual control with `{"type": "final"}` command - -**Fix:** -- Updated `stt_client._handle_event()` to handle `transcript` type with `is_final` flag -- Added `send_final()` method to request final transcription -- Bot should call `stt_client.send_final()` when user stops speaking - -## Rollback Plan - -If needed, revert docker-compose.yml: -```yaml -miku-stt: - build: - context: ./stt - dockerfile: Dockerfile.stt - # ... 
rest of old config -``` - -## Notes - -- Model downloads on first run (~600MB) -- Models cached in `./stt-parakeet/models/` -- No word-level timestamps (ONNX model doesn't provide them) -- VAD handled internally (no need for external VAD integration) -- Uses same GPU (GTX 1660, device 0) as before diff --git a/STT_VOICE_TESTING.md b/STT_VOICE_TESTING.md deleted file mode 100644 index 0bcabcc..0000000 --- a/STT_VOICE_TESTING.md +++ /dev/null @@ -1,266 +0,0 @@ -# STT Voice Testing Guide - -## Phase 4B: Bot-Side STT Integration - COMPLETE ✅ - -All code has been deployed to containers. Ready for testing! - -## Architecture Overview - -``` -Discord Voice (User) → Opus 48kHz stereo - ↓ - VoiceReceiver.write() - ↓ - Opus decode → Stereo-to-mono → Resample to 16kHz - ↓ - STTClient.send_audio() → WebSocket - ↓ - miku-stt:8001 (Silero VAD + Faster-Whisper) - ↓ - JSON events (vad, partial, final, interruption) - ↓ - VoiceReceiver callbacks → voice_manager - ↓ - on_final_transcript() → _generate_voice_response() - ↓ - LLM streaming → TTS tokens → Audio playback -``` - -## New Voice Commands - -### 1. Start Listening -``` -!miku listen -``` -- Starts listening to **your** voice in the current voice channel -- You must be in the same channel as Miku -- Miku will transcribe your speech and respond with voice - -``` -!miku listen @username -``` -- Start listening to a specific user's voice -- Useful for moderators or testing with multiple users - -### 2. Stop Listening -``` -!miku stop-listening -``` -- Stop listening to your voice -- Miku will no longer transcribe or respond to your speech - -``` -!miku stop-listening @username -``` -- Stop listening to a specific user - -## Testing Procedure - -### Test 1: Basic STT Connection -1. Join a voice channel -2. `!miku join` - Miku joins your channel -3. `!miku listen` - Start listening to your voice -4. Check bot logs for "Started listening to user" -5. Check STT logs: `docker logs miku-stt --tail 50` - - Should show: "WebSocket connection from user {user_id}" - - Should show: "Session started for user {user_id}" - -### Test 2: VAD Detection -1. After `!miku listen`, speak into your microphone -2. Say something like: "Hello Miku, can you hear me?" -3. Check STT logs for VAD events: - ``` - [DEBUG] VAD: speech_start probability=0.85 - [DEBUG] VAD: speaking probability=0.92 - [DEBUG] VAD: speech_end probability=0.15 - ``` -4. Bot logs should show: "VAD event for user {id}: speech_start/speaking/speech_end" - -### Test 3: Transcription -1. Speak clearly into microphone: "Hey Miku, tell me a joke" -2. Watch bot logs for: - - "Partial transcript from user {id}: Hey Miku..." - - "Final transcript from user {id}: Hey Miku, tell me a joke" -3. Miku should respond with LLM-generated speech -4. Check channel for: "🎤 Miku: *[her response]*" - -### Test 4: Interruption Detection -1. `!miku listen` -2. `!miku say Tell me a very long story about your favorite song` -3. While Miku is speaking, start talking yourself -4. Speak loudly enough to trigger VAD (probability > 0.7) -5. Expected behavior: - - Miku's audio should stop immediately - - Bot logs: "User {id} interrupted Miku (probability={prob})" - - STT logs: "Interruption detected during TTS playback" - - RVC logs: "Interrupted: Flushed {N} ZMQ chunks" - -### Test 5: Multi-User (if available) -1. Have two users join voice channel -2. `!miku listen @user1` - Listen to first user -3. `!miku listen @user2` - Listen to second user -4. Both users speak separately -5. Verify Miku responds to each user individually -6. 
Check STT logs for multiple active sessions - -## Logs to Monitor - -### Bot Logs -```bash -docker logs -f miku-bot | grep -E "(listen|STT|transcript|interrupt)" -``` -Expected output: -``` -[INFO] Started listening to user 123456789 (username) -[DEBUG] VAD event for user 123456789: speech_start -[DEBUG] Partial transcript from user 123456789: Hello Miku... -[INFO] Final transcript from user 123456789: Hello Miku, how are you? -[INFO] User 123456789 interrupted Miku (probability=0.82) -``` - -### STT Logs -```bash -docker logs -f miku-stt -``` -Expected output: -``` -[INFO] WebSocket connection from user_123456789 -[INFO] Session started for user 123456789 -[DEBUG] Received 320 audio samples from user_123456789 -[DEBUG] VAD speech_start: probability=0.87 -[INFO] Transcribing audio segment (duration=2.5s) -[INFO] Final transcript: "Hello Miku, how are you?" -``` - -### RVC Logs (for interruption) -```bash -docker logs -f miku-rvc-api | grep -i interrupt -``` -Expected output: -``` -[INFO] Interrupted: Flushed 15 ZMQ chunks, cleared 48000 RVC buffer samples -``` - -## Component Status - -### ✅ Completed -- [x] STT container running (miku-stt:8001) -- [x] Silero VAD on CPU with chunk buffering -- [x] Faster-Whisper on GTX 1660 (1.3GB VRAM) -- [x] STTClient WebSocket client -- [x] VoiceReceiver Discord audio sink -- [x] VoiceSession STT integration -- [x] listen/stop-listening commands -- [x] /interrupt endpoint in RVC API -- [x] LLM response generation from transcripts -- [x] Interruption detection and cancellation - -### ⏳ Pending Testing -- [ ] Basic STT connection test -- [ ] VAD speech detection test -- [ ] End-to-end transcription test -- [ ] LLM voice response test -- [ ] Interruption cancellation test -- [ ] Multi-user testing (if available) - -### 🔧 Configuration Tuning (after testing) -- VAD sensitivity (currently threshold=0.5) -- VAD timing (min_speech=250ms, min_silence=500ms) -- Interruption threshold (currently 0.7) -- Whisper beam size and patience -- LLM streaming chunk size - -## API Endpoints - -### STT Container (port 8001) -- WebSocket: `ws://localhost:8001/ws/stt/{user_id}` -- Health: `http://localhost:8001/health` - -### RVC Container (port 8765) -- WebSocket: `ws://localhost:8765/ws/stream` -- Interrupt: `http://localhost:8765/interrupt` (POST) -- Health: `http://localhost:8765/health` - -## Troubleshooting - -### No audio received from Discord -- Check bot logs for "write() called with data" -- Verify user is in same voice channel as Miku -- Check Discord permissions (View Channel, Connect, Speak) - -### VAD not detecting speech -- Check chunk buffer accumulation in STT logs -- Verify audio format: PCM int16, 16kHz mono -- Try speaking louder or more clearly -- Check VAD threshold (may need adjustment) - -### Transcription empty or gibberish -- Verify Whisper model loaded (check STT startup logs) -- Check GPU VRAM usage: `nvidia-smi` -- Ensure audio segments are at least 1-2 seconds long -- Try speaking more clearly with less background noise - -### Interruption not working -- Verify Miku is actually speaking (check miku_speaking flag) -- Check VAD probability in logs (must be > 0.7) -- Verify /interrupt endpoint returns success -- Check RVC logs for flushed chunks - -### Multiple users causing issues -- Check STT logs for per-user session management -- Verify each user has separate STTClient instance -- Check for resource contention on GTX 1660 - -## Next Steps After Testing - -### Phase 4C: LLM KV Cache Precomputation -- Use partial transcripts to start LLM 
generation early -- Precompute KV cache for common phrases -- Reduce latency between speech end and response start - -### Phase 4D: Multi-User Refinement -- Queue management for multiple simultaneous speakers -- Priority system for interruptions -- Resource allocation for multiple Whisper requests - -### Phase 4E: Latency Optimization -- Profile each stage of the pipeline -- Optimize audio chunk sizes -- Reduce WebSocket message overhead -- Tune Whisper beam search parameters -- Implement VAD lookahead for quicker detection - -## Hardware Utilization - -### Current Allocation -- **AMD RX 6800**: LLaMA text models (idle during listen/speak) -- **GTX 1660**: - - Listen phase: Faster-Whisper (1.3GB VRAM) - - Speak phase: Soprano TTS + RVC (time-multiplexed) -- **CPU**: Silero VAD, audio preprocessing - -### Expected Performance -- VAD latency: <50ms (CPU processing) -- Transcription latency: 200-500ms (Whisper inference) -- LLM streaming: 20-30 tokens/sec (RX 6800) -- TTS synthesis: Real-time (GTX 1660) -- Total latency (speech → response): 1-2 seconds - -## Testing Checklist - -Before marking Phase 4B as complete: - -- [ ] Test basic STT connection with `!miku listen` -- [ ] Verify VAD detects speech start/end correctly -- [ ] Confirm transcripts are accurate and complete -- [ ] Test LLM voice response generation works -- [ ] Verify interruption cancels TTS playback -- [ ] Check multi-user handling (if possible) -- [ ] Verify resource cleanup on `!miku stop-listening` -- [ ] Test edge cases (silence, background noise, overlapping speech) -- [ ] Profile latencies at each stage -- [ ] Document any configuration tuning needed - ---- - -**Status**: Code deployed, ready for user testing! 🎤🤖 diff --git a/VOICE_CALL_AUTOMATION.md b/VOICE_CALL_AUTOMATION.md deleted file mode 100644 index 63aa7b6..0000000 --- a/VOICE_CALL_AUTOMATION.md +++ /dev/null @@ -1,261 +0,0 @@ -# Voice Call Automation System - -## Overview - -Miku now has an automated voice call system that can be triggered from the web UI. This replaces the manual command-based voice chat flow with a seamless, immersive experience. - -## Features - -### 1. Voice Debug Mode Toggle -- **Environment Variable**: `VOICE_DEBUG_MODE` (default: `false`) -- When `true`: Shows manual commands, text notifications, transcripts in chat -- When `false` (field deployment): Silent operation, no command notifications - -### 2. Automated Voice Call Flow - -#### Initiation (Web UI → API) -``` -POST /api/voice/call -{ - "user_id": 123456789, - "voice_channel_id": 987654321 -} -``` - -#### What Happens: -1. **Container Startup**: Starts `miku-stt` and `miku-rvc-api` containers -2. **Warmup Wait**: Monitors containers until fully warmed up - - STT: WebSocket connection check (30s timeout) - - TTS: Health endpoint check for `warmed_up: true` (60s timeout) -3. **Join Voice Channel**: Creates voice session with full resource locking -4. **Send DM**: Generates personalized LLM invitation and sends with voice channel invite link -5. 
**Auto-Listen**: Automatically starts listening when user joins - -#### User Join Detection: -- Monitors `on_voice_state_update` events -- When target user joins: - - Marks `user_has_joined = True` - - Cancels 30min timeout - - Auto-starts STT for that user - -#### Auto-Leave After User Disconnect: -- **45 second timer** starts when user leaves voice channel -- If user doesn't rejoin within 45s: - - Ends voice session - - Stops STT and TTS containers - - Releases all resources - - Returns to normal operation -- If user rejoins before 45s, timer is cancelled - -#### 30-Minute Join Timeout: -- If user never joins within 30 minutes: - - Ends voice session - - Stops containers - - Sends timeout DM: "Aww, I guess you couldn't make it to voice chat... Maybe next time! 💙" - -### 3. Container Management - -**File**: `bot/utils/container_manager.py` - -#### Methods: -- `start_voice_containers()`: Starts STT & TTS, waits for warmup -- `stop_voice_containers()`: Stops both containers -- `are_containers_running()`: Check container status -- `_wait_for_stt_warmup()`: WebSocket connection check -- `_wait_for_tts_warmup()`: Health endpoint check - -#### Warmup Detection: -```python -# STT Warmup: Try WebSocket connection -ws://miku-stt:8765 - -# TTS Warmup: Check health endpoint -GET http://miku-rvc-api:8765/health -Response: {"status": "ready", "warmed_up": true} -``` - -### 4. Voice Session Tracking - -**File**: `bot/utils/voice_manager.py` - -#### New VoiceSession Fields: -```python -call_user_id: Optional[int] # User ID that was called -call_timeout_task: Optional[asyncio.Task] # 30min timeout -user_has_joined: bool # Track if user joined -auto_leave_task: Optional[asyncio.Task] # 45s auto-leave -user_leave_time: Optional[float] # When user left -``` - -#### Methods: -- `on_user_join(user_id)`: Handle user joining voice channel -- `on_user_leave(user_id)`: Start 45s auto-leave timer -- `_auto_leave_after_user_disconnect()`: Execute auto-leave - -### 5. LLM Context Update - -Miku's voice chat prompt now includes: -``` -NOTE: You will automatically disconnect 45 seconds after {user.name} leaves the voice channel, -so you can mention this if asked about leaving -``` - -### 6. Debug Mode Integration - -#### With `VOICE_DEBUG_MODE=true`: -- Shows "🎤 User said: ..." in text chat -- Shows "💬 Miku: ..." responses -- Shows interruption messages -- Manual commands work (`!miku join`, `!miku listen`, etc.) - -#### With `VOICE_DEBUG_MODE=false` (field deployment): -- No text notifications -- No command outputs -- Silent operation -- Only log files show activity - -## API Endpoint - -### POST `/api/voice/call` - -**Request Body**: -```json -{ - "user_id": 123456789, - "voice_channel_id": 987654321 -} -``` - -**Success Response**: -```json -{ - "success": true, - "user_id": 123456789, - "channel_id": 987654321, - "invite_url": "https://discord.gg/abc123" -} -``` - -**Error Response**: -```json -{ - "success": false, - "error": "Failed to start voice containers" -} -``` - -## File Changes - -### New Files: -1. `bot/utils/container_manager.py` - Docker container management -2. `VOICE_CALL_AUTOMATION.md` - This documentation - -### Modified Files: -1. `bot/globals.py` - Added `VOICE_DEBUG_MODE` flag -2. `bot/api.py` - Added `/api/voice/call` endpoint and timeout handler -3. `bot/bot.py` - Added `on_voice_state_update` event handler -4. 
`bot/utils/voice_manager.py`: - - Added call tracking fields to VoiceSession - - Added `on_user_join()` and `on_user_leave()` methods - - Added `_auto_leave_after_user_disconnect()` method - - Updated LLM prompt with auto-disconnect context - - Gated debug messages behind `VOICE_DEBUG_MODE` -5. `bot/utils/voice_receiver.py` - Removed Discord VAD events (rely on RealtimeSTT only) - -## Testing Checklist - -### Web UI Integration: -- [ ] Create voice call trigger UI with user ID and channel ID inputs -- [ ] Display call status (starting containers, waiting for warmup, joined VC, waiting for user) -- [ ] Show timeout countdown -- [ ] Handle errors gracefully - -### Flow Testing: -- [ ] Test successful call flow (containers start → warmup → join → DM → user joins → conversation → user leaves → 45s timer → auto-leave → containers stop) -- [ ] Test 30min timeout (user never joins) -- [ ] Test user rejoin within 45s (cancels auto-leave) -- [ ] Test container failure handling -- [ ] Test warmup timeout handling -- [ ] Test DM failure (should continue anyway) - -### Debug Mode: -- [ ] Test with `VOICE_DEBUG_MODE=true` (should see all notifications) -- [ ] Test with `VOICE_DEBUG_MODE=false` (should be silent) - -## Environment Variables - -Add to `.env` or `docker-compose.yml`: -```bash -VOICE_DEBUG_MODE=false # Set to true for debugging -``` - -## Next Steps - -1. **Web UI**: Create voice call interface with: - - User ID input - - Voice channel ID dropdown (fetch from Discord) - - "Call User" button - - Status display - - Active call management - -2. **Monitoring**: Add voice call metrics: - - Call duration - - User join time - - Auto-leave triggers - - Container startup times - -3. **Enhancements**: - - Multiple simultaneous calls (different channels) - - Call history logging - - User preferences (auto-answer, DND mode) - - Scheduled voice calls - -## Technical Notes - -### Container Warmup Times: -- **STT** (`miku-stt`): ~5-15 seconds (model loading) -- **TTS** (`miku-rvc-api`): ~30-60 seconds (RVC model loading, synthesis warmup) -- **Total**: ~35-75 seconds from API call to ready - -### Resource Management: -- Voice sessions use `VoiceSessionManager` singleton -- Only one voice session active at a time -- Full resource locking during voice: - - AMD GPU for text inference - - Vision model blocked - - Image generation disabled - - Bipolar mode disabled - - Autonomous engine paused - -### Cleanup Guarantees: -- 45s auto-leave ensures no orphaned sessions -- 30min timeout prevents indefinite container running -- All cleanup paths stop containers -- Voice session end releases all resources - -## Troubleshooting - -### Containers won't start: -- Check Docker daemon status -- Check `docker compose ps` for existing containers -- Check logs: `docker logs miku-stt` / `docker logs miku-rvc-api` - -### Warmup timeout: -- STT: Check WebSocket is accepting connections on port 8765 -- TTS: Check health endpoint returns `{"warmed_up": true}` -- Increase timeout values if needed (slow hardware) - -### User never joins: -- Verify invite URL is valid -- Check user has permission to join voice channel -- Verify DM was delivered (may be blocked) - -### Auto-leave not triggering: -- Check `on_voice_state_update` events are firing -- Verify user ID matches `call_user_id` -- Check logs for timer creation/cancellation - -### Containers not stopping: -- Manual stop: `docker compose stop miku-stt miku-rvc-api` -- Check for orphaned containers: `docker ps` -- Force remove: `docker rm -f miku-stt miku-rvc-api` diff --git 
a/VOICE_CHAT_CONTEXT.md b/VOICE_CHAT_CONTEXT.md deleted file mode 100644 index 55a8d8f..0000000 --- a/VOICE_CHAT_CONTEXT.md +++ /dev/null @@ -1,225 +0,0 @@ -# Voice Chat Context System - -## Implementation Complete ✅ - -Added comprehensive voice chat context to give Miku awareness of the conversation environment. - ---- - -## Features - -### 1. Voice-Aware System Prompt -Miku now knows she's in a voice chat and adjusts her behavior: -- ✅ Aware she's speaking via TTS -- ✅ Knows who she's talking to (user names included) -- ✅ Understands responses will be spoken aloud -- ✅ Instructed to keep responses short (1-3 sentences) -- ✅ **CRITICAL: Instructed to only use English** (TTS can't handle Japanese well) - -### 2. Conversation History (Last 8 Exchanges) -- Stores last 16 messages (8 user + 8 assistant) -- Maintains context across multiple voice interactions -- Automatically trimmed to keep memory manageable -- Each message includes username for multi-user context - -### 3. Personality Integration -- Loads `miku_lore.txt` - Her background, personality, likes/dislikes -- Loads `miku_prompt.txt` - Core personality instructions -- Combines with voice-specific instructions -- Maintains character consistency - -### 4. Reduced Log Spam -- Set voice_recv logger to CRITICAL level -- Suppresses routine CryptoErrors and RTCP packets -- Only shows actual critical errors - ---- - -## System Prompt Structure - -``` -[miku_prompt.txt content] - -[miku_lore.txt content] - -VOICE CHAT CONTEXT: -- You are currently in a voice channel speaking with {user.name} and others -- Your responses will be spoken aloud via text-to-speech -- Keep responses SHORT and CONVERSATIONAL (1-3 sentences max) -- Speak naturally as if having a real-time voice conversation -- IMPORTANT: Only respond in ENGLISH! The TTS system cannot handle Japanese or other languages well. -- Be expressive and use casual language, but stay in character as Miku - -Remember: This is a live voice conversation, so be concise and engaging! -``` - ---- - -## Conversation Flow - -``` -User speaks → STT transcribes → Add to history - ↓ - [System Prompt] - [Last 8 exchanges] - [Current user message] - ↓ - LLM generates - ↓ - Add response to history - ↓ - Stream to TTS → Speak -``` - ---- - -## Message History Format - -```python -conversation_history = [ - {"role": "user", "content": "koko210: Hey Miku, how are you?"}, - {"role": "assistant", "content": "Hey koko210! I'm doing great, thanks for asking!"}, - {"role": "user", "content": "koko210: Can you sing something?"}, - {"role": "assistant", "content": "I'd love to! What song would you like to hear?"}, - # ... up to 16 messages total (8 exchanges) -] -``` - ---- - -## Configuration - -### Conversation History Limit -**Current**: 16 messages (8 exchanges) - -To adjust, edit `voice_manager.py`: -```python -# Keep only last 8 exchanges (16 messages = 8 user + 8 assistant) -if len(self.conversation_history) > 16: - self.conversation_history = self.conversation_history[-16:] -``` - -**Recommendations**: -- **8 exchanges**: Good balance (current setting) -- **12 exchanges**: More context, slightly more tokens -- **4 exchanges**: Minimal context, faster responses - -### Response Length -**Current**: max_tokens=200 - -To adjust: -```python -payload = { - "max_tokens": 200 # Change this -} -``` - ---- - -## Language Enforcement - -### Why English-Only? -The RVC TTS system is trained on English audio and struggles with: -- Japanese characters (even though Miku is Japanese!) 
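Putting the prompt structure, history trimming, and username tagging together, one voice turn can be sketched as below. Only the file names (`miku_prompt.txt`, `miku_lore.txt`), the 16-message cap, and the `username: text` history format come from this document; the helper name and surrounding structure are illustrative and not the real `_generate_voice_response()`.

```python
from pathlib import Path

MAX_HISTORY_MESSAGES = 16  # 8 user + 8 assistant messages, as noted above

VOICE_CONTEXT_TEMPLATE = (
    "VOICE CHAT CONTEXT:\n"
    "- You are currently in a voice channel speaking with {user_name} and others\n"
    "- Keep responses SHORT and CONVERSATIONAL (1-3 sentences max)\n"
    "- IMPORTANT: Only respond in ENGLISH!"
)


def build_voice_messages(history: list[dict], user_name: str, transcript: str) -> list[dict]:
    """Assemble a chat-completion style message list for one voice turn.

    `history` is the rolling conversation_history list; the caller appends the
    assistant reply afterwards and trims it the same way.
    """
    system_prompt = "\n\n".join([
        Path("miku_prompt.txt").read_text(encoding="utf-8"),
        Path("miku_lore.txt").read_text(encoding="utf-8"),
        VOICE_CONTEXT_TEMPLATE.format(user_name=user_name),
    ])

    # Record the new utterance with the speaker's name for multi-user context.
    history.append({"role": "user", "content": f"{user_name}: {transcript}"})

    # Keep only the last 8 exchanges so the prompt stays small.
    if len(history) > MAX_HISTORY_MESSAGES:
        del history[:-MAX_HISTORY_MESSAGES]

    return [{"role": "system", "content": system_prompt}, *history]
```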
-- Special characters -- Mixed language text -- Non-English phonetics - -### Implementation -The system prompt explicitly tells Miku: -> **IMPORTANT: Only respond in ENGLISH! The TTS system cannot handle Japanese or other languages well.** - -This is reinforced in every voice chat interaction. - ---- - -## Testing - -### Test 1: Basic Conversation -``` -User: "Hey Miku!" -Miku: "Hi there! Great to hear from you!" (should be in English) -User: "How are you doing?" -Miku: "I'm doing wonderful! How about you?" (remembers previous exchange) -``` - -### Test 2: Context Retention -Have a multi-turn conversation and verify Miku remembers: -- Previous topics discussed -- User names -- Conversation flow - -### Test 3: Response Length -Verify responses are: -- Short (1-3 sentences) -- Conversational -- Not truncated mid-sentence - -### Test 4: Language Enforcement -Try asking in Japanese or requesting Japanese response: -- Miku should politely respond in English -- Should explain she needs to use English for voice chat - ---- - -## Monitoring - -### Check Conversation History -```bash -# Add debug logging to voice_manager.py to see history -logger.debug(f"Conversation history: {self.conversation_history}") -``` - -### Check System Prompt -```bash -docker exec miku-bot cat /app/miku_prompt.txt -docker exec miku-bot cat /app/miku_lore.txt -``` - -### Monitor Responses -```bash -docker logs -f miku-bot | grep "Voice response complete" -``` - ---- - -## Files Modified - -1. **bot/bot.py** - - Changed voice_recv logger level from WARNING to CRITICAL - - Suppresses CryptoError spam - -2. **bot/utils/voice_manager.py** - - Added `conversation_history` to `VoiceSession.__init__()` - - Updated `_generate_voice_response()` to load lore files - - Built comprehensive voice-aware system prompt - - Implemented conversation history tracking (last 8 exchanges) - - Added English-only instruction - - Saves both user and assistant messages to history - ---- - -## Benefits - -✅ **Better Context**: Miku remembers previous exchanges -✅ **Cleaner Logs**: No more CryptoError spam -✅ **Natural Responses**: Knows she's in voice chat, responds appropriately -✅ **Language Consistency**: Enforces English for TTS compatibility -✅ **Personality Intact**: Still loads lore and personality files -✅ **User Awareness**: Knows who she's talking to - ---- - -## Next Steps - -1. **Test thoroughly** with multi-turn conversations -2. **Adjust history length** if needed (currently 8 exchanges) -3. **Fine-tune response length** based on TTS performance -4. **Add conversation reset** command if needed (e.g., `!miku reset`) -5. **Consider adding** conversation summaries for very long sessions - ---- - -**Status**: ✅ **DEPLOYED AND READY FOR TESTING** - -Miku now has full context awareness in voice chat with personality, conversation history, and language enforcement! diff --git a/VOICE_TO_VOICE_REFERENCE.md b/VOICE_TO_VOICE_REFERENCE.md deleted file mode 100644 index e9b1dca..0000000 --- a/VOICE_TO_VOICE_REFERENCE.md +++ /dev/null @@ -1,323 +0,0 @@ -# Voice-to-Voice Quick Reference - -## Complete Pipeline Status ✅ - -All phases complete and deployed! 
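Picking up the `!miku reset` idea from the Next Steps list above, a hypothetical command could simply clear `conversation_history` on the active session. The cog below assumes a `voice_session_manager` object with a `get_active_session()` helper, which this document does not confirm; it is a starting point, not existing code.

```python
from discord.ext import commands


class VoiceResetCog(commands.Cog):
    """Hypothetical '!miku reset' command from the Next Steps list above."""

    def __init__(self, bot: commands.Bot, voice_session_manager):
        self.bot = bot
        # Assumed to expose the single active VoiceSession (or None).
        self.sessions = voice_session_manager

    @commands.command(name="reset")
    async def reset(self, ctx: commands.Context) -> None:
        session = self.sessions.get_active_session()  # assumed helper
        if session is None:
            await ctx.send("No active voice session to reset.")
            return
        session.conversation_history.clear()
        await ctx.send("Voice conversation history cleared! 💙")
```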
- -## Phase Completion Status - -### ✅ Phase 1: Voice Connection (COMPLETE) -- Discord voice channel connection -- Audio playback via discord.py -- Resource management and cleanup - -### ✅ Phase 2: Audio Streaming (COMPLETE) -- Soprano TTS server (GTX 1660) -- RVC voice conversion -- Real-time streaming via WebSocket -- Token-by-token synthesis - -### ✅ Phase 3: Text-to-Voice (COMPLETE) -- LLaMA text generation (AMD RX 6800) -- Streaming token pipeline -- TTS integration with `!miku say` -- Natural conversation flow - -### ✅ Phase 4A: STT Container (COMPLETE) -- Silero VAD on CPU -- Faster-Whisper on GTX 1660 -- WebSocket server at port 8001 -- Per-user session management -- Chunk buffering for VAD - -### ✅ Phase 4B: Bot STT Integration (COMPLETE - READY FOR TESTING) -- Discord audio capture -- Opus decode + resampling -- STT client WebSocket integration -- Voice commands: `!miku listen`, `!miku stop-listening` -- LLM voice response generation -- Interruption detection and cancellation -- `/interrupt` endpoint in RVC API - -## Quick Start Commands - -### Setup -```bash -!miku join # Join your voice channel -!miku listen # Start listening to your voice -``` - -### Usage -- **Speak** into your microphone -- Miku will **transcribe** your speech -- Miku will **respond** with voice -- **Interrupt** her by speaking while she's talking - -### Teardown -```bash -!miku stop-listening # Stop listening to your voice -!miku leave # Leave voice channel -``` - -## Architecture Diagram - -``` -┌─────────────────────────────────────────────────────────────────┐ -│ USER INPUT │ -└─────────────────────────────────────────────────────────────────┘ - │ - │ Discord Voice (Opus 48kHz) - ▼ -┌─────────────────────────────────────────────────────────────────┐ -│ miku-bot Container │ -│ ┌───────────────────────────────────────────────────────────┐ │ -│ │ VoiceReceiver (discord.sinks.Sink) │ │ -│ │ - Opus decode → PCM │ │ -│ │ - Stereo → Mono │ │ -│ │ - Resample 48kHz → 16kHz │ │ -│ └─────────────────┬─────────────────────────────────────────┘ │ -│ │ PCM int16, 16kHz, 20ms chunks │ -│ ┌─────────────────▼─────────────────────────────────────────┐ │ -│ │ STTClient (WebSocket) │ │ -│ │ - Sends audio to miku-stt │ │ -│ │ - Receives VAD events, transcripts │ │ -│ └─────────────────┬─────────────────────────────────────────┘ │ -└────────────────────┼───────────────────────────────────────────┘ - │ ws://miku-stt:8001/ws/stt/{user_id} - ▼ -┌─────────────────────────────────────────────────────────────────┐ -│ miku-stt Container │ -│ ┌───────────────────────────────────────────────────────────┐ │ -│ │ VADProcessor (Silero VAD 5.1.2) [CPU] │ │ -│ │ - Chunk buffering (512 samples min) │ │ -│ │ - Speech detection (threshold=0.5) │ │ -│ │ - Events: speech_start, speaking, speech_end │ │ -│ └─────────────────┬─────────────────────────────────────────┘ │ -│ │ Audio segments │ -│ ┌─────────────────▼─────────────────────────────────────────┐ │ -│ │ WhisperTranscriber (Faster-Whisper 1.2.1) [GTX 1660] │ │ -│ │ - Model: small (1.3GB VRAM) │ │ -│ │ - Transcribes speech segments │ │ -│ │ - Returns: partial & final transcripts │ │ -│ └─────────────────┬─────────────────────────────────────────┘ │ -└────────────────────┼───────────────────────────────────────────┘ - │ JSON events via WebSocket - ▼ -┌─────────────────────────────────────────────────────────────────┐ -│ miku-bot Container │ -│ ┌───────────────────────────────────────────────────────────┐ │ -│ │ voice_manager.py Callbacks │ │ -│ │ - on_vad_event() → Log VAD states │ │ -│ │ - 
on_partial_transcript() → Show typing indicator │ │ -│ │ - on_final_transcript() → Generate LLM response │ │ -│ │ - on_interruption() → Cancel TTS playback │ │ -│ └─────────────────┬─────────────────────────────────────────┘ │ -│ │ Final transcript text │ -│ ┌─────────────────▼─────────────────────────────────────────┐ │ -│ │ _generate_voice_response() │ │ -│ │ - Build LLM prompt with conversation history │ │ -│ │ - Stream LLM response │ │ -│ │ - Send tokens to TTS │ │ -│ └─────────────────┬─────────────────────────────────────────┘ │ -└────────────────────┼───────────────────────────────────────────┘ - │ HTTP streaming to LLaMA server - ▼ -┌─────────────────────────────────────────────────────────────────┐ -│ llama-cpp-server (AMD RX 6800) │ -│ - Streaming text generation │ -│ - 20-30 tokens/sec │ -│ - Returns: {"delta": {"content": "token"}} │ -└─────────────────┬───────────────────────────────────────────────┘ - │ Token stream - ▼ -┌─────────────────────────────────────────────────────────────────┐ -│ miku-bot Container │ -│ ┌───────────────────────────────────────────────────────────┐ │ -│ │ audio_source.send_token() │ │ -│ │ - Buffers tokens │ │ -│ │ - Sends to RVC WebSocket │ │ -│ └─────────────────┬─────────────────────────────────────────┘ │ -└────────────────────┼───────────────────────────────────────────┘ - │ ws://miku-rvc-api:8765/ws/stream - ▼ -┌─────────────────────────────────────────────────────────────────┐ -│ miku-rvc-api Container │ -│ ┌───────────────────────────────────────────────────────────┐ │ -│ │ Soprano TTS Server (miku-soprano-tts) [GTX 1660] │ │ -│ │ - Text → Audio synthesis │ │ -│ │ - 32kHz output │ │ -│ └─────────────────┬─────────────────────────────────────────┘ │ -│ │ Raw audio via ZMQ │ -│ ┌─────────────────▼─────────────────────────────────────────┐ │ -│ │ RVC Voice Conversion [GTX 1660] │ │ -│ │ - Voice cloning & pitch shifting │ │ -│ │ - 48kHz output │ │ -│ └─────────────────┬─────────────────────────────────────────┘ │ -└────────────────────┼───────────────────────────────────────────┘ - │ PCM float32, 48kHz - ▼ -┌─────────────────────────────────────────────────────────────────┐ -│ miku-bot Container │ -│ ┌───────────────────────────────────────────────────────────┐ │ -│ │ discord.VoiceClient │ │ -│ │ - Plays audio in voice channel │ │ -│ │ - Can be interrupted by user speech │ │ -│ └───────────────────────────────────────────────────────────┘ │ -└─────────────────────────────────────────────────────────────────┘ - │ - ▼ -┌─────────────────────────────────────────────────────────────────┐ -│ USER OUTPUT │ -│ (Miku's voice response) │ -└─────────────────────────────────────────────────────────────────┘ -``` - -## Interruption Flow - -``` -User speaks during Miku's TTS - │ - ▼ -VAD detects speech (probability > 0.7) - │ - ▼ -STT sends interruption event - │ - ▼ -on_user_interruption() callback - │ - ▼ -_cancel_tts() → voice_client.stop() - │ - ▼ -POST http://miku-rvc-api:8765/interrupt - │ - ▼ -Flush ZMQ socket + clear RVC buffers - │ - ▼ -Miku stops speaking, ready for new input -``` - -## Hardware Utilization - -### Listen Phase (User Speaking) -- **CPU**: Silero VAD processing -- **GTX 1660**: Faster-Whisper transcription (1.3GB VRAM) -- **AMD RX 6800**: Idle - -### Think Phase (LLM Generation) -- **CPU**: Idle -- **GTX 1660**: Idle -- **AMD RX 6800**: LLaMA inference (20-30 tokens/sec) - -### Speak Phase (Miku Responding) -- **CPU**: Silero VAD monitoring for interruption -- **GTX 1660**: Soprano TTS + RVC synthesis -- **AMD RX 6800**: Idle - -## 
Performance Metrics - -### Expected Latencies -| Stage | Latency | -|--------------------------|--------------| -| Discord audio capture | ~20ms | -| Opus decode + resample | <10ms | -| VAD processing | <50ms | -| Whisper transcription | 200-500ms | -| LLM token generation | 33-50ms/tok | -| TTS synthesis | Real-time | -| **Total (speech → response)** | **1-2s** | - -### VRAM Usage -| GPU | Component | VRAM | -|-------------|----------------|-----------| -| AMD RX 6800 | LLaMA 8B Q4 | ~5.5GB | -| GTX 1660 | Whisper small | 1.3GB | -| GTX 1660 | Soprano + RVC | ~3GB | - -## Key Files - -### Bot Container -- `bot/utils/stt_client.py` - WebSocket client for STT -- `bot/utils/voice_receiver.py` - Discord audio sink -- `bot/utils/voice_manager.py` - Voice session with STT integration -- `bot/commands/voice.py` - Voice commands including listen/stop-listening - -### STT Container -- `stt/vad_processor.py` - Silero VAD with chunk buffering -- `stt/whisper_transcriber.py` - Faster-Whisper transcription -- `stt/stt_server.py` - FastAPI WebSocket server - -### RVC Container -- `soprano_to_rvc/soprano_rvc_api.py` - TTS + RVC pipeline with /interrupt endpoint - -## Configuration Files - -### docker-compose.yml -- Network: `miku-network` (all containers) -- Ports: - - miku-bot: 8081 (API) - - miku-rvc-api: 8765 (TTS) - - miku-stt: 8001 (STT) - - llama-cpp-server: 8080 (LLM) - -### VAD Settings (stt/vad_processor.py) -```python -threshold = 0.5 # Speech detection sensitivity -min_speech = 250 # Minimum speech duration (ms) -min_silence = 500 # Silence before speech_end (ms) -interruption_threshold = 0.7 # Probability for interruption -``` - -### Whisper Settings (stt/whisper_transcriber.py) -```python -model = "small" # 1.3GB VRAM -device = "cuda" -compute_type = "float16" -beam_size = 5 -patience = 1.0 -``` - -## Testing Commands - -```bash -# Check all container health -curl http://localhost:8001/health # STT -curl http://localhost:8765/health # RVC -curl http://localhost:8080/health # LLM - -# Monitor logs -docker logs -f miku-bot | grep -E "(listen|transcript|interrupt)" -docker logs -f miku-stt -docker logs -f miku-rvc-api | grep interrupt - -# Test interrupt endpoint -curl -X POST http://localhost:8765/interrupt - -# Check GPU usage -nvidia-smi -``` - -## Troubleshooting - -| Issue | Solution | -|-------|----------| -| No audio from Discord | Check bot has Connect and Speak permissions | -| VAD not detecting | Speak louder, check microphone, lower threshold | -| Empty transcripts | Speak for at least 1-2 seconds, check Whisper model | -| Interruption not working | Verify `miku_speaking=true`, check VAD probability | -| High latency | Profile each stage, check GPU utilization | - -## Next Features (Phase 4C+) - -- [ ] KV cache precomputation from partial transcripts -- [ ] Multi-user simultaneous conversation -- [ ] Latency optimization (<1s total) -- [ ] Voice activity history and analytics -- [ ] Emotion detection from speech patterns -- [ ] Context-aware interruption handling - ---- - -**Ready to test!** Use `!miku join` → `!miku listen` → speak to Miku 🎤
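For reference, the interruption path shown earlier (`on_user_interruption()` → `_cancel_tts()` → `voice_client.stop()` → `POST /interrupt`) can be sketched in a few lines. This assumes `aiohttp` and a discord.py `VoiceClient`; it illustrates the flow and is not the code in `voice_manager.py`.

```python
import aiohttp

RVC_INTERRUPT_URL = "http://miku-rvc-api:8765/interrupt"  # endpoint listed above


async def cancel_tts(voice_client, miku_speaking: bool) -> None:
    """Stop local playback, then ask the RVC API to flush its buffers."""
    if not miku_speaking:
        return  # nothing to interrupt

    # 1. Stop whatever Discord is currently playing.
    if voice_client.is_playing():
        voice_client.stop()

    # 2. Tell the RVC API to flush the ZMQ socket and clear synthesis buffers.
    async with aiohttp.ClientSession() as session:
        async with session.post(RVC_INTERRUPT_URL) as resp:
            if resp.status != 200:
                print(f"/interrupt returned {resp.status}; audio may keep streaming")
```

If interruption still fails with a 200 response, the troubleshooting table above applies: confirm `miku_speaking` is actually true and that the VAD probability is crossing the 0.7 interruption threshold.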