Koko210/miku-discord

koko210Serve c58b941587 moved AI generated readmes to readme folder (may delete)

2026-01-27 19:57:48 +02:00

Voice Chat Context System

Implementation Complete ✅

Added comprehensive voice chat context to give Miku awareness of the conversation environment.

Features

1. Voice-Aware System Prompt

Miku now knows she's in a voice chat and adjusts her behavior:

✅ Aware she's speaking via TTS
✅ Knows who she's talking to (user names included)
✅ Understands responses will be spoken aloud
✅ Instructed to keep responses short (1-3 sentences)
✅ CRITICAL: Instructed to only use English (TTS can't handle Japanese well)

2. Conversation History (Last 8 Exchanges)

Stores the last 16 messages (8 user + 8 assistant)
Maintains context across multiple voice interactions
Automatically trimmed to the newest 16 messages to bound memory and prompt size
Each message includes username for multi-user context
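As a minimal sketch of the history behaviour described above: the helper name `add_to_history` and the constant `MAX_HISTORY_MESSAGES` are hypothetical and not taken from `voice_manager.py`. The sketch only assumes that user messages are prefixed with the speaker's name and that the list is cut back to the newest 16 entries.

```python
MAX_HISTORY_MESSAGES = 16  # 8 exchanges = 8 user + 8 assistant messages


def add_to_history(history, role, content, username=None):
    """Append one message and return the history trimmed to the newest entries.

    Hypothetical helper for illustration; not the project's actual code.
    """
    # User messages carry a "username: " prefix so the model can tell speakers apart.
    if role == "user" and username:
        content = f"{username}: {content}"
    history.append({"role": role, "content": content})
    # Keep only the most recent MAX_HISTORY_MESSAGES entries.
    return history[-MAX_HISTORY_MESSAGES:]


# Simulate 10 exchanges (20 messages); only the newest 16 survive.
history = []
for turn in range(10):
    history = add_to_history(history, "user", f"message {turn}", username="koko210")
    history = add_to_history(history, "assistant", f"reply {turn}")
```

After the loop, the two oldest exchanges have been dropped and the list starts at the third user message.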

3. Personality Integration

Loads miku_lore.txt - Her background, personality, likes/dislikes
Loads miku_prompt.txt - Core personality instructions
Combines with voice-specific instructions
Maintains character consistency

4. Reduced Log Spam

Set voice_recv logger to CRITICAL level
Suppresses log output for routine CryptoErrors and RTCP packets
Only CRITICAL-level messages from voice_recv are still shown
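A sketch of how the logger level could be raised with the standard `logging` module. The logger name `discord.ext.voice_recv` is an assumption about how the voice-receive extension names its loggers; check the names that appear in your own log output before relying on it.

```python
import logging

# Assumption: the voice-receive extension logs under this name (and children of it).
VOICE_RECV_LOGGER_NAME = "discord.ext.voice_recv"

voice_recv_logger = logging.getLogger(VOICE_RECV_LOGGER_NAME)
# Anything below CRITICAL (DEBUG, INFO, WARNING, ERROR) is dropped for this logger.
voice_recv_logger.setLevel(logging.CRITICAL)
```

Child loggers that have no level of their own inherit this setting, so a single call covers the whole logger subtree.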

System Prompt Structure

[miku_prompt.txt content]

[miku_lore.txt content]

VOICE CHAT CONTEXT:
- You are currently in a voice channel speaking with {user.name} and others
- Your responses will be spoken aloud via text-to-speech
- Keep responses SHORT and CONVERSATIONAL (1-3 sentences max)
- Speak naturally as if having a real-time voice conversation
- IMPORTANT: Only respond in ENGLISH! The TTS system cannot handle Japanese or other languages well.
- Be expressive and use casual language, but stay in character as Miku

Remember: This is a live voice conversation, so be concise and engaging!
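The structure above could be assembled roughly as follows. This is a sketch under assumptions: the function names `load_text` and `build_voice_system_prompt` are hypothetical, and the blank-line separator between the three parts is a guess rather than the behaviour of `_generate_voice_response()`.

```python
from pathlib import Path

# Voice-specific instructions, copied from the structure shown above.
VOICE_CONTEXT_TEMPLATE = """VOICE CHAT CONTEXT:
- You are currently in a voice channel speaking with {user_name} and others
- Your responses will be spoken aloud via text-to-speech
- Keep responses SHORT and CONVERSATIONAL (1-3 sentences max)
- Speak naturally as if having a real-time voice conversation
- IMPORTANT: Only respond in ENGLISH! The TTS system cannot handle Japanese or other languages well.
- Be expressive and use casual language, but stay in character as Miku

Remember: This is a live voice conversation, so be concise and engaging!"""


def load_text(path):
    """Read a personality file such as miku_prompt.txt (hypothetical helper)."""
    return Path(path).read_text(encoding="utf-8").strip()


def build_voice_system_prompt(prompt_text, lore_text, user_name):
    """Join core prompt, lore, and voice context in the order shown above."""
    voice_context = VOICE_CONTEXT_TEMPLATE.format(user_name=user_name)
    return "\n\n".join([prompt_text, lore_text, voice_context])


# Placeholder strings stand in for the real file contents.
system_prompt = build_voice_system_prompt("PROMPT", "LORE", "koko210")
```

In the bot, the first two arguments would presumably come from `load_text("miku_prompt.txt")` and `load_text("miku_lore.txt")`.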

Conversation Flow

User speaks → STT transcribes → Add to history
                                      ↓
                              [System Prompt]
                              [Last 8 exchanges]
                              [Current user message]
                                      ↓
                                  LLM generates
                                      ↓
                              Add response to history
                                      ↓
                              Stream to TTS → Speak
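One way to read the flow above, sketched with a stand-in for the LLM call. The names `handle_voice_turn` and `fake_llm` are hypothetical, and the sketch assumes the current user message is sent once (after the stored history) and then recorded together with the reply. STT and TTS streaming are left out.

```python
def handle_voice_turn(system_prompt, history, username, transcript, generate):
    """Run one voice turn: build the request, get a reply, record both messages.

    `generate` is any callable that takes the message list and returns text;
    it stands in for the real LLM request, which is not shown here.
    """
    user_message = {"role": "user", "content": f"{username}: {transcript}"}
    # Request order: system prompt, stored history, then the current user message.
    messages = [{"role": "system", "content": system_prompt}, *history, user_message]
    reply = generate(messages)
    history.append(user_message)
    history.append({"role": "assistant", "content": reply})
    # Trim in place to the newest 16 messages (8 exchanges).
    del history[:-16]
    return reply


def fake_llm(messages):
    """Stand-in model that only reports how many messages it was given."""
    return f"I received {len(messages)} messages."


history = []
first = handle_voice_turn("SYSTEM", history, "koko210", "Hey Miku!", fake_llm)
second = handle_voice_turn("SYSTEM", history, "koko210", "How are you?", fake_llm)
```

The second call sees the system prompt, the first exchange, and the new user message, which is what lets the model refer back to earlier turns.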

Message History Format

conversation_history = [
    {"role": "user", "content": "koko210: Hey Miku, how are you?"},
    {"role": "assistant", "content": "Hey koko210! I'm doing great, thanks for asking!"},
    {"role": "user", "content": "koko210: Can you sing something?"},
    {"role": "assistant", "content": "I'd love to! What song would you like to hear?"},
    # ... up to 16 messages total (8 exchanges)
]

Configuration

Conversation History Limit

Current: 16 messages (8 exchanges)

To adjust, edit bot/utils/voice_manager.py:

# Keep only last 8 exchanges (16 messages = 8 user + 8 assistant)
if len(self.conversation_history) > 16:
    self.conversation_history = self.conversation_history[-16:]

Recommendations:

8 exchanges (16 messages): Good balance (current setting)
12 exchanges (24 messages): More context, more prompt tokens per request
4 exchanges (8 messages): Minimal context, faster responses

Response Length

Current: max_tokens=200

To adjust:

payload = {
    # ... other request fields stay as they are ...
    "max_tokens": 200,  # Upper limit on generated tokens per response; change this
}

Language Enforcement

Why English-Only?

The RVC TTS system is trained on English audio and struggles with:

Japanese characters (even though Miku is Japanese!)
Special characters
Mixed language text
Non-English phonetics

Implementation

The system prompt explicitly tells Miku:

IMPORTANT: Only respond in ENGLISH! The TTS system cannot handle Japanese or other languages well.

Because this line is part of the system prompt, it is sent with every voice chat request.

Testing

Test 1: Basic Conversation

User: "Hey Miku!"
Miku: "Hi there! Great to hear from you!" (should be in English)
User: "How are you doing?"
Miku: "I'm doing wonderful! How about you?" (remembers previous exchange)

Test 2: Context Retention

Have a multi-turn conversation and verify Miku remembers:

Previous topics discussed
User names
Conversation flow

Test 3: Response Length

Verify responses are:

Short (1-3 sentences)
Conversational
Not truncated mid-sentence
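The checks above can be partly automated with a rough heuristic. The function `check_voice_response` is a hypothetical test helper, not part of the bot; it splits on sentence-ending punctuation, so abbreviations or replies ending in a quote or emoji will be misjudged.

```python
import re


def check_voice_response(text, max_sentences=3):
    """Heuristically check that a reply is short and ends on a full sentence."""
    stripped = text.strip()
    # Split after ., ! or ? when followed by whitespace; drop empty pieces.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", stripped) if s]
    return {
        "sentence_count": len(sentences),
        "is_short": 0 < len(sentences) <= max_sentences,
        # A reply cut off by the token limit usually lacks closing punctuation.
        "ends_cleanly": stripped.endswith((".", "!", "?")),
    }


ok = check_voice_response("I'm doing wonderful! How about you?")
cut_off = check_voice_response("I'd love to sing something for")
```

A reply that fails `ends_cleanly` is a hint that `max_tokens` cut it off, though listening to the TTS output remains the real test.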

Test 4: Language Enforcement

Try asking in Japanese or requesting a Japanese response:

Miku should politely respond in English
Should explain she needs to use English for voice chat
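To spot replies that slipped into Japanese during this test, a simple character-range check can help. This is a sketch, not something the bot is known to do: `contains_japanese` is a hypothetical helper, the kanji range also matches Chinese text, and romanized Japanese is not detected.

```python
import re

# Hiragana (U+3040-309F), Katakana (U+30A0-30FF), CJK Unified Ideographs (U+4E00-9FFF).
JAPANESE_CHARS = re.compile(r"[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF]")


def contains_japanese(text):
    """Return True if the text contains any kana or kanji characters."""
    return JAPANESE_CHARS.search(text) is not None


english_reply = contains_japanese("Hi there! Great to hear from you!")
japanese_reply = contains_japanese("こんにちは")
mixed_reply = contains_japanese("I love ネギ so much!")
```

Running this over logged responses gives a quick signal of whether the English-only instruction is holding.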

Monitoring

Check Conversation History

# Add debug logging to voice_manager.py to see history
logger.debug(f"Conversation history: {self.conversation_history}")

Check System Prompt

docker exec miku-bot cat /app/miku_prompt.txt
docker exec miku-bot cat /app/miku_lore.txt

Monitor Responses

docker logs -f miku-bot | grep "Voice response complete"

Files Modified

bot/bot.py
- Changed voice_recv logger level from WARNING to CRITICAL
- Suppresses CryptoError spam
bot/utils/voice_manager.py
- Added conversation_history to VoiceSession.__init__()
- Updated _generate_voice_response() to load lore files
- Built comprehensive voice-aware system prompt
- Implemented conversation history tracking (last 8 exchanges)
- Added English-only instruction
- Saves both user and assistant messages to history

Benefits

✅ Better Context: Miku remembers previous exchanges
✅ Cleaner Logs: No more CryptoError spam
✅ Natural Responses: Knows she's in voice chat, responds appropriately
✅ Language Consistency: Enforces English for TTS compatibility
✅ Personality Intact: Still loads lore and personality files
✅ User Awareness: Knows who she's talking to

Next Steps

Test thoroughly with multi-turn conversations
Adjust history length if needed (currently 8 exchanges)
Fine-tune response length based on TTS performance
Add conversation reset command if needed (e.g., !miku reset)
Consider adding conversation summaries for very long sessions

Status: ✅ DEPLOYED AND READY FOR TESTING

Miku now has full context awareness in voice chat with personality, conversation history, and language enforcement!
