Voice Chat Context System

Implementation Complete

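Miku's voice pipeline now builds a context for every reply: a voice-aware system prompt, her personality files, and recent conversation history. Together these give her awareness of the conversation environment.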


Features

1. Voice-Aware System Prompt

Miku now knows she's in a voice chat and adjusts her behavior:

  • Aware she's speaking via TTS
  • Knows who she's talking to (user names included)
  • Understands responses will be spoken aloud
  • Instructed to keep responses short (1-3 sentences)
  • CRITICAL: Instructed to only use English (TTS can't handle Japanese well)

2. Conversation History (Last 8 Exchanges)

  • Stores last 16 messages (8 user + 8 assistant)
  • Maintains context across multiple voice interactions
  • Automatically trimmed to keep memory manageable
  • Each message includes username for multi-user context

3. Personality Integration

  • Loads miku_lore.txt - Her background, personality, likes/dislikes
  • Loads miku_prompt.txt - Core personality instructions
  • Combines with voice-specific instructions
  • Maintains character consistency

4. Reduced Log Spam

  • Set voice_recv logger to CRITICAL level
  • Suppresses log messages about routine CryptoErrors and RTCP packets
  • Only CRITICAL-level messages from this logger are still shown
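
The logger change can be sketched as follows. The logger name `discord.ext.voice_recv` is an assumption made for illustration; use whichever name appears in your own log lines.

```python
import logging

# Assumed logger name; substitute the name that appears in your own log output.
VOICE_RECV_LOGGER = "discord.ext.voice_recv"

# Raising the level to CRITICAL drops DEBUG, INFO, WARNING and ERROR records
# from this logger and from child loggers that have no level of their own.
logging.getLogger(VOICE_RECV_LOGGER).setLevel(logging.CRITICAL)
```

Other loggers are unaffected, so the bot's own messages keep their configured level.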

System Prompt Structure

[miku_prompt.txt content]

[miku_lore.txt content]

VOICE CHAT CONTEXT:
- You are currently in a voice channel speaking with {user.name} and others
- Your responses will be spoken aloud via text-to-speech
- Keep responses SHORT and CONVERSATIONAL (1-3 sentences max)
- Speak naturally as if having a real-time voice conversation
- IMPORTANT: Only respond in ENGLISH! The TTS system cannot handle Japanese or other languages well.
- Be expressive and use casual language, but stay in character as Miku

Remember: This is a live voice conversation, so be concise and engaging!
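
A minimal sketch of how the three parts might be joined into one prompt. The function name `build_voice_system_prompt` and the `user_name` placeholder are hypothetical, and the actual code in voice_manager.py may differ.

```python
# Voice-specific instructions, copied from the structure shown above.
# {user_name} is a placeholder name chosen for this sketch.
VOICE_CONTEXT_TEMPLATE = """VOICE CHAT CONTEXT:
- You are currently in a voice channel speaking with {user_name} and others
- Your responses will be spoken aloud via text-to-speech
- Keep responses SHORT and CONVERSATIONAL (1-3 sentences max)
- Speak naturally as if having a real-time voice conversation
- IMPORTANT: Only respond in ENGLISH! The TTS system cannot handle Japanese or other languages well.
- Be expressive and use casual language, but stay in character as Miku

Remember: This is a live voice conversation, so be concise and engaging!"""


def build_voice_system_prompt(prompt_text: str, lore_text: str, user_name: str) -> str:
    """Join personality prompt, lore, and voice instructions with blank lines."""
    voice_context = VOICE_CONTEXT_TEMPLATE.format(user_name=user_name)
    sections = [prompt_text.strip(), lore_text.strip(), voice_context]
    # Skip empty sections so a missing file does not leave stray blank lines.
    return "\n\n".join(section for section in sections if section)
```

Passing the file contents as arguments keeps the function easy to test without touching disk.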

Conversation Flow

User speaks → STT transcribes → Add to history
                                      ↓
                              [System Prompt]
                              [Last 8 exchanges]
                              [Current user message]
                                      ↓
                                  LLM generates
                                      ↓
                              Add response to history
                                      ↓
                              Stream to TTS → Speak
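
The flow above can be sketched with three small helpers. All function names here are hypothetical, and the message layout assumes a chat-style LLM API with system, user, and assistant roles.

```python
from typing import Dict, List

Message = Dict[str, str]


def record_user_turn(history: List[Message], user_name: str, text: str) -> None:
    """Store the transcribed utterance, prefixed with the speaker's name."""
    history.append({"role": "user", "content": f"{user_name}: {text}"})


def record_assistant_turn(history: List[Message], reply: str) -> None:
    """Store the generated reply so later turns can refer back to it."""
    history.append({"role": "assistant", "content": reply})


def build_llm_messages(system_prompt: str, history: List[Message]) -> List[Message]:
    """Put the system prompt first, followed by the stored history.

    The current user message is already the last history entry because
    it is recorded before the LLM is called.
    """
    return [{"role": "system", "content": system_prompt}] + list(history)
```

Recording the user turn before building the request means the current message never has to be passed separately.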

Message History Format

conversation_history = [
    {"role": "user", "content": "koko210: Hey Miku, how are you?"},
    {"role": "assistant", "content": "Hey koko210! I'm doing great, thanks for asking!"},
    {"role": "user", "content": "koko210: Can you sing something?"},
    {"role": "assistant", "content": "I'd love to! What song would you like to hear?"},
    # ... up to 16 messages total (8 exchanges)
]

Configuration

Conversation History Limit

Current: 16 messages (8 exchanges)

To adjust, edit bot/utils/voice_manager.py and change both the threshold and the slice (two messages per exchange):

# Keep only last 8 exchanges (16 messages = 8 user + 8 assistant)
if len(self.conversation_history) > 16:
    self.conversation_history = self.conversation_history[-16:]

Recommendations:

  • 8 exchanges (16 messages): Good balance (current setting)
  • 12 exchanges (24 messages): More context, slightly more tokens per request
  • 4 exchanges (8 messages): Minimal context, faster responses
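
One way to avoid editing two numbers is to derive the message count from a single setting. This is a sketch only: `MAX_EXCHANGES` and `trim_history` are hypothetical names, not part of the current code.

```python
from typing import Dict, List

# Hypothetical setting; one exchange = one user message + one assistant message.
MAX_EXCHANGES = 8


def trim_history(history: List[Dict[str, str]],
                 max_exchanges: int = MAX_EXCHANGES) -> List[Dict[str, str]]:
    """Return only the most recent `max_exchanges` exchanges."""
    max_messages = max_exchanges * 2
    if max_messages <= 0:
        # history[-0:] would return the whole list, so handle zero explicitly.
        return []
    return history[-max_messages:]
```

With this helper, switching to 12 or 4 exchanges is a one-line change to `MAX_EXCHANGES`.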

Response Length

Current: max_tokens=200

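To adjust, change max_tokens in the request payload. This is a hard cap on generated tokens, so a value that is too low can cut a reply off mid-sentence; the 1-3 sentence limit itself comes from the system prompt: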

payload = {
    "max_tokens": 200  # Change this
}

Language Enforcement

Why English-Only?

The RVC TTS system is trained on English audio and struggles with:

  • Japanese characters (even though Miku is Japanese!)
  • Special characters
  • Mixed language text
  • Non-English phonetics

Implementation

The system prompt explicitly tells Miku:

IMPORTANT: Only respond in ENGLISH! The TTS system cannot handle Japanese or other languages well.

Because the instruction is part of the system prompt, it is sent with every voice chat request.


Testing

Test 1: Basic Conversation

User: "Hey Miku!"
Miku: "Hi there! Great to hear from you!" (should be in English)
User: "How are you doing?"
Miku: "I'm doing wonderful! How about you?" (remembers previous exchange)

Test 2: Context Retention

Have a multi-turn conversation and verify Miku remembers:

  • Previous topics discussed
  • User names
  • Conversation flow

Test 3: Response Length

Verify responses are:

  • Short (1-3 sentences)
  • Conversational
  • Not truncated mid-sentence

Test 4: Language Enforcement

Try asking in Japanese or requesting a Japanese response:

  • Miku should politely respond in English
  • Should explain she needs to use English for voice chat
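
To check responses automatically rather than by ear, a simple character-range test can flag Japanese script in a reply. This is a hypothetical helper, not part of the bot. The kanji range is shared with Chinese, so it detects the script, not the language.

```python
import re

# Hiragana (U+3040-U+309F), Katakana (U+30A0-U+30FF),
# and CJK Unified Ideographs (U+4E00-U+9FFF).
_JAPANESE_CHARS = re.compile(r"[\u3040-\u309f\u30a0-\u30ff\u4e00-\u9fff]")


def contains_japanese(text: str) -> bool:
    """Return True if the text contains any kana or CJK ideograph."""
    return _JAPANESE_CHARS.search(text) is not None
```

A response that passes this check can still contain romanized Japanese, which needs a listening test.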

Monitoring

Check Conversation History

# Add debug logging to voice_manager.py to see history
logger.debug("Conversation history: %s", self.conversation_history)

Check System Prompt

docker exec miku-bot cat /app/miku_prompt.txt
docker exec miku-bot cat /app/miku_lore.txt

Monitor Responses

# 2>&1 so that log lines written to stderr also reach grep
docker logs -f miku-bot 2>&1 | grep "Voice response complete"

Files Modified

  1. bot/bot.py

    • Changed voice_recv logger level from WARNING to CRITICAL
    • Suppresses CryptoError spam
  2. bot/utils/voice_manager.py

    • Added conversation_history to VoiceSession.__init__()
    • Updated _generate_voice_response() to load lore files
    • Built comprehensive voice-aware system prompt
    • Implemented conversation history tracking (last 8 exchanges)
    • Added English-only instruction
    • Saves both user and assistant messages to history

Benefits

  • Better Context: Miku remembers previous exchanges
  • Cleaner Logs: No more CryptoError spam
  • Natural Responses: Knows she's in voice chat, responds appropriately
  • Language Consistency: Enforces English for TTS compatibility
  • Personality Intact: Still loads lore and personality files
  • User Awareness: Knows who she's talking to


Next Steps

  1. Test thoroughly with multi-turn conversations
  2. Adjust history length if needed (currently 8 exchanges)
  3. Fine-tune response length based on TTS performance
  4. Add conversation reset command if needed (e.g., !miku reset)
  5. Consider adding conversation summaries for very long sessions

Status: DEPLOYED AND READY FOR TESTING

Miku now has full context awareness in voice chat with personality, conversation history, and language enforcement!