HIGH: No Graceful Shutdown on Bot Exit #5

Closed
opened 2026-02-16 22:01:54 +02:00 by Koko210 · 1 comment

What the Problem Is

When the bot is stopped (SIGTERM/SIGINT), it does not properly clean up resources. Voice connections remain active, WebSocket connections hang, and in-memory state is lost.

Where It Occurs

  • bot/bot.py#L861-L862 - Main loop run without shutdown handler
  • bot/bot.py - No signal handlers registered
  • bot/server_manager.py - Server processes not terminated gracefully

Why This Is a Problem

  1. Data Loss: In-memory conversation history not persisted
  2. Ghost Sessions: Voice connections not closed
  3. Resource Leaks: WebSocket connections not terminated
  4. Corrupted State: Files may be left mid-write

What Can Go Wrong

Scenario 1: Docker Container Stops

  1. User runs docker-compose down
  2. Container receives SIGTERM
  3. Bot process terminates immediately
  4. Voice connections still active on Discord side
  5. Users still see bot as "in voice" but it does not respond
  6. Memory-consolidation data that was held only in memory is lost
  7. Bot restarts, tries to join voice again, but still has ghost connection
  8. Discord API rejects connection: "Already connected to voice"

Scenario 2: Keyboard Interrupt During File Write

  1. Bot is writing consolidated memories to disk (large JSON file)
  2. User presses Ctrl+C to stop bot
  3. File is left in incomplete/corrupted state
  4. Bot crashes on restart when loading corrupted file
  5. Bot cannot start until corrupted file is manually deleted
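
One way to keep an interrupted write from leaving a half-written file is to write to a temporary file and swap it into place in a single step. The sketch below is a minimal illustration, not the project's code: `atomic_write_json` and `load_json_or_default` are hypothetical helper names, and the example assumes the consolidated memories are plain JSON-serializable data.

```python
import json
import os
import tempfile


def atomic_write_json(path, data):
    """Write JSON to `path` so readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the target directory so os.replace() stays
    # on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the swap
        os.replace(tmp_path, path)  # replaces the target in one step
    except BaseException:
        # Interrupted or failed: drop the partial temp file, keep the old target.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


def load_json_or_default(path, default):
    """Load JSON, falling back to `default` if the file is missing or corrupt."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return default
```

With this pattern, Ctrl+C during step 1 leaves the previous file untouched, and a tolerant loader lets the bot start even if an older file is already damaged.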

Proposed Fix

Implement a graceful shutdown handler:

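# bot/bot.py
import asyncio
import logging
import signal

from bot.utils.voice_manager import VoiceManager

logger = logging.getLogger(__name__)
voice_manager = VoiceManager()

class GracefulShutdown:
    def __init__(self, main_task):
        self.shutdown_requested = False
        self.shutdown_complete = asyncio.Event()
        self.main_task = main_task  # ends only after bot.close(), so never waited on
        self.task = None  # reference to the scheduled shutdown task

    async def shutdown(self):
        """Perform graceful shutdown sequence (safe to call more than once)"""
        if self.shutdown_requested:
            return

        self.shutdown_requested = True
        logger.warning("Shutdown requested, initiating cleanup...")

        # 1. Stop accepting new commands
        bot.is_shutdown = True

        # 2. Wait for in-flight tasks to complete (with timeout).
        #    Long-lived tasks will use the full timeout, and Docker sends SIGKILL
        #    after its stop grace period (10 seconds by default), so the grace
        #    period must exceed this timeout.
        logger.info("Waiting for in-flight tasks (max 30 seconds)...")
        skip = {asyncio.current_task(), self.main_task}
        tasks = [t for t in asyncio.all_tasks() if t not in skip]
        if tasks:
            await asyncio.wait(tasks, timeout=30)

        # 3. Clean up voice sessions
        logger.info("Cleaning up voice sessions...")
        await voice_manager.cleanup_all_sessions()

        # 4. Persist in-memory state
        logger.info("Persisting in-memory state...")
        # Save conversation history, memory consolidation data, etc.

        # 5. Close WebSocket connections
        logger.info("Closing connections...")
        await bot.close()

        self.shutdown_complete.set()
        logger.info("Shutdown complete")

def signal_handler(signum, graceful_shutdown):
    """Handle SIGTERM/SIGINT (called by the event loop, not by signal.signal)"""
    logger.warning(f"Received signal {signum}, initiating graceful shutdown...")
    if not graceful_shutdown.shutdown_requested:
        # Keep a reference so the task is not garbage-collected mid-shutdown
        graceful_shutdown.task = asyncio.create_task(graceful_shutdown.shutdown())

async def main():
    graceful_shutdown = GracefulShutdown(asyncio.current_task())
    loop = asyncio.get_running_loop()

    # Register signal handlers on the loop (Unix only). Handlers installed with
    # signal.signal() run outside the loop and cannot safely schedule coroutines.
    for sig in (signal.SIGTERM, signal.SIGINT):
        loop.add_signal_handler(sig, signal_handler, sig, graceful_shutdown)

    # Wrap main loop with shutdown check
    try:
        # Assumes a discord.py-style client: start() is the awaitable form of run()
        await bot.start(TOKEN)
    finally:
        if graceful_shutdown.shutdown_requested:
            # A signal already started the shutdown; let it finish
            await graceful_shutdown.shutdown_complete.wait()
        else:
            logger.warning("Bot exited, performing emergency shutdown...")
            await graceful_shutdown.shutdown()

asyncio.run(main())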
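
The idea of handling SIGTERM/SIGINT on the event loop can be tried in isolation, without Discord. The following is a self-contained sketch under stated assumptions: `run_until_signalled` and `demo` are hypothetical names, the cleanup steps are placeholders that only record their order, and `loop.add_signal_handler` is available on Unix event loops only.

```python
import asyncio
import os
import signal


async def run_until_signalled(cleanup_log):
    """Run until SIGTERM/SIGINT arrives, then perform ordered cleanup."""
    loop = asyncio.get_running_loop()
    stop = asyncio.Event()

    # Loop-level handlers run as ordinary loop callbacks, so they may
    # safely touch asyncio objects such as this Event.
    for sig in (signal.SIGTERM, signal.SIGINT):
        loop.add_signal_handler(sig, stop.set)

    try:
        await stop.wait()  # stands in for the bot's main loop
    finally:
        for sig in (signal.SIGTERM, signal.SIGINT):
            loop.remove_signal_handler(sig)

    # Placeholder cleanup steps, in the order the issue proposes.
    cleanup_log.append("voice sessions closed")
    cleanup_log.append("state persisted")
    cleanup_log.append("connections closed")


async def demo():
    log = []
    # Deliver SIGTERM to this process shortly after startup.
    loop = asyncio.get_running_loop()
    loop.call_later(0.1, os.kill, os.getpid(), signal.SIGTERM)
    await run_until_signalled(log)
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Running the script sends the process a SIGTERM after 0.1 seconds; the three cleanup steps are then recorded before the process exits instead of being skipped.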

Severity

HIGH - Lack of graceful shutdown causes data loss, corrupted state, and ghost connections after restarts.

Files Affected

bot/bot.py, bot/server_manager.py, bot/utils/voice_manager.py

Koko210 reopened this issue 2026-02-16 22:17:02 +02:00

Fixed in commit 8d51370. Implemented a comprehensive async graceful_shutdown() function that replaces the old sync-only handler. The shutdown sequence now: (1) ends active voice sessions, (2) saves autonomous state, (3) stops APScheduler, (4) cancels all tracked background tasks, and (5) closes the Discord gateway. The signal handlers (SIGTERM/SIGINT) schedule the async shutdown on the event loop, and atexit is kept as a last-resort fallback.
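
Step (4) depends on background tasks being tracked somewhere so they can be cancelled together. The sketch below shows one common way to do that with plain asyncio. It is not the code from commit 8d51370: `background_tasks`, `track`, and `cancel_tracked_tasks` are hypothetical names chosen for illustration.

```python
import asyncio

background_tasks = set()  # hypothetical registry of running tasks


def track(coro):
    """Start `coro` as a task and keep a reference until it finishes."""
    task = asyncio.create_task(coro)
    background_tasks.add(task)
    # Drop the reference automatically once the task is done.
    task.add_done_callback(background_tasks.discard)
    return task


async def cancel_tracked_tasks(timeout=5.0):
    """Cancel every tracked task; return how many finished within `timeout`."""
    tasks = list(background_tasks)
    if not tasks:
        return 0
    for task in tasks:
        task.cancel()
    # asyncio.wait() does not raise if a task failed or the timeout expired.
    done, _pending = await asyncio.wait(tasks, timeout=timeout)
    return len(done)
```

A task that needs its own cleanup can catch `asyncio.CancelledError`, release its resources, and re-raise, which is why the helper waits for the cancellations to complete rather than returning immediately.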

Reference: Koko210/miku-discord#5