HIGH: Add Comprehensive Input Validation #29

New Issue

Koko210 · 2026-02-16T22:52:01+02:00

Koko210 commented

2026-02-16 22:52:01 +02:00

User inputs, API responses, and configuration values lack comprehensive validation, causing crashes and unexpected behavior.

Where It Occurs

bot/bot.py - Message/command handling
bot/api.py - API endpoints
bot/config.py - Configuration loading
bot/utils/llm.py - LLM response parsing
bot/stt_client.py - Speech-to-text results

Why This Is a Problem

Crashes: Invalid inputs cause unhandled exceptions
Security Risks: No sanitization of user content
Data Corruption: Bad data propagates through system
Confusing Errors: Generic exceptions instead of meaningful validation errors

What Can Go Wrong

Scenario 1: Invalid Configuration Value

User sets invalid LLM temperature in config.yaml
config.py loads value as string instead of float
LLM API call fails with validation error
Bot crashes instead of showing helpful error
User confused what's wrong

Scenario 2: Malicious User Input

User sends message with special characters
Message passed directly to LLM prompt
Prompt injection possible
LLM responds with unintended behavior
Security vulnerability exploited

Scenario 3: Empty/Missing Data

User sends empty message
Bot tries to process None/empty string
String operations fail
Bot crashes
Bad user experience

Proposed Fix

Implement comprehensive input validation:

# bot/utils/validation.py - NEW FILE
import re
from typing import Any, Optional, List
from pydantic import BaseModel, Field, validator
from enum import Enum
import logging

logger = logging.getLogger(__name__)

class ValidationError(Exception):
    def __init__(self, message: str, field: str):
        self.message = message
        self.field = field
        super().__init__(f"{field}: {message}")

class MessageType(str, Enum):
    TEXT = "text"
    VOICE = "voice"
    IMAGE = "image"

class UserMessage(BaseModel):
    """Validated user message"""
    content: str = Field(..., min_length=1, max_length=5000)
    user_id: int
    guild_id: int
    message_type: MessageType = MessageType.TEXT
    
    @validator('content')
    def validate_content(cls, v):
        if not v or not v.strip():
            raise ValueError("Content cannot be empty")
        # Remove potential injection patterns
        if re.search(r'<script.*?>|javascript:|on\w+\s*=', v, re.IGNORECASE):
            raise ValueError("Content contains potentially malicious patterns")
        return v.strip()[:5000]

class LLMConfig(BaseModel):
    """Validated LLM configuration"""
    temperature: float = Field(0.7, ge=0.0, le=2.0)
    max_tokens: int = Field(2048, ge=1, le=8192)
    top_p: float = Field(0.9, ge=0.0, le=1.0)
    
    @validator('temperature')
    def validate_temperature(cls, v):
        if not 0.0 <= v <= 2.0:
            raise ValueError("Temperature must be between 0.0 and 2.0")
        return float(v)

class STTResult(BaseModel):
    """Validated speech-to-text result"""
    text: str = Field(..., min_length=1, max_length=10000)
    confidence: float = Field(0.0, ge=0.0, le=1.0)
    language: str = "en"
    
    @validator('text')
    def validate_text(cls, v):
        if not v or not v.strip():
            raise ValueError("Transcription cannot be empty")
        return v.strip()

def validate_user_message(content: str, user_id: int, guild_id: int) -> UserMessage:
    """Validate and parse user message"""
    try:
        return UserMessage(
            content=content,
            user_id=user_id,
            guild_id=guild_id
        )
    except Exception as e:
        logger.warning(f"Message validation failed: {e}")
        raise ValidationError(str(e), "message")

def validate_llm_config(config: dict) -> LLMConfig:
    """Validate LLM configuration"""
    try:
        return LLMConfig(**config)
    except Exception as e:
        logger.warning(f"LLM config validation failed: {e}")
        raise ValidationError(str(e), "llm_config")

def validate_stt_result(result: dict) -> STTResult:
    """Validate speech-to-text result"""
    try:
        return STTResult(**result)
    except Exception as e:
        logger.warning(f"STT result validation failed: {e}")
        raise ValidationError(str(e), "stt_result")

# Usage in bot.py
@bot.event
async def on_message(message):
    try:
        validated_msg = validate_user_message(
            content=message.content,
            user_id=message.author.id,
            guild_id=message.guild.id
        )
        await process_validated_message(validated_msg)
    except ValidationError as e:
        await message.channel.send(f"Invalid message: {e.message}")
    except Exception as e:
        logger.error(f"Unexpected error processing message: {e}")
        await message.channel.send("Sorry, I couldn't process that message.")

# Usage in config.py
def load_config():
    config_data = yaml.safe_load(open('config.yaml'))
    try:
        llm_config = validate_llm_config(config_data.get('llm', {}))
        config_data['llm'] = llm_config.dict()
    except ValidationError as e:
        logger.error(f"Invalid LLM configuration: {e}")
        raise
    return config_data

Severity

HIGH - Lack of input validation causes crashes, security issues, and poor user experience.

Files Affected

bot/bot.py, bot/api.py, bot/config.py, bot/utils/llm.py, bot/stt_client.py, new file: bot/utils/validation.py

User inputs, API responses, and configuration values lack comprehensive validation, causing crashes and unexpected behavior. ## Where It Occurs - bot/bot.py - Message/command handling - bot/api.py - API endpoints - bot/config.py - Configuration loading - bot/utils/llm.py - LLM response parsing - bot/stt_client.py - Speech-to-text results ## Why This Is a Problem 1. Crashes: Invalid inputs cause unhandled exceptions 2. Security Risks: No sanitization of user content 3. Data Corruption: Bad data propagates through system 4. Confusing Errors: Generic exceptions instead of meaningful validation errors ## What Can Go Wrong ### Scenario 1: Invalid Configuration Value 1. User sets invalid LLM temperature in config.yaml 2. config.py loads value as string instead of float 3. LLM API call fails with validation error 4. Bot crashes instead of showing helpful error 5. User confused what's wrong ### Scenario 2: Malicious User Input 1. User sends message with special characters 2. Message passed directly to LLM prompt 3. Prompt injection possible 4. LLM responds with unintended behavior 5. Security vulnerability exploited ### Scenario 3: Empty/Missing Data 1. User sends empty message 2. Bot tries to process None/empty string 3. String operations fail 4. Bot crashes 5. Bad user experience ## Proposed Fix Implement comprehensive input validation: ```python # bot/utils/validation.py - NEW FILE import re from typing import Any, Optional, List from pydantic import BaseModel, Field, validator from enum import Enum import logging logger = logging.getLogger(__name__) class ValidationError(Exception): def __init__(self, message: str, field: str): self.message = message self.field = field super().__init__(f"{field}: {message}") class MessageType(str, Enum): TEXT = "text" VOICE = "voice" IMAGE = "image" class UserMessage(BaseModel): """Validated user message""" content: str = Field(..., min_length=1, max_length=5000) user_id: int guild_id: int message_type: MessageType = MessageType.TEXT @validator('content') def validate_content(cls, v): if not v or not v.strip(): raise ValueError("Content cannot be empty") # Remove potential injection patterns if re.search(r'<script.*?>|javascript:|on\w+\s*=', v, re.IGNORECASE): raise ValueError("Content contains potentially malicious patterns") return v.strip()[:5000] class LLMConfig(BaseModel): """Validated LLM configuration""" temperature: float = Field(0.7, ge=0.0, le=2.0) max_tokens: int = Field(2048, ge=1, le=8192) top_p: float = Field(0.9, ge=0.0, le=1.0) @validator('temperature') def validate_temperature(cls, v): if not 0.0 <= v <= 2.0: raise ValueError("Temperature must be between 0.0 and 2.0") return float(v) class STTResult(BaseModel): """Validated speech-to-text result""" text: str = Field(..., min_length=1, max_length=10000) confidence: float = Field(0.0, ge=0.0, le=1.0) language: str = "en" @validator('text') def validate_text(cls, v): if not v or not v.strip(): raise ValueError("Transcription cannot be empty") return v.strip() def validate_user_message(content: str, user_id: int, guild_id: int) -> UserMessage: """Validate and parse user message""" try: return UserMessage( content=content, user_id=user_id, guild_id=guild_id ) except Exception as e: logger.warning(f"Message validation failed: {e}") raise ValidationError(str(e), "message") def validate_llm_config(config: dict) -> LLMConfig: """Validate LLM configuration""" try: return LLMConfig(**config) except Exception as e: logger.warning(f"LLM config validation failed: {e}") raise ValidationError(str(e), "llm_config") def validate_stt_result(result: dict) -> STTResult: """Validate speech-to-text result""" try: return STTResult(**result) except Exception as e: logger.warning(f"STT result validation failed: {e}") raise ValidationError(str(e), "stt_result") # Usage in bot.py @bot.event async def on_message(message): try: validated_msg = validate_user_message( content=message.content, user_id=message.author.id, guild_id=message.guild.id ) await process_validated_message(validated_msg) except ValidationError as e: await message.channel.send(f"Invalid message: {e.message}") except Exception as e: logger.error(f"Unexpected error processing message: {e}") await message.channel.send("Sorry, I couldn't process that message.") # Usage in config.py def load_config(): config_data = yaml.safe_load(open('config.yaml')) try: llm_config = validate_llm_config(config_data.get('llm', {})) config_data['llm'] = llm_config.dict() except ValidationError as e: logger.error(f"Invalid LLM configuration: {e}") raise return config_data ``` ## Severity HIGH - Lack of input validation causes crashes, security issues, and poor user experience. ## Files Affected bot/bot.py, bot/api.py, bot/config.py, bot/utils/llm.py, bot/stt_client.py, new file: bot/utils/validation.py

Sign in to join this conversation.

1 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: Koko210/miku-discord#29