# GEMI - Generative Model Interface ## Overview GEMI is a flexible tool for interfacing with AI language models via external HTTP REST APIs (such as OpenAI, Anthropic, or compatible endpoints). It provides multiple interface options (CLI, REPL, Web UI) and advanced features like conversation management, file context integration, and intelligent document retrieval. ## Architecture ### Core Module **Responsibilities:** - HTTP API client for communication with external model service (e.g. OpenAI, Anthropic, Gemini, Ollama) - Conversation context management - Message formatting and transformation (e.g. an adapter to the varying protocols of each of the models' API specs) - File context integration and caching (configurable triggers for when the model should look at file metadata and when it should read file contents) - Session persistence and retrieval ### CLI Interface **Responsibilities:** - Command-line argument parsing - Single-shot query execution - Scriptable automation support - Output formatting for pipes/redirects ### REPL Interface **Responsibilities:** - Interactive multi-turn conversations - Command history and editing - Session switching and management - Rich terminal output formatting ### Log Module **Responsibilities:** - Structured logging (debug, info, warn, error) - Multiple output targets (console, file, syslog) - Request/response logging for debugging - Performance metrics tracking ### Config Module **Responsibilities:** - Configuration file loading (YAML/TOML/JSON) - Environment variable overrides - CLI argument precedence - Validation and defaults ### Web Server Module **Responsibilities:** - HTTP server for web UI - WebSocket for real-time chat - Session authentication - Static file serving - RESTful API endpoints ## Features ### Conversation Management - **Persistent History**: Conversations saved to disk with configurable retention - **Session Selection**: Resume previous conversations or start fresh - **Context Editing**: Manually modify conversation history before sending - **Multi-Session Support**: Manage multiple concurrent conversation threads - **Export/Import**: Share conversations in portable formats ### File Integration - **Local Context**: Include local files in prompts - **Metadata Store**: Index file information for relevance matching - **Smart Suggestions**: AI-recommended files based on query content - **Directory Watching**: Auto-update file index on changes (optional) - **Git Integration**: Respect .gitignore patterns ### Model Configuration - **System Prompts**: Customizable with sensible defaults - **Parameter Tuning**: Temperature, top-p, max tokens, etc. - **Model Selection**: Easy switching between models/providers - **Token Counting**: Preview token usage before sending - **Cost Estimation**: Track API usage costs ### Quality of Life - **Syntax Highlighting**: Code blocks in terminal/web - **Copy Controls**: Easy copying of AI responses - **Keyboard Shortcuts**: Efficient navigation and actions - **Diff View**: See changes in code suggestions - **Search History**: Full-text search across conversations ## Technical Requirements ### Language & Runtime - **Language**: Python 3.10+ (current implementation) - **Async Support**: Required for efficient HTTP operations and concurrent tasks - **Cross-platform**: Must work on Linux, macOS, Windows ### HTTP Client Requirements - Async/await support - Connection pooling - Timeout and retry mechanisms - Streaming response support - Custom header support ### CLI Requirements - Argument parsing with subcommands - Help text generation - Input/output stream handling - Exit code standards ### REPL Requirements - Multi-line input support - Command history persistence - Auto-completion - Syntax highlighting for code blocks - Configurable key bindings ### Storage Requirements - Relational database for structured data (conversations, sessions) - File-based storage for large content - Transaction support - Schema migration capability - Optional: Vector similarity search ### Web Server Requirements (Future) - Async request handling - WebSocket support for streaming - Static file serving - Request validation - Session management ## Technology Considerations *Note: These are suggested technologies based on the current Python implementation. Alternative choices are acceptable if they meet the technical requirements above.* ### Current Implementation Stack - **HTTP**: httpx (alternatives: aiohttp, requests) - **Validation**: pydantic (alternatives: marshmallow, attrs) - **CLI**: click (alternatives: argparse, typer) - **REPL**: prompt_toolkit (alternatives: readline, custom) - **Terminal UI**: rich (alternatives: colorama, termcolor) - **Database**: SQLite (alternatives: PostgreSQL, DuckDB) - **Config**: pyyaml/tomli (alternatives: json, ini, toml) ### Future Web Stack Considerations - **Framework**: FastAPI, Flask, or Django - **ASGI Server**: uvicorn, hypercorn, or daphne - **Frontend**: Plain HTML/JS, React, or Vue.js ### Alternative Language Considerations If rewriting from Python: - **Rust**: Excellent performance, compiled binary, strong typing - Trade-offs: Steeper learning curve, longer compile times - **Go**: Simple concurrency, fast compilation, single binary - Trade-offs: Less rich ecosystem for ML/AI tooling - **TypeScript/Node**: Familiar for web developers, good async support - Trade-offs: Runtime performance, less mature for CLI tools ## Configuration ### Config File Structure ```yaml # API Configuration api: provider: "gemini" # openai, anthropic, custom base_url: "https://generativelanguage.googleapis.com/v1beta/" api_key: "${GEMINI_API_KEY}" model: "gemini-2.0-flash" timeout: 60 max_retries: 3 # Model Parameters model: temperature: 0.7 max_tokens: 4096 top_p: 1.0 system_prompt: "You are a helpful coding assistant..." system_prompts_path: "./prompts" # Conversation Settings conversation: max_history_messages: 50 save_to_disk: true storage_path: "~/.gemi/conversations" auto_save: true # File Context files: enable_indexing: true index_path: "~/.gemi/file_index.db" watched_directories: ["./"] ignore_patterns: [".git", "node_modules", "__pycache__"] max_file_size_mb: 5 # Logging logging: level: "INFO" console: true file: "~/.gemi/gemi.log" log_requests: false # Web Server web: host: "127.0.0.1" port: 8080 enable_auth: false ``` ### Environment Variables - `GEMI_API_KEY`: Override API key - `GEMI_CONFIG`: Custom config file path - `GEMI_LOG_LEVEL`: Override log level ## API Design *High-level interface contracts - implementation details flexible* ### Core Abstractions **Message Protocol** ``` Message: { role: string, content: string, metadata: dict } Conversation: { id: string, messages: List[Message], created: timestamp } ``` **API Client Interface** ``` send_message(conversation, config) -> Message stream_message(conversation, config) -> Iterator[string] list_models() -> List[ModelInfo] ``` **Storage Interface** ``` save_conversation(conversation) -> session_id load_conversation(session_id) -> Conversation list_sessions() -> List[SessionMetadata] delete_session(session_id) -> bool ``` **File Index Interface** ``` index_directory(path) -> int search_relevant(query, limit) -> List[FileInfo] get_file_content(path) -> string ``` ## Data Models ### Conversation Storage - Session metadata (ID, title, created, modified) - Message history (role, content, timestamp, tokens) - File attachments (path, hash, included_at) ### File Index - File metadata (path, size, modified, language) - Content hash for change detection - Optional: embedding vectors for similarity search ## Development Roadmap ### Phase 1: Core Functionality (Current) - [x] Basic API client implementation - [ ] **Fix conversation context persistence** - [ ] Session management (list, select, new) - [ ] Config file support - [ ] Basic CLI interface ### Phase 2: Enhanced Features - [ ] REPL interface with prompt_toolkit - [ ] File context integration - [ ] Conversation editing - [ ] Export/import functionality - [ ] Comprehensive logging ### Phase 3: Web Interface - [ ] FastAPI web server - [ ] WebSocket real-time chat - [ ] Session authentication - [ ] File upload handling - [ ] Responsive UI design ### Phase 4: Advanced Features - [ ] File indexing and search - [ ] Git integration - [ ] Multiple API provider support - [ ] Cost tracking and limits - [ ] Plugin system ### Phase 5: Polish & Testing - [ ] Comprehensive unit tests - [ ] Integration tests - [ ] Documentation (user guide, API reference) - [ ] Performance optimization - [ ] Security audit ## Testing Strategy ### Test Categories - **Unit Tests**: Individual function/class behavior - **Integration Tests**: Component interactions - **End-to-End Tests**: Full user workflows - **Performance Tests**: Throughput and latency benchmarks ### Testing Requirements - Mockable external dependencies (API calls, file system) - Deterministic test data - CI/CD integration capability - Code coverage reporting (target: >80%) ### Manual Testing - CLI usability testing - REPL interaction flows - Web UI responsiveness (future) - Cross-platform compatibility ## Implementation Notes ### Current Status (Python Implementation) - Basic API integration exists - Missing conversation context management (critical) - Need to refactor for modularity ### Design Principles 1. **Modularity**: Each component should be independently testable 2. **Configurability**: Behavior controlled via config, not code changes 3. **Extensibility**: Easy to add new API providers, storage backends 4. **Observability**: Comprehensive logging and error reporting 5. **User-Centric**: Intuitive defaults, helpful error messages ### Next Immediate Steps 1. Implement conversation persistence layer 2. Add session listing and selection 3. Create Config class with file loading 4. Separate concerns into modules (api, storage, cli) 5. Add basic error handling and logging ### Architectural Decisions to Make - [ ] Storage format for conversations (JSON, SQLite, both?) - [ ] Config file format (YAML, TOML, or support both?) - [ ] Streaming vs batch for file indexing - [ ] Caching strategy for API responses - [ ] Plugin architecture design (if needed) ### Future Considerations - Support for function calling / tool use - Image input support for vision models - Voice input/output integration - Collaborative features (shared sessions) - API rate limiting and queuing - Multi-provider fallback/routing - Local model support (Ollama, llama.cpp)