Diffstat (limited to 'spec.md')
| -rw-r--r-- | spec.md | 281 |
1 files changed, 281 insertions, 0 deletions
@@ -0,0 +1,281 @@
+# GEMI - Generative Model Interface
+
+## Overview
+
+GEMI is a flexible tool for interfacing with AI language models via external HTTP REST APIs (such as OpenAI, Anthropic, or compatible endpoints). It provides multiple interface options (CLI, REPL, Web UI) and advanced features like conversation management, file context integration, and intelligent document retrieval.
+
+### Use Cases
+- Interactive AI-assisted coding and development
+- Document analysis with local file context
+- Persistent conversation sessions with history management
+- Automated AI workflows via CLI scripting
+- Web-based team collaboration with shared AI access
+
+## Architecture
+
+### Core Module
+**Responsibilities:**
+- HTTP API client for LLM communication
+- Conversation context management
+- Message formatting and transformation
+- File context integration and caching
+- Session persistence and retrieval
+
+### CLI Interface
+**Responsibilities:**
+- Command-line argument parsing
+- Single-shot query execution
+- Scriptable automation support
+- Output formatting for pipes/redirects
+
+### REPL Interface
+**Responsibilities:**
+- Interactive multi-turn conversations
+- Command history and editing
+- Session switching and management
+- Rich terminal output formatting
+
+### Log Module
+**Responsibilities:**
+- Structured logging (debug, info, warn, error)
+- Multiple output targets (console, file, syslog)
+- Request/response logging for debugging
+- Performance metrics tracking
+
+### Config Module
+**Responsibilities:**
+- Configuration file loading (YAML/TOML/JSON)
+- Environment variable overrides
+- CLI argument precedence
+- Validation and defaults
+
+### Web Server Module
+**Responsibilities:**
+- HTTP server for web UI
+- WebSocket for real-time chat
+- Session authentication
+- Static file serving
+- RESTful API endpoints
+
+## Features
+
+### Conversation Management
+- **Persistent History**: Conversations saved to disk with configurable retention
+- **Session Selection**: Resume previous conversations or start fresh
+- **Context Editing**: Manually modify conversation history before sending
+- **Multi-Session Support**: Manage multiple concurrent conversation threads
+- **Export/Import**: Share conversations in portable formats
+
+### File Integration
+- **Local Context**: Include local files in prompts
+- **Metadata Store**: Index file information for relevance matching
+- **Smart Suggestions**: AI-recommended files based on query content
+- **Directory Watching**: Auto-update file index on changes (optional)
+- **Git Integration**: Respect .gitignore patterns
+
+### Model Configuration
+- **System Prompts**: Customizable with sensible defaults
+- **Parameter Tuning**: Temperature, top-p, max tokens, etc.
+- **Model Selection**: Easy switching between models/providers
+- **Token Counting**: Preview token usage before sending
+- **Cost Estimation**: Track API usage costs
+
+### Quality of Life
+- **Syntax Highlighting**: Code blocks in terminal/web
+- **Copy Controls**: Easy copying of AI responses
+- **Keyboard Shortcuts**: Efficient navigation and actions
+- **Diff View**: See changes in code suggestions
+- **Search History**: Full-text search across conversations
+
+## Tech Stack
+
+### Language & Runtime
+- **Python 3.10+**: Core implementation language
+- **asyncio**: Async HTTP and concurrent operations
+
+### HTTP & API
+- **httpx**: Modern async HTTP client
+- **pydantic**: Data validation and settings management
+
+### CLI & REPL
+- **click**: CLI framework
+- **prompt_toolkit**: Rich REPL with history/completion
+- **rich**: Terminal formatting and syntax highlighting
+
+### Web Server
+- **FastAPI**: Modern async web framework
+- **uvicorn**: ASGI server
+- **websockets**: Real-time communication
+- **Jinja2**: HTML templating
+
+### Data Storage
+- **SQLite**: Conversation history and metadata
+- **sqlite-vec** (optional): Vector similarity search for files
+
+### Configuration
+- **python-dotenv**: Environment variable loading
+- **pyyaml** or **tomli**: Config file parsing
+
+## Configuration
+
+### Config File Structure
+```yaml
+# API Configuration
+api:
+  provider: "openai"  # openai, anthropic, custom
+  base_url: "https://api.openai.com/v1"
+  api_key: "${OPENAI_API_KEY}"
+  model: "gpt-4-turbo-preview"
+  timeout: 60
+  max_retries: 3
+
+# Model Parameters
+model:
+  temperature: 0.7
+  max_tokens: 4096
+  top_p: 1.0
+  system_prompt: "You are a helpful coding assistant..."
+
+# Conversation Settings
+conversation:
+  max_history_messages: 50
+  save_to_disk: true
+  storage_path: "~/.gemi/conversations"
+  auto_save: true
+
+# File Context
+files:
+  enable_indexing: true
+  index_path: "~/.gemi/file_index.db"
+  watched_directories: ["./"]
+  ignore_patterns: [".git", "node_modules", "__pycache__"]
+  max_file_size_mb: 5
+
+# Logging
+logging:
+  level: "INFO"
+  console: true
+  file: "~/.gemi/gemi.log"
+  log_requests: false
+
+# Web Server
+web:
+  host: "127.0.0.1"
+  port: 8080
+  enable_auth: false
+```
+
+### Environment Variables
+- `GEMI_API_KEY`: Override API key
+- `GEMI_CONFIG`: Custom config file path
+- `GEMI_LOG_LEVEL`: Override log level
+
+## API Design
+
+### Core Classes
+```python
+# Message types
+Message(role: str, content: str, metadata: dict)
+Conversation(id: str, messages: List[Message], created_at: datetime)
+
+# API client
+LLMClient(config: Config)
+  - send_message(conversation: Conversation) -> Message
+  - stream_message(conversation: Conversation) -> AsyncIterator[str]
+
+# File indexer
+FileIndex(db_path: Path)
+  - index_directory(path: Path) -> int
+  - search_relevant(query: str, limit: int) -> List[FileInfo]
+  - get_file_content(path: Path) -> str
+```
+
+## Data Models
+
+### Conversation Storage
+- Session metadata (ID, title, created, modified)
+- Message history (role, content, timestamp, tokens)
+- File attachments (path, hash, included_at)
+
+### File Index
+- File metadata (path, size, modified, language)
+- Content hash for change detection
+- Optional: embedding vectors for similarity search
+
+## Development Roadmap
+
+### Phase 1: Core Functionality (Current)
+- [x] Basic API client implementation
+- [ ] **Fix conversation context persistence**
+- [ ] Session management (list, select, new)
+- [ ] Config file support
+- [ ] Basic CLI interface
+
+### Phase 2: Enhanced Features
+- [ ] REPL interface with prompt_toolkit
+- [ ] File context integration
+- [ ] Conversation editing
+- [ ] Export/import functionality
+- [ ] Comprehensive logging
+
+### Phase 3: Web Interface
+- [ ] FastAPI web server
+- [ ] WebSocket real-time chat
+- [ ] Session authentication
+- [ ] File upload handling
+- [ ] Responsive UI design
+
+### Phase 4: Advanced Features
+- [ ] File indexing and search
+- [ ] Git integration
+- [ ] Multiple API provider support
+- [ ] Cost tracking and limits
+- [ ] Plugin system
+
+### Phase 5: Polish & Testing
+- [ ] Comprehensive unit tests
+- [ ] Integration tests
+- [ ] Documentation (user guide, API reference)
+- [ ] Performance optimization
+- [ ] Security audit
+
+## Testing Strategy
+
+### Unit Tests
+- Config loading and validation
+- Message formatting
+- File indexing logic
+- API response parsing
+
+### Integration Tests
+- End-to-end API calls (with mocking)
+- File system operations
+- Database operations
+- Web server endpoints
+
+### Manual Testing
+- CLI usability testing
+- REPL interaction flows
+- Web UI responsiveness
+- Cross-platform compatibility
+
+## Implementation Notes
+
+### Current Status (Python Implementation)
+- Basic API integration exists
+- Missing conversation context management (critical)
+- Need to refactor for modularity
+
+### Next Immediate Steps
+1. Implement conversation persistence layer
+2. Add session listing and selection
+3. Create Config class with file loading
+4. Separate concerns into modules (api, storage, cli)
+5. Add basic error handling and logging
+
+### Future Considerations
+- Support for function calling / tool use
+- Image input support for vision models
+- Voice input/output integration
+- Collaborative features (shared sessions)
+- API rate limiting and queuing
\ No newline at end of file
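
The spec marks conversation persistence as the critical missing piece and names SQLite for history storage, so a rough sketch of that layer may help reviewers picture the first step. This is an illustration only, not the project's implementation: `ConversationStore`, its table layout, and its method names are hypothetical, while `Message` and `Conversation` mirror the shapes listed under "Core Classes". It uses only the standard-library `sqlite3` module.

```python
from __future__ import annotations

import json
import sqlite3
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Message:
    role: str
    content: str
    metadata: dict = field(default_factory=dict)


@dataclass
class Conversation:
    id: str
    messages: list[Message]
    created_at: datetime


class ConversationStore:
    """Hypothetical SQLite-backed store (name and schema are assumptions)."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self._db = sqlite3.connect(db_path)
        self._db.executescript(
            """
            CREATE TABLE IF NOT EXISTS conversations (
                id TEXT PRIMARY KEY,
                created_at TEXT NOT NULL
            );
            CREATE TABLE IF NOT EXISTS messages (
                conversation_id TEXT NOT NULL,
                position INTEGER NOT NULL,
                role TEXT NOT NULL,
                content TEXT NOT NULL,
                metadata TEXT NOT NULL,
                PRIMARY KEY (conversation_id, position)
            );
            """
        )

    def save(self, conv: Conversation) -> None:
        """Write the whole conversation, replacing any earlier copy."""
        with self._db:  # commits on success, rolls back on exception
            self._db.execute(
                "INSERT OR REPLACE INTO conversations (id, created_at) VALUES (?, ?)",
                (conv.id, conv.created_at.isoformat()),
            )
            self._db.execute(
                "DELETE FROM messages WHERE conversation_id = ?", (conv.id,)
            )
            self._db.executemany(
                "INSERT INTO messages VALUES (?, ?, ?, ?, ?)",
                [
                    (conv.id, i, m.role, m.content, json.dumps(m.metadata))
                    for i, m in enumerate(conv.messages)
                ],
            )

    def load(self, conv_id: str) -> Conversation | None:
        """Return the stored conversation, or None if the id is unknown."""
        row = self._db.execute(
            "SELECT created_at FROM conversations WHERE id = ?", (conv_id,)
        ).fetchone()
        if row is None:
            return None
        rows = self._db.execute(
            "SELECT role, content, metadata FROM messages "
            "WHERE conversation_id = ? ORDER BY position",
            (conv_id,),
        ).fetchall()
        messages = [Message(r, c, json.loads(md)) for r, c, md in rows]
        return Conversation(conv_id, messages, datetime.fromisoformat(row[0]))

    def list_ids(self) -> list[str]:
        """Session listing: conversation ids, oldest first."""
        cur = self._db.execute("SELECT id FROM conversations ORDER BY created_at")
        return [r[0] for r in cur]
```

Rewriting all messages on each save keeps the sketch short and makes context editing trivial; an append-only write path would likely be preferable once histories grow. Passing a file path such as the configured `storage_path` instead of `:memory:` would make the data survive restarts.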
