spec.md


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344

# GEMI - Generative Model Interface

## Overview

GEMI is a flexible tool for interfacing with AI language models via external HTTP REST APIs (such as OpenAI, Anthropic, or compatible endpoints). It provides multiple interface options (CLI, REPL, Web UI) and advanced features like conversation management, file context integration, and intelligent document retrieval.

## Architecture

### Core Module
**Responsibilities:**
- HTTP API client for communication with external model service (e.g. OpenAI, Anthropic, Gemini, Ollama)
- Conversation context management
- Message formatting and transformation (e.g. an adapter to the varying protocols of each of the models' API specs)
- File context integration and caching (configurable triggers for when the model should look at file metadata and when it should read file contents)
- Session persistence and retrieval

### CLI Interface
**Responsibilities:**
- Command-line argument parsing
- Single-shot query execution
- Scriptable automation support
- Output formatting for pipes/redirects

### REPL Interface
**Responsibilities:**
- Interactive multi-turn conversations
- Command history and editing
- Session switching and management
- Rich terminal output formatting

### Log Module
**Responsibilities:**
- Structured logging (debug, info, warn, error)
- Multiple output targets (console, file, syslog)
- Request/response logging for debugging
- Performance metrics tracking

### Config Module
**Responsibilities:**
- Configuration file loading (YAML/TOML/JSON)
- Environment variable overrides
- CLI argument precedence
- Validation and defaults

### Web Server Module
**Responsibilities:**
- HTTP server for web UI
- WebSocket for real-time chat
- Session authentication
- Static file serving
- RESTful API endpoints

## Features

### Conversation Management
- **Persistent History**: Conversations saved to disk with configurable retention
- **Session Selection**: Resume previous conversations or start fresh
- **Context Editing**: Manually modify conversation history before sending
- **Multi-Session Support**: Manage multiple concurrent conversation threads
- **Export/Import**: Share conversations in portable formats

### File Integration
- **Local Context**: Include local files in prompts
- **Metadata Store**: Index file information for relevance matching
- **Smart Suggestions**: AI-recommended files based on query content
- **Directory Watching**: Auto-update file index on changes (optional)
- **Git Integration**: Respect .gitignore patterns

### Model Configuration
- **System Prompts**: Customizable with sensible defaults
- **Parameter Tuning**: Temperature, top-p, max tokens, etc.
- **Model Selection**: Easy switching between models/providers
- **Token Counting**: Preview token usage before sending
- **Cost Estimation**: Track API usage costs

### Quality of Life
- **Syntax Highlighting**: Code blocks in terminal/web
- **Copy Controls**: Easy copying of AI responses
- **Keyboard Shortcuts**: Efficient navigation and actions
- **Diff View**: See changes in code suggestions
- **Search History**: Full-text search across conversations

## Technical Requirements

### Language & Runtime
- **Language**: Python 3.10+ (current implementation)
- **Async Support**: Required for efficient HTTP operations and concurrent tasks
- **Cross-platform**: Must work on Linux, macOS, Windows

### HTTP Client Requirements
- Async/await support
- Connection pooling
- Timeout and retry mechanisms
- Streaming response support
- Custom header support

### CLI Requirements
- Argument parsing with subcommands
- Help text generation
- Input/output stream handling
- Exit code standards

### REPL Requirements
- Multi-line input support
- Command history persistence
- Auto-completion
- Syntax highlighting for code blocks
- Configurable key bindings

### Storage Requirements
- Relational database for structured data (conversations, sessions)
- File-based storage for large content
- Transaction support
- Schema migration capability
- Optional: Vector similarity search

### Web Server Requirements (Future)
- Async request handling
- WebSocket support for streaming
- Static file serving
- Request validation
- Session management

## Technology Considerations

*Note: These are suggested technologies based on the current Python implementation. Alternative choices are acceptable if they meet the technical requirements above.*

### Current Implementation Stack
- **HTTP**: httpx (alternatives: aiohttp, requests)
- **Validation**: pydantic (alternatives: marshmallow, attrs)
- **CLI**: click (alternatives: argparse, typer)
- **REPL**: prompt_toolkit (alternatives: readline, custom)
- **Terminal UI**: rich (alternatives: colorama, termcolor)
- **Database**: SQLite (alternatives: PostgreSQL, DuckDB)
- **Config**: pyyaml/tomli (alternatives: json, ini, toml)

### Future Web Stack Considerations
- **Framework**: FastAPI, Flask, or Django
- **ASGI Server**: uvicorn, hypercorn, or daphne
- **Frontend**: Plain HTML/JS, React, or Vue.js

### Alternative Language Considerations
If rewriting from Python:
- **Rust**: Excellent performance, compiled binary, strong typing
  - Trade-offs: Steeper learning curve, longer compile times
- **Go**: Simple concurrency, fast compilation, single binary
  - Trade-offs: Less rich ecosystem for ML/AI tooling
- **TypeScript/Node**: Familiar for web developers, good async support
  - Trade-offs: Runtime performance, less mature for CLI tools

## Configuration

### Config File Structure
```yaml
# API Configuration
api:
  provider: "gemini"  # openai, anthropic, custom
  base_url: "https://generativelanguage.googleapis.com/v1beta/"
  api_key: "${GEMINI_API_KEY}"
  model: "gemini-2.0-flash"
  timeout: 60
  max_retries: 3

# Model Parameters
model:
  temperature: 0.7
  max_tokens: 4096
  top_p: 1.0
  system_prompt: "You are a helpful coding assistant..."
  system_prompts_path: "./prompts"

# Conversation Settings
conversation:
  max_history_messages: 50
  save_to_disk: true
  storage_path: "~/.gemi/conversations"
  auto_save: true

# File Context
files:
  enable_indexing: true
  index_path: "~/.gemi/file_index.db"
  watched_directories: ["./"]
  ignore_patterns: [".git", "node_modules", "__pycache__"]
  max_file_size_mb: 5

# Logging
logging:
  level: "INFO"
  console: true
  file: "~/.gemi/gemi.log"
  log_requests: false

# Web Server
web:
  host: "127.0.0.1"
  port: 8080
  enable_auth: false
```

### Environment Variables
- `GEMI_API_KEY`: Override API key
- `GEMI_CONFIG`: Custom config file path
- `GEMI_LOG_LEVEL`: Override log level

## API Design

*High-level interface contracts - implementation details flexible*

### Core Abstractions

**Message Protocol**
```
Message: { role: string, content: string, metadata: dict }
Conversation: { id: string, messages: List[Message], created: timestamp }
```

**API Client Interface**
```
send_message(conversation, config) -> Message
stream_message(conversation, config) -> Iterator[string]
list_models() -> List[ModelInfo]
```

**Storage Interface**
```
save_conversation(conversation) -> session_id
load_conversation(session_id) -> Conversation
list_sessions() -> List[SessionMetadata]
delete_session(session_id) -> bool
```

**File Index Interface**
```
index_directory(path) -> int
search_relevant(query, limit) -> List[FileInfo]
get_file_content(path) -> string
```

## Data Models

### Conversation Storage
- Session metadata (ID, title, created, modified)
- Message history (role, content, timestamp, tokens)
- File attachments (path, hash, included_at)

### File Index
- File metadata (path, size, modified, language)
- Content hash for change detection
- Optional: embedding vectors for similarity search

## Development Roadmap

### Phase 1: Core Functionality (Current)
- [x] Basic API client implementation
- [ ] **Fix conversation context persistence**
- [ ] Session management (list, select, new)
- [ ] Config file support
- [ ] Basic CLI interface

### Phase 2: Enhanced Features
- [ ] REPL interface with prompt_toolkit
- [ ] File context integration
- [ ] Conversation editing
- [ ] Export/import functionality
- [ ] Comprehensive logging

### Phase 3: Web Interface
- [ ] FastAPI web server
- [ ] WebSocket real-time chat
- [ ] Session authentication
- [ ] File upload handling
- [ ] Responsive UI design

### Phase 4: Advanced Features
- [ ] File indexing and search
- [ ] Git integration
- [ ] Multiple API provider support
- [ ] Cost tracking and limits
- [ ] Plugin system

### Phase 5: Polish & Testing
- [ ] Comprehensive unit tests
- [ ] Integration tests
- [ ] Documentation (user guide, API reference)
- [ ] Performance optimization
- [ ] Security audit

## Testing Strategy

### Test Categories
- **Unit Tests**: Individual function/class behavior
- **Integration Tests**: Component interactions
- **End-to-End Tests**: Full user workflows
- **Performance Tests**: Throughput and latency benchmarks

### Testing Requirements
- Mockable external dependencies (API calls, file system)
- Deterministic test data
- CI/CD integration capability
- Code coverage reporting (target: >80%)

### Manual Testing
- CLI usability testing
- REPL interaction flows
- Web UI responsiveness (future)
- Cross-platform compatibility

## Implementation Notes

### Current Status (Python Implementation)
- Basic API integration exists
- Missing conversation context management (critical)
- Need to refactor for modularity

### Design Principles
1. **Modularity**: Each component should be independently testable
2. **Configurability**: Behavior controlled via config, not code changes
3. **Extensibility**: Easy to add new API providers, storage backends
4. **Observability**: Comprehensive logging and error reporting
5. **User-Centric**: Intuitive defaults, helpful error messages

### Next Immediate Steps
1. Implement conversation persistence layer
2. Add session listing and selection
3. Create Config class with file loading
4. Separate concerns into modules (api, storage, cli)
5. Add basic error handling and logging

### Architectural Decisions to Make
- [ ] Storage format for conversations (JSON, SQLite, both?)
- [ ] Config file format (YAML, TOML, or support both?)
- [ ] Streaming vs batch for file indexing
- [ ] Caching strategy for API responses
- [ ] Plugin architecture design (if needed)

### Future Considerations
- Support for function calling / tool use
- Image input support for vision models
- Voice input/output integration
- Collaborative features (shared sessions)
- API rate limiting and queuing
- Multi-provider fallback/routing
- Local model support (Ollama, llama.cpp)