diff --git a/PLAN.md b/PLAN.md index 79ec8eb..64c6b0d 100644 --- a/PLAN.md +++ b/PLAN.md @@ -1,31 +1,37 @@ - # Refactoring Plan to Reduce Code Duplication ## 1. Create Core Service Layer **Files:** `src/core/` (new directory) -- `src/core/item_service.rs` - Main service for item operations +- `src/core/item_service.rs` - Main service for item operations (synchronous) +- `src/core/async_item_service.rs` - Async wrapper for API use - `src/core/compression_service.rs` - Compression handling - `src/core/meta_service.rs` - Metadata processing - `src/core/mod.rs` - Module exports +- `src/core/error.rs` - Unified error handling **Reason:** Extract common business logic from modes and APIs into reusable services -**Implementation:** +**Implementation:** - Move logic from modes (get, save, list, info) and API handlers into service functions - Services should return structured data, not format output - Handle compression, metadata, and database operations +- Keep core services **synchronous** for CLI performance +- Provide async wrappers using `tokio::task::spawn_blocking` for API use +- Use streaming for pipeline efficiency (process data in chunks) ## 2. Create Common Data Structures **Files:** `src/core/types.rs` **Reason:** Standardize data structures used across modes and APIs **Implementation:** -- Define structs for Item, ItemContent, ItemWithMeta, etc. -- Include conversion functions from database types -- Add serialization/deserialization support +- Define structs for `Item`, `ItemWithContent`, `ItemWithMeta`, `Response` +- Include conversion functions from database types (`From`) +- Add serialization/deserialization support for JSON/YAML +- Ensure all fields are properly documented +- Use zero-copy patterns where possible (slicing instead of copying) ## 3. Refactor CLI Modes to Use Services -**Files:** +**Files:** - `src/modes/get.rs` -- `src/modes/save.rs` +- `src/modes/save.rs` - `src/modes/list.rs` - `src/modes/info.rs` - `src/modes/delete.rs` @@ -37,61 +43,119 @@ - Replace current implementations with calls to core services - Keep only CLI-specific formatting and output logic - Handle command-line argument parsing and validation +- Use synchronous services directly +- Implement streaming for stdin/stdout to maintain pipeline performance -## 4. Refactor REST API to Use Services +## 4. Refactor REST API to Use Async Services **Files:** - `src/modes/server/api/item.rs` - `src/modes/server/api/status.rs` **Reason:** Remove business logic from HTTP handlers **Implementation:** -- Convert handlers to call core services +- Convert handlers to call async core services - Keep only HTTP-specific code (status codes, headers, etc.) -- Use common error handling +- Use common error handling with conversions to HTTP responses +- Wrap synchronous service calls in `tokio::task::spawn_blocking` ## 5. Refactor MCP Tools to Use Services -**Files:** -- `src/modes/server/mcp/tools.rs` +**Files:** `src/modes/server/mcp/tools.rs` **Reason:** Remove duplication with REST API and CLI modes **Implementation:** - Replace current implementation with calls to core services - Keep only MCP protocol-specific logic +- Use synchronous services directly (MCP is typically local/short-lived) +- Standardize response format to match API/CLI ## 6. Create Common Error Handling **Files:** `src/core/error.rs` **Reason:** Standardize error types across the application **Implementation:** -- Define comprehensive error enum with conversions -- Implement proper error formatting for different outputs (CLI, JSON, HTTP) +- Define comprehensive error enum with conversions: + - From database errors + - From I/O errors + - From compression errors + - From validation errors +- Implement conversions to: + - `anyhow::Error` (for CLI) + - `axum::http::StatusCode` (for API) + - `ToolError` (for MCP) +- Provide user-friendly error messages +- Include error codes for programmatic handling ## 7. Update Database Layer for Batch Operations **Files:** `src/db.rs` **Reason:** Support efficient batch operations needed by services **Implementation:** - Add functions to get multiple items with their metadata and tags +- Add batch insertion/updates for tags and metadata +- Add transaction support for atomic operations - Optimize queries for common access patterns +- Ensure all batch operations are properly documented ## 8. Add Integration Tests **Files:** `tests/integration/` (new directory) -**Reason:** Ensure refactored code maintains functionality +**Reason:** Ensure refactored code maintains functionality and performance **Implementation:** - Test core services independently - Test CLI modes and APIs through their public interfaces - Verify compression, metadata, and database operations +- Include performance benchmarks for critical paths +- Use in-memory databases and tempfiles for isolation +- Test both sync and async service implementations + +## 9. Performance Optimization Guidelines +**Reason:** Ensure the refactored version doesn't slow down pipelines +**Implementation:** +- Use streaming for stdin/stdout processing (chunked I/O) +- Minimize buffering and memory copies +- Offload CPU-bound work (compression, plugins) to thread pools +- Provide fast-path options (e.g., `--fast` flag to skip metadata plugins) +- Benchmark critical operations before/after refactoring +- Document performance characteristics ## Implementation Order: -1. Create core module structure and error types -2. Implement core services with basic functionality -3. Refactor one mode (e.g., get) to use services -4. Refactor corresponding API endpoints -5. Repeat for other modes and APIs -6. Add comprehensive tests -7. Clean up removed code from original files +1. Create core module structure and error types (`core/error.rs`, `core/types.rs`) +2. Implement core services with basic functionality (sync first) +3. Add async wrappers for API use +4. Refactor one mode (e.g., `get`) to use services and validate approach +5. Refactor corresponding API endpoints to use async services +6. Repeat for other modes and APIs +7. Refactor MCP tools to use services +8. Add comprehensive tests and benchmarks +9. Clean up removed code from original files +10. Document performance characteristics and tradeoffs ## Benefits: -- Reduced code duplication -- Easier maintenance -- Consistent behavior across interfaces -- Better testability -- Clear separation of concerns +- Reduced code duplication between CLI, API, and MCP +- Easier maintenance with clear separation of concerns +- Consistent behavior across all interfaces +- Better testability with isolated service layer +- Maintained or improved pipeline performance +- Flexible architecture supporting both sync and async use cases + +## Key Risks and Mitigations: +1. **Performance Regression:** + - Risk: Refactoring could slow down pipeline operations + - Mitigation: Benchmark before/after, use streaming, minimize overhead + +2. **Increased Complexity:** + - Risk: Adding service layer could make code harder to understand + - Mitigation: Clear documentation, gradual refactoring, maintain simple interfaces + +3. **Async/Sync Boundaries:** + - Risk: Mixing sync/async could lead to deadlocks or inefficiencies + - Mitigation: Clear boundaries, use `spawn_blocking` for sync work in async context + +4. **Breaking Changes:** + - Risk: Refactoring could change behavior in subtle ways + - Mitigation: Comprehensive tests, gradual rollout, maintain backward compatibility + +## Design Principles: +1. **Zero-Copy Where Possible:** Use slicing instead of copying data +2. **Streaming Processing:** Handle data in chunks for memory efficiency +3. **Clear Boundaries:** Separate core logic from interface-specific code +4. **Performance First:** Optimize for common pipeline use cases +5. **Consistent Errors:** Unified error handling across all interfaces +6. **Backward Compatibility:** Maintain existing CLI/API behavior