keep/DESIGN.md
Andrew Phillips 5bad7ac7a6 refactor: decouple meta plugins from DB via SaveMetaFn callback, extract shared utilities
- Add SaveMetaFn callback pattern: meta plugins receive a closure instead of
  &Connection, enabling the same plugin code to work in local, client, and
  server contexts (collect-to-Vec, collect-to-HashMap, or direct DB write)
- Client save now runs meta plugins locally during streaming (smart client
  sets meta=false, server skips its own plugins)
- Add POST /api/item/{id}/update endpoint for re-running plugins on stored
  content without downloading compressed data
- Add client update mode (--update with --meta-plugin flags)
- Extract shared utilities: stream_copy, print_serialized, build_path_table,
  ensure_default_tag to reduce duplication across modes
- Add upsert_tag for idempotent tag addition (INSERT OR IGNORE)
- Add warn logging on save_meta lock failure in BaseMetaPlugin and MetaService
2026-03-14 22:36:59 -03:00


PROJECT RULES - KEEP FIRST

Standard Rules

  1. ALWAYS keep DESIGN.md updated with any architectural or design changes
  2. ALWAYS keep project rules first in this document
  3. ALWAYS use git commands to remove or move files (git rm, git mv, etc.)
  4. Follow Rust naming conventions and idioms
  5. Use anyhow for error handling throughout the codebase
  6. Maintain comprehensive logging with the log crate
  7. Write unit tests for critical functionality
  8. Document public APIs with rustdoc comments
  9. Keep modules focused on single responsibilities
  10. Prefer composition over inheritance
  11. Handle errors gracefully and provide meaningful error messages
  12. Ensure code is safe and avoids unsafe blocks where possible

Code - Modules

Main Module

  • main.rs - Entry point, CLI argument parsing, mode dispatching
  • Interacts with all mode modules based on user input
  • Handles database connection setup and data directory management

Mode Modules

  • modes/save.rs - Save new items with tags/metadata
  • modes/get.rs - Retrieve items by ID/tags
  • modes/list.rs - List items with filtering and formatting
  • modes/delete.rs - Delete items by ID
  • modes/update.rs - Update item tags/metadata
  • modes/info.rs - Show detailed item information
  • modes/diff.rs - Compare two items
  • modes/status.rs - Show system status and capabilities
  • modes/server.rs - REST HTTP/HTTPS server mode with OpenAPI documentation
  • modes/client.rs - Client mode for remote server (streaming save, local decompression)
  • modes/common.rs - Shared utilities for all modes (OutputFormat, table creation, print_serialized, build_path_table, ensure_default_tag, render_item_info_table, render_list_table_with_format)

Database Module

  • db.rs - SQLite database operations
  • Handles items, tags, and metadata storage
  • Provides query functions for all modes
  • Manages database migrations

Compression Engine Module

  • compression_engine.rs - Trait and type definitions
  • compression_engine/gzip.rs - GZip implementation
  • compression_engine/lz4.rs - LZ4 implementation
  • compression_engine/none.rs - No compression implementation
  • compression_engine/program.rs - External program wrapper

Meta Plugin Module

  • meta_plugin.rs - Trait and type definitions, SaveMetaFn callback type
  • meta_plugin/program.rs - External program wrapper
  • meta_plugin/digest.rs - Internal digest implementations
  • meta_plugin/system.rs - System information metadata plugins

SaveMetaFn Architecture: Meta plugins are decoupled from direct DB access via a SaveMetaFn callback (Arc<Mutex<dyn FnMut(&str, &str) + Send>>). The callback is injected at MetaService construction and propagated to all plugins via BaseMetaPlugin. This enables:

  • Local mode: Callback collects metadata into a Vec, written to DB after plugins finish
  • Client mode: Callback collects into a HashMap, sent to server after streaming completes
  • Server mode: Callback collects into a Vec, written to DB after plugins finish (same as local)
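
The callback pattern above can be sketched as follows. Only the `SaveMetaFn` type comes from this document; the helper names `collect_to_vec`, `collect_to_map`, and `save_meta` are hypothetical, and `eprintln!` stands in for the `log` crate's `warn!` so the example needs no dependencies.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Callback type as quoted in the design text above.
pub type SaveMetaFn = Arc<Mutex<dyn FnMut(&str, &str) + Send>>;

/// Hypothetical helper: a callback that appends (name, value) pairs to a Vec,
/// as described for the local and server contexts.
pub fn collect_to_vec() -> (SaveMetaFn, Arc<Mutex<Vec<(String, String)>>>) {
    let store = Arc::new(Mutex::new(Vec::<(String, String)>::new()));
    let sink = Arc::clone(&store);
    let cb: SaveMetaFn = Arc::new(Mutex::new(move |name: &str, value: &str| {
        sink.lock().unwrap().push((name.to_string(), value.to_string()));
    }));
    (cb, store)
}

/// Hypothetical helper: a callback that collects into a HashMap, as described
/// for the client context. A later value for the same name replaces the earlier one.
pub fn collect_to_map() -> (SaveMetaFn, Arc<Mutex<HashMap<String, String>>>) {
    let store = Arc::new(Mutex::new(HashMap::<String, String>::new()));
    let sink = Arc::clone(&store);
    let cb: SaveMetaFn = Arc::new(Mutex::new(move |name: &str, value: &str| {
        sink.lock().unwrap().insert(name.to_string(), value.to_string());
    }));
    (cb, store)
}

/// What a plugin might do to emit one value: lock, call, and warn (not panic)
/// if the lock is poisoned. Returns whether the value was delivered.
pub fn save_meta(cb: &SaveMetaFn, name: &str, value: &str) -> bool {
    match cb.lock() {
        Ok(mut f) => {
            (*f)(name, value);
            true
        }
        Err(_) => {
            eprintln!("warn: save_meta lock failed, dropping {name}");
            false
        }
    }
}
```

Because a plugin only ever sees the `SaveMetaFn`, the same plugin code can run unchanged whichever collector the caller injects.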

Common Modules

  • common/is_binary.rs - Binary file detection utilities
  • common/status.rs - Status information generation
  • common/mod.rs - PIPESIZE constant (8192), stream_copy() streaming utility
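
A minimal sketch of what a `stream_copy()` built on `PIPESIZE` might look like. The constant's value is from this document; the signature and return value are assumptions, not the actual function in common/mod.rs.

```rust
use std::io::{self, Read, Write};

/// Buffer size used for all streaming, per this document.
pub const PIPESIZE: usize = 8192;

/// Copy everything from `reader` to `writer` through one fixed-size buffer,
/// so memory use stays constant regardless of input size.
/// Returns the number of bytes copied (assumed return type).
pub fn stream_copy<R: Read, W: Write>(reader: &mut R, writer: &mut W) -> io::Result<u64> {
    let mut buf = [0u8; PIPESIZE];
    let mut total = 0u64;
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        writer.write_all(&buf[..n])?;
        total += n as u64;
    }
    writer.flush()?;
    Ok(total)
}
```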

Client Module

  • client.rs - HTTP client wrapper (ureq-based, supports streaming POST)
  • modes/client/save.rs - 3-thread streaming save with local meta plugins (stdin → tee → compress → meta plugins → pipe → HTTP POST)
  • modes/client/get.rs - Get with server-side raw fetch + local decompression
  • modes/client/list.rs - List delegation to server
  • modes/client/info.rs - Info delegation to server
  • modes/client/delete.rs - Delete delegation to server
  • modes/client/diff.rs - Diff delegation to server
  • modes/client/status.rs - Status delegation to server
  • modes/client/update.rs - Update delegation to server (sends plugin names/metadata/tags)
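
The threaded shape of the streaming save can be illustrated with standard-library channels. This is a simplified sketch under stated assumptions: the split of work across the three threads is a guess, compression is omitted, a newline counter stands in for the meta plugins, and a byte collector stands in for the HTTP POST. `streaming_save` and `SaveReport` are hypothetical names.

```rust
use std::io::Read;
use std::sync::mpsc::sync_channel;
use std::thread;

/// Result of the sketch pipeline (hypothetical type).
pub struct SaveReport {
    pub bytes_read: usize,
    pub line_count: usize,
    pub uploaded: Vec<u8>,
}

pub fn streaming_save<R: Read + Send + 'static>(mut input: R) -> SaveReport {
    // Bounded channels give back-pressure, so memory use stays constant.
    let (meta_tx, meta_rx) = sync_channel::<Vec<u8>>(4);
    let (post_tx, post_rx) = sync_channel::<Vec<u8>>(4);

    // Thread 1: read input in 8192-byte chunks and tee each chunk to both consumers.
    let reader = thread::spawn(move || {
        let mut buf = [0u8; 8192];
        let mut total = 0usize;
        loop {
            let n = match input.read(&mut buf) {
                Ok(0) | Err(_) => break, // sketch: treat read errors as end of input
                Ok(n) => n,
            };
            total += n;
            if meta_tx.send(buf[..n].to_vec()).is_err() {
                break;
            }
            if post_tx.send(buf[..n].to_vec()).is_err() {
                break;
            }
        }
        total // senders drop here, which ends both consumer loops
    });

    // Thread 2: stand-in for the meta plugins (counts newlines).
    let meta = thread::spawn(move || {
        meta_rx
            .iter()
            .map(|chunk| chunk.iter().filter(|&&b| b == b'\n').count())
            .sum::<usize>()
    });

    // Thread 3: stand-in for the pipe feeding the HTTP POST body.
    let post = thread::spawn(move || {
        let mut body = Vec::new();
        for chunk in post_rx {
            body.extend_from_slice(&chunk);
        }
        body
    });

    SaveReport {
        bytes_read: reader.join().expect("reader thread panicked"),
        line_count: meta.join().expect("meta thread panicked"),
        uploaded: post.join().expect("post thread panicked"),
    }
}
```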

Utility Modules

  • plugins.rs - Shared plugin utilities
  • args.rs - CLI argument definitions

Command Line Interface

Modes

  • Save mode: keep [--save] (default when no mode specified and no IDs provided)
  • Get mode: keep [--get] <ID|tag...> (default when IDs provided)
  • List mode: keep [--list] [tag...]
  • Info mode: keep [--info] <ID|tag...>
  • Delete mode: keep [--delete] <ID...>
  • Update mode: keep [--update] <ID> [tag...]
  • Diff mode: keep [--diff] <ID1> <ID2>
  • Status mode: keep [--status]
  • Server mode: keep [--server] <address:port>
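
The default-mode rule for Save and Get could be expressed as below. This is a sketch only: it assumes item IDs are unsigned integers and that "IDs provided" means every positional argument parses as one. `Mode` and `resolve_mode` are hypothetical names, not the definitions in args.rs or main.rs.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Save,
    Get,
    List,
    Info,
    Delete,
    Update,
    Diff,
    Status,
    Server,
}

/// Pick the mode: an explicit flag always wins; otherwise Get when the
/// positional arguments are all numeric IDs, and Save in every other case.
pub fn resolve_mode(explicit: Option<Mode>, positional: &[&str]) -> Mode {
    if let Some(mode) = explicit {
        return mode;
    }
    let ids_given =
        !positional.is_empty() && positional.iter().all(|arg| arg.parse::<u64>().is_ok());
    if ids_given {
        Mode::Get
    } else {
        Mode::Save
    }
}
```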

Item Options

  • --meta KEY[=VALUE] - Set metadata for the item, remove if VALUE not provided
  • --digest <sha256|md5> - Digest algorithm to use when saving items
  • --compression <lz4|gzip|bzip2|xz|zstd|none> - Compression algorithm to use when saving items
  • --meta-plugins <plugin[,plugin...]> - Meta plugins to use when saving items
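
The `--meta KEY[=VALUE]` rule (set when a value is given, remove otherwise) might be parsed like this. `MetaOp` and `parse_meta_arg` are hypothetical names, and treating `KEY=` as "set to the empty string" is an assumption.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum MetaOp {
    Set { key: String, value: String },
    Remove { key: String },
}

/// Split on the first '=' only, so values may themselves contain '='.
pub fn parse_meta_arg(arg: &str) -> MetaOp {
    match arg.split_once('=') {
        Some((key, value)) => MetaOp::Set {
            key: key.to_string(),
            value: value.to_string(),
        },
        None => MetaOp::Remove {
            key: arg.to_string(),
        },
    }
}
```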

General Options

  • --dir <PATH> - Specify the directory to use for storage
  • --list-format <FORMAT> - A comma separated list of columns to display with --list
  • --human-readable - Display file sizes with units
  • --verbose - Increase message verbosity
  • --quiet - Do not show any messages
  • --output-format <table|json|yaml> - Output format for info, status, and list modes
  • --server-password <PASSWORD> - Password for server authentication
  • --server-cert <PATH> - TLS certificate file (PEM) for HTTPS server
  • --server-key <PATH> - TLS private key file (PEM) for HTTPS server
  • --force - Force output even when binary data would be sent to a TTY

Client Options (requires client feature)

  • --client-url <URL> - Remote keep server URL
  • --client-password <PASSWORD> - Remote server password

Data Storage

Database Schema

  • items table: id (primary key), ts (timestamp), size (optional), compression
  • tags table: id (foreign key to items), name (tag name)
  • metas table: id (foreign key to items), name (meta key), value (meta value)
  • Indexes on tag names and meta names for faster queries
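
For orientation, the schema above could correspond to SQL along these lines, held here as Rust string constants. Column types, index names, and the `UNIQUE (id, name)` constraints are assumptions. Some uniqueness constraint is needed for `INSERT OR IGNORE` to make tag addition idempotent, but the real migrations in db.rs may define it differently.

```rust
/// Assumed DDL matching the table and column names listed above.
pub const SCHEMA: &str = "
CREATE TABLE IF NOT EXISTS items (
    id          INTEGER PRIMARY KEY,
    ts          TEXT NOT NULL,
    size        INTEGER,
    compression TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS tags (
    id   INTEGER NOT NULL REFERENCES items(id),
    name TEXT NOT NULL,
    UNIQUE (id, name)
);
CREATE TABLE IF NOT EXISTS metas (
    id    INTEGER NOT NULL REFERENCES items(id),
    name  TEXT NOT NULL,
    value TEXT NOT NULL,
    UNIQUE (id, name)
);
CREATE INDEX IF NOT EXISTS idx_tags_name ON tags(name);
CREATE INDEX IF NOT EXISTS idx_metas_name ON metas(name);
";

/// Assumed statement behind upsert_tag: adding an existing tag is a no-op.
pub const UPSERT_TAG: &str = "INSERT OR IGNORE INTO tags (id, name) VALUES (?1, ?2)";
```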

File Storage

  • Data directory contains compressed item files named by their item ID
  • Database file stored in data directory
  • File permissions set to be private to user (umask 077)
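
Since item files are named by ID, mapping a user-supplied ID to a path can stay safe by accepting only digits. The sketch below assumes IDs are unsigned integers and that the file name is exactly the decimal ID. `parse_item_id` and `item_path` are hypothetical names.

```rust
use std::path::{Path, PathBuf};

/// Accept only plain decimal digits, so input such as "../x" or "+7"
/// can never reach the filesystem layer.
pub fn parse_item_id(raw: &str) -> Option<u64> {
    if raw.is_empty() || !raw.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    raw.parse().ok() // still None on overflow
}

/// Path of the compressed item file inside the data directory.
pub fn item_path(data_dir: &Path, id: u64) -> PathBuf {
    data_dir.join(id.to_string())
}
```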

REST API Endpoints

Status Operations

  • GET /api/status - Get system status information
  • GET /api/plugins/status - Get plugin status information

Item Operations

  • GET /api/item/ - Get a list of items as JSON. Optional params: order=newest|oldest, start=0, count=100, tags=tag1,tag2
  • POST /api/item/ - Add a new item (body: raw content, streamed through fixed-size 8192-byte buffers). Query params: tags, metadata (JSON), compress=true|false, meta=true|false
  • POST /api/item/<#>/meta - Add metadata to an existing item (body: JSON object)
  • POST /api/item/<#>/update - Re-run meta plugins on stored content. Query params: plugins (comma-separated), metadata (JSON), tags (comma-separated, idempotent)
  • DELETE /api/item/<#> - Delete an item
  • GET /api/item/latest - Return the latest item as JSON. Optional params: tags=tag1,tag2, allow_binary=true|false
  • GET /api/item/latest/meta - Return the latest item metadata as JSON. Optional params: tags=tag1,tag2
  • GET /api/item/latest/content - Return the raw content of the latest item (streamed). Optional params: tags=tag1,tag2, decompress=true|false
  • GET /api/item/<#> - Return the item as JSON. Optional params: allow_binary=true|false
  • GET /api/item/<#>/meta - Return the item metadata as JSON
  • GET /api/item/<#>/content - Return the raw content of the item (streamed). Optional params: decompress=true|false
  • GET /api/diff - Diff two items. Params: id_a, id_b (individual items capped at 10 MB)
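
As an illustration of how a client might address these endpoints, the sketch below builds two of the URLs by hand. The helper names and the host in the usage note are hypothetical, and the minimal percent-encoder is only a stand-in for whatever encoding the real client uses.

```rust
/// Percent-encode everything except RFC 3986 unreserved characters.
pub fn percent_encode(s: &str) -> String {
    let mut out = String::new();
    for b in s.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// URL for GET /api/item/ with the optional params listed above.
pub fn list_url(base: &str, newest_first: bool, start: u64, count: u64, tags: &[&str]) -> String {
    let order = if newest_first { "newest" } else { "oldest" };
    let mut url = format!(
        "{}/api/item/?order={}&start={}&count={}",
        base.trim_end_matches('/'),
        order,
        start,
        count
    );
    if !tags.is_empty() {
        let encoded: Vec<String> = tags.iter().map(|t| percent_encode(t)).collect();
        url.push_str("&tags=");
        url.push_str(&encoded.join(","));
    }
    url
}

/// URL for GET /api/item/<#>/content.
pub fn content_url(base: &str, id: u64, decompress: bool) -> String {
    format!(
        "{}/api/item/{}/content?decompress={}",
        base.trim_end_matches('/'),
        id,
        decompress
    )
}
```

For example, `content_url("http://localhost:8080", 7, false)` yields `http://localhost:8080/api/item/7/content?decompress=false`, the form a smart client would use to fetch raw data for local decompression.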

Server Configuration

  • max_body_size - Maximum POST body size in bytes. Unlimited by default; a value of 0 also means unlimited. When the limit is exceeded, the server returns 413 Payload Too Large but keeps the partial item already saved through the streaming pipeline.

Server Modes

  • Plain HTTP (default): tokio::net::TcpListener + axum::serve()
  • HTTPS (with tls feature): axum_server::bind_rustls() with rustls when --server-cert and --server-key are provided
  • Conditional selection at startup: cert+key present → HTTPS, otherwise → HTTP

Client/Server Protocol

  • Smart clients (keep CLI) set compress=false and meta=false on POST, handling compression and meta plugins locally
  • Dumb clients (curl) use defaults (compress=true, meta=true), server handles everything
  • Smart client update: sends plugins param to server, server runs plugins on stored content (avoids downloading compressed data)
  • GET responses include X-Keep-Compression header when decompress=false
  • Streaming save uses chunked transfer encoding for constant memory usage
  • Universal streaming: All server paths (POST, GET, diff) use PIPESIZE (8192) byte buffers
  • 413 partial item: When max_body_size is exceeded, the server returns 413 but keeps the partial item already saved through the pipeline (nonfatal design — pipes continue normally)
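
The "413 partial item" behaviour can be sketched as a copy loop that stops at the limit but leaves what was already written in place. Truncating exactly at `max_body_size` is an assumption, as are the names `save_with_limit` and `SaveOutcome`. A caller would map `limit_exceeded` to a 413 response.

```rust
use std::io::{self, Read, Write};

pub struct SaveOutcome {
    pub bytes_saved: u64,
    pub limit_exceeded: bool,
}

/// Stream `body` into `item` through an 8192-byte buffer.
/// `max_body_size == 0` means unlimited, as in the server configuration.
pub fn save_with_limit<R: Read, W: Write>(
    body: &mut R,
    item: &mut W,
    max_body_size: u64,
) -> io::Result<SaveOutcome> {
    let mut buf = [0u8; 8192];
    let mut saved = 0u64;
    loop {
        let n = body.read(&mut buf)?;
        if n == 0 {
            return Ok(SaveOutcome { bytes_saved: saved, limit_exceeded: false });
        }
        if max_body_size != 0 && saved + n as u64 > max_body_size {
            // Keep the part that still fits, then report the overflow
            // without discarding anything already written.
            let room = (max_body_size - saved) as usize;
            item.write_all(&buf[..room])?;
            return Ok(SaveOutcome { bytes_saved: max_body_size, limit_exceeded: true });
        }
        item.write_all(&buf[..n])?;
        saved += n as u64;
    }
}
```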

Authentication

  • Bearer token authentication: Authorization: Bearer <password>
  • Basic authentication: Authorization: Basic base64(keep:<password>)
  • When no password is set, authentication is disabled
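
A dependency-free sketch of the header check described above. `is_authorized` and `base64_encode` are hypothetical helpers, and the plain string comparison is not constant-time, which a real server may want.

```rust
const B64: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard base64 with '=' padding (a minimal stand-in for a base64 crate).
pub fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(B64[(n >> 18) as usize & 63] as char);
        out.push(B64[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { B64[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { B64[n as usize & 63] as char } else { '=' });
    }
    out
}

/// `header` is the Authorization header value, if any.
/// With no configured password, authentication is disabled.
pub fn is_authorized(header: Option<&str>, password: Option<&str>) -> bool {
    let Some(password) = password else { return true };
    let Some(header) = header else { return false };
    let bearer = format!("Bearer {}", password);
    let basic = format!("Basic {}", base64_encode(format!("keep:{}", password).as_bytes()));
    header == bearer || header == basic
}
```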

Supported Compression Types

  • LZ4 (internal implementation)
  • GZip (internal implementation)
  • BZip2 (external program)
  • XZ (external program)
  • ZStd (external program)
  • None (no compression)
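
The internal/external split above could be modelled as an enum that also parses the `--compression` value. The type and method names are hypothetical; the real definitions live in compression_engine.rs and may differ.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Compression {
    Lz4,
    Gzip,
    Bzip2,
    Xz,
    Zstd,
    None,
}

impl Compression {
    /// True for the types this document lists as external programs.
    pub fn is_external(self) -> bool {
        matches!(self, Compression::Bzip2 | Compression::Xz | Compression::Zstd)
    }
}

impl FromStr for Compression {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "lz4" => Ok(Self::Lz4),
            "gzip" => Ok(Self::Gzip),
            "bzip2" => Ok(Self::Bzip2),
            "xz" => Ok(Self::Xz),
            "zstd" => Ok(Self::Zstd),
            "none" => Ok(Self::None),
            other => Err(format!("unknown compression type: {other}")),
        }
    }
}
```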

Supported Meta Plugins

  • FileMagic - File type detection using file command
  • FileMime - MIME type detection using file command
  • FileEncoding - File encoding detection using file command
  • LineCount - Line count using wc command
  • WordCount - Word count using wc command
  • Cwd - Current working directory
  • Binary - Binary file detection
  • Uid - Current user ID
  • User - Current username
  • Gid - Current group ID
  • Group - Current group name
  • Shell - Shell path from SHELL environment variable
  • ShellPid - Shell process ID from PPID environment variable
  • KeepPid - Keep process ID
  • DigestSha256 - SHA-256 digest
  • DigestMd5 - MD5 digest using md5sum command
  • ReadTime - Time taken to read data
  • ReadRate - Rate of data reading
  • Hostname - System hostname
  • FullHostname - Fully qualified domain name

Testing Strategy

  • Unit tests for each module in src/tests/
  • Integration tests for modes
  • Database tests for CRUD operations
  • Compression engine tests for each supported format
  • Meta plugin tests for each plugin type
  • Server tests for API endpoints and authentication
  • Common utilities tests for helper functions

Binary Data Handling

  • Automatic binary detection using file signatures and heuristics
  • Prevents binary data output to TTY unless --force is used
  • Binary meta plugin analyzes content to determine if it's binary
  • API endpoints respect binary flags to prevent accidental binary transmission
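
The TTY guard reduces to a small decision function. The sketch below swaps in a deliberately simple NUL-byte heuristic for the signature-based detection in common/is_binary.rs. `looks_binary`, `may_write_output`, and `stdout_is_tty` are hypothetical names.

```rust
use std::io::IsTerminal;

/// Simplified heuristic: a NUL byte in the sample marks the data as binary.
/// The real detector also uses file signatures.
pub fn looks_binary(sample: &[u8]) -> bool {
    sample.contains(&0)
}

/// Binary data is withheld only when stdout is a terminal and --force is absent.
pub fn may_write_output(is_binary: bool, stdout_is_tty: bool, force: bool) -> bool {
    !(is_binary && stdout_is_tty) || force
}

/// Whether stdout is attached to a terminal (std::io::IsTerminal, Rust 1.70+).
pub fn stdout_is_tty() -> bool {
    std::io::stdout().is_terminal()
}
```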

Security Considerations

  • File permissions are restricted to user only (umask 077)
  • Input validation for item IDs to prevent path traversal
  • Authentication for server mode with bearer or basic auth
  • TLS/HTTPS support via rustls when certificate and key are provided
  • Proper resource cleanup using RAII patterns
  • Safe handling of external processes with proper stdin/stdout management
  • Streaming architecture: All server I/O uses fixed-size 8192-byte buffers; no full file contents held in memory
  • XSS protection: All user-controlled data in HTML pages is escaped via html-escape
  • Security headers: X-Content-Type-Options: nosniff, X-Frame-Options: DENY, Referrer-Policy: strict-origin-when-cross-origin
  • CORS: Explicit allowed headers only (Content-Type, Authorization, Accept); no wildcard headers
  • Input limits: Tags (256 chars), metadata keys (128 chars), metadata values (4096 chars), pagination (10,000 max)
  • Config file size: 4 KB cap with from_utf8_lossy for safe UTF-8 handling
  • Error sanitization: Internal errors never exposed in HTML responses
  • No unsafe_code: Enforced via #![deny(unsafe_code)] (exceptions: libc::umask in main.rs, unsafe impl Send for SendCookie in magic_file.rs)
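
The input limits listed above might be enforced with checks like these. The limit values are from this document. Counting characters rather than bytes, clamping the page size instead of rejecting it, and all function names are assumptions.

```rust
pub const MAX_TAG_LEN: usize = 256;
pub const MAX_META_KEY_LEN: usize = 128;
pub const MAX_META_VALUE_LEN: usize = 4096;
pub const MAX_PAGE_COUNT: u64 = 10_000;

fn check_len(what: &str, s: &str, max: usize) -> Result<(), String> {
    let len = s.chars().count();
    if len > max {
        Err(format!("{what} too long: {len} chars (max {max})"))
    } else {
        Ok(())
    }
}

pub fn validate_tag(tag: &str) -> Result<(), String> {
    check_len("tag", tag, MAX_TAG_LEN)
}

pub fn validate_meta(key: &str, value: &str) -> Result<(), String> {
    check_len("metadata key", key, MAX_META_KEY_LEN)?;
    check_len("metadata value", value, MAX_META_VALUE_LEN)
}

/// Cap a requested page size at the pagination maximum.
pub fn clamp_count(requested: u64) -> u64 {
    requested.min(MAX_PAGE_COUNT)
}
```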

Feature Flags

  • server - HTTP REST API server (axum-based)
  • tls - HTTPS/TLS support for server (axum-server + rustls)
  • client - HTTP client for remote server (ureq-based, includes streaming save)
  • swagger - OpenAPI/Swagger UI documentation
  • magic - File type detection via libmagic
  • lz4 - LZ4 compression (internal)
  • gzip - GZip compression (internal)
  • bzip2 - BZip2 compression (external)
  • xz - XZ compression (external)
  • zstd - ZStd compression (external)