keep/DESIGN.md
Andrew Phillips b3edfe7de6 chore: code review cleanup — fixes, deps, docs
Fixed:
- CLI help typo: "metatdata" -> "metadata"
- Filter buffer OOM: check size before loading into memory

Changed:
- #[inline] on HTML escape helpers for hot path performance
- Replaced once_cell and lazy_static with std::sync::LazyLock
- Removed unused once_cell and lazy_static crate dependencies

Refactored:
- Added module-level doc to services/ module

Documentation:
- README.md: zstd is native not external, "none" -> "raw"
- DESIGN.md: current schema and meta plugins section
- CHANGELOG.md: Unreleased section populated
2026-03-21 11:44:37 -03:00


PROJECT RULES - KEEP FIRST

Standard Rules

  1. ALWAYS keep DESIGN.md updated with any architectural or design changes
  2. ALWAYS keep project rules first in this document
  3. ALWAYS use git commands to remove or move files (git rm, git mv, etc.)
  4. Follow Rust naming conventions and idioms
  5. Use anyhow for error handling throughout the codebase
  6. Maintain comprehensive logging with the log crate
  7. Write unit tests for critical functionality
  8. Document public APIs with rustdoc comments
  9. Keep modules focused on single responsibilities
  10. Prefer composition over inheritance
  11. Handle errors gracefully and provide meaningful error messages
  12. Ensure code is safe and avoids unsafe blocks where possible

Code - Modules

Main Module

  • main.rs - Entry point, CLI argument parsing, mode dispatching
  • Interacts with all mode modules based on user input
  • Handles database connection setup and data directory management

Mode Modules

  • modes/save.rs - Save new items with tags/metadata
  • modes/get.rs - Retrieve items by ID/tags
  • modes/list.rs - List items with filtering and formatting
  • modes/delete.rs - Delete items by ID
  • modes/update.rs - Update item tags/metadata
  • modes/info.rs - Show detailed item information
  • modes/diff.rs - Compare two items
  • modes/status.rs - Show system status and capabilities
  • modes/server.rs - REST HTTP/HTTPS server mode with OpenAPI documentation
  • modes/client.rs - Client mode for remote server (streaming save, local decompression)
  • modes/common.rs - Shared utilities for all modes (OutputFormat, table creation, print_serialized, build_path_table, ensure_default_tag, render_item_info_table, render_list_table_with_format)

Database Module

  • db.rs - SQLite database operations
  • Handles items, tags, and metadata storage
  • Provides query functions for all modes
  • Manages database migrations

Compression Engine Module

  • compression_engine.rs - Trait and type definitions
  • compression_engine/gzip.rs - GZip implementation
  • compression_engine/lz4.rs - LZ4 implementation
  • compression_engine/none.rs - No compression implementation
  • compression_engine/program.rs - External program wrapper

Meta Plugin Module

  • meta_plugin.rs - Trait and type definitions, SaveMetaFn callback type
  • meta_plugin/program.rs - External program wrapper
  • meta_plugin/digest.rs - Internal digest implementations
  • meta_plugin/system.rs - System information metadata plugins

SaveMetaFn Architecture: Meta plugins are decoupled from direct DB access via a SaveMetaFn callback (Arc<Mutex<dyn FnMut(&str, &str) + Send>>). The callback is injected at MetaService construction and propagated to all plugins via BaseMetaPlugin. This enables:

  • Local mode: Callback collects metadata into a Vec, written to DB after plugins finish
  • Client mode: Callback collects into a HashMap, sent to server after streaming completes
  • Server mode: Callback collects into a Vec, written to DB after plugins finish (same as local)
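
The callback shape above can be illustrated with a minimal sketch. Only the `SaveMetaFn` type comes from this document; `collecting_callback` and `emit` are hypothetical names used for illustration, and the real `MetaService` wiring may differ.

```rust
use std::sync::{Arc, Mutex};

/// Callback type as described above: plugins report (key, value) pairs through it.
pub type SaveMetaFn = Arc<Mutex<dyn FnMut(&str, &str) + Send>>;

/// Hypothetical helper: builds a callback that appends into a shared Vec,
/// mirroring the "collect first, write to the DB afterwards" flow.
pub fn collecting_callback() -> (SaveMetaFn, Arc<Mutex<Vec<(String, String)>>>) {
    let collected = Arc::new(Mutex::new(Vec::new()));
    let sink = Arc::clone(&collected);
    let callback: SaveMetaFn = Arc::new(Mutex::new(move |key: &str, value: &str| {
        sink.lock().unwrap().push((key.to_string(), value.to_string()));
    }));
    (callback, collected)
}

/// Hypothetical plugin-side call: a plugin only sees the callback, never the DB.
pub fn emit(save_meta: &SaveMetaFn, key: &str, value: &str) {
    let mut guard = save_meta.lock().unwrap();
    (&mut *guard)(key, value);
}
```

Once all plugins have finished, the caller would drain the shared Vec and persist it (local and server mode) or forward it to the server (client mode, which uses a HashMap instead).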

Common Modules

  • common/is_binary.rs - Binary file detection utilities
  • common/status.rs - Status information generation
  • common/mod.rs - PIPESIZE constant (8192), stream_copy() streaming utility
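
A fixed-buffer streaming copy along these lines might look as follows. The `PIPESIZE` value comes from this document; the signature and return value of `stream_copy` are assumptions and may not match the real function.

```rust
use std::io::{self, Read, Write};

/// Buffer size named in this document.
pub const PIPESIZE: usize = 8192;

/// Sketch of a streaming copy: memory use stays constant regardless of
/// input size. Signature and return type are assumed, not taken from the code.
pub fn stream_copy<R: Read, W: Write>(reader: &mut R, writer: &mut W) -> io::Result<u64> {
    let mut buf = [0u8; PIPESIZE];
    let mut total = 0u64;
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        writer.write_all(&buf[..n])?;
        total += n as u64;
    }
    writer.flush()?;
    Ok(total)
}
```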

Client Module

  • client.rs - HTTP client wrapper (ureq-based, supports streaming POST)
  • modes/client/save.rs - 3-thread streaming save with local meta plugins (stdin → tee → compress → meta plugins → pipe → HTTP POST)
  • modes/client/get.rs - Get with server-side raw fetch + local decompression
  • modes/client/list.rs - List delegation to server
  • modes/client/info.rs - Info delegation to server
  • modes/client/delete.rs - Delete delegation to server
  • modes/client/diff.rs - Diff delegation to server
  • modes/client/status.rs - Status delegation to server
  • modes/client/update.rs - Update delegation to server (sends plugin names/metadata/tags)
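
The streaming save pipeline can be pictured with a simplified three-thread sketch built on bounded channels. The division of work between threads is an assumption, compression is replaced by an identity step, the meta plugins by a byte counter, and the HTTP POST by collecting into a Vec; `pipeline` is a hypothetical name.

```rust
use std::io::Read;
use std::sync::mpsc::sync_channel;
use std::thread;

const PIPESIZE: usize = 8192;

/// Returns (bytes observed by the meta stand-in, bytes handed to the upload stand-in).
pub fn pipeline<R: Read + Send + 'static>(mut input: R) -> (usize, Vec<u8>) {
    // Bounded channels give backpressure, so memory use stays constant.
    let (chunk_tx, chunk_rx) = sync_channel::<Vec<u8>>(4);
    let (upload_tx, upload_rx) = sync_channel::<Vec<u8>>(4);

    // Thread 1: read the input in PIPESIZE chunks.
    let reader = thread::spawn(move || {
        let mut buf = [0u8; PIPESIZE];
        loop {
            let n = input.read(&mut buf).expect("read failed");
            if n == 0 || chunk_tx.send(buf[..n].to_vec()).is_err() {
                break;
            }
        }
    });

    // Thread 2: stand-in for compression (identity) and meta plugins (byte count).
    let worker = thread::spawn(move || {
        let mut seen = 0usize;
        for chunk in chunk_rx {
            seen += chunk.len();
            if upload_tx.send(chunk).is_err() {
                break;
            }
        }
        seen
    });

    // Thread 3 (the caller): stand-in for the streaming HTTP POST body.
    let uploaded: Vec<u8> = upload_rx.iter().flatten().collect();
    reader.join().expect("reader thread panicked");
    let seen = worker.join().expect("worker thread panicked");
    (seen, uploaded)
}
```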

Utility Modules

  • plugins.rs - Shared plugin utilities
  • args.rs - CLI argument definitions

Command Line Interface

Modes

  • Save mode: keep [--save] (default when no mode specified and no IDs provided)
  • Get mode: keep [--get] <ID|tag...> (default when IDs provided)
  • List mode: keep [--list] [tag...]
  • Info mode: keep [--info] <ID|tag...>
  • Delete mode: keep [--delete] <ID...>
  • Update mode: keep [--update] <ID> [tag...]
  • Diff mode: keep [--diff] <ID1> <ID2>
  • Status mode: keep [--status]
  • Server mode: keep [--server] <address:port>
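
The default-mode rule (save when nothing is given, get when IDs are given) can be sketched as a small resolver. The `Mode` enum and `resolve_mode` are hypothetical names; the actual dispatch in main.rs may be structured differently.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Save,
    Get,
    List,
    Info,
    Delete,
    Update,
    Diff,
    Status,
    Server,
}

/// An explicit mode flag always wins; otherwise the presence of IDs
/// decides between the two defaults described above.
pub fn resolve_mode(explicit: Option<Mode>, has_ids: bool) -> Mode {
    match explicit {
        Some(mode) => mode,
        None if has_ids => Mode::Get,
        None => Mode::Save,
    }
}
```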

Item Options

  • --meta KEY[=VALUE] - Set metadata for the item; if VALUE is omitted, the key is removed
  • --digest <sha256|md5> - Digest algorithm to use when saving items
  • --compression <lz4|gzip|bzip2|xz|zstd|none> - Compression algorithm to use when saving items
  • --meta-plugins <plugin[,plugin...]> - Meta plugins to use when saving items
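
The KEY[=VALUE] form of --meta could be parsed roughly like this. `MetaArg` and `parse_meta_arg` are hypothetical names, and treating `KEY=` (empty value) as "set to empty string" rather than "remove" is an assumption.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum MetaArg {
    /// `KEY=VALUE`: set (or overwrite) the key.
    Set { key: String, value: String },
    /// `KEY` alone: remove the key.
    Remove { key: String },
}

pub fn parse_meta_arg(arg: &str) -> Result<MetaArg, String> {
    // Split on the first '=' only, so values may themselves contain '='.
    let (key, value) = match arg.split_once('=') {
        Some((k, v)) => (k, Some(v)),
        None => (arg, None),
    };
    if key.is_empty() {
        return Err(format!("metadata key is empty in {arg:?}"));
    }
    Ok(match value {
        Some(v) => MetaArg::Set { key: key.to_string(), value: v.to_string() },
        None => MetaArg::Remove { key: key.to_string() },
    })
}
```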

General Options

  • --dir <PATH> - Specify the directory to use for storage
  • --list-format <FORMAT> - A comma separated list of columns to display with --list
  • --human-readable - Display file sizes with units
  • --verbose - Increase message verbosity
  • --quiet - Do not show any messages
  • --output-format <table|json|yaml> - Output format for info, status, and list modes
  • --server-password <PASSWORD> - Password for server authentication
  • --server-cert <PATH> - TLS certificate file (PEM) for HTTPS server
  • --server-key <PATH> - TLS private key file (PEM) for HTTPS server
  • --force - Force output even when binary data would be sent to a TTY

Client Options (requires client feature)

  • --client-url <URL> - Remote keep server URL
  • --client-password <PASSWORD> - Remote server password

Data Storage

Database Schema

  • items table: id (primary key), ts (timestamp), uncompressed_size (optional), compressed_size (optional), closed (boolean), compression
  • tags table: id (foreign key to items), name (tag name)
  • metas table: id (foreign key to items), name (meta key), value (meta value)
  • Indexes on tag names and meta names for faster queries
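
For orientation, the three tables map naturally onto plain Rust structs. Only the column names and the "optional" markers come from the schema above; every field type here is an assumption, and `tags_for` is a hypothetical helper.

```rust
/// Row in the `items` table.
#[derive(Debug, Clone, PartialEq)]
pub struct Item {
    pub id: i64,
    pub ts: String, // timestamp; storage format assumed
    pub uncompressed_size: Option<i64>,
    pub compressed_size: Option<i64>,
    pub closed: bool,
    pub compression: String,
}

/// Row in the `tags` table; `id` refers to `Item::id`.
#[derive(Debug, Clone, PartialEq)]
pub struct Tag {
    pub id: i64,
    pub name: String,
}

/// Row in the `metas` table; `id` refers to `Item::id`.
#[derive(Debug, Clone, PartialEq)]
pub struct Meta {
    pub id: i64,
    pub name: String,
    pub value: String,
}

/// Hypothetical in-memory equivalent of "all tag names for one item".
pub fn tags_for<'a>(item: &Item, tags: &'a [Tag]) -> Vec<&'a str> {
    tags.iter()
        .filter(|t| t.id == item.id)
        .map(|t| t.name.as_str())
        .collect()
}
```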

File Storage

  • Data directory contains compressed item files named by their item ID
  • Database file stored in data directory
  • File permissions set to be private to user (umask 077)
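
Because item files are named by ID, one way to keep paths inside the data directory is to accept only numeric IDs before joining. This is a sketch under that assumption; `item_path` is a hypothetical name and the real validation may differ.

```rust
use std::path::{Path, PathBuf};

/// Maps a user-supplied ID to a file inside the data directory.
/// Parsing to an integer first rejects anything like "../x" or "a/b",
/// so the joined path cannot escape `data_dir`.
pub fn item_path(data_dir: &Path, raw_id: &str) -> Result<PathBuf, String> {
    let id: u64 = raw_id
        .parse()
        .map_err(|_| format!("invalid item ID: {raw_id:?}"))?;
    Ok(data_dir.join(id.to_string()))
}
```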

REST API Endpoints

Status Operations

  • GET /api/status - Get system status information
  • GET /api/plugins/status - Get plugin status information

Item Operations

  • GET /api/item/ - Get a list of items as JSON. Optional params: order=newest|oldest, start=0, count=100, tags=tag1,tag2
  • POST /api/item/ - Add a new item (body: raw content, streamed through fixed-size 8192-byte buffers). Query params: tags, metadata (JSON), compress=true|false, meta=true|false
  • POST /api/item/<#>/meta - Add metadata to an existing item (body: JSON object)
  • POST /api/item/<#>/update - Re-run meta plugins on stored content. Query params: plugins (comma-separated), metadata (JSON), tags (comma-separated, idempotent)
  • DELETE /api/item/<#> - Delete an item
  • GET /api/item/latest - Return the latest item as JSON. Optional params: tags=tag1,tag2, allow_binary=true|false
  • GET /api/item/latest/meta - Return the latest item metadata as JSON. Optional params: tags=tag1,tag2
  • GET /api/item/latest/content - Return the raw content of the latest item (streamed). Optional params: tags=tag1,tag2, decompress=true|false
  • GET /api/item/<#> - Return the item as JSON. Optional params: allow_binary=true|false
  • GET /api/item/<#>/meta - Return the item metadata as JSON
  • GET /api/item/<#>/content - Return the raw content of the item (streamed). Optional params: decompress=true|false
  • GET /api/diff - Diff two items. Params: id_a, id_b (each item is capped at 10 MB)

Server Configuration

  • max_body_size - Maximum POST body size in bytes (default: unlimited). When exceeded, server returns 413 PAYLOAD_TOO_LARGE while keeping the partial item already saved through the streaming pipeline. Set to 0 for unlimited.
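
The "413 but keep the partial item" behavior can be sketched as a limited streaming copy that stops once the limit would be exceeded and reports how much was already written. `BodyOutcome` and `copy_with_limit` are hypothetical names; whether the chunk that crosses the limit is partially written is an assumption (here it is dropped whole).

```rust
use std::io::{self, Read, Write};

const PIPESIZE: usize = 8192;

#[derive(Debug, PartialEq, Eq)]
pub enum BodyOutcome {
    /// Whole body stored; payload is the byte count.
    Complete(u64),
    /// Limit hit; payload is the number of bytes already stored (kept, not rolled back).
    TooLarge(u64),
}

/// `max_body_size == 0` means unlimited, as described above.
pub fn copy_with_limit<R: Read, W: Write>(
    body: &mut R,
    sink: &mut W,
    max_body_size: u64,
) -> io::Result<BodyOutcome> {
    let mut buf = [0u8; PIPESIZE];
    let mut written = 0u64;
    loop {
        let n = body.read(&mut buf)?;
        if n == 0 {
            return Ok(BodyOutcome::Complete(written));
        }
        if max_body_size != 0 && written + n as u64 > max_body_size {
            // The caller would answer 413 here; the sink keeps what it has.
            return Ok(BodyOutcome::TooLarge(written));
        }
        sink.write_all(&buf[..n])?;
        written += n as u64;
    }
}
```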

Server Modes

  • Plain HTTP (default): tokio::net::TcpListener + axum::serve()
  • HTTPS (with tls feature): axum_server::bind_rustls() with rustls when --server-cert and --server-key are provided
  • Conditional selection at startup: cert+key present → HTTPS, otherwise → HTTP
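
The startup selection rule reduces to a match on the two optional paths. `Transport` and `select_transport` are hypothetical names. The sketch follows the rule exactly as stated, so supplying only one of the two files falls back to HTTP; the real server may instead report an error in that case.

```rust
use std::path::PathBuf;

#[derive(Debug, PartialEq, Eq)]
pub enum Transport {
    Http,
    Https { cert: PathBuf, key: PathBuf },
}

/// cert + key present -> HTTPS, otherwise -> HTTP.
pub fn select_transport(cert: Option<PathBuf>, key: Option<PathBuf>) -> Transport {
    match (cert, key) {
        (Some(cert), Some(key)) => Transport::Https { cert, key },
        _ => Transport::Http,
    }
}
```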

Client/Server Protocol

  • Smart clients (keep CLI) set compress=false and meta=false on POST, handling compression and meta plugins locally
  • Dumb clients (curl) use defaults (compress=true, meta=true), so the server handles everything
  • Smart client update: sends the plugins param to the server, which runs the plugins on stored content (avoids downloading compressed data)
  • GET responses include X-Keep-Compression header when decompress=false
  • Streaming save uses chunked transfer encoding for constant memory usage
  • Universal streaming: All server paths (POST, GET, diff) use PIPESIZE (8192) byte buffers
  • 413 partial item: When max_body_size is exceeded, the server returns 413 but keeps the partial item already saved through the pipeline (nonfatal design — pipes continue normally)

Authentication

  • Bearer token authentication: Authorization: Bearer <password>
  • Basic authentication: Authorization: Basic base64(keep:<password>)
  • When no password is set, authentication is disabled
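
The three rules above can be sketched as a single header check. `is_authorized` and `base64_encode` are hypothetical names; the encoder is a small standard-alphabet implementation included only to keep the example dependency-free, and the plain `==` comparison is not constant-time, which a real server may want.

```rust
const B64: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Minimal standard base64 encoder with '=' padding.
pub fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        let b1 = *chunk.get(1).unwrap_or(&0);
        let b2 = *chunk.get(2).unwrap_or(&0);
        let n = (u32::from(chunk[0]) << 16) | (u32::from(b1) << 8) | u32::from(b2);
        out.push(B64[(n >> 18) as usize & 63] as char);
        out.push(B64[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { B64[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { B64[n as usize & 63] as char } else { '=' });
    }
    out
}

/// `header` is the raw Authorization header value, if any.
pub fn is_authorized(header: Option<&str>, password: Option<&str>) -> bool {
    // No password configured: authentication is disabled.
    let Some(password) = password else { return true };
    let Some(header) = header else { return false };
    if let Some(token) = header.strip_prefix("Bearer ") {
        return token == password;
    }
    if let Some(encoded) = header.strip_prefix("Basic ") {
        // The Basic user name is fixed to "keep", as described above.
        return encoded == base64_encode(format!("keep:{password}").as_bytes());
    }
    false
}
```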

Supported Compression Types

  • LZ4 (internal implementation)
  • GZip (internal implementation)
  • BZip2 (external program)
  • XZ (external program)
  • ZStd (external program)
  • None (no compression)

Supported Meta Plugins

Meta plugins collect metadata during item save. Each plugin produces one or more key-value pairs:

  • magic_file - File type detection using libmagic (when magic feature enabled)
  • infer - MIME type detection using infer crate (when infer feature enabled)
  • tree_magic_mini - MIME type detection using tree_magic_mini (when tree_magic_mini feature enabled)
  • tokens - LLM token counting using tiktoken (when tokens feature enabled)
  • text - Text analysis: line count, word count, char count, line average length
  • digest - SHA-256 and MD5 checksums
  • hostname - System hostname (full and short)
  • cwd - Current working directory
  • user - Current username and UID
  • shell - Shell path from SHELL environment variable
  • shell_pid - Shell process ID from PPID
  • keep_pid - Keep process ID
  • env - Arbitrary environment variables (via KEEP_META_ENV_* prefix)
  • exec - Execute external commands for custom metadata
  • read_time - Time taken to read content
  • read_rate - Content read rate (bytes/second)
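
As an illustration of what a plugin such as text produces, the four statistics can be computed as below. This is a sketch over an in-memory string with hypothetical names (`TextStats`, `text_stats`); the actual plugin presumably works incrementally on streamed chunks, and its exact definitions of "word" and "line" may differ.

```rust
#[derive(Debug, PartialEq)]
pub struct TextStats {
    pub lines: usize,
    pub words: usize,
    pub chars: usize,
    pub avg_line_len: f64,
}

pub fn text_stats(text: &str) -> TextStats {
    let lines = text.lines().count();
    let words = text.split_whitespace().count();
    let chars = text.chars().count();
    // Average is taken over line contents, excluding line terminators.
    let line_chars: usize = text.lines().map(|l| l.chars().count()).sum();
    let avg_line_len = if lines == 0 {
        0.0
    } else {
        line_chars as f64 / lines as f64
    };
    TextStats { lines, words, chars, avg_line_len }
}
```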

Testing Strategy

  • Unit tests for each module in src/tests/
  • Integration tests for modes
  • Database tests for CRUD operations
  • Compression engine tests for each supported format
  • Meta plugin tests for each plugin type
  • Server tests for API endpoints and authentication
  • Common utilities tests for helper functions

Binary Data Handling

  • Automatic binary detection using file signatures and heuristics
  • Prevents binary data output to TTY unless --force is used
  • Binary meta plugin analyzes content to determine if it's binary
  • API endpoints respect binary flags to prevent accidental binary transmission
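
The TTY guard can be sketched as follows. The NUL-byte check is a deliberately simple stand-in for the signature and heuristic detection in common/is_binary.rs, and all function names are hypothetical.

```rust
use std::io::IsTerminal;

/// Simplified heuristic: a NUL byte in the sample marks the data as binary.
pub fn looks_binary(sample: &[u8]) -> bool {
    sample.contains(&0)
}

/// Output is allowed when forced, when not going to a terminal,
/// or when the data does not look binary.
pub fn may_write(sample: &[u8], out_is_tty: bool, force: bool) -> bool {
    force || !out_is_tty || !looks_binary(sample)
}

/// Convenience wrapper that checks the real stdout.
pub fn may_write_to_stdout(sample: &[u8], force: bool) -> bool {
    may_write(sample, std::io::stdout().is_terminal(), force)
}
```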

Security Considerations

  • File permissions are restricted to user only (umask 077)
  • Input validation for item IDs to prevent path traversal
  • Authentication for server mode with bearer or basic auth
  • TLS/HTTPS support via rustls when certificate and key are provided
  • Proper resource cleanup using RAII patterns
  • Safe handling of external processes with proper stdin/stdout management
  • Streaming architecture: All server I/O uses fixed-size 8192-byte buffers; no full file contents held in memory
  • XSS protection: All user-controlled data in HTML pages is escaped via html-escape
  • Security headers: X-Content-Type-Options: nosniff, X-Frame-Options: DENY, Referrer-Policy: strict-origin-when-cross-origin
  • CORS: Explicit allowed headers only (Content-Type, Authorization, Accept); no wildcard headers
  • Input limits: Tags (256 chars), metadata keys (128 chars), metadata values (4096 chars), pagination (10,000 max)
  • Config file size: 4 KB cap with from_utf8_lossy for safe UTF-8 handling
  • Error sanitization: Internal errors never exposed in HTML responses
  • No unsafe_code: Enforced via #![deny(unsafe_code)] (exceptions: libc::umask in main.rs, unsafe impl Send for SendCookie in magic_file.rs)
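
The input limits listed above could be enforced with checks like these. The numeric limits come from this document; the constant and function names are hypothetical, counting characters rather than bytes is an assumption, and so is clamping (rather than rejecting) oversized page counts.

```rust
pub const MAX_TAG_LEN: usize = 256;
pub const MAX_META_KEY_LEN: usize = 128;
pub const MAX_META_VALUE_LEN: usize = 4096;
pub const MAX_PAGE_COUNT: usize = 10_000;

/// Rejects a value longer than `max_chars` characters.
pub fn check_len(what: &str, value: &str, max_chars: usize) -> Result<(), String> {
    let n = value.chars().count();
    if n > max_chars {
        Err(format!("{what} too long: {n} > {max_chars} characters"))
    } else {
        Ok(())
    }
}

/// Caps a requested page size at the pagination maximum.
pub fn clamp_count(requested: usize) -> usize {
    requested.min(MAX_PAGE_COUNT)
}
```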

Feature Flags

  • server - HTTP REST API server (axum-based)
  • tls - HTTPS/TLS support for server (axum-server + rustls)
  • client - HTTP client for remote server (ureq-based, includes streaming save)
  • swagger - OpenAPI/Swagger UI documentation
  • magic - File type detection via libmagic
  • lz4 - LZ4 compression (internal)
  • gzip - GZip compression (internal)
  • bzip2 - BZip2 compression (external)
  • xz - XZ compression (external)
  • zstd - ZStd compression (external)
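
Compile-time flags like these can be reported at runtime with `cfg!`, for example when listing capabilities. The feature names come from the list above; `enabled_features` is a hypothetical helper and is not taken from the codebase.

```rust
/// Returns the names of the Cargo features this binary was built with.
/// `cfg!` is evaluated at compile time, so disabled features cost nothing.
pub fn enabled_features() -> Vec<&'static str> {
    let flags = [
        ("server", cfg!(feature = "server")),
        ("tls", cfg!(feature = "tls")),
        ("client", cfg!(feature = "client")),
        ("swagger", cfg!(feature = "swagger")),
        ("magic", cfg!(feature = "magic")),
        ("lz4", cfg!(feature = "lz4")),
        ("gzip", cfg!(feature = "gzip")),
        ("bzip2", cfg!(feature = "bzip2")),
        ("xz", cfg!(feature = "xz")),
        ("zstd", cfg!(feature = "zstd")),
    ];
    flags
        .iter()
        .filter(|(_, on)| *on)
        .map(|(name, _)| *name)
        .collect()
}
```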