Compare commits

...

59 Commits

Author SHA1 Message Date
8379ae2136 refactor: rename plugin features with type prefix for consistency
- Plugin features now use type_ prefix (meta_magic, filter_grep, etc.)
- Added meta_all_musl and filter_all_musl for MUSL-compatible builds
- grep filter plugin made optional via filter_grep feature flag
- Removed regex crate from grep-related code, uses strip_prefix instead
- Updated CHANGELOG.md with breaking change documentation
2026-03-21 17:36:29 -03:00
12de215527 feat: feature-gate CLI args by server/client features
- CLI now shows only relevant options: --server and --server-* args
  hidden when built without 'server' feature; --client-* args hidden
  without 'client' feature. Run --help only displays applicable options.
- Removed verbose 'conflicts_with_all' from all mode args — clap's
  implicit group("mode") already enforces mutual exclusivity.
- 'server' feature now includes TLS/HTTPS by default (axum-server);
  'tls' feature removed. rustls already available via client/ureq.
- Gated KeepModes::Server, server mode detection, and server-password
  validation in main.rs.
- Gated server arg reads in config.rs.
- Removed redundant #[cfg(feature = "tls")] guards from server/mod.rs.
- Gated resolve_item_id/resolve_item_ids helpers in common.rs.
- All 4 feature combinations (server+client, server-only, client-only,
  neither) compile and pass tests.
2026-03-21 16:26:27 -03:00
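The cfg-gating this commit describes can be sketched without clap. A minimal sketch, assuming hypothetical names (`Args`, `server_port`, `arg_count` are stand-ins, not the project's types): items behind `#[cfg(feature = "server")]` vanish entirely from builds without that feature, which is why `--help` only shows applicable options. Compiled here with no features enabled, only the always-present arg remains.

```rust
// Hypothetical sketch of feature-gated CLI fields. In the real code the
// attribute sits on clap-derive args; the mechanism is the same.
#[derive(Default)]
struct Args {
    verbose: bool,
    #[cfg(feature = "server")]
    server_port: Option<u16>, // absent from client-only builds
}

fn arg_count() -> usize {
    let base = 1; // verbose is always present
    #[cfg(feature = "server")]
    let base = base + 1;
    base
}

fn main() {
    let _args = Args::default();
    // built without the "server" feature, so only one arg exists
    assert_eq!(arg_count(), 1);
}
```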
e2cb36d2a8 feat(server): add file_size to API ItemInfo response 2026-03-21 14:03:58 -03:00
0004324301 perf: pre-allocate status info collections with known capacities 2026-03-21 13:54:37 -03:00
b3edfe7de6 chore: code review cleanup — fixes, deps, docs
Fixed:
- CLI help typo: "metatdata" -> "metadata"
- Filter buffer OOM: check size before loading into memory

Changed:
- #[inline] on HTML escape helpers for hot path performance
- Replaced once_cell and lazy_static with std::sync::LazyLock
- Removed unused once_cell and lazy_static crate dependencies

Refactored:
- Added module-level doc to services/ module

Documentation:
- README.md: zstd is native not external, "none" -> "raw"
- DESIGN.md: current schema and meta plugins section
- CHANGELOG.md: Unreleased section populated
2026-03-21 11:44:37 -03:00
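The once_cell/lazy_static removal above relies on `std::sync::LazyLock`, stable since Rust 1.80. A minimal sketch of the pattern, together with an `#[inline]` escape helper in the spirit of the hot-path change — `GREETING` and `html_escape_min` are illustrative stand-ins, not the project's code:

```rust
use std::sync::LazyLock;

// Lazily-initialized static, replacing lazy_static!/once_cell::sync::Lazy.
static GREETING: LazyLock<String> = LazyLock::new(|| format!("hello, {}", "keep"));

// Tiny escape helper; #[inline] hints the optimizer on hot paths.
#[inline]
fn html_escape_min(s: &str) -> String {
    s.replace('&', "&amp;").replace('<', "&lt;").replace('>', "&gt;")
}

fn main() {
    assert_eq!(GREETING.as_str(), "hello, keep");
    assert_eq!(html_escape_min("a<b>"), "a&lt;b&gt;");
}
```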
ab2fb07505 docs: add changelog update instructions to AGENTS.md 2026-03-21 10:56:43 -03:00
547f0b5d11 docs: add CHANGELOG.md following Keep a Changelog format 2026-03-21 10:55:16 -03:00
30d7836bcf refactor: deduplicate ItemInfo, improve error handling, fix pre-existing bugs
- Move ItemInfo to services/types.rs for sharing between client and server
- Replace .expect() in compression_service with proper error handling
- Add CoreError::PayloadTooLarge variant for semantic error handling
- Export CoreError from lib.rs for library users
- Unify get_item_meta_name/value to take &str instead of String
- Extract item_path() helper in ItemService to reduce duplication
- Add warning logs for silent errors in list.rs
- Fix pre-existing borrow errors: tx moved in export handler,
  item_with_meta partial move in TryFrom implementation
- Fix unused data_dir variables in server code
2026-03-21 10:43:26 -03:00
2cfee5075e fix: panic guards, dedup, and unsafe documentation
- diff.rs: graceful error instead of expect() on item ID in spawned thread
- common.rs: lazy_static regex, avoid unwrap on regex captures
- db.rs: ok_or_else guard on item.id in delete_item
- list/get/info/export/client/list: use settings.meta_filter() helper
- item_service.rs: expect() on meta lock instead of silent swallow
- filter_plugin/mod.rs: extract parse_encoding_option() helper
- main.rs: document unsafe libc::umask block with safety rationale
2026-03-20 17:17:58 -03:00
52e9787edb refactor: deduplicate filter plugins, extract helpers across codebase
Bug fixes:
- client: add error field to ApiResponse to avoid swallowing server errors
- args/config: fix list_format default mismatch (5 vs 7 columns)
- client: url-encode size param in set_item_size

Dedup - filter plugins:
- Extract count_option() and pattern_option() helpers, replace 7 identical options()
- Add #[derive(Clone)] to all filter structs; remove verbose clone_box() impls
- Simplify FilterChain clone() and impl Clone for Box<dyn FilterPlugin>
- Add filter_clone_box! macro for future use
- Fix doctest example missing clone_box

Dedup - server API:
- Extract spawn_body_reader() with LimitBehavior enum for body streaming
- Extract check_binary_content() helper
- Extract stream_with_offset_and_length() helper
- Extract generate_status() helper in status.rs
- Extract append_query_params() helper in client.rs

Dedup - other:
- Extract yaml_value_to_string() in meta_plugin/mod.rs
- Extract item_from_row() in db.rs
- Delete unused DisplayListItem struct

Misc:
- Remove duplicate doc comment in compression_service.rs
2026-03-20 15:54:33 -03:00
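The `clone_box` simplification above follows a standard trait-object pattern: derive `Clone` on each concrete filter, keep one `clone_box` per impl, and implement `Clone for Box<dyn FilterPlugin>` once on top of it. A sketch with stand-in types (`FilterPlugin` and `GrepFilter` here are minimal fakes, not the project's real trait):

```rust
trait FilterPlugin {
    fn name(&self) -> &str;
    fn clone_box(&self) -> Box<dyn FilterPlugin>;
}

#[derive(Clone)]
struct GrepFilter { pattern: String }

impl FilterPlugin for GrepFilter {
    fn name(&self) -> &str { "grep" }
    // one line per plugin thanks to #[derive(Clone)]
    fn clone_box(&self) -> Box<dyn FilterPlugin> { Box::new(self.clone()) }
}

// Single blanket impl makes Vec<Box<dyn FilterPlugin>> (a filter chain) cloneable.
impl Clone for Box<dyn FilterPlugin> {
    fn clone(&self) -> Self { self.clone_box() }
}

fn main() {
    let chain: Vec<Box<dyn FilterPlugin>> =
        vec![Box::new(GrepFilter { pattern: "x".into() })];
    let copy = chain.clone();
    assert_eq!(copy[0].name(), "grep");
}
```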
00be72f3d0 refactor: rename size to uncompressed_size, add compressed_size and closed columns
Schema changes:
- Rename items.size to items.uncompressed_size for clarity
- Add compressed_size (INTEGER NULL) - tracks compressed file size on disk
- Add closed (BOOLEAN NOT NULL DEFAULT 1) - tracks whether item is fully written
- Existing items default to closed=true via migration

Lifecycle:
- Items created with closed=false, set to true on successful save/import
- Compressed size captured via fs::metadata() after compression writer closes
- Truncated uploads (413) get compressed_size set, closed=true, uncompressed_size=None
- Update command now backfills both uncompressed_size and compressed_size

Also includes bug fixes and dedup from prior review:
- Fix stream_raw_content_response using uncompressed_size for raw byte Content-Length
- ApiResponse::ok()/empty() constructors, TryFrom<ItemWithMeta> for ItemInfo
- tag_names() method on ItemWithMeta, meta_filter() on Settings
- Fix .unwrap() panics in compression engine Read/Write impls
- Fix TOCTOU race in stream_raw_content_response (now uses compressed_size)
- Fix swallowed write errors in meta plugins (digest, magic_file, exec)
- Fix term::stderr().unwrap() panic in item_service
- Deduplicate ItemService::new() calls across 20 API handlers
- ImportMeta supports #[serde(alias = "size")] for backward compat

All 75 tests, 67 doc tests pass. Clippy clean.
2026-03-18 10:58:26 -03:00
49793a0f94 feat: add streaming tar export/import and rename "none" to "raw"
- Add streaming tar-based export (--export produces .keep.tar)
- Add streaming tar import (--import reads .keep.tar archives)
- Add server endpoints GET /api/export and POST /api/import
- Rename CompressionType::None to CompressionType::Raw with "none" as alias
- Add DB migration to update existing "none" compression values to "raw"
- Fix export endpoint to propagate errors to client instead of swallowing
- Fix import endpoint to return 413 on max_body_size instead of truncating

Export streams items as tar archives without loading entire files into memory.
Import extracts items with new IDs, preserving original order. Both work
locally and via client/server mode.

Co-Authored-By: opencode <noreply@opencode.ai>
2026-03-17 21:24:39 -03:00
074ba64805 feat: allow --list to accept item IDs for filtering
- Local and client/server modes now support ID-based filtering
- keep -l 1 2 3 lists specific items by ID
- keep -l --ids-only 1 2 3 outputs just those IDs
- Server API adds optional 'ids' query parameter to GET /api/item/
- KeepClient.list_items gains ids parameter
2026-03-17 17:56:35 -03:00
02f0c8d453 fix: use XDG config directory for default config file location
Changes from manual HOME/.config/keep/config.yml construction to
dirs::config_dir(), which respects XDG_CONFIG_HOME.
2026-03-17 16:07:13 -03:00
c29e37c03e fix: use XDG data directory as default storage location
Changes the default from ~/.keep to a keep directory under the platform
data directory (e.g. ~/.local/share/keep on Linux). Uses dirs::data_dir(),
which respects the XDG_DATA_HOME environment variable.
2026-03-17 15:37:25 -03:00
28c3deaeca fix: expand tilde (~) in config file paths to home directory
Applies to dir, import_data_file, and all server certificate/secret file
paths. Uses existing dirs crate for home directory resolution.
2026-03-17 15:32:30 -03:00
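Tilde expansion as described can be sketched in a few lines. The real code resolves the home directory via the dirs crate; this std-only sketch takes it as a parameter, and `expand_tilde` is a hypothetical helper name:

```rust
use std::path::PathBuf;

// Expand a leading "~" or "~/" to the given home directory; other paths
// pass through unchanged.
fn expand_tilde(path: &str, home: &str) -> PathBuf {
    if let Some(rest) = path.strip_prefix("~/") {
        PathBuf::from(home).join(rest)
    } else if path == "~" {
        PathBuf::from(home)
    } else {
        PathBuf::from(path)
    }
}

fn main() {
    assert_eq!(
        expand_tilde("~/keep/config.yml", "/home/u"),
        PathBuf::from("/home/u/keep/config.yml")
    );
    assert_eq!(expand_tilde("/etc/keep.yml", "/home/u"), PathBuf::from("/etc/keep.yml"));
}
```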
cb56a398fa feat: add --ids-only flag to --list mode for scripting
Outputs one ID per line with no header. Errors if used with any mode
other than --list. Works with both local and client (remote) list.
2026-03-17 15:04:10 -03:00
2452da52ef chore: add license, repository, keywords, and rust-version to Cargo.toml 2026-03-17 14:50:45 -03:00
6347427536 chore: remove bin/keep binary from tracking, add bin/ to gitignore 2026-03-17 14:47:57 -03:00
a8759c4b83 feat: add infer and tree_magic_mini meta plugins, make zstd internal by default
- Add infer crate as meta plugin for MIME type detection
- Add tree_magic_mini crate as alternative meta plugin for MIME type detection
- Add zstd, infer, tree_magic_mini to default features
- Fix static build script to use musl target instead of glibc+crt-static
- Remove hardcoded shell list from --generate-completion help text
- Fix update() in both new plugins to emit MIME metadata when buffer fills
2026-03-17 14:46:51 -03:00
a90c19efc1 feat: add native zstd compression plugin and deduplicate shared compression/meta utilities
- Add zstd crate (v0.13) with native Rust compression engine (level 3)
- Gate behind 'zstd' feature flag, fall back to program-based when disabled
- Extract CompressionService::decompressing_reader/compressing_writer with zstd support
- Extract MetaService::with_collector() to eliminate Arc<Mutex<Vec>> boilerplate
- Extract read_with_bounds() helper for skip+read pattern
- Add input validation for mutually exclusive --id and --tags flags
- Add zstd round-trip tests
2026-03-16 20:03:30 -03:00
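The skip+read pattern behind the extracted `read_with_bounds()` helper can be sketched with `Read::take` and `io::sink()` (the name comes from the commit message; this signature is an assumption):

```rust
use std::io::{self, Read};

// Skip `skip` bytes, then read up to `len` bytes, without buffering the
// skipped region.
fn read_with_bounds<R: Read>(reader: &mut R, skip: u64, len: u64) -> io::Result<Vec<u8>> {
    // discard the prefix by copying it into a sink
    io::copy(&mut reader.by_ref().take(skip), &mut io::sink())?;
    let mut buf = Vec::new();
    reader.by_ref().take(len).read_to_end(&mut buf)?;
    Ok(buf)
}

fn main() {
    let data = b"0123456789";
    let out = read_with_bounds(&mut &data[..], 3, 4).unwrap();
    assert_eq!(out, b"3456");
}
```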
35ee71c3cf feat: add export/import modes, unify service layer, fix binary detection
Export/import:
- Add --export and --import modes for both local and client paths
- Use strfmt crate for --export-filename-format templates ({id}, {tags}, {ts}, {compression})
- Import preserves original timestamps via server ?ts= param
- --import-data-file for file-based import; stdin fallback streams with PIPESIZE buffers

Service unification:
- Merge SyncDataService unique methods into ItemService (delete_item now returns Result<Item>)
- Delete AsyncDataService, AsyncItemService, DataService trait (dead code / async-blocking anti-pattern)
- All server handlers use spawn_blocking + ItemService directly
- Extract shared types (ExportMeta, ImportMeta) and helpers (resolve_item_id(s), check_binary_tty)

Binary detection fix:
- Replace broken metadata.get("map") + is_binary(&[]) with actual content sampling
- Both as_meta and allow_binary paths read PIPESIZE sample before deciding
- Never load entire item into memory for binary check

Other fixes:
- Fix lock consistency: all handlers use blocking_lock() in spawn_blocking (no mixed lock().await)
- Use ISO 8601 format for {ts} in export filenames
- Fix resolve_item_ids returning only 1 item for tag lookups
- Fix client get.rs triple-buffering and export.rs whole-file buffering
- Add KeepClient::get_item_content_stream() for streaming reads
- Pass all clippy --features server lints (Path vs PathBuf, &mut conn, etc.)
2026-03-16 08:43:26 -03:00
0a3d61a875 fix: client save with --compression none stored lz4 instead of none
- server_compress was true when compression_type=None, telling server to
  recompress with its default (lz4) instead of storing raw
- compression_type query param was only sent when !server_compress,
  so 'none' was never sent to server
- Fix: server_compress always false in client mode (client handles all
  compression), compression_type always sent to server

Tested: save/get/list/info/filters/delete for lz4, none, gzip on both
local and client/server modes. All operations produce matching results.
2026-03-15 12:46:29 -03:00
eca17b36ee fix: client save logs item ID early, stores compression via proper field and size via update endpoint
- Client save now logs 'New item: {id}' immediately after server response
- Compression type sent as query param, stored in DB compression field (not _client_compression metadata)
- Client set_item_size() sends uncompressed size via POST /api/item/{id}/update?size=N
- Server raw content GET uses actual file size for Content-Length (not uncompressed item.size)
- Removed _client_compression metadata hack from client save and get
- Fixed server handle_update_item to support size-only updates
- Fixed clippy: collapsible_if, too_many_arguments, unnecessary mut refs
- Fixed ListItemsQuery doctest missing meta field
2026-03-15 10:14:55 -03:00
5bad7ac7a6 refactor: decouple meta plugins from DB via SaveMetaFn callback, extract shared utilities
- Add SaveMetaFn callback pattern: meta plugins receive a closure instead of
  &Connection, enabling the same plugin code to work in local, client, and
  server contexts (collect-to-Vec, collect-to-HashMap, or direct DB write)
- Client save now runs meta plugins locally during streaming (smart client
  sets meta=false, server skips its own plugins)
- Add POST /api/item/{id}/update endpoint for re-running plugins on stored
  content without downloading compressed data
- Add client update mode (--update with --meta-plugin flags)
- Extract shared utilities: stream_copy, print_serialized, build_path_table,
  ensure_default_tag to reduce duplication across modes
- Add upsert_tag for idempotent tag addition (INSERT OR IGNORE)
- Add warn logging on save_meta lock failure in BaseMetaPlugin and MetaService
2026-03-14 22:36:59 -03:00
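The SaveMetaFn callback pattern above decouples plugins from any storage backend: the plugin emits key/value pairs through a closure and the caller decides whether to collect them or write to the DB. A sketch under stated assumptions (the alias and `digest_plugin` are illustrative; the real signatures differ):

```rust
use std::collections::HashMap;

// Plugins receive a closure instead of &Connection.
type SaveMetaFn<'a> = &'a mut dyn FnMut(&str, &str);

// Stand-in plugin: emits one metadata pair derived from the content.
fn digest_plugin(content: &[u8], save_meta: SaveMetaFn) {
    save_meta("byte_count", &content.len().to_string());
}

fn main() {
    // collect-to-HashMap context, as a smart client might use
    let mut collected = HashMap::new();
    digest_plugin(b"hello", &mut |k, v| {
        collected.insert(k.to_string(), v.to_string());
    });
    assert_eq!(collected.get("byte_count").map(String::as_str), Some("5"));
}
```

The same plugin body works unchanged when the closure instead performs a direct DB write, which is the point of the refactor.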
fdc5f1d744 fix: client --list uses list_format from config like local mode
Move apply_color/apply_table_attribute to common.rs for sharing.
Add render_list_table_with_format() that takes ColumnConfig slice
and pre-computed row values. Client list now renders columns based
on settings.list_format, showing empty for columns where server
data is unavailable (e.g. text_line_count, token_count).
2026-03-14 20:01:58 -03:00
f5bae46620 fix: all tables respect table_config from settings
Extract shared render_item_info_table() and render_list_table() in
modes/common.rs. Update client/info, client/list, client/status,
info, status, and status_plugins to use create_table_with_config
with settings.table_config instead of hardcoded presets.

Previously only local --list used table_config; all other tables
(client modes, status, status-plugins) ignored it.
2026-03-14 19:49:31 -03:00
0bc8d9c909 fix: surface server error in get_status and trim table output
- Include error field in get_status() ApiResponse so server error
  messages are surfaced instead of generic 'No status data returned'
- Use trim_lines_end() on table output to match local mode formatting
2026-03-14 19:32:39 -03:00
1a942b4d23 fix: format client --status output as tables instead of raw JSON
Change client get_status() to return StatusInfo struct instead of
serde_json::Value, then render paths, meta plugins, and compression
tables matching the local mode's output style.
2026-03-14 19:25:53 -03:00
886ac98b21 fix: URL-encode query params in client and pass --meta to server on save
- URL-encode all query parameter keys and values in get_json_with_query
  and post_stream. Previously raw JSON like {"project":"alpha"} was
  sent unencoded, causing 'invalid uri character' errors.
- Pass settings.meta (key=value pairs) from client save to server as
  metadata. Previously always passed empty HashMap, so --meta was
  silently ignored in client save mode.
2026-03-14 19:16:39 -03:00
0658d8378f fix: group all server options under Server Options help heading
The --server-password, --server-password-hash, --server-username,
--server-jwt-secret, --server-jwt-secret-file, and --server-max-body-size
options were appearing in the generic Options section instead of the
Server Options section.
2026-03-14 18:56:32 -03:00
ffe71440d9 fix: use explicit snake_case serialization for CompressionType
Per project convention, enum string representations should use
snake_case. Use explicit strum serialize attributes instead of
serialize_all to avoid incorrect splitting of acronyms like
GZip → g_zip and ZStd → z_std.
2026-03-14 18:26:58 -03:00
8acbd34150 fix: add --meta filtering support to client/server list mode
Plumb metadata filter from client CLI through the HTTP API to the
server's data_service.list_items(). The server accepts a JSON-encoded
meta query parameter where null values mean 'key exists' and string
values mean 'exact match'.

Also fix LZ4 compression round-trip for client mode:
- Explicit flush FrameEncoder before drop to avoid sending only the
  frame header when compress=false
- Send _client_compression metadata so client knows actual compression
  on retrieval (server records compression=None when compress=false)
- Use FrameDecoder (frame format) instead of decompress_size_prepended
  (size-prepended format) to match server storage format
2026-03-14 18:22:07 -03:00
f2d93a2812 fix: skip_lines/skip_bytes filters producing empty output on large files
FilteringReader::read() returned Ok(0) (EOF) when a filter consumed a
chunk without producing output. Filters like skip_lines need to see
multiple chunks before outputting anything — returning 0 prematurely
truncated the stream. Loop until the filter produces output or the
underlying reader is truly exhausted.
2026-03-14 16:20:30 -03:00
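The fix can be sketched as a `Read` wrapper that keeps pulling chunks until the filter yields output or the source is truly exhausted. `SkipFirstBytes` below is a simplified stand-in for filters like skip_lines that consume whole chunks before producing anything:

```rust
use std::io::{self, Read};

// Stand-in filter: may consume an entire chunk and return nothing.
struct SkipFirstBytes { remaining: usize }

impl SkipFirstBytes {
    fn process(&mut self, chunk: &[u8]) -> Vec<u8> {
        let skip = self.remaining.min(chunk.len());
        self.remaining -= skip;
        chunk[skip..].to_vec()
    }
}

struct FilteringReader<R: Read> {
    inner: R,
    filter: SkipFirstBytes,
    pending: Vec<u8>,
}

impl<R: Read> Read for FilteringReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // The bug was returning Ok(0) here when one chunk produced no
        // output; instead, loop until output exists or the inner reader ends.
        while self.pending.is_empty() {
            let mut chunk = [0u8; 4];
            let n = self.inner.read(&mut chunk)?;
            if n == 0 {
                return Ok(0); // truly exhausted
            }
            self.pending = self.filter.process(&chunk[..n]);
        }
        let n = self.pending.len().min(buf.len());
        buf[..n].copy_from_slice(&self.pending[..n]);
        self.pending.drain(..n);
        Ok(n)
    }
}

fn main() {
    let mut r = FilteringReader {
        inner: &b"0123456789"[..],
        filter: SkipFirstBytes { remaining: 6 },
        pending: Vec::new(),
    };
    let mut out = String::new();
    r.read_to_string(&mut out).unwrap();
    assert_eq!(out, "6789");
}
```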
0af74000d2 fix: eliminate unsafe code via nix, command-fds, and thread-local cookie
Replace 4 unsafe sites with safe wrappers:

- libc::pipe2 → nix::unistd::pipe2 (safe OwnedFd return)
- File::from_raw_fd → File::from(OwnedFd) (safe ownership transfer)
- unsafe impl Send for SendCookie → thread_local! lazy Cookie
  (each thread gets its own independent Cookie, no Send needed)
- pre_exec + libc::fcntl → command-fds crate fd_mappings()
  (handles CLOEXEC clearing safely, also fixes potential fd leak
  on spawn failure via OwnedFd RAII)

Only libc::umask remains as a single unavoidable unsafe site
(no safe Rust wrapper exists for the umask syscall).

Also updates AGENTS.md to remove stale SendCookie exception.
2026-03-14 16:01:54 -03:00
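The thread-local replacement for `unsafe impl Send` can be sketched like this: each thread lazily creates its own handle, so nothing ever crosses a thread boundary and no `Send` bound is needed. `Cookie` here is a trivial stand-in for the libmagic handle:

```rust
use std::cell::RefCell;

// Stand-in for a non-Send foreign handle.
struct Cookie { loads: u32 }

impl Cookie {
    fn open() -> Self { Cookie { loads: 1 } }
}

thread_local! {
    // Lazily initialized once per thread on first access.
    static COOKIE: RefCell<Cookie> = RefCell::new(Cookie::open());
}

fn cookie_loads() -> u32 {
    COOKIE.with(|c| c.borrow().loads)
}

fn main() {
    assert_eq!(cookie_loads(), 1);
    // a worker thread gets its own independent Cookie
    let handle = std::thread::spawn(cookie_loads);
    assert_eq!(handle.join().unwrap(), 1);
}
```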
9a1e23e85f fix: use tempdir for db doctests instead of project root
All 27 doctests in db.rs wrote keep.db to the project root via
PathBuf::from("keep.db"). Now use tempfile::tempdir() so the
database is created in a temp directory and cleaned up automatically.
2026-03-14 15:10:47 -03:00
b3ca673b52 feat: add --update mode, --meta/--meta-plugin flags, streaming diff
- Add --update mode to modify tags and metadata for existing items by ID
- Add --meta key=value flag to set metadata during save/update
- Add --meta key (bare) to delete metadata keys or filter by existence
- Add --meta-plugin/-M name:{json} flag for plugin options via CLI
- Env meta plugin now uses options from --meta-plugin instead of only env vars
- Stream decompressed content to diff via /dev/fd pipes (no temp files)
- Wire --list-format CLI arg to settings (was parsed but ignored)
- Allow --info to accept tags (was restricted to numeric IDs only)
- Change DB meta filtering to HashMap<String, Option<String>> for exact match + key existence
- Fix fcntl error checking in diff pre_exec
- Fix README inaccuracies (delete by tag, nonexistent --digest flag, meta plugin key names)
2026-03-14 15:02:16 -03:00
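The `HashMap<String, Option<String>>` filter semantics above (None means "key exists", Some(v) means "exact match") can be sketched as a predicate; `matches_meta` is a hypothetical helper, not the project's function:

```rust
use std::collections::HashMap;

fn matches_meta(
    item_meta: &HashMap<String, String>,
    filter: &HashMap<String, Option<String>>,
) -> bool {
    filter.iter().all(|(key, want)| match (item_meta.get(key), want) {
        (Some(_), None) => true,                  // bare --meta key: existence check
        (Some(have), Some(want)) => have == want, // --meta key=value: exact match
        (None, _) => false,
    })
}

fn main() {
    let meta = HashMap::from([("project".to_string(), "alpha".to_string())]);
    let exists = HashMap::from([("project".to_string(), None)]);
    let exact = HashMap::from([("project".to_string(), Some("beta".to_string()))]);
    assert!(matches_meta(&meta, &exists));
    assert!(!matches_meta(&meta, &exact));
}
```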
4b51825917 docs: document default mode shortcuts for save and get
- Quick Start: show bare keep <tag> (save) and keep <#> (get) shortcuts
- Save Mode: note that --save is optional when piping content
- Get Mode: clarify that only numeric IDs default to Get mode;
  fix incorrect keep <tag> example that would actually save
2026-03-14 11:48:37 -03:00
2ffa2a977a feat: add shell profiles for zsh, sh, csh/tcsh
- profile.bash: simplified preexec_init (early return), extracted
  ___keep_complete helper for @/@@ completion wrappers
- profile.zsh: add-zsh-hook preexec, wrapper function, @/@@ aliases,
  completions via compdef
- profile.sh: POSIX-compatible for sh/dash/ksh. Wrapper function,
  @/@@ aliases. No preexec or completions.
- profile.csh: alias-based keep wrapper, @/@@ aliases. No preexec
  or completions.
- modulefile: adds KEEP_SH_PROFILE, KEEP_ZSH_PROFILE, KEEP_CSH_PROFILE
- README: updated Shell Integration table and Shell Completion section
2026-03-14 11:36:29 -03:00
1a8ed56b68 feat: add --generate-completion for shell tab completion
- Add clap_complete dependency for bash/zsh/fish/elvish/powershell
- Add --generate-completion <shell> flag that prints completion script to stdout
- profile.bash sources completions via command keep --generate-completion bash
- @ and @@ aliases get completions via wrapper functions that delegate to _keep
- README updated with Shell Completion section
2026-03-14 11:02:38 -03:00
158bf50864 docs: add environment modulefile instructions to README 2026-03-14 10:36:57 -03:00
17be6abaab refactor: streaming, security hardening, and MCP removal
Major overhaul of server architecture and security posture:

- Streaming: Unified all I/O through PIPESIZE (8192-byte) buffers.
  POST bodies stream via MpscReader through the save pipeline. GET
  content streams from disk via decompression to client. Removed
  save_item_with_reader, get_item_content_info, ChannelReader.
  413 responses keep partial items (nonfatal by design).

- Security: XSS protection in all HTML pages via html_escape crate.
  Security headers middleware (nosniff, frame deny, referrer policy).
  CORS tightened to explicit headers. Input validation for tags
  (256 chars), metadata (128/4096), pagination (10k cap). Config
  file reads use from_utf8_lossy. Generic error messages in HTML.
  Diff endpoint has 10 MB per-item cap. max_body_size config option.

- Panics eliminated: Path unwraps → proper error propagation.
  Mutex unwraps → map_err (registries) / expect with message (local).

- MCP removed: Deleted all MCP code, rmcp dependency, mcp feature.

- Docs: Updated README, DESIGN, AGENTS to reflect all changes.
2026-03-14 00:03:42 -03:00
560ba6e20c fix: count_bounded error counting, clippy if-let, auth test dedup, doc tests
- count_bounded: break on iterator error instead of counting errors as tokens
- collapse nested if-let chains with let-chains in auth middleware
- document JWT/Basic Auth as mutually exclusive
- TailTokensFilter::clone uses empty buffer (always pre-filter)
- fix 9 broken doc examples in server/common.rs
- remove 7 duplicate auth tests from auth.rs (covered by auth_tests.rs)
2026-03-13 22:04:38 -03:00
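The count_bounded fix — break on an iterator error instead of counting it as a token — can be sketched as follows (the signature and error type are assumptions for illustration):

```rust
// Count items up to `bound`; an Err terminates counting rather than
// contributing to the count. Returns (count, hit_bound).
fn count_bounded<I>(tokens: I, bound: usize) -> (usize, bool)
where
    I: Iterator<Item = Result<u32, String>>,
{
    let mut count = 0;
    for tok in tokens {
        if tok.is_err() {
            break; // an error is not a token
        }
        count += 1;
        if count >= bound {
            return (count, true);
        }
    }
    (count, false)
}

fn main() {
    let toks = vec![Ok(1), Ok(2), Err("bad".to_string()), Ok(3)];
    // stops at the error: 2 tokens counted, bound never reached
    assert_eq!(count_bounded(toks.into_iter(), 10), (2, false));
}
```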
a07bb6b350 feat: plugin-declared parallel execution, switch to env_logger, update deps
Parallel execution (opt-in via MetaPlugin::parallel_safe):
- Add Send bound to MetaPlugin, parallel_safe() method (default false)
- Override to true in digest, tokens, exec, magic_file plugins
- MetaService: std::thread::scope for initialize_plugins and process_chunk
- Extract plugins via NullMetaPlugin sentinel + std::mem::replace (no unsafe)
- Panic tracking: join errors logged, NullMetaPlugin restored and finalized
- MetaPluginExec: Box<dyn Write> -> Box<dyn Write + Send>
- SendCookie wrapper for libmagic Cookie with unsafe impl Send

Logging (stderrlog -> env_logger):
- Custom format: [SSSSSS.mmm] LEVEL [module:] message (time-since-start ms)
- Default level: Warn (matches previous behavior)
- -v: Debug, -vv+: Trace, -q: off
- -vv+ shows module path

Maintenance:
- Bump deps: thiserror 2.0, config 0.15, dns-lookup 3.0, lz4_flex 0.12,
  ringbuf 0.4, rand 0.9, lazy_static 1.5, env_logger 0.11
- Update Cargo.lock (186 transitive packages)
- Clippy fixes: is_multiple_of, to_string_in_format_args, collapsible_if
- Fix double-counting bug in TokensMetaPlugin::update
- Fix schema description using plugin.description()

Co-Authored-By: opencode <noreply@opencode.ai>
2026-03-13 21:49:51 -03:00
e7d8a83369 feat: add plugin schema system, tokenizer cache, and config validation
- Add plugin schema types and runtime discovery for meta/filter plugins
- Rewrite --generate-config to use schema system instead of hardcoded types
- Add Settings::validate_config() for startup validation
- Cache tokenizer instances via static Lazy to avoid repeated BPE loading
- Add split_by_token_iter() and count_bounded() to Tokenizer
- Fix double-counting bug in TokensMetaPlugin when buffer < max_buffer_size
- Eliminate unnecessary allocations in token count methods
- Refactor token filters: remove Option<Tokenizer>, use iterator API
- Fix TailTokensFilter correctness: unbounded buffer instead of ring buffer
- Add encoding option to all token filters
- Add description() to MetaPlugin and FilterPlugin traits
- Fix unused_mut warning in compression engine (feature-gated code)

Co-Authored-By: code-review-bot <noreply@anthropic.com>
2026-03-13 20:23:17 -03:00
914190e119 feat: add LLM token counting meta plugin and token filters
Add tiktoken-based token counting via new 'tokens' feature flag.

New components:
- Shared tokenizer module wrapping tiktoken CoreBPE (cl100k_base, o200k_base)
- TokensMetaPlugin: streaming token counter, tokenizes each chunk independently
- head_tokens(N): stream first N tokens, split at exact boundary when mid-chunk
- skip_tokens(N): skip first N tokens, stream the rest
- tail_tokens(N): bounded ring buffer (~16KB), outputs last N tokens at finalize

All filters are fully streaming — no full-stream buffering.
Meta plugin accuracy: exact for normal text, ±1-2 tokens if a long
whitespace sequence spans a chunk boundary.

Also: add 'client' and 'tokens' to default features, add curl to Dockerfile builder stage.
2026-03-13 16:48:31 -03:00
e672ec751e feat: add JWT auth, configurable username, switch password auth to Basic
Add server-side JWT authentication with permission-based access control
(read/write/delete claims). Password authentication now uses HTTP Basic
auth only (replacing Bearer). Add configurable username for both server
and client (--server-username/--client-username, defaults to "keep").

JWT secret supports file-based loading via --server-jwt-secret-file for
Docker secrets. OPTIONS preflight requests bypass auth. HEAD mapped to
read permission.

Co-Authored-By: opencode <noreply@opencode.ai>
2026-03-13 13:56:35 -03:00
af1e0ca570 feat: expand Docker build to all features, add docker-compose.yml
- Build with server, mcp, swagger, client, tls features (all except magic)
- Add KEEP_* environment variable documentation and defaults
- Copy CA certificates for HTTPS client support in scratch image
- Add docker-compose.yml with keep-data and keep-config volumes
2026-03-13 10:08:28 -03:00
d5d58bc52c feat: add lz4 command fallback, remove unused magic.rs
- Add program-based lz4 command fallback when lz4 feature is disabled
- Feature-gate lz4.rs and lz4 tests to compile without lz4_flex
- Delete legacy magic.rs (unused, no feature gating, superseded by magic_file.rs)
2026-03-13 08:51:10 -03:00
b166477202 fix: harden security, eliminate panics, remove dead code, add Dockerfile
Security:
- Use constant-time password comparison (subtle crate) to prevent timing attacks
- Replace permissive CORS with configurable origin-restricted CORS
- Add TLS warning when password auth is used without HTTPS

Bug fixes:
- Convert MetaPlugin panics to anyhow::Result (get_meta_plugin, outputs_mut, options_mut)
- Replace item.id.unwrap() with proper error handling across 15 call sites
- Fix panic on unknown column type in list mode
- Fix conflicting PIPESIZE constant (was 8192 vs 65536, now unified to 8192)
- Add 256MB filter chain buffer limit to prevent OOM
- Gracefully skip unregistered plugins instead of panicking

Dead code removal:
- Delete unused filter parser files (filter_parser.rs, filter.pest, parser/ module)
- ~260 lines of dead PEG parser code removed

Code consolidation:
- Add is_content_binary_from_metadata() helper (was duplicated in 4 places)
- Simplify save_item_raw() to delegate to save_item_raw_streaming() (~90 lines removed)

Incomplete features:
- Populate filter_plugins in status output from global registry
- Add FallbackMagicFileMetaPlugin (was referenced but never implemented)
- Document init_plugins() as intentional no-op

Infrastructure:
- Add Dockerfile (static musl binary on scratch, 4.8MB)
- Add .dockerignore
- Add cors_origin to ServerConfig and config.rs
2026-03-13 07:57:36 -03:00
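Constant-time comparison, as provided by the subtle crate adopted here, works by XOR-ing every byte pair and OR-folding the results, so execution time does not reveal where the first mismatch occurs. A std-only sketch of the idea (the real code should keep using subtle, which also handles compiler optimizations more carefully):

```rust
// Compare without early exit on mismatching bytes.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b) {
        diff |= x ^ y; // accumulate differences; never branch per byte
    }
    diff == 0
}

fn main() {
    assert!(ct_eq(b"secret", b"secret"));
    assert!(!ct_eq(b"secret", b"secrex"));
    assert!(!ct_eq(b"short", b"longer"));
}
```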
bee980605f feat: add HTTPS/TLS server support via rustls
Add optional TLS support for the server using axum-server with the
tls-rustls feature. When --server-cert and --server-key are provided
(and tls feature is enabled), the server binds with TLS instead of
plain HTTP.

Changes:
- Add axum-server dependency with optional tls-rustls feature
- New 'tls' feature flag (independent of 'server')
- --server-cert/--server-key CLI args gated behind tls feature
- ServerConfig extended with cert_file/key_file fields
- Conditional TLS/HTTP binding in server mod.rs
- Fix PathBuf::to_str().unwrap() panic risk -> to_string_lossy()
- Update README.md and DESIGN.md with TLS documentation
2026-03-12 22:18:42 -03:00
237a581429 fix: add server streaming support, fix pre-existing compilation errors
Server changes for client mode streaming:
- POST /api/item/ now streams body via async channel → ChannelReader
  → save_item_raw_streaming when compress=false or meta=false
- Add POST /api/item/{id}/meta endpoint for client-side metadata
- Add save_item_raw_streaming<R: Read> to SyncDataService
- Add add_item_meta to AsyncDataService

Fix pre-existing issues that were hidden behind swagger cfg gate:
- Remove #[cfg(feature = "swagger")] from item module so it compiles
  with just the server feature
- Fix parse_comma_tags usage (returns Vec, not Result)
- Fix TextDiff temporary value lifetime issue
- Fix io::Error::new → io::Error::other
- Fix ok_or_else → ok_or for Copy types
- Inline format args throughout server code
- Fix empty line after doc comment in pages.rs
- Add cfg_attr for unused_mut where mcp feature gates mutation
- Add type_complexity allow on create_auth_middleware
- Distinguish task error vs save error in spawn_blocking handlers

Co-Authored-By: andrew/openrouter/hunter-alpha <noreply@opencode.ai>
2026-03-12 18:02:56 -03:00
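The channel-to-Read bridge named ChannelReader above can be sketched with a blocking channel: chunks arrive from the body-receiving task and are exposed as a `std::io::Read` for the sync save pipeline. The real code feeds it from an async channel; `std::sync::mpsc` stands in here, and the struct shape is an assumption:

```rust
use std::io::{self, Read};
use std::sync::mpsc::Receiver;

struct ChannelReader {
    rx: Receiver<Vec<u8>>,
    pending: Vec<u8>, // leftover bytes from the last chunk
}

impl Read for ChannelReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        while self.pending.is_empty() {
            match self.rx.recv() {
                Ok(chunk) => self.pending = chunk,
                Err(_) => return Ok(0), // sender dropped: EOF
            }
        }
        let n = self.pending.len().min(buf.len());
        buf[..n].copy_from_slice(&self.pending[..n]);
        self.pending.drain(..n);
        Ok(n)
    }
}

fn main() {
    let (tx, rx) = std::sync::mpsc::channel();
    tx.send(b"hello ".to_vec()).unwrap();
    tx.send(b"world".to_vec()).unwrap();
    drop(tx); // close the stream
    let mut out = String::new();
    ChannelReader { rx, pending: Vec::new() }.read_to_string(&mut out).unwrap();
    assert_eq!(out, "hello world");
}
```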
c5529bedbf feat: add client mode with streaming support
Add client mode enabling the keep CLI to connect to a remote keep
server over HTTP. Local plugins (compression, meta, filters) run on
the client; the server stores/retrieves binary blobs.

Architecture:
- Client save uses 3-thread streaming pipeline: reader thread (stdin
  → tee/stdout → hash → compress), OS pipe, streamer thread (pipe →
  chunked HTTP POST). Memory usage is O(PIPESIZE) regardless of data
  size.
- Server accepts compress=false, meta=false, decompress=false query
  params for granular control of server-side processing.
- Streaming body handling on server via async channel → sync reader
  bridge (ChannelReader).

Key additions:
- src/client.rs: KeepClient with post_stream() for chunked upload
- src/modes/client/: save, get, list, info, delete, diff, status
- --client-url / KEEP_CLIENT_URL configuration
- --client-password / KEEP_CLIENT_PASSWORD for auth
- os_pipe dependency for zero-copy pipe streaming

Co-Authored-By: andrew/openrouter/hunter-alpha <noreply@opencode.ai>
2026-03-12 18:01:36 -03:00
d2581358e9 docs: rewrite README, add LICENSE, remove outdated files
- Rewrite README.md with comprehensive documentation covering all
  features: compression engines, meta plugins, filter plugins, server
  mode, MCP integration, and configuration
- Add MIT LICENSE file
- Delete README.org (consolidated into README.md)
- Delete empty PLAN.md
- Update AGENTS.md with current build/test commands and conventions

Co-Authored-By: andrew/openrouter/hunter-alpha <noreply@opencode.ai>
2026-03-12 18:01:23 -03:00
79930f4b01 chore: remove outdated tool usage notes from AGENTS.md 2026-03-12 12:00:32 -03:00
9b7cbd5244 fix: resolve doctest failures, database bugs, and remove dead code
- Fix all 96 doctest failures across 20 files by adding hidden imports and
  proper test setup (68 pass, 33 intentionally ignored)
- Fix set_item_tags: wrap in transaction and replace item.id.unwrap() with
  proper error handling
- Fix get_items_matching: replace N+1 per-item meta queries with batch
  get_meta_for_items() call
- Fix get_item_matching: apply meta filtering instead of ignoring the parameter
- Remove duplicate doc comment in store_meta
- Remove dead code files: plugin.rs, plugins.rs, binary_detection.rs
  (never declared as modules)
- Apply cargo fmt formatting fixes
- Add keep.db to .gitignore
2026-03-12 11:58:44 -03:00
8a8a6e1c4b fix: correct critical bugs and improve pipe streaming performance
Critical bug fixes:
- save_item now returns real Item from database, not a hardcoded fake
- AsyncDataService::save() reuses self.sync_service instead of creating redundant instance
- GenerateStatus trait signature mismatch fixed (CLI/API decoupling)

Performance improvements (pipe path untouched):
- CompressionEngine::open() returns Box<dyn Read + Send> enabling true streaming
- mode_get eliminates triple full-file read (was sampling then re-reading entire file)
- FilteringReader adds fast-path bypass when no filters, pre-allocates temp buffer
- text.rs meta plugin processes &[u8] slice directly, eliminates data.to_vec() clone

API correctness:
- Tag parse errors now return 400 instead of being silently discarded
- compute_diff uses similar crate (LCS-based) instead of naive positional comparison

Cleanup:
- Modernize string formatting (`format!("{x}")`) across codebase
- Remove redundant DB query in get mode
- Derive Debug/ToSchema on public types
- Delete placeholder test files with no real assertions
- Extract parse_comma_tags utility function
2026-03-11 20:45:05 -03:00
e8ea42506e feat: unify CLI and API with DataService trait
- Add DataService trait with streaming support for save/get operations
- Implement SyncDataService for CLI and AsyncDataService for API
- Add missing API endpoints: DELETE /api/item/{id}, GET /api/item/{id}/info, GET /api/diff
- Add GET /api/plugins/status endpoint
- Preserve stdin/stdout streaming performance via Read trait
2026-03-10 22:31:31 -03:00
fb4c1a2b11 fix: add missing serde default to list_format field
Fixes deserialization failure in generate-config mode by adding
#[serde(default)] attribute to list_format field in Settings struct.
This allows the config library to provide sensible defaults when
no config file exists, resolving the error "missing field list_format".

Also unstages AGENT.md naming change since that's a different fix.
2026-03-09 20:13:55 -03:00
143 changed files with 14192 additions and 5773 deletions

5
.dockerignore Normal file

@@ -0,0 +1,5 @@
target/
.git/
*.db
keep.db
bin/

2
.gitignore vendored

@@ -1,3 +1,5 @@
/target
.aider*
.crush
keep.db
bin/


@@ -1,84 +0,0 @@
# Agent Configuration
**IMPORTANT:** Prefer to use the `write_file` tool if the edit is for the majority of a file, or if you are correcting problems made by edits from other tools.
## Tools
**IMPORTANT**: Be very careful when quoting text in tool calls to add the right amount of escaping.
### `write_file`
When editing files use the `write_file` tool to output the complete version of the corrected file.
**IMPORTANT**: You must provide the whole file to `write_file`, even the unchanged parts.
## Build/Test Commands
**IMPORTANT**: Do not run application, start the web server, or the trunk server.
**IMPORTANT:** The cargo command cannot be run in parallel.
```bash
# Check project
TERM=dumb cargo check
# Build project
TERM=dumb cargo build
# DO NOT RUN APPLICATION (native)
# TERM=dumb cargo run
# Run all tests
TERM=dumb cargo test
# Run specific test (by name substring)
TERM=dumb cargo test test_function_name
# Run specific test with verbose output
TERM=dumb cargo test test_function_name -- --nocapture
# Check formatting
TERM=dumb cargo fmt --check
# Apply formatting
TERM=dumb cargo fmt
# Lint with clippy
TERM=dumb cargo clippy -- -D warnings
# Build for release
TERM=dumb cargo build --release
```
Prefix commands with `TERM=dumb` for consistent output.
## Code Style Guidelines
### Imports
- Group imports in order: standard library, external crates, local modules
- Use explicit imports over glob imports (`use std::fs::File;` not `use std::fs::*;`)
### Documentation
- Document all public APIs with rustdoc
- Use examples in documentation only when helpful
## Procedures
### Fix build problems
1. Check the project: `TERM=dumb cargo check`.
2. If there are errors or warnings, create a new sub agent (an expert Rust developer) that takes the `TERM=dumb cargo check` output as input and plans using strategic thinking.
a. Read all affected files
b. Plan the fixes using strategic thinking:
- Read other files if they provide context or examples
- Look up relevant API information
- Do not downgrade versions
- Preserve functionality
- Use `TERM=dumb cargo fix` if appropriate.
- Prefer the `write_file` tool if there is evidence of double escaping
- You must generate the full file contents when using `write_file` or it will be truncated.
c. Return the list of files modified
3. If any files were modified, loop back to 1.
### Fix formatting
1. Format the project: `TERM=dumb cargo fmt`
2. Continue with the fix build problems procedure.

65
AGENTS.md Normal file

@@ -0,0 +1,65 @@
# Agent Configuration
**IMPORTANT:** `xxx | keep | zzz` must be as performant as possible in all situations.
## Build/Test Commands
**IMPORTANT**: Do not run the application, start the web server, or the trunk server.
**IMPORTANT:** Cargo commands cannot be run in parallel. Prefix all commands with `TERM=dumb`.
```bash
TERM=dumb cargo check # Fast compile check
TERM=dumb cargo build # Build project
TERM=dumb cargo test # Run all tests
TERM=dumb cargo test test_name # Run specific test by name substring
TERM=dumb cargo test -- --nocapture # Verbose test output
TERM=dumb cargo fmt --check # Check formatting
TERM=dumb cargo fmt # Apply formatting
TERM=dumb cargo clippy -- -D warnings # Lint (warnings are errors)
TERM=dumb cargo build --release # Release build
TERM=dumb cargo build --features server # With server feature
```
## Code Conventions
- `anyhow::Result` for error handling; `thiserror` for custom error types (`src/services/error.rs`)
- Plugin traits: `CompressionEngine`, `FilterPlugin`, `MetaPlugin`
- Dynamic trait objects use `clone_box()` for `Clone` on `Box<dyn Trait>`
- Plugin registration uses `ctor` constructors at module load time
- Filter plugins must implement `filter()`, `clone_box()`, and `options()`
- Meta plugins extend `BaseMetaPlugin` for boilerplate reduction
- Enum string representations: `#[strum(serialize_all = "snake_case")]`
- Lint rules: `deny(clippy::all)`, `deny(unsafe_code)` (except `libc::umask` in main.rs)
- Feature flags: `default = ["magic", "lz4", "gzip"]`; optional: `server`, `swagger`
## Testing
- Tests in `src/tests/` mirroring `src/` structure; shared helpers in `src/tests/common/test_helpers.rs`
- Key helpers: `create_temp_dir()`, `create_temp_db()`, `test_compression_engine()`
- Test naming: `test_<feature>_<scenario>`
## Streaming Constraint
**At no point should the whole file be in memory at once.** All I/O must use fixed-size buffers:
- `PIPESIZE` = 8192 bytes (`src/common/mod.rs:10`)
- Server POST body streams through `save_item_raw_streaming` via `MpscReader`
- Server GET content streams via streaming reader (not `read_to_end`)
- When `max_body_size` is exceeded, return `413` but keep the partial item (nonfatal by design)
- Filter/meta plugins use `PIPESIZE`-sized buffers
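A minimal sketch of the fixed-buffer copy this constraint implies — a simplified stand-in for the real `stream_copy()` in `src/common/mod.rs`, whose exact signature is not shown here:

```rust
use std::io::{Read, Write};

const PIPESIZE: usize = 8192;

// Copy reader -> writer through one fixed-size buffer, so memory use
// stays O(PIPESIZE) regardless of total data size.
fn stream_copy<R: Read, W: Write>(reader: &mut R, writer: &mut W) -> std::io::Result<u64> {
    let mut buf = [0u8; PIPESIZE];
    let mut total = 0u64;
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break; // EOF
        }
        writer.write_all(&buf[..n])?;
        total += n as u64;
    }
    Ok(total)
}

fn main() {
    let data = vec![42u8; 3 * PIPESIZE + 100]; // larger than one buffer
    let mut out = Vec::new();
    let copied = stream_copy(&mut data.as_slice(), &mut out).unwrap();
    assert_eq!(copied as usize, data.len());
    assert_eq!(out, data);
}
```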
## HTML Rendering
- Use `html_escape` crate for all user-controlled data in HTML pages
- `esc()` for text content, `esc_attr()` for HTML attributes
- Security headers middleware: `X-Content-Type-Options: nosniff`, `X-Frame-Options: DENY`, `Referrer-Policy: strict-origin-when-cross-origin`
## Changelog
The project uses [Keep a Changelog](https://keepachangelog.com/). The changelog lives at `CHANGELOG.md` in the project root.
- **Always update `CHANGELOG.md`** when making changes that affect users (new features, breaking changes, bug fixes, etc.)
- Add entries under the `[Unreleased]` section using these categories: `Added`, `Changed`, `Deprecated`, `Removed`, `Fixed`, `Security`
- Keep descriptions concise and user-focused — what changed from the user's perspective, not implementation details
- Commit changelog updates in the same commit as the feature/fix they document
- Before releasing a new version, move `[Unreleased]` entries to a versioned section (e.g., `[0.2.0] - YYYY-MM-DD`) and add a new empty `[Unreleased]` above it

107
CHANGELOG.md Normal file

@@ -0,0 +1,107 @@
# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
### Added
- New `filter_grep` feature to optionally include the grep filter plugin (regex-based line filtering). Disabling this feature removes the `regex` crate and its ~800 KiB dependency stack from the binary.
- New `meta_all_musl` feature for all MUSL-compatible meta plugins (excludes `meta_magic` which requires libmagic)
- New `filter_all_musl` feature for all MUSL-compatible filter plugins
- Database index on `items(ts)` column for faster ORDER BY sorting
- Server API `ItemInfo` now includes `file_size` — actual filesystem-reported size of the item data file
### Changed
- CLI args now feature-gated: `--server` and related options hidden when built without `server` feature; `--client-*` options hidden when built without `client` feature. Run `--help` only shows relevant options.
- `server` Cargo feature now includes TLS support by default (`axum-server`); `tls` feature removed
- Clap `conflicts_with_all` removed from all mode args — exclusivity now handled by implicit `group("mode")`
- Filter plugins check size before loading content into memory (prevents OOM on large inputs)
- Status page pre-allocates collections with known capacities (meta plugins, compression info)
- `#[inline]` on HTML escape helper functions (`esc`, `esc_attr`) for hot path performance
- Removed `once_cell` crate (replaced with `std::sync::LazyLock` from Rust 1.80)
- Removed `lazy_static` crate (replaced with `std::sync::LazyLock`)
### Breaking
- Plugin feature flags renamed with type prefix for consistency:
- `magic` → `meta_magic`
- `infer` → `meta_infer`
- `tree_magic_mini` → `meta_tree_magic_mini`
- `tokens` → `meta_tokens`
- `grep` → `filter_grep`
- `all-meta-plugins` → `meta_all`
- `all-filter-plugins` → `filter_all`
### Fixed
- CLI help text typo: "metatdata" → "metadata" in `--get` and `--info` descriptions
### Refactored
- Added module-level documentation to `services/` module
### Documentation
- README.md: Fixed compression table — zstd is native (not external), "none" renamed to "raw"
- DESIGN.md: Updated schema to reflect current `items` table columns and meta plugin inventory
## [0.1.0] - 2026-03-21
### Added
- Streaming tar-based export (`--export`) producing `.keep.tar` archives without loading entire files into memory
- Streaming tar-based import (`--import`) extracting `.keep.tar` archives with new IDs
- Server endpoints `GET /api/export` and `POST /api/import`
- ID-based filtering for `--list` (`keep -l 1 2 3` lists specific items by ID)
- Server API accepts optional `ids` query parameter on `GET /api/item/`
- `--ids-only` flag for `--list` mode for scripting
- `infer` and `tree_magic_mini` meta plugins for MIME type detection
- Native `zstd` compression plugin as default
- Configurable compression via `--compression` flag
- Export/import modes with format detection (JSON, YAML, binary)
- `XDG_CONFIG_HOME` support for default config file location
- `XDG_DATA_HOME` support for default storage location
- Tilde (`~`) expansion in config file paths
### Changed
- `CompressionType::None` renamed to `CompressionType::Raw` (with `"none"` as alias for backward compatibility)
- `items.size` column renamed to `items.uncompressed_size`
- Added `items.compressed_size` column tracking compressed file size on disk
- Added `items.closed` column tracking whether an item is fully written
- Default `list_format` in config now matches CLI default (7 vs 5 columns)
- All filter plugins share deduplicated option implementations
### Refactored
- Extracted `spawn_body_reader()` and `check_binary_content()` helpers for streaming uploads
- Extracted `yaml_value_to_string()` helper for meta plugins
- Extracted `item_path()` helper in `ItemService` to reduce path duplication
- Unified `get_item_meta_name`/`value` to take `&str` instead of `String`
- Shared `ItemInfo` struct between client and server
- Compression service now returns `Result` types instead of panicking via `.expect()`
- `ApiResponse::ok()` and `ApiResponse::empty()` constructors
- `meta_filter()` helper on `Settings` for consistent filtering
- Added `tag_names()` method on `ItemWithMeta`
- `filter_clone_box!` macro for filter plugin cloning
### Fixed
- Panic guards in diff, compression engine, and spawned threads
- Pre-existing borrow errors in export handler and `TryFrom` implementation
- TOCTOU race in `stream_raw_content_response`
- Swallowed write errors in meta plugins (digest, magic_file, exec)
- Truncated uploads (413) now properly store compressed data
- `term::stderr().unwrap()` panic in `item_service`
- `.unwrap()` panics in compression engine `Read`/`Write` impls
- Client API errors now propagate to user instead of being swallowed
- Import endpoint returns 413 on `max_body_size` instead of truncating
- `keep --list` uses `list_format` from config in all modes
- All tables respect `table_config` from settings
- `DisplayListItem` struct removed (was unused)
- `#[serde(alias = "size")]` on `ImportMeta` for backward compatibility

2306
Cargo.lock generated

File diff suppressed because it is too large


@@ -2,103 +2,129 @@
name = "keep"
version = "0.1.0"
edition = "2024"
rust-version = "1.85"
description = "Keep and manage temporary files with automatic compression and metadata generation"
readme = "README.md"
license = "MIT"
repository = "https://gitea.gt0.ca/asp/keep"
keywords = ["cli", "files", "compression", "metadata"]
categories = ["command-line-utilities"]
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
anyhow = "1.0.72"
axum = { version = "0.8.4", optional = true }
anyhow = "1.0"
axum = { version = "0.8", optional = true }
derive_more = { version = "2.0", features = ["full"] }
smart-default = "0.7"
thiserror = "1.0"
base64 = "0.22.1"
chrono = { version = "0.4.26", features = ["serde"] }
clap = { version = "4.3.10", features = ["derive", "env"] }
config = "0.14.0"
thiserror = "2.0"
base64 = "0.22"
chrono = { version = "0.4", features = ["serde"] }
clap = { version = "4.6", features = ["derive", "env"] }
clap_complete = "4"
command-fds = "0.3"
config = "0.15"
ctor = "0.2"
directories = "6.0.0"
dns-lookup = "2.0.2"
enum-map = "2.6.1"
flate2 = { version = "1.0.27", features = ["zlib-ng-compat"], optional = true }
directories = "6.0"
dns-lookup = "3.0"
enum-map = "2.7"
flate2 = { version = "1.0", features = ["zlib-ng-compat"], optional = true }
futures = "0.3"
gethostname = "1.0.2"
humansize = "2.1.3"
gethostname = "1.0"
humansize = "2.1"
async-stream = "0.3"
hyper = { version = "1.0", features = ["full"] }
http-body-util = "0.1"
inventory = "0.3"
is-terminal = "0.4.9"
lazy_static = "1.4.0"
libc = "0.2.147"
local-ip-address = "0.6.5"
log = "0.4.19"
lz4_flex = { version = "0.11.1", optional = true }
magic = { version = "0.13.0", optional = true }
nix = "0.30.1"
once_cell = "1.19.0"
comfy-table = "7.2.0"
pwhash = "1.0.0"
regex = "1.9.5"
ringbuf = "0.3"
rmcp = { version = "0.2.0", features = ["server"], optional = true }
rusqlite = { version = "0.37.0", features = ["bundled", "array", "chrono"] }
rusqlite_migration = "2.3.0"
serde = { version = "1.0.219", features = ["derive"] }
serde_json = "1.0.142"
serde_yaml = "0.9.34"
sha2 = "0.10.0"
md5 = "0.7.0"
stderrlog = "0.6.0"
strum = { version = "0.27.2", features = ["derive"] }
term = "1.1.0"
is-terminal = "0.4"
libc = "0.2"
local-ip-address = "0.6"
log = "0.4"
lz4_flex = { version = "0.12", optional = true }
zstd = { version = "0.13", optional = true }
magic = { version = "0.13", optional = true }
infer = { version = "0.19", optional = true }
tree_magic_mini = { version = "3.2", optional = true }
nix = { version = "0.30", features = ["fs", "process"] }
comfy-table = "7.2"
pwhash = "1.0"
regex = { version = "1.10", optional = true }
ringbuf = "0.4"
rusqlite = { version = "0.37", features = ["bundled", "array", "chrono"] }
rusqlite_migration = "2.3"
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
serde_yaml = "0.9"
sha2 = "0.10"
md5 = "0.7"
subtle = "2.6"
env_logger = "0.11"
strfmt = "0.2"
strum = { version = "0.27", features = ["derive"] }
term = "1.2"
tokio = { version = "1.0", features = ["full"] }
tokio-stream = "0.1"
tokio-util = "0.7.16"
tower = { version = "0.5.2", optional = true }
tower-http = { version = "0.6.6", features = ["cors", "fs", "trace"], optional = true }
utoipa = { version = "5.4.0", features = ["axum_extras"], optional = true }
utoipa-swagger-ui = { version = "9.0.2", features = ["axum"], optional = true }
uzers = "0.12.1"
which = "8.0.0"
xdg = "2.5.2"
strip-ansi-escapes = "0.2.1"
pest = "2.8.1"
pest_derive = "2.8.1"
dirs = "6.0.0"
tokio-util = "0.7"
tower = { version = "0.5", optional = true }
tower-http = { version = "0.6", features = ["cors", "fs", "trace"], optional = true }
utoipa = { version = "5.4", features = ["axum_extras"], optional = true }
utoipa-swagger-ui = { version = "9.0", features = ["axum"], optional = true }
uzers = "0.12"
which = "8.0"
xdg = "2.5"
strip-ansi-escapes = "0.2"
tar = "0.4"
pest = "2.8"
pest_derive = "2.8"
dirs = "6.0"
similar = { version = "2.7", default-features = false, features = ["text"] }
html-escape = "0.2"
ureq = { version = "3", features = ["json"], optional = true }
os_pipe = { version = "1", optional = true }
axum-server = { version = "0.8", features = ["tls-rustls"], optional = true }
jsonwebtoken = { version = "10", optional = true, features = ["aws_lc_rs"] }
tiktoken-rs = { version = "0.9", optional = true }
tempfile = "3.3"
[features]
# Default features include core compression engines and swagger UI
default = ["magic", "lz4", "gzip"]
# Default features include core compression engines and plugins that support MUSL
default = [
"client",
"gzip",
"filter_grep",
"meta_infer",
"lz4",
"meta_tokens",
"meta_tree_magic_mini",
"zstd"
]
# Full
#default = ["server", "magic", "lz4", "swagger"]
# Server feature (includes axum and related dependencies)
server = ["dep:axum", "dep:tower", "dep:tower-http", "dep:utoipa"]
# Server feature (includes axum and TLS/HTTPS via axum-server; rustls already available via client/ureq)
server = ["dep:axum", "dep:tower", "dep:tower-http", "dep:utoipa", "dep:jsonwebtoken", "dep:axum-server"]
# Compression features
gzip = ["flate2"]
lz4 = ["lz4_flex"]
bzip2 = []
xz = []
zstd = []
zstd = ["dep:zstd"]
# Plugin features (meta and filter)
all-meta-plugins = ["dep:magic"]
all-filter-plugins = []
# Meta plugin features
meta_magic = ["dep:magic"]
meta_infer = ["dep:infer"]
meta_tree_magic_mini = ["dep:tree_magic_mini"]
meta_tokens = ["dep:tiktoken-rs"]
meta_all = ["meta_magic", "meta_infer", "meta_tree_magic_mini", "meta_tokens"]
meta_all_musl = ["meta_infer", "meta_tree_magic_mini", "meta_tokens"]
# Individual plugin features
magic = ["dep:magic"]
# MCP feature (Model Context Protocol support)
mcp = ["dep:rmcp"]
# Filter plugin features
filter_grep = ["dep:regex"]
filter_all = ["filter_grep"]
filter_all_musl = ["filter_grep"]
# Swagger UI feature
swagger = ["dep:utoipa-swagger-ui"]
# Client feature (HTTP client for remote server)
client = ["dep:ureq", "dep:os_pipe"]
[dev-dependencies]
tempfile = "3.3.0"
rand = "0.8.5"
rand = "0.9"

125
DESIGN.md

@@ -31,8 +31,9 @@
- `modes/info.rs` - Show detailed item information
- `modes/diff.rs` - Compare two items
- `modes/status.rs` - Show system status and capabilities
- `modes/server.rs` - REST HTTP server mode with OpenAPI documentation
- `modes/common.rs` - Shared utilities for all modes
- `modes/server.rs` - REST HTTP/HTTPS server mode with OpenAPI documentation
- `modes/client.rs` - Client mode for remote server (streaming save, local decompression)
- `modes/common.rs` - Shared utilities for all modes (OutputFormat, table creation, `print_serialized`, `build_path_table`, `ensure_default_tag`, `render_item_info_table`, `render_list_table_with_format`)
### Database Module
- `db.rs` - SQLite database operations
@@ -48,14 +49,31 @@
- `compression_engine/program.rs` - External program wrapper
### Meta Plugin Module
- `meta_plugin.rs` - Trait and type definitions
- `meta_plugin.rs` - Trait and type definitions, `SaveMetaFn` callback type
- `meta_plugin/program.rs` - External program wrapper
- `meta_plugin/digest.rs` - Internal digest implementations
- `meta_plugin/system.rs` - System information metadata plugins
**SaveMetaFn Architecture**: Meta plugins are decoupled from direct DB access via a `SaveMetaFn` callback (`Arc<Mutex<dyn FnMut(&str, &str) + Send>>`). The callback is injected at `MetaService` construction and propagated to all plugins via `BaseMetaPlugin`. This enables:
- **Local mode**: Callback collects metadata into a `Vec`, written to DB after plugins finish
- **Client mode**: Callback collects into a `HashMap`, sent to server after streaming completes
- **Server mode**: Callback collects into a `Vec`, written to DB after plugins finish (same as local)
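A condensed sketch of this callback wiring, using the `SaveMetaFn` type described above. The plugin names and metadata values below are illustrative, not the project's real ones:

```rust
use std::sync::{Arc, Mutex};

// The callback type from the SaveMetaFn architecture: plugins report
// key/value pairs through it instead of touching the database directly.
type SaveMetaFn = Arc<Mutex<dyn FnMut(&str, &str) + Send>>;

// A plugin pass producing two illustrative metadata entries.
fn run_plugins(save_meta: &SaveMetaFn) {
    let mut cb = save_meta.lock().unwrap();
    (*cb)("digest_sha256", "e3b0c442...");
    (*cb)("line_count", "12");
}

fn main() {
    // Local mode: the callback collects into a Vec, which would be
    // written to the DB only after all plugins finish.
    let collected: Arc<Mutex<Vec<(String, String)>>> = Arc::new(Mutex::new(Vec::new()));
    let sink = Arc::clone(&collected);
    let save_meta: SaveMetaFn = Arc::new(Mutex::new(move |k: &str, v: &str| {
        sink.lock().unwrap().push((k.to_string(), v.to_string()));
    }));
    run_plugins(&save_meta);
    assert_eq!(collected.lock().unwrap().len(), 2);
}
```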
### Common Modules
- `common/is_binary.rs` - Binary file detection utilities
- `common/status.rs` - Status information generation
- `common/mod.rs` - `PIPESIZE` constant (8192), `stream_copy()` streaming utility
### Client Module
- `client.rs` - HTTP client wrapper (ureq-based, supports streaming POST)
- `modes/client/save.rs` - 3-thread streaming save with local meta plugins (stdin → tee → compress → meta plugins → pipe → HTTP POST)
- `modes/client/get.rs` - Get with server-side raw fetch + local decompression
- `modes/client/list.rs` - List delegation to server
- `modes/client/info.rs` - Info delegation to server
- `modes/client/delete.rs` - Delete delegation to server
- `modes/client/diff.rs` - Diff delegation to server
- `modes/client/status.rs` - Status delegation to server
- `modes/client/update.rs` - Update delegation to server (sends plugin names/metadata/tags)
### Utility Modules
- `plugins.rs` - Shared plugin utilities
@@ -88,12 +106,18 @@
- `--quiet` - Do not show any messages
- `--output-format <table|json|yaml>` - Output format for info, status, and list modes
- `--server-password <PASSWORD>` - Password for server authentication
- `--server-cert <PATH>` - TLS certificate file (PEM) for HTTPS server
- `--server-key <PATH>` - TLS private key file (PEM) for HTTPS server
- `--force` - Force output even when binary data would be sent to a TTY
### Client Options (requires `client` feature)
- `--client-url <URL>` - Remote keep server URL
- `--client-password <PASSWORD>` - Remote server password
## Data Storage
### Database Schema
- `items` table: id (primary key), ts (timestamp), size (optional), compression
- `items` table: id (primary key), ts (timestamp), uncompressed_size (optional), compressed_size (optional), closed (boolean), compression
- `tags` table: id (foreign key to items), name (tag name)
- `metas` table: id (foreign key to items), name (meta key), value (meta value)
- Indexes on tag names and meta names for faster queries
@@ -107,17 +131,38 @@
### Status Operations
- `GET /api/status` - Get system status information
- `GET /api/plugins/status` - Get plugin status information
### Item Operations
- `GET /api/item/` - Get a list of items as JSON. Optional params: `order=newest|oldest`, `start=0`, `count=100`, `tags[]=tag1&tags[]=tag2`
- `POST /api/item/` - Add a new item
- `GET /api/item/` - Get a list of items as JSON. Optional params: `order=newest|oldest`, `start=0`, `count=100`, `tags=tag1,tag2`
- `POST /api/item/` - Add a new item (body: raw content, **streamed** through fixed-size 8192-byte buffers). Query params: `tags`, `metadata` (JSON), `compress=true|false`, `meta=true|false`
- `POST /api/item/<#>/meta` - Add metadata to an existing item (body: JSON object)
- `POST /api/item/<#>/update` - Re-run meta plugins on stored content. Query params: `plugins` (comma-separated), `metadata` (JSON), `tags` (comma-separated, idempotent)
- `DELETE /api/item/<#>` - Delete an item
- `GET /api/item/latest` - Return the latest item as JSON. Optional params: `tags[]=tag1&tags[]=tag2`, `allow_binary=true|false`
- `GET /api/item/latest/meta` - Return the latest item metadata as JSON. Optional params: `tags[]=tag1&tags[]=tag2`
- `GET /api/item/latest/content` - Return the raw content of the latest item. Optional params: `tags[]=tag1&tags[]=tag2`
- `GET /api/item/latest` - Return the latest item as JSON. Optional params: `tags=tag1,tag2`, `allow_binary=true|false`
- `GET /api/item/latest/meta` - Return the latest item metadata as JSON. Optional params: `tags=tag1,tag2`
- `GET /api/item/latest/content` - Return the raw content of the latest item (**streamed**). Optional params: `tags=tag1,tag2`, `decompress=true|false`
- `GET /api/item/<#>` - Return the item as JSON. Optional params: `allow_binary=true|false`
- `GET /api/item/<#>/meta` - Return the item metadata as JSON
- `GET /api/item/<#>/content` - Return the raw content of the item
- `GET /api/item/<#>/content` - Return the raw content of the item (**streamed**). Optional params: `decompress=true|false`
- `GET /api/diff` - Diff two items. Params: `id_a`, `id_b` (individual items capped at 10 MB)
### Server Configuration
- `max_body_size` - Maximum POST body size in bytes (default: unlimited). When exceeded, server returns `413 PAYLOAD_TOO_LARGE` while keeping the partial item already saved through the streaming pipeline. Set to `0` for unlimited.
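The nonfatal cap can be sketched as a bounded copy — a hypothetical helper, not the server's actual code; it assumes `limit > 0` (the real setting treats `0` as unlimited):

```rust
use std::io::{Read, Write};

// Copy through a fixed buffer up to `limit` bytes; on overflow stop
// reading and report truncation, mirroring "413 but keep the partial
// item already saved".
fn copy_with_limit<R: Read, W: Write>(
    r: &mut R,
    w: &mut W,
    limit: u64,
) -> std::io::Result<(u64, bool)> {
    let mut buf = [0u8; 8192];
    let mut written = 0u64;
    loop {
        let n = r.read(&mut buf)?;
        if n == 0 {
            return Ok((written, false)); // EOF within the limit
        }
        let remaining = (limit - written) as usize;
        if n > remaining {
            w.write_all(&buf[..remaining])?;
            return Ok((written + remaining as u64, true)); // cap hit: 413, partial kept
        }
        w.write_all(&buf[..n])?;
        written += n as u64;
    }
}

fn main() {
    let body = vec![1u8; 20_000];
    let mut stored = Vec::new();
    let (written, truncated) = copy_with_limit(&mut body.as_slice(), &mut stored, 10_000).unwrap();
    assert_eq!(written, 10_000);
    assert!(truncated);
    assert_eq!(stored.len(), 10_000);
}
```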
### Server Modes
- **Plain HTTP** (default): `tokio::net::TcpListener` + `axum::serve()`
- **HTTPS** (with `tls` feature): `axum_server::bind_rustls()` with rustls when `--server-cert` and `--server-key` are provided
- Conditional selection at startup: cert+key present → HTTPS, otherwise → HTTP
### Client/Server Protocol
- Smart clients (keep CLI) set `compress=false` and `meta=false` on POST, handling compression and meta plugins locally
- Dumb clients (curl) use defaults (`compress=true`, `meta=true`), server handles everything
- Smart client update: sends `plugins` param to server, server runs plugins on stored content (avoids downloading compressed data)
- GET responses include `X-Keep-Compression` header when `decompress=false`
- Streaming save uses chunked transfer encoding for constant memory usage
- **Universal streaming**: All server paths (POST, GET, diff) use `PIPESIZE` (8192) byte buffers
- **413 partial item**: When `max_body_size` is exceeded, the server returns `413` but keeps the partial item already saved through the pipeline (nonfatal design — pipes continue normally)
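The async-channel-to-sync-reader bridge used by the streaming POST path can be sketched with a blocking stdlib channel. The real `MpscReader` bridges an async body stream, so this is an approximation, not the actual implementation:

```rust
use std::io::Read;
use std::sync::mpsc::{self, Receiver};

// A sync Read adapter over a channel of byte chunks: recv() blocks
// until the next chunk arrives; a dropped sender signals EOF.
struct MpscReader {
    rx: Receiver<Vec<u8>>,
    leftover: Vec<u8>,
}

impl Read for MpscReader {
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
        if self.leftover.is_empty() {
            match self.rx.recv() {
                Ok(chunk) => self.leftover = chunk,
                Err(_) => return Ok(0), // sender dropped => EOF
            }
        }
        let n = buf.len().min(self.leftover.len());
        buf[..n].copy_from_slice(&self.leftover[..n]);
        self.leftover.drain(..n);
        Ok(n)
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    // Producer side: chunks arriving from the HTTP body.
    tx.send(b"hello ".to_vec()).unwrap();
    tx.send(b"world".to_vec()).unwrap();
    drop(tx); // end of body

    let mut reader = MpscReader { rx, leftover: Vec::new() };
    let mut out = String::new();
    reader.read_to_string(&mut out).unwrap();
    assert_eq!(out, "hello world");
}
```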
### Authentication
- Bearer token authentication: `Authorization: Bearer <password>`
@@ -133,26 +178,25 @@
- None (no compression)
## Supported Meta Plugins
- FileMagic - File type detection using file command
- FileMime - MIME type detection using file command
- FileEncoding - File encoding detection using file command
- LineCount - Line count using wc command
- WordCount - Word count using wc command
- Cwd - Current working directory
- Binary - Binary file detection
- Uid - Current user ID
- User - Current username
- Gid - Current group ID
- Group - Current group name
- Shell - Shell path from SHELL environment variable
- ShellPid - Shell process ID from PPID environment variable
- KeepPid - Keep process ID
- DigestSha256 - SHA-256 digest
- DigestMd5 - MD5 digest using md5sum command
- ReadTime - Time taken to read data
- ReadRate - Rate of data reading
- Hostname - System hostname
- FullHostname - Fully qualified domain name
Meta plugins collect metadata during item save. Each plugin produces one or more key-value pairs:
- `magic_file` - File type detection using libmagic (when `magic` feature enabled)
- `infer` - MIME type detection using infer crate (when `infer` feature enabled)
- `tree_magic_mini` - MIME type detection using tree_magic_mini (when `tree_magic_mini` feature enabled)
- `tokens` - LLM token counting using tiktoken (when `tokens` feature enabled)
- `text` - Text analysis: line count, word count, char count, line average length
- `digest` - SHA-256 and MD5 checksums
- `hostname` - System hostname (full and short)
- `cwd` - Current working directory
- `user` - Current username and UID
- `shell` - Shell path from SHELL environment variable
- `shell_pid` - Shell process ID from PPID
- `keep_pid` - Keep process ID
- `env` - Arbitrary environment variables (via `KEEP_META_ENV_*` prefix)
- `exec` - Execute external commands for custom metadata
- `read_time` - Time taken to read content
- `read_rate` - Content read rate (bytes/second)
## Testing Strategy
- Unit tests for each module in `src/tests/`
@@ -173,5 +217,26 @@
- File permissions are restricted to user only (umask 077)
- Input validation for item IDs to prevent path traversal
- Authentication for server mode with bearer or basic auth
- TLS/HTTPS support via rustls when certificate and key are provided
- Proper resource cleanup using RAII patterns
- Safe handling of external processes with proper stdin/stdout management
- **Streaming architecture**: All server I/O uses fixed-size 8192-byte buffers; no full file contents held in memory
- **XSS protection**: All user-controlled data in HTML pages is escaped via `html-escape`
- **Security headers**: `X-Content-Type-Options: nosniff`, `X-Frame-Options: DENY`, `Referrer-Policy: strict-origin-when-cross-origin`
- **CORS**: Explicit allowed headers only (`Content-Type`, `Authorization`, `Accept`); no wildcard headers
- **Input limits**: Tags (256 chars), metadata keys (128 chars), metadata values (4096 chars), pagination (10,000 max)
- **Config file size**: 4 KB cap with `from_utf8_lossy` for safe UTF-8 handling
- **Error sanitization**: Internal errors never exposed in HTML responses
- **No `unsafe_code`**: Enforced via `#![deny(unsafe_code)]` (exceptions: `libc::umask` in main.rs, `unsafe impl Send` for `SendCookie` in magic_file.rs)
## Feature Flags
- `server` - HTTP REST API server (axum-based)
- `tls` - HTTPS/TLS support for server (axum-server + rustls)
- `client` - HTTP client for remote server (ureq-based, includes streaming save)
- `swagger` - OpenAPI/Swagger UI documentation
- `magic` - File type detection via libmagic
- `lz4` - LZ4 compression (internal)
- `gzip` - GZip compression (internal)
- `bzip2` - BZip2 compression (external)
- `xz` - XZ compression (external)
- `zstd` - ZStd compression (external)

67
Dockerfile Normal file

@@ -0,0 +1,67 @@
# Build stage
FROM rust:1.88-slim AS builder
RUN apt-get update && apt-get install -y --no-install-recommends \
cmake \
curl \
make \
gcc \
musl-tools \
pkg-config \
&& rm -rf /var/lib/apt/lists/*
RUN rustup target add x86_64-unknown-linux-musl
WORKDIR /app
# Copy manifests and fetch dependencies (cached layer)
COPY Cargo.toml Cargo.lock ./
RUN mkdir src && echo 'fn main() {}' > src/main.rs && echo '' > src/lib.rs
RUN cargo fetch --target x86_64-unknown-linux-musl
# Copy real source and build static binary
# magic feature excluded (requires shared libmagic; fallback uses `file` command)
COPY src/ src/
RUN cargo build --release --target x86_64-unknown-linux-musl \
--no-default-features --features lz4,gzip,server,swagger,client,tls \
&& strip target/x86_64-unknown-linux-musl/release/keep
# Runtime stage - scratch since binary is fully static
FROM scratch
COPY --from=builder /app/target/x86_64-unknown-linux-musl/release/keep /keep
COPY --from=builder /etc/ssl/certs/ /etc/ssl/certs/
EXPOSE 21080
# General options
# ENV KEEP_CONFIG=/config/config.yml
# Mount a volume for persistent storage: -v keep-data:/data
ENV KEEP_DIR=/data
ENV KEEP_LIST_FORMAT="id,time,size,tags,meta:hostname"
# Item options
# ENV KEEP_COMPRESSION=lz4
# ENV KEEP_META_PLUGINS=""
# ENV KEEP_FILTERS=""
# Server options
ENV KEEP_SERVER_ADDRESS=0.0.0.0
ENV KEEP_SERVER_PORT=21080
# ENV KEEP_SERVER_USERNAME="keep"
# ENV KEEP_SERVER_PASSWORD=""
# ENV KEEP_SERVER_PASSWORD_HASH=""
# ENV KEEP_SERVER_JWT_SECRET=""
# ENV KEEP_SERVER_JWT_SECRET_FILE=/config/jwt_secret
# TLS options
# ENV KEEP_SERVER_CERT=/certs/cert.pem
# ENV KEEP_SERVER_KEY=/certs/key.pem
# Client options
# ENV KEEP_CLIENT_URL=""
# ENV KEEP_CLIENT_USERNAME="keep"
# ENV KEEP_CLIENT_PASSWORD=""
# ENV KEEP_CLIENT_JWT=""
ENTRYPOINT ["/keep", "--server"]

LICENSE Normal file

@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2025 Andrew Phillips
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md

@@ -1,16 +1,957 @@
# Keep
A command-line utility for storing and retrieving temporary data with automatic compression, metadata extraction, and querying. Pipe any output into `keep` for organized storage — no more losing data in `/tmp` files with cryptic names.
```sh
# Instead of this:
curl -s https://api.example.com/data > /tmp/api-data.json
# Do this:
curl -s https://api.example.com/data | keep --save api-data
keep --get api-data
```
## Table of Contents
- [Features](#features)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Usage](#usage)
- [Save Mode](#save-mode)
- [Get Mode](#get-mode)
- [List Mode](#list-mode)
- [Info Mode](#info-mode)
- [Update Mode](#update-mode)
- [Delete Mode](#delete-mode)
- [Diff Mode](#diff-mode)
- [Status Mode](#status-mode)
- [Filters](#filters)
- [Compression](#compression)
- [Meta Plugins](#meta-plugins)
- [Configuration](#configuration)
- [Client/Server Mode](#clientserver-mode)
- [Server Mode](#server-mode)
- [Client Mode](#client-mode)
- [API Endpoints](#api-endpoints)
- [Shell Integration](#shell-integration)
- [Feature Flags](#feature-flags)
- [License](#license)
## Features
- **Store and retrieve** — Save content with tags, retrieve by ID or tag
- **Automatic compression** — LZ4, GZip, BZip2, XZ, ZStd support
- **Metadata plugins** — Auto-extract file type, digests, hostname, user info, and more
- **Filters** — Apply transformations (head, tail, grep, strip ANSI) on retrieval
- **Querying** — List, search, and diff items with flexible formatting
- **Client/server architecture** — Optional HTTP server with streaming support
- **Modular design** — Extensible plugin system for compression, metadata, and filtering
## Installation
### From Source
Requires Rust and Cargo.
```sh
cargo build --release
```
### Install via Cargo
```sh
cargo install --path .
```
### Static Binary (Linux)
```sh
./build-static.bash
# Binary at bin/keep
```
### Environment Module
A TCL modulefile is provided at `modulefile`. To use it, copy or symlink the project directory into your modules path:
```sh
# Symlink into an existing module path (e.g., /usr/local/modules)
ln -s /path/to/keep /usr/local/modules/keep
# Load the module
module load keep
# Verify
keep --status
# Source the shell profile (optional, for shell integration)
source $KEEP_BASH_PROFILE # bash
source $KEEP_ZSH_PROFILE # zsh
source $KEEP_SH_PROFILE # sh/dash/ksh
source $KEEP_CSH_PROFILE # csh/tcsh
```
The modulefile prepends `keep/bin` to `PATH` and sets shell-specific profile variables:
| Variable | Profile | Shell |
|----------|---------|-------|
| `KEEP_BASH_PROFILE` | `profile.bash` | bash |
| `KEEP_ZSH_PROFILE` | `profile.zsh` | zsh |
| `KEEP_SH_PROFILE` | `profile.sh` | sh, dash, ksh93, pdksh, mksh |
| `KEEP_CSH_PROFILE` | `profile.csh` | csh, tcsh |
### Shell Completion
Tab completion is available for `bash`, `zsh`, `fish`, `elvish`, and `powershell`. Completions for `@` (save) and `@@` (get) are available for `bash` and `zsh` only.
**Bash** — add to `~/.bashrc`:
```sh
. <(keep --generate-completion bash)
```
**Zsh** — add to `~/.zshrc`:
```sh
. <(keep --generate-completion zsh)
```
**With `profile.bash` or `profile.zsh`**: Completions for `keep`, `@` (save), and `@@` (get) are loaded automatically when sourcing the profile.
### Build with Server/Client Features
```sh
# Server only
cargo build --release --features server
# Client only (for connecting to a remote keep server)
cargo build --release --features client
# Server + client + all optional features
cargo build --release --features server,client,swagger
```
## Quick Start
```sh
# Save content with a tag (--save is optional when piping)
echo "Hello, world!" | keep greeting
# Retrieve by ID (--get is optional for numeric IDs)
keep 1
# Retrieve by tag (--get is required for tags)
keep --get greeting
# List all stored items
keep --list
# Get item details
keep --info greeting
# Delete by ID
keep --delete 1
```
### Real-World Examples
```sh
# Save API response
curl -s https://api.github.com/repos/user/repo | keep --save repo-info
# Save test output with metadata
npm test 2>&1 | keep --save test-results --meta project=myapp --meta env=staging
# Chain commands: process and store
cat data.csv | sort | uniq | keep --save cleaned-data
# Diff two versions
keep --diff 1 5
# Get first 20 lines of an item
keep --get 1 --filters "head_lines(20)"
# List items from a specific project
keep --list --meta project=myapp
```
## Usage
### Save Mode
Save stdin content with tags and metadata. The `--save` flag is optional when piping content.
```sh
# Save (auto-assigned ID, no tag)
echo "data" | keep --save
# Save with a tag (--save is optional when piping)
echo "data" | keep --save my-tag
echo "data" | keep my-tag
# Save with multiple tags and metadata
cat report.pdf | keep --save report --meta project=alpha --meta env=prod
# Specify compression
echo "data" | keep --save my-tag --compression gzip
```
Tags and metadata make items easy to find later. Tags are simple identifiers; metadata is key-value pairs.
### Get Mode
Retrieve items by ID. This is the default mode when numeric IDs are provided.
```sh
# Get by ID (no --get needed for numeric IDs)
keep --get 1
keep 1
# Get by tag (requires --get flag)
keep --get my-tag
# Get with filters applied
keep --get 1 --filters "head_lines(10)"
# Get by metadata filter
keep --get --meta project=alpha
# Force binary output to TTY (override safety check)
keep --get 1 --force
```
### List Mode
List stored items with filtering and formatting.
```sh
# List all items
keep --list
# List by tag
keep --list my-tag
# Filter by metadata
keep --list --meta env=prod
# Custom column format
keep --list --list-format "id,time,size,tags"
# JSON output for scripting
keep --list --output-format json
# Human-readable file sizes
keep --list --human-readable
```
### Info Mode
Show detailed information about an item.
```sh
keep --info 1
keep --info my-tag
keep --info --meta key=value
```
### Update Mode
Update an item's tags, metadata, and re-run meta plugins.
```sh
# Replace tags
keep --update 1 new-tag
# Update metadata
keep --update 1 --meta key=newvalue
# Remove a metadata key
keep --update 1 --meta key
# Re-run meta plugins on stored content
keep --update 1 --meta-plugin digest --meta-plugin text
```
### Delete Mode
Delete items by ID.
```sh
keep --delete 1
keep --delete 1 2 3
```
### Diff Mode
Show differences between two items.
```sh
keep --diff 1 2
```
### Status Mode
Show system status and supported features.
```sh
keep --status
keep --status-plugins
keep --status --verbose
```
## Filters
Apply transformations to item content during retrieval. Filters are chained with `|`.
```sh
# First 10 lines
keep --get 1 --filters "head_lines(10)"
# Skip first 5 lines, then grep for errors
keep --get 1 --filters "skip_lines(5)|grep(pattern=error)"
# Strip ANSI escape codes
keep --get 1 --filters "strip_ansi"
# Last 100 bytes
keep --get 1 --filters "tail_bytes(100)"
# Complex chain
keep --get 1 --filters "skip_lines(10)|grep(pattern=TODO)|head_lines(5)"
```
### Available Filters
| Filter | Description | Parameters |
|--------|-------------|------------|
| `head_bytes(n)` | First n bytes | `count` |
| `head_lines(n)` | First n lines | `count` |
| `tail_bytes(n)` | Last n bytes | `count` |
| `tail_lines(n)` | Last n lines | `count` |
| `skip_bytes(n)` | Skip first n bytes | `count` |
| `skip_lines(n)` | Skip first n lines | `count` |
| `grep(pattern)` | Filter matching lines | `pattern` (regex) |
| `strip_ansi` | Remove ANSI escape codes | none |
Set `KEEP_FILTERS` to apply a default filter chain to all retrievals.
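The chain syntax above is easy to model for scripting or testing purposes. The following is a hedged sketch — not keep's implementation — covering a line-based subset of the filters from the table:

```python
import re

def apply_filters(text: str, chain: str) -> str:
    # Model of the "name(arg)|name(arg)" chain syntax; implements a
    # subset of the filters from the table above (line-based only).
    lines = text.splitlines(keepends=True)
    for step in chain.split("|"):
        m = re.fullmatch(r"(\w+)(?:\((.*)\))?", step.strip())
        if m is None:
            raise ValueError(f"bad filter step: {step!r}")
        name, arg = m.groups()
        if name == "head_lines":
            lines = lines[: int(arg)]
        elif name == "tail_lines":
            lines = lines[-int(arg):]
        elif name == "skip_lines":
            lines = lines[int(arg):]
        elif name == "grep":
            # accepts either grep(pattern=foo) or grep(foo)
            pattern = arg.split("=", 1)[1] if "=" in arg else arg
            lines = [ln for ln in lines if re.search(pattern, ln)]
        else:
            raise ValueError(f"unsupported filter: {name}")
    return "".join(lines)

log = "".join(f"line {i}: {'ERROR' if i % 3 == 0 else 'ok'}\n" for i in range(10))
print(apply_filters(log, "skip_lines(2)|grep(pattern=ERROR)|head_lines(2)"), end="")
```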
## Compression
Items are compressed automatically on save. Default: LZ4.
| Algorithm | Type | Speed | Ratio |
|-----------|------|-------|-------|
| `lz4` | Internal | Fastest | Lower |
| `gzip` | Internal | Fast | Good |
| `bzip2` | External | Slow | Better |
| `xz` | External | Slowest | Best |
| `zstd` | External | Fast | Good |
| `raw` | Internal | N/A | N/A |
```sh
# Specify compression per item
echo "data" | keep --save my-tag --compression zstd
# Set default via environment
export KEEP_COMPRESSION=gzip
```
External compression programs (`bzip2`, `xz`, `zstd`) must be installed on the system.
## Meta Plugins
Metadata is automatically extracted when saving items.
| Plugin | Key | Description |
|--------|-----|-------------|
| `env` | `*` | Capture `KEEP_META_*` environment variables |
| `magic_file` | `file_type` | File type detection (requires `magic` feature) |
| `text` | `text_line_count`, `text_word_count` | Line and word counts |
| `user` | `user_uid`, `user_name`, `user_gid`, `user_group` | Current user info |
| `shell` | `shell` | Current shell path |
| `shell_pid` | `shell_pid` | Shell process ID |
| `keep_pid` | `keep_pid` | Keep process ID |
| `digest` | `digest_sha256`, `digest_md5` | Content digests |
| `read_time` | `read_time` | Time to read content |
| `read_rate` | `read_rate` | Data read rate |
| `hostname` | `hostname`, `hostname_short` | System hostname |
| `exec` | Custom | Run external commands for metadata |
| `cwd` | `cwd` | Current working directory |
```sh
# Use specific plugins (repeatable)
echo "data" | keep --save tag --meta-plugin digest --meta-plugin text --meta-plugin user
# Pass options to a plugin via JSON
echo "data" | keep --save tag --meta-plugin 'tokens:{"options":{"min_length":"2"}}'
# Capture custom metadata via environment
echo "data" | KEEP_META_project=alpha keep --save tag
# Combine environment and CLI metadata
echo "data" | KEEP_META_build=1234 keep --save tag --meta env=staging
```
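When scripting against stored items, the `digest` plugin's metadata can be used to verify retrieved content. A minimal sketch using only the standard library (the `digest_sha256` key comes from the table above):

```python
import hashlib

def verify_item(content: bytes, metadata: dict) -> bool:
    # Recompute the SHA-256 digest and compare with the stored metadata value.
    return hashlib.sha256(content).hexdigest() == metadata.get("digest_sha256")

content = b"Hello, world!\n"
meta = {"digest_sha256": hashlib.sha256(content).hexdigest()}
print(verify_item(content, meta))  # True
```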
## Configuration
### Environment Variables
| Variable | Description | Default |
|----------|-------------|---------|
| `KEEP_DIR` | Storage directory | `~/.keep` |
| `KEEP_CONFIG` | Config file path | `~/.config/keep/config.yml` |
| `KEEP_COMPRESSION` | Compression algorithm | `lz4` |
| `KEEP_META_PLUGINS` | Meta plugins to use (JSON format: `name[:{json}]`, comma-separated) | `env` |
| `KEEP_FILTERS` | Default filter chain | none |
| `KEEP_LIST_FORMAT` | List column format | built-in defaults |
| `KEEP_SERVER_ADDRESS` | Server bind address | `127.0.0.1` |
| `KEEP_SERVER_PORT` | Server port | `21080` |
| `KEEP_SERVER_USERNAME` | Server Basic auth username | `keep` |
| `KEEP_SERVER_PASSWORD` | Server password | none |
| `KEEP_SERVER_PASSWORD_HASH` | Server password hash | none |
| `KEEP_SERVER_JWT_SECRET` | JWT secret for token auth | none |
| `KEEP_SERVER_JWT_SECRET_FILE` | Path to JWT secret file | none |
| `KEEP_SERVER_MAX_BODY_SIZE` | Maximum POST body size in bytes (0=unlimited) | unlimited |
| `KEEP_SERVER_CERT` | TLS certificate file path (PEM) | none |
| `KEEP_SERVER_KEY` | TLS private key file path (PEM) | none |
| `KEEP_CLIENT_URL` | Remote keep server URL | none |
| `KEEP_CLIENT_USERNAME` | Remote server username | `keep` |
| `KEEP_CLIENT_PASSWORD` | Remote server password | none |
| `KEEP_CLIENT_JWT` | JWT token for remote server | none |
Any config setting can be overridden with `KEEP__<SETTING>` environment variables (double underscore separator).
### Configuration File
Default location: `~/.config/keep/config.yml`
Generate a default configuration:
```sh
keep --generate-config > ~/.config/keep/config.yml
```
```yaml
# Storage directory
dir: ~/.keep
# List view columns
list_format:
  - name: id
    label: "Item"
    align: right
  - name: time
    label: "Time"
    align: right
  - name: size
    label: "Size"
    align: right
  - name: tags
    label: "Tags"
    align: left
# Table styling
table_config:
  style: utf8_full
  content_arrangement: dynamic
# Default compression
compression_plugin:
  name: gzip
# Default meta plugins
meta_plugins:
  - name: env
  - name: digest
    options:
      algorithm: sha256
# Server settings
server:
  address: "127.0.0.1"
  port: 21080
  username: "keep"
  password: "secret"
  # Maximum POST body size in bytes (0 = unlimited)
  # max_body_size: 52428800  # 50 MB
  # JWT authentication (takes priority over password)
  # jwt_secret: "my-secret-key"
  # jwt_secret_file: /path/to/jwt_secret
  # TLS (requires tls feature)
  # cert_file: /path/to/cert.pem
  # key_file: /path/to/key.pem
# Client settings
client:
  url: "http://localhost:21080"
  username: "keep"
  password: "secret"
  # Or use JWT token
  # jwt: "eyJhbGciOiJIUzI1NiIs..."
human_readable: true
quiet: false
force: false
```
## Client/Server Mode
Keep supports a client/server architecture where one machine runs a keep server and other machines connect as clients. This is useful for:
- Centralizing stored data across multiple machines
- Sharing items between team members
- Offloading storage to a dedicated server
- Piping data from long-running processes without local storage
### Server Mode
Start an HTTP REST API server:
```sh
# Default: 127.0.0.1:21080
keep --server
# Custom address and port
keep --server --server-address 0.0.0.0 --server-port 8080
# With password authentication
keep --server --server-password mypassword
# With custom username
keep --server --server-username admin --server-password mypassword
# With JWT authentication
keep --server --server-jwt-secret my-secret-key
```
#### JWT Authentication
JWT (JSON Web Token) authentication provides permission-based access control. When a JWT secret is configured, the server validates tokens and checks permission claims for each request.
**Configuration:**
```sh
# Via CLI flag
keep --server --server-jwt-secret my-secret-key
# Via environment variable
export KEEP_SERVER_JWT_SECRET=my-secret-key
keep --server
# Via config file (config.yml)
server:
  jwt_secret: "my-secret-key"
# Via secret file (for Docker/secrets management)
keep --server --server-jwt-secret-file /path/to/secret
```
**Token format:**
JWTs must use HS256 algorithm with the following claims:
| Claim | Type | Required | Description |
|-------|------|----------|-------------|
| `sub` | string | Yes | Subject (client identifier) |
| `exp` | number | Yes | Expiration time (Unix timestamp) |
| `read` | boolean | No | Permission for GET requests (default: false) |
| `write` | boolean | No | Permission for POST/PUT requests (default: false) |
| `delete` | boolean | No | Permission for DELETE requests (default: false) |
**Permission mapping:**
| HTTP Method | Required Permission |
|-------------|-------------------|
| `GET` | `read` |
| `POST`, `PUT`, `PATCH` | `write` |
| `DELETE` | `delete` |
**Example token payload:**
```json
{
"sub": "ci-pipeline",
"exp": 1735689600,
"read": true,
"write": true,
"delete": false
}
```
**Generating tokens:**
The server does not generate tokens — use any JWT library or tool:
```sh
# Using jwt-cli (https://github.com/mike-engel/jwt-cli)
jwt encode --secret my-secret-key \
--exp=$(date -d '+24 hours' +%s) \
'{"sub":"my-client","read":true,"write":true,"delete":false}'
# Using Python
python3 -c "
import jwt, time
token = jwt.encode({
'sub': 'my-client',
'exp': int(time.time()) + 86400,
'read': True, 'write': True, 'delete': False
}, 'my-secret-key', algorithm='HS256')
print(token)
"
```
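If neither jwt-cli nor PyJWT is available, an HS256 token can be assembled with only the Python standard library. A sketch — the claims follow the table above, and `my-secret-key`/`my-client` are placeholder values:

```python
import base64
import hashlib
import hmac
import json
import time

def b64url(raw: bytes) -> str:
    # JWT uses unpadded base64url encoding for all three segments.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

def make_token(secret: str, sub: str, ttl: int = 86400, *,
               read: bool = False, write: bool = False,
               delete: bool = False) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps({
        "sub": sub,
        "exp": int(time.time()) + ttl,
        "read": read, "write": write, "delete": delete,
    }).encode())
    signature = b64url(hmac.new(secret.encode(),
                                f"{header}.{payload}".encode(),
                                hashlib.sha256).digest())
    return f"{header}.{payload}.{signature}"

print(make_token("my-secret-key", "my-client", read=True, write=True))
```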
**Using tokens:**
```sh
# With curl
curl -H "Authorization: Bearer <jwt-token>" http://localhost:21080/api/item/
# The keep client uses --client-jwt for JWT tokens
keep --client-url http://server:21080 --client-jwt <jwt-token> --save my-tag
```
**Response codes:**
| Code | Meaning |
|------|---------|
| `200` | Authorized |
| `401` | Missing, invalid, or expired token |
| `403` | Valid token but insufficient permissions |
**Notes:**
- When `jwt_secret` is set, password authentication is disabled — all requests must present a valid JWT Bearer token
- JWT and password authentication are mutually exclusive — when both `jwt_secret` and `password` are configured, only JWT is used
- Permission fields default to `false` if omitted — tokens must explicitly grant permissions
- JWT authentication requires the `server` feature (jsonwebtoken is included automatically)
#### HTTPS / TLS
Build with the `tls` feature to enable HTTPS:
```sh
cargo build --release --features server,tls
```
Provide a TLS certificate and private key (both PEM format):
```sh
# Via CLI flags
keep --server \
--server-cert /path/to/cert.pem \
--server-key /path/to/key.pem
# Via environment variables
export KEEP_SERVER_CERT=/path/to/cert.pem
export KEEP_SERVER_KEY=/path/to/key.pem
keep --server
# Via config file (config.yml)
server:
  cert_file: /path/to/cert.pem
  key_file: /path/to/key.pem
```
When cert and key are provided, the server listens with HTTPS. Without them, it falls back to plain HTTP. The port is controlled by `--server-port` (default: 21080).
**Self-signed certificates** (for development):
```sh
# Generate a self-signed cert
openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem \
-days 365 -nodes -subj "/CN=localhost"
# Start server with self-signed cert
keep --server --server-cert cert.pem --server-key key.pem
# Connect client with HTTPS
keep --client-url https://localhost:21080 --save my-tag
```
The server accepts data from both dumb clients (raw HTTP/curl) and smart clients (the keep CLI).
#### Server Streaming
The server streams all data through fixed-size buffers (8192 bytes). At no point is the entire file content held in memory.
- **POST**: Body streams through the compression and storage pipeline in chunks. When `max_body_size` is exceeded, the server returns `413 Payload Too Large`; the partial item already written through the pipeline is kept.
- **GET**: Content streams from disk through decompression to the client using the same fixed-size buffers.
- **Diff**: Individual items are capped at 10 MB for the diff endpoint to prevent unbounded memory use.
##### Max Body Size
Control the maximum accepted body size with:
```sh
# Via CLI flag (bytes)
keep --server --server-max-body-size 52428800
# Via environment variable
export KEEP_SERVER_MAX_BODY_SIZE=52428800
keep --server
# Via config file (config.yml)
server:
  max_body_size: 52428800  # 50 MB
```
When set to `0` or omitted, no limit is enforced.
#### Server Query Parameters
The server supports query parameters that control processing:
| Parameter | Default | Description |
|-----------|---------|-------------|
| `tags` | none | Comma-separated tags |
| `metadata` | none | JSON-encoded metadata |
| `compress` | `true` | `false` = client already compressed, store as-is |
| `meta` | `true` | `false` = client handles metadata, skip server-side plugins |
| `decompress` | `true` | `false` = return raw compressed bytes on GET |
The `POST /api/item/{id}/update` endpoint accepts additional parameters:
| Parameter | Default | Description |
|-----------|---------|-------------|
| `plugins` | none | Comma-separated plugin names to re-run on stored content |
| `metadata` | none | JSON-encoded metadata overrides to apply |
| `tags` | none | Comma-separated tags to add (idempotent) |
When using a smart client, these are set automatically. For curl, the server handles everything by default.
#### Example: Curl as a Dumb Client
```sh
# Save (server handles compression and metadata)
curl -X POST -d "my data" http://localhost:21080/api/item/?tags=my-tag
# Retrieve (server decompresses)
curl http://localhost:21080/api/item/1/content
# Save compressed (client handles compression, server skips)
gzip -c data.txt | curl -X POST -d @- "http://localhost:21080/api/item/?compress=false&tags=my-tag"
```
### Client Mode
The keep CLI can connect to a remote server as a smart client. Build with the `client` feature:
```sh
cargo build --release --features client
```
```sh
# Set server URL via flag or environment
keep --client-url http://server:21080 --save my-tag
export KEEP_CLIENT_URL=http://server:21080
# With password authentication
keep --client-url http://server:21080 --client-password mypassword --save my-tag
export KEEP_CLIENT_PASSWORD=mypassword
# With custom username
keep --client-url http://server:21080 --client-username admin --client-password mypassword --save my-tag
# With JWT authentication
keep --client-url http://server:21080 --client-jwt <jwt-token> --save my-tag
export KEEP_CLIENT_JWT=<jwt-token>
```
#### How Client Mode Works
Client mode uses **local plugins** and **remote storage**:
1. **Save**: Local compression and meta plugins run on the client; compressed data streams to the server. Smart clients set `meta=false` so the server skips its own plugins.
2. **Get**: Server sends raw compressed data; client decompresses locally and applies filters.
3. **Update**: Meta plugins run on the server to avoid downloading compressed data for re-processing.
4. **Other operations** (list, info, delete, diff): Delegated directly to the server.
This means client behavior is consistent with local mode — the same compression settings and filters apply.
#### Streaming Architecture
Client save uses a 3-thread streaming pipeline for constant memory usage regardless of data size:
```
┌───────────────────┐ OS pipe ┌────────────────┐
│ Reader thread ├──────────────────┤ Streamer thread│
│ │ (compressed │ │
│ stdin → tee │ bytes) │ pipe → POST │
│ → hash │ │ (chunked) │
│ → compress │ │ │
│ → meta plugins │ │ │
└───────────────────┘ └────────────────┘
│ │
▼ ▼
stdout + Server stores blob
computed metadata
```
- **Reader thread**: Reads stdin, tees output to stdout, computes SHA-256 via digest plugin, compresses data, runs meta plugins (hostname, text, etc.), writes to OS pipe
- **Streamer thread**: Reads compressed bytes from pipe, streams to server via chunked HTTP POST
- **Main thread**: After streaming completes, sends plugin-collected metadata to server
Memory usage is O(PIPESIZE) — typically 8 KB — regardless of how much data is being stored.
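The pipeline above can be illustrated with a toy model: one thread hashes and compresses fixed-size chunks into an OS pipe while a second thread drains the pipe (standing in for the chunked HTTP POST). This is a sketch using zlib in place of keep's compression plugins, not keep's actual code:

```python
import hashlib
import os
import threading
import zlib

CHUNK = 8192  # fixed-size buffer, mirroring the pipeline description

def reader(data: bytes, wfd: int, meta: dict) -> None:
    """Reader thread: hash and compress the input, write to the OS pipe."""
    out = os.fdopen(wfd, "wb")  # file object handles partial pipe writes
    comp = zlib.compressobj()
    digest = hashlib.sha256()
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        digest.update(chunk)              # tee: metadata computed in-stream
        out.write(comp.compress(chunk))
    out.write(comp.flush())
    out.close()
    meta["digest_sha256"] = digest.hexdigest()

def streamer(rfd: int, blob: bytearray) -> None:
    """Streamer thread: drain compressed bytes (stand-in for chunked POST)."""
    inp = os.fdopen(rfd, "rb")
    while buf := inp.read(CHUNK):
        blob.extend(buf)
    inp.close()

payload = b"build log line\n" * 10_000
rfd, wfd = os.pipe()
meta, blob = {}, bytearray()
threads = [
    threading.Thread(target=reader, args=(payload, wfd, meta)),
    threading.Thread(target=streamer, args=(rfd, blob)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(payload), "->", len(blob), meta["digest_sha256"][:12])
```

At no point does either thread hold more than one chunk plus the pipe buffer, which is the property the real pipeline relies on for constant memory usage.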
#### Example: Remote Pipeline
```sh
# On a build server, pipe logs to a central keep server
make build 2>&1 | keep --client-url http://logserver:21080 \
--save build-logs \
--meta project=myapp \
--meta branch=$(git branch --show-current)
# Retrieve from any machine
keep --client-url http://logserver:21080 --get build-logs
# List recent builds from a specific project
keep --client-url http://logserver:21080 --list --meta project=myapp
```
### API Endpoints
| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/status` | System status |
| `GET` | `/api/plugins/status` | Plugin status |
| `GET` | `/api/item/` | List items (`tags`, `order`, `start`, `count` params) |
| `POST` | `/api/item/` | Create item (body: raw content, params: `tags`, `metadata`, `compress`, `meta`) |
| `GET` | `/api/item/latest/content` | Latest item content |
| `GET` | `/api/item/latest/meta` | Latest item metadata |
| `GET` | `/api/item/{id}` | Item info by ID |
| `GET` | `/api/item/{id}/content` | Item content by ID |
| `GET` | `/api/item/{id}/meta` | Item metadata by ID |
| `GET` | `/api/item/{id}/info` | Item info by ID |
| `POST` | `/api/item/{id}/meta` | Add metadata to existing item (body: JSON object) |
| `POST` | `/api/item/{id}/update` | Re-run meta plugins on stored content (params: `plugins`, `metadata`, `tags`) |
| `DELETE` | `/api/item/{id}` | Delete item by ID |
| `GET` | `/api/diff` | Diff two items (`id_a`, `id_b` params) |
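For scripted access, the list-endpoint parameters from the table compose with standard URL encoding. A sketch assuming a server at `localhost:21080` (the `desc` value for `order` is an assumption, not documented above):

```python
from urllib.parse import urlencode

base = "http://localhost:21080"  # assumed server address
params = {"tags": "build-logs", "order": "desc", "start": 0, "count": 10}
url = f"{base}/api/item/?{urlencode(params)}"
print(url)
```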
#### Authentication
The server supports three authentication modes:
**1. Password (HTTP Basic auth):**
```sh
# Default username is "keep"
curl -u keep:mypassword http://localhost:21080/api/status
# Custom username
curl -u admin:mypassword http://localhost:21080/api/status
```
**2. JWT (permission-based):**
```sh
# Valid JWT with read permission allows GET requests
curl -H "Authorization: Bearer <jwt-token>" http://localhost:21080/api/item/
```
See [JWT Authentication](#jwt-authentication) for token format and configuration.
**3. No authentication:**
When neither password nor JWT secret is configured, authentication is disabled.
#### Swagger UI
Build with the `swagger` feature to enable OpenAPI documentation:
```sh
cargo build --features server,swagger
```
Swagger UI available at `/swagger`, OpenAPI spec at `/openapi.json`.
#### Security
The server applies the following security measures:
- **Input validation**: Item IDs are validated as positive integers; tags and metadata have length limits (256 and 128 characters respectively).
- **XSS protection**: All user-controlled data rendered into HTML pages is escaped.
- **Security headers**: Responses include `X-Content-Type-Options: nosniff`, `X-Frame-Options: DENY`, and `Referrer-Policy: strict-origin-when-cross-origin`.
- **CORS**: Explicit allowed headers (`Content-Type`, `Authorization`, `Accept`); no wildcard headers.
- **Path traversal**: Item IDs are validated to prevent directory traversal attacks.
- **Internal errors**: Internal error details are never exposed in HTML responses — only generic messages are shown.
## Shell Integration
Profile scripts are provided for several shells. Source the appropriate one to enable shell integration:
| Profile | Shells | Features |
|---------|--------|----------|
| `profile.bash` | bash | Preexec hook, wrapper function, `@`/`@@` aliases, tab completions |
| `profile.zsh` | zsh | Preexec hook, wrapper function, `@`/`@@` aliases, tab completions |
| `profile.sh` | sh, dash, ksh93, pdksh, mksh | Wrapper function, `@`/`@@` aliases |
| `profile.csh` | csh, tcsh | Alias-based `keep` wrapper, `@`/`@@` aliases |
```sh
# bash
source /path/to/keep/profile.bash
# zsh
source /path/to/keep/profile.zsh
# sh, dash, ksh
source /path/to/keep/profile.sh
# csh/tcsh
source /path/to/keep/profile.csh
```
All profiles provide:
- **`@` alias** — Shorthand for `keep --save`
- **`@@` alias** — Shorthand for `keep --get`
Bash and zsh profiles additionally provide:
- **`keep` function** — Captures the current command in metadata automatically
- **Tab completion** — For `keep`, `@`, and `@@`
```sh
# Save with automatic command capture (bash/zsh)
curl -s api.example.com | @ api-response
# Quick retrieve
@@ api-response
```
## Feature Flags
| Feature | Default | Description |
|---------|---------|-------------|
| `magic` | Yes | File type detection via libmagic |
| `lz4` | Yes | LZ4 compression (internal) |
| `gzip` | Yes | GZip compression (internal) |
| `server` | No | HTTP REST API server |
| `tls` | No | HTTPS/TLS server support (requires `server`) |
| `client` | No | HTTP client for remote server |
| `swagger` | No | Swagger UI for API docs |
| `bzip2` | No | BZip2 compression (external program) |
| `xz` | No | XZ compression (external program) |
| `zstd` | No | ZStd compression (external program) |
```sh
# Server with Swagger UI
cargo build --features server,swagger
# Server with HTTPS
cargo build --features server,tls
# Client only
cargo build --features client
# Everything
cargo build --features server,tls,client,swagger,magic
```
## License
MIT License - see [LICENSE](LICENSE) for details.
## Contact
Andrew Phillips - andrew@gt0.ca


@@ -1,141 +0,0 @@
#+TITLE: Keep
#+AUTHOR: Andrew Phillips
* Introduction
Keep is a command-line utility designed to manage temporary files created on the command line. Instead of redirecting output to a temporary file (e.g., =command > ~/whatever.tmp=), you can use =keep= to handle the temporary files for you (e.g., =command | keep=).
* Installation
To install Keep, you need to have Rust and Cargo installed on your system. You can then build and install Keep using the following commands:
#+BEGIN_SRC sh
cargo build --release
cargo install --path .
#+END_SRC
* Usage
Keep provides several subcommands to manage temporary files. Below are some examples of how to use Keep.
** Saving an Item
To save an item with tags and metadata, you can use the =--save= option:
#+BEGIN_SRC sh
echo "Hello, world!" | keep --save example --meta key=value
#+END_SRC
** Getting an Item
To retrieve an item by its ID or by matching tags and metadata, you can use the =--get= option:
#+BEGIN_SRC sh
keep --get 1
keep --get example
keep --get --meta key=value
keep 1
keep example
#+END_SRC
** Listing Items
To list all items or filter them by tags and metadata, you can use the =--list= option:
#+BEGIN_SRC sh
keep --list
keep --list example
keep --list --meta key=value
#+END_SRC
** Updating an Item
To update an item's tags and metadata, you can use the =--update= option:
#+BEGIN_SRC sh
keep --update 1 newtag --meta key=newvalue
#+END_SRC
** Deleting an Item
To delete an item by its ID or by matching tags, you can use the =--delete= option:
#+BEGIN_SRC sh
keep --delete 1
keep --delete example
#+END_SRC
** Showing Status
To show the status of directories and supported compression algorithms, you can use the =--status= option:
#+BEGIN_SRC sh
keep --status
#+END_SRC
** Diffing Items
To show a diff between two items by ID, you can use the =--diff= option:
#+BEGIN_SRC sh
keep --diff 1 2
#+END_SRC
** Getting Information About an Item
To get detailed information about an item by its ID or by matching tags and metadata, you can use the =--info= option:
#+BEGIN_SRC sh
keep --info 1
keep --info example
keep --info --meta key=value
#+END_SRC
* Configuration
Keep can be configured using environment variables and command-line options. The following environment variables are supported:
- =KEEP_DIR=: Specify the directory to use for storage.
- =KEEP_LIST_FORMAT=: A comma-separated list of columns to display with =--list=.
- =KEEP_DIGEST=: Digest algorithm to use when saving items.
- =KEEP_COMPRESSION=: Compression algorithm to use when saving items.
* Examples
Here are some examples of how to use Keep with different options:
** Saving an Item with Compression and Digest
#+BEGIN_SRC sh
echo "Hello, world!" | keep --save example --meta key=value --compression gzip --digest sha256
#+END_SRC
** Getting an Item with Human-Readable Sizes
#+BEGIN_SRC sh
keep --get 1 --human-readable
#+END_SRC
** Listing Items with Custom Format
#+BEGIN_SRC sh
keep --list --list-format "id,time,size,tags,meta:hostname"
#+END_SRC
** Updating an Item with New Tags and Metadata
#+BEGIN_SRC sh
keep --update 1 newtag --meta key=newvalue
#+END_SRC
** Deleting an Item by Tag
#+BEGIN_SRC sh
keep --delete example
#+END_SRC
** Showing Status with Verbose Output
#+BEGIN_SRC sh
keep --status --verbose
#+END_SRC
** Diffing Items with IDs
#+BEGIN_SRC sh
keep --diff 1 2
#+END_SRC
** Getting Information About an Item with Metadata
#+BEGIN_SRC sh
keep --info 1
#+END_SRC
* License
Keep is licensed under the MIT License. See the LICENSE file for more details.
* Contributing
Contributions are welcome! Please open an issue or submit a pull request on the GitHub repository.
* Contact
For more information, please contact Andrew Phillips at andrew@gt0.ca.


@@ -2,7 +2,6 @@
set -ex
export RUSTFLAGS='-C target-feature=+crt-static'
cargo build --release --target x86_64-unknown-linux-gnu
cargo build --release --target x86_64-unknown-linux-musl
mkdir -p bin
cp target/x86_64-unknown-linux-gnu/release/keep ./bin/
cp target/x86_64-unknown-linux-musl/release/keep ./bin/

docker-compose.yml Normal file

@@ -0,0 +1,32 @@
services:
keep:
build: .
ports:
- "21080:21080"
volumes:
- keep-data:/data
- keep-config:/config
environment:
- KEEP_SERVER_ADDRESS=0.0.0.0
- KEEP_SERVER_PORT=21080
# - KEEP_SERVER_USERNAME=keep
# - KEEP_SERVER_PASSWORD=changeme
# - KEEP_SERVER_PASSWORD_HASH=
# - KEEP_SERVER_JWT_SECRET=
# - KEEP_SERVER_JWT_SECRET_FILE=/config/jwt_secret
# - KEEP_COMPRESSION=lz4
# - KEEP_META_PLUGINS=
# - KEEP_FILTERS=
- KEEP_CONFIG=/config/config.yml
# - KEEP_SERVER_CERT=/certs/cert.pem
# - KEEP_SERVER_KEY=/certs/key.pem
# - KEEP_CLIENT_USERNAME=keep
# - KEEP_CLIENT_JWT=""
restart: unless-stopped
# For TLS, mount certificate files:
# volumes:
# - ./certs:/certs:ro
volumes:
keep-data:
keep-config:


@@ -15,3 +15,6 @@ module-whatis Keep
prepend-path PATH $mydir/bin
setenv KEEP_BASH_PROFILE ${mydir}/profile.bash
setenv KEEP_ZSH_PROFILE ${mydir}/profile.zsh
setenv KEEP_SH_PROFILE ${mydir}/profile.sh
setenv KEEP_CSH_PROFILE ${mydir}/profile.csh


@@ -6,18 +6,10 @@ function __keep_preexec {
}
function __keep_preexec_init {
local found=false
local f
for f in "${preexec_functions[@]}"; do
if [[ $f = __keep_preexec ]]; then
found=true
break
fi
[[ $f = __keep_preexec ]] && return
done
if [[ $found = false ]]; then
preexec_functions+=(__keep_preexec)
fi
}
function keep {
@@ -40,4 +32,20 @@ function @@ {
keep --get "$@"
}
# Shell completions
. <(command keep --generate-completion bash)
___keep_complete() {
local mode="$1"
COMP_WORDS=(keep "$mode" "${COMP_WORDS[@]:1}")
COMP_CWORD=$((COMP_CWORD + 1))
_keep
}
___keep_save_completion() { ___keep_complete --save; }
___keep_get_completion() { ___keep_complete --get; }
complete -F ___keep_save_completion @
complete -F ___keep_get_completion @@
__keep_preexec_init

profile.csh Normal file

@@ -0,0 +1,11 @@
#!/bin/csh
# Profile for csh and tcsh.
# Preexec hooks are not available; KEEP_META_command is not set.
if ( ! $?KEEP_META_tty ) then
setenv KEEP_META_tty `tty`
endif
alias keep 'env KEEP_META_tty=${KEEP_META_tty} command keep \!*'
alias @ 'keep --save \!*'
alias @@ 'keep --get \!*'

profile.sh Normal file

@@ -0,0 +1,13 @@
#!/bin/sh
# POSIX-compatible profile for sh, dash, ksh93, pdksh, mksh, and other POSIX shells.
# Preexec hooks are not available in these shells; KEEP_META_command is not set.
KEEP_META_tty=${KEEP_META_tty:-$(tty)}
keep() {
export KEEP_META_tty
command keep "$@"
}
alias @='keep --save'
alias @@='keep --get'

profile.zsh Normal file

@@ -0,0 +1,38 @@
#!/bin/zsh
autoload -U add-zsh-hook
__keep_preexec() {
KEEP_META_command="$1"
KEEP_META_tty=${KEEP_META_tty:-$(tty)}
}
add-zsh-hook preexec __keep_preexec
keep() {
if [[ $ZSH_SUBSHELL -le 2 ]]; then
export KEEP_META_command
fi
export KEEP_META_tty
command keep "$@"
}
alias @='keep --save'
alias @@='keep --get'
# Shell completions
. <(command keep --generate-completion zsh)
___keep_complete() {
local mode="$1"
local -a words
words=(keep "$mode" "${words[@]:1}")
((CURRENT++))
_keep
}
___keep_save_completion() { ___keep_complete --save; }
___keep_get_completion() { ___keep_complete --get; }
compdef ___keep_save_completion @
compdef ___keep_get_completion @@


@@ -2,6 +2,7 @@ use std::path::PathBuf;
use std::str::FromStr;
use clap::*;
use clap_complete::Shell;
/// Main struct for command-line arguments, parsed via Clap.
#[derive(Parser, Debug, Clone)]
@@ -23,58 +24,155 @@ pub struct Args {
/// Struct for mode-specific arguments, defining CLI flags for different operations.
#[derive(Parser, Debug, Clone)]
pub struct ModeArgs {
#[arg(group("mode"), help_heading("Mode Options"), short, long, conflicts_with_all(["get", "diff", "list", "delete", "info", "status"]))]
#[arg(group("mode"), help_heading("Mode Options"), short, long)]
#[arg(help("Save an item using any tags or metadata provided"))]
pub save: bool,
#[arg(group("mode"), help_heading("Mode Options"), short, long, conflicts_with_all(["save", "diff", "list", "delete", "info", "status"]))]
#[arg(help(
"Get an item either by it's ID or by a combination of matching tags and metatdata"
))]
#[arg(group("mode"), help_heading("Mode Options"), short, long)]
#[arg(help("Get an item either by its ID or by a combination of matching tags and metadata"))]
pub get: bool,
#[arg(group("mode"), help_heading("Mode Options"), long, conflicts_with_all(["save", "get", "list", "delete", "info", "status"]))]
#[arg(group("mode"), help_heading("Mode Options"), long)]
#[arg(help("Show a diff between two items by ID"))]
pub diff: bool,
#[arg(group("mode"), help_heading("Mode Options"), short, long, conflicts_with_all(["save", "get", "diff", "delete", "info", "status"]))]
#[arg(group("mode"), help_heading("Mode Options"), short, long)]
#[arg(help("List items, filtering on tags or metadata if given"))]
pub list: bool,
#[arg(group("mode"), help_heading("Mode Options"), short, long, conflicts_with_all(["save", "get", "diff", "list", "info", "status"]))]
#[arg(group("mode"), help_heading("Mode Options"), short, long)]
#[arg(help("Delete items either by ID or by matching tags"))]
#[arg(requires = "ids_or_tags")]
pub delete: bool,
#[arg(group("mode"), help_heading("Mode Options"), short, long, conflicts_with_all(["save", "get", "diff", "list", "delete", "status"]))]
#[arg(help(
"Get an item either by it's ID or by a combination of matching tags and metatdata"
))]
#[arg(group("mode"), help_heading("Mode Options"), short, long)]
#[arg(help("Get an item either by its ID or by a combination of matching tags and metadata"))]
pub info: bool,
#[arg(group("mode"), help_heading("Mode Options"), short('S'), long, conflicts_with_all(["save", "get", "diff", "list", "delete", "info", "server", "status_plugins"]))]
#[arg(group("mode"), help_heading("Mode Options"), short('u'), long)]
#[arg(help("Update an item's tags and metadata by ID"))]
pub update: bool,
#[arg(group("mode"), help_heading("Mode Options"), short('S'), long)]
#[arg(help("Show status of directories and supported compression algorithms"))]
pub status: bool,
#[arg(group("mode"), help_heading("Mode Options"), long, conflicts_with_all(["save", "get", "diff", "list", "delete", "info", "status", "server"]))]
#[arg(group("mode"), help_heading("Mode Options"), long)]
#[arg(help("Show available plugins and their configurations"))]
pub status_plugins: bool,
#[arg(group("mode"), help_heading("Mode Options"), long, conflicts_with_all(["save", "get", "diff", "list", "delete", "info", "status"]))]
#[arg(group("mode"), help_heading("Mode Options"), long)]
#[arg(help("Export items to a .keep.tar archive (requires IDs or tags)"))]
pub export: bool,
#[arg(group("mode"), help_heading("Mode Options"), long, value_name("FILE"))]
#[arg(help("Import items from a .keep.tar archive or legacy .meta.yml file"))]
pub import: Option<String>,
#[cfg(feature = "server")]
#[arg(group("mode"), help_heading("Mode Options"), long)]
#[arg(help("Start REST HTTP server"))]
pub server: bool,
#[arg(group("mode"), help_heading("Mode Options"), long, conflicts_with_all(["save", "get", "diff", "list", "delete", "info", "status", "server"]))]
#[arg(group("mode"), help_heading("Mode Options"), long)]
#[arg(help("Generate default configuration and output to stdout"))]
pub generate_config: bool,
#[arg(help_heading("Mode Options"), long)]
#[arg(help("Generate shell completion script"))]
pub generate_completion: Option<Shell>,
#[cfg(feature = "server")]
#[arg(help_heading("Server Options"), long, env("KEEP_SERVER_ADDRESS"))]
#[arg(help("Server address to bind to"))]
pub server_address: Option<String>,
#[cfg(feature = "server")]
#[arg(help_heading("Server Options"), long, env("KEEP_SERVER_PORT"))]
#[arg(help("Server port to bind to"))]
pub server_port: Option<u16>,
#[cfg(feature = "server")]
#[arg(help_heading("Server Options"), long, env("KEEP_SERVER_CERT"))]
#[arg(help("Path to TLS certificate file (PEM) for HTTPS"))]
pub server_cert: Option<PathBuf>,
#[cfg(feature = "server")]
#[arg(help_heading("Server Options"), long, env("KEEP_SERVER_KEY"))]
#[arg(help("Path to TLS private key file (PEM) for HTTPS"))]
pub server_key: Option<PathBuf>,
}
/// Represents a meta plugin argument with optional JSON config.
///
/// Parsed from `name` or `name:{"options":{...},"outputs":{...}}` syntax.
#[derive(Debug, Clone)]
pub struct MetaPluginArg {
pub name: String,
pub options: Option<serde_json::Value>,
}
impl FromStr for MetaPluginArg {
type Err = anyhow::Error;
fn from_str(s: &str) -> Result<Self, Self::Err> {
if let Some((name, json_str)) = s.split_once(':') {
let value: serde_json::Value = serde_json::from_str(json_str)
.map_err(|e| anyhow::anyhow!("Invalid JSON for meta plugin '{}': {}", name, e))?;
Ok(MetaPluginArg {
name: name.to_string(),
options: Some(value),
})
} else {
Ok(MetaPluginArg {
name: s.to_string(),
options: None,
})
}
}
}
/// Represents a metadata key-value argument.
///
/// Parsed from `key=value` (set) or `key` (delete/filter by existence).
#[derive(Debug, Clone)]
pub enum MetaArg {
/// Set metadata with a value.
Set { key: String, value: String },
/// Bare key without a value (delete in update mode, filter by existence otherwise).
Key(String),
}
impl MetaArg {
/// Returns the key.
pub fn key(&self) -> &str {
match self {
MetaArg::Set { key, .. } | MetaArg::Key(key) => key,
}
}
/// Returns the value if this is a Set variant.
pub fn value(&self) -> Option<&str> {
match self {
MetaArg::Set { value, .. } => Some(value),
MetaArg::Key(_) => None,
}
}
}
impl FromStr for MetaArg {
type Err = anyhow::Error;
fn from_str(s: &str) -> Result<Self, Self::Err> {
if let Some((key, value)) = s.split_once('=') {
Ok(MetaArg::Set {
key: key.to_string(),
value: value.to_string(),
})
} else {
Ok(MetaArg::Key(s.to_string()))
}
}
}
/// Struct for item-specific arguments, such as compression and plugins.
@@ -87,15 +185,32 @@ pub struct ItemArgs {
#[arg(
help_heading("Item Options"),
short('M'),
long,
long = "meta-plugin",
value_parser = clap::value_parser!(MetaPluginArg),
env("KEEP_META_PLUGINS")
)]
#[arg(help("Meta plugins to use when saving items"))]
pub meta_plugins: Vec<String>,
#[arg(help("Meta plugin to use (repeatable): name or name:{json}"))]
pub meta_plugins: Vec<MetaPluginArg>,
#[arg(help_heading("Item Options"), long)]
#[arg(help("Metadata key=value to set (or key to delete in --update)"))]
pub meta: Vec<String>,
#[arg(help_heading("Item Options"), long, env("KEEP_FILTERS"))]
#[arg(help("Filter string to apply to content when getting items"))]
pub filters: Option<String>,
#[arg(help_heading("Export Options"), long, default_value = "{name}_{ts}")]
#[arg(help("Template for export tar filename (appends .keep.tar). Variables: {name} {ts}"))]
pub export_filename_format: String,
#[arg(help_heading("Export Options"), long, value_name("NAME"))]
#[arg(help("Export name used for {name} variable (default: export_<common-tags>)"))]
pub export_name: Option<String>,
#[arg(help_heading("Import Options"), long, value_name("DATA_FILE"))]
#[arg(help("Data file for import (reads from stdin if omitted)"))]
pub import_data_file: Option<PathBuf>,
}
/// Struct for general options, including verbosity, paths, and output settings.
@@ -112,7 +227,7 @@ pub struct OptionsArgs {
#[arg(
long,
env("KEEP_LIST_FORMAT"),
default_value("id,time,size,tags,meta:hostname")
default_value("id,time,size,meta:text_line_count,tags,meta:hostname_short,meta:command")
)]
#[arg(help("A comma separated list of columns to display with --list"))]
pub list_format: String,
@@ -121,6 +236,10 @@ pub struct OptionsArgs {
#[arg(help("Display file sizes with units"))]
pub human_readable: bool,
#[arg(long)]
#[arg(help("Only output item IDs (for scripting)"))]
pub ids_only: bool,
#[arg(short, long, action = clap::ArgAction::Count, conflicts_with("quiet"))]
#[arg(help("Increase message verbosity, can be given more than once"))]
pub verbose: u8,
@@ -133,14 +252,62 @@ pub struct OptionsArgs {
#[arg(help("Output format (only works with --info, --status, --list)"))]
pub output_format: Option<String>,
#[arg(long, env("KEEP_SERVER_PASSWORD"))]
#[cfg(feature = "server")]
#[arg(help_heading("Server Options"), long, env("KEEP_SERVER_PASSWORD"))]
#[arg(help("Password for server authentication (requires --server)"))]
pub server_password: Option<String>,
#[arg(long, env("KEEP_SERVER_PASSWORD_HASH"))]
#[cfg(feature = "server")]
#[arg(help_heading("Server Options"), long, env("KEEP_SERVER_PASSWORD_HASH"))]
#[arg(help("Password hash for server authentication (requires --server)"))]
pub server_password_hash: Option<String>,
#[cfg(feature = "server")]
#[arg(help_heading("Server Options"), long, env("KEEP_SERVER_USERNAME"))]
#[arg(help(
"Username for server Basic authentication (requires --server, defaults to 'keep')"
))]
pub server_username: Option<String>,
#[cfg(feature = "server")]
#[arg(help_heading("Server Options"), long, env("KEEP_SERVER_JWT_SECRET"))]
#[arg(help("JWT secret for token-based authentication (requires --server)"))]
pub server_jwt_secret: Option<String>,
#[cfg(feature = "server")]
#[arg(
help_heading("Server Options"),
long,
env("KEEP_SERVER_JWT_SECRET_FILE")
)]
#[arg(help("Path to file containing JWT secret (requires --server)"))]
pub server_jwt_secret_file: Option<PathBuf>,
#[cfg(feature = "server")]
#[arg(help_heading("Server Options"), long, env("KEEP_SERVER_MAX_BODY_SIZE"))]
#[arg(help("Maximum request body size in bytes (requires --server, default: unlimited)"))]
pub server_max_body_size: Option<u64>,
#[cfg(feature = "client")]
#[arg(long, env("KEEP_CLIENT_URL"), help_heading("Client Options"))]
#[arg(help("Remote keep server URL for client mode"))]
pub client_url: Option<String>,
#[cfg(feature = "client")]
#[arg(long, env("KEEP_CLIENT_PASSWORD"), help_heading("Client Options"))]
#[arg(help("Password for remote keep server authentication"))]
pub client_password: Option<String>,
#[cfg(feature = "client")]
#[arg(long, env("KEEP_CLIENT_USERNAME"), help_heading("Client Options"))]
#[arg(help("Username for remote keep server authentication (defaults to 'keep')"))]
pub client_username: Option<String>,
#[cfg(feature = "client")]
#[arg(long, env("KEEP_CLIENT_JWT"), help_heading("Client Options"))]
#[arg(help("JWT token for remote keep server authentication"))]
pub client_jwt: Option<String>,
#[arg(
long,
help("Force output even when binary data would be sent to a TTY")

src/client.rs Normal file

@@ -0,0 +1,514 @@
use crate::services::{ItemInfo, error::CoreError};
use base64::Engine;
use serde::de::DeserializeOwned;
use std::collections::HashMap;
use std::io::Read;
/// Percent-encode a value for use in a URL query string.
fn url_encode(s: &str) -> String {
let mut result = String::with_capacity(s.len() * 3);
for byte in s.bytes() {
match byte {
b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => {
result.push(byte as char);
}
_ => {
result.push('%');
result.push(char::from_digit((byte >> 4) as u32, 16).unwrap());
result.push(char::from_digit((byte & 0xF) as u32, 16).unwrap());
}
}
}
result
}
fn append_query_params(url: &mut String, params: &[(&str, &str)]) {
if !params.is_empty() {
url.push('?');
for (i, (key, value)) in params.iter().enumerate() {
if i > 0 {
url.push('&');
}
url.push_str(&format!("{}={}", url_encode(key), url_encode(value)));
}
}
}
pub struct KeepClient {
base_url: String,
agent: ureq::Agent,
username: Option<String>,
password: Option<String>,
jwt: Option<String>,
}
impl KeepClient {
pub fn new(
base_url: &str,
username: Option<String>,
password: Option<String>,
jwt: Option<String>,
) -> Result<Self, CoreError> {
let base_url = base_url.trim_end_matches('/').to_string();
let agent = ureq::Agent::new_with_defaults();
Ok(Self {
base_url,
agent,
username,
password,
jwt,
})
}
pub fn base_url(&self) -> &str {
&self.base_url
}
pub fn username(&self) -> Option<&String> {
self.username.as_ref()
}
pub fn password(&self) -> Option<&String> {
self.password.as_ref()
}
pub fn jwt(&self) -> Option<&String> {
self.jwt.as_ref()
}
fn url(&self, path: &str) -> String {
format!("{}{}", self.base_url, path)
}
/// Get the Authorization header value for the current credentials.
///
/// JWT token is sent as `Bearer <token>`.
/// Password is sent as `Basic base64(username:password)`
/// where username defaults to "keep".
fn auth_header(&self) -> Option<String> {
if let Some(ref jwt) = self.jwt {
Some(format!("Bearer {jwt}"))
} else if let Some(ref password) = self.password {
let username = self.username.as_deref().unwrap_or("keep");
let credentials = format!("{username}:{password}");
let encoded = base64::engine::general_purpose::STANDARD.encode(&credentials);
Some(format!("Basic {encoded}"))
} else {
None
}
}
fn handle_error<T>(&self, result: Result<T, ureq::Error>) -> Result<T, CoreError> {
match result {
Ok(v) => Ok(v),
Err(ureq::Error::StatusCode(code)) => Err(CoreError::Other(anyhow::anyhow!(
"Server returned error: HTTP {}",
code
))),
Err(e) => Err(CoreError::Other(anyhow::anyhow!("Request failed: {}", e))),
}
}
pub fn get_json<T: DeserializeOwned>(&self, path: &str) -> Result<T, CoreError> {
let url = self.url(path);
let mut req = self.agent.get(&url);
if let Some(ref auth) = self.auth_header() {
req = req.header("Authorization", auth);
}
let response = self.handle_error(req.call())?;
let body: T = self.handle_error(response.into_body().read_json())?;
Ok(body)
}
pub fn get_json_with_query<T: DeserializeOwned>(
&self,
path: &str,
params: &[(&str, &str)],
) -> Result<T, CoreError> {
let mut url = self.url(path);
append_query_params(&mut url, params);
let mut req = self.agent.get(&url);
if let Some(ref auth) = self.auth_header() {
req = req.header("Authorization", auth);
}
let response = self.handle_error(req.call())?;
let body: T = self.handle_error(response.into_body().read_json())?;
Ok(body)
}
pub fn get_bytes(&self, path: &str) -> Result<Vec<u8>, CoreError> {
let url = self.url(path);
let mut req = self.agent.get(&url);
if let Some(ref auth) = self.auth_header() {
req = req.header("Authorization", auth);
}
let response = self.handle_error(req.call())?;
let mut body = response.into_body();
let bytes = body
.read_to_vec()
.map_err(|e| CoreError::Other(anyhow::anyhow!("{}", e)))?;
Ok(bytes)
}
pub fn post_bytes(
&self,
path: &str,
body_bytes: &[u8],
params: &[(&str, &str)],
) -> Result<ItemInfo, CoreError> {
let mut cursor = std::io::Cursor::new(body_bytes);
self.post_stream(path, &mut cursor, params)
}
/// Stream data from a reader to the server using chunked transfer encoding.
///
/// The reader is consumed in chunks and sent to the server without buffering
/// the entire body in memory. This enables true streaming for large payloads.
pub fn post_stream(
&self,
path: &str,
body_reader: &mut dyn Read,
params: &[(&str, &str)],
) -> Result<ItemInfo, CoreError> {
let mut url = self.url(path);
append_query_params(&mut url, params);
let mut req = self.agent.post(&url);
if let Some(ref auth) = self.auth_header() {
req = req.header("Authorization", auth);
}
req = req.header("Content-Type", "application/octet-stream");
let response = self.handle_error(req.send(ureq::SendBody::from_reader(body_reader)))?;
#[derive(serde::Deserialize)]
struct ApiResponse {
data: Option<ItemInfo>,
error: Option<String>,
}
let api_response: ApiResponse = self.handle_error(response.into_body().read_json())?;
if let Some(error) = api_response.error {
return Err(CoreError::Other(anyhow::anyhow!("Server error: {}", error)));
}
api_response
.data
.ok_or_else(|| CoreError::Other(anyhow::anyhow!("No data in response")))
}
pub fn delete(&self, path: &str) -> Result<(), CoreError> {
let url = self.url(path);
let mut req = self.agent.delete(&url);
if let Some(ref auth) = self.auth_header() {
req = req.header("Authorization", auth);
}
self.handle_error(req.call())?;
Ok(())
}
pub fn get_status(&self) -> Result<crate::common::status::StatusInfo, CoreError> {
#[derive(serde::Deserialize)]
struct ApiResponse {
data: Option<crate::common::status::StatusInfo>,
error: Option<String>,
}
let response: ApiResponse = self.get_json("/api/status")?;
response.data.ok_or_else(|| {
CoreError::Other(anyhow::anyhow!(
"{}",
response
.error
.unwrap_or_else(|| "No status data returned".to_string())
))
})
}
pub fn get_item_info(&self, id: i64) -> Result<ItemInfo, CoreError> {
#[derive(serde::Deserialize)]
struct ApiResponse {
data: Option<ItemInfo>,
error: Option<String>,
}
let response: ApiResponse = self.get_json(&format!("/api/item/{id}/info"))?;
response.data.ok_or_else(|| {
CoreError::Other(anyhow::anyhow!(
"{}",
response
.error
.unwrap_or_else(|| "Item not found".to_string())
))
})
}
pub fn list_items(
&self,
ids: &[i64],
tags: &[String],
order: &str,
start: u64,
count: u64,
meta: &HashMap<String, Option<String>>,
) -> Result<Vec<ItemInfo>, CoreError> {
#[derive(serde::Deserialize)]
struct ApiResponse {
data: Option<Vec<ItemInfo>>,
error: Option<String>,
}
let mut params: Vec<(String, String)> = Vec::new();
params.push(("order".to_string(), order.to_string()));
params.push(("start".to_string(), start.to_string()));
params.push(("count".to_string(), count.to_string()));
if !ids.is_empty() {
params.push((
"ids".to_string(),
ids.iter()
.map(|i| i.to_string())
.collect::<Vec<_>>()
.join(","),
));
}
if !tags.is_empty() {
params.push(("tags".to_string(), tags.join(",")));
}
if !meta.is_empty() {
let meta_json = serde_json::to_string(meta).map_err(|e| {
CoreError::Other(anyhow::anyhow!("Failed to serialize meta filter: {}", e))
})?;
params.push(("meta".to_string(), meta_json));
}
let param_refs: Vec<(&str, &str)> = params
.iter()
.map(|(k, v)| (k.as_str(), v.as_str()))
.collect();
let response: ApiResponse = self.get_json_with_query("/api/item/", &param_refs)?;
if let Some(data) = response.data {
return Ok(data);
}
if let Some(err) = response.error {
return Err(CoreError::Other(anyhow::anyhow!("Server error: {err}")));
}
Ok(Vec::new())
}
pub fn save_item(
&self,
content: &[u8],
tags: &[String],
metadata: &HashMap<String, String>,
compress: bool,
meta: bool,
) -> Result<ItemInfo, CoreError> {
let mut params: Vec<(String, String)> = Vec::new();
if !tags.is_empty() {
params.push(("tags".to_string(), tags.join(",")));
}
if !metadata.is_empty() {
let meta_json = serde_json::to_string(metadata).map_err(|e| {
CoreError::Other(anyhow::anyhow!("Failed to serialize metadata: {}", e))
})?;
params.push(("metadata".to_string(), meta_json));
}
params.push(("compress".to_string(), compress.to_string()));
params.push(("meta".to_string(), meta.to_string()));
let param_refs: Vec<(&str, &str)> = params
.iter()
.map(|(k, v)| (k.as_str(), v.as_str()))
.collect();
self.post_bytes("/api/item/", content, &param_refs)
}
pub fn delete_item(&self, id: i64) -> Result<(), CoreError> {
self.delete(&format!("/api/item/{id}"))
}
/// Add metadata to an existing item.
pub fn post_metadata(
&self,
id: i64,
metadata: &HashMap<String, String>,
) -> Result<(), CoreError> {
let url = self.url(&format!("/api/item/{id}/meta"));
let mut req = self.agent.post(&url);
if let Some(ref auth) = self.auth_header() {
req = req.header("Authorization", auth);
}
req = req.header("Content-Type", "application/json");
let body = serde_json::to_vec(metadata)
.map_err(|e| CoreError::Other(anyhow::anyhow!("Failed to serialize metadata: {e}")))?;
let mut cursor = std::io::Cursor::new(body);
self.handle_error(req.send(ureq::SendBody::from_reader(&mut cursor)))?;
Ok(())
}
/// Set the uncompressed size for an item.
pub fn set_item_size(&self, id: i64, size: u64) -> Result<(), CoreError> {
let url = format!(
"{}?uncompressed_size={}",
self.url(&format!("/api/item/{id}/update")),
url_encode(&size.to_string())
);
let mut req = self.agent.post(&url);
if let Some(ref auth) = self.auth_header() {
req = req.header("Authorization", auth);
}
self.handle_error(req.send(ureq::SendBody::from_reader(&mut std::io::empty())))?;
Ok(())
}
pub fn get_item_content_raw(&self, id: i64) -> Result<(Vec<u8>, String), CoreError> {
let (mut reader, compression) = self.get_item_content_stream(id)?;
let mut bytes = Vec::new();
reader
.read_to_end(&mut bytes)
.map_err(|e| CoreError::Other(anyhow::anyhow!("{}", e)))?;
Ok((bytes, compression))
}
/// Get a streaming reader for item content without decompression.
///
/// Returns a reader over the HTTP response body and the compression type
/// from the X-Keep-Compression header. The caller can stream through
/// decompression readers without buffering the entire file in memory.
pub fn get_item_content_stream(&self, id: i64) -> Result<(Box<dyn Read>, String), CoreError> {
let url = format!(
"{}?decompress=false",
self.url(&format!("/api/item/{id}/content"))
);
let mut req = self.agent.get(&url);
if let Some(ref auth) = self.auth_header() {
req = req.header("Authorization", auth);
}
let response = self.handle_error(req.call())?;
let compression = response
.headers()
.get("X-Keep-Compression")
.and_then(|v| v.to_str().ok())
.unwrap_or("raw")
.to_string();
let reader = response.into_body().into_reader();
Ok((Box::new(reader), compression))
}
pub fn diff_items(&self, id_a: i64, id_b: i64) -> Result<Vec<String>, CoreError> {
#[derive(serde::Deserialize)]
struct ApiResponse {
data: Option<Vec<String>>,
}
let params = [("id_a", id_a.to_string()), ("id_b", id_b.to_string())];
let param_refs: Vec<(&str, &str)> = params.iter().map(|(k, v)| (*k, v.as_str())).collect();
let response: ApiResponse = self.get_json_with_query("/api/diff", &param_refs)?;
Ok(response.data.unwrap_or_default())
}
/// Export items to a tar archive, streaming the response to a file.
///
/// # Arguments
///
/// * `ids` - Item IDs to export (mutually exclusive with tags).
/// * `tags` - Tags to search for items (mutually exclusive with ids).
/// * `dest` - Destination file path.
pub fn export_items_to_file(
&self,
ids: &[i64],
tags: &[String],
dest: &std::path::Path,
) -> Result<(), CoreError> {
let mut params: Vec<(String, String)> = Vec::new();
if !ids.is_empty() {
let id_strs: Vec<String> = ids.iter().map(|id| id.to_string()).collect();
params.push(("ids".to_string(), id_strs.join(",")));
}
if !tags.is_empty() {
params.push(("tags".to_string(), tags.join(",")));
}
let param_refs: Vec<(&str, &str)> = params
.iter()
.map(|(k, v)| (k.as_str(), v.as_str()))
.collect();
let mut url = self.url("/api/export");
append_query_params(&mut url, &param_refs);
let mut req = self.agent.get(&url);
if let Some(ref auth) = self.auth_header() {
req = req.header("Authorization", auth);
}
let response = self.handle_error(req.call())?;
let mut reader = response.into_body().into_reader();
let mut file = std::fs::File::create(dest).map_err(CoreError::Io)?;
let mut buf = [0u8; crate::common::PIPESIZE];
loop {
let n = reader.read(&mut buf).map_err(CoreError::Io)?;
if n == 0 {
break;
}
std::io::Write::write_all(&mut file, &buf[..n]).map_err(CoreError::Io)?;
}
Ok(())
}
/// Import items from a tar archive, streaming the file to the server.
///
/// # Arguments
///
/// * `tar_path` - Path to the `.keep.tar` file.
///
/// # Returns
///
/// A list of newly assigned item IDs.
pub fn import_tar_file(&self, tar_path: &std::path::Path) -> Result<Vec<i64>, CoreError> {
#[derive(serde::Deserialize)]
struct ApiResponse {
data: Option<ImportResponse>,
error: Option<String>,
}
#[derive(serde::Deserialize)]
struct ImportResponse {
ids: Vec<i64>,
}
let mut file = std::fs::File::open(tar_path).map_err(CoreError::Io)?;
let url = self.url("/api/import");
let mut req = self.agent.post(&url);
if let Some(ref auth) = self.auth_header() {
req = req.header("Authorization", auth);
}
req = req.header("Content-Type", "application/x-tar");
let response = self.handle_error(req.send(ureq::SendBody::from_reader(&mut file)))?;
let body = response
.into_body()
.read_to_string()
.map_err(|e| CoreError::InvalidInput(format!("Cannot read response: {e}")))?;
let api_response: ApiResponse = serde_json::from_str(&body)
.map_err(|e| CoreError::InvalidInput(format!("Cannot parse response: {e}")))?;
if let Some(error) = api_response.error {
return Err(CoreError::InvalidInput(error));
}
Ok(api_response.data.map(|d| d.ids).unwrap_or_default())
}
}


@@ -1,130 +0,0 @@
use crate::services::async_item_service::AsyncItemService;
use crate::services::error::CoreError;
use axum::http::StatusCode;
use std::collections::HashMap;
/// Check if content is binary when allow_binary is false
///
/// # Arguments
///
/// * `item_service` - Reference to the async item service
/// * `item_id` - The ID of the item to check
/// * `metadata` - Metadata associated with the item
/// * `allow_binary` - Whether binary content is allowed
///
/// # Returns
///
/// * `Result<(), StatusCode>` -
/// * `Ok(())` if binary content is allowed or content is not binary
/// * `Err(StatusCode::BAD_REQUEST)` if binary content is not allowed and content is binary
/// Check if content is binary when allow_binary is false
///
/// Validates whether binary content is permitted for the item. If not allowed and content
/// is detected as binary, returns a bad request status. Uses metadata or streams content
/// for detection if needed.
///
/// # Arguments
///
/// * `item_service` - Reference to the async item service for content access.
/// * `item_id` - The ID of the item to check.
/// * `metadata` - Metadata associated with the item (checked for "text" key).
/// * `allow_binary` - Whether binary content is allowed (bypasses check if true).
///
/// # Returns
///
/// * `Result<(), StatusCode>` -
/// * `Ok(())` if binary content is allowed or content is not binary.
/// * `Err(StatusCode::BAD_REQUEST)` if binary content is not allowed and content is binary.
///
/// # Errors
///
/// Propagates `StatusCode` for validation failures.
///
/// # Examples
///
/// ```
/// // If allow_binary = false and content is text
/// check_binary_content_allowed(&service, 1, &metadata, false)?;
/// // Succeeds
///
/// // If allow_binary = false and content is binary
/// // Returns Err(StatusCode::BAD_REQUEST)
/// ```
pub async fn check_binary_content_allowed(
item_service: &AsyncItemService,
item_id: i64,
metadata: &HashMap<String, String>,
allow_binary: bool,
) -> Result<(), StatusCode> {
if !allow_binary {
let is_binary = is_content_binary(item_service, item_id, metadata).await?;
if is_binary {
return Err(StatusCode::BAD_REQUEST);
}
}
Ok(())
}
/// Helper function to determine if content is binary
///
/// # Arguments
///
/// * `item_service` - Reference to the async item service
/// * `item_id` - The ID of the item to check
/// * `metadata` - Metadata associated with the item
///
/// # Returns
///
/// * `Result<bool, StatusCode>` -
/// * `Ok(true)` if content is binary
/// * `Ok(false)` if content is text
/// * `Err(StatusCode)` if an error occurs during checking
/// Helper function to determine if content is binary
///
/// Checks the existing "text" metadata first; if absent, streams and analyzes
/// the content to determine whether it is binary. Logs a warning on detection failure.
///
/// # Arguments
///
/// * `item_service` - Reference to the async item service for content access.
/// * `item_id` - The ID of the item to check.
/// * `metadata` - Metadata associated with the item (checked for "text" key).
///
/// # Returns
///
/// * `Result<bool, StatusCode>` -
/// * `Ok(true)` if content is binary.
/// * `Ok(false)` if content is text.
/// * `Err(StatusCode)` if an error occurs during checking (e.g., INTERNAL_SERVER_ERROR).
///
/// # Errors
///
/// * `StatusCode::INTERNAL_SERVER_ERROR` if content access fails.
///
/// # Examples
///
/// ```ignore
/// let is_bin = is_content_binary(&service, 1, &metadata).await?;
/// assert!(!is_bin); // For text content
/// ```
pub async fn is_content_binary(
item_service: &AsyncItemService,
item_id: i64,
metadata: &HashMap<String, String>,
) -> Result<bool, StatusCode> {
if let Some(text_val) = metadata.get("text") {
Ok(text_val == "false")
} else {
// If text metadata isn't set, we need to check the content using streaming approach
match item_service.get_item_content_info_streaming(
item_id,
None
).await {
Ok((_, _, is_binary)) => Ok(is_binary),
Err(e) => {
log::warn!("Failed to get content info for binary check for item {}: {}", item_id, e);
Err(StatusCode::INTERNAL_SERVER_ERROR)
}
}
}
}

View File

@@ -192,15 +192,15 @@ fn looks_like_tar(data: &[u8]) -> bool {
}
// Check file mode field (should be octal digits)
for i in 100..108 {
if data[i] != 0 && (data[i] < b'0' || data[i] > b'7') && data[i] != b' ' {
for byte in data.iter().skip(100).take(8) {
if *byte != 0 && !(b'0'..=b'7').contains(byte) && *byte != b' ' {
return false;
}
}
// Check checksum field (should be octal digits or spaces)
for &b in &data[148..156] {
if b != 0 && (b < b'0' || b > b'7') && b != b' ' {
if b != 0 && !(b'0'..=b'7').contains(&b) && b != b' ' {
return false;
}
}
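The mode and checksum loops above apply the same rule to a tar header field: every byte must be NUL, an ASCII octal digit, or a space. A minimal std-only sketch of that rule, with illustrative field values (not taken from the crate's tests):

```rust
// A tar header field is plausible if every byte is NUL, an ASCII octal
// digit ('0'..='7'), or a space — the rule used for both the mode and
// checksum fields in looks_like_tar.
fn is_octal_field(field: &[u8]) -> bool {
    field
        .iter()
        .all(|&b| b == 0 || (b'0'..=b'7').contains(&b) || b == b' ')
}

fn main() {
    // ustar mode fields look like "0000644\0".
    assert!(is_octal_field(b"0000644\0"));
    // Checksum fields may be space-padded.
    assert!(is_octal_field(b"  5342 \0"));
    // '8' and '9' are not octal digits; arbitrary text fails too.
    assert!(!is_octal_field(b"0009644\0"));
    assert!(!is_octal_field(b"notoctal"));
}
```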
@@ -229,3 +229,25 @@ fn calculate_printable_ratio(data: &[u8]) -> f64 {
printable_count as f64 / data.len() as f64
}
/// Check if content is binary, using metadata as a fast path.
///
/// First checks for a "text" metadata field:
/// - "false" means binary
/// - "true" means text
/// - Absent or other values fall back to byte sampling
///
/// # Arguments
///
/// * `metadata` - Key-value metadata map (e.g., from `meta_as_map()`)
/// * `data` - Byte sample to analyze if metadata is inconclusive
pub fn is_content_binary_from_metadata(
metadata: &std::collections::HashMap<String, String>,
data: &[u8],
) -> bool {
if let Some(text_val) = metadata.get("text") {
text_val == "false"
} else {
is_binary(data)
}
}
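The metadata fast path can be exercised in isolation. A std-only sketch of `is_content_binary_from_metadata`, where the `is_binary` fallback is a simplified stand-in (NUL-byte check only) for the crate's signature-and-printable-ratio heuristic:

```rust
use std::collections::HashMap;

// Simplified stand-in for the crate's `is_binary` byte sampler: treat any
// NUL byte in the sample as binary. The real heuristic also checks file
// signatures and printable ratios.
fn is_binary(data: &[u8]) -> bool {
    data.contains(&0)
}

// Mirrors is_content_binary_from_metadata: "text" metadata is the fast
// path; byte sampling is the fallback when the key is absent.
fn is_content_binary_from_metadata(metadata: &HashMap<String, String>, data: &[u8]) -> bool {
    match metadata.get("text") {
        Some(v) => v == "false",
        None => is_binary(data),
    }
}

fn main() {
    let mut meta = HashMap::new();
    meta.insert("text".to_string(), "false".to_string());
    // Metadata wins even though the sample looks like text.
    assert!(is_content_binary_from_metadata(&meta, b"plain text"));
    // Without metadata, fall back to the byte sample.
    assert!(!is_content_binary_from_metadata(&HashMap::new(), b"plain text"));
    assert!(is_content_binary_from_metadata(&HashMap::new(), b"\x00\x01"));
}
```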

View File

@@ -3,5 +3,89 @@ pub mod is_binary;
/// Detects if data is binary or text based on signatures and printable ratios.
pub mod status;
/// Plugin schema types and discovery functions.
pub mod schema;
/// Standard buffer size for I/O operations (8KB)
pub const PIPESIZE: usize = 8192;
/// Reads chunks from `reader` until EOF, passing each chunk to `f`.
///
/// Uses a fixed PIPESIZE buffer to ensure bounded memory usage.
pub fn stream_copy<R: std::io::Read + ?Sized>(
reader: &mut R,
mut f: impl FnMut(&[u8]) -> std::io::Result<()>,
) -> std::io::Result<()> {
let mut buffer = [0u8; PIPESIZE];
loop {
let n = reader.read(&mut buffer)?;
if n == 0 {
break;
}
f(&buffer[..n])?;
}
Ok(())
}
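As a usage sketch, `stream_copy` pairs with any in-memory reader; the callback sees each chunk in order and total buffering stays at one `PIPESIZE` array (the example data here is illustrative):

```rust
use std::io::{Cursor, Read};

const PIPESIZE: usize = 8192;

// stream_copy as defined above: bounded-memory chunked reads until EOF.
fn stream_copy<R: Read + ?Sized>(
    reader: &mut R,
    mut f: impl FnMut(&[u8]) -> std::io::Result<()>,
) -> std::io::Result<()> {
    let mut buffer = [0u8; PIPESIZE];
    loop {
        let n = reader.read(&mut buffer)?;
        if n == 0 {
            break;
        }
        f(&buffer[..n])?;
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    // Collect the chunks; with larger inputs each chunk is at most PIPESIZE bytes.
    let mut out = Vec::new();
    let mut src = Cursor::new(b"hello stream".to_vec());
    stream_copy(&mut src, |chunk| {
        out.extend_from_slice(chunk);
        Ok(())
    })?;
    assert_eq!(out, b"hello stream".to_vec());
    Ok(())
}
```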
/// Reads content from a reader with offset and length bounds.
///
/// Skips `offset` bytes from the reader, then reads up to `length` bytes
/// (or all remaining if `length` is 0). Uses PIPESIZE buffers throughout.
///
/// # Arguments
///
/// * `reader` - The source reader positioned at the start.
/// * `offset` - Number of bytes to skip before reading.
/// * `length` - Maximum bytes to read (0 = read all remaining).
/// * `content_len` - Total content size (used to cap skip/read amounts).
///
/// # Returns
///
/// A `Vec<u8>` containing the requested byte range.
pub fn read_with_bounds<R: std::io::Read>(
reader: &mut R,
offset: u64,
length: u64,
content_len: u64,
) -> std::io::Result<Vec<u8>> {
// Skip offset bytes
let skip = std::cmp::min(offset, content_len);
let mut remaining = skip;
let mut buf = [0u8; PIPESIZE];
while remaining > 0 {
let to_read = std::cmp::min(remaining, buf.len() as u64) as usize;
match reader.read(&mut buf[..to_read]) {
Ok(0) => break,
Ok(n) => remaining -= n as u64,
Err(e) => return Err(e),
}
}
// Read bounded content
let max_bytes = if length > 0 {
std::cmp::min(length, content_len.saturating_sub(offset))
} else {
content_len.saturating_sub(offset)
};
let mut result = Vec::with_capacity(std::cmp::min(max_bytes, 64 * 1024) as usize);
let mut bytes_read = 0u64;
while bytes_read < max_bytes {
let to_read = std::cmp::min(max_bytes - bytes_read, buf.len() as u64) as usize;
match reader.read(&mut buf[..to_read]) {
Ok(0) => break,
Ok(n) => {
result.extend_from_slice(&buf[..n]);
bytes_read += n as u64;
}
Err(e) => return Err(e),
}
}
Ok(result)
}
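The offset/length semantics are easiest to see against an in-memory reader. A self-contained sketch using the function as defined above, with a `Cursor` standing in for a decompressing reader:

```rust
use std::io::{Cursor, Read};

const PIPESIZE: usize = 8192;

// read_with_bounds as defined above: skip `offset`, then read up to
// `length` bytes (0 = all remaining), capped by `content_len`.
fn read_with_bounds<R: Read>(
    reader: &mut R,
    offset: u64,
    length: u64,
    content_len: u64,
) -> std::io::Result<Vec<u8>> {
    let skip = std::cmp::min(offset, content_len);
    let mut remaining = skip;
    let mut buf = [0u8; PIPESIZE];
    while remaining > 0 {
        let to_read = std::cmp::min(remaining, buf.len() as u64) as usize;
        match reader.read(&mut buf[..to_read]) {
            Ok(0) => break,
            Ok(n) => remaining -= n as u64,
            Err(e) => return Err(e),
        }
    }
    let max_bytes = if length > 0 {
        std::cmp::min(length, content_len.saturating_sub(offset))
    } else {
        content_len.saturating_sub(offset)
    };
    let mut result = Vec::with_capacity(std::cmp::min(max_bytes, 64 * 1024) as usize);
    let mut bytes_read = 0u64;
    while bytes_read < max_bytes {
        let to_read = std::cmp::min(max_bytes - bytes_read, buf.len() as u64) as usize;
        match reader.read(&mut buf[..to_read]) {
            Ok(0) => break,
            Ok(n) => {
                result.extend_from_slice(&buf[..n]);
                bytes_read += n as u64;
            }
            Err(e) => return Err(e),
        }
    }
    Ok(result)
}

fn main() -> std::io::Result<()> {
    let data = b"0123456789";
    // offset 2, length 4 -> the four bytes "2345"
    let mut r = Cursor::new(&data[..]);
    assert_eq!(read_with_bounds(&mut r, 2, 4, 10)?, b"2345".to_vec());
    // length 0 -> everything after the offset
    let mut r = Cursor::new(&data[..]);
    assert_eq!(read_with_bounds(&mut r, 7, 0, 10)?, b"789".to_vec());
    // offset past the end -> empty result, no error
    let mut r = Cursor::new(&data[..]);
    assert!(read_with_bounds(&mut r, 20, 0, 10)?.is_empty());
    Ok(())
}
```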
/// Sanitize a timestamp string for use in filenames.
///
/// Replaces colons with hyphens (e.g., `2026-03-17T12:00:00Z` → `2026-03-17T12-00-00Z`).
pub fn sanitize_ts_string(ts: &str) -> String {
ts.replace(':', "-")
}

src/common/schema.rs Normal file
View File

@@ -0,0 +1,166 @@
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
use strum::IntoEnumIterator;
/// Value type for a plugin option.
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
#[serde(rename_all = "lowercase")]
pub enum OptionType {
String,
Integer,
Boolean,
Any,
}
impl OptionType {
/// Infer the option type from a YAML value.
pub fn from_yaml_value(value: &serde_yaml::Value) -> Self {
match value {
serde_yaml::Value::Bool(_) => OptionType::Boolean,
serde_yaml::Value::Number(_) => OptionType::Integer,
serde_yaml::Value::String(_) => OptionType::String,
_ => OptionType::Any,
}
}
}
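The inference mapping can be shown without the `serde_yaml` dependency. A std-only analogue where a hand-rolled `Value` enum stands in for `serde_yaml::Value` (the enum and `infer` names are illustrative, not part of the crate):

```rust
// Std-only analogue of OptionType::from_yaml_value: bool -> Boolean,
// number -> Integer, string -> String, anything else -> Any.
#[derive(Debug, PartialEq)]
enum OptionType {
    String,
    Integer,
    Boolean,
    Any,
}

// Stand-in for serde_yaml::Value, covering the matched variants.
enum Value {
    Bool(bool),
    Number(i64),
    Str(String),
    Null,
}

fn infer(value: &Value) -> OptionType {
    match value {
        Value::Bool(_) => OptionType::Boolean,
        Value::Number(_) => OptionType::Integer,
        Value::Str(_) => OptionType::String,
        _ => OptionType::Any,
    }
}

fn main() {
    assert_eq!(infer(&Value::Bool(true)), OptionType::Boolean);
    assert_eq!(infer(&Value::Number(3)), OptionType::Integer);
    assert_eq!(infer(&Value::Str("x".into())), OptionType::String);
    assert_eq!(infer(&Value::Null), OptionType::Any);
}
```

Note that the schema gathering below layers one more rule on top of this mapping: a null default marks the option as required.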
/// Schema for a single plugin option.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct OptionSchema {
pub name: String,
pub option_type: OptionType,
pub default: Option<serde_yaml::Value>,
pub required: bool,
}
/// Schema for a single plugin output.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct OutputSchema {
pub name: String,
pub description: String,
}
/// Schema describing a plugin's configuration requirements.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct PluginSchema {
pub name: String,
pub description: String,
pub options: Vec<OptionSchema>,
pub outputs: Vec<OutputSchema>,
}
/// Gathers schemas from all registered meta plugins.
///
/// Iterates all `MetaPluginType` variants, attempts to create a default instance,
/// and collects their schemas. Plugins that fail to register (e.g., feature-gated)
/// are silently skipped.
pub fn gather_meta_plugin_schemas() -> Vec<PluginSchema> {
use crate::meta_plugin::{MetaPluginType, get_meta_plugin};
let mut schemas = Vec::new();
let mut sorted_types: Vec<MetaPluginType> = MetaPluginType::iter().collect();
sorted_types.sort_by_key(|t| t.to_string());
for plugin_type in sorted_types {
let plugin = match get_meta_plugin(plugin_type.clone(), None, None) {
Ok(p) => p,
Err(_) => continue,
};
let name = plugin.meta_type().to_string();
let options: Vec<OptionSchema> = plugin
.options()
.iter()
.map(|(key, value)| {
let option_type = OptionType::from_yaml_value(value);
let (default, required) = if value.is_null() {
(None, true)
} else {
(Some(value.clone()), false)
};
OptionSchema {
name: key.clone(),
option_type,
default,
required,
}
})
.collect();
let mut outputs: Vec<OutputSchema> = Vec::new();
for (key, value) in plugin.outputs() {
if !value.is_null() {
outputs.push(OutputSchema {
name: key.clone(),
description: key.clone(),
});
}
}
// Also include default outputs if outputs map is empty
if outputs.is_empty() {
for output_name in plugin.default_outputs() {
outputs.push(OutputSchema {
name: output_name.clone(),
description: output_name,
});
}
}
schemas.push(PluginSchema {
name,
description: plugin.description().to_string(),
options,
outputs,
});
}
schemas
}
/// Gathers schemas from all registered filter plugins.
///
/// Uses the global filter plugin registry to discover all registered filters,
/// creates a default instance of each, and collects their option schemas.
pub fn gather_filter_plugin_schemas() -> Vec<PluginSchema> {
use crate::services::filter_service::get_available_filter_plugins;
let plugins = get_available_filter_plugins().unwrap_or_default();
let mut schemas: Vec<PluginSchema> = plugins
.into_iter()
.map(|(name, creator)| {
let plugin = creator();
let options: Vec<OptionSchema> = plugin
.options()
.iter()
.map(|opt| {
let option_type = match &opt.default {
Some(serde_json::Value::Bool(_)) => OptionType::Boolean,
Some(serde_json::Value::Number(_)) => OptionType::Integer,
Some(serde_json::Value::String(_)) => OptionType::String,
_ => OptionType::Any,
};
OptionSchema {
name: opt.name.clone(),
option_type,
default: opt.default.as_ref().map(|v| {
// Convert serde_json::Value to serde_yaml::Value
serde_yaml::to_value(v).unwrap_or(serde_yaml::Value::Null)
}),
required: opt.required,
}
})
.collect();
PluginSchema {
name: name.clone(),
description: plugin.description().to_string(),
options,
outputs: Vec::new(),
}
})
.collect();
schemas.sort_by(|a, b| a.name.cmp(&b.name));
schemas
}

View File

@@ -8,7 +8,7 @@ use crate::meta_plugin::MetaPluginType;
use crate::filter_plugin::FilterOption;
#[derive(serde::Serialize, serde::Deserialize, Clone)]
#[derive(serde::Serialize, serde::Deserialize, Clone, Debug)]
#[cfg_attr(feature = "server", derive(ToSchema))]
pub struct FilterPluginInfo {
pub name: String,
@@ -27,6 +27,22 @@ pub struct StatusInfo {
pub configured_meta_plugins: Option<Vec<crate::config::MetaPluginConfig>>,
}
impl Default for StatusInfo {
fn default() -> Self {
Self {
paths: PathInfo {
data: String::new(),
database: String::new(),
},
compression: Vec::new(),
meta_plugins: std::collections::HashMap::new(),
enabled_meta_plugins: Vec::new(),
filter_plugins: Vec::new(),
configured_meta_plugins: None,
}
}
}
#[derive(serde::Serialize, serde::Deserialize)]
#[cfg_attr(feature = "server", derive(ToSchema))]
pub struct PathInfo {
@@ -34,7 +50,8 @@ pub struct PathInfo {
pub database: String,
}
#[derive(serde::Serialize, serde::Deserialize)]
#[derive(serde::Serialize, serde::Deserialize, Debug)]
#[cfg_attr(feature = "server", derive(ToSchema))]
pub struct CompressionInfo {
#[serde(rename = "type")]
pub compression_type: String,
@@ -45,7 +62,7 @@ pub struct CompressionInfo {
pub decompress: String,
}
#[derive(serde::Serialize, serde::Deserialize, Clone)]
#[derive(serde::Serialize, serde::Deserialize, Clone, Debug)]
#[cfg_attr(feature = "server", derive(ToSchema))]
pub struct MetaPluginInfo {
pub meta_name: String,
@@ -58,21 +75,21 @@ pub fn generate_status_info(
db_path: PathBuf,
enabled_meta_plugins: &[MetaPluginType],
enabled_compression_type: Option<CompressionType>,
) -> StatusInfo {
) -> anyhow::Result<StatusInfo> {
log::debug!("STATUS: Starting status info generation");
let path_info = PathInfo {
data: data_path
.into_os_string()
.into_string()
.expect("Unable to convert data path to string"),
.map_err(|_| anyhow::anyhow!("Unable to convert data path to string"))?,
database: db_path
.into_os_string()
.into_string()
.expect("Unable to convert DB path to string"),
.map_err(|_| anyhow::anyhow!("Unable to convert DB path to string"))?,
};
let _default_type = crate::compression_engine::default_compression_type();
let mut compression_info = Vec::new();
let mut compression_info = Vec::with_capacity(CompressionType::iter().count());
// Sort compression types by their string representation
let mut sorted_compression_types: Vec<CompressionType> = CompressionType::iter().collect();
@@ -124,7 +141,8 @@ pub fn generate_status_info(
});
}
let mut meta_plugins_map = std::collections::HashMap::new();
let mut meta_plugins_map =
std::collections::HashMap::with_capacity(MetaPluginType::iter().count());
let mut enabled_meta_plugins_vec = Vec::new();
// Sort meta plugin types by their string representation to avoid creating plugins just for sorting
@@ -132,18 +150,22 @@ pub fn generate_status_info(
sorted_meta_plugins.sort_by_key(|meta_plugin_type| meta_plugin_type.to_string());
for meta_plugin_type in sorted_meta_plugins {
log::debug!(
"STATUS: Processing meta plugin type: {:?}",
meta_plugin_type
log::debug!("STATUS: Processing meta plugin type: {meta_plugin_type:?}");
let meta_plugin =
match crate::meta_plugin::get_meta_plugin(meta_plugin_type.clone(), None, None) {
Ok(p) => p,
Err(e) => {
log::warn!(
"STATUS: Skipping unregistered meta plugin {meta_plugin_type:?}: {e}"
);
log::debug!("STATUS: About to call get_meta_plugin");
let meta_plugin = crate::meta_plugin::get_meta_plugin(meta_plugin_type.clone(), None, None);
log::debug!("STATUS: Created meta plugin instance");
continue;
}
};
// Get meta name first to avoid borrowing issues
log::debug!("STATUS: Getting meta name...");
let meta_name = meta_plugin.meta_type().to_string();
log::debug!("STATUS: Got meta name: {}", meta_name);
log::debug!("STATUS: Got meta name: {meta_name}");
// Check if this plugin is enabled
let is_enabled = enabled_meta_plugins.contains(&meta_plugin_type);
@@ -177,12 +199,26 @@ pub fn generate_status_info(
);
}
StatusInfo {
// Populate filter plugin info from the global registry
let filter_plugins_map = crate::services::filter_service::get_available_filter_plugins()?;
let filter_plugins_info: Vec<FilterPluginInfo> = filter_plugins_map
.into_iter()
.map(|(name, creator)| {
let plugin = creator();
FilterPluginInfo {
name: name.clone(),
options: plugin.options(),
description: format!("{name} filter plugin"),
}
})
.collect();
Ok(StatusInfo {
paths: path_info,
compression: compression_info,
meta_plugins: meta_plugins_map,
enabled_meta_plugins: enabled_meta_plugins_vec,
filter_plugins: Vec::new(),
filter_plugins: filter_plugins_info,
configured_meta_plugins: None,
}
})
}

View File

@@ -42,7 +42,7 @@ impl CompressionEngine for CompressionEngineGZip {
("<INTERNAL>".to_string(), "".to_string(), "".to_string())
}
fn open(&self, file_path: PathBuf) -> Result<Box<dyn Read>> {
fn open(&self, file_path: PathBuf) -> Result<Box<dyn Read + Send>> {
debug!("COMPRESSION: Opening {:?} using {:?}", file_path, *self);
let file = File::open(file_path)?;
@@ -84,7 +84,7 @@ impl<W: Write> Drop for AutoFinishGzEncoder<W> {
if let Some(encoder) = self.encoder.take() {
debug!("COMPRESSION: Finishing");
if let Err(e) = encoder.finish() {
warn!("Failed to finish GZip encoder: {}", e);
warn!("Failed to finish GZip encoder: {e}");
}
}
}
@@ -93,10 +93,22 @@ impl<W: Write> Drop for AutoFinishGzEncoder<W> {
#[cfg(feature = "gzip")]
impl<W: Write> Write for AutoFinishGzEncoder<W> {
fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
self.encoder.as_mut().unwrap().write(buf)
match self.encoder.as_mut() {
Some(encoder) => encoder.write(buf),
None => Err(io::Error::new(
io::ErrorKind::BrokenPipe,
"encoder already finished",
)),
}
}
fn flush(&mut self) -> io::Result<()> {
self.encoder.as_mut().unwrap().flush()
match self.encoder.as_mut() {
Some(encoder) => encoder.flush(),
None => Err(io::Error::new(
io::ErrorKind::BrokenPipe,
"encoder already finished",
)),
}
}
}
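The change above replaces `unwrap()` on the taken-out encoder with a `BrokenPipe` error. A std-only sketch of that Option-guard pattern, with a plain `Vec<u8>` standing in for the gzip encoder (the `GuardedWriter` type is hypothetical):

```rust
use std::io::{self, Write};

// Option-guard pattern: the inner writer is taken out on finish (or drop),
// and any later write/flush fails with BrokenPipe instead of panicking.
struct GuardedWriter<W: Write> {
    inner: Option<W>,
}

impl<W: Write> GuardedWriter<W> {
    fn finish(&mut self) -> Option<W> {
        self.inner.take()
    }
}

impl<W: Write> Write for GuardedWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        match self.inner.as_mut() {
            Some(w) => w.write(buf),
            None => Err(io::Error::new(io::ErrorKind::BrokenPipe, "writer already finished")),
        }
    }
    fn flush(&mut self) -> io::Result<()> {
        match self.inner.as_mut() {
            Some(w) => w.flush(),
            None => Err(io::Error::new(io::ErrorKind::BrokenPipe, "writer already finished")),
        }
    }
}

fn main() {
    let mut gw = GuardedWriter { inner: Some(Vec::new()) };
    assert!(gw.write(b"data").is_ok());
    let buf = gw.finish().unwrap();
    assert_eq!(buf, b"data".to_vec());
    // After finish, writes report an error rather than panicking.
    assert_eq!(gw.write(b"more").unwrap_err().kind(), io::ErrorKind::BrokenPipe);
}
```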

View File

@@ -1,25 +1,36 @@
#[cfg(feature = "lz4")]
use anyhow::Result;
#[cfg(feature = "lz4")]
use log::*;
#[cfg(feature = "lz4")]
use std::io::Write;
#[cfg(feature = "lz4")]
use lz4_flex::frame::{FrameDecoder, FrameEncoder};
#[cfg(feature = "lz4")]
use std::fs::File;
#[cfg(feature = "lz4")]
use std::io::Read;
#[cfg(feature = "lz4")]
use std::path::PathBuf;
#[cfg(feature = "lz4")]
use crate::compression_engine::CompressionEngine;
#[cfg(feature = "lz4")]
#[derive(Debug, Eq, PartialEq, Clone, Default)]
pub struct CompressionEngineLZ4 {}
#[cfg(feature = "lz4")]
impl CompressionEngineLZ4 {
pub fn new() -> CompressionEngineLZ4 {
CompressionEngineLZ4 {}
}
}
#[cfg(feature = "lz4")]
impl CompressionEngine for CompressionEngineLZ4 {
fn open(&self, file_path: PathBuf) -> Result<Box<dyn Read>> {
fn open(&self, file_path: PathBuf) -> Result<Box<dyn Read + Send>> {
debug!("COMPRESSION: Opening {:?} using {:?}", file_path, *self);
let file = File::open(file_path)?;

View File

@@ -7,16 +7,15 @@ use strum::{Display, EnumIter, EnumString};
use log::*;
use lazy_static::lazy_static;
extern crate enum_map;
use enum_map::enum_map;
use enum_map::{Enum, EnumMap};
pub mod gzip;
pub mod lz4;
pub mod none;
pub mod program;
pub mod raw;
pub mod zstd;
use crate::compression_engine::program::CompressionEngineProgram;
@@ -28,19 +27,24 @@ use crate::compression_engine::program::CompressionEngineProgram;
///
/// # Examples
///
/// ```
/// use keep::compression_engine::CompressionType;
/// ```ignore
/// assert_eq!(CompressionType::GZip.to_string(), "gzip");
/// ```
#[derive(Debug, Eq, PartialEq, Clone, EnumIter, Display, EnumString, enum_map::Enum)]
#[strum(ascii_case_insensitive)]
pub enum CompressionType {
#[strum(serialize = "lz4")]
LZ4,
#[strum(serialize = "gzip")]
GZip,
#[strum(serialize = "bzip2")]
BZip2,
#[strum(serialize = "xz")]
XZ,
#[strum(serialize = "zstd")]
ZStd,
None,
#[strum(to_string = "raw", serialize = "raw", serialize = "none")]
Raw,
}
/// Trait defining the interface for compression engines.
@@ -73,14 +77,14 @@ pub trait CompressionEngine: Send + Sync {
///
/// # Returns
///
/// * `Result<Box<dyn Read>>` - A boxed reader that decompresses the file on read,
/// * `Result<Box<dyn Read + Send>>` - A boxed reader that decompresses the file on read,
/// or an error if the file cannot be opened or is invalid.
///
/// # Errors
///
/// Returns an error if the file does not exist, is not a valid compressed file,
/// or if decompression fails.
fn open(&self, file_path: PathBuf) -> Result<Box<dyn Read>>;
fn open(&self, file_path: PathBuf) -> Result<Box<dyn Read + Send>>;
/// Creates a new compressed file for writing.
///
@@ -174,10 +178,14 @@ impl Clone for Box<dyn CompressionEngine> {
}
}
lazy_static! {
static ref COMPRESSION_ENGINES: EnumMap<CompressionType, Box<dyn CompressionEngine>> = {
let mut em = enum_map! {
CompressionType::LZ4 => Box::new(crate::compression_engine::lz4::CompressionEngineLZ4::new()) as Box<dyn CompressionEngine>,
fn init_compression_engines() -> EnumMap<CompressionType, Box<dyn CompressionEngine>> {
#[allow(unused_mut)]
let mut em: EnumMap<CompressionType, Box<dyn CompressionEngine>> = enum_map! {
CompressionType::LZ4 => Box::new(crate::compression_engine::program::CompressionEngineProgram::new(
"lz4",
vec!["-c"],
vec!["-d", "-c"]
)) as Box<dyn CompressionEngine>,
CompressionType::GZip => Box::new(crate::compression_engine::program::CompressionEngineProgram::new(
"gzip",
vec!["-c"],
@@ -198,7 +206,7 @@ lazy_static! {
vec!["-c"],
vec!["-d", "-c"]
)) as Box<dyn CompressionEngine>,
CompressionType::None => Box::new(crate::compression_engine::none::CompressionEngineNone::new()) as Box<dyn CompressionEngine>
CompressionType::Raw => Box::new(crate::compression_engine::raw::CompressionEngineRaw::new()) as Box<dyn CompressionEngine>
};
#[cfg(feature = "gzip")]
@@ -208,10 +216,27 @@ lazy_static! {
as Box<dyn CompressionEngine>;
}
#[cfg(feature = "lz4")]
{
em[CompressionType::LZ4] =
Box::new(crate::compression_engine::lz4::CompressionEngineLZ4::new())
as Box<dyn CompressionEngine>;
}
#[cfg(feature = "zstd")]
{
em[CompressionType::ZStd] =
Box::new(crate::compression_engine::zstd::CompressionEngineZstd::new())
as Box<dyn CompressionEngine>;
}
em
};
}
static COMPRESSION_ENGINES: std::sync::LazyLock<
EnumMap<CompressionType, Box<dyn CompressionEngine>>,
> = std::sync::LazyLock::new(init_compression_engines);
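The `lazy_static!` to `std::sync::LazyLock` migration needs no extra crate on Rust 1.80+. A minimal sketch of the pattern, with an illustrative name/description map rather than the crate's real engine table:

```rust
use std::collections::HashMap;
use std::sync::LazyLock;

// The closure runs once, on first access; later derefs reuse the same map.
// Entries here are illustrative, not the crate's actual engine registry.
static ENGINE_NAMES: LazyLock<HashMap<&'static str, &'static str>> = LazyLock::new(|| {
    let mut m = HashMap::new();
    m.insert("gzip", "gzip -c / gzip -d -c");
    m.insert("lz4", "lz4 -c / lz4 -d -c");
    m
});

fn main() {
    // First deref initializes the map.
    assert_eq!(ENGINE_NAMES.get("gzip"), Some(&"gzip -c / gzip -d -c"));
    assert!(ENGINE_NAMES.get("brotli").is_none());
}
```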
pub fn default_compression_type() -> CompressionType {
CompressionType::LZ4
}
@@ -221,9 +246,6 @@ pub fn get_compression_engine(ct: CompressionType) -> Result<Box<dyn Compression
if engine.is_supported() {
Ok(engine.clone())
} else {
Err(anyhow!(
"Compression engine for {} is not supported",
ct.to_string()
))
Err(anyhow!("Compression engine for {ct} is not supported",))
}
}

View File

@@ -15,7 +15,13 @@ pub struct ProgramReader {
impl Read for ProgramReader {
fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
self.stdout.as_mut().unwrap().read(buf)
match self.stdout.as_mut() {
Some(stdout) => stdout.read(buf),
None => Err(std::io::Error::new(
std::io::ErrorKind::BrokenPipe,
"stdout already taken",
)),
}
}
}
@@ -33,11 +39,23 @@ pub struct ProgramWriter {
impl Write for ProgramWriter {
fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
self.stdin.as_mut().unwrap().write(buf)
match self.stdin.as_mut() {
Some(stdin) => stdin.write(buf),
None => Err(std::io::Error::new(
std::io::ErrorKind::BrokenPipe,
"stdin already taken",
)),
}
}
fn flush(&mut self) -> std::io::Result<()> {
self.stdin.as_mut().unwrap().flush()
match self.stdin.as_mut() {
Some(stdin) => stdin.flush(),
None => Err(std::io::Error::new(
std::io::ErrorKind::BrokenPipe,
"stdin already taken",
)),
}
}
}
@@ -94,16 +112,13 @@ impl CompressionEngine for CompressionEngineProgram {
)
}
fn open(&self, file_path: PathBuf) -> Result<Box<dyn Read>> {
debug!("COMPRESSION: Opening {:?} using {:?}", file_path, *self);
fn open(&self, file_path: PathBuf) -> Result<Box<dyn Read + Send>> {
debug!("COMPRESSION: Opening {file_path:?} using {self:?}");
let program = self.program.clone();
let args = self.decompress.clone();
debug!(
"COMPRESSION: Executing command: {:?} {:?} reading from {:?}",
program, args, file_path
);
debug!("COMPRESSION: Executing command: {program:?} {args:?} reading from {file_path:?}");
let file = File::open(file_path).context("Unable to open file for reading")?;
@@ -130,15 +145,12 @@ impl CompressionEngine for CompressionEngineProgram {
}
fn create(&self, file_path: PathBuf) -> Result<Box<dyn Write>> {
debug!("COMPRESSION: Writing to {:?} using {:?}", file_path, *self);
debug!("COMPRESSION: Writing to {file_path:?} using {self:?}");
let program = self.program.clone();
let args = self.compress.clone();
debug!(
"COMPRESSION: Executing command: {:?} {:?} writing to {:?}",
program, args, file_path
);
debug!("COMPRESSION: Executing command: {program:?} {args:?} writing to {file_path:?}");
let file = File::create(file_path).context("Unable to open file for writing")?;

View File

@@ -7,15 +7,15 @@ use std::path::PathBuf;
use crate::compression_engine::CompressionEngine;
#[derive(Debug, Eq, PartialEq, Clone, Default)]
pub struct CompressionEngineNone {}
pub struct CompressionEngineRaw {}
impl CompressionEngineNone {
pub fn new() -> CompressionEngineNone {
CompressionEngineNone {}
impl CompressionEngineRaw {
pub fn new() -> CompressionEngineRaw {
CompressionEngineRaw {}
}
}
impl CompressionEngine for CompressionEngineNone {
impl CompressionEngine for CompressionEngineRaw {
fn is_supported(&self) -> bool {
true
}
@@ -24,7 +24,7 @@ impl CompressionEngine for CompressionEngineNone {
("<INTERNAL>".to_string(), "".to_string(), "".to_string())
}
fn open(&self, file_path: PathBuf) -> Result<Box<dyn Read>> {
fn open(&self, file_path: PathBuf) -> Result<Box<dyn Read + Send>> {
debug!("COMPRESSION: Opening {:?} using {:?}", file_path, *self);
Ok(Box::new(File::open(file_path)?))
}

View File

@@ -0,0 +1,54 @@
#[cfg(feature = "zstd")]
use anyhow::Result;
#[cfg(feature = "zstd")]
use log::*;
#[cfg(feature = "zstd")]
use std::io::Write;
#[cfg(feature = "zstd")]
use std::fs::File;
#[cfg(feature = "zstd")]
use std::io::Read;
#[cfg(feature = "zstd")]
use std::path::PathBuf;
#[cfg(feature = "zstd")]
use zstd::stream::read::Decoder;
#[cfg(feature = "zstd")]
use zstd::stream::write::Encoder;
#[cfg(feature = "zstd")]
use crate::compression_engine::CompressionEngine;
#[cfg(feature = "zstd")]
#[derive(Debug, Eq, PartialEq, Clone, Default)]
pub struct CompressionEngineZstd {}
#[cfg(feature = "zstd")]
impl CompressionEngineZstd {
pub fn new() -> CompressionEngineZstd {
CompressionEngineZstd {}
}
}
#[cfg(feature = "zstd")]
impl CompressionEngine for CompressionEngineZstd {
fn open(&self, file_path: PathBuf) -> Result<Box<dyn Read + Send>> {
debug!("COMPRESSION: Opening {:?} using {:?}", file_path, *self);
let file = File::open(file_path)?;
Ok(Box::new(Decoder::new(file)?))
}
fn create(&self, file_path: PathBuf) -> Result<Box<dyn Write>> {
debug!("COMPRESSION: Writing to {:?} using {:?}", file_path, *self);
let file = File::create(file_path)?;
let zstd_write = Encoder::new(file, 3)?.auto_finish();
Ok(Box::new(zstd_write))
}
fn clone_box(&self) -> Box<dyn CompressionEngine> {
Box::new(self.clone())
}
}

View File

@@ -4,7 +4,7 @@ use dirs;
use log::{debug, error};
use serde::{Deserialize, Serialize};
use std::fs;
use std::path::PathBuf;
use std::path::{Path, PathBuf};
#[derive(Debug, Clone, Serialize, Deserialize, Default)]
#[serde(rename_all = "lowercase")]
@@ -143,9 +143,16 @@ impl<'de> serde::Deserialize<'de> for ColumnConfig {
pub struct ServerConfig {
pub address: Option<String>,
pub port: Option<u16>,
pub username: Option<String>,
pub password_file: Option<PathBuf>,
pub password: Option<String>,
pub password_hash: Option<String>,
pub jwt_secret: Option<String>,
pub jwt_secret_file: Option<PathBuf>,
pub cert_file: Option<PathBuf>,
pub key_file: Option<PathBuf>,
pub cors_origin: Option<String>,
pub max_body_size: Option<u64>,
}
#[derive(Debug, Clone, Deserialize, Serialize)]
@@ -153,6 +160,14 @@ pub struct CompressionPluginConfig {
pub name: String,
}
#[derive(Debug, Clone, Deserialize, Serialize)]
pub struct ClientConfig {
pub url: Option<String>,
pub username: Option<String>,
pub password: Option<String>,
pub jwt: Option<String>,
}
#[derive(Debug, Clone, Deserialize, Serialize)]
#[cfg_attr(feature = "server", derive(utoipa::ToSchema))]
pub struct MetaPluginConfig {
@@ -170,11 +185,14 @@ pub struct MetaPluginConfig {
pub struct Settings {
#[serde(default)]
pub dir: PathBuf,
#[serde(default)]
pub list_format: Vec<ColumnConfig>,
#[serde(default)]
pub table_config: TableConfig,
#[serde(default)]
pub human_readable: bool,
#[serde(default)]
pub ids_only: bool,
pub output_format: Option<String>,
#[serde(default)]
pub quiet: bool,
@@ -183,45 +201,62 @@ pub struct Settings {
pub server: Option<ServerConfig>,
pub compression_plugin: Option<CompressionPluginConfig>,
pub meta_plugins: Option<Vec<MetaPluginConfig>>,
pub client: Option<ClientConfig>,
// Non-serializable fields populated from CLI args
#[serde(skip)]
pub client_url: Option<String>,
#[serde(skip)]
pub client_username: Option<String>,
#[serde(skip)]
pub client_password: Option<String>,
#[serde(skip)]
pub client_jwt: Option<String>,
// Metadata key-value pairs from --meta CLI flag
#[serde(skip)]
pub meta: Vec<(String, Option<String>)>,
// Export filename format template (--export-filename-format)
#[serde(skip)]
pub export_filename_format: String,
// Export name for {name} variable (--export-name)
#[serde(skip)]
pub export_name: Option<String>,
// Import data file path (--import-data-file)
#[serde(skip)]
pub import_data_file: Option<std::path::PathBuf>,
}
impl Settings {
/// Create unified settings from config and args with proper priority
pub fn new(args: &Args, default_dir: PathBuf) -> Result<Self> {
debug!(
"CONFIG: Creating settings with default dir: {:?}",
default_dir
);
debug!("CONFIG: Creating settings with default dir: {default_dir:?}");
let config_path = if let Some(config_path) = &args.options.config {
config_path.clone()
} else if let Ok(env_config) = std::env::var("KEEP_CONFIG") {
PathBuf::from(env_config)
} else {
let default_path = if let Ok(home_dir) = std::env::var("HOME") {
let mut path = PathBuf::from(home_dir);
path.push(".config");
path.push("keep");
path.push("config.yml");
path
} else {
PathBuf::from("~/.config/keep/config.yml")
};
debug!("CONFIG: Using default config path: {:?}", default_path);
let default_path = dirs::config_dir()
.map(|mut p| {
p.push("keep");
p.push("config.yml");
p
})
.unwrap_or_else(|| PathBuf::from("~/.config/keep/config.yml"));
debug!("CONFIG: Using default config path: {default_path:?}");
default_path
};
debug!("CONFIG: Using config path: {:?}", config_path);
debug!("CONFIG: Using config path: {config_path:?}");
let mut config_builder = config::Config::builder();
// Load config file if it exists
if config_path.exists() {
debug!("CONFIG: Loading config file: {:?}", config_path);
debug!("CONFIG: Loading config file: {config_path:?}");
config_builder =
config_builder.add_source(config::File::from(config_path.clone()).required(false));
} else {
debug!("CONFIG: Config file does not exist: {:?}", config_path);
debug!("CONFIG: Config file does not exist: {config_path:?}");
}
// Add environment variables
@@ -233,14 +268,22 @@ impl Settings {
// Override with CLI args
if let Some(dir) = &args.options.dir {
debug!("CONFIG: Overriding dir with CLI arg: {:?}", dir);
config_builder = config_builder.set_override("dir", dir.to_str().unwrap())?;
debug!("CONFIG: Overriding dir with CLI arg: {dir:?}");
config_builder = config_builder.set_override(
"dir",
dir.to_str()
.ok_or_else(|| anyhow::anyhow!("non-UTF-8 directory path"))?,
)?;
}
if args.options.human_readable {
config_builder = config_builder.set_override("human_readable", true)?;
}
if args.options.ids_only {
config_builder = config_builder.set_override("ids_only", true)?;
}
if let Some(output_format) = &args.options.output_format {
config_builder =
config_builder.set_override("output_format", output_format.as_str())?;
@@ -258,50 +301,66 @@ impl Settings {
config_builder = config_builder.set_override("force", true)?;
}
#[cfg(feature = "server")]
if let Some(server_password) = &args.options.server_password {
config_builder =
config_builder.set_override("server.password", server_password.as_str())?;
}
#[cfg(feature = "server")]
if let Some(server_password_hash) = &args.options.server_password_hash {
config_builder = config_builder
.set_override("server.password_hash", server_password_hash.as_str())?;
}
#[cfg(feature = "server")]
if let Some(server_username) = &args.options.server_username {
config_builder =
config_builder.set_override("server.username", server_username.as_str())?;
}
#[cfg(feature = "server")]
if let Some(server_address) = &args.mode.server_address {
config_builder =
config_builder.set_override("server.address", server_address.as_str())?;
}
#[cfg(feature = "server")]
if let Some(server_port) = args.mode.server_port {
config_builder = config_builder.set_override("server.port", server_port)?;
}
#[cfg(feature = "server")]
if let Some(server_cert) = &args.mode.server_cert {
config_builder = config_builder
.set_override("server.cert_file", server_cert.to_string_lossy().as_ref())?;
}
#[cfg(feature = "server")]
if let Some(server_key) = &args.mode.server_key {
config_builder = config_builder
.set_override("server.key_file", server_key.to_string_lossy().as_ref())?;
}
#[cfg(feature = "server")]
if let Some(max_body_size) = args.options.server_max_body_size {
config_builder = config_builder.set_override("server.max_body_size", max_body_size)?;
}
if let Some(compression) = &args.item.compression {
config_builder =
config_builder.set_override("compression_plugin.name", compression.as_str())?;
}
if !args.item.meta_plugins.is_empty() {
let meta_plugins: Vec<std::collections::HashMap<String, String>> = args
.item
.meta_plugins
.iter()
.map(|name| {
let mut map = std::collections::HashMap::new();
map.insert("name".to_string(), name.clone());
map
})
.collect();
config_builder = config_builder.set_override("meta_plugins", meta_plugins)?;
}
// Build MetaPluginConfig entries from --meta-plugin args (name[:json])
// These are handled after config deserialization (see below).
let config = config_builder.build()?;
debug!("CONFIG: Built config, attempting to deserialize");
match config.try_deserialize::<Settings>() {
Ok(mut settings) => {
debug!("CONFIG: Successfully deserialized settings: {:?}", settings);
debug!("CONFIG: Successfully deserialized settings: {settings:?}");
// Set defaults for list_format if not provided
if settings.list_format.is_empty() {
@@ -390,17 +449,133 @@ impl Settings {
}]);
}
// Override meta_plugins from --meta-plugin CLI args
if !args.item.meta_plugins.is_empty() {
debug!("CONFIG: Overriding meta_plugins from --meta-plugin CLI args");
let cli_plugins: Vec<MetaPluginConfig> = args
.item
.meta_plugins
.iter()
.map(|arg| {
let mut options = std::collections::HashMap::new();
let mut outputs = std::collections::HashMap::new();
if let Some(serde_json::Value::Object(obj)) = &arg.options {
// Extract options and outputs from JSON value
if let Some(serde_json::Value::Object(opts_obj)) =
obj.get("options")
{
for (k, v) in opts_obj {
let yaml_str = serde_json::to_string(v).unwrap_or_default();
let yaml_val: serde_yaml::Value =
serde_yaml::from_str(&yaml_str)
.unwrap_or(serde_yaml::Value::Null);
options.insert(k.clone(), yaml_val);
}
}
if let Some(serde_json::Value::Object(outs_obj)) =
obj.get("outputs")
{
for (k, v) in outs_obj {
let val_str = match v {
serde_json::Value::String(s) => s.clone(),
_ => v.to_string(),
};
outputs.insert(k.clone(), val_str);
}
}
}
MetaPluginConfig {
name: arg.name.clone(),
options,
outputs,
}
})
.collect();
settings.meta_plugins = Some(cli_plugins);
}
// Override list_format from --list-format CLI arg
if args.options.list_format
!= "id,time,size,meta:text_line_count,tags,meta:hostname_short,meta:command"
{
debug!("CONFIG: Overriding list_format from --list-format CLI arg");
settings.list_format = Settings::parse_list_format(&args.options.list_format);
}
// Set dir to default if not provided or is empty
if settings.dir == PathBuf::new() {
debug!("CONFIG: Setting default dir: {:?}", default_dir);
debug!("CONFIG: Setting default dir: {default_dir:?}");
settings.dir = default_dir;
}
debug!("CONFIG: Final settings: {:?}", settings);
// Populate client settings from CLI args and config
#[cfg(feature = "client")]
{
settings.client_url = args
.options
.client_url
.clone()
.or_else(|| settings.client.as_ref().and_then(|c| c.url.clone()));
settings.client_username = args
.options
.client_username
.clone()
.or_else(|| settings.client.as_ref().and_then(|c| c.username.clone()));
settings.client_password = args
.options
.client_password
.clone()
.or_else(|| settings.client.as_ref().and_then(|c| c.password.clone()));
settings.client_jwt = args
.options
.client_jwt
.clone()
.or_else(|| settings.client.as_ref().and_then(|c| c.jwt.clone()));
}
// Parse --meta key=value and bare key arguments
settings.meta = args
.item
.meta
.iter()
.map(|s| {
if let Some((key, value)) = s.split_once('=') {
(key.to_string(), Some(value.to_string()))
} else {
(s.to_string(), None)
}
})
.collect();
// Set export filename format from CLI args
settings.export_filename_format = args.item.export_filename_format.clone();
settings.export_name = args.item.export_name.clone();
settings.import_data_file = args.item.import_data_file.clone();
// Expand ~ in all path fields
settings.dir = Settings::expand_tilde(&settings.dir);
settings.import_data_file = settings
.import_data_file
.as_ref()
.map(|p| Settings::expand_tilde(p));
if let Some(ref mut server) = settings.server {
server.password_file = server
.password_file
.as_ref()
.map(|p| Settings::expand_tilde(p));
server.jwt_secret_file = server
.jwt_secret_file
.as_ref()
.map(|p| Settings::expand_tilde(p));
server.cert_file = server.cert_file.as_ref().map(|p| Settings::expand_tilde(p));
server.key_file = server.key_file.as_ref().map(|p| Settings::expand_tilde(p));
}
debug!("CONFIG: Final settings: {settings:?}");
Ok(settings)
}
Err(e) => {
error!("CONFIG: Failed to deserialize settings: {}", e);
error!("CONFIG: Failed to deserialize settings: {e}");
Err(e.into())
}
}
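The `--meta` parsing in the hunk above splits each argument on the first `=`. A minimal standalone sketch of that behavior (the function name here is illustrative, not from the crate):

```rust
use std::collections::HashMap;

// Each "key=value" argument maps to (key, Some(value));
// a bare "key" maps to (key, None), matching the split_once logic above.
fn parse_meta_args(args: &[&str]) -> HashMap<String, Option<String>> {
    args.iter()
        .map(|s| match s.split_once('=') {
            Some((key, value)) => (key.to_string(), Some(value.to_string())),
            None => (s.to_string(), None),
        })
        .collect()
}
```

Because `split_once` stops at the first `=`, a value like `cmd=a=b` keeps its embedded `=` intact.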
@@ -408,24 +583,42 @@ impl Settings {
pub fn default_dir() -> anyhow::Result<PathBuf> {
let mut path =
dirs::home_dir().ok_or_else(|| anyhow::anyhow!("No home directory found"))?;
path.push(".keep");
dirs::data_dir().ok_or_else(|| anyhow::anyhow!("No data directory found"))?;
path.push("keep");
if !path.exists() {
std::fs::create_dir_all(&path)?;
}
Ok(path)
}
/// Expand a leading `~` in a path to the user's home directory.
///
/// Returns the path unchanged if it doesn't start with `~` or if the
/// home directory cannot be determined.
fn expand_tilde(path: &Path) -> PathBuf {
let path_str = path.to_string_lossy();
if let Some(rest) = path_str.strip_prefix("~/") {
if let Some(home) = dirs::home_dir() {
return home.join(rest);
}
} else if path_str == "~" {
if let Some(home) = dirs::home_dir() {
return home;
}
}
path.to_path_buf()
}
/// Get server password from password_file or directly from config if configured
pub fn get_server_password(&self) -> Result<Option<String>> {
if let Some(server) = &self.server {
// First check for password_file
if let Some(password_file) = &server.password_file {
debug!("CONFIG: Reading password from file: {:?}", password_file);
let password = fs::read_to_string(password_file)
.with_context(|| format!("Failed to read password file: {:?}", password_file))?
.trim()
.to_string();
debug!("CONFIG: Reading password from file: {password_file:?}");
let password = fs::read(password_file)
.with_context(|| format!("Failed to read password file: {password_file:?}"))?;
let end = password.len().min(4096);
let password = String::from_utf8_lossy(&password[..end]).trim().to_string();
return Ok(Some(password));
}
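The password read above deliberately caps how many bytes it decodes before trimming. A small sketch of that cap-then-trim step (function name is illustrative):

```rust
// Only the first 4096 bytes are considered, decoded lossily as UTF-8,
// and trimmed of surrounding whitespace, mirroring get_server_password.
fn read_capped_secret(bytes: &[u8]) -> String {
    let end = bytes.len().min(4096);
    String::from_utf8_lossy(&bytes[..end]).trim().to_string()
}
```

The cap bounds memory use even if the configured path points at a huge or binary file.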
@@ -447,6 +640,37 @@ impl Settings {
self.server.as_ref().and_then(|s| s.password_hash.clone())
}
pub fn server_username(&self) -> Option<String> {
self.server.as_ref().and_then(|s| s.username.clone())
}
/// Get JWT secret from jwt_secret_file or directly from config if configured
pub fn get_server_jwt_secret(&self) -> Result<Option<String>> {
if let Some(server) = &self.server {
// First check for jwt_secret_file
if let Some(jwt_secret_file) = &server.jwt_secret_file {
debug!("CONFIG: Reading JWT secret from file: {jwt_secret_file:?}");
let secret = fs::read(jwt_secret_file).with_context(|| {
format!("Failed to read JWT secret file: {jwt_secret_file:?}")
})?;
let end = secret.len().min(4096);
let secret = String::from_utf8_lossy(&secret[..end]).trim().to_string();
return Ok(Some(secret));
}
// Fall back to direct jwt_secret field
if let Some(secret) = &server.jwt_secret {
debug!("CONFIG: Using JWT secret from config");
return Ok(Some(secret.clone()));
}
}
Ok(None)
}
pub fn server_jwt_secret(&self) -> Option<String> {
self.get_server_jwt_secret().ok().flatten()
}
pub fn server_address(&self) -> Option<String> {
self.server.as_ref().and_then(|s| s.address.clone())
}
@@ -455,6 +679,18 @@ impl Settings {
self.server.as_ref().and_then(|s| s.port)
}
pub fn server_cert_file(&self) -> Option<PathBuf> {
self.server.as_ref().and_then(|s| s.cert_file.clone())
}
pub fn server_key_file(&self) -> Option<PathBuf> {
self.server.as_ref().and_then(|s| s.key_file.clone())
}
pub fn server_cors_origin(&self) -> Option<String> {
self.server.as_ref().and_then(|s| s.cors_origin.clone())
}
pub fn compression(&self) -> Option<String> {
self.compression_plugin.as_ref().map(|c| c.name.clone())
}
@@ -465,4 +701,142 @@ impl Settings {
.map(|plugins| plugins.iter().map(|p| p.name.clone()).collect())
.unwrap_or_default()
}
/// Returns the metadata filter as a HashMap.
///
/// Converts the `meta` field (list of key-value pairs from CLI --meta flags)
/// into a `HashMap<String, Option<String>>` suitable for filtering.
pub fn meta_filter(&self) -> std::collections::HashMap<String, Option<String>> {
self.meta.iter().cloned().collect()
}
/// Validates the configuration against plugin schemas.
///
/// Checks that:
/// - All configured meta plugin names are valid and registered
/// - Required options are present for each meta plugin
/// - Compression plugin name (if set) is a valid compression type
///
/// Returns a list of warning strings. An empty list means the config is valid.
pub fn validate_config(&self) -> Vec<String> {
use crate::common::schema::gather_meta_plugin_schemas;
use crate::compression_engine::CompressionType;
use strum::IntoEnumIterator;
let mut warnings = Vec::new();
// Validate compression plugin
if let Some(ref comp) = self.compression_plugin {
let valid_types: Vec<String> =
CompressionType::iter().map(|ct| ct.to_string()).collect();
if !valid_types.contains(&comp.name) {
warnings.push(format!(
"Unknown compression_plugin.name: '{}'. Valid types: {}",
comp.name,
valid_types.join(", ")
));
}
}
// Validate meta plugins
if let Some(ref plugins) = self.meta_plugins {
let schemas = gather_meta_plugin_schemas();
let schema_map: std::collections::HashMap<&str, &crate::common::schema::PluginSchema> =
schemas.iter().map(|s| (s.name.as_str(), s)).collect();
for plugin in plugins {
match schema_map.get(plugin.name.as_str()) {
Some(schema) => {
// Check required options
for opt in &schema.options {
if opt.required && !plugin.options.contains_key(&opt.name) {
warnings.push(format!(
"Meta plugin '{}': missing required option '{}'",
plugin.name, opt.name
));
}
}
}
None => {
warnings.push(format!(
"Unknown meta plugin: '{}'. Available: {}",
plugin.name,
schema_map.keys().copied().collect::<Vec<_>>().join(", ")
));
}
}
}
}
warnings
}
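The required-option check in `validate_config` can be sketched in isolation (the `Opt` struct here is a stand-in for the crate's schema type, not its real definition):

```rust
// Stand-in for the plugin schema's option descriptor.
struct Opt {
    name: String,
    required: bool,
}

// Any option marked required but absent from the configured keys
// produces a warning, as in the loop over schema.options above.
fn missing_required(schema: &[Opt], configured: &[&str]) -> Vec<String> {
    schema
        .iter()
        .filter(|o| o.required && !configured.contains(&o.name.as_str()))
        .map(|o| format!("missing required option '{}'", o.name))
        .collect()
}
```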
/// Parse a comma-separated column list string into Vec<ColumnConfig>.
///
/// Maps known column names to their default labels and alignment.
/// For unknown names (including meta:* columns), uses the name as its own label.
fn parse_list_format(input: &str) -> Vec<ColumnConfig> {
input
.split(',')
.map(|s| s.trim())
.filter(|s| !s.is_empty())
.map(|name| {
let (label, align) = match name {
"id" => ("Item", ColumnAlignment::Right),
"time" => ("Time", ColumnAlignment::Right),
"size" => ("Size", ColumnAlignment::Right),
"meta:text_line_count" => ("Lines", ColumnAlignment::Right),
"meta:token_count" => ("Tokens", ColumnAlignment::Right),
"tags" => ("Tags", ColumnAlignment::Left),
"meta:hostname_short" => ("Host", ColumnAlignment::Left),
"meta:hostname" => ("Host", ColumnAlignment::Left),
"meta:command" => ("Command", ColumnAlignment::Left),
"compression" => ("Compression", ColumnAlignment::Left),
other if other.starts_with("meta:") => {
let sub = other.strip_prefix("meta:").unwrap_or(other);
(sub, ColumnAlignment::Left)
}
other => (other, ColumnAlignment::Left),
};
ColumnConfig {
name: name.to_string(),
label: label.to_string(),
align,
..Default::default()
}
})
.collect()
}
}
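The label mapping inside `parse_list_format` reduces to one core rule, sketched here with only a subset of the real match arms (simplified; the actual function also assigns alignment and maps more names):

```rust
// Known names get friendly labels; unknown "meta:*" names fall back to
// their suffix; anything else labels itself.
fn column_label(name: &str) -> &str {
    match name {
        "id" => "Item",
        "time" => "Time",
        "size" => "Size",
        "tags" => "Tags",
        other => other.strip_prefix("meta:").unwrap_or(other),
    }
}
```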
#[cfg(test)]
mod tests {
use super::*;
use std::path::Path;
#[test]
fn test_expand_tilde_with_slash() {
let home = dirs::home_dir().unwrap();
let result = Settings::expand_tilde(Path::new("~/foo/bar"));
assert_eq!(result, home.join("foo/bar"));
}
#[test]
fn test_expand_tilde_bare() {
let home = dirs::home_dir().unwrap();
let result = Settings::expand_tilde(Path::new("~"));
assert_eq!(result, home);
}
#[test]
fn test_expand_tilde_absolute() {
let result = Settings::expand_tilde(Path::new("/etc/keep"));
assert_eq!(result, PathBuf::from("/etc/keep"));
}
#[test]
fn test_expand_tilde_relative() {
let result = Settings::expand_tilde(Path::new("foo/bar"));
assert_eq!(result, PathBuf::from("foo/bar"));
}
}

src/db.rs (671 lines changed)
File diff suppressed because it is too large

src/export_tar.rs (new file, 167 lines)

@@ -0,0 +1,167 @@
use anyhow::{Context, Result, anyhow};
use log::debug;
use std::collections::HashSet;
use std::fs;
use std::io::{Read, Seek, Write};
use std::path::Path;
use tar::{Builder, Header};
use crate::filter_plugin::FilterChain;
use crate::modes::common::ExportMeta;
use crate::services::item_service::ItemService;
use crate::services::types::ItemWithMeta;
/// Compute the intersection of all items' tag sets.
///
/// Returns sorted tags that are present on ALL items.
pub fn common_tags(items: &[ItemWithMeta]) -> Vec<String> {
if items.is_empty() {
return Vec::new();
}
let mut common: HashSet<String> = items[0].tag_names().into_iter().collect();
for item in items.iter().skip(1) {
let item_tags: HashSet<String> = item.tag_names().into_iter().collect();
common = common.intersection(&item_tags).cloned().collect();
}
let mut result: Vec<String> = common.into_iter().collect();
result.sort();
result
}
/// Resolve the export name from the CLI arg or compute default from common tags.
///
/// If `arg` is Some, uses that value directly.
/// Otherwise, computes `export_<common-tags>` or just `export` if no common tags.
pub fn export_name(arg: &Option<String>, items: &[ItemWithMeta]) -> String {
if let Some(name) = arg {
return name.clone();
}
let tags = common_tags(items);
if tags.is_empty() {
"export".to_string()
} else {
format!("export_{}", tags.join("_"))
}
}
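The default-name rule above (intersect every item's tags, sort, join) can be sketched over plain tag lists (the signature here is illustrative, not the crate's):

```rust
use std::collections::HashSet;

// Intersect all items' tag sets; the sorted common tags become
// "export_<tags>", or just "export" when nothing is shared.
fn export_name_from_tags(tag_sets: &[Vec<String>]) -> String {
    let mut iter = tag_sets.iter();
    let mut common: HashSet<String> = match iter.next() {
        Some(first) => first.iter().cloned().collect(),
        None => return "export".to_string(),
    };
    for tags in iter {
        let set: HashSet<String> = tags.iter().cloned().collect();
        common = common.intersection(&set).cloned().collect();
    }
    let mut tags: Vec<String> = common.into_iter().collect();
    tags.sort();
    if tags.is_empty() {
        "export".to_string()
    } else {
        format!("export_{}", tags.join("_"))
    }
}
```

Sorting before joining keeps the default name stable regardless of the order items were selected in.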
/// Write items to a tar archive, streaming data without loading files into memory.
///
/// The archive contains `<dir_name>/<id>.data.<compression>` and
/// `<dir_name>/<id>.meta.yml` for each item.
///
/// # Arguments
///
/// * `writer` - The output writer (e.g., a File).
/// * `dir_name` - Top-level directory name inside the tar.
/// * `items` - Items to export.
/// * `data_path` - Path to the data storage directory.
/// * `filter_chain` - Optional filter chain for transforming content on export.
/// * `item_service` - Item service for streaming content.
/// * `conn` - Database connection for filter chain operations.
pub fn write_export_tar<W: Write>(
writer: W,
dir_name: &str,
items: &[ItemWithMeta],
data_path: &Path,
filter_chain: Option<&FilterChain>,
item_service: &ItemService,
conn: &rusqlite::Connection,
) -> Result<()> {
let mut builder = Builder::new(writer);
for item_with_meta in items {
let item_id = item_with_meta.item.id.context("Item missing ID")?;
let compression = &item_with_meta.item.compression;
let item_tags = item_with_meta.tag_names();
let meta_map = item_with_meta.meta_as_map();
let data_path_entry = format!("{dir_name}/{item_id}.data.{compression}");
let meta_path_entry = format!("{dir_name}/{item_id}.meta.yml");
// Meta entry (small, in-memory is fine)
let export_meta = ExportMeta {
ts: item_with_meta.item.ts,
compression: compression.clone(),
uncompressed_size: item_with_meta.item.uncompressed_size,
tags: item_tags,
metadata: meta_map,
};
let meta_yaml = serde_yaml::to_string(&export_meta)?;
let meta_bytes = meta_yaml.into_bytes();
let meta_len = meta_bytes.len() as u64;
let mut meta_header = Header::new_gnu();
meta_header.set_size(meta_len);
meta_header.set_mode(0o644);
meta_header.set_path(&meta_path_entry)?;
meta_header.set_cksum();
builder
.append(&meta_header, meta_bytes.as_slice())
.with_context(|| format!("Cannot write meta entry for item {item_id}"))?;
debug!("EXPORT_TAR: Wrote meta entry {meta_path_entry}");
// Data entry
let mut item_file_path = data_path.to_path_buf();
item_file_path.push(item_id.to_string());
if let Some(chain) = filter_chain {
// Filtered export: spool through filter chain to a temp file,
// then stream the temp file into the tar with known size.
let (mut reader, _, _) = item_service.get_item_content_info_streaming_with_chain(
conn,
item_id,
Some(chain),
)?;
let mut tmp = tempfile::NamedTempFile::new()
.context("Cannot create temp file for filtered export")?;
let mut buf = [0u8; crate::common::PIPESIZE];
loop {
let n = reader.read(&mut buf)?;
if n == 0 {
break;
}
tmp.write_all(&buf[..n])?;
}
tmp.flush()?;
let total_size = tmp.as_file().metadata()?.len();
tmp.rewind()?;
let mut data_header = Header::new_gnu();
data_header.set_size(total_size);
data_header.set_mode(0o644);
data_header.set_path(&data_path_entry)?;
data_header.set_cksum();
builder
.append(&data_header, &mut tmp)
.with_context(|| format!("Cannot write data entry for item {item_id}"))?;
debug!("EXPORT_TAR: Wrote filtered data entry {data_path_entry} ({total_size} bytes)");
} else {
// Unfiltered export: stream raw compressed file
let file = fs::File::open(&item_file_path)
.with_context(|| format!("Cannot open data file: {}", item_file_path.display()))?;
let file_size = file.metadata()?.len();
let mut data_header = Header::new_gnu();
data_header.set_size(file_size);
data_header.set_mode(0o644);
data_header.set_path(&data_path_entry)?;
data_header.set_cksum();
builder
.append(&data_header, file)
.with_context(|| format!("Cannot write data entry for item {item_id}"))?;
debug!("EXPORT_TAR: Wrote data entry {data_path_entry} ({file_size} bytes)");
}
}
builder.finish().context("Cannot finalize tar archive")?;
debug!("EXPORT_TAR: Archive finalized");
Ok(())
}


@@ -1,47 +0,0 @@
# This Pest grammar defines the syntax for filter chains used in the Keep application.
# Filters can be chained with commas and may have named or unnamed options with JSON-like values.
WHITESPACE = _{ " " | "\t" | "\n" | "\r" }
# Top-level rule for parsing multiple filters separated by commas.
filters = { filter ~ ("," ~ filters)? }
# A single filter consisting of a name optionally followed by parenthesized options.
filter = { filter_name ~ ("(" ~ options ~ ")")? }
# The name of a filter, starting with an ASCII letter followed by alphanumeric characters or underscores.
filter_name = @{ ASCII_ALPHA ~ (ASCII_ALPHANUMERIC | "_")* }
# A list of comma-separated options within parentheses.
options = { option ~ ("," ~ options)? }
# A single option, optionally with a name followed by an equals sign and a value.
option = { (option_name ~ "=")? ~ option_value }
# The name of an option, starting with an ASCII letter followed by alphanumeric characters or underscores.
option_name = @{ ASCII_ALPHA ~ (ASCII_ALPHANUMERIC | "_")* }
# The value of an option, which can be a JSON number, string, or boolean.
option_value = {
JSON_NUMBER |
JSON_STRING |
JSON_BOOLEAN
}
# JSON number format supporting integers, decimals, and scientific notation.
JSON_NUMBER = @{
("-")? ~
("0" | ASCII_NONZERO_DIGIT ~ ASCII_DIGIT*) ~
("." ~ ASCII_DIGIT*)? ~
(("e" | "E") ~ ("+" | "-")? ~ ASCII_DIGIT+)?
}
# JSON string format with escaped characters.
JSON_STRING = ${
"\"" ~
(("\\" ~ ANY) | (!("\"" | "\\") ~ ANY))* ~
"\""
}
# JSON boolean values: true or false.
JSON_BOOLEAN = ${ "true" | "false" }


@@ -1,131 +0,0 @@
use pest::Parser;
use pest_derive::Parser;
use std::collections::HashMap;
#[derive(Parser)]
#[grammar = "filter.pest"]
pub struct FilterParser;
#[derive(Debug)]
pub struct Filter {
pub name: String,
pub options: HashMap<String, serde_json::Value>,
}
pub fn parse_filter_string(input: &str) -> Result<Vec<Filter>, Box<dyn std::error::Error>> {
let mut filters = Vec::new();
let pairs = FilterParser::parse(Rule::filters, input)?;
for pair in pairs {
if pair.as_rule() == Rule::filter {
let mut name = String::new();
let mut options = HashMap::new();
for inner_pair in pair.into_inner() {
match inner_pair.as_rule() {
Rule::filter_name => {
name = inner_pair.as_str().to_string();
}
Rule::options => {
for option_pair in inner_pair.into_inner() {
if option_pair.as_rule() == Rule::option {
let mut option_name = None;
let mut option_value = None;
for option_inner in option_pair.into_inner() {
match option_inner.as_rule() {
Rule::option_name => {
option_name = Some(option_inner.as_str().to_string());
}
Rule::option_value => {
option_value = Some(parse_option_value(option_inner.as_str())?);
}
_ => {}
}
}
if let Some(value) = option_value {
// If no name is provided, use the filter name as the key
let key = option_name.unwrap_or_else(|| name.clone());
options.insert(key, value);
}
}
}
}
_ => {}
}
}
filters.push(Filter { name, options });
}
}
Ok(filters)
}
fn parse_option_value(input: &str) -> Result<serde_json::Value, Box<dyn std::error::Error>> {
// Try to parse as number
if let Ok(num) = input.parse::<i64>() {
return Ok(serde_json::Value::Number(num.into()));
}
if let Ok(num) = input.parse::<f64>() {
if let Some(number) = serde_json::Number::from_f64(num) {
return Ok(serde_json::Value::Number(number));
}
}
// Try to parse as boolean
if let Ok(boolean) = input.parse::<bool>() {
return Ok(serde_json::Value::Bool(boolean));
}
// Treat as string (remove quotes if present)
let value = if input.starts_with('"') && input.ends_with('"') {
input[1..input.len()-1].to_string()
} else if input.starts_with('\'') && input.ends_with('\'') {
input[1..input.len()-1].to_string()
} else {
input.to_string()
};
Ok(serde_json::Value::String(value))
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_parse_simple_filter() {
let result = parse_filter_string("grep").unwrap();
assert_eq!(result.len(), 1);
assert_eq!(result[0].name, "grep");
assert!(result[0].options.is_empty());
}
#[test]
fn test_parse_filter_with_options() {
let result = parse_filter_string("head_lines(10)").unwrap();
assert_eq!(result.len(), 1);
assert_eq!(result[0].name, "head_lines");
assert_eq!(result[0].options["head_lines"], 10);
}
#[test]
fn test_parse_filter_with_named_options() {
let result = parse_filter_string("grep(pattern=\"error\")").unwrap();
assert_eq!(result.len(), 1);
assert_eq!(result[0].name, "grep");
assert_eq!(result[0].options["pattern"], "error");
}
#[test]
fn test_parse_multiple_filters() {
let result = parse_filter_string("head_lines(10), grep(pattern=\"error\")").unwrap();
assert_eq!(result.len(), 2);
assert_eq!(result[0].name, "head_lines");
assert_eq!(result[0].options["head_lines"], 10);
assert_eq!(result[1].name, "grep");
assert_eq!(result[1].options["pattern"], "error");
}
}


@@ -1,8 +1,8 @@
use super::{FilterPlugin, FilterOption};
use std::io::{Result, Read, Write};
use std::process::{Command, Stdio, Child};
use which::which;
use super::{FilterOption, FilterPlugin};
use log::*;
use std::io::{Read, Result, Write};
use std::process::{Child, Command, Stdio};
use which::which;
/// A filter that executes an external program and pipes input through it.
///
@@ -43,16 +43,13 @@ impl ExecFilter {
/// let filter = ExecFilter::new("grep", vec!["-i", "error"], false);
/// assert!(filter.supported);
/// ```
pub fn new(
program: &str,
args: Vec<&str>,
split_whitespace: bool,
) -> ExecFilter {
pub fn new(program: &str, args: Vec<&str>, split_whitespace: bool) -> ExecFilter {
let program_path = which(program);
let supported = program_path.is_ok();
ExecFilter {
program: program_path.map_or_else(|| program.to_string(), |p| p.to_string_lossy().to_string()),
program: program_path
.map_or_else(|| program.to_string(), |p| p.to_string_lossy().to_string()),
args: args.iter().map(|s| s.to_string()).collect(),
supported,
split_whitespace,
@@ -101,7 +98,10 @@ impl FilterPlugin for ExecFilter {
));
}
debug!("FILTER_EXEC: Executing command: {} {:?}", self.program, self.args);
debug!(
"FILTER_EXEC: Executing command: {} {:?}",
self.program, self.args
);
// Read all input first
let mut input_data = Vec::new();
@@ -142,8 +142,7 @@ impl FilterPlugin for ExecFilter {
std::io::copy(&mut stdout, writer)?;
// Wait for the child process to finish
let output = child.wait_with_output()
.map_err(|e| {
let output = child.wait_with_output().map_err(|e| {
std::io::Error::new(
std::io::ErrorKind::Other,
format!("Failed to wait on child process: {}", e),
@@ -165,13 +164,6 @@ impl FilterPlugin for ExecFilter {
Ok(())
}
/// Clones this filter into a new boxed instance.
///
/// Creates a new instance without active process handles.
///
/// # Returns
///
/// A new `Box<dyn FilterPlugin>` representing a clone of this filter.
fn clone_box(&self) -> Box<dyn FilterPlugin> {
Box::new(ExecFilter {
program: self.program.clone(),
@@ -205,6 +197,10 @@ impl FilterPlugin for ExecFilter {
},
]
}
fn description(&self) -> &str {
"Pipe input through an external command"
}
}
// Register the plugin at module initialization time
@@ -221,5 +217,6 @@ fn register_exec_filter() {
stdin_writer: None,
stdout_reader: None,
})
});
})
.expect("Failed to register exec filter");
}


@@ -34,7 +34,9 @@ pub struct GrepFilter {
/// # Examples
///
/// ```
/// # use keep::filter_plugin::GrepFilter;
/// let filter = GrepFilter::new("error|warn".to_string())?;
/// # Ok::<(), std::io::Error>(())
/// ```
impl GrepFilter {
pub fn new(pattern: String) -> Result<Self> {
@@ -65,7 +67,13 @@ impl GrepFilter {
/// # Examples
///
/// ```
/// # use std::io::{Read, Write, Cursor};
/// # use keep::filter_plugin::{FilterPlugin, GrepFilter};
/// # let mut filter = GrepFilter::new("error".to_string())?;
/// let mut input: &mut dyn Read = &mut Cursor::new(b"error: something failed\nok: all good\n");
/// let mut output = Vec::new();
/// filter.filter(&mut input, &mut output)?;
/// # Ok::<(), std::io::Error>(())
/// ```
impl FilterPlugin for GrepFilter {
fn filter(&mut self, reader: &mut dyn Read, writer: &mut dyn Write) -> Result<()> {
@@ -73,25 +81,12 @@ impl FilterPlugin for GrepFilter {
for line in buf_reader.by_ref().lines() {
let line = line?;
if self.regex.is_match(&line) {
writeln!(writer, "{}", line)?;
writeln!(writer, "{line}")?;
}
}
Ok(())
}
/// Clones this filter into a new boxed instance.
///
/// Creates a new GrepFilter with the same regex pattern.
///
/// # Returns
///
/// A new `Box<dyn FilterPlugin>` representing a clone of this filter.
///
/// # Examples
///
/// ```
/// let cloned = filter.clone_box();
/// ```
fn clone_box(&self) -> Box<dyn FilterPlugin> {
Box::new(Self {
regex: self.regex.clone(),
@@ -109,15 +104,17 @@ impl FilterPlugin for GrepFilter {
/// # Examples
///
/// ```
/// # use keep::filter_plugin::{FilterPlugin, GrepFilter};
/// let filter = GrepFilter::new("test".to_string()).unwrap();
/// let opts = filter.options();
/// assert_eq!(opts.len(), 1);
/// assert!(opts[0].required);
/// ```
fn options(&self) -> Vec<FilterOption> {
vec![FilterOption {
name: "pattern".to_string(),
default: None,
required: true,
}]
crate::filter_plugin::pattern_option()
}
fn description(&self) -> &str {
"Filter lines matching a regex pattern"
}
}


@@ -3,14 +3,7 @@ use crate::common::PIPESIZE;
use crate::services::filter_service::register_filter_plugin;
use std::io::{BufRead, Read, Result, Write};
/// A filter that reads the first N bytes from the input stream.
///
/// Limits the output to the initial bytes specified in the configuration.
/// Useful for previewing file contents without reading everything.
///
/// # Fields
///
/// * `remaining` - Number of bytes left to read before stopping.
#[derive(Clone)]
pub struct HeadBytesFilter {
remaining: usize,
}
@@ -37,8 +30,8 @@ impl HeadBytesFilter {
/// # Examples
///
/// ```
/// # use keep::filter_plugin::HeadBytesFilter;
/// let filter = HeadBytesFilter::new(1024);
/// assert_eq!(filter.remaining, 1024);
/// ```
pub fn new(count: usize) -> Self {
Self { remaining: count }
@@ -66,8 +59,14 @@ impl HeadBytesFilter {
/// # Examples
///
/// ```
/// // Assuming a filter chain with head_bytes(5)
/// // Input "Hello World" becomes "Hello"
/// # use std::io::{Read, Write, Cursor};
/// # use keep::filter_plugin::{FilterPlugin, HeadBytesFilter};
/// # let mut filter = HeadBytesFilter::new(5);
/// let mut input: &mut dyn Read = &mut Cursor::new(b"Hello World");
/// let mut output = Vec::new();
/// filter.filter(&mut input, &mut output)?;
/// assert_eq!(output, b"Hello");
/// # Ok::<(), std::io::Error>(())
/// ```
impl FilterPlugin for HeadBytesFilter {
fn filter(&mut self, reader: &mut dyn Read, writer: &mut dyn Write) -> Result<()> {
@@ -88,13 +87,6 @@ impl FilterPlugin for HeadBytesFilter {
Ok(())
}
/// Clones this filter into a new boxed instance.
///
/// Creates an independent copy with the same configuration.
///
/// # Returns
///
/// A new `Box<dyn FilterPlugin>` clone.
fn clone_box(&self) -> Box<dyn FilterPlugin> {
Box::new(Self {
remaining: self.remaining,
@@ -108,16 +100,27 @@ impl FilterPlugin for HeadBytesFilter {
/// # Returns
///
/// Vector of `FilterOption` describing parameters.
///
/// # Examples
///
/// ```
/// # use keep::filter_plugin::{FilterPlugin, HeadBytesFilter};
/// let filter = HeadBytesFilter::new(100);
/// let opts = filter.options();
/// assert_eq!(opts.len(), 1);
/// assert_eq!(opts[0].name, "count");
/// assert!(opts[0].required);
/// ```
fn options(&self) -> Vec<FilterOption> {
vec![FilterOption {
name: "count".to_string(),
default: None,
required: true,
}]
crate::filter_plugin::count_option()
}
fn description(&self) -> &str {
"Read the first N bytes"
}
}
/// A filter that reads the first N lines from the input stream.
#[derive(Clone)]
pub struct HeadLinesFilter {
remaining: usize,
}
@@ -144,8 +147,8 @@ impl HeadLinesFilter {
/// # Examples
///
/// ```
/// # use keep::filter_plugin::HeadLinesFilter;
/// let filter = HeadLinesFilter::new(3);
/// assert_eq!(filter.remaining, 3);
/// ```
pub fn new(count: usize) -> Self {
Self { remaining: count }
@@ -172,8 +175,14 @@ impl HeadLinesFilter {
/// # Examples
///
/// ```
/// // Assuming a filter chain with head_lines(2)
/// // Input: "Line1\nLine2\nLine3" becomes "Line1\nLine2\n"
/// # use std::io::{Read, Write, Cursor};
/// # use keep::filter_plugin::{FilterPlugin, HeadLinesFilter};
/// # let mut filter = HeadLinesFilter::new(2);
/// let mut input: &mut dyn Read = &mut Cursor::new(b"Line1\nLine2\nLine3\n");
/// let mut output = Vec::new();
/// filter.filter(&mut input, &mut output)?;
/// assert_eq!(output, b"Line1\nLine2\n");
/// # Ok::<(), std::io::Error>(())
/// ```
impl FilterPlugin for HeadLinesFilter {
fn filter(&mut self, reader: &mut dyn Read, writer: &mut dyn Write) -> Result<()> {
@@ -184,7 +193,7 @@ impl FilterPlugin for HeadLinesFilter {
let mut buf_reader = std::io::BufReader::new(reader);
for line in buf_reader.by_ref().lines() {
let line = line?;
writeln!(writer, "{}", line)?;
writeln!(writer, "{line}")?;
self.remaining -= 1;
if self.remaining == 0 {
break;
@@ -193,13 +202,6 @@ impl FilterPlugin for HeadLinesFilter {
Ok(())
}
/// Clones this filter into a new boxed instance.
///
/// Creates an independent copy with the same configuration.
///
/// # Returns
///
/// A new `Box<dyn FilterPlugin>` clone.
fn clone_box(&self) -> Box<dyn FilterPlugin> {
Box::new(Self {
remaining: self.remaining,
@@ -207,24 +209,20 @@ impl FilterPlugin for HeadLinesFilter {
}
/// Returns the configuration options for this filter.
///
/// Defines the "count" parameter as required with no default.
///
/// # Returns
///
/// Vector of `FilterOption` describing parameters.
fn options(&self) -> Vec<FilterOption> {
vec![FilterOption {
name: "count".to_string(),
default: None,
required: true,
}]
crate::filter_plugin::count_option()
}
fn description(&self) -> &str {
"Read the first N lines"
}
}
// Register the plugin at module initialization time
#[ctor::ctor]
fn register_head_filters() {
register_filter_plugin("head_bytes", || Box::new(HeadBytesFilter::new(0)));
register_filter_plugin("head_lines", || Box::new(HeadLinesFilter::new(0)));
register_filter_plugin("head_bytes", || Box::new(HeadBytesFilter::new(0)))
.expect("Failed to register head_bytes filter");
register_filter_plugin("head_lines", || Box::new(HeadLinesFilter::new(0)))
.expect("Failed to register head_lines filter");
}


@@ -2,6 +2,7 @@ use std::io::{Read, Result, Write};
use std::str::FromStr;
use strum::EnumString;
#[cfg(feature = "filter_grep")]
pub mod grep;
/// Filter plugin module for processing input streams.
///
@@ -14,17 +15,25 @@ pub mod grep;
/// Parse a filter string and apply to a reader:
///
/// ```
/// let chain = parse_filter_string("head_lines(10)|grep(pattern=error)")?;
/// chain.filter(&mut reader, &mut writer)?;
/// # use std::io::{Read, Write};
/// # use keep::filter_plugin::parse_filter_string;
/// let mut chain = parse_filter_string("head_lines(10)|tail_lines(5)")?;
/// # let mut reader: &mut dyn Read = &mut std::io::empty();
/// # let mut writer: Vec<u8> = Vec::new();
/// # chain.filter(&mut reader, &mut writer)?;
/// # Ok::<(), std::io::Error>(())
/// ```
pub mod head;
pub mod skip;
pub mod strip_ansi;
pub mod tail;
#[cfg(feature = "meta_tokens")]
pub mod tokens;
pub mod utils;
use std::collections::HashMap;
#[cfg(feature = "filter_grep")]
pub use grep::GrepFilter;
pub use head::{HeadBytesFilter, HeadLinesFilter};
pub use skip::{SkipBytesFilter, SkipLinesFilter};
@@ -62,11 +71,20 @@ pub struct FilterOption {
/// # Examples
///
/// ```
/// # use std::io::{Read, Write, Result};
/// # use keep::filter_plugin::{FilterPlugin, FilterOption};
/// struct MyFilter;
/// impl FilterPlugin for MyFilter {
/// fn filter(&mut self, reader: Box<&mut dyn Read>, writer: Box<&mut dyn Write>) -> Result<()> {
/// fn filter(&mut self, reader: &mut dyn Read, writer: &mut dyn Write) -> Result<()> {
/// // Implementation
/// Ok(())
/// }
/// fn clone_box(&self) -> Box<dyn FilterPlugin> {
/// Box::new(MyFilter)
/// }
/// fn options(&self) -> Vec<FilterOption> {
/// vec![]
/// }
/// // ...
/// }
/// ```
pub trait FilterPlugin: Send {
@@ -77,8 +95,8 @@ pub trait FilterPlugin: Send {
///
/// # Arguments
///
/// * `reader` - A boxed mutable reference to the input reader providing the data to filter.
/// * `writer` - A boxed mutable reference to the output writer where the processed data is written.
/// * `reader` - A mutable reference to the input reader providing the data to filter.
/// * `writer` - A mutable reference to the output writer where the processed data is written.
///
/// # Returns
///
@@ -87,18 +105,25 @@ pub trait FilterPlugin: Send {
/// # Examples
///
/// ```
/// # use std::io::{Read, Write, Result};
/// # use keep::filter_plugin::{FilterPlugin, FilterOption};
/// struct MyFilter;
/// impl FilterPlugin for MyFilter {
/// fn filter(&mut self, reader: Box<&mut dyn Read>, writer: Box<&mut dyn Write>) -> Result<()> {
/// // Read and filter data
/// fn filter(&mut self, reader: &mut dyn Read, writer: &mut dyn Write) -> Result<()> {
/// let mut buf = [0; 1024];
/// while let Ok(n) = reader.as_mut().read(&mut buf) {
/// loop {
/// let n = reader.read(&mut buf)?;
/// if n == 0 { break; }
/// // Apply filter logic to buf[0..n]
/// writer.as_mut().write_all(&buf[0..n])?;
/// writer.write_all(&buf[0..n])?;
/// }
/// Ok(())
/// }
/// // ... other methods
/// fn clone_box(&self) -> Box<dyn FilterPlugin> {
/// Box::new(Self)
/// }
/// fn options(&self) -> Vec<FilterOption> {
/// vec![]
/// }
/// }
/// ```
fn filter(&mut self, reader: &mut dyn Read, writer: &mut dyn Write) -> Result<()> {
@@ -106,21 +131,6 @@ pub trait FilterPlugin: Send {
Ok(())
}
/// Clones this plugin into a new boxed instance.
///
/// This method is required for dynamic dispatch and cloning in filter chains.
///
/// # Returns
///
/// A new `Box<dyn FilterPlugin>` clone of the current plugin.
///
/// # Examples
///
/// ```
/// fn clone_box(&self) -> Box<dyn FilterPlugin> {
/// Box::new(self.clone())
/// }
/// ```
fn clone_box(&self) -> Box<dyn FilterPlugin>;
/// Returns the configuration options for this plugin.
@@ -134,7 +144,8 @@ pub trait FilterPlugin: Send {
/// # Examples
///
/// ```
/// fn options(&self) -> Vec<FilterOption> {
/// # use keep::filter_plugin::FilterOption;
/// fn example_options() -> Vec<FilterOption> {
/// vec![
/// FilterOption {
/// name: "pattern".to_string(),
@@ -145,6 +156,31 @@ pub trait FilterPlugin: Send {
/// }
/// ```
fn options(&self) -> Vec<FilterOption>;
/// Returns a human-readable description of this filter.
///
/// # Returns
///
/// A description string (empty by default).
fn description(&self) -> &str {
""
}
}
/// Returns the shared required `count` option used by several filters.
pub fn count_option() -> Vec<FilterOption> {
vec![FilterOption {
name: "count".to_string(),
default: None,
required: true,
}]
}
/// Returns the shared required `pattern` option used by pattern-based filters.
pub fn pattern_option() -> Vec<FilterOption> {
vec![FilterOption {
name: "pattern".to_string(),
default: None,
required: true,
}]
}
/// Enum representing the different types of filters.
@@ -165,8 +201,57 @@ pub enum FilterType {
TailLines,
SkipBytes,
SkipLines,
#[cfg(feature = "filter_grep")]
Grep,
StripAnsi,
#[cfg(feature = "meta_tokens")]
HeadTokens,
#[cfg(feature = "meta_tokens")]
SkipTokens,
#[cfg(feature = "meta_tokens")]
TailTokens,
}
/// Maximum buffer size (256 MB) for filter chain intermediate results.
/// Prevents OOM on large files by rejecting inputs that exceed this limit.
const MAX_FILTER_BUFFER_SIZE: usize = 256 * 1024 * 1024;
/// A `Vec<u8>`-backed writer that fails writes once its configured byte limit would be exceeded.
struct BoundedVecWriter {
data: Vec<u8>,
limit: usize,
}
impl BoundedVecWriter {
fn new(limit: usize) -> Self {
Self {
data: Vec::new(),
limit,
}
}
fn into_inner(self) -> Vec<u8> {
self.data
}
}
impl std::io::Write for BoundedVecWriter {
fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
if self.data.len() + buf.len() > self.limit {
return Err(std::io::Error::new(
std::io::ErrorKind::InvalidData,
format!(
"Input size exceeds maximum filter buffer size ({} bytes)",
self.limit
),
));
}
self.data.write_all(buf)?;
Ok(buf.len())
}
fn flush(&mut self) -> std::io::Result<()> {
Ok(())
}
}
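The bounded-writer idea can be exercised in isolation. A minimal sketch (a trimmed stand-in for the diff's `BoundedVecWriter`, with a tiny limit for illustration, not the crate's actual constant):

```rust
use std::io::Write;

// A Vec-backed writer that refuses to grow past `limit` bytes.
struct BoundedVecWriter {
    data: Vec<u8>,
    limit: usize,
}

impl Write for BoundedVecWriter {
    fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
        // Reject the write if it would push the buffer past the limit.
        if self.data.len() + buf.len() > self.limit {
            return Err(std::io::Error::new(
                std::io::ErrorKind::InvalidData,
                "input exceeds buffer limit",
            ));
        }
        self.data.extend_from_slice(buf);
        Ok(buf.len())
    }
    fn flush(&mut self) -> std::io::Result<()> {
        Ok(())
    }
}

fn main() {
    let mut w = BoundedVecWriter { data: Vec::new(), limit: 8 };
    assert!(w.write_all(b"12345678").is_ok()); // exactly at the limit
    assert!(w.write_all(b"9").is_err());       // one byte over is rejected
    assert_eq!(w.data.len(), 8);
}
```

Because the check happens before any bytes are copied, a failed write leaves the buffer unchanged.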
/// A chain of filter plugins applied sequentially.
@@ -191,9 +276,14 @@ pub struct FilterChain {
/// # Examples
///
/// ```
/// # use std::io::{Read, Write, Result};
/// # use keep::filter_plugin::{FilterChain, HeadLinesFilter};
/// let mut chain = FilterChain::new();
/// chain.add_plugin(Box::new(HeadLinesFilter::new(10)));
/// chain.filter(&mut reader, &mut writer)?;
/// # let mut reader: &mut dyn Read = &mut std::io::empty();
/// # let mut writer: Vec<u8> = Vec::new();
/// # chain.filter(&mut reader, &mut writer)?;
/// # Ok::<(), std::io::Error>(())
/// ```
impl Clone for FilterChain {
/// Clones this filter chain.
@@ -211,16 +301,27 @@ impl Clone for FilterChain {
}
impl Clone for Box<dyn FilterPlugin> {
/// Clones the boxed filter plugin.
///
/// # Returns
///
/// A new boxed clone of the filter plugin.
fn clone(&self) -> Self {
self.clone_box()
}
}
#[macro_export]
macro_rules! filter_clone_box {
($self:expr) => {
Box::new($self.clone())
};
($self:expr, $field:ident) => {
Box::new(Self { $field: $self.$field.clone() })
};
($self:expr, $field:ident, $($rest:ident),+) => {
Box::new(Self {
$field: $self.$field.clone(),
$($rest: $self.$rest.clone()),+
})
};
}
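The `filter_clone_box!` macro above collapses the repetitive `clone_box` bodies. A self-contained sketch of how the per-field arm expands (trait and struct are trimmed stand-ins for the diff's versions):

```rust
// Minimal stand-in for the crate's FilterPlugin trait.
trait FilterPlugin {
    fn clone_box(&self) -> Box<dyn FilterPlugin>;
    fn remaining(&self) -> usize;
}

// Same shape as the diff's macro: clone listed fields into a new Self.
macro_rules! filter_clone_box {
    ($self:expr) => { Box::new($self.clone()) };
    ($self:expr, $field:ident) => {
        Box::new(Self { $field: $self.$field.clone() })
    };
}

#[derive(Clone)]
struct SkipBytesFilter {
    remaining: usize,
}

impl FilterPlugin for SkipBytesFilter {
    fn clone_box(&self) -> Box<dyn FilterPlugin> {
        // Expands to: Box::new(Self { remaining: self.remaining.clone() })
        filter_clone_box!(self, remaining)
    }
    fn remaining(&self) -> usize {
        self.remaining
    }
}

fn main() {
    let f = SkipBytesFilter { remaining: 42 };
    let b = f.clone_box();
    assert_eq!(b.remaining(), 42);
}
```

`Self` inside the expansion resolves at the call site, so the same macro serves every filter impl.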
impl Default for FilterChain {
fn default() -> Self {
Self::new()
@@ -237,8 +338,9 @@ impl FilterChain {
/// # Examples
///
/// ```
/// # use keep::filter_plugin::FilterChain;
/// let chain = FilterChain::new();
/// assert!(chain.plugins.is_empty());
/// // Chain starts empty
/// ```
pub fn new() -> Self {
Self {
@@ -257,8 +359,8 @@ impl FilterChain {
/// # Examples
///
/// ```
/// # use keep::filter_plugin::FilterChain;
/// let mut chain = FilterChain::new();
/// chain.add_plugin(Box::new(GrepFilter::new("error".to_string())));
/// ```
pub fn add_plugin(&mut self, plugin: Box<dyn FilterPlugin>) {
self.plugins.push(plugin);
@@ -281,9 +383,14 @@ impl FilterChain {
/// # Examples
///
/// ```
/// # use std::io::{Read, Write, Result};
/// # use keep::filter_plugin::{FilterChain, HeadBytesFilter};
/// let mut chain = FilterChain::new();
/// chain.add_plugin(Box::new(HeadBytesFilter::new(100)));
/// chain.filter(&mut input_reader, &mut output_writer)?;
/// # let mut input_reader: &mut dyn Read = &mut std::io::empty();
/// # let mut output_writer: Vec<u8> = Vec::new();
/// # chain.filter(&mut input_reader, &mut output_writer)?;
/// # Ok::<(), std::io::Error>(())
/// ```
pub fn filter(&mut self, reader: &mut dyn Read, writer: &mut dyn Write) -> Result<()> {
if self.plugins.is_empty() {
@@ -293,9 +400,10 @@ impl FilterChain {
}
// For multiple plugins, we need to chain them together
// We'll use a temporary buffer to hold intermediate results
let mut current_data = Vec::new();
std::io::copy(reader, &mut current_data)?;
// We'll use a bounded buffer to hold intermediate results
let mut bounded_writer = BoundedVecWriter::new(MAX_FILTER_BUFFER_SIZE);
std::io::copy(reader, &mut bounded_writer)?;
let mut current_data = bounded_writer.into_inner();
// Store the plugins length to avoid borrowing issues
let plugins_len = self.plugins.len();
@@ -311,6 +419,18 @@ impl FilterChain {
// For intermediate plugins, write to a buffer
let mut output_vec = Vec::new();
self.plugins[i].filter(&mut input, &mut output_vec)?;
if output_vec.len() > MAX_FILTER_BUFFER_SIZE {
return Err(std::io::Error::new(
std::io::ErrorKind::InvalidData,
format!(
"Filter output size ({} bytes) exceeds maximum filter buffer size ({} bytes).",
output_vec.len(),
MAX_FILTER_BUFFER_SIZE
),
));
}
current_data = output_vec;
}
}
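The per-stage buffering in `FilterChain::filter` above can be sketched with closures standing in for plugins; this is an illustrative reduction, not the crate's API, and it omits the bounded-size checks the real code performs:

```rust
use std::io::{Read, Result, Write};

// Each stage reads the previous stage's output buffer; only the last
// stage writes to the caller's writer, as in the diff's filter chain.
fn run_chain(
    stages: &mut [Box<dyn FnMut(&mut dyn Read, &mut dyn Write) -> Result<()>>],
    reader: &mut dyn Read,
    writer: &mut dyn Write,
) -> Result<()> {
    if stages.is_empty() {
        std::io::copy(reader, writer)?;
        return Ok(());
    }
    let mut current = Vec::new();
    reader.read_to_end(&mut current)?; // real code bounds this copy
    let last = stages.len() - 1;
    for (i, stage) in stages.iter_mut().enumerate() {
        let mut input: &[u8] = &current;
        if i == last {
            stage(&mut input, writer)?; // final stage streams out
        } else {
            let mut out = Vec::new(); // intermediate buffer
            stage(&mut input, &mut out)?;
            current = out;
        }
    }
    Ok(())
}

fn main() {
    // Stage 1 uppercases, stage 2 appends "!".
    let mut stages: Vec<Box<dyn FnMut(&mut dyn Read, &mut dyn Write) -> Result<()>>> = vec![
        Box::new(|r, w| {
            let mut s = String::new();
            r.read_to_string(&mut s)?;
            w.write_all(s.to_uppercase().as_bytes())
        }),
        Box::new(|r, w| {
            let mut s = String::new();
            r.read_to_string(&mut s)?;
            write!(w, "{s}!")
        }),
    ];
    let mut out = Vec::new();
    run_chain(&mut stages, &mut "hello".as_bytes(), &mut out).unwrap();
    assert_eq!(out, b"HELLO!");
}
```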
@@ -381,7 +501,7 @@ pub fn parse_filter_string(filter_str: &str) -> Result<FilterChain> {
_ => {
return Err(std::io::Error::new(
std::io::ErrorKind::InvalidInput,
format!("Filter '{}' requires parameters", part),
format!("Filter '{part}' requires parameters"),
));
}
}
@@ -391,7 +511,7 @@ pub fn parse_filter_string(filter_str: &str) -> Result<FilterChain> {
// If we get here, the filter wasn't recognized
return Err(std::io::Error::new(
std::io::ErrorKind::InvalidInput,
format!("Unknown filter: {}", part),
format!("Unknown filter: {part}"),
));
}
@@ -417,6 +537,7 @@ fn create_filter_with_options(
// Get the default options for this filter type by creating a temporary instance
// To do this, we need to create a default instance of the appropriate filter
let option_defs = match filter_type {
#[cfg(feature = "filter_grep")]
FilterType::Grep => grep::GrepFilter::new("".to_string())?.options(),
FilterType::HeadBytes => head::HeadBytesFilter::new(0).options(),
FilterType::HeadLines => head::HeadLinesFilter::new(0).options(),
@@ -425,6 +546,12 @@ fn create_filter_with_options(
FilterType::SkipBytes => skip::SkipBytesFilter::new(0).options(),
FilterType::SkipLines => skip::SkipLinesFilter::new(0).options(),
FilterType::StripAnsi => strip_ansi::StripAnsiFilter::new().options(),
#[cfg(feature = "meta_tokens")]
FilterType::HeadTokens => tokens::HeadTokensFilter::new(0).options(),
#[cfg(feature = "meta_tokens")]
FilterType::SkipTokens => tokens::SkipTokensFilter::new(0).options(),
#[cfg(feature = "meta_tokens")]
FilterType::TailTokens => tokens::TailTokensFilter::new(0).options(),
};
let mut options = HashMap::new();
@@ -454,7 +581,7 @@ fn create_filter_with_options(
if !option_defs.iter().any(|opt| &opt.name == key) {
return Err(std::io::Error::new(
std::io::ErrorKind::InvalidInput,
format!("Unknown option '{}'", key),
format!("Unknown option '{key}'"),
));
}
options.insert(key.clone(), value.clone());
@@ -493,6 +620,7 @@ fn create_specific_filter(
options: &HashMap<String, serde_json::Value>,
) -> Result<Box<dyn FilterPlugin>> {
match filter_type {
#[cfg(feature = "filter_grep")]
FilterType::Grep => {
let pattern = options
.get("pattern")
@@ -593,7 +721,74 @@ fn create_specific_filter(
}
Ok(Box::new(strip_ansi::StripAnsiFilter::new()))
}
#[cfg(feature = "meta_tokens")]
FilterType::HeadTokens => {
let count = options
.get("count")
.and_then(|v| v.as_u64())
.map(|n| n as usize)
.ok_or_else(|| {
std::io::Error::new(
std::io::ErrorKind::InvalidInput,
"head_tokens filter requires 'count' parameter",
)
})?;
let (encoding, tokenizer) = parse_encoding_option(options);
let mut f = tokens::HeadTokensFilter::new(count);
f.tokenizer = tokenizer;
f.encoding = encoding;
Ok(Box::new(f))
}
#[cfg(feature = "meta_tokens")]
FilterType::SkipTokens => {
let count = options
.get("count")
.and_then(|v| v.as_u64())
.map(|n| n as usize)
.ok_or_else(|| {
std::io::Error::new(
std::io::ErrorKind::InvalidInput,
"skip_tokens filter requires 'count' parameter",
)
})?;
let (encoding, tokenizer) = parse_encoding_option(options);
let mut f = tokens::SkipTokensFilter::new(count);
f.tokenizer = tokenizer;
f.encoding = encoding;
Ok(Box::new(f))
}
#[cfg(feature = "meta_tokens")]
FilterType::TailTokens => {
let count = options
.get("count")
.and_then(|v| v.as_u64())
.map(|n| n as usize)
.ok_or_else(|| {
std::io::Error::new(
std::io::ErrorKind::InvalidInput,
"tail_tokens filter requires 'count' parameter",
)
})?;
let (encoding, tokenizer) = parse_encoding_option(options);
let mut f = tokens::TailTokensFilter::new(count);
f.tokenizer = tokenizer;
f.encoding = encoding;
Ok(Box::new(f))
}
}
}
#[cfg(feature = "meta_tokens")]
fn parse_encoding_option(
options: &std::collections::HashMap<String, serde_json::Value>,
) -> (crate::tokenizer::TokenEncoding, crate::tokenizer::Tokenizer) {
let encoding = options
.get("encoding")
.and_then(|v| v.as_str())
.and_then(|s| s.parse::<crate::tokenizer::TokenEncoding>().ok())
.unwrap_or_default();
let tokenizer = crate::tokenizer::get_tokenizer(encoding).clone();
(encoding, tokenizer)
}
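The parse-or-default pattern in `parse_encoding_option` can be shown in miniature. The `Cl100kBase` variant appears in the diff's tests; `O200kBase` and the string names here are assumptions for illustration:

```rust
use std::str::FromStr;

// An unknown or missing encoding string silently falls back to the default,
// mirroring the unwrap_or_default() in parse_encoding_option.
#[derive(Debug, PartialEq, Clone, Copy, Default)]
enum TokenEncoding {
    #[default]
    Cl100kBase,
    O200kBase, // hypothetical second variant for this sketch
}

impl FromStr for TokenEncoding {
    type Err = ();
    fn from_str(s: &str) -> Result<Self, ()> {
        match s {
            "cl100k_base" => Ok(Self::Cl100kBase),
            "o200k_base" => Ok(Self::O200kBase),
            _ => Err(()),
        }
    }
}

fn parse_encoding(opt: Option<&str>) -> TokenEncoding {
    opt.and_then(|s| s.parse().ok()).unwrap_or_default()
}

fn main() {
    assert_eq!(parse_encoding(Some("o200k_base")), TokenEncoding::O200kBase);
    assert_eq!(parse_encoding(Some("bogus")), TokenEncoding::Cl100kBase);
    assert_eq!(parse_encoding(None), TokenEncoding::Cl100kBase);
}
```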
/// Parses an option value from a string into a JSON value.

View File

@@ -4,6 +4,7 @@ use crate::services::filter_service::register_filter_plugin;
use std::io::{BufRead, Read, Result, Write};
/// A filter that skips the first N bytes from the input stream.
#[derive(Clone)]
pub struct SkipBytesFilter {
remaining: usize,
}
@@ -49,11 +50,6 @@ impl FilterPlugin for SkipBytesFilter {
Ok(())
}
/// Clones this filter into a new boxed instance.
///
/// # Returns
///
/// A new `Box<dyn FilterPlugin>` representing a clone of this filter.
fn clone_box(&self) -> Box<dyn FilterPlugin> {
Box::new(Self {
remaining: self.remaining,
@@ -61,20 +57,17 @@ impl FilterPlugin for SkipBytesFilter {
}
/// Returns the configuration options for this filter.
///
/// # Returns
///
/// A vector of `FilterOption` describing the filter's configurable parameters.
fn options(&self) -> Vec<FilterOption> {
vec![FilterOption {
name: "count".to_string(),
default: None,
required: true,
}]
crate::filter_plugin::count_option()
}
fn description(&self) -> &str {
"Skip the first N bytes"
}
}
/// A filter that skips the first N lines from the input stream.
#[derive(Clone)]
pub struct SkipLinesFilter {
remaining: usize,
}
@@ -108,17 +101,12 @@ impl FilterPlugin for SkipLinesFilter {
if self.remaining > 0 {
self.remaining -= 1;
} else {
writeln!(writer, "{}", line)?;
writeln!(writer, "{line}")?;
}
}
Ok(())
}
/// Clones this filter into a new boxed instance.
///
/// # Returns
///
/// A new `Box<dyn FilterPlugin>` representing a clone of this filter.
fn clone_box(&self) -> Box<dyn FilterPlugin> {
Box::new(Self {
remaining: self.remaining,
@@ -126,22 +114,20 @@ impl FilterPlugin for SkipLinesFilter {
}
/// Returns the configuration options for this filter.
///
/// # Returns
///
/// A vector of `FilterOption` describing the filter's configurable parameters.
fn options(&self) -> Vec<FilterOption> {
vec![FilterOption {
name: "count".to_string(),
default: None,
required: true,
}]
crate::filter_plugin::count_option()
}
fn description(&self) -> &str {
"Skip the first N lines"
}
}
// Register the plugin at module initialization time
#[ctor::ctor]
fn register_skip_filters() {
register_filter_plugin("skip_bytes", || Box::new(SkipBytesFilter::new(0)));
register_filter_plugin("skip_lines", || Box::new(SkipLinesFilter::new(0)));
register_filter_plugin("skip_bytes", || Box::new(SkipBytesFilter::new(0)))
.expect("Failed to register skip_bytes filter");
register_filter_plugin("skip_lines", || Box::new(SkipLinesFilter::new(0)))
.expect("Failed to register skip_lines filter");
}

View File

@@ -7,7 +7,7 @@ use strip_ansi_escapes::Writer;
/// # Fields
///
/// None, stateless filter.
#[derive(Default)]
#[derive(Default, Clone)]
pub struct StripAnsiFilter;
impl StripAnsiFilter {
@@ -39,21 +39,15 @@ impl FilterPlugin for StripAnsiFilter {
Ok(())
}
/// Clones this filter into a new boxed instance.
///
/// # Returns
///
/// A new `Box<dyn FilterPlugin>` representing a clone of this filter.
fn clone_box(&self) -> Box<dyn FilterPlugin> {
Box::new(Self)
}
/// Returns the configuration options for this filter (none required).
///
/// # Returns
///
/// An empty vector since this filter has no configurable options.
fn options(&self) -> Vec<FilterOption> {
Vec::new() // strip_ansi doesn't take any options
Vec::new()
}
fn description(&self) -> &str {
"Strip ANSI escape sequences"
}
}

View File

@@ -4,7 +4,7 @@ use crate::services::filter_service::register_filter_plugin;
use std::collections::VecDeque;
use std::io::{BufRead, Read, Result, Write};
/// A filter that reads the last N bytes from the input stream.
#[derive(Clone)]
pub struct TailBytesFilter {
buffer: VecDeque<u8>,
count: usize,
@@ -58,11 +58,6 @@ impl FilterPlugin for TailBytesFilter {
Ok(())
}
/// Clones this filter into a new boxed instance.
///
/// # Returns
///
/// A new `Box<dyn FilterPlugin>` representing a clone of this filter.
fn clone_box(&self) -> Box<dyn FilterPlugin> {
Box::new(Self {
buffer: self.buffer.clone(),
@@ -71,20 +66,17 @@ impl FilterPlugin for TailBytesFilter {
}
/// Returns the configuration options for this filter.
///
/// # Returns
///
/// A vector of `FilterOption` describing the filter's configurable parameters.
fn options(&self) -> Vec<FilterOption> {
vec![FilterOption {
name: "count".to_string(),
default: None,
required: true,
}]
crate::filter_plugin::count_option()
}
fn description(&self) -> &str {
"Read the last N bytes"
}
}
/// A filter that reads the last N lines from the input stream.
#[derive(Clone)]
pub struct TailLinesFilter {
lines: VecDeque<String>,
count: usize,
@@ -127,16 +119,11 @@ impl FilterPlugin for TailLinesFilter {
// Write the buffered lines
for line in &self.lines {
writeln!(writer, "{}", line)?;
writeln!(writer, "{line}")?;
}
Ok(())
}
/// Clones this filter into a new boxed instance.
///
/// # Returns
///
/// A new `Box<dyn FilterPlugin>` representing a clone of this filter.
fn clone_box(&self) -> Box<dyn FilterPlugin> {
Box::new(Self {
lines: self.lines.clone(),
@@ -145,22 +132,20 @@ impl FilterPlugin for TailLinesFilter {
}
/// Returns the configuration options for this filter.
///
/// # Returns
///
/// A vector of `FilterOption` describing the filter's configurable parameters.
fn options(&self) -> Vec<FilterOption> {
vec![FilterOption {
name: "count".to_string(),
default: None,
required: true,
}]
crate::filter_plugin::count_option()
}
fn description(&self) -> &str {
"Read the last N lines"
}
}
// Register the plugin at module initialization time
#[ctor::ctor]
fn register_tail_filters() {
register_filter_plugin("tail_bytes", || Box::new(TailBytesFilter::new(0)));
register_filter_plugin("tail_lines", || Box::new(TailLinesFilter::new(0)));
register_filter_plugin("tail_bytes", || Box::new(TailBytesFilter::new(0)))
.expect("Failed to register tail_bytes filter");
register_filter_plugin("tail_lines", || Box::new(TailLinesFilter::new(0)))
.expect("Failed to register tail_lines filter");
}

src/filter_plugin/tokens.rs Normal file
View File

@@ -0,0 +1,500 @@
use super::{FilterOption, FilterPlugin};
use crate::common::PIPESIZE;
use crate::services::filter_service::register_filter_plugin;
use crate::tokenizer::{TokenEncoding, Tokenizer, get_tokenizer};
use std::io::{Read, Result, Write};
// ---------------------------------------------------------------------------
// head_tokens
// ---------------------------------------------------------------------------
#[derive(Clone)]
pub struct HeadTokensFilter {
pub remaining: usize,
pub tokenizer: Tokenizer,
pub encoding: TokenEncoding,
}
impl HeadTokensFilter {
pub fn new(count: usize) -> Self {
let encoding = TokenEncoding::default();
Self {
remaining: count,
tokenizer: get_tokenizer(encoding).clone(),
encoding,
}
}
}
impl FilterPlugin for HeadTokensFilter {
fn filter(&mut self, reader: &mut dyn Read, writer: &mut dyn Write) -> Result<()> {
if self.remaining == 0 {
return Ok(());
}
let tokenizer = &self.tokenizer;
let mut buffer = vec![0u8; PIPESIZE];
let mut total_tokens = 0usize;
loop {
let n = reader.read(&mut buffer)?;
if n == 0 {
break;
}
let chunk = &buffer[..n];
let text = String::from_utf8_lossy(chunk);
let chunk_tokens = tokenizer.count(&text);
if total_tokens + chunk_tokens <= self.remaining {
// Entire chunk fits — write it directly
writer.write_all(chunk)?;
total_tokens += chunk_tokens;
if total_tokens >= self.remaining {
break;
}
} else {
// Cutoff is within this chunk — use iterator to find exact
// boundary without allocating all token strings
let tokens_to_write = self.remaining - total_tokens;
let mut byte_pos = 0usize;
for token_str in tokenizer.split_by_token_iter(&text).take(tokens_to_write) {
byte_pos += token_str
.map_err(|e| std::io::Error::other(e.to_string()))?
.len();
}
let write_len = map_lossy_pos_to_bytes(chunk, &text, byte_pos);
writer.write_all(&chunk[..write_len])?;
break;
}
}
Ok(())
}
fn clone_box(&self) -> Box<dyn FilterPlugin> {
Box::new(Self {
remaining: self.remaining,
tokenizer: self.tokenizer.clone(),
encoding: self.encoding,
})
}
fn options(&self) -> Vec<FilterOption> {
vec![
FilterOption {
name: "count".to_string(),
default: None,
required: true,
},
FilterOption {
name: "encoding".to_string(),
default: Some(serde_json::Value::String("cl100k_base".to_string())),
required: false,
},
]
}
fn description(&self) -> &str {
"Read the first N LLM tokens"
}
}
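The cutoff search in `HeadTokensFilter::filter` walks tokens until the byte boundary of the Nth token is found. A sketch of that boundary search with whitespace-delimited "tokens" standing in for the real BPE tokenizer (an assumption purely for illustration):

```rust
// Walk tokens, accumulating their byte lengths, and cut at the Nth
// token's boundary — the same idea as split_by_token_iter().take(n).
fn head_tokens(text: &str, count: usize) -> &str {
    let mut byte_pos = 0;
    for (i, tok) in text.split_inclusive(char::is_whitespace).enumerate() {
        if i == count {
            break;
        }
        byte_pos += tok.len();
    }
    &text[..byte_pos]
}

fn main() {
    assert_eq!(head_tokens("The quick brown fox", 2), "The quick ");
    assert_eq!(head_tokens("Hello", 10), "Hello"); // fewer tokens than asked
    assert_eq!(head_tokens("Hello", 0), "");
}
```

Summing token byte lengths, rather than re-encoding a prefix, is what lets the real filter cut mid-chunk without allocating every token string.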
// ---------------------------------------------------------------------------
// skip_tokens
// ---------------------------------------------------------------------------
#[derive(Clone)]
pub struct SkipTokensFilter {
pub remaining: usize,
pub tokenizer: Tokenizer,
pub encoding: TokenEncoding,
}
impl SkipTokensFilter {
pub fn new(count: usize) -> Self {
let encoding = TokenEncoding::default();
Self {
remaining: count,
tokenizer: get_tokenizer(encoding).clone(),
encoding,
}
}
}
impl FilterPlugin for SkipTokensFilter {
fn filter(&mut self, reader: &mut dyn Read, writer: &mut dyn Write) -> Result<()> {
if self.remaining == 0 {
return std::io::copy(reader, writer).map(|_| ());
}
let tokenizer = &self.tokenizer;
let mut buffer = vec![0u8; PIPESIZE];
let mut total_tokens = 0usize;
let mut done_skipping = false;
loop {
let n = reader.read(&mut buffer)?;
if n == 0 {
break;
}
if done_skipping {
writer.write_all(&buffer[..n])?;
continue;
}
let chunk = &buffer[..n];
let text = String::from_utf8_lossy(chunk);
let chunk_tokens = tokenizer.count(&text);
if total_tokens + chunk_tokens <= self.remaining {
// Entire chunk is skipped
total_tokens += chunk_tokens;
if total_tokens >= self.remaining {
done_skipping = true;
}
} else {
// Cutoff is within this chunk — use iterator to skip past
// the boundary without allocating all token strings
let tokens_to_skip = self.remaining - total_tokens;
let mut byte_pos = 0usize;
for token_str in tokenizer.split_by_token_iter(&text).take(tokens_to_skip) {
byte_pos += token_str
.map_err(|e| std::io::Error::other(e.to_string()))?
.len();
}
let skip_len = map_lossy_pos_to_bytes(chunk, &text, byte_pos);
if skip_len < n {
writer.write_all(&chunk[skip_len..])?;
}
done_skipping = true;
}
}
Ok(())
}
fn clone_box(&self) -> Box<dyn FilterPlugin> {
Box::new(Self {
remaining: self.remaining,
tokenizer: self.tokenizer.clone(),
encoding: self.encoding,
})
}
fn options(&self) -> Vec<FilterOption> {
vec![
FilterOption {
name: "count".to_string(),
default: None,
required: true,
},
FilterOption {
name: "encoding".to_string(),
default: Some(serde_json::Value::String("cl100k_base".to_string())),
required: false,
},
]
}
fn description(&self) -> &str {
"Skip the first N LLM tokens"
}
}
// ---------------------------------------------------------------------------
// tail_tokens
// ---------------------------------------------------------------------------
/// A filter that outputs only the last N tokens of the input stream.
#[derive(Clone)]
pub struct TailTokensFilter {
pub count: usize,
/// Buffer holding all bytes from the stream.
buffer: Vec<u8>,
pub tokenizer: Tokenizer,
pub encoding: TokenEncoding,
}
impl TailTokensFilter {
pub fn new(count: usize) -> Self {
let encoding = TokenEncoding::default();
Self {
count,
buffer: Vec::with_capacity(PIPESIZE),
tokenizer: get_tokenizer(encoding).clone(),
encoding,
}
}
}
impl FilterPlugin for TailTokensFilter {
fn filter(&mut self, reader: &mut dyn Read, writer: &mut dyn Write) -> Result<()> {
if self.count == 0 {
return Ok(());
}
let tokenizer = &self.tokenizer;
// Buffer all bytes from the stream
std::io::copy(reader, &mut self.buffer)?;
if self.buffer.is_empty() {
return Ok(());
}
let text = String::from_utf8_lossy(&self.buffer);
let token_strs = tokenizer
.split_by_token(&text)
.map_err(|e| std::io::Error::other(e.to_string()))?;
if token_strs.len() <= self.count {
// All tokens fit — write everything
writer.write_all(&self.buffer)?;
} else {
// Write only the last N tokens
let skip = token_strs.len() - self.count;
let mut byte_offset = 0usize;
for token_str in token_strs.iter().take(skip) {
byte_offset += token_str.len();
}
let write_len = map_lossy_pos_to_bytes(&self.buffer, &text, byte_offset);
if write_len < self.buffer.len() {
writer.write_all(&self.buffer[write_len..])?;
}
}
Ok(())
}
fn clone_box(&self) -> Box<dyn FilterPlugin> {
Box::new(Self {
count: self.count,
buffer: Vec::new(),
tokenizer: self.tokenizer.clone(),
encoding: self.encoding,
})
}
fn options(&self) -> Vec<FilterOption> {
vec![
FilterOption {
name: "count".to_string(),
default: None,
required: true,
},
FilterOption {
name: "encoding".to_string(),
default: Some(serde_json::Value::String("cl100k_base".to_string())),
required: false,
},
]
}
fn description(&self) -> &str {
"Read the last N LLM tokens"
}
}
// ---------------------------------------------------------------------------
// Helpers
// ---------------------------------------------------------------------------
/// Map a byte position in a lossy string back to a position in the original byte slice.
///
/// `String::from_utf8_lossy` replaces invalid UTF-8 bytes with the Unicode
/// replacement character (U+FFFD), which encodes to 3 bytes in UTF-8. This
/// function walks both the original bytes and the lossy string in lockstep,
/// finding the original byte position that corresponds to `lossy_pos`.
fn map_lossy_pos_to_bytes(original: &[u8], lossy: &str, lossy_pos: usize) -> usize {
if lossy_pos == 0 {
return 0;
}
let replacement = '\u{FFFD}';
let replacement_len = replacement.len_utf8(); // 3 bytes
let mut orig_idx = 0usize;
let mut lossy_idx = 0usize;
let lossy_bytes = lossy.as_bytes();
while lossy_idx < lossy_pos && orig_idx < original.len() {
// Try to decode the next character from the original bytes
match std::str::from_utf8(&original[orig_idx..]) {
Ok("") => break,
Ok(s) => {
let ch = s.chars().next().unwrap();
let ch_len = ch.len_utf8();
// Check if this is a replacement character in the lossy string
if ch == replacement
&& lossy_idx + replacement_len <= lossy_pos
&& lossy_bytes[lossy_idx..].starts_with(
&replacement.encode_utf8(&mut [0; 4]).as_bytes()[..replacement_len],
)
{
// Could be a real U+FFFD or a replacement of invalid bytes.
// If the original byte at this position is valid UTF-8 start, it's real.
if original[orig_idx] < 0x80 || original[orig_idx] >= 0xC0 {
// Real character
orig_idx += ch_len;
lossy_idx += ch_len;
} else {
// Invalid byte that was replaced — advance original by 1
orig_idx += 1;
lossy_idx += replacement_len;
}
} else {
orig_idx += ch_len;
lossy_idx += ch_len;
}
}
Err(e) => {
let valid = e.valid_up_to();
if valid > 0 {
// Some valid bytes, then invalid
orig_idx += valid;
lossy_idx += valid;
} else {
// Invalid byte — in lossy it becomes 3-byte replacement char
orig_idx += 1;
lossy_idx += replacement_len;
}
}
}
}
orig_idx.min(original.len())
}
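The drift that `map_lossy_pos_to_bytes` corrects for is easy to demonstrate: each invalid input byte becomes a three-byte U+FFFD in the lossy string, so positions past it no longer line up with the original bytes.

```rust
// One invalid byte (\x80) expands to the 3-byte U+FFFD replacement,
// so the lossy string is 2 bytes longer than the original.
fn main() {
    let original: &[u8] = b"Hello\x80world"; // 11 bytes, one invalid
    let lossy = String::from_utf8_lossy(original);
    assert_eq!(original.len(), 11);
    assert_eq!(lossy.as_bytes().len(), 13);
    assert_eq!(&lossy[..5], "Hello");
    assert_eq!(lossy.chars().nth(5), Some('\u{FFFD}'));
}
```

This is why lossy position 8 ("Hello" plus one replacement char) maps back to original position 6 in the test below.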
// ---------------------------------------------------------------------------
// Registration
// ---------------------------------------------------------------------------
#[ctor::ctor]
fn register_token_filters() {
register_filter_plugin("head_tokens", || Box::new(HeadTokensFilter::new(0)))
.expect("Failed to register head_tokens filter");
register_filter_plugin("skip_tokens", || Box::new(SkipTokensFilter::new(0)))
.expect("Failed to register skip_tokens filter");
register_filter_plugin("tail_tokens", || Box::new(TailTokensFilter::new(0)))
.expect("Failed to register tail_tokens filter");
}
#[cfg(test)]
mod tests {
use super::*;
use std::io::Cursor;
fn make_tokenizer() -> Tokenizer {
get_tokenizer(TokenEncoding::Cl100kBase).clone()
}
#[test]
fn test_head_tokens_basic() {
let mut filter = HeadTokensFilter::new(3);
filter.tokenizer = make_tokenizer();
let input = b"The quick brown fox";
let mut output = Vec::new();
filter.filter(&mut Cursor::new(input), &mut output).unwrap();
let result = String::from_utf8_lossy(&output);
// "The quick brown" is typically 3 tokens
assert!(!result.is_empty());
assert!(result.len() <= input.len());
}
#[test]
fn test_head_tokens_zero() {
let mut filter = HeadTokensFilter::new(0);
filter.tokenizer = make_tokenizer();
let input = b"The quick brown fox";
let mut output = Vec::new();
filter.filter(&mut Cursor::new(input), &mut output).unwrap();
assert!(output.is_empty());
}
#[test]
fn test_head_tokens_more_than_available() {
let mut filter = HeadTokensFilter::new(1000);
filter.tokenizer = make_tokenizer();
let input = b"Hello world";
let mut output = Vec::new();
filter.filter(&mut Cursor::new(input), &mut output).unwrap();
assert_eq!(output, input);
}
#[test]
fn test_skip_tokens_basic() {
let mut filter = SkipTokensFilter::new(2);
filter.tokenizer = make_tokenizer();
let input = b"The quick brown fox";
let mut output = Vec::new();
filter.filter(&mut Cursor::new(input), &mut output).unwrap();
let result = String::from_utf8_lossy(&output);
// Should have skipped some tokens
assert!(result.len() < input.len());
}
#[test]
fn test_skip_tokens_zero() {
let mut filter = SkipTokensFilter::new(0);
filter.tokenizer = make_tokenizer();
let input = b"Hello world";
let mut output = Vec::new();
filter.filter(&mut Cursor::new(input), &mut output).unwrap();
assert_eq!(output, input);
}
#[test]
fn test_tail_tokens_basic() {
let mut filter = TailTokensFilter::new(2);
filter.tokenizer = make_tokenizer();
let input = b"The quick brown fox jumps over the lazy dog";
let mut output = Vec::new();
filter.filter(&mut Cursor::new(input), &mut output).unwrap();
let result = String::from_utf8_lossy(&output);
// Should only have last 2 tokens
assert!(!result.is_empty());
assert!(result.len() < input.len());
}
#[test]
fn test_tail_tokens_zero() {
let mut filter = TailTokensFilter::new(0);
filter.tokenizer = make_tokenizer();
let input = b"Hello world";
let mut output = Vec::new();
filter.filter(&mut Cursor::new(input), &mut output).unwrap();
assert!(output.is_empty());
}
#[test]
fn test_map_lossy_pos_ascii() {
let original = b"Hello world";
let lossy = String::from_utf8_lossy(original);
assert_eq!(map_lossy_pos_to_bytes(original, &lossy, 5), 5);
assert_eq!(map_lossy_pos_to_bytes(original, &lossy, 0), 0);
assert_eq!(map_lossy_pos_to_bytes(original, &lossy, 11), 11);
}
#[test]
fn test_map_lossy_pos_with_invalid_utf8() {
let original = b"Hello\x80world";
let lossy = String::from_utf8_lossy(original);
// lossy = "Hello\u{FFFD}world" (13 bytes)
// Position 5 in lossy = after "Hello" = position 5 in original
assert_eq!(map_lossy_pos_to_bytes(original, &lossy, 5), 5);
// Position 8 in lossy = "Hello\u{FFFD}" = position 6 in original
// (the invalid byte \x80 at position 5 was replaced)
assert_eq!(map_lossy_pos_to_bytes(original, &lossy, 8), 6);
}
}

src/import_tar.rs Normal file
View File

@@ -0,0 +1,225 @@
use anyhow::{Context, Result, anyhow};
use log::debug;
use std::collections::HashMap;
use std::fs;
use std::io::{Read, Write};
use std::path::Path;
use std::str::FromStr;
use tempfile::TempDir;
use tar::Archive;
use crate::common::PIPESIZE;
use crate::compression_engine::CompressionType;
use crate::db;
use crate::modes::common::ImportMeta;
/// Represents a parsed tar entry from an export archive.
struct TarEntry {
/// Path to the extracted data file in the temp directory.
data_path: Option<std::path::PathBuf>,
/// Path to the extracted meta file in the temp directory.
meta_path: Option<std::path::PathBuf>,
}
/// Import all items from a `.keep.tar` archive.
///
/// Items are imported in ascending order of their original IDs,
/// ensuring chronological ordering is preserved. Each imported item
/// receives a new auto-incremented ID from the target database.
///
/// # Arguments
///
/// * `tar_path` - Path to the `.keep.tar` file.
/// * `conn` - Mutable database connection.
/// * `data_path` - Path to the data storage directory.
///
/// # Returns
///
/// A list of newly assigned item IDs.
pub fn import_from_tar(
tar_path: &Path,
conn: &mut rusqlite::Connection,
data_path: &Path,
) -> Result<Vec<i64>> {
let file = fs::File::open(tar_path)
.with_context(|| format!("Cannot open tar file: {}", tar_path.display()))?;
let mut archive = Archive::new(file);
let tmp_dir = TempDir::new().context("Cannot create temporary directory for import")?;
let tmp_path = tmp_dir.path();
// Extract entries to temp dir
let mut entries_map: HashMap<i64, TarEntry> = HashMap::new();
for entry_result in archive.entries().context("Cannot read tar entries")? {
let mut entry = entry_result.context("Cannot read tar entry")?;
let entry_path = entry.path().context("Cannot get entry path")?.to_path_buf();
let path_str = entry_path.to_string_lossy().replace('\\', "/");
// Reject path traversal attempts
if path_str.starts_with('/') || path_str.starts_with("..") || path_str.contains("/../") {
return Err(anyhow!("Rejected path traversal entry: {path_str}"));
}
// Skip directory entries
if entry.header().entry_type().is_dir() {
debug!("IMPORT_TAR: Skipping directory entry: {path_str}");
continue;
}
// Parse: <dir>/<id>.data.<compression> or <dir>/<id>.meta.yml
let filename = entry_path
.file_name()
.ok_or_else(|| anyhow!("Invalid entry path: {path_str}"))?
.to_string_lossy();
let (orig_id, is_data) = if let Some(id_str) = filename.strip_suffix(".meta.yml") {
let id: i64 = id_str
.parse()
.with_context(|| format!("Invalid ID in entry: {path_str}"))?;
(id, false)
} else if let Some(dot_pos) = filename.find(".data.") {
let id_str = &filename[..dot_pos];
let id: i64 = id_str
.parse()
.with_context(|| format!("Invalid ID in entry: {path_str}"))?;
(id, true)
} else {
debug!("IMPORT_TAR: Skipping unrecognized entry: {path_str}");
continue;
};
let entry_ref = entries_map.entry(orig_id).or_insert_with(|| TarEntry {
data_path: None,
meta_path: None,
});
if is_data {
let dest = tmp_path.join(format!("{orig_id}.data"));
let mut dest_file = fs::File::create(&dest).context("Cannot create temp data file")?;
let mut buf = [0u8; PIPESIZE];
loop {
let n = entry.read(&mut buf)?;
if n == 0 {
break;
}
dest_file.write_all(&buf[..n])?;
}
entry_ref.data_path = Some(dest);
debug!("IMPORT_TAR: Extracted data for original ID {orig_id}");
} else {
let dest = tmp_path.join(format!("{orig_id}.meta.yml"));
let mut dest_file = fs::File::create(&dest).context("Cannot create temp meta file")?;
let mut buf = [0u8; PIPESIZE];
loop {
let n = entry.read(&mut buf)?;
if n == 0 {
break;
}
dest_file.write_all(&buf[..n])?;
}
entry_ref.meta_path = Some(dest);
debug!("IMPORT_TAR: Extracted meta for original ID {orig_id}");
}
}
if entries_map.is_empty() {
return Err(anyhow!("No items found in archive"));
}
// Sort by original ID ascending
let mut sorted_ids: Vec<i64> = entries_map.keys().copied().collect();
sorted_ids.sort_unstable();
let mut imported_ids = Vec::new();
for orig_id in sorted_ids {
let entry = entries_map.get(&orig_id).expect("ID should exist in map");
let meta_path = entry
.meta_path
.as_ref()
.ok_or_else(|| anyhow!("Item {orig_id} missing .meta.yml entry"))?;
let data_path_entry = entry
.data_path
.as_ref()
.ok_or_else(|| anyhow!("Item {orig_id} missing .data entry"))?;
// Parse metadata
let meta_yaml = fs::read_to_string(meta_path)
.with_context(|| format!("Cannot read meta file for item {orig_id}"))?;
let import_meta: ImportMeta = serde_yaml::from_str(&meta_yaml)
.with_context(|| format!("Cannot parse meta file for item {orig_id}"))?;
// Validate compression type
CompressionType::from_str(&import_meta.compression).map_err(|_| {
anyhow!(
"Invalid compression type '{}' for item {}",
import_meta.compression,
orig_id
)
})?;
// Create item with original timestamp
let item = db::insert_item_with_ts(conn, import_meta.ts, &import_meta.compression)?;
let new_id = item.id.context("New item missing ID")?;
// Set tags
let tags = if !import_meta.tags.is_empty() {
db::set_item_tags(conn, item.clone(), &import_meta.tags)?;
import_meta.tags.clone()
} else {
Vec::new()
};
// Stream data to storage
let mut storage_path = data_path.to_path_buf();
storage_path.push(new_id.to_string());
let mut reader = fs::File::open(data_path_entry)
.with_context(|| format!("Cannot read data file for item {orig_id}"))?;
let mut writer = fs::File::create(&storage_path)
.with_context(|| format!("Cannot create storage file for item {new_id}"))?;
let mut buf = [0u8; PIPESIZE];
let mut total = 0i64;
loop {
let n = reader.read(&mut buf)?;
if n == 0 {
break;
}
writer.write_all(&buf[..n])?;
total += n as i64;
}
if total == 0 {
return Err(anyhow!("Item {orig_id} has empty data file"));
}
// Set metadata
for (key, value) in &import_meta.metadata {
db::query_upsert_meta(
conn,
db::Meta {
id: new_id,
name: key.clone(),
value: value.clone(),
},
)?;
}
// Update item sizes
let size_to_record = import_meta.uncompressed_size.unwrap_or(total);
let mut updated_item = item;
updated_item.uncompressed_size = Some(size_to_record);
updated_item.compressed_size = Some(std::fs::metadata(&storage_path)?.len() as i64);
updated_item.closed = true;
db::update_item(conn, updated_item)?;
log::info!("KEEP: Imported item {new_id} (was {orig_id}) tags: {tags:?}");
imported_ids.push(new_id);
}
Ok(imported_ids)
}
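The entry-name convention the importer relies on (`<id>.meta.yml` and `<id>.data.<compression>`) can be sketched as a standalone parser. `parse_entry_name` below is a hypothetical free function, not part of the crate's API; where the real importer raises an error on a malformed ID, this sketch simply returns `None`:

```rust
/// Map "<id>.meta.yml" to (id, false) and "<id>.data.<ext>" to (id, true);
/// anything else (including a non-numeric id) yields None.
fn parse_entry_name(filename: &str) -> Option<(i64, bool)> {
    if let Some(id_str) = filename.strip_suffix(".meta.yml") {
        return id_str.parse().ok().map(|id| (id, false));
    }
    if let Some(dot_pos) = filename.find(".data.") {
        return filename[..dot_pos].parse().ok().map(|id| (id, true));
    }
    None
}

fn main() {
    assert_eq!(parse_entry_name("42.meta.yml"), Some((42, false)));
    assert_eq!(parse_entry_name("42.data.zstd"), Some((42, true)));
    assert_eq!(parse_entry_name("notes.txt"), None);
}
```

Keying both halves of an item on the parsed ID is what lets the importer pair a `.data` entry with its `.meta.yml` regardless of the order they appear in the archive.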

View File

@@ -18,7 +18,8 @@
//! ```
//!
//! ```rust
//! use keep::Args;
//! # use keep::Args;
//! # use clap::Parser;
//! let args = Args::parse();
//! ```
//!
@@ -34,32 +35,65 @@ pub mod common;
pub mod compression_engine;
pub mod config;
pub mod db;
pub mod export_tar;
pub mod filter_plugin;
pub mod import_tar;
pub mod meta_plugin;
pub mod modes;
pub mod services;
#[cfg(feature = "client")]
pub mod client;
#[cfg(feature = "meta_tokens")]
pub mod tokenizer;
// Re-export Args struct for library usage
pub use args::Args;
// Re-export PIPESIZE constant
pub use common::PIPESIZE;
pub use services::CoreError;
// Import all filter plugins to ensure they register themselves
#[allow(unused_imports)]
use filter_plugin::{grep, head, skip, strip_ansi, tail};
#[cfg(feature = "filter_grep")]
use filter_plugin::grep;
#[allow(unused_imports)]
use filter_plugin::{head, skip, strip_ansi, tail};
#[cfg(feature = "meta_tokens")]
#[allow(unused_imports)]
use filter_plugin::tokens as token_filters;
use crate::meta_plugin::{
cwd, digest, env, exec, hostname, keep_pid, read_rate, read_time, shell, shell_pid, user,
};
#[cfg(feature = "magic")]
#[cfg(feature = "meta_magic")]
#[allow(unused_imports)]
use crate::meta_plugin::magic_file;
#[cfg(feature = "meta_tokens")]
#[allow(unused_imports)]
use crate::meta_plugin::tokens;
#[cfg(feature = "meta_infer")]
#[allow(unused_imports)]
use crate::meta_plugin::infer_plugin;
#[cfg(feature = "meta_tree_magic_mini")]
#[allow(unused_imports)]
use crate::meta_plugin::tree_magic_mini;
/// Initializes plugins at library load time.
///
/// Ensures all filter and meta plugins are registered via their ctors.
/// Call this early in application startup if needed (though ctors handle most cases).
/// Plugin registration happens automatically via `#[ctor]` constructors
/// when each plugin module is loaded. The explicit module imports in
/// `lib.rs` guarantee this happens at library initialization time.
///
/// This function exists as a public API entry point for callers that
/// want to explicitly ensure plugins are ready. It intentionally does
/// no additional work.
///
/// # Examples
///
@@ -67,8 +101,8 @@ use crate::meta_plugin::magic_file;
/// keep::init_plugins();
/// ```
pub fn init_plugins() {
// This will be expanded in Step 3 implementation
// For now, the ctors handle registration
// Plugins self-register via #[ctor] on module load.
// The use-statements in lib.rs guarantee module inclusion.
}
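The shape of the registration change above — `register_meta_plugin` now returning a `Result` that each `#[ctor]` immediately `.expect()`s — can be sketched with a toy registry. This is a minimal illustration using `std::sync::LazyLock` (which the changelog says replaced `once_cell`/`lazy_static`); the `Factory` type and `register` function here are stand-ins, not the crate's real signatures:

```rust
use std::collections::HashMap;
use std::sync::{LazyLock, Mutex};

// Toy stand-in for the boxed-plugin factory the real registry stores.
type Factory = fn() -> String;

static REGISTRY: LazyLock<Mutex<HashMap<&'static str, Factory>>> =
    LazyLock::new(|| Mutex::new(HashMap::new()));

/// Registering the same name twice is an error, which the ctor-side
/// `.expect("Failed to register ...")` turns into a loud startup failure.
fn register(name: &'static str, f: Factory) -> Result<(), String> {
    let mut reg = REGISTRY.lock().unwrap();
    if reg.insert(name, f).is_some() {
        return Err(format!("duplicate plugin: {name}"));
    }
    Ok(())
}

fn main() {
    register("cwd", || "CwdMetaPlugin".to_string()).expect("register");
    assert!(register("cwd", || "dup".to_string()).is_err());
    let reg = REGISTRY.lock().unwrap();
    assert_eq!((reg["cwd"])(), "CwdMetaPlugin");
}
```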
#[cfg(test)]

View File

@@ -1,3 +1,6 @@
use std::io::Write;
use std::time::Instant;
use anyhow::{Context, Error, Result, anyhow};
use clap::error::ErrorKind;
use clap::*;
@@ -25,13 +28,42 @@ fn main() -> Result<(), Error> {
cmd.error(ErrorKind::ValueValidation, e).exit();
}
stderrlog::new()
.module(module_path!())
.quiet(args.options.quiet)
.verbosity(usize::from(args.options.verbose + 2))
//.timestamp(stderrlog::Timestamp::Second)
.init()
.unwrap();
// Handle --generate-completion early (prints to stdout and exits)
if let Some(shell) = args.mode.generate_completion {
clap_complete::generate(shell, &mut Args::command(), "keep", &mut std::io::stdout());
std::process::exit(0);
}
let start = Instant::now();
let mut builder = env_logger::Builder::new();
let show_module = args.options.verbose >= 2;
builder.format(move |buf, record| {
let elapsed = start.elapsed();
let ts = format!("[{:>6}.{:03}]", elapsed.as_secs(), elapsed.subsec_millis());
if show_module {
writeln!(
buf,
"{} {:<5} {}: {}",
ts,
record.level(),
record.module_path().unwrap_or("?"),
record.args()
)
} else {
writeln!(buf, "{} {:<5} {}", ts, record.level(), record.args())
}
});
let max_level = if args.options.quiet {
LevelFilter::Off
} else {
match args.options.verbose {
0 => LevelFilter::Warn,
1 => LevelFilter::Debug,
_ => LevelFilter::Trace,
}
};
builder.filter_module("keep", max_level);
builder.init();
debug!("MAIN: Start");
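The quiet/verbose-to-level mapping above is small enough to isolate. The `Level` enum below is a toy stand-in for `log::LevelFilter` (same variant names, no external crate) so the mapping can be exercised on its own:

```rust
// Toy stand-in for log::LevelFilter used by the real code.
#[derive(Debug, PartialEq)]
enum Level {
    Off,
    Warn,
    Debug,
    Trace,
}

/// --quiet wins over any -v count; otherwise 0 -> Warn, 1 -> Debug, 2+ -> Trace.
fn level_for(quiet: bool, verbose: u8) -> Level {
    if quiet {
        Level::Off
    } else {
        match verbose {
            0 => Level::Warn,
            1 => Level::Debug,
            _ => Level::Trace,
        }
    }
}

fn main() {
    assert_eq!(level_for(true, 3), Level::Off);
    assert_eq!(level_for(false, 0), Level::Warn);
    assert_eq!(level_for(false, 1), Level::Debug);
    assert_eq!(level_for(false, 2), Level::Trace);
}
```

Note the default is `Warn`, so plain runs stay silent below warnings while `-vv` already reaches `Trace`.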
@@ -44,37 +76,30 @@ fn main() -> Result<(), Error> {
// Create unified settings using the new config system
let settings = Settings::new(&args, default_dir)?;
debug!("MAIN: Loaded settings: {:?}", settings);
debug!("MAIN: Loaded settings: {settings:?}");
let ids = &mut Vec::new();
let tags = &mut Vec::new();
// For --info and --get modes, treat numeric strings as IDs
// For --info, --get, --export, and --list modes, treat numeric strings as IDs
for v in args.ids_or_tags.iter() {
debug!("MAIN: Parsed value: {:?}", v);
debug!("MAIN: Parsed value: {v:?}");
match v.clone() {
NumberOrString::Number(num) => {
debug!("MAIN: Adding to ids: {}", num);
debug!("MAIN: Adding to ids: {num}");
ids.push(num)
}
NumberOrString::Str(str) => {
// For --info and --get, try to parse strings as numbers to treat them as IDs
if args.mode.info || args.mode.get {
if let Ok(num) = str.parse::<i64>() {
debug!("MAIN: Adding parsed string to ids: {}", num);
// For --info, --get, --export, and --list, try to parse strings as numbers to treat them as IDs
if (args.mode.info || args.mode.get || args.mode.export || args.mode.list)
&& let Ok(num) = str.parse::<i64>()
{
debug!("MAIN: Adding parsed string to ids: {num}");
ids.push(num);
continue;
} else if args.mode.info {
// --info only accepts numeric IDs
cmd.error(
ErrorKind::InvalidValue,
format!("--info requires numeric IDs, found: '{}'", str),
)
.exit();
}
}
// If not a number, or not using --info/--get, treat as tag
debug!("MAIN: Adding to tags: {}", str);
// If not a number, or not using --info/--get/--export/--list, treat as tag
debug!("MAIN: Adding to tags: {str}");
tags.push(str)
}
}
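The routing rule for string values above — in ID-accepting modes a numeric string becomes an ID, everything else becomes a tag — can be sketched as a small helper. `route` is hypothetical; in the real code this logic lives inline in the loop's `NumberOrString::Str` arm:

```rust
/// In ID modes (--info/--get/--export/--list), numeric strings are IDs;
/// otherwise every string is treated as a tag.
fn route(value: &str, id_mode: bool, ids: &mut Vec<i64>, tags: &mut Vec<String>) {
    if id_mode {
        if let Ok(n) = value.parse::<i64>() {
            ids.push(n);
            return;
        }
    }
    tags.push(value.to_string());
}

fn main() {
    let (mut ids, mut tags) = (Vec::new(), Vec::new());
    route("17", true, &mut ids, &mut tags);    // numeric + id mode -> ID
    route("17", false, &mut ids, &mut tags);   // numeric, not id mode -> tag
    route("notes", true, &mut ids, &mut tags); // non-numeric -> tag
    assert_eq!(ids, vec![17]);
    assert_eq!(tags, vec!["17".to_string(), "notes".to_string()]);
}
```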
@@ -92,8 +117,12 @@ fn main() -> Result<(), Error> {
List,
Delete,
Info,
Update,
Export,
Import,
Status,
StatusPlugins,
#[cfg(feature = "server")]
Server,
GenerateConfig,
}
@@ -112,13 +141,24 @@ fn main() -> Result<(), Error> {
mode = KeepModes::Delete;
} else if args.mode.info {
mode = KeepModes::Info;
} else if args.mode.update {
mode = KeepModes::Update;
} else if args.mode.export {
mode = KeepModes::Export;
} else if args.mode.import.is_some() {
mode = KeepModes::Import;
} else if args.mode.status {
mode = KeepModes::Status;
} else if args.mode.status_plugins {
mode = KeepModes::StatusPlugins;
} else if args.mode.server {
}
#[cfg(feature = "server")]
{
if args.mode.server {
mode = KeepModes::Server;
} else if args.mode.generate_config {
}
}
if args.mode.generate_config {
mode = KeepModes::GenerateConfig;
}
@@ -154,6 +194,7 @@ fn main() -> Result<(), Error> {
}
// Validate server password usage
#[cfg(feature = "server")]
if settings.server_password().is_some() && mode != KeepModes::Server {
cmd.error(
ErrorKind::InvalidValue,
@@ -162,29 +203,20 @@ fn main() -> Result<(), Error> {
.exit();
}
debug!("MAIN: args: {:?}", args);
debug!("MAIN: ids: {:?}", ids);
debug!("MAIN: tags: {:?}", tags);
debug!("MAIN: mode: {:?}", mode);
debug!("MAIN: settings: {:?}", settings);
unsafe {
libc::umask(0o077);
// Validate ids-only usage
if settings.ids_only && mode != KeepModes::List {
cmd.error(
ErrorKind::InvalidValue,
"--ids-only can only be used with --list mode",
)
.exit();
}
let data_path = settings.dir.clone();
let mut db_path = data_path.clone();
db_path.push("keep-1.db");
debug!("MAIN: Data directory: {:?}", data_path);
debug!("MAIN: DB file: {:?}", db_path);
// Ensure data directory exists
fs::create_dir_all(&data_path)
.with_context(|| format!("Unable to create data directory {:?}", data_path))?;
// Initialize database
let mut conn = db::open(db_path.clone())?;
debug!("MAIN: args: {args:?}");
debug!("MAIN: ids: {ids:?}");
debug!("MAIN: tags: {tags:?}");
debug!("MAIN: mode: {mode:?}");
debug!("MAIN: settings: {settings:?}");
// Parse filter chain early for better error reporting
let filter_chain = if let Some(filter_str) = &args.item.filters {
@@ -193,7 +225,7 @@ fn main() -> Result<(), Error> {
Err(e) => {
cmd.error(
ErrorKind::InvalidValue,
format!("Invalid filter string: {}", e),
format!("Invalid filter string: {e}"),
)
.exit();
}
@@ -202,6 +234,91 @@ fn main() -> Result<(), Error> {
None
};
// Check for client mode
#[cfg(feature = "client")]
{
if let Some(ref client_url) = settings.client_url {
let client = keep::client::KeepClient::new(
client_url,
settings.client_username.clone(),
settings.client_password.clone(),
settings.client_jwt.clone(),
)?;
return match mode {
KeepModes::Save => {
let metadata: std::collections::HashMap<String, String> = settings
.meta
.iter()
.filter_map(|(k, v)| v.as_ref().map(|val| (k.clone(), val.clone())))
.collect();
keep::modes::client::save::mode(&client, &mut cmd, &settings, tags, metadata)
}
KeepModes::Get => keep::modes::client::get::mode(
&client,
&mut cmd,
&settings,
ids,
tags,
filter_chain,
),
KeepModes::List => {
keep::modes::client::list::mode(&client, &mut cmd, &settings, ids, tags)
}
KeepModes::Delete => {
keep::modes::client::delete::mode(&client, &mut cmd, &settings, ids)
}
KeepModes::Info => {
keep::modes::client::info::mode(&client, &mut cmd, &settings, ids, tags)
}
KeepModes::Diff => {
keep::modes::client::diff::mode(&client, &mut cmd, &settings, ids)
}
KeepModes::Status => {
keep::modes::client::status::mode(&client, &mut cmd, &settings)
}
KeepModes::Update => {
keep::modes::client::update::mode(&client, &mut cmd, &settings, ids, tags)
}
KeepModes::Export => {
keep::modes::client::export::mode(&client, &mut cmd, &settings, ids, tags)
}
KeepModes::Import => {
let meta_file = args.mode.import.as_ref().unwrap();
keep::modes::client::import::mode(&client, &mut cmd, &settings, meta_file)
}
_ => {
cmd.error(
ErrorKind::InvalidValue,
format!("Mode {mode:?} is not supported in client mode"),
)
.exit();
}
};
}
}
// SAFETY: umask is thread-safe by POSIX spec, and we invoke it exactly once
// before any file operations to set a secure default mask. No other threads
// exist yet at this point in main(), so there is no data race.
unsafe {
libc::umask(0o077);
}
let data_path = settings.dir.clone();
let mut db_path = data_path.clone();
db_path.push("keep-1.db");
debug!("MAIN: Data directory: {data_path:?}");
debug!("MAIN: DB file: {db_path:?}");
// Ensure data directory exists
fs::create_dir_all(&data_path)
.with_context(|| format!("Unable to create data directory {data_path:?}"))?;
// Initialize database
let mut conn = db::open(db_path.clone())?;
match mode {
KeepModes::Save => {
modes::save::mode_save(&mut cmd, &settings, ids, tags, &mut conn, data_path)
@@ -225,23 +342,28 @@ fn main() -> Result<(), Error> {
KeepModes::Info => {
modes::info::mode_info(&mut cmd, &settings, ids, tags, &mut conn, data_path)
}
KeepModes::Update => {
modes::update::mode_update(&mut cmd, &settings, ids, tags, &mut conn, data_path)
}
KeepModes::Export => modes::export::mode_export(
&mut cmd,
&settings,
ids,
tags,
&mut conn,
data_path,
filter_chain,
),
KeepModes::Import => {
let meta_file = args.mode.import.as_ref().unwrap();
modes::import::mode_import(&mut cmd, &settings, meta_file, &mut conn, data_path)
}
KeepModes::Status => modes::status::mode_status(&mut cmd, &settings, data_path, db_path),
KeepModes::StatusPlugins => {
modes::status_plugins::mode_status_plugins(&mut cmd, &settings, data_path, db_path)
}
KeepModes::Server => {
#[cfg(feature = "server")]
{
modes::server::mode_server(&mut cmd, &settings, &mut conn, data_path)
}
#[cfg(not(feature = "server"))]
{
cmd.error(
ErrorKind::MissingRequiredArgument,
"This binary was not compiled with server support. Recompile with --features server"
).exit();
}
}
KeepModes::Server => modes::server::mode_server(&mut cmd, &settings, &mut conn, data_path),
KeepModes::GenerateConfig => {
modes::generate_config::mode_generate_config(&mut cmd, &settings)
}
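The simplification in the `Server` arm above follows from putting `#[cfg(feature = "server")]` on the enum variant itself: when the feature is off, neither the variant nor its match arm exists, so the old runtime "recompile with --features server" fallback becomes unnecessary. A minimal sketch of that pattern, with a hypothetical `Mode` enum:

```rust
#[derive(Debug, PartialEq)]
enum Mode {
    List,
    // Variant only exists when the feature is enabled; without it,
    // --server simply isn't a reachable mode.
    #[cfg(feature = "server")]
    Server,
}

fn describe(m: &Mode) -> &'static str {
    match m {
        Mode::List => "list",
        // The matching arm is gated the same way, keeping the match exhaustive
        // in both configurations.
        #[cfg(feature = "server")]
        Mode::Server => "server",
    }
}

fn main() {
    assert_eq!(describe(&Mode::List), "list");
}
```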

View File

@@ -49,6 +49,14 @@ impl MetaPlugin for CwdMetaPlugin {
self.is_finalized = finalized;
}
fn set_save_meta(&mut self, save_meta: crate::meta_plugin::SaveMetaFn) {
self.base.set_save_meta(save_meta);
}
fn save_meta(&self, name: &str, value: &str) {
self.base.save_meta(name, value);
}
fn finalize(&mut self) -> crate::meta_plugin::MetaPluginResponse {
// If already finalized, don't process again
if self.is_finalized {
@@ -105,16 +113,20 @@ impl MetaPlugin for CwdMetaPlugin {
self.base.outputs()
}
fn outputs_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.outputs_mut()
fn outputs_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.outputs_mut())
}
fn options(&self) -> &std::collections::HashMap<String, serde_yaml::Value> {
self.base.options()
}
fn options_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.options_mut()
fn options_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.options_mut())
}
}
use crate::meta_plugin::register_meta_plugin;
@@ -124,5 +136,6 @@ use crate::meta_plugin::register_meta_plugin;
fn register_cwd_plugin() {
register_meta_plugin(MetaPluginType::Cwd, |options, outputs| {
Box::new(CwdMetaPlugin::new(options, outputs))
});
})
.expect("Failed to register CwdMetaPlugin");
}

View File

@@ -32,7 +32,7 @@ impl Hasher {
match self {
Hasher::Sha256(hasher) => hasher.update(data),
Hasher::Md5(hasher) => {
let _ = hasher.write(data);
hasher.consume(data);
}
Hasher::Sha512(hasher) => hasher.update(data),
}
@@ -42,15 +42,15 @@ impl Hasher {
match self {
Hasher::Sha256(hasher) => {
let result = std::mem::replace(hasher, Sha256::new()).finalize_reset();
format!("{:x}", result)
format!("{result:x}")
}
Hasher::Md5(hasher) => {
let result = hasher.clone().compute();
format!("{:x}", result)
format!("{result:x}")
}
Hasher::Sha512(hasher) => {
let result = std::mem::replace(hasher, Sha512::new()).finalize_reset();
format!("{:x}", result)
format!("{result:x}")
}
}
}
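The `std::mem::replace(hasher, Sha256::new()).finalize_reset()` idiom above takes ownership of the hasher state through a `&mut self` borrow by swapping in a fresh one. The same trick, shown on a toy accumulator rather than a real SHA state (the arithmetic here is illustrative only):

```rust
// Toy rolling accumulator; the real code applies the same mem::replace
// move to Sha256/Sha512 state so it can be consumed via &mut self.
struct Rolling(u64);

impl Rolling {
    fn update(&mut self, byte: u8) {
        self.0 = self.0.wrapping_mul(31).wrapping_add(byte as u64);
    }
    /// Swap in a fresh state and consume the old one by value.
    fn finalize_reset(&mut self) -> u64 {
        std::mem::replace(self, Rolling(0)).0
    }
}

fn main() {
    let mut h = Rolling(0);
    h.update(1);
    h.update(2);
    assert_eq!(h.finalize_reset(), 33); // 0*31+1 = 1, then 1*31+2 = 33
    assert_eq!(h.0, 0); // state was reset, ready for reuse
}
```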
@@ -119,7 +119,7 @@ impl DigestMetaPlugin {
// Only the selected method's output should be enabled, others should be None
let all_outputs = vec!["digest_md5", "digest_sha256", "digest_sha512"];
for output_name in &all_outputs {
if output_name == &format!("digest_{}", method) {
if output_name == &format!("digest_{method}") {
base.outputs.insert(
output_name.to_string(),
serde_yaml::Value::String(output_name.to_string()),
@@ -159,6 +159,14 @@ impl MetaPlugin for DigestMetaPlugin {
self.is_finalized = finalized;
}
fn set_save_meta(&mut self, save_meta: crate::meta_plugin::SaveMetaFn) {
self.base.set_save_meta(save_meta);
}
fn save_meta(&self, name: &str, value: &str) {
self.base.save_meta(name, value);
}
fn initialize(&mut self) -> crate::meta_plugin::MetaPluginResponse {
crate::meta_plugin::MetaPluginResponse {
metadata: Vec::new(),
@@ -235,8 +243,10 @@ impl MetaPlugin for DigestMetaPlugin {
self.base.outputs()
}
fn outputs_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.outputs_mut()
fn outputs_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.outputs_mut())
}
fn default_outputs(&self) -> Vec<String> {
@@ -251,8 +261,14 @@ impl MetaPlugin for DigestMetaPlugin {
self.base.options()
}
fn options_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.options_mut()
fn options_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.options_mut())
}
fn parallel_safe(&self) -> bool {
true
}
}
@@ -263,5 +279,6 @@ use crate::meta_plugin::register_meta_plugin;
fn register_digest_plugin() {
register_meta_plugin(MetaPluginType::Digest, |options, outputs| {
Box::new(DigestMetaPlugin::new(options, outputs))
});
})
.expect("Failed to register DigestMetaPlugin");
}

View File

@@ -22,24 +22,40 @@ impl EnvMetaPlugin {
///
/// A new instance of `EnvMetaPlugin`.
pub fn new(
_options: Option<std::collections::HashMap<String, serde_yaml::Value>>,
options: Option<std::collections::HashMap<String, serde_yaml::Value>>,
outputs: Option<std::collections::HashMap<String, serde_yaml::Value>>,
) -> Self {
// Collect environment variables starting with KEEP_META_
let mut env_vars = Vec::new();
let mut outputs_map = std::collections::HashMap::new();
// Use options from --meta-plugin JSON if provided and non-empty,
// otherwise fall back to KEEP_META_* environment variables.
let use_options = options.as_ref().map(|o| !o.is_empty()).unwrap_or(false);
if use_options {
let opts = options.as_ref().unwrap();
for (key, value) in opts {
let value_str = match value {
serde_yaml::Value::String(s) => s.clone(),
serde_yaml::Value::Number(n) => n.to_string(),
serde_yaml::Value::Bool(b) => b.to_string(),
_ => serde_yaml::to_string(value).unwrap_or_default(),
};
env_vars.push((key.clone(), value_str));
outputs_map.insert(key.clone(), serde_yaml::Value::String(key.clone()));
}
} else {
// Fall back to KEEP_META_* environment variables
for (key, value) in std::env::vars() {
if let Some(stripped_key) = key.strip_prefix("KEEP_META_") {
// Add to env_vars to process later
env_vars.push((stripped_key.to_string(), value));
// Add to outputs with default mapping to the stripped name
outputs_map.insert(
stripped_key.to_string(),
serde_yaml::Value::String(stripped_key.to_string()),
);
}
}
}
// Override with provided outputs
if let Some(provided_outputs) = outputs {
@@ -87,6 +103,14 @@ impl MetaPlugin for EnvMetaPlugin {
self.is_finalized = finalized;
}
fn set_save_meta(&mut self, save_meta: crate::meta_plugin::SaveMetaFn) {
self.base.set_save_meta(save_meta);
}
fn save_meta(&self, name: &str, value: &str) {
self.base.save_meta(name, value);
}
/// Initializes the plugin, processing environment variables.
///
/// Processes all KEEP_META_* variables and generates metadata using output mappings.
@@ -183,8 +207,10 @@ impl MetaPlugin for EnvMetaPlugin {
/// # Returns
///
/// A mutable reference to the `HashMap` of outputs.
fn outputs_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.outputs_mut()
fn outputs_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.outputs_mut())
}
/// Returns the default output names based on collected env vars.
@@ -212,8 +238,10 @@ impl MetaPlugin for EnvMetaPlugin {
/// # Panics
///
/// Panics with "options_mut() not implemented for EnvMetaPlugin".
fn options_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.options_mut()
fn options_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.options_mut())
}
}
use crate::meta_plugin::register_meta_plugin;
@@ -223,5 +251,6 @@ use crate::meta_plugin::register_meta_plugin;
fn register_env_plugin() {
register_meta_plugin(MetaPluginType::Env, |options, outputs| {
Box::new(EnvMetaPlugin::new(options, outputs))
});
})
.expect("Failed to register EnvMetaPlugin");
}
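The environment-variable fallback above keys off the `KEEP_META_` prefix: only matching variables become metadata, with the prefix stripped for the output name. That selection step in isolation:

```rust
/// Strip the KEEP_META_ prefix; non-matching variables are ignored entirely.
fn meta_key(env_key: &str) -> Option<&str> {
    env_key.strip_prefix("KEEP_META_")
}

fn main() {
    assert_eq!(meta_key("KEEP_META_project"), Some("project"));
    assert_eq!(meta_key("PATH"), None);
    assert_eq!(meta_key("KEEP_META_"), Some("")); // degenerate but accepted here
}
```

Per the change above, explicitly supplied `--meta-plugin` options take precedence; this env scan only runs when no non-empty options map is provided.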

View File

@@ -20,7 +20,7 @@ pub struct MetaPluginExec {
pub supported: bool,
pub split_whitespace: bool,
process: Option<Child>,
writer: Option<Box<dyn Write>>,
writer: Option<Box<dyn Write + Send>>,
result: Option<String>,
base: BaseMetaPlugin,
}
@@ -66,7 +66,8 @@ impl MetaPluginExec {
/// # Examples
///
/// ```
/// let plugin = MetaPluginExec::new("date", &[], "date_output", false, None, None);
/// # use keep::meta_plugin::MetaPluginExec;
/// let plugin = MetaPluginExec::new("date", &[], "date_output".to_string(), false, None, None);
/// ```
pub fn new(
program: &str,
@@ -130,7 +131,19 @@ impl MetaPluginExec {
match cmd.spawn() {
Ok(mut child) => {
let stdin = child.stdin.take().unwrap();
let stdin = match child.stdin.take() {
Some(s) => s,
None => {
error!(
"META: Exec plugin: failed to capture stdin for '{}'",
self.program
);
return MetaPluginResponse {
metadata: Vec::new(),
is_finalized: true,
};
}
};
self.writer = Some(Box::new(stdin));
self.process = Some(child);
debug!("META: Exec plugin: started process for '{}'", self.program);
@@ -166,6 +179,14 @@ impl MetaPlugin for MetaPluginExec {
false
}
fn set_save_meta(&mut self, save_meta: crate::meta_plugin::SaveMetaFn) {
self.base.set_save_meta(save_meta);
}
fn save_meta(&self, name: &str, value: &str) {
self.base.save_meta(name, value);
}
fn initialize(&mut self) -> MetaPluginResponse {
self.start_process()
}
@@ -174,7 +195,7 @@ impl MetaPlugin for MetaPluginExec {
if let Some(writer) = self.writer.as_mut()
&& let Err(e) = writer.write_all(data)
{
error!("META: Exec plugin: failed to write to stdin: {}", e);
error!("META: Exec plugin: failed to write to stdin: {e}");
}
MetaPluginResponse {
metadata: Vec::new(),
@@ -219,11 +240,11 @@ impl MetaPlugin for MetaPluginExec {
}
} else {
let stderr = String::from_utf8_lossy(&output.stderr);
error!("META: Exec plugin: command failed: {}", stderr);
error!("META: Exec plugin: command failed: {stderr}");
}
}
Err(e) => {
error!("META: Exec plugin: failed to wait on process: {}", e);
error!("META: Exec plugin: failed to wait on process: {e}");
}
}
}
@@ -243,21 +264,29 @@ impl MetaPlugin for MetaPluginExec {
&self.base.outputs
}
fn outputs_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
&mut self.base.outputs
fn outputs_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(&mut self.base.outputs)
}
fn options(&self) -> &std::collections::HashMap<String, serde_yaml::Value> {
&self.base.options
}
fn options_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
&mut self.base.options
fn options_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(&mut self.base.options)
}
fn default_outputs(&self) -> Vec<String> {
vec!["exec".to_string()]
}
fn parallel_safe(&self) -> bool {
true
}
}
use crate::meta_plugin::register_meta_plugin;
@@ -302,5 +331,6 @@ fn register_exec_plugin() {
options,
outputs,
))
});
})
.expect("Failed to register ExecMetaPlugin");
}

View File

@@ -178,7 +178,7 @@ impl HostnameMetaPlugin {
{
let domain_str = String::from_utf8_lossy(&domain.stdout).trim().to_string();
if !domain_str.is_empty() && domain_str != "(none)" {
return format!("{}.{}", short_hostname, domain_str);
return format!("{short_hostname}.{domain_str}");
}
}
}
@@ -211,6 +211,14 @@ impl MetaPlugin for HostnameMetaPlugin {
self.is_finalized = finalized;
}
fn set_save_meta(&mut self, save_meta: crate::meta_plugin::SaveMetaFn) {
self.base.set_save_meta(save_meta);
}
fn save_meta(&self, name: &str, value: &str) {
self.base.save_meta(name, value);
}
fn finalize(&mut self) -> crate::meta_plugin::MetaPluginResponse {
// If already finalized, don't process again
if self.is_finalized {
@@ -375,8 +383,10 @@ impl MetaPlugin for HostnameMetaPlugin {
self.base.outputs()
}
fn outputs_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.outputs_mut()
fn outputs_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.outputs_mut())
}
fn default_outputs(&self) -> Vec<String> {
@@ -391,8 +401,10 @@ impl MetaPlugin for HostnameMetaPlugin {
self.base.options()
}
fn options_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.options_mut()
fn options_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.options_mut())
}
}
use crate::meta_plugin::register_meta_plugin;
@@ -402,5 +414,6 @@ use crate::meta_plugin::register_meta_plugin;
fn register_hostname_plugin() {
register_meta_plugin(MetaPluginType::Hostname, |options, outputs| {
Box::new(HostnameMetaPlugin::new(options, outputs))
});
})
.expect("Failed to register HostnameMetaPlugin");
}

View File

@@ -0,0 +1,177 @@
use crate::common::PIPESIZE;
use crate::meta_plugin::{
BaseMetaPlugin, MetaPlugin, MetaPluginResponse, MetaPluginType, process_metadata_outputs,
register_meta_plugin,
};
#[derive(Debug, Default)]
pub struct InferMetaPlugin {
buffer: Vec<u8>,
max_buffer_size: usize,
is_finalized: bool,
base: BaseMetaPlugin,
}
impl InferMetaPlugin {
pub fn new(
options: Option<std::collections::HashMap<String, serde_yaml::Value>>,
outputs: Option<std::collections::HashMap<String, serde_yaml::Value>>,
) -> InferMetaPlugin {
let mut base = BaseMetaPlugin::new();
if let Some(opts) = options {
for (key, value) in opts {
base.options.insert(key, value);
}
}
let max_buffer_size = base
.options
.get("max_buffer_size")
.and_then(|v| v.as_u64())
.unwrap_or(PIPESIZE as u64) as usize;
base.outputs.insert(
"infer_mime_type".to_string(),
serde_yaml::Value::String("infer_mime_type".to_string()),
);
if let Some(outs) = outputs {
for (key, value) in outs {
base.outputs.insert(key, value);
}
}
InferMetaPlugin {
buffer: Vec::new(),
max_buffer_size,
is_finalized: false,
base,
}
}
}
impl MetaPlugin for InferMetaPlugin {
fn meta_type(&self) -> MetaPluginType {
MetaPluginType::Infer
}
fn is_finalized(&self) -> bool {
self.is_finalized
}
fn set_finalized(&mut self, finalized: bool) {
self.is_finalized = finalized;
}
fn set_save_meta(&mut self, save_meta: crate::meta_plugin::SaveMetaFn) {
self.base.set_save_meta(save_meta);
}
fn save_meta(&self, name: &str, value: &str) {
self.base.save_meta(name, value);
}
fn update(&mut self, data: &[u8]) -> MetaPluginResponse {
if self.is_finalized {
return MetaPluginResponse {
metadata: Vec::new(),
is_finalized: true,
};
}
let remaining = self.max_buffer_size.saturating_sub(self.buffer.len());
let to_add = &data[..data.len().min(remaining)];
self.buffer.extend_from_slice(to_add);
if self.buffer.len() >= self.max_buffer_size {
let mime_type = infer::get(&self.buffer)
.map(|kind| kind.mime_type().to_string())
.unwrap_or_else(|| "application/octet-stream".to_string());
self.is_finalized = true;
let metadata = process_metadata_outputs(
"infer_mime_type",
serde_yaml::Value::String(mime_type),
self.base.outputs(),
)
.map(|m| vec![m])
.unwrap_or_default();
return MetaPluginResponse {
metadata,
is_finalized: true,
};
}
MetaPluginResponse {
metadata: Vec::new(),
is_finalized: false,
}
}
fn finalize(&mut self) -> MetaPluginResponse {
if self.is_finalized {
return MetaPluginResponse {
metadata: Vec::new(),
is_finalized: true,
};
}
let mime_type = infer::get(&self.buffer)
.map(|kind| kind.mime_type().to_string())
.unwrap_or_else(|| "application/octet-stream".to_string());
self.is_finalized = true;
let metadata = process_metadata_outputs(
"infer_mime_type",
serde_yaml::Value::String(mime_type),
self.base.outputs(),
)
.map(|m| vec![m])
.unwrap_or_default();
MetaPluginResponse {
metadata,
is_finalized: true,
}
}
fn outputs(&self) -> &std::collections::HashMap<String, serde_yaml::Value> {
self.base.outputs()
}
fn outputs_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.outputs_mut())
}
fn default_outputs(&self) -> Vec<String> {
vec!["infer_mime_type".to_string()]
}
fn options(&self) -> &std::collections::HashMap<String, serde_yaml::Value> {
self.base.options()
}
fn options_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.options_mut())
}
fn parallel_safe(&self) -> bool {
true
}
}
#[ctor::ctor]
fn register_infer_plugin() {
register_meta_plugin(MetaPluginType::Infer, |options, outputs| {
Box::new(InferMetaPlugin::new(options, outputs))
})
.expect("Failed to register InferMetaPlugin");
}
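The buffering discipline in `update()` above — retain at most `max_buffer_size` bytes, and signal readiness once the cap is reached — can be factored into a small helper. `CappedBuffer` is a hypothetical extraction for illustration; the real plugin keeps these fields inline and runs `infer::get` on the buffer at the signal point:

```rust
struct CappedBuffer {
    buf: Vec<u8>,
    max: usize,
}

impl CappedBuffer {
    fn new(max: usize) -> Self {
        CappedBuffer { buf: Vec::new(), max }
    }
    /// Append at most the remaining capacity; return true once full,
    /// the point where the plugin detects the type and finalizes early.
    fn push(&mut self, data: &[u8]) -> bool {
        let remaining = self.max.saturating_sub(self.buf.len());
        self.buf.extend_from_slice(&data[..data.len().min(remaining)]);
        self.buf.len() >= self.max
    }
}

fn main() {
    let mut b = CappedBuffer::new(4);
    assert!(!b.push(&[1, 2]));      // 2/4 bytes: keep buffering
    assert!(b.push(&[3, 4, 5, 6])); // capped at 4: ready to detect
    assert_eq!(b.buf, vec![1, 2, 3, 4]);
}
```

Capping the buffer bounds memory for arbitrarily large streams; `finalize()` covers the short-input case where the cap is never reached.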

View File

@@ -54,6 +54,14 @@ impl MetaPlugin for KeepPidMetaPlugin {
self.is_finalized = finalized;
}
fn set_save_meta(&mut self, save_meta: crate::meta_plugin::SaveMetaFn) {
self.base.set_save_meta(save_meta);
}
fn save_meta(&self, name: &str, value: &str) {
self.base.save_meta(name, value);
}
/// Finalizes the plugin, processing any remaining data if needed.
///
/// # Returns
@@ -162,8 +170,10 @@ impl MetaPlugin for KeepPidMetaPlugin {
/// # Returns
///
/// A mutable reference to the `HashMap` of outputs.
fn outputs_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.outputs_mut()
fn outputs_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.outputs_mut())
}
/// Returns the default output names for this plugin.
@@ -189,8 +199,10 @@ impl MetaPlugin for KeepPidMetaPlugin {
/// # Returns
///
/// A mutable reference to the `HashMap` of options.
fn options_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.options_mut()
fn options_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.options_mut())
}
}
use crate::meta_plugin::register_meta_plugin;
@@ -200,5 +212,6 @@ use crate::meta_plugin::register_meta_plugin;
fn register_keep_pid_plugin() {
register_meta_plugin(MetaPluginType::KeepPid, |options, outputs| {
Box::new(KeepPidMetaPlugin::new(options, outputs))
});
})
.expect("Failed to register KeepPidMetaPlugin");
}

View File

@@ -1,337 +0,0 @@
use magic::{Cookie, CookieFlags};
use std::io;
use crate::common::PIPESIZE;
use crate::meta_plugin::{MetaPlugin, MetaPluginType};
#[derive(Debug)]
pub struct MagicFileMetaPlugin {
buffer: Vec<u8>,
max_buffer_size: usize,
is_finalized: bool,
cookie: Option<Cookie>,
base: crate::meta_plugin::BaseMetaPlugin,
}
impl MagicFileMetaPlugin {
pub fn new(
options: Option<std::collections::HashMap<String, serde_yaml::Value>>,
outputs: Option<std::collections::HashMap<String, serde_yaml::Value>>,
) -> MagicFileMetaPlugin {
// Start with default options
let mut final_options = std::collections::HashMap::new();
final_options.insert("max_buffer_size".to_string(), serde_yaml::Value::Number(PIPESIZE.into()));
if let Some(opts) = options {
for (key, value) in opts {
final_options.insert(key, value);
}
}
// Start with default outputs
let mut final_outputs = std::collections::HashMap::new();
let default_outputs = vec!["mime_type".to_string(), "mime_encoding".to_string(), "file_type".to_string()];
for output_name in default_outputs {
final_outputs.insert(output_name.clone(), serde_yaml::Value::String(output_name));
}
if let Some(outs) = outputs {
for (key, value) in outs {
final_outputs.insert(key, value);
}
}
let max_buffer_size = final_options.get("max_buffer_size")
.and_then(|v| v.as_u64())
.unwrap_or(PIPESIZE as u64) as usize;
// Ensure the default max_buffer_size is in the options
if !final_options.contains_key("max_buffer_size") {
final_options.insert("max_buffer_size".to_string(), serde_yaml::Value::Number(PIPESIZE.into()));
}
let mut base = crate::meta_plugin::BaseMetaPlugin::new();
base.outputs = final_outputs;
base.options = final_options;
MagicFileMetaPlugin {
buffer: Vec::new(),
max_buffer_size,
is_finalized: false,
cookie: None,
base,
}
}
fn get_magic_result(&self, flags: CookieFlags) -> io::Result<String> {
// Use the existing cookie and just change flags
if let Some(cookie) = &self.cookie {
cookie.set_flags(flags)
.map_err(|e| io::Error::new(io::ErrorKind::Other, format!("Failed to set magic flags: {}", e)))?;
let result = cookie.buffer(&self.buffer)
.map_err(|e| io::Error::new(io::ErrorKind::Other, format!("Failed to analyze buffer: {}", e)))?;
// Clean up the result - remove extra whitespace and take first part if needed
let trimmed = result.trim();
// For some magic results, we might want just the first part before semicolon or comma
let cleaned = if trimmed.contains(';') {
trimmed.split(';').next().unwrap_or(trimmed).trim()
} else if trimmed.contains(',') && flags.contains(CookieFlags::MIME_TYPE | CookieFlags::MIME_ENCODING) {
trimmed.split(',').next().unwrap_or(trimmed).trim()
} else {
trimmed
};
Ok(cleaned.to_string())
} else {
Err(io::Error::new(io::ErrorKind::Other, "Magic cookie not initialized"))
}
}
/// Helper function to process all magic types and collect metadata
fn process_magic_types(&self) -> Vec<crate::meta_plugin::MetaData> {
let mut metadata = Vec::new();
// Define the types to process with their corresponding flags
let types_to_process = [
("mime_type", CookieFlags::MIME_TYPE),
("mime_encoding", CookieFlags::MIME_ENCODING),
("file_type", CookieFlags::default()),
];
for (name, flags) in types_to_process.iter() {
if let Ok(result) = self.get_magic_result(*flags) {
if !result.is_empty() {
// Use process_metadata_outputs to handle output mapping
if let Some(meta_data) = crate::meta_plugin::process_metadata_outputs(
name,
serde_yaml::Value::String(result),
self.base.outputs()
) {
metadata.push(meta_data);
}
}
}
}
metadata
}
}
impl MetaPlugin for MagicFileMetaPlugin {
/// Checks if the plugin has been finalized.
///
/// # Returns
///
/// `true` if finalized, `false` otherwise.
fn is_finalized(&self) -> bool {
self.is_finalized
}
/// Sets the finalized state of the plugin.
///
/// # Arguments
///
/// * `finalized` - The new finalized state.
fn set_finalized(&mut self, finalized: bool) {
self.is_finalized = finalized;
}
/// Initializes the magic cookie for file type detection.
///
/// Loads the magic database; finalizes if initialization fails.
///
/// # Returns
///
/// A `MetaPluginResponse` with empty metadata; `is_finalized` is `true` on failure.
///
/// # Errors
///
/// Logs errors; returns finalized response on cookie or load failure.
///
/// # Examples
///
/// ```
/// let mut plugin = MagicFileMetaPlugin::new(None, None);
/// let response = plugin.initialize();
/// ```
fn initialize(&mut self) -> crate::meta_plugin::MetaPluginResponse {
// Initialize the magic cookie once
let cookie = match Cookie::open(Default::default()) {
Ok(cookie) => cookie,
Err(_e) => {
return crate::meta_plugin::MetaPluginResponse {
metadata: Vec::new(),
is_finalized: true,
};
}
};
if let Err(_e) = cookie.load(&[] as &[&str]) {
return crate::meta_plugin::MetaPluginResponse {
metadata: Vec::new(),
is_finalized: true,
};
}
self.cookie = Some(cookie);
crate::meta_plugin::MetaPluginResponse {
metadata: Vec::new(),
is_finalized: false,
}
}
/// Finalizes the plugin and performs file type detection.
///
/// Analyzes the accumulated buffer and outputs detected types.
///
/// # Returns
///
/// A `MetaPluginResponse` with detection metadata and finalized state set to `true`.
///
/// # Examples
///
/// ```
/// let mut plugin = MagicFileMetaPlugin::new(None, None);
/// // ... after updates
/// let response = plugin.finalize();
/// assert!(response.is_finalized);
/// ```
fn finalize(&mut self) -> crate::meta_plugin::MetaPluginResponse {
// If already finalized, don't process again
if self.is_finalized {
return crate::meta_plugin::MetaPluginResponse {
metadata: Vec::new(),
is_finalized: true,
};
}
let metadata = self.process_magic_types();
// Mark as finalized
self.is_finalized = true;
crate::meta_plugin::MetaPluginResponse {
metadata,
is_finalized: true,
}
}
/// Updates the plugin with new data, accumulating for analysis.
///
/// Buffers data up to `max_buffer_size`; triggers detection when full.
///
/// # Arguments
///
/// * `data` - Content chunk to buffer.
///
/// # Returns
///
/// A `MetaPluginResponse` with metadata on buffer full; finalizes then.
///
/// # Examples
///
/// ```
/// let mut plugin = MagicFileMetaPlugin::new(None, None);
/// let response = plugin.update(b"content");
/// ```
fn update(&mut self, data: &[u8]) -> crate::meta_plugin::MetaPluginResponse {
// If already finalized, don't process more data
if self.is_finalized {
return crate::meta_plugin::MetaPluginResponse {
metadata: Vec::new(),
is_finalized: true,
};
}
let mut metadata = Vec::new();
// Only collect up to max_buffer_size
let remaining_capacity = self.max_buffer_size.saturating_sub(self.buffer.len());
if remaining_capacity > 0 {
let bytes_to_copy = std::cmp::min(data.len(), remaining_capacity);
self.buffer.extend_from_slice(&data[..bytes_to_copy]);
// Check if we've reached our buffer limit and return metadata
if self.buffer.len() >= self.max_buffer_size {
metadata = self.process_magic_types();
// Mark as finalized when we've processed enough data
self.is_finalized = true;
}
}
let is_finalized = !metadata.is_empty();
crate::meta_plugin::MetaPluginResponse {
metadata,
is_finalized,
}
}
/// Returns the type of this meta plugin.
///
/// # Returns
///
/// `MetaPluginType::MagicFile`.
fn meta_type(&self) -> MetaPluginType {
MetaPluginType::MagicFile
}
/// Returns a reference to the outputs mapping.
///
/// # Returns
///
/// A reference to the `HashMap` of outputs.
fn outputs(&self) -> &std::collections::HashMap<String, serde_yaml::Value> {
self.base.outputs()
}
/// Returns a mutable reference to the outputs mapping.
///
/// # Returns
///
/// A mutable reference to the `HashMap` of outputs.
fn outputs_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.outputs_mut()
}
/// Returns the default output names for this plugin.
///
/// # Returns
///
/// Vector of default output field names.
fn default_outputs(&self) -> Vec<String> {
vec!["mime_type".to_string(), "mime_encoding".to_string(), "file_type".to_string()]
}
/// Returns a reference to the options mapping.
///
/// # Returns
///
/// A reference to the `HashMap` of options.
fn options(&self) -> &std::collections::HashMap<String, serde_yaml::Value> {
self.base.options()
}
/// Returns a mutable reference to the options mapping.
///
/// # Returns
///
/// A mutable reference to the `HashMap` of options.
fn options_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.options_mut()
}
}
use crate::meta_plugin::register_meta_plugin;
// Register the plugin at module initialization time
#[ctor::ctor]
fn register_magic_file_plugin() {
register_meta_plugin(MetaPluginType::MagicFile, |options, outputs| {
Box::new(MagicFileMetaPlugin::new(options, outputs))
});
}

View File

@@ -1,9 +1,8 @@
#[cfg(feature = "magic")]
#[cfg(feature = "meta_magic")]
use magic::{Cookie, CookieFlags};
#[cfg(not(feature = "magic"))]
#[cfg(not(feature = "meta_magic"))]
use std::process::{Command, Stdio};
use log::debug;
use std::io::{self, Write};
use std::path::Path;
@@ -12,17 +11,26 @@ use crate::meta_plugin::{
process_metadata_outputs,
};
#[cfg(feature = "magic")]
// Thread-local libmagic cookie, lazily initialized on first access per thread.
// Each thread gets its own independent Cookie instance. Libmagic documents that
// separate cookies can be used from different threads concurrently without
// synchronization. Using thread_local! avoids unsafe impl Send since the
// storage is inherently !Send.
#[cfg(feature = "meta_magic")]
thread_local! {
static MAGIC_COOKIE: std::cell::RefCell<Option<Cookie>> = const { std::cell::RefCell::new(None) };
}
#[cfg(feature = "meta_magic")]
#[derive(Debug)]
pub struct MagicFileMetaPluginImpl {
buffer: Vec<u8>,
max_buffer_size: usize,
is_finalized: bool,
cookie: Option<Cookie>,
base: BaseMetaPlugin,
}
#[cfg(feature = "magic")]
#[cfg(feature = "meta_magic")]
impl MagicFileMetaPluginImpl {
pub fn new(
options: Option<std::collections::HashMap<String, serde_yaml::Value>>,
@@ -45,28 +53,38 @@ impl MagicFileMetaPluginImpl {
buffer: Vec::new(),
max_buffer_size,
is_finalized: false,
cookie: None,
base,
}
}
fn get_magic_result(&self, flags: CookieFlags) -> io::Result<String> {
if let Some(cookie) = &self.cookie {
MAGIC_COOKIE.with(|cell| {
// Lazy init: create cookie on first access per thread
{
let mut opt = cell.borrow_mut();
if opt.is_none() {
let cookie = Cookie::open(CookieFlags::default())
.map_err(|e| io::Error::other(format!("Failed to open magic: {e}")))?;
cookie.load(&[] as &[&Path]).map_err(|e| {
io::Error::other(format!("Failed to load magic database: {e}"))
})?;
*opt = Some(cookie);
}
}
let cookie_ref = cell.borrow();
let cookie = cookie_ref.as_ref().expect("cookie initialized above");
cookie
.set_flags(flags)
.map_err(|e| io::Error::other(format!("Failed to set magic flags: {}", e)))?;
.map_err(|e| io::Error::other(format!("Failed to set magic flags: {e}")))?;
let result = cookie
.buffer(&self.buffer)
.map_err(|e| io::Error::other(format!("Failed to analyze buffer: {}", e)))?;
.map_err(|e| io::Error::other(format!("Failed to analyze buffer: {e}")))?;
// Clean up the result - remove extra whitespace
let trimmed = result.trim().to_string();
Ok(trimmed)
} else {
Err(io::Error::other("Magic cookie not initialized"))
}
Ok(result.trim().to_string())
})
}
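The hunk above moves the libmagic cookie into a `thread_local!` with lazy per-thread initialization. The pattern in isolation, with a plain `Vec` standing in for the `Cookie` (names here are illustrative, not from the codebase):

```rust
use std::cell::RefCell;

thread_local! {
    // Per-thread slot, filled on first access (a Vec stands in for Cookie).
    static SLOT: RefCell<Option<Vec<&'static str>>> = const { RefCell::new(None) };
}

fn with_slot(item: &'static str) -> usize {
    SLOT.with(|cell| {
        {
            // Lazy init: the expensive setup runs at most once per thread.
            let mut opt = cell.borrow_mut();
            if opt.is_none() {
                *opt = Some(Vec::new());
            }
        }
        let mut guard = cell.borrow_mut();
        let v = guard.as_mut().expect("initialized above");
        v.push(item);
        v.len()
    })
}

fn main() {
    assert_eq!(with_slot("a"), 1);
    assert_eq!(with_slot("b"), 2);
    // A spawned thread sees a fresh, independently initialized slot,
    // which is why no cross-thread synchronization is needed.
    let handle = std::thread::spawn(|| with_slot("c"));
    assert_eq!(handle.join().unwrap(), 1);
}
```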
fn process_magic_types(&self) -> Vec<MetaData> {
@@ -95,7 +113,7 @@ impl MagicFileMetaPluginImpl {
}
}
#[cfg(feature = "magic")]
#[cfg(feature = "meta_magic")]
impl MetaPlugin for MagicFileMetaPluginImpl {
fn is_finalized(&self) -> bool {
self.is_finalized
@@ -105,31 +123,16 @@ impl MetaPlugin for MagicFileMetaPluginImpl {
self.is_finalized = finalized;
}
fn set_save_meta(&mut self, save_meta: crate::meta_plugin::SaveMetaFn) {
self.base.set_save_meta(save_meta);
}
fn save_meta(&self, name: &str, value: &str) {
self.base.save_meta(name, value);
}
fn initialize(&mut self) -> MetaPluginResponse {
let cookie = match Cookie::open(CookieFlags::default()) {
Ok(cookie) => cookie,
Err(e) => {
debug!("META: MagicFile plugin: failed to create cookie: {}", e);
return MetaPluginResponse {
metadata: Vec::new(),
is_finalized: true,
};
}
};
if let Err(e) = cookie.load(&[] as &[&Path]) {
debug!(
"META: MagicFile plugin: failed to load magic database: {}",
e
);
return MetaPluginResponse {
metadata: Vec::new(),
is_finalized: true,
};
}
self.cookie = Some(cookie);
// Cookie is lazily initialized in the thread-local on first use.
MetaPluginResponse {
metadata: Vec::new(),
is_finalized: false,
@@ -190,8 +193,10 @@ impl MetaPlugin for MagicFileMetaPluginImpl {
self.base.outputs()
}
fn outputs_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.outputs_mut()
fn outputs_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.outputs_mut())
}
fn default_outputs(&self) -> Vec<String> {
@@ -206,12 +211,21 @@ impl MetaPlugin for MagicFileMetaPluginImpl {
self.base.options()
}
fn options_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.options_mut()
fn options_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.options_mut())
}
fn parallel_safe(&self) -> bool {
true
}
}
#[cfg(not(feature = "magic"))]
#[cfg(feature = "meta_magic")]
pub use MagicFileMetaPluginImpl as MagicFileMetaPlugin;
#[cfg(not(feature = "meta_magic"))]
#[derive(Debug)]
pub struct FallbackMagicFileMetaPlugin {
buffer: Vec<u8>,
@@ -220,26 +234,23 @@ pub struct FallbackMagicFileMetaPlugin {
base: BaseMetaPlugin,
}
#[cfg(not(feature = "magic"))]
#[cfg(not(feature = "meta_magic"))]
impl FallbackMagicFileMetaPlugin {
pub fn new(
options: Option<std::collections::HashMap<String, serde_yaml::Value>>,
outputs: Option<std::collections::HashMap<String, serde_yaml::Value>>,
) -> FallbackMagicFileMetaPlugin {
) -> Self {
let mut base = BaseMetaPlugin::new();
// Set default outputs
let default_outputs = &["mime_type", "mime_encoding", "file_type"];
base.initialize_plugin(default_outputs, &options, &outputs);
// Get max_buffer_size from options, default to PIPESIZE
let max_buffer_size = base
.options
.get("max_buffer_size")
.and_then(|v| v.as_u64())
.unwrap_or(crate::common::PIPESIZE as u64) as usize;
FallbackMagicFileMetaPlugin {
Self {
buffer: Vec::new(),
max_buffer_size,
is_finalized: false,
@@ -247,76 +258,85 @@ impl FallbackMagicFileMetaPlugin {
}
}
fn run_file_command(&self, buffer: &[u8]) -> io::Result<String> {
let mut temp_file = tempfile::NamedTempFile::new()?;
temp_file.as_ref().write_all(buffer)?;
fn run_file_command(&self, args: &[&str]) -> Option<String> {
let output = Command::new("file")
.arg("-b")
.arg("-m")
.arg("all")
.arg(temp_file.path())
.output()
.map_err(|e| {
io::Error::new(
io::ErrorKind::Other,
format!("Failed to run file command: {}", e),
)
})?;
.args(args)
.arg("-")
.stdin(Stdio::piped())
.stdout(Stdio::piped())
.spawn()
.and_then(|mut child| {
if let Some(mut stdin) = child.stdin.take() {
if stdin.write_all(&self.buffer).is_err() {
// Ignore write error; child will see EOF and likely fail
// the file detection, returning no output.
}
}
child.wait_with_output()
});
if !output.status.success() {
return Err(io::Error::new(io::ErrorKind::Other, "File command failed"));
output
.ok()
.map(|o| String::from_utf8_lossy(&o.stdout).trim().to_string())
}
let result = String::from_utf8_lossy(&output.stdout).trim().to_string();
Ok(result)
}
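The rewritten `run_file_command` replaces the temp-file round trip with piping the buffer straight to the child's stdin. A minimal standalone version of that spawn-write-collect shape, assuming a Unix-like environment where `cat` stands in for `file -`:

```rust
use std::io::Write;
use std::process::{Command, Stdio};

// Pipe a buffer into a child's stdin and capture trimmed stdout,
// mirroring the fallback's shell-out (here `cat` stands in for `file -`).
fn pipe_through(cmd: &str, args: &[&str], input: &[u8]) -> Option<String> {
    let mut child = Command::new(cmd)
        .args(args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()
        .ok()?;
    if let Some(mut stdin) = child.stdin.take() {
        // Ignore write errors; the child simply sees EOF.
        let _ = stdin.write_all(input);
    } // dropping stdin closes the pipe so the child can finish
    let out = child.wait_with_output().ok()?;
    Some(String::from_utf8_lossy(&out.stdout).trim().to_string())
}

fn main() {
    assert_eq!(pipe_through("cat", &[], b"hello").unwrap(), "hello");
}
```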
fn process_file_output(&self, result: &str) -> Vec<MetaData> {
fn detect_type(&self) -> Vec<MetaData> {
let mut metadata = Vec::new();
// Parse the file command output
// file -m all output format is typically: type; charset=encoding
let parts: Vec<&str> = result.split(';').map(|s| s.trim()).collect();
let file_type = parts.first().cloned().unwrap_or(result);
let mime_encoding = parts
.get(1)
.and_then(|s| s.strip_prefix("charset="))
.cloned()
.unwrap_or("");
// Get mime_type and mime_encoding via --mime
if let Some(mime_line) = self.run_file_command(&["--brief", "--mime"]) {
// Format: "text/plain; charset=us-ascii"
if let Some((mime_type, rest)) = mime_line.split_once(';') {
let mime_type = mime_type.trim().to_string();
let mime_encoding = rest
.trim()
.strip_prefix("charset=")
.unwrap_or("binary")
.to_string();
// For mime_type, try to infer from file type or use a heuristic
let mime_type = if file_type.starts_with("text") {
"text/plain"
} else if file_type.contains("ASCII") || file_type.contains("UTF-8") {
"text/plain"
} else if file_type.contains("empty") {
"application/octet-stream"
} else {
"application/octet-stream" // default
};
let outputs_to_process = [
("mime_type", mime_type),
("mime_encoding", mime_encoding),
("file_type", file_type),
];
for (name, value) in outputs_to_process.iter() {
if let Some(meta_data) = process_metadata_outputs(
name,
serde_yaml::Value::String(value.to_string()),
"mime_type",
serde_yaml::Value::String(mime_type),
self.base.outputs(),
) {
metadata.push(meta_data);
}
if let Some(meta_data) = process_metadata_outputs(
"mime_encoding",
serde_yaml::Value::String(mime_encoding),
self.base.outputs(),
) {
metadata.push(meta_data);
}
} else {
// No charset, just mime type
if let Some(meta_data) = process_metadata_outputs(
"mime_type",
serde_yaml::Value::String(mime_line),
self.base.outputs(),
) {
metadata.push(meta_data);
}
}
}
// Get human-readable file type via --brief
if let Some(file_type) = self.run_file_command(&["--brief"])
&& !file_type.is_empty()
&& let Some(meta_data) = process_metadata_outputs(
"file_type",
serde_yaml::Value::String(file_type),
self.base.outputs(),
)
{
metadata.push(meta_data);
}
metadata
}
}
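`detect_type` above parses `file --brief --mime` output of the form `text/plain; charset=us-ascii` via `split_once` and `strip_prefix`. That parsing step can be sketched on its own (`parse_mime_line` is an illustrative name, not a function in the codebase):

```rust
// Split "type; charset=enc" into (mime_type, mime_encoding),
// defaulting the encoding to "binary" as the fallback plugin does.
fn parse_mime_line(line: &str) -> (String, String) {
    match line.split_once(';') {
        Some((ty, rest)) => (
            ty.trim().to_string(),
            rest.trim()
                .strip_prefix("charset=")
                .unwrap_or("binary")
                .to_string(),
        ),
        None => (line.trim().to_string(), "binary".to_string()),
    }
}

fn main() {
    assert_eq!(
        parse_mime_line("text/plain; charset=us-ascii"),
        ("text/plain".to_string(), "us-ascii".to_string())
    );
    assert_eq!(
        parse_mime_line("application/zip"),
        ("application/zip".to_string(), "binary".to_string())
    );
}
```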
#[cfg(not(feature = "magic"))]
#[cfg(not(feature = "meta_magic"))]
impl MetaPlugin for FallbackMagicFileMetaPlugin {
fn is_finalized(&self) -> bool {
self.is_finalized
@@ -326,8 +346,15 @@ impl MetaPlugin for FallbackMagicFileMetaPlugin {
self.is_finalized = finalized;
}
fn set_save_meta(&mut self, save_meta: crate::meta_plugin::SaveMetaFn) {
self.base.set_save_meta(save_meta);
}
fn save_meta(&self, name: &str, value: &str) {
self.base.save_meta(name, value);
}
fn initialize(&mut self) -> MetaPluginResponse {
// No initialization needed for fallback
MetaPluginResponse {
metadata: Vec::new(),
is_finalized: false,
@@ -342,27 +369,18 @@ impl MetaPlugin for FallbackMagicFileMetaPlugin {
};
}
let remaining_capacity = self.max_buffer_size.saturating_sub(self.buffer.len());
if remaining_capacity > 0 {
let bytes_to_copy = std::cmp::min(data.len(), remaining_capacity);
self.buffer.extend_from_slice(&data[..bytes_to_copy]);
let remaining = self.max_buffer_size.saturating_sub(self.buffer.len());
if remaining > 0 {
let n = std::cmp::min(data.len(), remaining);
self.buffer.extend_from_slice(&data[..n]);
if self.buffer.len() >= self.max_buffer_size {
if let Ok(result) = self.run_file_command(&self.buffer) {
let metadata = self.process_file_output(&result);
let metadata = self.detect_type();
self.is_finalized = true;
return MetaPluginResponse {
metadata,
is_finalized: true,
};
} else {
// On error, finalize with empty metadata
self.is_finalized = true;
return MetaPluginResponse {
metadata: Vec::new(),
is_finalized: true,
};
}
}
}
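The capped-accumulation logic in `update` (copy only what fits, detect once the cap is reached) reduces to a small helper. A sketch under illustrative names — `take_up_to` is not a function in the codebase:

```rust
// Append up to `max` total bytes into `buffer`; returns true once full,
// which is the signal to run detection and finalize.
fn take_up_to(buffer: &mut Vec<u8>, data: &[u8], max: usize) -> bool {
    // saturating_sub avoids underflow once the buffer is already full.
    let remaining = max.saturating_sub(buffer.len());
    if remaining > 0 {
        let n = data.len().min(remaining);
        buffer.extend_from_slice(&data[..n]);
    }
    buffer.len() >= max
}

fn main() {
    let mut buf = Vec::new();
    assert!(!take_up_to(&mut buf, b"abc", 4)); // 3 of 4 bytes
    assert!(take_up_to(&mut buf, b"defg", 4)); // only 1 more byte copied
    assert_eq!(buf, b"abcd");
    assert!(take_up_to(&mut buf, b"zzz", 4)); // full: nothing more is copied
    assert_eq!(buf.len(), 4);
}
```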
@@ -379,21 +397,9 @@ impl MetaPlugin for FallbackMagicFileMetaPlugin {
is_finalized: true,
};
}
let metadata = if !self.buffer.is_empty() {
if let Ok(result) = self.run_file_command(&self.buffer) {
self.process_file_output(&result)
} else {
Vec::new()
}
} else {
Vec::new()
};
self.is_finalized = true;
MetaPluginResponse {
metadata,
metadata: self.detect_type(),
is_finalized: true,
}
}
@@ -406,8 +412,10 @@ impl MetaPlugin for FallbackMagicFileMetaPlugin {
self.base.outputs()
}
fn outputs_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.outputs_mut()
fn outputs_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.outputs_mut())
}
fn default_outputs(&self) -> Vec<String> {
@@ -422,15 +430,18 @@ impl MetaPlugin for FallbackMagicFileMetaPlugin {
self.base.options()
}
fn options_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.options_mut()
fn options_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.options_mut())
}
fn parallel_safe(&self) -> bool {
true
}
}
#[cfg(feature = "magic")]
pub use MagicFileMetaPluginImpl as MagicFileMetaPlugin;
#[cfg(not(feature = "magic"))]
#[cfg(not(feature = "meta_magic"))]
pub use FallbackMagicFileMetaPlugin as MagicFileMetaPlugin;
use crate::meta_plugin::register_meta_plugin;
@@ -439,5 +450,6 @@ use crate::meta_plugin::register_meta_plugin;
fn register_magic_file_plugin() {
register_meta_plugin(MetaPluginType::MagicFile, |options, outputs| {
Box::new(MagicFileMetaPlugin::new(options, outputs))
});
})
.expect("Failed to register MagicFileMetaPlugin");
}

View File

@@ -1,41 +1,48 @@
use log::debug;
use once_cell::sync::Lazy;
use log::{debug, warn};
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
use std::sync::Mutex;
use std::sync::{Arc, Mutex};
pub mod cwd;
pub mod digest;
pub mod env;
pub mod exec;
pub mod hostname;
#[cfg(feature = "meta_infer")]
pub mod infer_plugin;
pub mod keep_pid;
#[cfg(feature = "magic")]
pub mod magic_file;
pub mod read_rate;
pub mod read_time;
pub mod shell;
pub mod shell_pid;
pub mod text;
#[cfg(feature = "meta_tokens")]
pub mod tokens;
#[cfg(feature = "meta_tree_magic_mini")]
pub mod tree_magic_mini;
pub mod user;
// pub mod text; // Removed duplicate
pub use digest::DigestMetaPlugin;
pub use exec::MetaPluginExec;
#[cfg(feature = "magic")]
#[cfg(feature = "meta_magic")]
pub use magic_file::MagicFileMetaPlugin;
// pub use text::TextMetaPlugin; // Removed duplicate
pub use cwd::CwdMetaPlugin;
pub use env::EnvMetaPlugin;
pub use hostname::HostnameMetaPlugin;
#[cfg(feature = "meta_infer")]
pub use infer_plugin::InferMetaPlugin;
pub use keep_pid::KeepPidMetaPlugin;
pub use read_rate::ReadRateMetaPlugin;
pub use read_time::ReadTimeMetaPlugin;
pub use shell::ShellMetaPlugin;
pub use shell_pid::ShellPidMetaPlugin;
#[cfg(feature = "meta_tree_magic_mini")]
pub use tree_magic_mini::TreeMagicMiniMetaPlugin;
pub use user::UserMetaPlugin;
#[cfg(not(feature = "magic"))]
#[cfg(not(feature = "meta_magic"))]
pub use magic_file::FallbackMagicFileMetaPlugin as MagicFileMetaPlugin;
type PluginConstructor = fn(
@@ -61,8 +68,16 @@ pub struct MetaPluginResponse {
pub is_finalized: bool,
}
/// Type alias for the save_meta callback shared by all plugins.
pub type SaveMetaFn = Arc<Mutex<dyn FnMut(&str, &str) + Send>>;
/// Creates a no-op save_meta for plugins not wired through MetaService.
pub fn noop_save_meta() -> SaveMetaFn {
Arc::new(Mutex::new(|_: &str, _: &str| {}))
}
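The `SaveMetaFn` alias above is a shared, lockable `FnMut` sink that every plugin clone writes through. A self-contained sketch of the same shape (`SaveFn`, `noop`, and the vector-backed store are illustrative; only the type shape matches the diff):

```rust
use std::sync::{Arc, Mutex};

// Same shape as SaveMetaFn: a shared, lockable FnMut metadata sink.
type SaveFn = Arc<Mutex<dyn FnMut(&str, &str) + Send>>;

// No-op sink for plugins not wired through a service, as in the diff.
fn noop() -> SaveFn {
    Arc::new(Mutex::new(|_: &str, _: &str| {}))
}

fn main() {
    let store: Arc<Mutex<Vec<(String, String)>>> = Arc::new(Mutex::new(Vec::new()));
    let sink = Arc::clone(&store);
    let save: SaveFn = Arc::new(Mutex::new(move |name: &str, value: &str| {
        sink.lock().unwrap().push((name.to_string(), value.to_string()));
    }));
    // Every clone handed to a plugin writes into the same backing store.
    let for_plugin = Arc::clone(&save);
    if let Ok(mut f) = for_plugin.lock() {
        f("mime_type", "text/plain");
    }
    let _ = noop();
    assert_eq!(store.lock().unwrap().len(), 1);
    assert_eq!(store.lock().unwrap()[0].0, "mime_type");
}
```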
/// Base implementation for meta plugins to reduce boilerplate.
#[derive(Debug, Clone, Default)]
#[derive(Clone)]
pub struct BaseMetaPlugin {
/// Output mappings for metadata.
pub outputs: std::collections::HashMap<String, serde_yaml::Value>,
@@ -70,6 +85,29 @@ pub struct BaseMetaPlugin {
pub options: std::collections::HashMap<String, serde_yaml::Value>,
/// Whether the plugin is finalized.
pub is_finalized: bool,
/// Callback to store metadata. Called directly by plugins.
pub save_meta: SaveMetaFn,
}
impl std::fmt::Debug for BaseMetaPlugin {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.debug_struct("BaseMetaPlugin")
.field("outputs", &self.outputs)
.field("options", &self.options)
.field("is_finalized", &self.is_finalized)
.finish_non_exhaustive()
}
}
impl Default for BaseMetaPlugin {
fn default() -> Self {
Self {
outputs: HashMap::new(),
options: HashMap::new(),
is_finalized: false,
save_meta: noop_save_meta(),
}
}
}
impl BaseMetaPlugin {
@@ -83,41 +121,39 @@ impl BaseMetaPlugin {
}
/// Returns a reference to the outputs mapping.
///
/// # Returns
///
/// A reference to the `HashMap` of outputs.
pub fn outputs(&self) -> &std::collections::HashMap<String, serde_yaml::Value> {
&self.outputs
}
/// Returns a mutable reference to the outputs mapping.
///
/// # Returns
///
/// A mutable reference to the `HashMap` of outputs.
pub fn outputs_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
&mut self.outputs
}
/// Returns a reference to the options mapping.
///
/// # Returns
///
/// A reference to the `HashMap` of options.
pub fn options(&self) -> &std::collections::HashMap<String, serde_yaml::Value> {
&self.options
}
/// Returns a mutable reference to the options mapping.
///
/// # Returns
///
/// A mutable reference to the `HashMap` of options.
pub fn options_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
&mut self.options
}
/// Sets the save_meta callback on the base plugin.
pub fn set_save_meta(&mut self, save_meta: SaveMetaFn) {
self.save_meta = save_meta;
}
/// Saves a metadata entry via the save_meta callback.
pub fn save_meta(&self, name: &str, value: &str) {
if let Ok(mut f) = self.save_meta.lock() {
f(name, value);
} else {
warn!("META_PLUGIN: save_meta lock poisoned, dropping metadata: {name}={value}");
}
}
/// Helper function to initialize plugin options and outputs.
///
/// # Arguments
@@ -179,8 +215,10 @@ impl MetaPlugin for BaseMetaPlugin {
/// # Returns
///
/// A mutable reference to the `HashMap` of outputs.
fn outputs_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
&mut self.outputs
fn outputs_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(&mut self.outputs)
}
/// Returns a reference to the options mapping.
@@ -197,8 +235,10 @@ impl MetaPlugin for BaseMetaPlugin {
/// # Returns
///
/// A mutable reference to the `HashMap` of options.
fn options_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
&mut self.options
fn options_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(&mut self.options)
}
}
@@ -229,6 +269,9 @@ pub enum MetaPluginType {
Hostname,
Exec,
Env,
Tokens,
TreeMagicMini,
Infer,
}
/// Central function to handle metadata output with name mapping.
@@ -251,36 +294,20 @@ pub fn process_metadata_outputs(
if let Some(mapping) = outputs.get(internal_name) {
// Check for null to disable the output
if mapping.is_null() {
debug!("META: Skipping disabled output (null): {}", internal_name);
debug!("META: Skipping disabled output (null): {internal_name}");
return None;
}
// Check for boolean false to disable the output
if let Some(false_val) = mapping.as_bool()
&& !false_val
{
debug!("META: Skipping disabled output: {}", internal_name);
debug!("META: Skipping disabled output: {internal_name}");
return None;
}
if let Some(custom_name) = mapping.as_str() {
// Convert the value to a string representation
let value_str = match &value {
serde_yaml::Value::Null => "null".to_string(),
serde_yaml::Value::Bool(b) => b.to_string(),
serde_yaml::Value::Number(n) => n.to_string(),
serde_yaml::Value::String(s) => s.clone(),
serde_yaml::Value::Sequence(_) => {
serde_yaml::to_string(&value).unwrap_or_else(|_| "".to_string())
}
serde_yaml::Value::Mapping(_) => {
serde_yaml::to_string(&value).unwrap_or_else(|_| "".to_string())
}
serde_yaml::Value::Tagged(_) => {
serde_yaml::to_string(&value).unwrap_or_else(|_| "".to_string())
}
};
let value_str = yaml_value_to_string(&value);
debug!(
"META: Processing metadata: internal_name={}, custom_name={}, value={}",
internal_name, custom_name, value_str
"META: Processing metadata: internal_name={internal_name}, custom_name={custom_name}, value={value_str}"
);
return Some(MetaData {
name: custom_name.to_string(),
@@ -289,35 +316,31 @@ pub fn process_metadata_outputs(
}
}
// Convert the value to a string representation
let value_str = match &value {
serde_yaml::Value::Null => "null".to_string(),
serde_yaml::Value::Bool(b) => b.to_string(),
serde_yaml::Value::Number(n) => n.to_string(),
serde_yaml::Value::String(s) => s.clone(),
serde_yaml::Value::Sequence(_) => {
serde_yaml::to_string(&value).unwrap_or_else(|_| "".to_string())
}
serde_yaml::Value::Mapping(_) => {
serde_yaml::to_string(&value).unwrap_or_else(|_| "".to_string())
}
serde_yaml::Value::Tagged(_) => {
serde_yaml::to_string(&value).unwrap_or_else(|_| "".to_string())
}
};
let value_str = yaml_value_to_string(&value);
// Default: use internal name as output name
debug!(
"META: Processing metadata: name={}, value={}",
internal_name, value_str
);
debug!("META: Processing metadata: name={internal_name}, value={value_str}");
Some(MetaData {
name: internal_name.to_string(),
value: value_str,
})
}
pub trait MetaPlugin
fn yaml_value_to_string(value: &serde_yaml::Value) -> String {
match value {
serde_yaml::Value::Null => "null".to_string(),
serde_yaml::Value::Bool(b) => b.to_string(),
serde_yaml::Value::Number(n) => n.to_string(),
serde_yaml::Value::String(s) => s.clone(),
serde_yaml::Value::Sequence(_)
| serde_yaml::Value::Mapping(_)
| serde_yaml::Value::Tagged(_) => {
serde_yaml::to_string(value).unwrap_or_else(|_| "".to_string())
}
}
}
pub trait MetaPlugin: Send
where
Self: 'static,
{
@@ -420,19 +443,25 @@ where
///
/// An empty `HashMap` (default implementation).
fn outputs(&self) -> &std::collections::HashMap<String, serde_yaml::Value> {
use once_cell::sync::Lazy;
static EMPTY: Lazy<std::collections::HashMap<String, serde_yaml::Value>> =
Lazy::new(std::collections::HashMap::new);
use std::sync::LazyLock;
static EMPTY: LazyLock<std::collections::HashMap<String, serde_yaml::Value>> =
LazyLock::new(std::collections::HashMap::new);
&EMPTY
}
/// Returns a mutable reference to the outputs mapping.
///
/// # Panics
/// # Returns
///
/// Panics with "outputs_mut() not implemented for this plugin".
fn outputs_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
panic!("outputs_mut() not implemented for this plugin")
/// A mutable reference to the outputs `HashMap`.
///
/// # Errors
///
/// Returns an error if the plugin does not support mutable outputs.
fn outputs_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
anyhow::bail!("outputs_mut() not supported by this plugin")
}
/// Returns a reference to the options mapping.
@@ -441,19 +470,25 @@ where
///
/// An empty `HashMap` (default implementation).
fn options(&self) -> &std::collections::HashMap<String, serde_yaml::Value> {
use once_cell::sync::Lazy;
static EMPTY: Lazy<std::collections::HashMap<String, serde_yaml::Value>> =
Lazy::new(std::collections::HashMap::new);
use std::sync::LazyLock;
static EMPTY: LazyLock<std::collections::HashMap<String, serde_yaml::Value>> =
LazyLock::new(std::collections::HashMap::new);
&EMPTY
}
/// Returns a mutable reference to the options mapping.
///
/// # Panics
/// # Returns
///
/// Panics with "options_mut() not implemented for this plugin".
fn options_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
panic!("options_mut() not implemented for this plugin")
/// A mutable reference to the options `HashMap`.
///
/// # Errors
///
/// Returns an error if the plugin does not support mutable options.
fn options_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
anyhow::bail!("options_mut() not supported by this plugin")
}
/// Gets the default output names this plugin can produce.
@@ -466,6 +501,82 @@ where
vec![self.meta_type().to_string()]
}
/// Returns a description of this plugin for display in config templates.
///
/// # Returns
///
/// A description string (empty by default).
fn description(&self) -> &str {
""
}
/// Returns true if this plugin can execute concurrently with other
/// parallel-safe plugins.
///
/// Plugins that do significant per-chunk work (hashing, tokenization,
/// piping to child processes) should return true. The MetaService will
/// run all parallel-safe plugins in separate threads per phase, then
/// process results sequentially.
fn parallel_safe(&self) -> bool {
false
}
/// Builds the schema for this plugin from its options and outputs.
///
/// Default implementation infers option types from YAML values and
/// collects enabled outputs.
///
/// # Returns
///
/// A `PluginSchema` describing this plugin's configuration.
fn schema(&self) -> crate::common::schema::PluginSchema {
use crate::common::schema::{OptionSchema, OptionType, OutputSchema, PluginSchema};
let options: Vec<OptionSchema> = self
.options()
.iter()
.map(|(key, value)| {
let option_type = OptionType::from_yaml_value(value);
let (default, required) = if value.is_null() {
(None, true)
} else {
(Some(value.clone()), false)
};
OptionSchema {
name: key.clone(),
option_type,
default,
required,
}
})
.collect();
let mut outputs: Vec<OutputSchema> = Vec::new();
for (key, value) in self.outputs() {
if !value.is_null() {
outputs.push(OutputSchema {
name: key.clone(),
description: key.clone(),
});
}
}
if outputs.is_empty() {
for output_name in self.default_outputs() {
outputs.push(OutputSchema {
name: output_name.clone(),
description: output_name,
});
}
}
PluginSchema {
name: self.meta_type().to_string(),
description: self.description().to_string(),
options,
outputs,
}
}
/// Method to downcast to concrete type (for checking finalization state).
///
/// # Returns
@@ -477,11 +588,22 @@ where
{
self
}
/// Sets the save_meta callback for this plugin.
///
/// Called by MetaService to wire the plugin to the metadata storage.
fn set_save_meta(&mut self, _save_meta: SaveMetaFn) {}
/// Saves a metadata entry via the save_meta callback.
///
/// Plugins call this during initialize/update/finalize to persist metadata.
fn save_meta(&self, _name: &str, _value: &str) {}
}
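The `outputs()`/`options()` defaults above return a reference to a shared empty map via `std::sync::LazyLock`. A minimal self-contained sketch of that pattern, with illustrative trait and type names rather than the crate's own:

```rust
use std::collections::HashMap;
use std::sync::LazyLock;

trait Plugin {
    // Default impl hands out a reference to one shared empty map, so
    // plugins without outputs need not store their own HashMap.
    fn outputs(&self) -> &HashMap<String, String> {
        static EMPTY: LazyLock<HashMap<String, String>> = LazyLock::new(HashMap::new);
        &EMPTY
    }
}

struct Noop;
impl Plugin for Noop {}

fn main() {
    // The default returns the shared empty map.
    assert!(Noop.outputs().is_empty());
}
```

The `'static` borrow works because the `LazyLock` lives in a `static`, so the returned reference outlives any caller.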
/// Global registry for meta plugins.
static META_PLUGIN_REGISTRY: Lazy<Mutex<HashMap<MetaPluginType, PluginConstructor>>> =
Lazy::new(|| Mutex::new(HashMap::new()));
static META_PLUGIN_REGISTRY: std::sync::LazyLock<
Mutex<HashMap<MetaPluginType, PluginConstructor>>,
> = std::sync::LazyLock::new(|| Mutex::new(HashMap::new()));
/// Register a meta plugin with the global registry.
///
@@ -489,23 +611,45 @@ static META_PLUGIN_REGISTRY: Lazy<Mutex<HashMap<MetaPluginType, PluginConstructo
///
/// * `meta_plugin_type` - The type of the meta plugin to register.
/// * `constructor` - The constructor function for creating plugin instances.
pub fn register_meta_plugin(meta_plugin_type: MetaPluginType, constructor: PluginConstructor) {
pub fn register_meta_plugin(
meta_plugin_type: MetaPluginType,
constructor: PluginConstructor,
) -> anyhow::Result<()> {
META_PLUGIN_REGISTRY
.lock()
.unwrap()
.map_err(|e| anyhow::anyhow!("plugin registry poisoned: {e}"))?
.insert(meta_plugin_type, constructor);
Ok(())
}
pub fn get_meta_plugin(
meta_plugin_type: MetaPluginType,
options: Option<std::collections::HashMap<String, serde_yaml::Value>>,
outputs: Option<std::collections::HashMap<String, serde_yaml::Value>>,
) -> Box<dyn MetaPlugin> {
let registry = META_PLUGIN_REGISTRY.lock().unwrap();
) -> anyhow::Result<Box<dyn MetaPlugin>> {
get_meta_plugin_with_save(meta_plugin_type, options, outputs, None)
}
/// Creates a meta plugin instance with an optional save_meta callback.
///
/// If `save_meta` is provided, it is wired to the plugin so it can
/// store metadata directly during initialize/update/finalize.
pub fn get_meta_plugin_with_save(
meta_plugin_type: MetaPluginType,
options: Option<std::collections::HashMap<String, serde_yaml::Value>>,
outputs: Option<std::collections::HashMap<String, serde_yaml::Value>>,
save_meta: Option<SaveMetaFn>,
) -> anyhow::Result<Box<dyn MetaPlugin>> {
let registry = META_PLUGIN_REGISTRY
.lock()
.map_err(|e| anyhow::anyhow!("plugin registry poisoned: {e}"))?;
if let Some(constructor) = registry.get(&meta_plugin_type) {
return constructor(options, outputs);
let mut plugin = constructor(options, outputs);
if let Some(sm) = save_meta {
plugin.set_save_meta(sm);
}
return Ok(plugin);
}
    // Unknown plugin type: report an error instead of panicking
panic!("Meta plugin {:?} not registered", meta_plugin_type);
anyhow::bail!("Meta plugin {meta_plugin_type:?} not registered")
}
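The registry above pairs a `LazyLock<Mutex<HashMap>>` with explicit poison-error mapping instead of `unwrap()`. A standalone sketch of the same pattern, using a simplified `Constructor` type as a stand-in for the crate's `PluginConstructor`:

```rust
use std::collections::HashMap;
use std::sync::{LazyLock, Mutex};

// Illustrative stand-in for the crate's PluginConstructor.
type Constructor = fn() -> String;

static REGISTRY: LazyLock<Mutex<HashMap<&'static str, Constructor>>> =
    LazyLock::new(|| Mutex::new(HashMap::new()));

fn register(name: &'static str, ctor: Constructor) -> Result<(), String> {
    REGISTRY
        .lock()
        // A poisoned mutex becomes a recoverable error, not a panic.
        .map_err(|e| format!("plugin registry poisoned: {e}"))?
        .insert(name, ctor);
    Ok(())
}

fn get(name: &str) -> Result<String, String> {
    let registry = REGISTRY
        .lock()
        .map_err(|e| format!("plugin registry poisoned: {e}"))?;
    match registry.get(name) {
        Some(ctor) => Ok(ctor()),
        None => Err(format!("Meta plugin {name:?} not registered")),
    }
}

fn main() {
    register("tokens", || "TokensMetaPlugin".to_string()).unwrap();
    assert_eq!(get("tokens").unwrap(), "TokensMetaPlugin");
    assert!(get("missing").is_err());
}
```

Callers such as the `#[ctor]` registration functions then decide whether a failure is fatal (here, `.expect(...)` at startup).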

View File

@@ -40,6 +40,7 @@ impl ReadRateMetaPlugin {
/// # Examples
///
/// ```
/// # use keep::meta_plugin::{ReadRateMetaPlugin, MetaPlugin};
/// let plugin = ReadRateMetaPlugin::new(None, None);
/// assert!(!plugin.is_finalized());
/// ```
@@ -83,6 +84,14 @@ impl MetaPlugin for ReadRateMetaPlugin {
self.is_finalized = finalized;
}
fn set_save_meta(&mut self, save_meta: crate::meta_plugin::SaveMetaFn) {
self.base.set_save_meta(save_meta);
}
fn save_meta(&self, name: &str, value: &str) {
self.base.save_meta(name, value);
}
/// Finalizes the plugin, calculating the read rate.
///
/// Computes KB/s from bytes read and elapsed time. Outputs via mappings.
@@ -192,8 +201,10 @@ impl MetaPlugin for ReadRateMetaPlugin {
/// # Returns
///
/// Mutable reference to the outputs HashMap.
fn outputs_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.outputs_mut()
fn outputs_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.outputs_mut())
}
/// Returns the default output names for this plugin.
@@ -221,8 +232,10 @@ impl MetaPlugin for ReadRateMetaPlugin {
/// # Returns
///
/// Mutable reference to the options HashMap.
fn options_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.options_mut()
fn options_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.options_mut())
}
}
use crate::meta_plugin::register_meta_plugin;
@@ -232,5 +245,6 @@ use crate::meta_plugin::register_meta_plugin;
fn register_read_rate_plugin() {
register_meta_plugin(MetaPluginType::ReadRate, |options, outputs| {
Box::new(ReadRateMetaPlugin::new(options, outputs))
});
})
.expect("Failed to register ReadRateMetaPlugin");
}

View File

@@ -37,6 +37,14 @@ impl MetaPlugin for ReadTimeMetaPlugin {
self.is_finalized = finalized;
}
fn set_save_meta(&mut self, save_meta: crate::meta_plugin::SaveMetaFn) {
self.base.set_save_meta(save_meta);
}
fn save_meta(&self, name: &str, value: &str) {
self.base.save_meta(name, value);
}
fn finalize(&mut self) -> crate::meta_plugin::MetaPluginResponse {
// If already finalized, don't process again
if self.is_finalized {
@@ -97,8 +105,10 @@ impl MetaPlugin for ReadTimeMetaPlugin {
self.base.outputs()
}
fn outputs_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.outputs_mut()
fn outputs_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.outputs_mut())
}
fn default_outputs(&self) -> Vec<String> {
@@ -109,8 +119,10 @@ impl MetaPlugin for ReadTimeMetaPlugin {
self.base.options()
}
fn options_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.options_mut()
fn options_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.options_mut())
}
}
use crate::meta_plugin::register_meta_plugin;
@@ -120,5 +132,6 @@ use crate::meta_plugin::register_meta_plugin;
fn register_read_time_plugin() {
register_meta_plugin(MetaPluginType::ReadTime, |options, outputs| {
Box::new(ReadTimeMetaPlugin::new(options, outputs))
});
})
.expect("Failed to register ReadTimeMetaPlugin");
}

View File

@@ -31,6 +31,7 @@ impl ShellMetaPlugin {
/// # Examples
///
/// ```
/// # use keep::meta_plugin::ShellMetaPlugin;
/// let plugin = ShellMetaPlugin::new(None, None);
/// ```
pub fn new(
@@ -69,6 +70,14 @@ impl MetaPlugin for ShellMetaPlugin {
self.is_finalized = finalized;
}
fn set_save_meta(&mut self, save_meta: crate::meta_plugin::SaveMetaFn) {
self.base.set_save_meta(save_meta);
}
fn save_meta(&self, name: &str, value: &str) {
self.base.save_meta(name, value);
}
/// Finalizes the plugin without processing data.
///
/// For this plugin, finalization is handled in `initialize`, so this returns empty metadata.
@@ -141,6 +150,7 @@ impl MetaPlugin for ShellMetaPlugin {
/// # Examples
///
/// ```
/// # use keep::meta_plugin::{ShellMetaPlugin, MetaPlugin};
/// let mut plugin = ShellMetaPlugin::new(None, None);
/// let response = plugin.initialize();
/// assert!(response.is_finalized);
@@ -192,8 +202,10 @@ impl MetaPlugin for ShellMetaPlugin {
/// # Returns
///
/// * `&mut HashMap<String, serde_yaml::Value>` - Mutable outputs map.
fn outputs_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.outputs_mut()
fn outputs_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.outputs_mut())
}
/// Returns the default output names for this plugin.
@@ -219,8 +231,10 @@ impl MetaPlugin for ShellMetaPlugin {
/// # Returns
///
/// * `&mut HashMap<String, serde_yaml::Value>` - Mutable options map.
fn options_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.options_mut()
fn options_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.options_mut())
}
}
/// Registers the shell meta plugin with the global registry.
@@ -234,5 +248,6 @@ use crate::meta_plugin::register_meta_plugin;
fn register_shell_plugin() {
register_meta_plugin(MetaPluginType::Shell, |options, outputs| {
Box::new(ShellMetaPlugin::new(options, outputs))
});
})
.expect("Failed to register ShellMetaPlugin");
}

View File

@@ -35,6 +35,14 @@ impl MetaPlugin for ShellPidMetaPlugin {
self.is_finalized = finalized;
}
fn set_save_meta(&mut self, save_meta: crate::meta_plugin::SaveMetaFn) {
self.base.set_save_meta(save_meta);
}
fn save_meta(&self, name: &str, value: &str) {
self.base.save_meta(name, value);
}
fn finalize(&mut self) -> crate::meta_plugin::MetaPluginResponse {
// If already finalized, don't process again
if self.is_finalized {
@@ -109,16 +117,20 @@ impl MetaPlugin for ShellPidMetaPlugin {
self.base.outputs()
}
fn outputs_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.outputs_mut()
fn outputs_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.outputs_mut())
}
fn options(&self) -> &std::collections::HashMap<String, serde_yaml::Value> {
self.base.options()
}
fn options_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.options_mut()
fn options_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.options_mut())
}
}
use crate::meta_plugin::register_meta_plugin;
@@ -128,5 +140,6 @@ use crate::meta_plugin::register_meta_plugin;
fn register_shell_pid_plugin() {
register_meta_plugin(MetaPluginType::ShellPid, |options, outputs| {
Box::new(ShellPidMetaPlugin::new(options, outputs))
});
})
.expect("Failed to register ShellPidMetaPlugin");
}

View File

@@ -510,6 +510,14 @@ impl MetaPlugin for TextMetaPlugin {
self.is_finalized = finalized;
}
fn set_save_meta(&mut self, save_meta: crate::meta_plugin::SaveMetaFn) {
self.base.set_save_meta(save_meta);
}
fn save_meta(&self, name: &str, value: &str) {
self.base.save_meta(name, value);
}
/// Updates the plugin with new data chunk.
///
/// Accumulates data for binary detection (if pending) or text statistics.
@@ -532,15 +540,14 @@ impl MetaPlugin for TextMetaPlugin {
}
let mut metadata = Vec::new();
let processed_data = data.to_vec();
// If we haven't determined if content is binary yet, build buffer and check
if self.is_binary_content.is_none() {
let should_finalize = if let Some(ref mut buffer) = self.buffer {
// Add processed data to our buffer up to max_buffer_size
// Add data to our buffer up to max_buffer_size
let remaining_capacity = self.max_buffer_size.saturating_sub(buffer.len());
let bytes_to_take = std::cmp::min(processed_data.len(), remaining_capacity);
buffer.extend_from_slice(&processed_data[..bytes_to_take]);
let bytes_to_take = std::cmp::min(data.len(), remaining_capacity);
buffer.extend_from_slice(&data[..bytes_to_take]);
// If we have enough data to make a binary determination, do it now
let buffer_len = buffer.len();
@@ -562,7 +569,7 @@ impl MetaPlugin for TextMetaPlugin {
}
// If it's text, count words and lines for this chunk
self.count_text_stats(&processed_data[..bytes_to_take]);
self.count_text_stats(&data[..bytes_to_take]);
// If we've reached our buffer limit, drop the buffer to save memory
// But don't finalize yet - we need to keep counting words and lines
@@ -572,7 +579,7 @@ impl MetaPlugin for TextMetaPlugin {
false // Never finalize here for text content
} else {
// Still building up buffer, count words and lines for this chunk
self.count_text_stats(&processed_data[..bytes_to_take]);
self.count_text_stats(&data[..bytes_to_take]);
false
}
} else {
@@ -587,7 +594,7 @@ impl MetaPlugin for TextMetaPlugin {
}
} else if self.is_binary_content == Some(false) {
// We've already determined it's text, just count words and lines
self.count_text_stats(&processed_data);
self.count_text_stats(data);
}
// If is_binary_content == Some(true), we should have already finalized, but just in case:
else if self.is_binary_content == Some(true) {
@@ -653,27 +660,44 @@ impl MetaPlugin for TextMetaPlugin {
if self.is_binary_content.is_none()
&& let Some(buffer) = &self.buffer
&& !buffer.is_empty()
{
let buffer = if head_bytes.is_some()
|| head_lines.is_some()
|| tail_bytes.is_some()
|| tail_lines.is_some()
{
// Build filter string from individual parameters
let mut filter_parts = Vec::new();
if let Some(bytes) = head_bytes {
filter_parts.push(format!("head_bytes({})", bytes));
filter_parts.push(format!("head_bytes({bytes})"));
}
if let Some(lines) = head_lines {
filter_parts.push(format!("head_lines({})", lines));
filter_parts.push(format!("head_lines({lines})"));
}
if let Some(bytes) = tail_bytes {
filter_parts.push(format!("tail_bytes({})", bytes));
filter_parts.push(format!("tail_bytes({bytes})"));
}
if let Some(lines) = tail_lines {
filter_parts.push(format!("tail_lines({})", lines));
filter_parts.push(format!("tail_lines({lines})"));
}
// For now, just use the buffer as-is since filtering isn't implemented
let processed_buffer = buffer.clone();
// Apply filters if any are specified
let filter_string = filter_parts.join(",");
match crate::services::FilterService::new()
.process_with_filter(buffer, Some(&filter_string))
{
Ok(filtered) => filtered,
Err(e) => {
log::warn!("Failed to apply filters: {e}");
buffer.clone()
}
}
} else {
buffer.clone()
};
// Clone the processed buffer data for binary detection
let (binary_metadata, is_binary) = self.perform_binary_detection(&processed_buffer);
let (binary_metadata, is_binary) = self.perform_binary_detection(&buffer);
metadata.extend(binary_metadata);
self.is_binary_content = Some(is_binary);
@@ -753,8 +777,10 @@ impl MetaPlugin for TextMetaPlugin {
/// # Returns
///
/// A mutable reference to the `HashMap` of outputs.
fn outputs_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.outputs_mut()
fn outputs_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.outputs_mut())
}
/// Returns the default output names for this plugin.
@@ -777,7 +803,7 @@ impl MetaPlugin for TextMetaPlugin {
///
/// # Returns
///
/// A reference to the `HashMap` of options.
/// A reference to the `HashMap` of options.
fn options(&self) -> &std::collections::HashMap<String, serde_yaml::Value> {
self.base.options()
}
@@ -786,9 +812,11 @@ impl MetaPlugin for TextMetaPlugin {
///
/// # Returns
///
/// A mutable reference to the `HashMap` of options.
fn options_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.options_mut()
/// A mutable reference to the `HashMap` of options.
fn options_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.options_mut())
}
}
use crate::meta_plugin::register_meta_plugin;
@@ -798,5 +826,6 @@ use crate::meta_plugin::register_meta_plugin;
fn register_text_plugin() {
register_meta_plugin(MetaPluginType::Text, |options, outputs| {
Box::new(TextMetaPlugin::new(options, outputs))
});
})
.expect("Failed to register TextMetaPlugin");
}
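The finalize path above assembles a filter string such as `head_bytes(64),tail_lines(10)` from the optional head/tail parameters before handing it to `FilterService`. The assembly logic in isolation (the function name is illustrative):

```rust
// Mirrors the head_bytes/head_lines/tail_bytes/tail_lines joining logic:
// each Some(_) parameter contributes one comma-separated filter clause.
fn build_filter_string(
    head_bytes: Option<u64>,
    head_lines: Option<u64>,
    tail_bytes: Option<u64>,
    tail_lines: Option<u64>,
) -> String {
    let mut parts = Vec::new();
    if let Some(b) = head_bytes {
        parts.push(format!("head_bytes({b})"));
    }
    if let Some(l) = head_lines {
        parts.push(format!("head_lines({l})"));
    }
    if let Some(b) = tail_bytes {
        parts.push(format!("tail_bytes({b})"));
    }
    if let Some(l) = tail_lines {
        parts.push(format!("tail_lines({l})"));
    }
    parts.join(",")
}

fn main() {
    assert_eq!(
        build_filter_string(Some(64), None, None, Some(10)),
        "head_bytes(64),tail_lines(10)"
    );
    // No parameters set: empty string, so the caller skips filtering.
    assert_eq!(build_filter_string(None, None, None, None), "");
}
```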

src/meta_plugin/tokens.rs Normal file
View File

@@ -0,0 +1,325 @@
use crate::common::PIPESIZE;
use crate::common::is_binary::is_binary;
use crate::meta_plugin::{MetaPlugin, MetaPluginResponse, MetaPluginType};
use crate::tokenizer::{TokenEncoding, get_tokenizer};
#[derive(Debug, Clone)]
pub struct TokensMetaPlugin {
/// Buffer for binary detection (up to PIPESIZE bytes).
buffer: Option<Vec<u8>>,
max_buffer_size: usize,
is_finalized: bool,
is_binary_content: Option<bool>,
/// Running token count accumulated across chunks.
token_count: usize,
/// UTF-8 boundary carry buffer.
utf8_buffer: Vec<u8>,
base: crate::meta_plugin::BaseMetaPlugin,
/// The tokenizer encoding.
encoding: TokenEncoding,
}
impl TokensMetaPlugin {
pub fn new(
options: Option<std::collections::HashMap<String, serde_yaml::Value>>,
outputs: Option<std::collections::HashMap<String, serde_yaml::Value>>,
) -> Self {
let mut base = crate::meta_plugin::BaseMetaPlugin::new();
base.initialize_plugin(&["token_count"], &options, &outputs);
// Set default options
let default_options = vec![
(
"token_detect_size",
serde_yaml::Value::Number(PIPESIZE.into()),
),
(
"encoding",
serde_yaml::Value::String("cl100k_base".to_string()),
),
];
for (key, value) in default_options {
if !base.options.contains_key(key) {
base.options.insert(key.to_string(), value);
}
}
let max_buffer_size = base
.options
.get("token_detect_size")
.and_then(|v| v.as_u64())
.unwrap_or(PIPESIZE as u64) as usize;
let encoding = base
.options
.get("encoding")
.and_then(|v| v.as_str())
.and_then(|s| s.parse::<TokenEncoding>().ok())
.unwrap_or_default();
Self {
buffer: Some(Vec::new()),
max_buffer_size,
is_finalized: false,
is_binary_content: None,
token_count: 0,
utf8_buffer: Vec::new(),
base,
encoding,
}
}
/// Tokenize a byte chunk, handling UTF-8 boundaries.
///
/// Combines with any pending UTF-8 carry bytes, converts to text,
/// and adds the token count to the running total.
///
/// Avoids unnecessary allocations when there is no pending UTF-8 carry
/// and the data is valid UTF-8.
fn count_tokens(&mut self, data: &[u8]) {
if data.is_empty() && self.utf8_buffer.is_empty() {
return;
}
let tokenizer = get_tokenizer(self.encoding);
if self.utf8_buffer.is_empty() {
// Fast path: no pending carry — try to use data directly
match std::str::from_utf8(data) {
Ok(text) => {
if !text.is_empty() {
self.token_count += tokenizer.count(text);
}
return;
}
Err(e) => {
let valid_up_to = e.valid_up_to();
if valid_up_to > 0 {
// Count the valid prefix without copying
let text =
std::str::from_utf8(&data[..valid_up_to]).expect("validated prefix");
self.token_count += tokenizer.count(text);
}
// Save invalid trailing bytes for next call
self.utf8_buffer.extend_from_slice(&data[valid_up_to..]);
return;
}
}
}
// Slow path: pending carry bytes — must build combined buffer
let mut combined = std::mem::take(&mut self.utf8_buffer);
combined.extend_from_slice(data);
match std::str::from_utf8(&combined) {
Ok(text) => {
if !text.is_empty() {
self.token_count += tokenizer.count(text);
}
}
Err(e) => {
let valid_up_to = e.valid_up_to();
if valid_up_to > 0 {
let text =
std::str::from_utf8(&combined[..valid_up_to]).expect("validated prefix");
self.token_count += tokenizer.count(text);
}
self.utf8_buffer.extend_from_slice(&combined[valid_up_to..]);
}
}
}
/// Perform binary detection on the buffer.
fn detect_binary(&mut self, buffer: &[u8]) -> bool {
let result = is_binary(buffer);
self.is_binary_content = Some(result);
result
}
}
impl MetaPlugin for TokensMetaPlugin {
fn is_finalized(&self) -> bool {
self.is_finalized
}
fn set_finalized(&mut self, finalized: bool) {
self.is_finalized = finalized;
}
fn set_save_meta(&mut self, save_meta: crate::meta_plugin::SaveMetaFn) {
self.base.set_save_meta(save_meta);
}
fn save_meta(&self, name: &str, value: &str) {
self.base.save_meta(name, value);
}
fn update(&mut self, data: &[u8]) -> MetaPluginResponse {
if self.is_finalized {
return MetaPluginResponse {
metadata: Vec::new(),
is_finalized: true,
};
}
let mut metadata = Vec::new();
if self.is_binary_content.is_none() {
// Add data to the buffer
let should_detect = if let Some(ref mut buffer) = self.buffer {
let remaining = self.max_buffer_size.saturating_sub(buffer.len());
let to_take = std::cmp::min(data.len(), remaining);
buffer.extend_from_slice(&data[..to_take]);
buffer.len() >= std::cmp::min(1024, self.max_buffer_size)
} else {
false
};
if should_detect {
let buffer_data = self.buffer.as_ref().unwrap().clone();
let is_binary = self.detect_binary(&buffer_data);
if is_binary {
if let Some(md) = crate::meta_plugin::process_metadata_outputs(
"token_count",
serde_yaml::Value::Null,
self.base.outputs(),
) {
metadata.push(md);
}
self.buffer = None;
self.is_finalized = true;
return MetaPluginResponse {
metadata,
is_finalized: true,
};
}
// It's text — tokenize the full buffer (nothing was counted yet),
// then clear to avoid double-counting in finalize().
self.count_tokens(&buffer_data);
self.buffer = Some(Vec::new());
}
} else if self.is_binary_content == Some(false) {
self.count_tokens(data);
} else if self.is_binary_content == Some(true) {
self.is_finalized = true;
return MetaPluginResponse {
metadata: Vec::new(),
is_finalized: true,
};
}
MetaPluginResponse {
metadata,
is_finalized: self.is_finalized,
}
}
fn finalize(&mut self) -> MetaPluginResponse {
if self.is_finalized {
return MetaPluginResponse {
metadata: Vec::new(),
is_finalized: true,
};
}
let mut metadata = Vec::new();
// If binary detection hasn't completed, do it now
if self.is_binary_content.is_none()
&& let Some(buffer) = &self.buffer
&& !buffer.is_empty()
{
let buffer_data = buffer.clone();
let is_binary = self.detect_binary(&buffer_data);
if is_binary {
if let Some(md) = crate::meta_plugin::process_metadata_outputs(
"token_count",
serde_yaml::Value::Null,
self.base.outputs(),
) {
metadata.push(md);
}
self.buffer = None;
self.is_finalized = true;
return MetaPluginResponse {
metadata,
is_finalized: true,
};
}
}
// Tokenize any bytes in the buffer
if let Some(buffer) = &self.buffer {
let data = buffer.clone();
self.count_tokens(&data);
}
// Process any remaining UTF-8 bytes
if !self.utf8_buffer.is_empty() {
self.count_tokens(&[]);
}
// Emit token count
if let Some(md) = crate::meta_plugin::process_metadata_outputs(
"token_count",
serde_yaml::Value::String(self.token_count.to_string()),
self.base.outputs(),
) {
metadata.push(md);
}
self.buffer = None;
self.is_finalized = true;
MetaPluginResponse {
metadata,
is_finalized: true,
}
}
fn meta_type(&self) -> MetaPluginType {
MetaPluginType::Tokens
}
fn outputs(&self) -> &std::collections::HashMap<String, serde_yaml::Value> {
self.base.outputs()
}
fn outputs_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.outputs_mut())
}
fn default_outputs(&self) -> Vec<String> {
vec!["token_count".to_string()]
}
fn options(&self) -> &std::collections::HashMap<String, serde_yaml::Value> {
self.base.options()
}
fn options_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.options_mut())
}
fn parallel_safe(&self) -> bool {
true
}
}
use crate::meta_plugin::register_meta_plugin;
#[ctor::ctor]
fn register_tokens_plugin() {
register_meta_plugin(MetaPluginType::Tokens, |options, outputs| {
Box::new(TokensMetaPlugin::new(options, outputs))
})
.expect("Failed to register TokensMetaPlugin");
}
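The UTF-8 carry handling in `count_tokens` can be sketched in isolation: bytes that end mid-codepoint are held back in a carry buffer and prepended to the next chunk, so tokenization never sees a split character (the helper name here is illustrative):

```rust
// Returns the longest valid-UTF-8 prefix of carry + chunk as a String,
// stashing any trailing incomplete bytes back into `carry`.
fn split_valid_utf8(carry: &mut Vec<u8>, chunk: &[u8]) -> String {
    let mut combined = std::mem::take(carry);
    combined.extend_from_slice(chunk);
    match std::str::from_utf8(&combined) {
        Ok(s) => s.to_string(),
        Err(e) => {
            let valid = e.valid_up_to();
            let text = std::str::from_utf8(&combined[..valid])
                .expect("validated prefix")
                .to_string();
            // Incomplete trailing bytes wait for the next chunk.
            carry.extend_from_slice(&combined[valid..]);
            text
        }
    }
}

fn main() {
    let bytes = "héllo".as_bytes(); // 'é' encodes as two bytes: 0xC3 0xA9
    let mut carry = Vec::new();
    // Chunk boundary falls inside 'é': only "h" is released.
    assert_eq!(split_valid_utf8(&mut carry, &bytes[..2]), "h");
    assert_eq!(carry, vec![0xC3]);
    // Next chunk completes the codepoint.
    assert_eq!(split_valid_utf8(&mut carry, &bytes[2..]), "éllo");
    assert!(carry.is_empty());
}
```

The plugin's fast path skips the combine step entirely when the carry is empty and the chunk is already valid UTF-8, avoiding the allocation shown here.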

View File

@@ -0,0 +1,173 @@
use crate::common::PIPESIZE;
use crate::meta_plugin::{
BaseMetaPlugin, MetaPlugin, MetaPluginResponse, MetaPluginType, process_metadata_outputs,
register_meta_plugin,
};
#[derive(Debug, Default)]
pub struct TreeMagicMiniMetaPlugin {
buffer: Vec<u8>,
max_buffer_size: usize,
is_finalized: bool,
base: BaseMetaPlugin,
}
impl TreeMagicMiniMetaPlugin {
pub fn new(
options: Option<std::collections::HashMap<String, serde_yaml::Value>>,
outputs: Option<std::collections::HashMap<String, serde_yaml::Value>>,
) -> TreeMagicMiniMetaPlugin {
let mut base = BaseMetaPlugin::new();
if let Some(opts) = options {
for (key, value) in opts {
base.options.insert(key, value);
}
}
let max_buffer_size = base
.options
.get("max_buffer_size")
.and_then(|v| v.as_u64())
.unwrap_or(PIPESIZE as u64) as usize;
base.outputs.insert(
"tree_magic_mime_type".to_string(),
serde_yaml::Value::String("tree_magic_mime_type".to_string()),
);
if let Some(outs) = outputs {
for (key, value) in outs {
base.outputs.insert(key, value);
}
}
TreeMagicMiniMetaPlugin {
buffer: Vec::new(),
max_buffer_size,
is_finalized: false,
base,
}
}
}
impl MetaPlugin for TreeMagicMiniMetaPlugin {
fn meta_type(&self) -> MetaPluginType {
MetaPluginType::TreeMagicMini
}
fn is_finalized(&self) -> bool {
self.is_finalized
}
fn set_finalized(&mut self, finalized: bool) {
self.is_finalized = finalized;
}
fn set_save_meta(&mut self, save_meta: crate::meta_plugin::SaveMetaFn) {
self.base.set_save_meta(save_meta);
}
fn save_meta(&self, name: &str, value: &str) {
self.base.save_meta(name, value);
}
fn update(&mut self, data: &[u8]) -> MetaPluginResponse {
if self.is_finalized {
return MetaPluginResponse {
metadata: Vec::new(),
is_finalized: true,
};
}
let remaining = self.max_buffer_size.saturating_sub(self.buffer.len());
let to_add = &data[..data.len().min(remaining)];
self.buffer.extend_from_slice(to_add);
if self.buffer.len() >= self.max_buffer_size {
let mime_type = tree_magic_mini::from_u8(&self.buffer);
self.is_finalized = true;
let metadata = process_metadata_outputs(
"tree_magic_mime_type",
serde_yaml::Value::String(mime_type.to_string()),
self.base.outputs(),
)
.map(|m| vec![m])
.unwrap_or_default();
return MetaPluginResponse {
metadata,
is_finalized: true,
};
}
MetaPluginResponse {
metadata: Vec::new(),
is_finalized: false,
}
}
fn finalize(&mut self) -> MetaPluginResponse {
if self.is_finalized {
return MetaPluginResponse {
metadata: Vec::new(),
is_finalized: true,
};
}
let mime_type = tree_magic_mini::from_u8(&self.buffer);
self.is_finalized = true;
let metadata = process_metadata_outputs(
"tree_magic_mime_type",
serde_yaml::Value::String(mime_type.to_string()),
self.base.outputs(),
)
.map(|m| vec![m])
.unwrap_or_default();
MetaPluginResponse {
metadata,
is_finalized: true,
}
}
fn outputs(&self) -> &std::collections::HashMap<String, serde_yaml::Value> {
self.base.outputs()
}
fn outputs_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.outputs_mut())
}
fn default_outputs(&self) -> Vec<String> {
vec!["tree_magic_mime_type".to_string()]
}
fn options(&self) -> &std::collections::HashMap<String, serde_yaml::Value> {
self.base.options()
}
fn options_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.options_mut())
}
fn parallel_safe(&self) -> bool {
true
}
}
#[ctor::ctor]
fn register_tree_magic_mini_plugin() {
register_meta_plugin(MetaPluginType::TreeMagicMini, |options, outputs| {
Box::new(TreeMagicMiniMetaPlugin::new(options, outputs))
})
.expect("Failed to register TreeMagicMiniMetaPlugin");
}

View File

@@ -105,6 +105,14 @@ impl MetaPlugin for UserMetaPlugin {
MetaPluginType::User
}
fn set_save_meta(&mut self, save_meta: crate::meta_plugin::SaveMetaFn) {
self.base.set_save_meta(save_meta);
}
fn save_meta(&self, name: &str, value: &str) {
self.base.save_meta(name, value);
}
/// Returns a reference to the outputs mapping.
///
/// # Returns
@@ -119,8 +127,10 @@ impl MetaPlugin for UserMetaPlugin {
/// # Returns
///
/// A mutable reference to the `HashMap` of outputs.
fn outputs_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.outputs_mut()
fn outputs_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.outputs_mut())
}
/// Returns the default output names.
@@ -151,8 +161,10 @@ impl MetaPlugin for UserMetaPlugin {
/// # Returns
///
/// A mutable reference to the `HashMap` of options.
fn options_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.options_mut()
fn options_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.options_mut())
}
}
use crate::meta_plugin::register_meta_plugin;
@@ -162,5 +174,6 @@ use crate::meta_plugin::register_meta_plugin;
fn register_user_plugin() {
register_meta_plugin(MetaPluginType::User, |options, outputs| {
Box::new(UserMetaPlugin::new(options, outputs))
});
})
.expect("Failed to register UserMetaPlugin");
}

View File

@@ -0,0 +1,21 @@
use crate::client::KeepClient;
use clap::Command;
use log::debug;
pub fn mode(
client: &KeepClient,
_cmd: &mut Command,
settings: &crate::config::Settings,
ids: &[i64],
) -> Result<(), anyhow::Error> {
debug!("CLIENT_DELETE: Deleting items via remote server");
for &id in ids {
client.delete_item(id)?;
if !settings.quiet {
eprintln!("Deleted item {id}");
}
}
Ok(())
}

src/modes/client/diff.rs Normal file
View File

@@ -0,0 +1,24 @@
use crate::client::KeepClient;
use clap::Command;
use log::debug;
pub fn mode(
client: &KeepClient,
_cmd: &mut Command,
_settings: &crate::config::Settings,
ids: &[i64],
) -> Result<(), anyhow::Error> {
debug!("CLIENT_DIFF: Getting diff via remote server");
if ids.len() != 2 {
return Err(anyhow::anyhow!("Diff requires exactly 2 item IDs"));
}
let diff_lines = client.diff_items(ids[0], ids[1])?;
for line in &diff_lines {
println!("{line}");
}
Ok(())
}

View File

@@ -0,0 +1,77 @@
use anyhow::{Context, Result, anyhow};
use chrono::Utc;
use clap::Command;
use log::debug;
use std::collections::HashMap;
use std::fs;
use crate::client::KeepClient;
use crate::common::sanitize_ts_string;
use crate::config;
/// Export items to a `.keep.tar` archive via client.
///
/// Sends a request to the server's `/api/export` endpoint and
/// streams the response to a local tar file.
pub fn mode(
client: &KeepClient,
cmd: &mut Command,
settings: &config::Settings,
ids: &[i64],
tags: &[String],
) -> Result<()> {
// Validate: IDs XOR tags
if !ids.is_empty() && !tags.is_empty() {
cmd.error(
clap::error::ErrorKind::InvalidValue,
"Cannot use both IDs and tags with --export",
)
.exit();
}
if ids.is_empty() && tags.is_empty() {
cmd.error(
clap::error::ErrorKind::InvalidValue,
"Must provide either IDs or tags with --export",
)
.exit();
}
    // Build the tar filename locally: {name} comes from --export-name
    // (defaulting to "export") and {ts} from the current UTC timestamp,
    // rendered through the configured export_filename_format template.
let dir_name = if let Some(ref name) = settings.export_name {
name.clone()
} else {
"export".to_string()
};
let now = Utc::now();
let ts_str = sanitize_ts_string(&now.format("%Y-%m-%dT%H:%M:%SZ").to_string());
let mut vars = HashMap::new();
vars.insert("name".to_string(), dir_name);
vars.insert("ts".to_string(), ts_str);
let basename = strfmt::strfmt(&settings.export_filename_format, &vars).map_err(|e| {
anyhow!(
"Invalid export filename format '{}': {}",
settings.export_filename_format,
e
)
})?;
let tar_filename = format!("{basename}.keep.tar");
client
.export_items_to_file(ids, tags, std::path::Path::new(&tar_filename))
.map_err(|e| anyhow!("Export failed: {e}"))?;
if !settings.quiet {
eprintln!("{tar_filename}");
}
debug!("CLIENT_EXPORT: Wrote items to {tar_filename}");
Ok(())
}
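The filename templating above goes through the strfmt crate. A dependency-free sketch of the same `{name}_{ts}` substitution (illustrative only: the real strfmt returns an error on unknown placeholders, while this stand-in leaves them untouched):

```rust
use std::collections::HashMap;

// Minimal stand-in for the strfmt templating used in export.rs: substitutes
// `{key}` placeholders from a map of variables.
fn render_template(format: &str, vars: &HashMap<String, String>) -> String {
    let mut out = format.to_string();
    for (key, value) in vars {
        // "{{{key}}}" renders as "{" + key + "}", e.g. "{name}"
        out = out.replace(&format!("{{{key}}}"), value);
    }
    out
}

fn main() {
    let mut vars = HashMap::new();
    vars.insert("name".to_string(), "export".to_string());
    vars.insert("ts".to_string(), "2026-03-21T12_00_00Z".to_string());
    let basename = render_template("{name}_{ts}", &vars);
    println!("{basename}.keep.tar");
}
```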

75
src/modes/client/get.rs Normal file
View File

@@ -0,0 +1,75 @@
use crate::client::KeepClient;
use crate::compression_engine::CompressionType;
use crate::filter_plugin::FilterChain;
use crate::modes::common::{check_binary_tty, resolve_item_id};
use crate::services::compression_service::CompressionService;
use anyhow::Result;
use clap::Command;
use log::debug;
use std::io::{Read, Write};
use std::str::FromStr;
pub fn mode(
client: &KeepClient,
cmd: &mut Command,
settings: &crate::config::Settings,
ids: &[i64],
tags: &[String],
filter_chain: Option<FilterChain>,
) -> Result<(), anyhow::Error> {
debug!("CLIENT_GET: Getting item via remote server");
if !ids.is_empty() && !tags.is_empty() {
cmd.error(
clap::error::ErrorKind::InvalidValue,
"Cannot use both IDs and tags with --get",
)
.exit();
}
let item_id = resolve_item_id(client, ids, tags)?;
// Get item info for metadata
let item_info = client.get_item_info(item_id)?;
let metadata = &item_info.metadata;
// Get streaming reader for raw content
let (reader, compression) = client.get_item_content_stream(item_id)?;
let compression_type = CompressionType::from_str(&compression).unwrap_or(CompressionType::Raw);
// Decompress through streaming readers
let mut decompressed_reader: Box<dyn Read> =
CompressionService::decompressing_reader(reader, &compression_type)?;
// Binary detection: sample first chunk
let mut sample_buf = [0u8; crate::common::PIPESIZE];
let sample_len = decompressed_reader.read(&mut sample_buf)?;
check_binary_tty(metadata, &sample_buf[..sample_len], settings.force)?;
// If filters present, buffer through filter chain; otherwise stream directly
if let Some(mut chain) = filter_chain {
// Apply filter to sample first, then remaining
let mut output = Vec::new();
chain.filter(&mut &sample_buf[..sample_len], &mut output)?;
crate::common::stream_copy(&mut decompressed_reader, |chunk| {
chain.filter(&mut std::io::Cursor::new(chunk), &mut output)?;
Ok(())
})?;
let stdout = std::io::stdout();
let mut stdout = stdout.lock();
stdout.write_all(&output)?;
stdout.flush()?;
} else {
// Stream decompressed content to stdout
let stdout = std::io::stdout();
let mut stdout = stdout.lock();
stdout.write_all(&sample_buf[..sample_len])?;
crate::common::stream_copy(&mut decompressed_reader, |chunk| {
stdout.write_all(chunk)?;
Ok(())
})?;
stdout.flush()?;
}
Ok(())
}
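The get path above reads one sample chunk for binary detection before streaming the rest. A std-only sketch of that sample-then-stream pattern (the 4-byte sample and NUL-byte heuristic here are illustrative; the real code samples PIPESIZE bytes and delegates to check_binary_tty):

```rust
use std::io::{Cursor, Read, Write};

// Illustrative binary heuristic: any NUL byte in the sample.
fn looks_binary(sample: &[u8]) -> bool {
    sample.contains(&0)
}

// Read a small sample first, classify it, then stream sample + remainder
// to the output without buffering the whole input.
fn stream_with_sample<R: Read, W: Write>(mut reader: R, mut out: W) -> std::io::Result<bool> {
    let mut sample = [0u8; 4];
    let n = reader.read(&mut sample)?;
    let binary = looks_binary(&sample[..n]);
    out.write_all(&sample[..n])?;
    std::io::copy(&mut reader, &mut out)?;
    Ok(binary)
}

fn main() -> std::io::Result<()> {
    let mut out = Vec::new();
    let binary = stream_with_sample(Cursor::new(b"hello world".to_vec()), &mut out)?;
    println!("binary: {binary}, copied {} bytes", out.len());
    Ok(())
}
```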

160
src/modes/client/import.rs Normal file
View File

@@ -0,0 +1,160 @@
use anyhow::{Context, Result, anyhow};
use clap::Command;
use log::debug;
use std::collections::HashMap;
use std::fs;
use std::io::Read;
use std::path::Path;
use crate::client::KeepClient;
use crate::compression_engine::CompressionType;
use crate::config;
use crate::modes::common::ImportMeta;
use std::str::FromStr;
/// Import items from a `.keep.tar` archive or legacy `.meta.yml` file via client.
///
/// For `.keep.tar` files, streams the archive to the server's `/api/import` endpoint.
/// For `.meta.yml` files, uses the legacy single-item import path.
pub fn mode(
client: &KeepClient,
cmd: &mut Command,
settings: &config::Settings,
import_path: &str,
) -> Result<()> {
if import_path.ends_with(".keep.tar") {
import_tar(client, cmd, settings, import_path)
} else if import_path.ends_with(".meta.yml") {
import_legacy(client, cmd, settings, import_path)
} else {
cmd.error(
clap::error::ErrorKind::InvalidValue,
format!("Unsupported import format: {import_path} (expected .keep.tar or .meta.yml)"),
)
.exit();
}
}
/// Import from a `.keep.tar` archive via the server API.
fn import_tar(
client: &KeepClient,
_cmd: &mut Command,
settings: &config::Settings,
tar_path: &str,
) -> Result<()> {
let path = Path::new(tar_path);
let imported_ids = client
.import_tar_file(path)
.map_err(|e| anyhow!("Import failed: {e}"))?;
if !settings.quiet {
println!(
"KEEP: Imported {} item(s): {:?}",
imported_ids.len(),
imported_ids
);
}
debug!(
"CLIENT_IMPORT: Imported {} items from {}",
imported_ids.len(),
tar_path
);
Ok(())
}
/// Legacy single-item import from a `.meta.yml` file.
fn import_legacy(
client: &KeepClient,
cmd: &mut Command,
settings: &config::Settings,
meta_file: &str,
) -> Result<()> {
// Read and parse metadata
let meta_yaml = fs::read_to_string(meta_file)
.with_context(|| format!("Cannot read metadata file: {meta_file}"))?;
let import_meta: ImportMeta = serde_yaml::from_str(&meta_yaml)
.with_context(|| format!("Cannot parse metadata file: {meta_file}"))?;
// Validate compression type
CompressionType::from_str(&import_meta.compression).map_err(|_| {
anyhow!(
"Invalid compression type '{}' in metadata file",
import_meta.compression
)
})?;
debug!(
"CLIENT_IMPORT: Parsed meta: ts={}, compression={}, tags={:?}",
import_meta.ts, import_meta.compression, import_meta.tags
);
// Build query parameters
let ts_str = import_meta.ts.to_rfc3339();
let params = [
("compress".to_string(), "false".to_string()),
("meta".to_string(), "false".to_string()),
("tags".to_string(), import_meta.tags.join(",")),
(
"compression_type".to_string(),
import_meta.compression.clone(),
),
("ts".to_string(), ts_str),
];
let param_refs: Vec<(&str, &str)> = params
.iter()
.map(|(k, v)| (k.as_str(), v.as_str()))
.collect();
// Stream data to server without buffering entire file
let item_info = if let Some(ref data_file) = settings.import_data_file {
let mut reader = fs::File::open(data_file)
.with_context(|| format!("Cannot read data file: {}", data_file.display()))?;
client.post_stream("/api/item/", &mut reader, &param_refs)?
} else {
// For stdin, we must buffer since stdin is not seekable
// and post_stream may need to retry.
let mut buf = Vec::new();
std::io::stdin()
.read_to_end(&mut buf)
.context("Cannot read data from stdin")?;
if buf.is_empty() {
cmd.error(
clap::error::ErrorKind::InvalidValue,
"No data provided (empty stdin)",
)
.exit();
}
let mut cursor = std::io::Cursor::new(&buf);
client.post_stream("/api/item/", &mut cursor, &param_refs)?
};
let item_id = item_info.id;
debug!("CLIENT_IMPORT: Created item {} via server", item_id);
// Set uncompressed size if known from metadata
if let Some(size) = import_meta.uncompressed_size {
client.set_item_size(item_id, size as u64)?;
debug!("CLIENT_IMPORT: Set size to {}", size);
}
// Post metadata
if !import_meta.metadata.is_empty() {
client.post_metadata(item_id, &import_meta.metadata)?;
debug!(
"CLIENT_IMPORT: Set {} metadata entries",
import_meta.metadata.len()
);
}
if !settings.quiet {
println!(
"KEEP: Imported item {} tags: {:?}",
item_id, import_meta.tags
);
}
Ok(())
}
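Both the import and save paths build owned `(String, String)` query parameters, then borrow them as `(&str, &str)` slices for the HTTP client. That two-step dance can be factored as a small helper (a sketch; the real code inlines the same iterator chain):

```rust
// Borrow owned query parameters as string slices for an HTTP client call.
// The owned Vec must outlive the returned refs, which the borrow enforces.
fn as_param_refs(params: &[(String, String)]) -> Vec<(&str, &str)> {
    params.iter().map(|(k, v)| (k.as_str(), v.as_str())).collect()
}

fn main() {
    let params = vec![
        ("compress".to_string(), "false".to_string()),
        ("tags".to_string(), "a,b".to_string()),
    ];
    let refs = as_param_refs(&params);
    println!("{refs:?}");
}
```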

52
src/modes/client/info.rs Normal file
View File

@@ -0,0 +1,52 @@
use crate::client::KeepClient;
use crate::modes::common::{
DisplayItemInfo, OutputFormat, format_size, render_item_info_table, resolve_item_ids,
settings_output_format,
};
use clap::Command;
use log::debug;
pub fn mode(
client: &KeepClient,
_cmd: &mut Command,
settings: &crate::config::Settings,
ids: &[i64],
tags: &[String],
) -> Result<(), anyhow::Error> {
debug!("CLIENT_INFO: Getting item info via remote server");
let output_format = settings_output_format(settings);
let item_ids = resolve_item_ids(client, ids, tags)?;
for &id in &item_ids {
let item = client.get_item_info(id)?;
match output_format {
OutputFormat::Json | OutputFormat::Yaml => {
crate::modes::common::print_serialized(&item, &output_format)?;
}
OutputFormat::Table => {
let display = DisplayItemInfo {
id: item.id,
timestamp: item.ts.clone(),
path: String::new(),
stream_size: item
.uncompressed_size
.map(|s| format_size(s as u64, settings.human_readable))
.unwrap_or_else(|| "N/A".to_string()),
compression: item.compression.clone(),
file_size: String::new(),
tags: item.tags.clone(),
metadata: item
.metadata
.iter()
.map(|(k, v)| (k.clone(), v.clone()))
.collect(),
};
render_item_info_table(&display, &settings.table_config);
}
}
}
Ok(())
}
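The info and list modes both format sizes via format_size. A hedged sketch consistent with its doc examples elsewhere in this change ("1024" raw, "1.0K" human-readable); the real implementation may differ in rounding and unit thresholds:

```rust
// Sketch of format_size: raw decimal when human_readable is false,
// otherwise scale by 1024 with one decimal place and a unit suffix.
fn format_size(size: u64, human_readable: bool) -> String {
    if !human_readable {
        return size.to_string();
    }
    const UNITS: [&str; 5] = ["", "K", "M", "G", "T"];
    let mut value = size as f64;
    let mut unit = 0;
    while value >= 1024.0 && unit < UNITS.len() - 1 {
        value /= 1024.0;
        unit += 1;
    }
    if unit == 0 {
        size.to_string() // below 1K: no suffix, no decimal
    } else {
        format!("{value:.1}{}", UNITS[unit])
    }
}

fn main() {
    println!("{} {}", format_size(1024, false), format_size(1536, true));
}
```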

71
src/modes/client/list.rs Normal file
View File

@@ -0,0 +1,71 @@
use crate::client::KeepClient;
use crate::modes::common::{
ColumnType, OutputFormat, format_size, render_list_table_with_format, settings_output_format,
};
use clap::Command;
use log::debug;
use std::str::FromStr;
pub fn mode(
client: &KeepClient,
_cmd: &mut Command,
settings: &crate::config::Settings,
ids: &[i64],
tags: &[String],
) -> Result<(), anyhow::Error> {
debug!("CLIENT_LIST: Listing items via remote server");
let items = client.list_items(ids, tags, "newest", 0, 100, &settings.meta_filter())?;
if settings.ids_only {
for item in &items {
println!("{}", item.id);
}
return Ok(());
}
let output_format = settings_output_format(settings);
match output_format {
OutputFormat::Json | OutputFormat::Yaml => {
crate::modes::common::print_serialized(&items, &output_format)?;
}
OutputFormat::Table => {
let rows: Vec<Vec<String>> = items
.iter()
.map(|item| {
let mut row = Vec::new();
for column in &settings.list_format {
let col_type = ColumnType::from_str(&column.name).ok();
let cell = match col_type {
Some(ColumnType::Id) => item.id.to_string(),
Some(ColumnType::Time) => item.ts.clone(),
Some(ColumnType::Size) => item
.uncompressed_size
.map(|s| format_size(s as u64, settings.human_readable))
.unwrap_or_default(),
Some(ColumnType::Compression) => item.compression.clone(),
Some(ColumnType::Tags) => item.tags.join(" "),
Some(ColumnType::Meta) => {
let meta_key = column.name.strip_prefix("meta:");
match meta_key {
Some(key) => {
item.metadata.get(key).cloned().unwrap_or_default()
}
None => String::new(),
}
}
_ => String::new(),
};
row.push(cell);
}
row
})
.collect();
render_list_table_with_format(&settings.list_format, &rows, &settings.table_config);
}
}
Ok(())
}
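The column dispatch above treats a plain name as a fixed column and `meta:<key>` as a metadata lookup via strip_prefix. A self-contained sketch of that parsing (the enum here is a simplified stand-in for ColumnType):

```rust
// Simplified stand-in for the listing code's ColumnType dispatch.
#[derive(Debug, PartialEq)]
enum Column {
    Id,
    Time,
    Meta(String), // carries the metadata key after the "meta:" prefix
    Unknown,
}

fn parse_column(name: &str) -> Column {
    // "meta:hostname" -> Meta("hostname"); checked before fixed names
    if let Some(key) = name.strip_prefix("meta:") {
        return Column::Meta(key.to_string());
    }
    match name {
        "id" => Column::Id,
        "time" => Column::Time,
        _ => Column::Unknown,
    }
}

fn main() {
    println!("{:?}", parse_column("meta:hostname"));
}
```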

10
src/modes/client/mod.rs Normal file
View File

@@ -0,0 +1,10 @@
pub mod delete;
pub mod diff;
pub mod export;
pub mod get;
pub mod import;
pub mod info;
pub mod list;
pub mod save;
pub mod status;
pub mod update;

180
src/modes/client/save.rs Normal file
View File

@@ -0,0 +1,180 @@
use crate::client::KeepClient;
use crate::compression_engine::CompressionType;
use crate::config::Settings;
use crate::meta_plugin::SaveMetaFn;
use crate::modes::common::settings_compression_type;
use crate::services::ItemInfo;
use crate::services::compression_service::CompressionService;
use crate::services::meta_service::MetaService;
use anyhow::Result;
use clap::Command;
use is_terminal::IsTerminal;
use log::debug;
use std::collections::HashMap;
use std::io::{Read, Write};
use std::sync::{Arc, Mutex};
/// Streaming save mode for client.
///
/// Uses three threads for true streaming with constant memory:
/// - Reader thread: reads stdin, tees to stdout, runs meta plugins,
/// compresses data, writes to OS pipe
/// - Pipe: zero-copy transfer of compressed bytes between threads
/// - Streamer thread: reads from pipe, streams to server via chunked HTTP
///
/// Meta plugins run on the client side during streaming. Collected metadata
/// is sent to the server via a separate POST after streaming completes.
///
/// Memory usage is O(PIPESIZE) regardless of data size.
pub fn mode(
client: &KeepClient,
cmd: &mut Command,
settings: &Settings,
tags: &mut Vec<String>,
metadata: HashMap<String, String>,
) -> Result<(), anyhow::Error> {
debug!("CLIENT_SAVE: Saving item via remote server (streaming)");
crate::modes::common::ensure_default_tag(tags);
// Determine compression type from settings
let compression_type = settings_compression_type(cmd, settings);
let compression_type_str = compression_type.to_string();
// In client mode, the client always handles compression (even "raw").
// The server should never re-compress client data.
let server_compress = false;
// Shared metadata collection: plugins write here via save_meta closure
let collected_meta: Arc<Mutex<HashMap<String, String>>> = Arc::new(Mutex::new(HashMap::new()));
let meta_collector = collected_meta.clone();
let save_meta: SaveMetaFn = Arc::new(Mutex::new(move |name: &str, value: &str| {
if let Ok(mut map) = meta_collector.lock() {
map.insert(name.to_string(), value.to_string());
}
}));
// Create MetaService and get plugins (must happen before spawning reader thread)
let meta_service = MetaService::new(save_meta);
let mut plugins = meta_service.get_plugins(cmd, settings);
// Create OS pipe for streaming compressed bytes between threads
let (pipe_reader, pipe_writer) = os_pipe::pipe()?;
// Reader thread: stdin → tee(stdout) → meta plugins → compress → pipe
let compression_type_clone = compression_type.clone();
let reader_handle = std::thread::spawn(move || -> Result<u64> {
let stdin = std::io::stdin();
let stdout = std::io::stdout();
let mut stdin_lock = stdin.lock();
let mut stdout_lock = stdout.lock();
let mut total_bytes = 0u64;
let mut buffer = [0u8; 8192];
// Initialize meta plugins
meta_service.initialize_plugins(&mut plugins);
// Wrap pipe writer with appropriate compression
let mut compressor: Box<dyn Write> =
CompressionService::compressing_writer(Box::new(pipe_writer), &compression_type_clone)?;
loop {
let n = stdin_lock.read(&mut buffer)?;
if n == 0 {
break;
}
// Tee to stdout
stdout_lock.write_all(&buffer[..n])?;
// Feed chunk to meta plugins
meta_service.process_chunk(&mut plugins, &buffer[..n]);
total_bytes += n as u64;
// Compress and write to pipe
compressor.write_all(&buffer[..n])?;
}
// Finalize meta plugins (digest, text, tokens produce final output here)
meta_service.finalize_plugins(&mut plugins);
// Explicitly flush and finalize compression before dropping.
compressor.flush()?;
drop(compressor);
Ok(total_bytes)
});
// Streamer thread: reads compressed bytes from pipe → POST to server
let client_url = client.base_url().to_string();
let client_username = client.username().cloned();
let client_password = client.password().cloned();
let client_jwt = client.jwt().cloned();
let tags_clone = tags.clone();
let compression_type_str_clone = compression_type_str.clone();
let streamer_handle = std::thread::spawn(move || -> Result<ItemInfo> {
let streaming_client =
KeepClient::new(&client_url, client_username, client_password, client_jwt)?;
let params = [
("compress".to_string(), server_compress.to_string()),
("meta".to_string(), "false".to_string()),
("tags".to_string(), tags_clone.join(",")),
// Always send compression_type when compress=false (client handled compression)
("compression_type".to_string(), compression_type_str_clone),
];
// Filter out empty params
let params: Vec<(String, String)> =
params.into_iter().filter(|(_, v)| !v.is_empty()).collect();
let param_refs: Vec<(&str, &str)> = params
.iter()
.map(|(k, v)| (k.as_str(), v.as_str()))
.collect();
let mut reader: Box<dyn Read> = Box::new(pipe_reader);
let item_info = streaming_client.post_stream("/api/item/", &mut reader, &param_refs)?;
Ok(item_info)
});
// Wait for streaming to complete, capture item info
let item_info = streamer_handle
.join()
.map_err(|e| anyhow::anyhow!("Streamer thread panicked: {:?}", e))??;
// Wait for reader thread (should complete quickly after pipe is drained)
let uncompressed_size = reader_handle
.join()
.map_err(|e| anyhow::anyhow!("Reader thread panicked: {:?}", e))??;
// Merge plugin-collected metadata with CLI metadata
let mut local_metadata = metadata;
// Add plugin-collected metadata (digest, hostname, text stats, etc.)
if let Ok(plugin_meta) = collected_meta.lock() {
for (k, v) in plugin_meta.iter() {
local_metadata.entry(k.clone()).or_insert_with(|| v.clone());
}
}
// Send uncompressed size to server (proper field, not metadata)
client.set_item_size(item_info.id, uncompressed_size)?;
// Send metadata to server
if !local_metadata.is_empty() {
client.post_metadata(item_info.id, &local_metadata)?;
}
// Print status to stderr (item ID is known immediately from server response)
if !settings.quiet {
if std::io::stderr().is_terminal() {
eprintln!("KEEP: New item: {} tags: {}", item_info.id, tags.join(" "));
} else {
eprintln!("KEEP: New item: {} tags: {tags:?}", item_info.id);
}
}
debug!("CLIENT_SAVE: Streaming complete, {uncompressed_size} bytes uncompressed");
Ok(())
}
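The reader/streamer split above moves compressed bytes through an OS pipe between two threads. A dependency-free sketch of the same shape, with an mpsc channel standing in for the os_pipe (the channel closes when the sender drops, just as the pipe reader sees EOF when the writer drops; compression and HTTP are elided):

```rust
use std::sync::mpsc;
use std::thread;

// Reader thread chunks the input and sends each chunk down the channel;
// streamer thread consumes chunks as they arrive. Memory stays O(chunk).
fn stream_through_channel(input: Vec<u8>, chunk_size: usize) -> Vec<u8> {
    let (tx, rx) = mpsc::channel::<Vec<u8>>();
    let reader = thread::spawn(move || {
        for chunk in input.chunks(chunk_size) {
            tx.send(chunk.to_vec()).unwrap(); // compression step would go here
        }
        // tx dropped here: channel closes, like dropping the pipe writer
    });
    let streamer = thread::spawn(move || {
        let mut received = Vec::new();
        for chunk in rx {
            received.extend_from_slice(&chunk); // POST chunk to server here
        }
        received
    });
    reader.join().unwrap();
    streamer.join().unwrap()
}

fn main() {
    let input: Vec<u8> = (0..20_000u32).map(|i| (i % 251) as u8).collect();
    let output = stream_through_channel(input.clone(), 8192);
    assert_eq!(output, input);
    println!("streamed {} bytes", output.len());
}
```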

91
src/modes/client/status.rs Normal file
View File

@@ -0,0 +1,91 @@
use crate::client::KeepClient;
use crate::modes::common::OutputFormat;
use crate::modes::common::settings_output_format;
use clap::Command;
use comfy_table::{Attribute, Cell, Table};
use log::debug;
pub fn mode(
client: &KeepClient,
_cmd: &mut Command,
settings: &crate::config::Settings,
) -> Result<(), anyhow::Error> {
debug!("CLIENT_STATUS: Getting status from remote server");
let status_info = client.get_status()?;
let output_format = settings_output_format(settings);
match output_format {
OutputFormat::Json | OutputFormat::Yaml => {
crate::modes::common::print_serialized(&status_info, &output_format)?;
}
OutputFormat::Table => {
// Paths
let path_table =
crate::modes::common::build_path_table(&status_info.paths, &settings.table_config);
println!("PATHS:");
println!(
"{}",
crate::modes::common::trim_lines_end(&path_table.trim_fmt())
);
println!();
// Configured meta plugins
if let Some(ref configured) = status_info.configured_meta_plugins
&& !configured.is_empty()
{
let mut sorted = configured.clone();
sorted.sort_by(|a, b| a.name.cmp(&b.name));
let mut table =
crate::modes::common::create_table_with_config(&settings.table_config);
table.set_header(vec![
Cell::new("Plugin Name").add_attribute(Attribute::Bold),
Cell::new("Enabled").add_attribute(Attribute::Bold),
]);
for plugin in &sorted {
let enabled = status_info.enabled_meta_plugins.contains(&plugin.name);
table.add_row(vec![
plugin.name.clone(),
if enabled { "Yes" } else { "No" }.to_string(),
]);
}
println!("META PLUGINS:");
println!(
"{}",
crate::modes::common::trim_lines_end(&table.trim_fmt())
);
println!();
}
// Compression
if !status_info.compression.is_empty() {
let mut table =
crate::modes::common::create_table_with_config(&settings.table_config);
table.set_header(vec![
Cell::new("Type").add_attribute(Attribute::Bold),
Cell::new("Found").add_attribute(Attribute::Bold),
Cell::new("Default").add_attribute(Attribute::Bold),
Cell::new("Binary").add_attribute(Attribute::Bold),
]);
for comp in &status_info.compression {
table.add_row(vec![
comp.compression_type.clone(),
if comp.found { "Yes" } else { "No" }.to_string(),
if comp.default { "Yes" } else { "No" }.to_string(),
comp.binary.clone(),
]);
}
println!("COMPRESSION:");
println!(
"{}",
crate::modes::common::trim_lines_end(&table.trim_fmt())
);
println!();
}
}
}
Ok(())
}
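Every table printed above passes through trim_lines_end before hitting stdout. A std-only sketch consistent with its doc example in common.rs ("line1 \nline2 " -> "line1\nline2"); whether a trailing newline survives in the real function is not shown here:

```rust
// Strip trailing whitespace from every line; used to clean up
// comfy-table output before printing.
fn trim_lines_end(s: &str) -> String {
    s.lines()
        .map(str::trim_end)
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    println!("{}", trim_lines_end("line1 \nline2 "));
}
```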

102
src/modes/client/update.rs Normal file
View File

@@ -0,0 +1,102 @@
use crate::client::KeepClient;
use crate::config::Settings;
use anyhow::Result;
use clap::Command;
use log::debug;
use std::collections::HashMap;
/// Client update mode: runs meta plugins on the server for an existing item.
///
/// Sends the list of plugin names (from --meta-plugin config) and any direct
/// metadata (--meta key=value) to the server. The server reads the stored file,
/// runs the specified plugins, and stores the results.
pub fn mode(
client: &KeepClient,
cmd: &mut Command,
settings: &Settings,
ids: &mut [i64],
tags: &mut [String],
) -> Result<(), anyhow::Error> {
debug!("CLIENT_UPDATE: Updating item via remote server");
if ids.len() != 1 {
cmd.error(
clap::error::ErrorKind::InvalidValue,
"--update requires exactly one numeric ID",
)
.exit();
}
let item_id = ids[0];
// Collect plugin names from settings (--meta-plugin config)
let plugin_names: Vec<String> = settings
.meta_plugins_names()
.into_iter()
.flat_map(|s| {
s.split(',')
.map(|p| p.trim().to_string())
.collect::<Vec<_>>()
})
.filter(|p| !p.is_empty())
.collect();
// Collect direct metadata from --meta flags
let metadata: HashMap<String, String> = settings
.meta
.iter()
.filter_map(|(k, v)| v.as_ref().map(|val| (k.clone(), val.clone())))
.collect();
// Build query params
let mut params: Vec<(String, String)> = Vec::new();
if !plugin_names.is_empty() {
params.push(("plugins".to_string(), plugin_names.join(",")));
}
if !metadata.is_empty() {
let meta_json = serde_json::to_string(&metadata)?;
params.push(("metadata".to_string(), meta_json));
}
if !tags.is_empty() {
params.push(("tags".to_string(), tags.join(",")));
}
// Nothing to update
if params.is_empty() {
if !settings.quiet {
eprintln!("KEEP: No changes specified for item {item_id}");
}
return Ok(());
}
let param_refs: Vec<(&str, &str)> = params
.iter()
.map(|(k, v)| (k.as_str(), v.as_str()))
.collect();
let url_path = format!("/api/item/{item_id}/update");
// POST to update endpoint
let _item_info = client.post_bytes(&url_path, &[], &param_refs)?;
if !settings.quiet {
let mut parts = Vec::new();
if !plugin_names.is_empty() {
parts.push(format!("plugins: {}", plugin_names.join(", ")));
}
if !metadata.is_empty() {
parts.push(format!("{} metadata", metadata.len()));
}
if !tags.is_empty() {
parts.push(format!("tags: {}", tags.join(" ")));
}
let action = parts.join(", ");
eprintln!("KEEP: Updated item {item_id} ({action})");
}
Ok(())
}
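The plugin-name collection above flattens possibly comma-separated values, trims each entry, and drops empties. As a standalone helper that logic looks like this (a sketch; the real code inlines it over settings.meta_plugins_names()):

```rust
// Normalize plugin names: each input may hold a comma-separated list;
// entries are trimmed and empty fragments are discarded.
fn normalize_plugin_names(raw: &[String]) -> Vec<String> {
    raw.iter()
        .flat_map(|s| s.split(','))
        .map(|p| p.trim().to_string())
        .filter(|p| !p.is_empty())
        .collect()
}

fn main() {
    let raw = vec!["digest, text".to_string(), " ,hostname".to_string()];
    println!("{:?}", normalize_plugin_names(&raw));
}
```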

src/modes/common.rs
View File

@@ -1,3 +1,4 @@
use crate::common::status::PathInfo;
use crate::compression_engine::CompressionType;
/// Common utilities shared across different modes in the Keep application.
///
@@ -9,17 +10,19 @@ use crate::compression_engine::CompressionType;
/// These utilities are typically used internally by mode implementations:
///
/// ```
/// use crate::modes::common::{format_size, OutputFormat};
/// # use keep::modes::common::{format_size, OutputFormat};
/// let formatted = format_size(1024, true); // "1.0K"
/// let format = OutputFormat::from_str("json")?;
/// // let format = OutputFormat::from_str("json")?;
/// ```
use crate::config;
use crate::meta_plugin::MetaPluginType;
use anyhow::{Result, anyhow};
use chrono::{DateTime, Utc};
use clap::Command;
use clap::error::ErrorKind;
use comfy_table::{ContentArrangement, Table};
use comfy_table::{Attribute, Cell, ContentArrangement, Table};
use log::debug;
use regex::Regex;
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
use std::env;
use std::io::IsTerminal;
@@ -42,7 +45,8 @@ use strum::IntoEnumIterator;
/// # Examples
///
/// ```
/// use keep::modes::common::OutputFormat;
/// # use keep::modes::common::OutputFormat;
/// # use std::str::FromStr;
/// assert_eq!(OutputFormat::from_str("json").unwrap(), OutputFormat::Json);
/// ```
pub enum OutputFormat {
@@ -51,39 +55,18 @@ pub enum OutputFormat {
Yaml,
}
/// Extracts metadata from KEEP_META_* environment variables.
///
/// Scans environment for variables prefixed with KEEP_META_ and extracts
/// key-value pairs for initial item metadata. Ignores KEEP_META_PLUGINS.
///
/// # Returns
///
/// `HashMap<String, String>` - Metadata from environment variables, with keys in uppercase without prefix.
///
/// # Errors
///
/// None; silently ignores non-matching vars and PLUGINS.
///
/// # Examples
///
/// ```
/// # use std::env;
/// # use std::collections::HashMap;
/// env::set_var("KEEP_META_COMMAND", "ls -la");
/// let meta = get_meta_from_env();
/// assert_eq!(meta.get("COMMAND"), Some(&"ls -la".to_string()));
/// ```
pub const IMPORT_FORMAT_ERROR: &str =
"Unsupported import format: {} (expected .keep.tar or .meta.yml)";
pub fn get_meta_from_env() -> HashMap<String, String> {
debug!("COMMON: Getting meta from KEEP_META_*");
let re = Regex::new(r"^KEEP_META_(.+)$").unwrap();
let mut meta_env: HashMap<String, String> = HashMap::new();
const PREFIX: &str = "KEEP_META_";
for (key, value) in env::vars() {
if let Some(meta_name_caps) = re.captures(key.as_str()) {
let name = String::from(meta_name_caps.get(1).unwrap().as_str());
// Ignore KEEP_META_PLUGINS
if name != "PLUGINS" {
debug!("COMMON: Found meta: {}={}", name.clone(), value.clone());
meta_env.insert(name, value.clone());
if let Some(name) = key.strip_prefix(PREFIX) {
if !name.is_empty() && name != "PLUGINS" {
debug!("COMMON: Found meta: {}={}", name, value);
meta_env.insert(name.to_string(), value);
}
}
}
@@ -106,6 +89,7 @@ pub fn get_meta_from_env() -> HashMap<String, String> {
/// # Examples
///
/// ```
/// # use keep::modes::common::format_size;
/// let raw = format_size(1024, false); // "1024"
/// let human = format_size(1024, true); // "1.0K"
/// ```
@@ -136,7 +120,8 @@ pub fn format_size(size: u64, human_readable: bool) -> String {
/// # Examples
///
/// ```
/// use keep::modes::common::ColumnType;
/// # use keep::modes::common::ColumnType;
/// # use std::str::FromStr;
/// assert_eq!(ColumnType::from_str("id").unwrap(), ColumnType::Id);
/// assert_eq!(ColumnType::from_str("meta:hostname").unwrap(), ColumnType::Meta);
/// ```
@@ -204,9 +189,10 @@ pub fn settings_meta_plugin_types(
// Try to find the MetaPluginType by meta name
let mut found = false;
for meta_plugin_type in MetaPluginType::iter() {
let meta_plugin =
crate::meta_plugin::get_meta_plugin(meta_plugin_type.clone(), None, None);
if meta_plugin.meta_type().to_string() == trimmed_name {
if let Ok(meta_plugin) =
crate::meta_plugin::get_meta_plugin(meta_plugin_type.clone(), None, None)
&& meta_plugin.meta_type().to_string() == trimmed_name
{
meta_plugin_types.push(meta_plugin_type);
found = true;
break;
@@ -216,7 +202,7 @@ pub fn settings_meta_plugin_types(
if !found {
cmd.error(
ErrorKind::InvalidValue,
format!("Unknown meta plugin type: {}", trimmed_name),
format!("Unknown meta plugin type: {trimmed_name}"),
)
.exit();
}
@@ -254,10 +240,7 @@ pub fn settings_compression_type(
if compression_type_opt.is_err() {
cmd.error(
ErrorKind::InvalidValue,
format!(
"Invalid compression algorithm '{}'. Supported algorithms: lz4, gzip, xz, zstd",
compression_name
),
format!("Invalid compression algorithm '{compression_name}'. Supported algorithms: lz4, gzip, xz, zstd"),
)
.exit();
}
@@ -280,8 +263,9 @@ pub fn settings_compression_type(
/// # Examples
///
/// ```
/// let format = settings_output_format(&settings);
/// assert_eq!(format, OutputFormat::Json); // If settings.output_format = Some("json")
/// # use keep::modes::common::{settings_output_format, OutputFormat};
/// // Example usage requires a Settings instance
/// // let format = settings_output_format(&settings);
/// ```
pub fn settings_output_format(settings: &config::Settings) -> OutputFormat {
settings
@@ -306,6 +290,7 @@ pub fn settings_output_format(settings: &config::Settings) -> OutputFormat {
/// # Examples
///
/// ```
/// # use keep::modes::common::trim_lines_end;
/// let cleaned = trim_lines_end("line1 \nline2 ");
/// assert_eq!(cleaned, "line1\nline2");
/// ```
@@ -331,29 +316,12 @@ pub fn trim_lines_end(s: &str) -> String {
/// # Examples
///
/// ```
/// let table = create_table(true);
/// # use keep::modes::common::create_table;
/// let mut table = create_table(true);
/// table.add_row(vec!["Header1", "Header2"]);
/// ```
pub fn create_table(use_styling: bool) -> Table {
let mut table = Table::new();
table.set_content_arrangement(ContentArrangement::Dynamic);
if use_styling {
if std::io::stdout().is_terminal() {
table
.load_preset(comfy_table::presets::UTF8_FULL)
.apply_modifier(comfy_table::modifiers::UTF8_SOLID_INNER_BORDERS);
} else {
table.load_preset(comfy_table::presets::ASCII_FULL);
}
} else {
table.load_preset(comfy_table::presets::NOTHING);
}
if !std::io::stdout().is_terminal() {
table.force_no_tty();
}
table
pub fn create_table(_use_styling: bool) -> Table {
create_table_with_config(&crate::config::TableConfig::default())
}
/// Creates a table configured from application table settings.
@@ -371,6 +339,8 @@ pub fn create_table(use_styling: bool) -> Table {
/// # Examples
///
/// ```
/// # use keep::modes::common::create_table_with_config;
/// # use keep::config::TableConfig;
/// let config = TableConfig::default();
/// let table = create_table_with_config(&config);
/// ```
@@ -442,3 +412,292 @@ pub fn create_table_with_config(table_config: &crate::config::TableConfig) -> Ta
table
}
/// Display data for a single item's detail view (used by --info).
pub struct DisplayItemInfo {
pub id: i64,
pub timestamp: String,
pub path: String,
pub stream_size: String,
pub compression: String,
pub file_size: String,
pub tags: Vec<String>,
pub metadata: Vec<(String, String)>,
}
/// Renders item detail table. Shared by local and client info modes.
pub fn render_item_info_table(info: &DisplayItemInfo, table_config: &config::TableConfig) {
use comfy_table::{Attribute, Cell};
let mut table = create_table_with_config(table_config);
table.add_row(vec![
Cell::new("ID").add_attribute(Attribute::Bold),
Cell::new(info.id.to_string()),
]);
table.add_row(vec![
Cell::new("Time").add_attribute(Attribute::Bold),
Cell::new(&info.timestamp),
]);
table.add_row(vec![
Cell::new("Size").add_attribute(Attribute::Bold),
Cell::new(&info.stream_size),
]);
table.add_row(vec![
Cell::new("Compression").add_attribute(Attribute::Bold),
Cell::new(&info.compression),
]);
table.add_row(vec![
Cell::new("Tags").add_attribute(Attribute::Bold),
Cell::new(info.tags.join(" ")),
]);
for (key, value) in &info.metadata {
table.add_row(vec![
Cell::new(format!("Meta: {key}")).add_attribute(Attribute::Bold),
Cell::new(value),
]);
}
println!("{}", trim_lines_end(&table.trim_fmt()));
}
/// Renders list table with column format from config. Shared by local and client list modes.
pub fn render_list_table_with_format(
columns: &[config::ColumnConfig],
rows: &[Vec<String>],
table_config: &config::TableConfig,
) {
let mut table = create_table_with_config(table_config);
let header_cells: Vec<Cell> = columns
.iter()
.map(|col| Cell::new(&col.label).add_attribute(Attribute::Bold))
.collect();
table.set_header(header_cells);
for row in rows {
let cells: Vec<Cell> = row
.iter()
.enumerate()
.map(|(i, val)| {
let mut cell = Cell::new(val);
if let Some(col) = columns.get(i) {
if let Some(ref fg) = col.fg_color {
cell = apply_color(cell, fg, true);
}
if let Some(ref bg) = col.bg_color {
cell = apply_color(cell, bg, false);
}
for attr in &col.attributes {
cell = apply_table_attribute(cell, attr);
}
}
cell
})
.collect();
table.add_row(cells);
}
println!("{}", trim_lines_end(&table.trim_fmt()));
}
/// Applies config TableColor to a comfy-table Cell.
pub fn apply_color(mut cell: Cell, color: &config::TableColor, is_foreground: bool) -> Cell {
use comfy_table::Color;
let comfy_color = match color {
config::TableColor::Black => Color::Black,
config::TableColor::Red => Color::Red,
config::TableColor::Green => Color::Green,
config::TableColor::Yellow => Color::Yellow,
config::TableColor::Blue => Color::Blue,
config::TableColor::Magenta => Color::Magenta,
config::TableColor::Cyan => Color::Cyan,
config::TableColor::White => Color::White,
config::TableColor::Gray => Color::Grey,
config::TableColor::DarkRed => Color::DarkRed,
config::TableColor::DarkGreen => Color::DarkGreen,
config::TableColor::DarkYellow => Color::DarkYellow,
config::TableColor::DarkBlue => Color::DarkBlue,
config::TableColor::DarkMagenta => Color::DarkMagenta,
config::TableColor::DarkCyan => Color::DarkCyan,
config::TableColor::Rgb(r, g, b) => Color::Rgb {
r: *r,
g: *g,
b: *b,
},
};
if is_foreground {
cell = cell.fg(comfy_color);
} else {
cell = cell.bg(comfy_color);
}
cell
}
/// Ensures tags has at least one entry, adding "none" if empty.
pub fn ensure_default_tag(tags: &mut Vec<String>) {
if tags.is_empty() {
tags.push("none".to_string());
}
}
/// Prints a serializable value in JSON or YAML format based on output format.
///
/// Only handles Json and Yaml variants; Table should be handled separately.
pub fn print_serialized<T: serde::Serialize>(
value: &T,
format: &OutputFormat,
) -> anyhow::Result<()> {
match format {
OutputFormat::Json => println!("{}", serde_json::to_string_pretty(value)?),
OutputFormat::Yaml => println!("{}", serde_yaml::to_string(value)?),
OutputFormat::Table => unreachable!(),
}
Ok(())
}
/// Applies config TableAttribute to a comfy-table Cell.
pub fn apply_table_attribute(mut cell: Cell, attribute: &config::TableAttribute) -> Cell {
match attribute {
config::TableAttribute::Bold => cell = cell.add_attribute(Attribute::Bold),
config::TableAttribute::Dim => cell = cell.add_attribute(Attribute::Dim),
config::TableAttribute::Italic => cell = cell.add_attribute(Attribute::Italic),
config::TableAttribute::Underlined => cell = cell.add_attribute(Attribute::Underlined),
config::TableAttribute::SlowBlink => cell = cell.add_attribute(Attribute::SlowBlink),
config::TableAttribute::RapidBlink => cell = cell.add_attribute(Attribute::RapidBlink),
config::TableAttribute::Reverse => cell = cell.add_attribute(Attribute::Reverse),
config::TableAttribute::Hidden => cell = cell.add_attribute(Attribute::Hidden),
config::TableAttribute::CrossedOut => cell = cell.add_attribute(Attribute::CrossedOut),
}
cell
}
/// Builds a table showing data and database path information.
pub fn build_path_table(path_info: &PathInfo, table_config: &config::TableConfig) -> Table {
let mut path_table = create_table_with_config(table_config);
path_table.set_header(vec![
Cell::new("Type").add_attribute(Attribute::Bold),
Cell::new("Path").add_attribute(Attribute::Bold),
]);
path_table.add_row(vec!["Data", &path_info.data]);
path_table.add_row(vec!["Database", &path_info.database]);
path_table
}
/// Sanitize tags for use in filenames.
///
/// Replaces non-alphanumeric characters with underscores and joins with `_`.
/// Empty tags are filtered out to avoid double underscores.
pub fn sanitize_tags(tags: &[String]) -> String {
tags.iter()
.filter(|t| !t.is_empty())
.map(|t| {
t.chars()
.map(|c| if c.is_alphanumeric() { c } else { '_' })
.collect::<String>()
})
.collect::<Vec<_>>()
.join("_")
}
/// Metadata structure for export to YAML. Shared by local and client export modes.
#[derive(Debug, Serialize)]
pub struct ExportMeta {
pub ts: DateTime<Utc>,
pub compression: String,
pub uncompressed_size: Option<i64>,
pub tags: Vec<String>,
pub metadata: HashMap<String, String>,
}
/// Metadata structure for import from YAML. Shared by local and client import modes.
#[derive(Debug, Deserialize)]
pub struct ImportMeta {
pub ts: DateTime<Utc>,
pub compression: String,
#[serde(default, alias = "size")]
pub uncompressed_size: Option<i64>,
#[serde(default)]
pub tags: Vec<String>,
#[serde(default)]
pub metadata: HashMap<String, String>,
}
/// Resolve a single item ID from explicit IDs, tags, or latest item.
///
/// Returns the first ID if provided, the newest item matching tags,
/// or the newest item overall if neither is specified.
#[cfg(feature = "client")]
pub fn resolve_item_id(
client: &crate::client::KeepClient,
ids: &[i64],
tags: &[String],
) -> Result<i64> {
if !ids.is_empty() {
Ok(ids[0])
} else if !tags.is_empty() {
let items = client.list_items(&[], tags, "newest", 0, 1, &HashMap::new())?;
if items.is_empty() {
return Err(anyhow!("No items found matching tags: {:?}", tags));
}
Ok(items[0].id)
} else {
let items = client.list_items(&[], &[], "newest", 0, 1, &HashMap::new())?;
if items.is_empty() {
return Err(anyhow!("No items found"));
}
Ok(items[0].id)
}
}
/// Resolve item IDs from explicit IDs or tags (multi-item variant).
#[cfg(feature = "client")]
pub fn resolve_item_ids(
client: &crate::client::KeepClient,
ids: &[i64],
tags: &[String],
) -> Result<Vec<i64>> {
if !ids.is_empty() {
Ok(ids.to_vec())
} else if !tags.is_empty() {
let items = client.list_items(&[], tags, "newest", 0, 0, &HashMap::new())?;
if items.is_empty() {
return Err(anyhow!("No items found matching tags: {:?}", tags));
}
Ok(items.into_iter().map(|i| i.id).collect())
} else {
let items = client.list_items(&[], &[], "newest", 0, 1, &HashMap::new())?;
if items.is_empty() {
return Err(anyhow!("No items found"));
}
Ok(vec![items[0].id])
}
}
/// Check if binary content should be blocked from TTY output.
///
/// Uses metadata `text` field as fast path, then falls back to byte sampling.
/// Returns Err if content is binary and should not be displayed.
pub fn check_binary_tty(
metadata: &HashMap<String, String>,
data_sample: &[u8],
force: bool,
) -> Result<()> {
if force || !std::io::stdout().is_terminal() {
return Ok(());
}
if crate::common::is_binary::is_content_binary_from_metadata(metadata, data_sample) {
return Err(anyhow!(
"Refusing to output binary data to TTY, use --force to override"
));
}
Ok(())
}


@@ -36,7 +36,7 @@ use rusqlite::Connection;
///
/// # Examples
///
/// ```
/// ```ignore
/// // This would be called from main after parsing args
/// mode_delete(&mut cmd, &settings, &config, &mut vec![1, 2], &mut vec![], &mut conn, data_path)?;
/// ```
@@ -66,8 +66,9 @@ pub fn mode_delete(
warn!("Unable to find item {item_id} in database");
}
_ => {
return Err(anyhow::Error::from(e)
.context(format!("Failed to delete item {}", item_id)));
return Err(
anyhow::Error::from(e).context(format!("Failed to delete item {item_id}"))
);
}
},
}


@@ -1,18 +1,21 @@
use crate::config;
use crate::services::item_service::ItemService;
/// Diff mode implementation.
///
/// This module provides functionality for comparing two items and displaying their
/// differences using external diff tools.
use anyhow::{Context, Result};
/// differences using external diff tools. Decompressed content is streamed to diff
/// via pipes and /dev/fd file descriptors — no temporary files are created.
use crate::config;
use crate::services::compression_service::CompressionService;
use crate::services::item_service::ItemService;
use anyhow::{Context, Result, anyhow};
use clap::Command;
use command_fds::{CommandFdExt, FdMapping};
use log::debug;
use nix::fcntl::OFlag;
use nix::unistd::pipe2;
use std::io::Read;
use std::os::unix::io::{AsRawFd, OwnedFd};
fn validate_diff_args(
_cmd: &mut Command,
ids: &Vec<i64>,
tags: &Vec<String>,
) -> anyhow::Result<()> {
fn validate_diff_args(_cmd: &mut Command, ids: &[i64], tags: &[String]) -> anyhow::Result<()> {
if !tags.is_empty() {
return Err(anyhow::anyhow!(
"Tags are not supported with --diff. Please provide exactly two IDs."
@@ -27,19 +30,6 @@ fn validate_diff_args(
}
/// Fetches and validates items from the database for diff operation.
///
/// This function retrieves two items by their IDs from the database using the
/// item service, which handles validation, and returns them as a tuple.
///
/// # Arguments
///
/// * `conn` - Mutable reference to the database connection.
/// * `ids` - Vector of item IDs to fetch.
/// * `item_service` - Reference to the item service for validation.
///
/// # Returns
///
/// * `Result<(ItemWithMeta, ItemWithMeta)>` - Tuple of items with metadata or error.
fn fetch_and_validate_items(
conn: &mut rusqlite::Connection,
ids: &[i64],
@@ -48,7 +38,6 @@ fn fetch_and_validate_items(
crate::services::types::ItemWithMeta,
crate::services::types::ItemWithMeta,
)> {
// Fetch items using the service, which handles validation
let item_a = item_service
.get_item(conn, ids[0])
.with_context(|| format!("Unable to find first item (ID: {}) in database", ids[0]))?;
@@ -56,48 +45,12 @@ fn fetch_and_validate_items(
.get_item(conn, ids[1])
.with_context(|| format!("Unable to find second item (ID: {}) in database", ids[1]))?;
debug!("MAIN: Found item A {:?}", item_a.item);
debug!("MAIN: Found item B {:?}", item_b.item);
debug!("DIFF: Found item A {:?}", item_a.item);
debug!("DIFF: Found item B {:?}", item_b.item);
Ok((item_a, item_b))
}
/// Sets up file paths and compression for diff operation.
///
/// This function constructs the file paths for the two items and prepares the
/// compression engines needed for reading their contents.
///
/// # Arguments
///
/// * `item_service` - Reference to the item service.
/// * `item_a` - First item with metadata.
/// * `item_b` - Second item with metadata.
///
/// # Returns
///
/// * `Result<(PathBuf, PathBuf)>` - Tuple of item file paths or error.
fn setup_diff_paths_and_compression(
item_service: &ItemService,
item_a: &crate::services::types::ItemWithMeta,
item_b: &crate::services::types::ItemWithMeta,
) -> Result<(std::path::PathBuf, std::path::PathBuf)> {
let item_a_id = item_a
.item
.id
.ok_or_else(|| anyhow::anyhow!("Item A missing ID"))?;
let item_b_id = item_b
.item
.id
.ok_or_else(|| anyhow::anyhow!("Item B missing ID"))?;
// Use the service's data path to construct proper file paths
let data_path = item_service.get_data_path();
let item_a_path = data_path.join(item_a_id.to_string());
let item_b_path = data_path.join(item_b_id.to_string());
Ok((item_a_path, item_b_path))
}
pub fn mode_diff(
cmd: &mut Command,
args: &crate::args::Args,
@@ -129,17 +82,122 @@ pub fn mode_diff(
validate_diff_args(cmd, &ids, &tags)?;
let settings = crate::config::Settings::new(args, crate::config::Settings::default_dir()?)?;
let item_service = crate::services::item_service::ItemService::new(settings.dir.clone());
let settings = config::Settings::new(args, config::Settings::default_dir()?)?;
let item_service = ItemService::new(settings.dir.clone());
let (item_a, item_b) = fetch_and_validate_items(conn, &ids, &item_service)?;
let (path_a, path_b) = setup_diff_paths_and_compression(&item_service, &item_a, &item_b)?;
// TODO: Implement actual diff logic here
// For now, just print paths or something to make it compile
println!("Diff between {:?} and {:?}", path_a, path_b);
Ok(())
run_external_diff(&item_service, &item_a, &item_b)
}
/// Creates a pipe with CLOEXEC set atomically, returns (read_fd, write_fd).
fn create_pipe() -> Result<(OwnedFd, OwnedFd)> {
pipe2(OFlag::O_CLOEXEC).context("Failed to create pipe")
}
/// Streams decompressed item content through a pipe fd.
///
/// Returns a JoinHandle for the writer thread. The thread writes decompressed
/// data to write_fd and closes it when done (causing EOF for the reader).
fn spawn_writer_thread(
item_service: &ItemService,
item: &crate::services::types::ItemWithMeta,
write_fd: OwnedFd,
) -> std::thread::JoinHandle<Result<()>> {
let data_path = item_service.get_data_path().clone();
let id = match item.item.id {
Some(id) => id,
None => return std::thread::spawn(|| Err(anyhow!("item missing ID"))),
};
let compression = item.item.compression.clone();
let mut item_path = data_path;
item_path.push(id.to_string());
std::thread::spawn(move || -> Result<()> {
let compression_service = CompressionService::new();
let mut reader = compression_service
.stream_item_content(item_path, &compression)
.map_err(|e| anyhow::anyhow!("Failed to stream item {id}: {e}"))?;
// Convert OwnedFd to File — safe, takes ownership, closes on drop
let mut writer = std::fs::File::from(write_fd);
crate::common::stream_copy(&mut reader, |chunk| {
use std::io::Write;
writer.write_all(chunk)
})
.map_err(|e| anyhow::anyhow!("Error reading item {id}: {e}"))?;
// writer dropped here, closing write_fd → diff sees EOF
Ok(())
})
}
/// Runs external diff command, streaming decompressed content via /dev/fd pipes.
///
/// Creates two pipes, spawns writer threads to decompress each item into its pipe,
/// and runs `diff -u /dev/fd/N /dev/fd/M` where N and M are the pipe read fds.
/// The `command-fds` crate handles CLOEXEC clearing safely — no unsafe needed.
fn run_external_diff(
item_service: &ItemService,
item_a: &crate::services::types::ItemWithMeta,
item_b: &crate::services::types::ItemWithMeta,
) -> Result<()> {
if which::which_global("diff").is_err() {
return Err(anyhow::anyhow!(
"diff command not found. Please install diffutils."
));
}
let (read_fd_a, write_fd_a) = create_pipe()?;
let (read_fd_b, write_fd_b) = create_pipe()?;
// Spawn writer threads — they take ownership of write fds and close them on exit
let writer_a = spawn_writer_thread(item_service, item_a, write_fd_a);
let writer_b = spawn_writer_thread(item_service, item_b, write_fd_b);
// Get fd numbers for /dev/fd paths (borrows, does not consume)
let raw_read_a = read_fd_a.as_raw_fd();
let raw_read_b = read_fd_b.as_raw_fd();
debug!("DIFF: pipe fds: a(r={raw_read_a}) b(r={raw_read_b})");
// Spawn diff with /dev/fd/N paths. command-fds handles CLOEXEC clearing
// and fd inheritance safely — the fds are released from OwnedFd to the
// child process. If spawn fails, the OwnedFd values in FdMapping are
// dropped and the fds are properly closed.
let mut command = std::process::Command::new("diff");
command
.arg("-u")
.arg(format!("/dev/fd/{raw_read_a}"))
.arg(format!("/dev/fd/{raw_read_b}"))
.stdout(std::process::Stdio::inherit())
.stderr(std::process::Stdio::inherit())
.stdin(std::process::Stdio::null())
.fd_mappings(vec![
FdMapping {
parent_fd: read_fd_a,
child_fd: raw_read_a,
},
FdMapping {
parent_fd: read_fd_b,
child_fd: raw_read_b,
},
])
.map_err(|e| anyhow::anyhow!("FD mapping collision: {e}"))?;
let mut child = command.spawn().context("Failed to spawn diff command")?;
let status = child.wait().context("Failed to wait for diff command")?;
// Join writer threads and propagate errors
writer_a
.join()
.map_err(|e| anyhow::anyhow!("Writer A panicked: {e:?}"))??;
writer_b
.join()
.map_err(|e| anyhow::anyhow!("Writer B panicked: {e:?}"))??;
// diff returns 0 if identical, 1 if different, 2 on error
if status.code() == Some(2) {
Err(anyhow::anyhow!("diff command failed with an error"))
} else {
Ok(())
}
}

src/modes/export.rs Normal file

@@ -0,0 +1,145 @@
use anyhow::{Context, Result, anyhow};
use chrono::Utc;
use clap::Command;
use log::debug;
use std::collections::HashMap;
use std::fs;
use std::path::PathBuf;
use crate::common::sanitize_ts_string;
use crate::config;
use crate::export_tar;
use crate::filter_plugin::FilterChain;
use crate::modes::common::sanitize_tags;
use crate::services::item_service::ItemService;
use crate::services::types::ItemWithMeta;
/// Export items to a `.keep.tar` archive.
///
/// Requires either IDs or tags (mutually exclusive). If IDs are given,
/// ALL must exist. Archives contain per-item data and metadata files.
pub fn mode_export(
cmd: &mut Command,
settings: &config::Settings,
ids: &[i64],
tags: &[String],
conn: &mut rusqlite::Connection,
data_path: PathBuf,
filter_chain: Option<FilterChain>,
) -> Result<()> {
// Validate: IDs XOR tags
if !ids.is_empty() && !tags.is_empty() {
cmd.error(
clap::error::ErrorKind::InvalidValue,
"Cannot use both IDs and tags with --export",
)
.exit();
}
if ids.is_empty() && tags.is_empty() {
cmd.error(
clap::error::ErrorKind::InvalidValue,
"Must provide either IDs or tags with --export",
)
.exit();
}
let item_service = ItemService::new(data_path.clone());
let meta_filter = settings.meta_filter();
// Resolve items
let items: Vec<ItemWithMeta> = if !ids.is_empty() {
// Fetch each ID individually; ALL must exist
let mut result = Vec::new();
for &id in ids {
match item_service.get_item(conn, id) {
Ok(item) => result.push(item),
Err(_) => {
cmd.error(
clap::error::ErrorKind::InvalidValue,
format!("Item {id} not found"),
)
.exit();
}
}
}
result
} else {
// Search by tags
item_service
.list_items(conn, tags, &meta_filter)
.map_err(|e| anyhow!("Unable to find matching items: {}", e))?
};
if items.is_empty() {
cmd.error(
clap::error::ErrorKind::InvalidValue,
"No items found matching the given criteria",
)
.exit();
}
// Validate: --export-filename-format doesn't use per-item vars with multiple items
if items.len() > 1 {
let fmt = &settings.export_filename_format;
if fmt.contains("{id}") || fmt.contains("{tags}") || fmt.contains("{compression}") {
cmd.error(
clap::error::ErrorKind::InvalidValue,
"Cannot use {id}, {tags}, or {compression} in --export-filename-format when exporting multiple items",
)
.exit();
}
}
// Compute export name
let dir_name = export_tar::export_name(&settings.export_name, &items);
// Compute tar filename from format template
let now = Utc::now();
let ts_str = sanitize_ts_string(&now.format("%Y-%m-%dT%H:%M:%SZ").to_string());
let mut vars = HashMap::new();
vars.insert("name".to_string(), dir_name.clone());
vars.insert("ts".to_string(), ts_str.clone());
// For single-item exports, also provide per-item vars
if items.len() == 1 {
let item = &items[0];
let item_id = item.item.id.context("Item missing ID")?;
let item_tags = item.tag_names();
vars.insert("id".to_string(), item_id.to_string());
vars.insert("tags".to_string(), sanitize_tags(&item_tags));
vars.insert("compression".to_string(), item.item.compression.clone());
}
let basename = strfmt::strfmt(&settings.export_filename_format, &vars).map_err(|e| {
anyhow!(
"Invalid export filename format '{}': {}",
settings.export_filename_format,
e
)
})?;
let tar_filename = format!("{basename}.keep.tar");
// Write the tar archive
let tar_file = fs::File::create(&tar_filename)
.with_context(|| format!("Cannot create tar file: {tar_filename}"))?;
export_tar::write_export_tar(
tar_file,
&dir_name,
&items,
&data_path,
filter_chain.as_ref(),
&item_service,
conn,
)?;
if !settings.quiet {
eprintln!("{tar_filename}");
}
debug!("EXPORT: Wrote {} items to {tar_filename}", items.len());
Ok(())
}


@@ -1,80 +1,17 @@
use crate::meta_plugin::MetaPlugin;
use anyhow::Result;
use clap::Command;
use serde::{Deserialize, Serialize};
use serde_yaml;
use std::collections::HashMap;
use strum::IntoEnumIterator;
/// Mode for generating a default configuration file.
///
/// This module creates a commented YAML template with default values for settings,
/// including list format, server config, compression, and meta plugins.
#[derive(Debug, Serialize, Deserialize)]
/// Default configuration structure for the generated template.
///
/// Includes core settings, list formatting, server options, compression, and meta plugins.
struct DefaultConfig {
dir: Option<String>,
list_format: Vec<ColumnConfig>,
human_readable: bool,
output_format: Option<String>,
quiet: bool,
force: bool,
server: Option<ServerConfig>,
compression_plugin: Option<CompressionPluginConfig>,
meta_plugins: Option<Vec<MetaPluginConfig>>,
}
#[derive(Debug, Serialize, Deserialize)]
/// Configuration for a column in the list format.
struct ColumnConfig {
name: String,
label: Option<String>,
#[serde(default)]
align: ColumnAlignment,
#[serde(default)]
max_len: Option<String>,
}
#[derive(Debug, Serialize, Deserialize, Default)]
#[serde(rename_all = "lowercase")]
/// Alignment options for table columns.
enum ColumnAlignment {
#[default]
Left,
Right,
}
#[derive(Debug, Serialize, Deserialize)]
/// Server configuration options.
struct ServerConfig {
address: Option<String>,
port: Option<u16>,
password_file: Option<String>,
password: Option<String>,
password_hash: Option<String>,
}
#[derive(Debug, Serialize, Deserialize)]
/// Configuration for the compression plugin.
struct CompressionPluginConfig {
name: String,
}
#[derive(Debug, Serialize, Deserialize)]
/// Configuration for a meta plugin.
struct MetaPluginConfig {
name: String,
#[serde(default)]
options: std::collections::HashMap<String, serde_yaml::Value>,
#[serde(default)]
outputs: std::collections::HashMap<String, String>,
}
use crate::common::schema::{gather_filter_plugin_schemas, gather_meta_plugin_schemas};
use crate::compression_engine::CompressionType;
use crate::config;
/// Generates and prints a default commented YAML configuration template.
///
/// Creates instances of available meta plugins to populate default options and outputs,
/// then serializes the config to YAML with all lines commented for easy editing.
/// Discovers all registered meta plugins, filter plugins, and compression engines
/// at runtime via the plugin schema system. Outputs a commented YAML template
/// with all available plugins and their default options/outputs.
///
/// # Arguments
///
@@ -84,151 +21,244 @@ struct MetaPluginConfig {
/// # Returns
///
/// `Ok(())` on success.
///
/// # Examples
///
/// ```
/// mode_generate_config(&mut cmd, &settings)?;
/// ```
pub fn mode_generate_config(_cmd: &mut Command, _settings: &crate::config::Settings) -> Result<()> {
// Create instances of each meta plugin to get their default options and outputs
let cwd_plugin = crate::meta_plugin::cwd::CwdMetaPlugin::new(None, None);
let digest_plugin = crate::meta_plugin::digest::DigestMetaPlugin::new(None, None);
let hostname_plugin = crate::meta_plugin::hostname::HostnameMetaPlugin::new(None, None);
#[cfg(feature = "magic")]
let magic_file_plugin = crate::meta_plugin::magic_file::MagicFileMetaPlugin::new(None, None);
let env_plugin = crate::meta_plugin::env::EnvMetaPlugin::new(None, None);
let meta_schemas = gather_meta_plugin_schemas();
let filter_schemas = gather_filter_plugin_schemas();
// Create a default configuration
let default_config = DefaultConfig {
dir: Some("~/.local/share/keep".to_string()),
list_format: vec![
ColumnConfig {
name: "id".to_string(),
label: Some("Item".to_string()),
align: ColumnAlignment::Right,
max_len: None,
},
ColumnConfig {
name: "time".to_string(),
label: Some("Time".to_string()),
align: ColumnAlignment::Right,
max_len: None,
},
ColumnConfig {
name: "size".to_string(),
label: Some("Size".to_string()),
align: ColumnAlignment::Right,
max_len: None,
},
ColumnConfig {
name: "tags".to_string(),
label: Some("Tags".to_string()),
align: ColumnAlignment::Left,
max_len: Some("40".to_string()),
},
ColumnConfig {
name: "meta:hostname_full".to_string(),
label: Some("Hostname".to_string()),
align: ColumnAlignment::Left,
max_len: Some("28".to_string()),
},
],
human_readable: false,
output_format: Some("table".to_string()),
quiet: false,
force: false,
server: Some(ServerConfig {
address: Some("127.0.0.1".to_string()),
port: Some(8080),
password_file: None,
password: None,
password_hash: None,
}),
compression_plugin: None,
meta_plugins: Some(vec![
MetaPluginConfig {
name: "cwd".to_string(),
options: cwd_plugin.options().clone(),
outputs: convert_outputs_to_string_map(cwd_plugin.outputs()),
},
MetaPluginConfig {
name: "digest".to_string(),
options: digest_plugin.options().clone(),
outputs: convert_outputs_to_string_map(digest_plugin.outputs()),
},
MetaPluginConfig {
name: "hostname".to_string(),
options: hostname_plugin.options().clone(),
outputs: convert_outputs_to_string_map(hostname_plugin.outputs()),
},
#[cfg(feature = "magic")]
MetaPluginConfig {
name: "magic_file".to_string(),
options: magic_file_plugin.options().clone(),
outputs: convert_outputs_to_string_map(magic_file_plugin.outputs()),
},
MetaPluginConfig {
name: "env".to_string(),
options: env_plugin.options().clone(),
outputs: convert_outputs_to_string_map(env_plugin.outputs()),
},
]),
};
// Build list_format defaults matching config.rs
let list_format = default_list_format();
// Serialize to YAML and comment out all lines
let yaml = serde_yaml::to_string(&default_config)?;
// Build meta_plugins with env as the default (active), rest commented
let meta_plugins = build_meta_plugins_section(&meta_schemas);
// Comment out every line
let commented_yaml = yaml
.lines()
.map(|line| {
if line.trim().is_empty() {
line.to_string()
} else {
format!("# {}", line)
// Build the full YAML
let mut lines = Vec::with_capacity(128);
lines.push("# Keep configuration file".to_string());
lines.push("# Uncomment and modify the settings you need.".to_string());
lines.push(String::new());
// Core settings
lines.push("# Data directory for storing items".to_string());
lines.push("dir: ~/.local/share/keep".to_string());
lines.push(String::new());
// List format
lines.push("# Column configuration for --list output".to_string());
lines.push("list_format:".to_string());
for col in &list_format {
lines.push(format!(" - name: {}", col.name));
lines.push(format!(" label: {}", col.label));
lines.push(format!(" align: {}", col.align));
}
})
.collect::<Vec<String>>()
.join("\n");
lines.push(String::new());
println!("{}", commented_yaml);
// Table config
lines.push("# Table display configuration".to_string());
lines.push("#table_config:".to_string());
lines.push("# style: nothing".to_string());
lines.push("# modifiers: []".to_string());
lines.push("# content_arrangement: dynamic".to_string());
lines.push("# truncation_indicator: \"\"".to_string());
lines.push(String::new());
// Other settings
lines.push("human_readable: false".to_string());
lines.push("output_format: table".to_string());
lines.push("quiet: false".to_string());
lines.push("force: false".to_string());
lines.push(String::new());
// Server config
lines.push("# Server configuration (only used with --server)".to_string());
lines.push("server:".to_string());
lines.push(" address: 127.0.0.1".to_string());
lines.push(" port: 8080".to_string());
lines.push("# username: keep".to_string());
lines.push("# password: null".to_string());
lines.push("# password_file: null".to_string());
lines.push("# password_hash: null".to_string());
lines.push("# jwt_secret: null".to_string());
lines.push("# jwt_secret_file: null".to_string());
lines.push("# cert_file: null".to_string());
lines.push("# key_file: null".to_string());
lines.push("# cors_origin: null".to_string());
lines.push(String::new());
// Compression plugin
lines.push("# Compression plugin to use".to_string());
lines.push("#compression_plugin:".to_string());
let mut comp_types: Vec<String> = CompressionType::iter().map(|ct| ct.to_string()).collect();
comp_types.sort();
for ct in &comp_types {
lines.push(format!("# name: {ct} # {}", compression_description(ct)));
}
lines.push(String::new());
// Meta plugins
lines.push("# Meta plugins to run when saving items".to_string());
lines.push("meta_plugins:".to_string());
for line in &meta_plugins {
lines.push(line.clone());
}
lines.push(String::new());
// Filter plugins reference
if !filter_schemas.is_empty() {
lines.push("# Available filter plugins (use with --filter)".to_string());
for schema in &filter_schemas {
lines.push(format!("# {}", schema.name));
if !schema.description.is_empty() {
lines.push(format!("# {}", schema.description));
}
for opt in &schema.options {
let req = if opt.required { "required" } else { "optional" };
lines.push(format!(
"# {} ({:?}, {})",
opt.name, opt.option_type, req
));
}
}
lines.push(String::new());
}
// Client config
lines.push("# Client configuration (requires client feature)".to_string());
lines.push("#client:".to_string());
lines.push("# url: null".to_string());
lines.push("# username: null".to_string());
lines.push("# password: null".to_string());
lines.push("# jwt: null".to_string());
// Print
for line in &lines {
println!("{line}");
}
Ok(())
}
/// Helper function to convert outputs from serde_yaml::Value to String.
///
/// Handles null (uses key), strings, and other values by serializing to YAML string.
///
/// # Arguments
///
/// * `outputs` - Reference to the outputs HashMap.
///
/// # Returns
///
/// A HashMap with string keys and values.
fn convert_outputs_to_string_map(
outputs: &std::collections::HashMap<String, serde_yaml::Value>,
) -> std::collections::HashMap<String, String> {
let mut result = std::collections::HashMap::new();
for (key, value) in outputs {
match value {
serde_yaml::Value::Null => {
// For null, use the key as the value
result.insert(key.clone(), key.clone());
}
serde_yaml::Value::String(s) => {
result.insert(key.clone(), s.clone());
}
_ => {
// Convert other values to their YAML string representation
result.insert(
key.clone(),
serde_yaml::to_string(value).unwrap_or_default(),
);
}
}
}
result
struct ListColumn {
name: String,
label: String,
align: String,
}
fn default_list_format() -> Vec<ListColumn> {
vec![
ListColumn {
name: "id".into(),
label: "Item".into(),
align: "right".into(),
},
ListColumn {
name: "time".into(),
label: "Time".into(),
align: "right".into(),
},
ListColumn {
name: "size".into(),
label: "Size".into(),
align: "right".into(),
},
ListColumn {
name: "meta:text_line_count".into(),
label: "Lines".into(),
align: "right".into(),
},
ListColumn {
name: "tags".into(),
label: "Tags".into(),
align: "left".into(),
},
ListColumn {
name: "meta:hostname_short".into(),
label: "Host".into(),
align: "left".into(),
},
ListColumn {
name: "meta:command".into(),
label: "Command".into(),
align: "left".into(),
},
]
}
fn build_meta_plugins_section(schemas: &[crate::common::schema::PluginSchema]) -> Vec<String> {
let mut lines = Vec::new();
for (i, schema) in schemas.iter().enumerate() {
let is_default = schema.name == "env";
let prefix = if is_default { "" } else { "# " };
if i > 0 {
lines.push(format!("{prefix}# --- {name} ---", name = schema.name));
}
lines.push(format!("{prefix}- name: {}", schema.name));
// Options
if !schema.options.is_empty() {
lines.push(format!("{prefix} options:"));
for opt in &schema.options {
if let Some(ref default) = opt.default {
let default_str = format_yaml_value(default);
lines.push(format!("{prefix} {}: {}", opt.name, default_str));
} else if opt.required {
lines.push(format!("{prefix} {}: null # required", opt.name));
}
}
} else {
lines.push(format!("{prefix} options: {{}}"));
}
// Outputs
if !schema.outputs.is_empty() {
lines.push(format!("{prefix} outputs:"));
for output in &schema.outputs {
lines.push(format!("{prefix} {}: {}", output.name, output.name));
}
} else {
lines.push(format!("{prefix} outputs: {{}}"));
}
}
lines
}
fn format_yaml_value(value: &serde_yaml::Value) -> String {
match value {
serde_yaml::Value::Null => "null".into(),
serde_yaml::Value::Bool(b) => b.to_string(),
serde_yaml::Value::Number(n) => n.to_string(),
serde_yaml::Value::String(s) => {
if s.contains(' ') || s.contains(':') || s.contains('#') {
format!("\"{s}\"")
} else {
s.clone()
}
}
serde_yaml::Value::Sequence(_) | serde_yaml::Value::Mapping(_) => {
serde_yaml::to_string(value)
.unwrap_or_default()
.trim()
.to_string()
}
serde_yaml::Value::Tagged(_) => serde_yaml::to_string(value)
.unwrap_or_default()
.trim()
.to_string(),
}
}
fn compression_description(name: &str) -> &str {
match name {
"lz4" => "Fast compression (native)",
"gzip" => "Good compression ratio (native)",
"bzip2" => "High compression (requires bzip2 binary)",
"xz" => "Very high compression (requires xz binary)",
"zstd" => "Modern fast compression (requires zstd binary)",
"raw" => "No compression (alias: none)",
_ => "",
}
}


@@ -1,4 +1,4 @@
use anyhow::{Result, anyhow};
use anyhow::{Context, Result, anyhow};
use std::io::Write;
use crate::common::PIPESIZE;
@@ -52,10 +52,10 @@ pub fn mode_get(
let item_service = ItemService::new(data_path.clone());
let item_with_meta = item_service
.find_item(conn, ids, tags, &std::collections::HashMap::new())
.find_item(conn, ids, tags, &settings.meta_filter())
.map_err(|e| anyhow!("Unable to find matching item in database: {}", e))?;
let item_id = item_with_meta.item.id.unwrap();
let item_id = item_with_meta.item.id.context("Item missing ID")?;
// Determine if we should detect binary data
let mut detect_binary = !settings.force && std::io::stdout().is_terminal();
@@ -73,40 +73,39 @@ pub fn mode_get(
}
}
// Get a reader that applies the filters using the pre-parsed filter chain
let (mut reader, _, _) = item_service.get_item_content_info_streaming_with_chain(
conn,
item_id,
filter_chain.as_ref(),
)?;
if detect_binary {
// Read only the first 8192 bytes for binary detection
// Binary detection: sample first 8KB, then create a fresh reader for the full output.
let (mut sample_reader, _, _) = item_service
.get_item_content_info_streaming_with_item(item_with_meta, filter_chain.as_ref())?;
let mut sample_buffer = vec![0; PIPESIZE];
let bytes_read = reader.read(&mut sample_buffer)?;
let bytes_read = sample_reader.read(&mut sample_buffer)?;
if is_binary(&sample_buffer[..bytes_read]) {
return Err(anyhow!(
"Refusing to output binary data to TTY, use --force to override"
));
}
// We need to create a new reader since we consumed some bytes
let (new_reader, _, _) = item_service.get_item_content_info_streaming_with_chain(
// Create fresh reader for actual output (sampling consumed the first reader)
let (reader, _, _) = item_service.get_item_content_info_streaming_with_chain(
conn,
item_id,
filter_chain.as_ref(),
)?;
reader = new_reader;
stream_to_stdout(reader)?;
} else {
// No binary detection needed, use the already-fetched item with meta
let (reader, _, _) = item_service
.get_item_content_info_streaming_with_item(item_with_meta, filter_chain.as_ref())?;
stream_to_stdout(reader)?;
}
- // Stream the content to stdout
- let mut stdout = std::io::stdout();
- let mut buffer = [0; PIPESIZE];
- loop {
- let bytes_read = reader.read(&mut buffer)?;
- if bytes_read == 0 {
- break;
- }
- stdout.write_all(&buffer[..bytes_read])?;
- }
Ok(())
}
+ fn stream_to_stdout(mut reader: Box<dyn Read + Send>) -> Result<()> {
+ let mut stdout = std::io::stdout();
+ crate::common::stream_copy(&mut reader, |chunk| {
+ stdout.write_all(chunk)?;
+ Ok(())
+ })?;
+ Ok(())
+ }
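The binary-detection flow above samples the first `PIPESIZE` bytes before deciding whether to refuse TTY output. A minimal standalone sketch of that pattern, assuming a NUL-byte heuristic for `is_binary` (a common convention in grep/git; the crate's actual implementation may differ) and an illustrative 8 KiB sample size:

```rust
use std::io::Read;

// Assumed sample size; the crate uses PIPESIZE for this.
const SAMPLE_SIZE: usize = 8192;

// Hypothetical heuristic: any NUL byte in the sample means "binary".
fn is_binary(sample: &[u8]) -> bool {
    sample.contains(&0)
}

// Read at most SAMPLE_SIZE bytes from the reader and apply the heuristic.
// Note this consumes bytes, which is why the code above creates a fresh
// reader for the actual output after sampling.
fn looks_binary<R: Read>(reader: &mut R) -> std::io::Result<bool> {
    let mut buf = vec![0u8; SAMPLE_SIZE];
    let n = reader.read(&mut buf)?;
    Ok(is_binary(&buf[..n]))
}
```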

src/modes/import.rs Normal file

@@ -0,0 +1,192 @@
use anyhow::{Context, Result, anyhow};
use chrono::{DateTime, Utc};
use clap::Command;
use log::debug;
use std::collections::HashMap;
use std::fs;
use std::io::{Read, Write};
use std::path::PathBuf;
use std::str::FromStr;
use crate::common::PIPESIZE;
use crate::compression_engine::CompressionType;
use crate::config;
use crate::db;
use crate::import_tar;
use crate::modes::common::ImportMeta;
/// Import items from a `.keep.tar` archive or legacy `.meta.yml` file.
///
/// For `.keep.tar` files, all items are imported in their original ID order,
/// each receiving a new auto-incremented ID from the database.
/// For `.meta.yml` files, the legacy single-item import is used.
pub fn mode_import(
cmd: &mut Command,
settings: &config::Settings,
import_path: &str,
conn: &mut rusqlite::Connection,
data_path: PathBuf,
) -> Result<()> {
let path = PathBuf::from(import_path);
if import_path.ends_with(".keep.tar") {
// New tar-based import
let imported_ids = import_tar::import_from_tar(&path, conn, &data_path)?;
if !settings.quiet {
println!(
"KEEP: Imported {} item(s): {:?}",
imported_ids.len(),
imported_ids
);
}
debug!(
"IMPORT: Imported {} items from {}",
imported_ids.len(),
import_path
);
} else if import_path.ends_with(".meta.yml") {
// Legacy single-item import
import_legacy(cmd, settings, import_path, conn, data_path)?;
} else {
cmd.error(
clap::error::ErrorKind::InvalidValue,
format!("Unsupported import format: {}", import_path),
)
.exit();
}
Ok(())
}
/// Legacy single-item import from a `.meta.yml` file.
fn import_legacy(
cmd: &mut Command,
settings: &config::Settings,
meta_file: &str,
conn: &mut rusqlite::Connection,
data_path: PathBuf,
) -> Result<()> {
// Read metadata
let meta_yaml = fs::read_to_string(meta_file)
.with_context(|| format!("Cannot read metadata file: {meta_file}"))?;
let import_meta: ImportMeta = serde_yaml::from_str(&meta_yaml)
.with_context(|| format!("Cannot parse metadata file: {meta_file}"))?;
// Validate compression type
CompressionType::from_str(&import_meta.compression).map_err(|_| {
anyhow!(
"Invalid compression type '{}' in metadata file",
import_meta.compression
)
})?;
debug!(
"IMPORT: Parsed meta: ts={}, compression={}, tags={:?}",
import_meta.ts, import_meta.compression, import_meta.tags
);
// Create item with original timestamp
let item = db::insert_item_with_ts(conn, import_meta.ts, &import_meta.compression)?;
let item_id = item.id.context("New item missing ID")?;
debug!(
"IMPORT: Created item {} with compression {}",
item_id, import_meta.compression
);
// Set tags
if !import_meta.tags.is_empty() {
db::set_item_tags(conn, item.clone(), &import_meta.tags)?;
debug!("IMPORT: Set {} tags", import_meta.tags.len());
}
// Write data to storage using streaming copy
let mut item_path = data_path;
item_path.push(item_id.to_string());
let data_size: i64 = if let Some(ref data_file) = settings.import_data_file {
// Stream from file to storage using fixed-size buffers
let mut reader = fs::File::open(data_file)
.with_context(|| format!("Cannot read data file: {}", data_file.display()))?;
let mut writer = fs::File::create(&item_path)
.with_context(|| format!("Cannot create item file: {}", item_path.display()))?;
let mut buf = [0u8; PIPESIZE];
let mut total = 0i64;
loop {
let n = reader.read(&mut buf)?;
if n == 0 {
break;
}
writer.write_all(&buf[..n])?;
total += n as i64;
}
total
} else {
// Stream from stdin to storage
let mut writer = fs::File::create(&item_path)
.with_context(|| format!("Cannot create item file: {}", item_path.display()))?;
let mut stdin = std::io::stdin().lock();
let mut buf = [0u8; PIPESIZE];
let mut total = 0i64;
loop {
let n = stdin.read(&mut buf)?;
if n == 0 {
break;
}
writer.write_all(&buf[..n])?;
total += n as i64;
}
total
};
if data_size == 0 {
cmd.error(
clap::error::ErrorKind::InvalidValue,
"No data provided (empty file or stdin)",
)
.exit();
}
debug!(
"IMPORT: Wrote {} bytes to {}",
data_size,
item_path.display()
);
// Set metadata
for (key, value) in &import_meta.metadata {
db::query_upsert_meta(
conn,
db::Meta {
id: item_id,
name: key.clone(),
value: value.clone(),
},
)?;
}
if !import_meta.metadata.is_empty() {
debug!(
"IMPORT: Set {} metadata entries",
import_meta.metadata.len()
);
}
// Update item sizes (use imported size if available, otherwise data length)
let size_to_record = import_meta.uncompressed_size.unwrap_or(data_size);
let mut updated_item = item;
updated_item.uncompressed_size = Some(size_to_record);
updated_item.compressed_size = Some(std::fs::metadata(&item_path)?.len() as i64);
updated_item.closed = true;
db::update_item(conn, updated_item)?;
if !settings.quiet {
println!(
"KEEP: Imported item {} tags: {:?}",
item_id, import_meta.tags
);
}
Ok(())
}
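Both import paths above (file and stdin) repeat the same fixed-buffer read/write loop. A sketch of extracting that loop into a shared helper that returns the byte count, assuming a hypothetical `PIPESIZE` value (the crate defines its own in `crate::common`):

```rust
use std::io::{Read, Write};

// Assumed buffer size for illustration; the crate's PIPESIZE may differ.
const PIPESIZE: usize = 65536;

// Copy reader to writer in fixed-size chunks, returning total bytes copied.
// Mirrors the loops in import_legacy above.
fn copy_stream<R: Read, W: Write>(reader: &mut R, writer: &mut W) -> std::io::Result<i64> {
    let mut buf = [0u8; PIPESIZE];
    let mut total: i64 = 0;
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break; // EOF
        }
        writer.write_all(&buf[..n])?;
        total += n as i64;
    }
    Ok(total)
}
```

With this helper, the file branch becomes `copy_stream(&mut reader, &mut writer)?` and the stdin branch `copy_stream(&mut stdin, &mut writer)?`.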


@@ -1,7 +1,7 @@
use crate::config;
- use crate::modes::common::{OutputFormat, format_size};
+ use crate::modes::common::{DisplayItemInfo, OutputFormat, format_size, render_item_info_table};
use crate::services::types::ItemWithMeta;
- use anyhow::{Result, anyhow};
+ use anyhow::{Context, Result, anyhow};
use clap::Command;
use clap::error::ErrorKind;
use serde::{Deserialize, Serialize};
@@ -9,7 +9,6 @@ use std::path::PathBuf;
use crate::services::item_service::ItemService;
use chrono::prelude::*;
- use comfy_table::{Attribute, Cell};
/// Displays detailed information about an item or the last item if no ID/tags specified.
///
@@ -36,7 +35,8 @@ use comfy_table::{Attribute, Cell};
///
/// # Examples
///
- /// ```
+ /// ```ignore
+ /// // Example usage requires Command, Settings, Connection, and PathBuf instances
/// mode_info(&mut cmd, &settings, &mut vec![123], &mut vec![], &mut conn, data_path)?;
/// ```
pub fn mode_info(
@@ -64,9 +64,8 @@ pub fn mode_info(
// If both are empty, find_item will find the last item
let item_service = ItemService::new(data_path.clone());
- // Use empty metadata HashMap
let item_with_meta = item_service
- .find_item(conn, ids, tags, &std::collections::HashMap::new())
+ .find_item(conn, ids, tags, &settings.meta_filter())
.map_err(|e| anyhow!("Unable to find matching item in database: {}", e))?;
show_item(item_with_meta, settings, data_path)
@@ -124,7 +123,8 @@ pub struct ItemInfo {
///
/// # Examples
///
- /// ```
+ /// ```ignore
+ /// // Example usage requires ItemWithMeta, Settings, and PathBuf instances
/// show_item(item_with_meta, &settings, data_path)?;
/// ```
fn show_item(
@@ -138,77 +138,44 @@ fn show_item(
return show_item_structured(item_with_meta, settings, data_path, output_format);
}
+ let item_tags = item_with_meta.tag_names();
let item = item_with_meta.item;
- let item_id = item.id.unwrap();
- let item_tags: Vec<String> = item_with_meta.tags.iter().map(|t| t.name.clone()).collect();
- let mut table = crate::modes::common::create_table(false);
- // Add all the rows
- table.add_row(vec![
- Cell::new("ID").add_attribute(Attribute::Bold),
- Cell::new(item_id.to_string()),
- ]);
- let timestamp_str = item.ts.with_timezone(&Local).format("%F %T %Z").to_string();
- table.add_row(vec![
- Cell::new("Timestamp").add_attribute(Attribute::Bold),
- Cell::new(&timestamp_str),
- ]);
+ let item_id = item.id.context("Item missing ID")?;
let mut item_path_buf = data_path.clone();
- item_path_buf.push(item.id.unwrap().to_string());
- let path_str = item_path_buf
- .to_str()
- .expect("Unable to get item path")
- .to_string();
- table.add_row(vec![
- Cell::new("Path").add_attribute(Attribute::Bold),
- Cell::new(&path_str),
- ]);
+ item_path_buf.push(item_id.to_string());
- let size_str = match item.size {
+ let size_str = match item.uncompressed_size {
Some(size) => format_size(size as u64, settings.human_readable),
None => "Missing".to_string(),
};
- table.add_row(vec![
- Cell::new("Stream Size").add_attribute(Attribute::Bold),
- Cell::new(&size_str),
- ]);
- table.add_row(vec![
- Cell::new("Compression").add_attribute(Attribute::Bold),
- Cell::new(&item.compression),
- ]);
let file_size_str = match item_path_buf.metadata() {
Ok(metadata) => format_size(metadata.len(), settings.human_readable),
Err(_) => "Missing".to_string(),
};
- table.add_row(vec![
- Cell::new("File Size").add_attribute(Attribute::Bold),
- Cell::new(&file_size_str),
- ]);
- let tags_str = item_tags.join(" ");
- table.add_row(vec![
- Cell::new("Tags").add_attribute(Attribute::Bold),
- Cell::new(&tags_str),
- ]);
+ let metadata: Vec<(String, String)> = item_with_meta
+ .meta
+ .iter()
+ .map(|m| (m.name.clone(), m.value.clone()))
+ .collect();
- // Add meta rows
- for meta in item_with_meta.meta {
- let meta_name = format!("Meta: {}", &meta.name);
- table.add_row(vec![
- Cell::new(&meta_name).add_attribute(Attribute::Bold),
- Cell::new(&meta.value),
- ]);
- }
+ let display = DisplayItemInfo {
+ id: item_id,
+ timestamp: item.ts.with_timezone(&Local).format("%F %T %Z").to_string(),
+ path: item_path_buf
+ .to_str()
+ .ok_or_else(|| anyhow::anyhow!("non-UTF-8 item path"))?
+ .to_string(),
+ stream_size: size_str,
+ compression: item.compression.clone(),
+ file_size: file_size_str,
+ tags: item_tags,
+ metadata,
+ };
- println!(
- "{}",
- crate::modes::common::trim_lines_end(&table.trim_fmt())
- );
+ render_item_info_table(&display, &settings.table_config);
Ok(())
}
@@ -234,7 +201,8 @@ fn show_item(
///
/// # Examples
///
- /// ```
+ /// ```ignore
+ /// // Example usage requires ItemWithMeta, Settings, PathBuf, and OutputFormat instances
/// show_item_structured(item_with_meta, &settings, data_path, OutputFormat::Json)?;
/// ```
fn show_item_structured(
@@ -243,10 +211,10 @@ fn show_item_structured(
data_path: PathBuf,
output_format: OutputFormat,
) -> Result<()> {
- let item_tags: Vec<String> = item_with_meta.tags.iter().map(|t| t.name.clone()).collect();
+ let item_tags = item_with_meta.tag_names();
let meta_map = item_with_meta.meta_as_map();
let item = item_with_meta.item;
- let item_id = item.id.unwrap();
+ let item_id = item.id.context("Item missing ID")?;
let mut item_path_buf = data_path.clone();
item_path_buf.push(item_id.to_string());
@@ -257,7 +225,7 @@ fn show_item_structured(
None => "Missing".to_string(),
};
- let stream_size_formatted = match item.size {
+ let stream_size_formatted = match item.uncompressed_size {
Some(size) => format_size(size as u64, settings.human_readable),
None => "Missing".to_string(),
};
@@ -270,7 +238,7 @@ fn show_item_structured(
.format("%F %T %Z")
.to_string(),
path: item_path_buf.to_str().unwrap_or("").to_string(),
- stream_size: item.size.map(|s| s as u64),
+ stream_size: item.uncompressed_size.map(|s| s as u64),
stream_size_formatted,
compression: item.compression,
file_size,
@@ -279,15 +247,7 @@ fn show_item_structured(
meta: meta_map,
};
- match output_format {
- OutputFormat::Json => {
- println!("{}", serde_json::to_string_pretty(&item_info)?);
- }
- OutputFormat::Yaml => {
- println!("{}", serde_yaml::to_string(&item_info)?);
- }
- OutputFormat::Table => unreachable!(),
- }
+ crate::modes::common::print_serialized(&item_info, &output_format)?;
Ok(())
}


@@ -5,10 +5,10 @@
/// including table, JSON, and YAML.
use crate::config;
use crate::modes::common::ColumnType;
- use crate::modes::common::{OutputFormat, format_size};
+ use crate::modes::common::{OutputFormat, apply_color, apply_table_attribute, format_size};
use crate::services::item_service::ItemService;
use crate::services::types::ItemWithMeta;
- use anyhow::Result;
+ use anyhow::{Context, Result};
use comfy_table::CellAlignment;
use comfy_table::{Attribute, Cell, Color, Row};
use serde::{Deserialize, Serialize};
@@ -63,88 +63,6 @@ struct ListItem {
meta: std::collections::HashMap<String, String>,
}
- // Helper function to apply color to a cell.
- ///
- /// This function converts the configuration color to a comfy-table Color and
- /// applies it to the cell as foreground or background color.
- ///
- /// # Arguments
- ///
- /// * `cell` - The cell to modify.
- /// * `color` - The color from configuration to apply.
- /// * `is_foreground` - True for foreground color, false for background.
- ///
- /// # Returns
- ///
- /// The modified cell with color applied.
- fn apply_color(mut cell: Cell, color: &crate::config::TableColor, is_foreground: bool) -> Cell {
- use crate::config::TableColor::*;
- use comfy_table::Color;
- let comfy_color = match color {
- Black => Color::Black,
- Red => Color::Red,
- Green => Color::Green,
- Yellow => Color::Yellow,
- Blue => Color::Blue,
- Magenta => Color::Magenta,
- Cyan => Color::Cyan,
- White => Color::White,
- Gray => Color::Grey,
- DarkRed => Color::DarkRed,
- DarkGreen => Color::DarkGreen,
- DarkYellow => Color::DarkYellow,
- DarkBlue => Color::DarkBlue,
- DarkMagenta => Color::DarkMagenta,
- DarkCyan => Color::DarkCyan,
- Rgb(r, g, b) => Color::Rgb {
- r: *r,
- g: *g,
- b: *b,
- },
- };
- if is_foreground {
- cell = cell.fg(comfy_color);
- } else {
- cell = cell.bg(comfy_color);
- }
- cell
- }
- // Helper function to apply attribute to a cell.
- ///
- /// This function applies a single table attribute to the cell based on the
- /// configuration attribute type.
- ///
- /// # Arguments
- ///
- /// * `cell` - The cell to modify.
- /// * `attribute` - The attribute from configuration to apply.
- ///
- /// # Returns
- ///
- /// The modified cell with attribute applied.
- fn apply_attribute(mut cell: Cell, attribute: &crate::config::TableAttribute) -> Cell {
- use crate::config::TableAttribute::*;
- use comfy_table::Attribute;
- match attribute {
- Bold => cell = cell.add_attribute(Attribute::Bold),
- Dim => cell = cell.add_attribute(Attribute::Dim),
- Italic => cell = cell.add_attribute(Attribute::Italic),
- Underlined => cell = cell.add_attribute(Attribute::Underlined),
- SlowBlink => cell = cell.add_attribute(Attribute::SlowBlink),
- RapidBlink => cell = cell.add_attribute(Attribute::RapidBlink),
- Reverse => cell = cell.add_attribute(Attribute::Reverse),
- Hidden => cell = cell.add_attribute(Attribute::Hidden),
- CrossedOut => cell = cell.add_attribute(Attribute::CrossedOut),
- }
- cell
- }
/// Main list mode function.
///
/// This function handles the listing of items based on tags, applying formatting
@@ -163,23 +81,24 @@ fn apply_attribute(mut cell: Cell, attribute: &crate::config::TableAttribute) ->
///
/// * `Result<()>` - Success or error if listing fails.
pub fn mode_list(
- cmd: &mut clap::Command,
+ _cmd: &mut clap::Command,
settings: &config::Settings,
ids: &mut [i64],
tags: &[String],
conn: &mut rusqlite::Connection,
data_path: std::path::PathBuf,
) -> Result<()> {
- if !ids.is_empty() {
- cmd.error(
- clap::error::ErrorKind::InvalidValue,
- "ID given, you can only supply tags when using --list",
- )
- .exit();
- }
let item_service = ItemService::new(data_path.clone());
- let items_with_meta = item_service.list_items(conn, tags, &std::collections::HashMap::new())?;
+ let items_with_meta = item_service.get_items(conn, ids, tags, &settings.meta_filter())?;
if settings.ids_only {
for item_with_meta in &items_with_meta {
if let Some(id) = item_with_meta.item.id {
println!("{id}");
}
}
return Ok(());
}
let output_format = crate::modes::common::settings_output_format(settings);
@@ -197,12 +116,12 @@ pub fn mode_list(
table.set_header(header_cells);
for item_with_meta in items_with_meta {
- let tags: Vec<String> = item_with_meta.tags.iter().map(|t| t.name.clone()).collect();
+ let tags = item_with_meta.tag_names();
let meta = item_with_meta.meta_as_map();
let item = item_with_meta.item;
let mut item_path = data_path.clone();
- item_path.push(item.id.unwrap().to_string());
+ item_path.push(item.id.context("Item missing ID")?.to_string());
let mut table_row = Row::new();
@@ -210,7 +129,7 @@ pub fn mode_list(
let column_type = column
.name
.parse::<ColumnType>()
- .unwrap_or_else(|_| panic!("Unknown column {:?}", column.name))
+ .with_context(|| format!("Unknown column type {:?} in list format", column.name))?;
let mut meta_name: Option<&str> = None;
@@ -228,19 +147,29 @@ pub fn mode_list(
.with_timezone(&chrono::Local)
.format("%F %T")
.to_string(),
- ColumnType::Size => match item.size {
+ ColumnType::Size => match item.uncompressed_size {
Some(size) => format_size(size as u64, settings.human_readable),
None => match item_path.metadata() {
Ok(_) => "Unknown".to_string(),
- Err(_) => "Missing".to_string(),
+ Err(e) => {
+ log::warn!("File missing or inaccessible: {}", e);
+ "Missing".to_string()
+ }
},
},
ColumnType::Compression => item.compression.to_string(),
ColumnType::FileSize => match item_path.metadata() {
Ok(metadata) => format_size(metadata.len(), settings.human_readable),
- Err(_) => "Missing".to_string(),
+ Err(e) => {
+ log::warn!("File missing or inaccessible: {}", e);
+ "Missing".to_string()
+ }
},
- ColumnType::FilePath => item_path.clone().into_os_string().into_string().unwrap(),
+ ColumnType::FilePath => item_path
+ .clone()
+ .into_os_string()
+ .into_string()
+ .unwrap_or_else(|os| os.to_string_lossy().into_owned()),
ColumnType::Tags => tags.join(" "),
ColumnType::Meta => match meta_name {
Some(meta_name) => match meta.get(meta_name) {
@@ -278,7 +207,7 @@ pub fn mode_list(
}
for attribute in &column.attributes {
- cell = apply_attribute(cell, attribute);
+ cell = apply_table_attribute(cell, attribute);
}
// Apply padding if specified
@@ -290,7 +219,7 @@ pub fn mode_list(
// Apply styling for specific cases
match column_type {
ColumnType::Size => {
- if item.size.is_none() {
+ if item.uncompressed_size.is_none() {
if item_path.metadata().is_ok() {
cell = cell
.fg(comfy_table::Color::Yellow)
@@ -340,10 +269,10 @@ fn show_list_structured(
let mut list_items = Vec::new();
for item_with_meta in items_with_meta {
- let tags: Vec<String> = item_with_meta.tags.iter().map(|t| t.name.clone()).collect();
+ let tags = item_with_meta.tag_names();
let meta = item_with_meta.meta_as_map();
let item = item_with_meta.item;
- let item_id = item.id.unwrap();
+ let item_id = item.id.context("Item missing ID")?;
let mut item_path = data_path.clone();
item_path.push(item_id.to_string());
@@ -354,7 +283,7 @@ fn show_list_structured(
None => "Missing".to_string(),
};
- let size_formatted = match item.size {
+ let size_formatted = match item.uncompressed_size {
Some(size) => crate::modes::common::format_size(size as u64, settings.human_readable),
None => "Unknown".to_string(),
};
@@ -366,7 +295,7 @@ fn show_list_structured(
.with_timezone(&chrono::Local)
.format("%F %T")
.to_string(),
- size: item.size.map(|s| s as u64),
+ size: item.uncompressed_size.map(|s| s as u64),
size_formatted,
compression: item.compression,
file_size,
@@ -379,15 +308,7 @@ fn show_list_structured(
list_items.push(list_item);
}
- match output_format {
- OutputFormat::Json => {
- println!("{}", serde_json::to_string_pretty(&list_items)?);
- }
- OutputFormat::Yaml => {
- println!("{}", serde_yaml::to_string(&list_items)?);
- }
- OutputFormat::Table => unreachable!(),
- }
+ crate::modes::common::print_serialized(&list_items, &output_format)?;
Ok(())
}
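Several list columns above are rendered through `format_size` with a `human_readable` flag. A plausible standalone sketch of such a formatter, assuming binary (1024-based) units and one decimal place of rounding — the crate's exact unit labels and rounding are assumptions:

```rust
// Hypothetical reimplementation for illustration; not the crate's format_size.
fn format_size(size: u64, human_readable: bool) -> String {
    if !human_readable {
        return size.to_string();
    }
    const UNITS: [&str; 5] = ["B", "KiB", "MiB", "GiB", "TiB"];
    let mut value = size as f64;
    let mut unit = 0;
    // Scale down by 1024 until the value fits the current unit.
    while value >= 1024.0 && unit < UNITS.len() - 1 {
        value /= 1024.0;
        unit += 1;
    }
    if unit == 0 {
        format!("{} {}", size, UNITS[unit])
    } else {
        format!("{:.1} {}", value, UNITS[unit])
    }
}
```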


@@ -1,18 +1,24 @@
+ #[cfg(feature = "server")]
pub mod server;
+ #[cfg(feature = "client")]
pub mod client;
/// Common utilities for all modes, including column types and output formatting.
pub mod common;
pub mod delete;
pub mod diff;
pub mod export;
pub mod generate_config;
pub mod get;
+ pub mod import;
pub mod info;
pub mod list;
pub mod save;
pub mod status;
pub mod status_plugins;
pub mod update;
/// Column types, output formats, and formatting utilities shared across modes.
pub use common::{ColumnType, OutputFormat, format_size, settings_output_format};
@@ -23,12 +29,18 @@ pub use delete::mode_delete;
/// Compares two items and shows differences.
pub use diff::mode_diff;
/// Exports an item to data and metadata files.
pub use export::mode_export;
/// Generates a default configuration file.
pub use generate_config::mode_generate_config;
/// Retrieves and outputs item content.
pub use get::mode_get;
+ /// Imports an item from metadata and data files.
+ pub use import::mode_import;
/// Displays detailed information about items.
pub use info::mode_info;
@@ -47,3 +59,6 @@ pub use status::mode_status;
/// Lists available plugins and their configurations.
pub use status_plugins::mode_status_plugins;
/// Updates an item's tags and metadata by ID.
pub use update::mode_update;


@@ -63,7 +63,7 @@ impl<R: Read, W: Write> Read for TeeReader<R, W> {
///
/// # Examples
///
- /// ```
+ /// ```ignore
/// let mut tee = TeeReader {
/// reader: std::io::Cursor::new(b"Hello, world!"),
/// writer: std::io::sink(),
@@ -104,7 +104,7 @@ impl<R: Read, W: Write> Read for TeeReader<R, W> {
///
/// # Examples
///
- /// ```
+ /// ```ignore
/// // In CLI context, this would be called internally
/// mode_save(&mut cmd, &settings, &mut vec![], &mut vec!["important".to_string()], &mut conn, data_path)?;
/// ```


@@ -1,16 +1,16 @@
use axum::{
- http::{header, StatusCode},
+ http::{StatusCode, header},
response::Response,
};
- use serde::Serialize;
+ use log;
+ use serde::Serialize;
pub struct ResponseBuilder;
impl ResponseBuilder {
pub fn json<T: Serialize>(data: T) -> Result<Response, StatusCode> {
let json = serde_json::to_vec(&data).map_err(|e| {
- log::warn!("Failed to serialize response: {}", e);
+ log::warn!("Failed to serialize response: {e}");
StatusCode::INTERNAL_SERVER_ERROR
})?;
@@ -19,7 +19,7 @@ impl ResponseBuilder {
.header(header::CONTENT_LENGTH, json.len().to_string())
.body(axum::body::Body::from(json))
.map_err(|e| {
- log::warn!("Failed to build response: {}", e);
+ log::warn!("Failed to build response: {e}");
StatusCode::INTERNAL_SERVER_ERROR
})
}
@@ -30,7 +30,7 @@ impl ResponseBuilder {
.header(header::CONTENT_LENGTH, content.len().to_string())
.body(axum::body::Body::from(content.to_vec()))
.map_err(|e| {
- log::warn!("Failed to build response: {}", e);
+ log::warn!("Failed to build response: {e}");
StatusCode::INTERNAL_SERVER_ERROR
})
}

File diff suppressed because it is too large


@@ -1,72 +0,0 @@
use axum::{
extract::State,
http::StatusCode,
response::sse::{Event, KeepAlive, Sse},
};
use futures::stream::{self, Stream};
use log::{debug, info};
use std::convert::Infallible;
use std::time::Duration;
use crate::modes::server::common::AppState;
use crate::modes::server::mcp::KeepMcpServer;
#[utoipa::path(
get,
path = "/mcp/sse",
operation_id = "mcp_sse",
summary = "MCP SSE endpoint",
description = "Server-Sent Events for Model Context Protocol. Enables AI tools to interact with Keep's storage and retrieval functions.",
responses(
(status = 200, description = "SSE stream established"),
(status = 401, description = "Unauthorized"),
(status = 500, description = "Internal server error")
),
security(
("bearerAuth" = [])
),
tag = "mcp"
)]
pub async fn handle_mcp_sse(
State(state): State<AppState>,
) -> Result<Sse<impl Stream<Item = Result<Event, Infallible>>>, StatusCode> {
debug!("MCP: Starting SSE endpoint");
let _mcp_server = KeepMcpServer::new(state);
// Create a simple message channel for SSE communication
let (tx, rx) = tokio::sync::mpsc::unbounded_channel::<String>();
// Send initial connection message
let _ = tx.send("data: {\"type\":\"connection\",\"status\":\"connected\"}\n\n".to_string());
// For now, create a simple stream that sends periodic keep-alive messages
// In a full implementation, this would integrate with the rmcp transport layer
let stream = stream::unfold((rx, tx), |(mut rx, tx)| async move {
tokio::select! {
msg = rx.recv() => {
match msg {
Some(data) => {
let event = Event::default().data(data);
Some((Ok(event), (rx, tx)))
}
None => None,
}
}
_ = tokio::time::sleep(Duration::from_secs(30)) => {
let event = Event::default()
.event("keep-alive")
.data("ping");
Some((Ok(event), (rx, tx)))
}
}
});
info!("MCP: SSE endpoint established");
Ok(Sse::new(stream).keep_alive(
KeepAlive::new()
.interval(Duration::from_secs(30))
.text("keep-alive"),
))
}


@@ -1,10 +1,11 @@
#[cfg(feature = "swagger")]
pub mod common;
pub mod item;
#[cfg(feature = "mcp")]
pub mod mcp;
pub mod status;
- use axum::{Router, routing::get};
+ use axum::{
+ Router,
+ routing::{delete, get, post},
+ };
use crate::modes::server::common::AppState;
use utoipa::OpenApi;
@@ -53,12 +54,14 @@ use utoipa_swagger_ui::SwaggerUi;
(url = "/", description = "Local server")
)
)]
#[allow(dead_code)]
struct ApiDoc;
pub fn add_routes(router: Router<AppState>) -> Router<AppState> {
- let router = router
+ router
// Status endpoints
.route("/api/status", get(status::handle_status))
.route("/api/plugins/status", get(status::handle_plugins_status))
// Item endpoints
.route(
"/api/item/",
@@ -72,18 +75,20 @@ pub fn add_routes(router: Router<AppState>) -> Router<AppState> {
"/api/item/latest/content",
get(item::handle_get_item_latest_content),
)
- .route("/api/item/{item_id}/meta", get(item::handle_get_item_meta))
+ .route(
+ "/api/item/{item_id}/meta",
+ get(item::handle_get_item_meta).post(item::handle_post_item_meta),
+ )
.route(
"/api/item/{item_id}/content",
get(item::handle_get_item_content),
- );
- #[cfg(feature = "mcp")]
- {
- router = router.route("/mcp/sse", get(mcp::handle_mcp_sse));
- }
- router
+ )
+ .route("/api/item/{item_id}", delete(item::handle_delete_item))
+ .route("/api/item/{item_id}/info", get(item::handle_get_item_info))
+ .route("/api/item/{item_id}/update", post(item::handle_update_item))
+ .route("/api/diff", get(item::handle_diff_items))
+ .route("/api/export", get(item::handle_export_items))
+ .route("/api/import", post(item::handle_import_items))
}
#[cfg(feature = "swagger")]


@@ -1,6 +1,32 @@
use axum::{extract::State, http::StatusCode, response::Json};
- use crate::modes::server::common::{AppState, StatusInfoResponse};
+ use crate::modes::server::common::{ApiResponse, AppState, StatusInfoResponse};
+ async fn generate_status(
+ state: &AppState,
+ ) -> Result<crate::common::status::StatusInfo, StatusCode> {
+ let db_path = state
+ .db
+ .lock()
+ .await
+ .path()
+ .unwrap_or("unknown")
+ .to_string();
+ let status_service = crate::services::status_service::StatusService::new();
+ let mut cmd = state.cmd.lock().await;
+ status_service
+ .generate_status(
+ &mut cmd,
+ &state.settings,
+ state.data_dir.clone(),
+ db_path.into(),
+ )
+ .map_err(|e| {
+ log::warn!("Failed to generate status: {e}");
+ StatusCode::INTERNAL_SERVER_ERROR
+ })
+ }
#[utoipa::path(
get,
@@ -39,7 +65,7 @@ use crate::modes::server::common::{AppState, StatusInfoResponse};
///
/// # Examples
///
- /// ```
+ /// ```ignore
/// // In an Axum app:
/// async fn app() -> Result<Json<StatusInfoResponse>, StatusCode> {
/// handle_status(State(app_state)).await
@@ -48,24 +74,7 @@ use crate::modes::server::common::{AppState, StatusInfoResponse};
pub async fn handle_status(
State(state): State<AppState>,
) -> Result<Json<StatusInfoResponse>, StatusCode> {
- // Get database path
- let db_path = state
- .db
- .lock()
- .await
- .path()
- .unwrap_or("unknown")
- .to_string();
- // Use the status service to generate status info showing configured plugins
- let status_service = crate::services::status_service::StatusService::new();
- let mut cmd = state.cmd.lock().await;
- let status_info = status_service.generate_status(
- &mut cmd,
- &state.settings,
- state.data_dir.clone(),
- db_path.into(),
- );
+ let status_info = generate_status(&state).await?;
let response = StatusInfoResponse {
success: true,
@@ -75,3 +84,46 @@ pub async fn handle_status(
Ok(Json(response))
}
+ #[derive(Debug, serde::Serialize, serde::Deserialize, utoipa::ToSchema)]
+ pub struct PluginsStatusResponse {
+ pub meta_plugins: std::collections::HashMap<String, crate::common::status::MetaPluginInfo>,
+ pub filter_plugins: Vec<crate::common::status::FilterPluginInfo>,
+ pub compression: Vec<crate::common::status::CompressionInfo>,
+ }
+ #[utoipa::path(
+ get,
+ path = "/api/plugins/status",
+ operation_id = "keep_plugins_status",
+ summary = "Get plugins status",
+ description = "Retrieve detailed status of all available plugins including meta, filter, and compression plugins.",
+ responses(
+ (status = 200, description = "Plugins status retrieved", body = ApiResponse<PluginsStatusResponse>),
+ (status = 401, description = "Unauthorized"),
+ (status = 500, description = "Internal server error")
+ ),
+ security(
+ ("bearerAuth" = [])
+ ),
+ tag = "status"
+ )]
+ pub async fn handle_plugins_status(
+ State(state): State<AppState>,
+ ) -> Result<Json<crate::modes::server::common::ApiResponse<PluginsStatusResponse>>, StatusCode> {
+ let status_info = generate_status(&state).await?;
+ let response_data = PluginsStatusResponse {
+ meta_plugins: status_info.meta_plugins,
+ filter_plugins: status_info.filter_plugins,
+ compression: status_info.compression,
+ };
+ let response = crate::modes::server::common::ApiResponse::<PluginsStatusResponse> {
+ success: true,
+ data: Some(response_data),
+ error: None,
+ };
+ Ok(Json(response))
+ }

src/modes/server/auth.rs Normal file

@@ -0,0 +1,118 @@
use axum::http::Method;
use jsonwebtoken::{DecodingKey, TokenData, Validation, decode};
use log::debug;
use serde::Deserialize;
/// JWT claims for permission-based access control.
///
/// External token generators should include these claims in the JWT payload.
/// The server validates the signature and checks permissions for each request.
///
/// # Example token payload
///
/// ```json
/// {
/// "sub": "my-client",
/// "exp": 1735689600,
/// "read": true,
/// "write": true,
/// "delete": false
/// }
/// ```
#[derive(Debug, Deserialize)]
pub struct Claims {
/// Subject (client identifier).
pub sub: String,
/// Expiration time (Unix timestamp).
pub exp: usize,
/// Read permission (GET requests).
#[serde(default)]
pub read: bool,
/// Write permission (POST/PUT requests).
#[serde(default)]
pub write: bool,
/// Delete permission (DELETE requests).
#[serde(default)]
pub delete: bool,
}
/// Returns the required permission for an HTTP method.
///
/// # Mapping
///
/// - GET, HEAD → "read"
/// - POST, PUT, PATCH → "write"
/// - DELETE → "delete"
///
/// # Arguments
///
/// * `method` - The HTTP method of the incoming request.
///
/// # Returns
///
/// A string slice representing the required permission.
pub fn required_permission(method: &Method) -> &'static str {
if method == Method::GET || method == Method::HEAD {
"read"
} else if method == Method::DELETE {
"delete"
} else {
"write"
}
}
/// Checks if the JWT claims grant the required permission.
///
/// # Arguments
///
/// * `claims` - The validated JWT claims.
/// * `permission` - The required permission string ("read", "write", or "delete").
///
/// # Returns
///
/// `true` if the claims grant the permission, `false` otherwise.
pub fn check_permission(claims: &Claims, permission: &str) -> bool {
match permission {
"read" => claims.read,
"write" => claims.write,
"delete" => claims.delete,
_ => false,
}
}
/// Validates a JWT token and returns the claims.
///
/// Uses HMAC-SHA256 signature verification with the provided secret.
///
/// # Arguments
///
/// * `token` - The JWT token string (without "Bearer " prefix).
/// * `secret` - The secret key used to verify the signature.
///
/// # Returns
///
/// * `Ok(Claims)` - The validated claims if the token is valid.
/// * `Err(String)` - A human-readable error message if validation fails.
pub fn validate_jwt(token: &str, secret: &str) -> Result<Claims, String> {
let mut validation = Validation::new(jsonwebtoken::Algorithm::HS256);
validation.algorithms = vec![jsonwebtoken::Algorithm::HS256];
validation.set_required_spec_claims(&["exp", "sub"]);
let token_data: TokenData<Claims> = decode::<Claims>(
token,
&DecodingKey::from_secret(secret.as_bytes()),
&validation,
)
.map_err(|e| {
debug!("JWT validation failed: {e}");
match e.kind() {
jsonwebtoken::errors::ErrorKind::ExpiredSignature => "Token expired".to_string(),
jsonwebtoken::errors::ErrorKind::InvalidSignature => "Invalid token".to_string(),
jsonwebtoken::errors::ErrorKind::InvalidToken => "Malformed token".to_string(),
jsonwebtoken::errors::ErrorKind::ImmatureSignature => "Token not yet valid".to_string(),
_ => "Invalid token".to_string(),
}
})?;
Ok(token_data.claims)
}
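The auth module above maps HTTP methods to permissions and then checks them against the decoded claims. A standalone sketch of that two-step logic, using plain strings in place of axum's `Method` and a stripped-down `Claims` (the real module decodes these from a JWT via `jsonwebtoken`):

```rust
// Simplified stand-in for the Claims struct above (sub/exp omitted).
struct Claims {
    read: bool,
    write: bool,
    delete: bool,
}

// Method -> permission mapping, mirroring required_permission above:
// GET/HEAD -> "read", DELETE -> "delete", everything else -> "write".
fn required_permission(method: &str) -> &'static str {
    match method {
        "GET" | "HEAD" => "read",
        "DELETE" => "delete",
        _ => "write", // POST, PUT, PATCH, ...
    }
}

// Claim check, mirroring check_permission above; unknown permissions deny.
fn check_permission(claims: &Claims, permission: &str) -> bool {
    match permission {
        "read" => claims.read,
        "write" => claims.write,
        "delete" => claims.delete,
        _ => false,
    }
}
```

A request is authorized when `check_permission(&claims, required_permission(method))` is true; defaulting unknown permission strings to `false` keeps the check fail-closed.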


@@ -1,4 +1,5 @@
use crate::services::item_service::ItemService;
use crate::services::types::ItemWithMeta;
/// Common utilities and types for the server module.
///
/// This module provides shared structures, functions, and middleware used across
@@ -7,15 +8,15 @@ use crate::services::item_service::ItemService;
///
/// # Usage
///
/// ```rust
/// ```rust,ignore
/// // Illustrative — requires runtime values (db connection, settings).
/// use keep::modes::server::common::{ServerConfig, AppState};
/// let config = ServerConfig { address: "127.0.0.1".to_string(), ..Default::default() };
/// let state = AppState { /* ... */ };
/// let config = ServerConfig { address: "127.0.0.1".to_string(), port: Some(8080), /* ... */ };
/// ```
use anyhow::Result;
use axum::{
extract::{ConnectInfo, Request},
http::{HeaderMap, StatusCode},
http::{HeaderMap, Method, StatusCode},
middleware::Next,
response::Response,
};
@@ -27,6 +28,7 @@ use std::net::SocketAddr;
use std::path::PathBuf;
use std::sync::Arc;
use std::time::Instant;
use subtle::ConstantTimeEq;
use tokio::sync::Mutex;
use utoipa::ToSchema;
@@ -37,12 +39,18 @@ use utoipa::ToSchema;
///
/// # Examples
///
/// ```
/// ```rust
/// use keep::modes::server::common::ServerConfig;
/// let config = ServerConfig {
/// address: "127.0.0.1".to_string(),
/// port: Some(8080),
/// username: None,
/// password: None,
/// password_hash: None,
/// jwt_secret: None,
/// cert_file: None,
/// key_file: None,
/// cors_origin: None,
/// };
/// ```
#[derive(Debug, Clone)]
@@ -57,9 +65,13 @@ pub struct ServerConfig {
/// The TCP port number to listen on. If not specified, a default port (typically
/// 8080 or 21080) will be used.
pub port: Option<u16>,
/// Optional authentication username.
///
/// Username for Basic authentication. Defaults to "keep" when not specified.
pub username: Option<String>,
/// Optional authentication password.
///
/// Plain text password for basic or bearer token authentication. This should be
/// Plain text password for Basic authentication. This should be
/// used only for testing or low-security environments.
pub password: Option<String>,
/// Optional hashed authentication password.
@@ -67,6 +79,25 @@ pub struct ServerConfig {
/// Pre-hashed password (Unix crypt format) for secure authentication. Preferred
/// over plain text password for production use.
pub password_hash: Option<String>,
/// Optional JWT secret for token-based authentication.
///
/// When set, the server validates JWT tokens (HS256) and checks permission claims
/// (read, write, delete) for each request. Takes priority over password auth.
pub jwt_secret: Option<String>,
/// Optional path to TLS certificate file (PEM).
///
/// When both cert_file and key_file are set, the server uses HTTPS.
pub cert_file: Option<PathBuf>,
/// Optional path to TLS private key file (PEM).
///
/// When both cert_file and key_file are set, the server uses HTTPS.
pub key_file: Option<PathBuf>,
/// Optional CORS allowed origin.
///
/// When set, cross-origin requests are restricted to this origin.
/// Defaults to "http://localhost" if not specified. Use "*" to allow
/// all origins (not recommended for production).
pub cors_origin: Option<String>,
}
/// Application state shared across all routes.
@@ -76,7 +107,8 @@ pub struct ServerConfig {
///
/// # Examples
///
/// ```rust
/// ```rust,ignore
/// // AppState requires runtime values (db connection, settings) not available in doctests.
/// use keep::modes::server::common::AppState;
/// use std::sync::Arc;
/// use tokio::sync::Mutex;
@@ -126,9 +158,9 @@ pub struct AppState {
///
/// ```rust
/// use keep::modes::server::common::ApiResponse;
/// let response: ApiResponse<Vec<ItemInfo>> = ApiResponse {
/// let response: ApiResponse<String> = ApiResponse {
/// success: true,
/// data: Some(items),
/// data: Some("items".to_string()),
/// error: None,
/// };
/// ```
@@ -151,6 +183,26 @@ pub struct ApiResponse<T> {
pub error: Option<String>,
}
impl<T> ApiResponse<T> {
/// Creates a successful API response with the given data.
pub fn ok(data: T) -> Self {
Self {
success: true,
data: Some(data),
error: None,
}
}
/// Creates a successful API response with no data.
pub fn empty() -> Self {
Self {
success: true,
data: None,
error: None,
}
}
}
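The two constructors cut boilerplate at every handler call site. A self-contained sketch of the pattern (a minimal struct, not the full `Serialize`/`ToSchema`-annotated type):

```rust
// Minimal stand-in for ApiResponse<T>; shows the ok/empty constructor
// pattern used by the handlers.
struct ApiResponse<T> {
    success: bool,
    data: Option<T>,
    error: Option<String>,
}

impl<T> ApiResponse<T> {
    // Successful response carrying a payload.
    fn ok(data: T) -> Self {
        Self { success: true, data: Some(data), error: None }
    }
    // Successful response with no payload (e.g. after a delete).
    fn empty() -> Self {
        Self { success: true, data: None, error: None }
    }
}

fn main() {
    let r = ApiResponse::ok(vec![1, 2, 3]);
    assert!(r.success && r.error.is_none());
    assert_eq!(r.data, Some(vec![1, 2, 3]));

    let e: ApiResponse<String> = ApiResponse::empty();
    assert!(e.success && e.data.is_none());
}
```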
/// Response type for list of item information.
///
/// Specialized response for endpoints that return multiple items.
@@ -161,7 +213,7 @@ pub struct ApiResponse<T> {
/// use keep::modes::server::common::ItemInfoListResponse;
/// let response = ItemInfoListResponse {
/// success: true,
/// data: Some(vec![item_info]),
/// data: Some(vec![]),
/// error: None,
/// };
/// ```
@@ -191,7 +243,7 @@ pub struct ItemInfoListResponse {
/// use keep::modes::server::common::ItemInfoResponse;
/// let response = ItemInfoResponse {
/// success: true,
/// data: Some(item_info),
/// data: None,
/// error: None,
/// };
/// ```
@@ -221,7 +273,7 @@ pub struct ItemInfoResponse {
/// use keep::modes::server::common::ItemContentInfoResponse;
/// let response = ItemContentInfoResponse {
/// success: true,
/// data: Some(content_info),
/// data: None,
/// error: None,
/// };
/// ```
@@ -251,7 +303,7 @@ pub struct ItemContentInfoResponse {
/// use keep::modes::server::common::MetadataResponse;
/// let response = MetadataResponse {
/// success: true,
/// data: Some(meta_map),
/// data: None,
/// error: None,
/// };
/// ```
@@ -281,7 +333,7 @@ pub struct MetadataResponse {
/// use keep::modes::server::common::StatusInfoResponse;
/// let response = StatusInfoResponse {
/// success: true,
/// data: Some(status_info),
/// data: None,
/// error: None,
/// };
/// ```
@@ -314,10 +366,13 @@ pub struct StatusInfoResponse {
/// let item_info = ItemInfo {
/// id: 42,
/// ts: "2023-12-01T15:30:45Z".to_string(),
/// size: Some(1024),
/// uncompressed_size: Some(1024),
/// compressed_size: Some(512),
/// closed: true,
/// compression: "gzip".to_string(),
/// tags: vec!["important".to_string()],
/// metadata: HashMap::from([("mime_type".to_string(), "text/plain".to_string())]),
/// file_size: Some(512),
/// };
/// ```
#[derive(Serialize, Deserialize, ToSchema)]
@@ -333,11 +388,19 @@ pub struct ItemInfo {
/// The creation timestamp of the item in ISO 8601 format.
#[schema(example = "2023-12-01T15:30:45Z")]
pub ts: String,
/// Size in bytes.
/// Uncompressed size in bytes.
///
/// The size of the item's content in bytes, may be None if not set.
/// The uncompressed size of the item's content in bytes; may be None if not set.
#[schema(example = 1024)]
pub size: Option<i64>,
pub uncompressed_size: Option<i64>,
/// Compressed size in bytes.
///
/// The compressed file size on disk in bytes; may be None if not set.
#[schema(example = 512)]
pub compressed_size: Option<i64>,
/// Whether the item has been fully written and closed.
#[schema(example = true)]
pub closed: bool,
/// Compression type.
///
/// The compression algorithm used for the item's content.
@@ -353,6 +416,56 @@ pub struct ItemInfo {
/// Key-value pairs containing additional metadata about the item.
#[schema(example = json!({"mime_type": "text/plain", "mime_encoding": "utf-8", "line_count": "42"}))]
pub metadata: HashMap<String, String>,
/// Actual file size in bytes.
///
/// The filesystem-reported size of the item's data file. This may differ from
/// `compressed_size` if the file was written and the database hasn't been updated.
/// None if the file cannot be read (e.g., file not found, permission denied).
#[schema(example = 512)]
pub file_size: Option<i64>,
}
impl ItemInfo {
/// Enriches this `ItemInfo` with the actual filesystem-reported size.
///
/// Reads the size of the item's data file from disk and sets `file_size`.
/// If the file cannot be read, `file_size` is left as None.
///
/// # Arguments
///
/// * `data_dir` - The data directory path containing item files.
///
/// # Returns
///
/// A new `ItemInfo` with `file_size` populated from the filesystem.
pub fn with_file_size(mut self, data_dir: &std::path::Path) -> Self {
let item_path = data_dir.join(self.id.to_string());
self.file_size = std::fs::metadata(&item_path).map(|m| m.len() as i64).ok();
self
}
}
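`with_file_size` leans on `std::fs::metadata` for the on-disk length, and `.ok()` collapses any I/O error (missing file, permission denied) into `None`. The same shape, exercised against a temporary file:

```rust
use std::fs;
use std::path::Path;

// Same shape as with_file_size: metadata length on success,
// None on any filesystem error.
fn file_size(path: &Path) -> Option<i64> {
    fs::metadata(path).map(|m| m.len() as i64).ok()
}

fn main() {
    let path = std::env::temp_dir().join("keep_file_size_demo.bin");
    fs::write(&path, b"hello").unwrap();
    assert_eq!(file_size(&path), Some(5));

    fs::remove_file(&path).unwrap();
    // Missing file: the error is swallowed and None comes back.
    assert_eq!(file_size(&path), None);
}
```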
impl TryFrom<ItemWithMeta> for ItemInfo {
type Error = anyhow::Error;
fn try_from(item_with_meta: ItemWithMeta) -> Result<Self, Self::Error> {
let tags = item_with_meta.tag_names();
let metadata = item_with_meta.meta_as_map();
Ok(ItemInfo {
id: item_with_meta
.item
.id
.ok_or_else(|| anyhow::anyhow!("Item missing ID"))?,
ts: item_with_meta.item.ts.to_rfc3339(),
uncompressed_size: item_with_meta.item.uncompressed_size,
compressed_size: item_with_meta.item.compressed_size,
closed: item_with_meta.item.closed,
compression: item_with_meta.item.compression,
tags,
metadata,
file_size: None,
})
}
}
/// Item information including content and metadata, with binary detection.
@@ -419,14 +532,20 @@ pub struct TagsQuery {
/// ```rust
/// use keep::modes::server::common::ListItemsQuery;
/// let query = ListItemsQuery {
/// ids: None,
/// tags: Some("important".to_string()),
/// order: Some("newest".to_string()),
/// start: Some(0),
/// count: Some(10),
/// meta: None,
/// };
/// ```
#[derive(Debug, Deserialize)]
pub struct ListItemsQuery {
/// Optional comma-separated item IDs for filtering.
///
/// String containing numeric IDs to filter the item list.
pub ids: Option<String>,
/// Optional comma-separated tags for filtering.
///
/// String containing tags to filter the item list.
@@ -443,6 +562,11 @@ pub struct ListItemsQuery {
///
/// Unsigned integer limiting the number of items returned.
pub count: Option<u32>,
/// Optional metadata filter as JSON string.
///
/// JSON object where keys are metadata keys and values are either
/// `null` (filter by key existence) or a string (filter by exact value match).
pub meta: Option<String>,
}
/// Query parameters for item retrieval.
@@ -459,6 +583,7 @@ pub struct ListItemsQuery {
/// length: 1024,
/// stream: false,
/// as_meta: false,
/// decompress: true,
/// };
/// ```
#[derive(Debug, Deserialize, utoipa::ToSchema)]
@@ -488,6 +613,10 @@ pub struct ItemQuery {
/// Boolean flag to return content and metadata in a structured JSON format.
#[serde(default = "default_as_meta")]
pub as_meta: bool,
/// Whether the server should decompress the content (default: true).
/// Set to false when the client wants raw stored bytes for local decompression.
#[serde(default = "default_true")]
pub decompress: bool,
}
/// Query parameters for item content retrieval.
@@ -505,6 +634,7 @@ pub struct ItemQuery {
/// length: 1024,
/// stream: false,
/// as_meta: false,
/// decompress: true,
/// };
/// ```
#[derive(Debug, Deserialize, utoipa::ToSchema)]
@@ -538,6 +668,10 @@ pub struct ItemContentQuery {
/// Boolean flag to return content and metadata in a structured JSON format.
#[serde(default = "default_as_meta")]
pub as_meta: bool,
/// Whether the server should decompress the content (default: true).
/// Set to false when the client wants raw stored bytes for local decompression.
#[serde(default = "default_true")]
pub decompress: bool,
}
/// Default function for allow_binary parameter.
@@ -567,53 +701,127 @@ fn default_as_meta() -> bool {
false
}
/// Validates bearer authentication token.
///
/// This function checks if the provided authorization string is a valid Bearer token
/// matching the expected password or hash.
///
/// # Arguments
///
/// * `auth_str` - The authorization string from the header.
/// * `expected_password` - The expected plain text password.
/// * `expected_hash` - Optional expected password hash.
/// Default function for true boolean parameters.
///
/// # Returns
///
/// * `true` - If authentication succeeds.
/// * `false` - Otherwise.
/// `true` as the default value.
fn default_true() -> bool {
true
}
/// Query parameters for creating an item via POST.
///
/// # Errors
/// Query parameters for POST /api/item/ with streaming binary body.
#[derive(Debug, Deserialize)]
pub struct CreateItemQuery {
/// Optional comma-separated tags to associate with the item.
pub tags: Option<String>,
/// Optional metadata as JSON string.
pub metadata: Option<String>,
/// Whether the server should compress the content (default: true).
/// Set to false when the client has already compressed the content.
#[serde(default = "default_true")]
pub compress: bool,
/// Whether the server should run meta plugins (default: true).
/// Set to false when the client has already collected metadata.
#[serde(default = "default_true")]
pub meta: bool,
/// Compression type used by the client (e.g. "lz4", "gzip").
/// Only used when compress=false — tells the server what compression
/// the client applied so the correct type is recorded in the database.
pub compression_type: Option<String>,
/// Optional timestamp for the item (RFC 3339 format).
/// Used during import to preserve the original item's timestamp.
/// If not provided, the server uses the current time.
pub ts: Option<String>,
}
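Several of these query fields (`tags` here, `ids` and `tags` in `ListItemsQuery`) arrive as a single comma-separated string. A hedged sketch of the split a handler would perform (the trimming and empty-segment rules here are illustrative assumptions, not lifted from the server code):

```rust
// Split an optional comma-separated query value into owned parts,
// dropping empty segments ("a,,b" -> ["a", "b"]). Trim behavior is
// an assumption for this sketch.
fn split_csv(raw: Option<&str>) -> Vec<String> {
    raw.map(|s| {
        s.split(',')
            .map(str::trim)
            .filter(|t| !t.is_empty())
            .map(str::to_string)
            .collect()
    })
    .unwrap_or_default()
}

fn main() {
    assert_eq!(split_csv(Some("important, work")), vec!["important", "work"]);
    assert_eq!(split_csv(Some("a,,b")), vec!["a", "b"]);
    assert!(split_csv(None).is_empty());
}
```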
/// Query parameters for updating item metadata via POST.
///
/// None; returns false on failure.
fn check_bearer_auth(
auth_str: &str,
expected_password: &str,
expected_hash: &Option<String>,
/// Query parameters for POST /api/item/{item_id}/update.
/// Re-runs specified meta plugins on the stored content and/or
/// applies direct metadata key-value overrides.
#[derive(Debug, Deserialize)]
pub struct UpdateItemQuery {
/// Optional comma-separated list of plugin names to re-run.
pub plugins: Option<String>,
/// Optional metadata overrides as JSON string.
pub metadata: Option<String>,
/// Optional comma-separated tags to add.
pub tags: Option<String>,
/// Optional uncompressed size to set on the item.
pub uncompressed_size: Option<i64>,
}
/// Request body for creating a new item.
///
/// Contains the content to store and optional tags.
#[derive(Debug, Deserialize, Serialize, ToSchema)]
pub struct CreateItemRequest {
/// The content to store.
#[schema(example = "Hello, world!")]
pub content: String,
/// Optional tags to associate with the item.
#[schema(example = json!(["important", "work"]))]
pub tags: Option<Vec<String>>,
/// Optional metadata key-value pairs.
pub metadata: Option<std::collections::HashMap<String, String>>,
}
/// Checks authorization header for valid credentials.
///
/// This function inspects the HTTP Authorization header for valid Basic
/// authentication credentials against the provided username and password or hash.
/// Bearer tokens are not checked here — JWT validation is handled separately
/// in the middleware.
///
/// # Arguments
///
/// * `headers` - HTTP headers from the request.
/// * `username` - Optional expected username (defaults to "keep").
/// * `password` - Optional expected password.
/// * `password_hash` - Optional expected password hash.
///
/// # Returns
///
/// * `true` - If authorized (or no auth required).
/// * `false` - If unauthorized.
pub fn check_auth(
headers: &HeaderMap,
username: &Option<String>,
password: &Option<String>,
password_hash: &Option<String>,
) -> bool {
if !auth_str.starts_with("Bearer ") {
return false;
// If neither password nor hash is set, no authentication required
if password.is_none() && password_hash.is_none() {
return true;
}
let provided_password = &auth_str[7..];
let effective_username = username.as_deref().unwrap_or("keep");
// If we have a password hash, verify against it
if let Some(hash) = expected_hash {
return pwhash::unix::verify(provided_password, hash);
if let Some(auth_header) = headers.get("authorization")
&& let Ok(auth_str) = auth_header.to_str()
{
return check_basic_auth(
auth_str,
effective_username,
password.as_deref().unwrap_or(""),
password_hash,
);
}
// Otherwise, do direct comparison
provided_password == expected_password
false
}
/// Validates basic authentication credentials.
///
/// This function decodes and validates Basic Auth credentials from the authorization
/// header against the expected password or hash.
/// header against the expected username and password or hash.
///
/// # Arguments
///
/// * `auth_str` - The authorization string from the header.
/// * `expected_username` - The expected username.
/// * `expected_password` - The expected plain text password.
/// * `expected_hash` - Optional expected password hash.
///
@@ -627,6 +835,7 @@ fn check_bearer_auth(
/// Returns false on decode or validation failure.
fn check_basic_auth(
auth_str: &str,
expected_username: &str,
expected_password: &str,
expected_hash: &Option<String>,
) -> bool {
@@ -635,63 +844,33 @@ fn check_basic_auth(
}
let encoded = &auth_str[6..];
if let Ok(decoded_bytes) = base64::engine::general_purpose::STANDARD.decode(encoded) {
if let Ok(decoded_str) = String::from_utf8(decoded_bytes) {
if let Some(colon_pos) = decoded_str.find(':') {
if let Ok(decoded_bytes) = base64::engine::general_purpose::STANDARD.decode(encoded)
&& let Ok(decoded_str) = String::from_utf8(decoded_bytes)
&& let Some(colon_pos) = decoded_str.find(':')
{
let provided_username = &decoded_str[..colon_pos];
let provided_password = &decoded_str[colon_pos + 1..];
// Check username with constant-time comparison
if !bool::from(
provided_username
.as_bytes()
.ct_eq(expected_username.as_bytes()),
) {
return false;
}
// If we have a password hash, verify against it
if let Some(hash) = expected_hash {
return pwhash::unix::verify(provided_password, hash);
}
// Otherwise, do direct comparison
let expected_credentials = format!("keep:{}", expected_password);
return decoded_str == expected_credentials;
}
}
}
false
}
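The credential check above splits the decoded `user:password` pair at the first colon and compares the username with subtle's constant-time equality. Both steps, sketched std-only (the XOR-fold comparator mimics `subtle::ConstantTimeEq` in spirit; use the crate in real code, since this version still leaks the length via its early return):

```rust
// Split decoded Basic credentials at the FIRST colon, so passwords
// containing ':' survive ("bob:pa:ss" -> ("bob", "pa:ss")).
fn split_credentials(decoded: &str) -> Option<(&str, &str)> {
    decoded.find(':').map(|i| (&decoded[..i], &decoded[i + 1..]))
}

// Constant-time-style byte comparison: every byte is inspected
// regardless of where the first mismatch occurs, so timing does not
// reveal the mismatch position. Length still leaks; prefer subtle.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

fn main() {
    assert_eq!(split_credentials("bob:pa:ss"), Some(("bob", "pa:ss")));
    assert_eq!(split_credentials("no-colon"), None);
    assert!(ct_eq(b"keep", b"keep"));
    assert!(!ct_eq(b"keep", b"kelp"));
    assert!(!ct_eq(b"keep", b"keeper"));
}
```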
/// Checks authorization header for valid credentials.
///
/// This function inspects the HTTP Authorization header for valid Bearer or Basic
/// authentication credentials against the provided password or hash.
///
/// # Arguments
///
/// * `headers` - HTTP headers from the request.
/// * `password` - Optional expected password.
/// * `password_hash` - Optional expected password hash.
///
/// # Returns
///
/// * `true` - If authorized (or no auth required).
/// * `false` - If unauthorized.
///
/// # Examples
///
/// ```
/// if check_auth(&headers, &Some("pass".to_string()), &None) {
/// // Proceed
/// }
/// ```
pub fn check_auth(
headers: &HeaderMap,
password: &Option<String>,
password_hash: &Option<String>,
) -> bool {
// If neither password nor hash is set, no authentication required
if password.is_none() && password_hash.is_none() {
return true;
}
if let Some(auth_header) = headers.get("authorization") {
if let Ok(auth_str) = auth_header.to_str() {
return check_bearer_auth(auth_str, password.as_deref().unwrap_or(""), password_hash)
|| check_basic_auth(auth_str, password.as_deref().unwrap_or(""), password_hash);
}
// Otherwise, do constant-time comparison to prevent timing attacks
return bool::from(
provided_password
.as_bytes()
.ct_eq(expected_password.as_bytes()),
);
}
false
}
@@ -758,28 +937,31 @@ pub async fn logging_middleware(
/// Creates authentication middleware for the application.
///
/// This function returns a middleware that enforces authentication on protected routes
/// using Bearer token or Basic Auth, challenging unauthorized requests with appropriate
/// headers.
/// This function returns a middleware that enforces authentication on protected routes.
///
/// **JWT and Basic Auth are mutually exclusive.** When `jwt_secret` is set, the
/// middleware validates JWT (HS256) tokens and checks permission claims (read, write,
/// delete) based on the HTTP method. Requests without a valid Bearer token are
/// rejected with 401 — Basic Auth is **not** consulted as a fallback.
///
/// When `jwt_secret` is not set, Basic Auth password authentication is used instead.
///
/// # Arguments
///
/// * `username` - Optional username (defaults to "keep").
/// * `password` - Optional plain text password.
/// * `password_hash` - Optional hashed password.
/// * `jwt_secret` - Optional JWT secret for token-based authentication.
///
/// # Returns
///
/// A clonable async middleware function for Axum.
///
/// # Examples
///
/// ```
/// let auth_middleware = create_auth_middleware(Some("pass".to_string()), None);
/// router.layer(auth_middleware);
/// ```
#[allow(clippy::type_complexity)]
pub fn create_auth_middleware(
username: Option<String>,
password: Option<String>,
password_hash: Option<String>,
jwt_secret: Option<String>,
) -> impl Fn(
ConnectInfo<SocketAddr>,
Request,
@@ -789,14 +971,63 @@ pub fn create_auth_middleware(
+ Clone
+ Send {
move |ConnectInfo(addr): ConnectInfo<SocketAddr>, request: Request, next: Next| {
let username = username.clone();
let password = password.clone();
let password_hash = password_hash.clone();
let jwt_secret = jwt_secret.clone();
Box::pin(async move {
let headers = request.headers().clone();
let uri = request.uri().clone();
let method = request.method().clone();
if !check_auth(&headers, &password, &password_hash) {
warn!("Unauthorized request to {} from {}", uri, addr);
// CORS preflight requests pass through without authentication
if method == Method::OPTIONS {
return Ok(next.run(request).await);
}
// JWT authentication takes priority when secret is configured
if let Some(ref secret) = jwt_secret
&& let Some(auth_header) = headers.get("authorization")
&& let Ok(auth_str) = auth_header.to_str()
&& let Some(token) = auth_str.strip_prefix("Bearer ")
{
match super::auth::validate_jwt(token, secret) {
Ok(claims) => {
let required = super::auth::required_permission(&method);
if !super::auth::check_permission(&claims, required) {
warn!(
"Forbidden: {method} {uri} from {addr} \
(sub={}, missing permission: {required})",
claims.sub
);
let mut response = Response::new(axum::body::Body::from("Forbidden"));
*response.status_mut() = StatusCode::FORBIDDEN;
return Ok(response);
}
// JWT valid and authorized, proceed
let response = next.run(request).await;
return Ok(response);
}
Err(e) => {
warn!("JWT validation failed for {uri} from {addr}: {e}");
let mut response = Response::new(axum::body::Body::from("Unauthorized"));
*response.status_mut() = StatusCode::UNAUTHORIZED;
return Ok(response);
}
}
}
// JWT secret configured but no valid Bearer token provided
if jwt_secret.is_some() {
warn!("Missing JWT token for {uri} from {addr}");
let mut response = Response::new(axum::body::Body::from("Unauthorized"));
*response.status_mut() = StatusCode::UNAUTHORIZED;
return Ok(response);
}
// Fall back to Basic Auth password authentication
if !check_auth(&headers, &username, &password, &password_hash) {
warn!("Unauthorized request to {uri} from {addr}");
// Add WWW-Authenticate header to trigger basic auth in browsers
let mut response = Response::new(axum::body::Body::from("Unauthorized"));
*response.status_mut() = StatusCode::UNAUTHORIZED;


@@ -1,83 +0,0 @@
pub mod server;
pub mod tools;
pub use server::KeepMcpServer;
/// Module for handling MCP (Model Context Protocol) requests in the server.
///
/// Provides handlers for JSON-RPC style requests to interact with Keep's storage
/// via the API.
use axum::{Json, extract::State, http::StatusCode, response::IntoResponse};
use serde::Deserialize;
use serde_json::Value;
use crate::modes::server::common::ApiResponse;
use crate::modes::server::common::AppState;
/// Request structure for MCP JSON-RPC calls.
///
/// # Fields
///
/// * `method` - The MCP method name (e.g., "save_item").
/// * `params` - Optional JSON parameters for the method.
#[derive(Deserialize)]
pub struct McpRequest {
pub method: String,
pub params: Option<Value>,
}
/// Handles an MCP request via the Axum framework.
///
/// Parses the JSON request, delegates to `KeepMcpServer`, and returns an API response.
/// Attempts to parse the result as JSON; falls back to string if invalid.
///
/// # Arguments
///
/// * `State(state)` - The application state.
/// * `Json(request)` - The deserialized MCP request.
///
/// # Returns
///
/// An `IntoResponse` with status code and JSON API response.
///
/// # Errors
///
/// Returns 400 Bad Request on handler errors.
pub async fn handle_mcp_request(
State(state): State<AppState>,
Json(request): Json<McpRequest>,
) -> impl IntoResponse {
let mcp_server = KeepMcpServer::new(state);
match mcp_server
.handle_request(&request.method, request.params)
.await
{
Ok(result) => match serde_json::from_str(&result) {
Ok(parsed_result) => {
let response = ApiResponse {
success: true,
data: Some(parsed_result),
error: None,
};
(StatusCode::OK, Json(response))
}
Err(_) => {
let response = ApiResponse {
success: true,
data: Some(serde_json::Value::String(result)),
error: None,
};
(StatusCode::OK, Json(response))
}
},
Err(e) => {
let response = ApiResponse {
success: false,
data: None,
error: Some(e.to_string()),
};
(StatusCode::BAD_REQUEST, Json(response))
}
}
}


@@ -1,83 +0,0 @@
use log::debug;
use serde_json::Value;
use super::tools::{KeepTools, ToolError};
use crate::modes::server::common::AppState;
/// Server handler for MCP (Model Context Protocol) requests.
///
/// Routes requests to appropriate tools and handles responses. Clones AppState for tool usage.
///
/// # Fields
///
/// * `state` - The shared application state (DB, config, etc.).
#[derive(Clone)]
pub struct KeepMcpServer {
state: AppState,
}
/// Creates a new `KeepMcpServer` instance.
///
/// # Arguments
///
/// * `state` - The application state containing DB, config, and services.
///
/// # Returns
///
/// A new `KeepMcpServer` instance.
///
/// # Examples
///
/// ```
/// let server = KeepMcpServer::new(app_state);
/// ```
impl KeepMcpServer {
pub fn new(state: AppState) -> Self {
Self { state }
}
/// Handles an MCP request by routing to the appropriate tool.
///
/// Supports methods like "save_item", "get_item", "list_items". Logs the request and delegates to KeepTools.
///
/// # Arguments
///
/// * `method` - The MCP method name (string).
/// * `params` - Optional JSON parameters as serde_json::Value.
///
/// # Returns
///
/// `Ok(String)` with JSON-serialized response on success, or `Err(ToolError)` on failure.
///
/// # Errors
///
/// * ToolError::UnknownTool if method unsupported.
/// * Propagates tool-specific errors (e.g., invalid args, DB failures).
///
/// # Examples
///
/// ```
/// let result = server.handle_request("save_item", Some(params)).await?;
/// ```
pub async fn handle_request(
&self,
method: &str,
params: Option<Value>,
) -> Result<String, ToolError> {
debug!(
"MCP: Handling request '{}' with params: {:?}",
method, params
);
let tools = KeepTools::new(self.state.clone());
match method {
"save_item" => tools.save_item(params).await,
"get_item" => tools.get_item(params).await,
"get_latest_item" => tools.get_latest_item(params).await,
"list_items" => tools.list_items(params).await,
"search_items" => tools.search_items(params).await,
_ => Err(ToolError::UnknownTool(method.to_string())),
}
}
}
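`handle_request` is a plain string-keyed dispatcher; anything it does not recognize becomes `ToolError::UnknownTool`. The routing shape, reduced to a std-only sketch (the real handlers are async and take JSON params; these are stubs):

```rust
// Reduced dispatcher sketch: method name -> handler, unknown -> error.
#[derive(Debug, PartialEq)]
enum ToolError {
    UnknownTool(String),
}

fn handle_request(method: &str) -> Result<String, ToolError> {
    match method {
        "save_item" => Ok("saved".to_string()),
        "get_item" => Ok("item".to_string()),
        "list_items" => Ok("items".to_string()),
        other => Err(ToolError::UnknownTool(other.to_string())),
    }
}

fn main() {
    assert_eq!(handle_request("save_item"), Ok("saved".to_string()));
    assert_eq!(
        handle_request("bogus"),
        Err(ToolError::UnknownTool("bogus".to_string()))
    );
}
```

Keeping the unknown-method arm last means new tools are added by inserting one match arm, and typos in client requests surface as a named error rather than a silent no-op.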


@@ -1,344 +0,0 @@
use anyhow::{Result, anyhow};
use log::debug;
use serde_json::Value;
use std::collections::HashMap;
use crate::modes::server::common::AppState;
use crate::services::async_item_service::AsyncItemService;
use crate::services::error::CoreError;
#[derive(Debug, thiserror::Error)]
pub enum ToolError {
#[error("Unknown tool: {0}")]
UnknownTool(String),
#[error("Invalid arguments: {0}")]
InvalidArguments(String),
#[error("Database error: {0}")]
Database(#[from] rusqlite::Error),
#[error("IO error: {0}")]
Io(#[from] std::io::Error),
#[error("JSON error: {0}")]
Json(#[from] serde_json::Error),
#[error("Parse error: {0}")]
Parse(#[from] strum::ParseError),
#[error("Other error: {0}")]
Other(#[from] anyhow::Error),
}
pub struct KeepTools {
state: AppState,
}
impl KeepTools {
pub fn new(state: AppState) -> Self {
Self { state }
}
pub async fn save_item(&self, args: Option<Value>) -> Result<String, ToolError> {
let args =
args.ok_or_else(|| ToolError::InvalidArguments("Missing arguments".to_string()))?;
let content = args
.get("content")
.and_then(|v| v.as_str())
.ok_or_else(|| ToolError::InvalidArguments("Missing 'content' field".to_string()))?;
let tags: Vec<String> = args
.get("tags")
.and_then(|v| v.as_array())
.map(|arr| {
arr.iter()
.filter_map(|v| v.as_str().map(|s| s.to_string()))
.collect()
})
.unwrap_or_default();
let metadata: HashMap<String, String> = args
.get("metadata")
.and_then(|v| v.as_object())
.map(|obj| {
obj.iter()
.filter_map(|(k, v)| v.as_str().map(|s| (k.clone(), s.to_string())))
.collect()
})
.unwrap_or_default();
debug!(
"MCP: Saving item with {} bytes, {} tags, {} metadata entries",
content.len(),
tags.len(),
metadata.len()
);
let service = AsyncItemService::new(
self.state.data_dir.clone(),
self.state.db.clone(),
self.state.item_service.clone(),
self.state.cmd.clone(),
self.state.settings.clone(),
);
let item_with_meta = service
.save_item_from_mcp(content.as_bytes().to_vec(), tags, metadata)
.await
.map_err(|e| ToolError::Other(anyhow::Error::from(e)))?;
let item_id = item_with_meta
.item
.id
.ok_or_else(|| anyhow!("Failed to get item ID"))?;
Ok(format!("Successfully saved item with ID: {}", item_id))
}
pub async fn get_item(&self, args: Option<Value>) -> Result<String, ToolError> {
let args =
args.ok_or_else(|| ToolError::InvalidArguments("Missing arguments".to_string()))?;
let item_id = args.get("id").and_then(|v| v.as_i64()).ok_or_else(|| {
ToolError::InvalidArguments("Missing or invalid 'id' field".to_string())
})?;
let service = AsyncItemService::new(
self.state.data_dir.clone(),
self.state.db.clone(),
self.state.item_service.clone(),
self.state.cmd.clone(),
self.state.settings.clone(),
);
let item_with_content = match service.get_item_content(item_id).await {
Ok(iwc) => iwc,
Err(CoreError::ItemNotFound(_)) => {
return Err(ToolError::InvalidArguments(format!(
"Item {} not found",
item_id
)));
}
Err(e) => return Err(ToolError::Other(anyhow::Error::from(e))),
};
let content = String::from_utf8_lossy(&item_with_content.content).to_string();
let tags: Vec<String> = item_with_content
.item_with_meta
.tags
.iter()
.map(|t| t.name.clone())
.collect();
let metadata = item_with_content.item_with_meta.meta_as_map();
let item = item_with_content.item_with_meta.item;
let response = serde_json::json!({
"id": item_id,
"content": content,
"timestamp": item.ts.to_rfc3339(),
"size": item.size,
"compression": item.compression,
"tags": tags,
"metadata": metadata,
});
Ok(serde_json::to_string_pretty(&response)?)
}
pub async fn get_latest_item(&self, args: Option<Value>) -> Result<String, ToolError> {
let tags: Vec<String> = args
.as_ref()
.and_then(|v| v.get("tags"))
.and_then(|v| v.as_array())
.map(|arr| {
arr.iter()
.filter_map(|v| v.as_str().map(|s| s.to_string()))
.collect()
})
.unwrap_or_default();
let service = AsyncItemService::new(
self.state.data_dir.clone(),
self.state.db.clone(),
self.state.item_service.clone(),
self.state.cmd.clone(),
self.state.settings.clone(),
);
let item_with_meta = match service.find_item(vec![], tags, HashMap::new()).await {
Ok(iwm) => iwm,
Err(CoreError::ItemNotFoundGeneric) => {
return Err(ToolError::InvalidArguments("No items found".to_string()));
}
Err(e) => return Err(ToolError::Other(anyhow::Error::from(e))),
};
let item_id = item_with_meta
.item
.id
.ok_or_else(|| anyhow!("Item missing ID after find"))?;
let item_with_content = service
.get_item_content(item_id)
.await
.map_err(|e| ToolError::Other(anyhow::Error::from(e)))?;
let content = String::from_utf8_lossy(&item_with_content.content).to_string();
let tags: Vec<String> = item_with_content
.item_with_meta
.tags
.iter()
.map(|t| t.name.clone())
.collect();
let metadata = item_with_content.item_with_meta.meta_as_map();
let item = item_with_content.item_with_meta.item;
let response = serde_json::json!({
"id": item_id,
"content": content,
"timestamp": item.ts.to_rfc3339(),
"size": item.size,
"compression": item.compression,
"tags": tags,
"metadata": metadata,
});
Ok(serde_json::to_string_pretty(&response)?)
}
pub async fn list_items(&self, args: Option<Value>) -> Result<String, ToolError> {
let args_ref = args.as_ref();
let tags: Vec<String> = args_ref
.and_then(|v| v.get("tags"))
.and_then(|v| v.as_array())
.map(|arr| {
arr.iter()
.filter_map(|v| v.as_str().map(|s| s.to_string()))
.collect()
})
.unwrap_or_default();
let limit = args_ref
.and_then(|v| v.get("limit"))
.and_then(|v| v.as_u64())
.unwrap_or(10) as usize;
let offset = args_ref
.and_then(|v| v.get("offset"))
.and_then(|v| v.as_u64())
.unwrap_or(0) as usize;
let service = AsyncItemService::new(
self.state.data_dir.clone(),
self.state.db.clone(),
self.state.item_service.clone(),
self.state.cmd.clone(),
self.state.settings.clone(),
);
let mut items_with_meta = service
.list_items(tags, HashMap::new())
.await
.map_err(|e| ToolError::Other(anyhow::Error::from(e)))?;
// Sort by timestamp (newest first) and apply pagination
items_with_meta.sort_by(|a, b| b.item.ts.cmp(&a.item.ts));
let items_with_meta: Vec<_> = items_with_meta
.into_iter()
.skip(offset)
.take(limit)
.collect();
let items_info: Vec<_> = items_with_meta
.into_iter()
.map(|item_with_meta| {
let item_tags: Vec<String> =
item_with_meta.tags.iter().map(|t| t.name.clone()).collect();
let item_meta = item_with_meta.meta_as_map();
let item = item_with_meta.item;
let item_id = item.id.unwrap_or(0);
serde_json::json!({
"id": item_id,
"timestamp": item.ts.to_rfc3339(),
"size": item.size,
"compression": item.compression,
"tags": item_tags,
"metadata": item_meta
})
})
.collect();
let response = serde_json::json!({
"items": items_info,
"count": items_info.len(),
"offset": offset,
"limit": limit
});
Ok(serde_json::to_string_pretty(&response)?)
}
/// Searches items by tags and exact-match metadata pairs, returning matches newest-first.
pub async fn search_items(&self, args: Option<Value>) -> Result<String, ToolError> {
let tags: Vec<String> = args
.as_ref()
.and_then(|v| v.get("tags"))
.and_then(|v| v.as_array())
.map(|arr| {
arr.iter()
.filter_map(|v| v.as_str().map(|s| s.to_string()))
.collect()
})
.unwrap_or_default();
let metadata: HashMap<String, String> = args
.as_ref()
.and_then(|v| v.get("metadata"))
.and_then(|v| v.as_object())
.map(|obj| {
obj.iter()
.filter_map(|(k, v)| v.as_str().map(|s| (k.clone(), s.to_string())))
.collect()
})
.unwrap_or_default();
let service = AsyncItemService::new(
self.state.data_dir.clone(),
self.state.db.clone(),
self.state.item_service.clone(),
self.state.cmd.clone(),
self.state.settings.clone(),
);
let mut items_with_meta = service
.list_items(tags.clone(), metadata.clone())
.await
.map_err(|e| ToolError::Other(anyhow::Error::from(e)))?;
// Sort by timestamp (newest first)
items_with_meta.sort_by(|a, b| b.item.ts.cmp(&a.item.ts));
let items_info: Vec<_> = items_with_meta
.into_iter()
.map(|item_with_meta| {
let item_tags: Vec<String> =
item_with_meta.tags.iter().map(|t| t.name.clone()).collect();
let item_meta = item_with_meta.meta_as_map();
let item = item_with_meta.item;
let item_id = item.id.unwrap_or(0);
serde_json::json!({
"id": item_id,
"timestamp": item.ts.to_rfc3339(),
"size": item.size,
"compression": item.compression,
"tags": item_tags,
"metadata": item_meta
})
})
.collect();
let response = serde_json::json!({
"items": items_info,
"count": items_info.len(),
"search_criteria": {
"tags": tags,
"metadata": metadata
}
});
Ok(serde_json::to_string_pretty(&response)?)
}
}
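The ordering and paging used by `list_items` above (sort by timestamp descending, then `skip(offset).take(limit)`) can be sketched standalone; the `Item` struct and integer `ts` field here are illustrative stand-ins, not the crate's real types:

```rust
// Minimal sketch of list_items' ordering + pagination, using only std.
#[derive(Debug, Clone, PartialEq)]
struct Item {
    id: u64,
    ts: u64, // stand-in for the item timestamp
}

fn paginate(mut items: Vec<Item>, offset: usize, limit: usize) -> Vec<Item> {
    // Newest first (descending timestamp), matching `b.item.ts.cmp(&a.item.ts)`.
    items.sort_by(|a, b| b.ts.cmp(&a.ts));
    items.into_iter().skip(offset).take(limit).collect()
}

fn main() {
    let items = vec![
        Item { id: 1, ts: 100 },
        Item { id: 2, ts: 300 },
        Item { id: 3, ts: 200 },
    ];
    // Skip the newest item (id 2), take one: the second-newest (id 3).
    let page = paginate(items, 1, 1);
    println!("{page:?}");
}
```

Note that sorting happens before pagination, so `offset`/`limit` always slice a stable newest-first ordering regardless of storage order.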


@@ -1,7 +1,10 @@
use crate::config;
use crate::services::item_service::ItemService;
use anyhow::Result;
use axum::{Router, routing::post};
use axum::Router;
use axum::http::{HeaderValue, header};
use axum::middleware::Next;
use axum::response::Response;
use clap::Command;
use log::{debug, info};
use std::net::SocketAddr;
@@ -13,13 +16,28 @@ use tower_http::cors::CorsLayer;
use tower_http::trace::TraceLayer;
mod api;
pub mod auth;
pub mod common;
#[cfg(feature = "mcp")]
mod mcp;
mod pages;
pub use common::{AppState, create_auth_middleware, logging_middleware};
/// Adds security headers to all responses.
async fn security_headers(req: axum::extract::Request, next: Next) -> Response {
let mut response = next.run(req).await;
let headers = response.headers_mut();
headers.insert(
header::X_CONTENT_TYPE_OPTIONS,
HeaderValue::from_static("nosniff"),
);
headers.insert(header::X_FRAME_OPTIONS, HeaderValue::from_static("DENY"));
headers.insert(
header::REFERRER_POLICY,
HeaderValue::from_static("strict-origin-when-cross-origin"),
);
response
}
pub fn mode_server(
cmd: &mut Command,
settings: &config::Settings,
@@ -50,8 +68,13 @@ pub fn mode_server(
let server_config = common::ServerConfig {
address: server_address,
port: Some(server_port),
username: settings.server_username(),
password: settings.server_password(),
password_hash: settings.server_password_hash(),
jwt_secret: settings.server_jwt_secret(),
cert_file: settings.server_cert_file(),
key_file: settings.server_key_file(),
cors_origin: settings.server_cors_origin(),
};
// Create ItemService once
@@ -88,7 +111,7 @@ async fn run_server(
format!("{}:21080", config.address)
};
debug!("SERVER: Starting REST HTTP server on {}", bind_address);
debug!("SERVER: Starting REST HTTP server on {bind_address}");
// Use the existing database connection
let db_conn = Arc::new(Mutex::new(conn));
@@ -101,50 +124,93 @@ async fn run_server(
settings: Arc::new(settings.clone()),
};
#[cfg(feature = "mcp")]
let mcp_router = Router::new()
.route("/mcp", post(mcp::handle_mcp_request))
.with_state(state.clone());
let mut protected_router = Router::new()
let protected_router = Router::new()
.merge(api::add_routes(Router::new()))
.merge(pages::add_routes(Router::new()));
.merge(pages::add_routes(Router::new()))
.layer(axum::middleware::from_fn(create_auth_middleware(
config.username.clone(),
config.password.clone(),
config.password_hash.clone(),
config.jwt_secret.clone(),
)));
#[cfg(feature = "mcp")]
{
protected_router = protected_router.merge(mcp_router);
}
let protected_router = protected_router.layer(axum::middleware::from_fn(
create_auth_middleware(config.password.clone(), config.password_hash.clone()),
));
// Build CORS layer - restricted by default, configurable via cors_origin setting
let cors_origin = config.cors_origin.as_deref().unwrap_or("http://localhost");
let cors_layer = if cors_origin == "*" {
CorsLayer::permissive()
} else {
CorsLayer::new()
.allow_origin(
cors_origin
.parse::<axum::http::HeaderValue>()
.unwrap_or_else(|_| {
log::warn!(
"Invalid CORS origin '{cors_origin}', defaulting to http://localhost"
);
"http://localhost".parse().unwrap()
}),
)
.allow_methods([
axum::http::Method::GET,
axum::http::Method::POST,
axum::http::Method::PUT,
axum::http::Method::DELETE,
])
.allow_headers([header::CONTENT_TYPE, header::AUTHORIZATION, header::ACCEPT])
};
// Create the app with documentation routes open and others protected
let app = Router::new()
// Add documentation routes without authentication
.merge(api::add_docs_routes(Router::new()))
// Add API, pages, and MCP routes with authentication
// Add API and pages routes with authentication
.merge(protected_router)
// Apply state to all routes
.with_state(state)
// Add other middleware layers to all routes
.layer(axum::middleware::from_fn(security_headers))
.layer(axum::middleware::from_fn(logging_middleware))
.layer(
ServiceBuilder::new()
.layer(TraceLayer::new_for_http())
.layer(CorsLayer::permissive()),
.layer(cors_layer),
);
let addr: SocketAddr = bind_address.parse()?;
info!("SERVER: HTTP server listening on {}", addr);
// Warn if authentication is enabled without TLS
if (config.password.is_some() || config.password_hash.is_some() || config.jwt_secret.is_some())
&& (config.cert_file.is_none() || config.key_file.is_none())
{
log::warn!(
"SECURITY: Authentication enabled but TLS is not configured. Credentials will be transmitted in plain text!"
);
}
// Build the app into a service
let service = app.into_make_service_with_connect_info::<SocketAddr>();
// Use TLS if both cert and key files are provided
if let (Some(cert_file), Some(key_file)) = (&config.cert_file, &config.key_file) {
info!("SERVER: HTTPS server listening on {addr}");
use axum_server::tls_rustls::RustlsConfig;
let tls_config = RustlsConfig::from_pem_file(cert_file, key_file)
.await
.map_err(|e| anyhow::anyhow!("Failed to load TLS config: {e}"))?;
axum_server::bind_rustls(addr, tls_config)
.serve(service)
.await?;
return Ok(());
}
info!("SERVER: HTTP server listening on {addr}");
let listener = tokio::net::TcpListener::bind(addr).await?;
axum::serve(
listener,
app.into_make_service_with_connect_info::<SocketAddr>(),
)
.await?;
axum::serve(listener, service).await?;
Ok(())
}
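The bind-address handling at the top of `run_server` (append the default port 21080 when the config carries none, then parse into a `SocketAddr`) can be sketched with the standard library alone; the function name here is illustrative:

```rust
use std::net::SocketAddr;

// Mirrors run_server's address handling: fall back to port 21080 when
// the config has no port, then parse the result into a SocketAddr.
fn bind_address(address: &str, port: Option<u16>) -> Result<SocketAddr, std::net::AddrParseError> {
    let bind = match port {
        Some(p) => format!("{address}:{p}"),
        None => format!("{address}:21080"),
    };
    bind.parse()
}

fn main() {
    println!("{:?}", bind_address("127.0.0.1", Some(8080)));
    println!("{:?}", bind_address("0.0.0.0", None));
}
```

Parsing returns a `Result`, which is why the real code propagates the error with `bind_address.parse()?` instead of assuming the configured address is valid.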

Some files were not shown because too many files have changed in this diff.