Compare commits


50 Commits

Author SHA1 Message Date
8379ae2136 refactor: rename plugin features with type prefix for consistency
- Plugin features now use type_ prefix (meta_magic, filter_grep, etc.)
- Added meta_all_musl and filter_all_musl for MUSL-compatible builds
- grep filter plugin made optional via filter_grep feature flag
- Removed regex crate from grep-related code, uses strip_prefix instead
- Updated CHANGELOG.md with breaking change documentation
2026-03-21 17:36:29 -03:00
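The regex removal above works because the grep-related option parsing only needed literal prefix matching, which `str::strip_prefix` handles with zero dependencies. A minimal sketch (function and option names are illustrative, not the project's actual code):

```rust
// Hypothetical sketch: parse a "pattern:<value>" option without the
// regex crate. Equivalent to matching ^pattern:(.*)$ at no dependency cost.
fn parse_pattern_option(arg: &str) -> Option<&str> {
    arg.strip_prefix("pattern:")
}

fn main() {
    assert_eq!(parse_pattern_option("pattern:foo.*"), Some("foo.*"));
    assert_eq!(parse_pattern_option("count:3"), None);
    println!("ok");
}
```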
12de215527 feat: feature-gate CLI args by server/client features
- CLI now shows only relevant options: --server and --server-* args
  hidden when built without 'server' feature; --client-* args hidden
  without 'client' feature. Run --help only displays applicable options.
- Removed verbose 'conflicts_with_all' from all mode args — clap's
  implicit group("mode") already enforces mutual exclusivity.
- 'server' feature now includes TLS/HTTPS by default (axum-server);
  'tls' feature removed. rustls already available via client/ureq.
- Gated KeepModes::Server, server mode detection, and server-password
  validation in main.rs.
- Gated server arg reads in config.rs.
- Removed redundant #[cfg(feature = "tls")] guards from server/mod.rs.
- Gated resolve_item_id/resolve_item_ids helpers in common.rs.
- All 4 feature combinations (server+client, server-only, client-only,
  neither) compile and pass tests.
2026-03-21 16:26:27 -03:00
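The feature-gating above relies on `#[cfg(feature = "...")]`: when an item is compiled out, clap never sees the corresponding argument, so `--help` omits it entirely. A minimal std-only sketch of the mechanism (not the project's actual clap definitions; with clap, the same attribute sits on the `Args` struct fields):

```rust
// Compiled only when the "server" feature is enabled.
#[cfg(feature = "server")]
fn server_banner() -> &'static str {
    "server mode available"
}

// Fallback when built without the feature.
#[cfg(not(feature = "server"))]
fn server_banner() -> &'static str {
    "built without server support"
}

fn main() {
    println!("{}", server_banner());
}
```

Built without the `server` feature, the first function does not exist at all, which is stronger than hiding the flag at runtime.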
e2cb36d2a8 feat(server): add file_size to API ItemInfo response 2026-03-21 14:03:58 -03:00
0004324301 perf: pre-allocate status info collections with known capacities 2026-03-21 13:54:37 -03:00
b3edfe7de6 chore: code review cleanup — fixes, deps, docs
Fixed:
- CLI help typo: "metatdata" -> "metadata"
- Filter buffer OOM: check size before loading into memory

Changed:
- #[inline] on HTML escape helpers for hot path performance
- Replaced once_cell and lazy_static with std::sync::LazyLock
- Removed unused once_cell and lazy_static crate dependencies

Refactored:
- Added module-level doc to services/ module

Documentation:
- README.md: zstd is native not external, "none" -> "raw"
- DESIGN.md: current schema and meta plugins section
- CHANGELOG.md: Unreleased section populated
2026-03-21 11:44:37 -03:00
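The `once_cell`/`lazy_static` removal above is possible because `std::sync::LazyLock` (stable since Rust 1.80) provides the same lazily-initialized static with no external crate. A sketch with an illustrative static (the name and contents are not from the project):

```rust
use std::collections::HashMap;
use std::sync::LazyLock;

// Initialized on first access, thread-safe, dependency-free.
static MIME_ALIASES: LazyLock<HashMap<&'static str, &'static str>> =
    LazyLock::new(|| {
        let mut m = HashMap::new();
        m.insert("text/x-markdown", "text/markdown");
        m
    });

fn main() {
    assert_eq!(MIME_ALIASES.get("text/x-markdown"), Some(&"text/markdown"));
    println!("ok");
}
```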
ab2fb07505 docs: add changelog update instructions to AGENTS.md 2026-03-21 10:56:43 -03:00
547f0b5d11 docs: add CHANGELOG.md following Keep a Changelog format 2026-03-21 10:55:16 -03:00
30d7836bcf refactor: deduplicate ItemInfo, improve error handling, fix pre-existing bugs
- Move ItemInfo to services/types.rs for sharing between client and server
- Replace .expect() in compression_service with proper error handling
- Add CoreError::PayloadTooLarge variant for semantic error handling
- Export CoreError from lib.rs for library users
- Unify get_item_meta_name/value to take &str instead of String
- Extract item_path() helper in ItemService to reduce duplication
- Add warning logs for silent errors in list.rs
- Fix pre-existing borrow errors: tx moved in export handler,
  item_with_meta partial move in TryFrom implementation
- Fix unused data_dir variables in server code
2026-03-21 10:43:26 -03:00
2cfee5075e fix: panic guards, dedup, and unsafe documentation
- diff.rs: graceful error instead of expect() on item ID in spawned thread
- common.rs: lazy_static regex, avoid unwrap on regex captures
- db.rs: ok_or_else guard on item.id in delete_item
- list/get/info/export/client/list: use settings.meta_filter() helper
- item_service.rs: expect() on meta lock instead of silent swallow
- filter_plugin/mod.rs: extract parse_encoding_option() helper
- main.rs: document unsafe libc::umask block with safety rationale
2026-03-20 17:17:58 -03:00
52e9787edb refactor: deduplicate filter plugins, extract helpers across codebase
Bug fixes:
- client: add error field to ApiResponse to avoid swallowing server errors
- args/config: fix list_format default mismatch (5 vs 7 columns)
- client: url-encode size param in set_item_size

Dedup - filter plugins:
- Extract count_option() and pattern_option() helpers, replace 7 identical options()
- Add #[derive(Clone)] to all filter structs; remove verbose clone_box() impls
- Simplify FilterChain clone() and impl Clone for Box<dyn FilterPlugin>
- Add filter_clone_box! macro for future use
- Fix doctest example missing clone_box

Dedup - server API:
- Extract spawn_body_reader() with LimitBehavior enum for body streaming
- Extract check_binary_content() helper
- Extract stream_with_offset_and_length() helper
- Extract generate_status() helper in status.rs
- Extract append_query_params() helper in client.rs

Dedup - other:
- Extract yaml_value_to_string() in meta_plugin/mod.rs
- Extract item_from_row() in db.rs
- Delete unused DisplayListItem struct

Misc:
- Remove duplicate doc comment in compression_service.rs
2026-03-20 15:54:33 -03:00
00be72f3d0 refactor: rename size to uncompressed_size, add compressed_size and closed columns
Schema changes:
- Rename items.size to items.uncompressed_size for clarity
- Add compressed_size (INTEGER NULL) - tracks compressed file size on disk
- Add closed (BOOLEAN NOT NULL DEFAULT 1) - tracks whether item is fully written
- Existing items default to closed=true via migration

Lifecycle:
- Items created with closed=false, set to true on successful save/import
- Compressed size captured via fs::metadata() after compression writer closes
- Truncated uploads (413) get compressed_size set, closed=true, uncompressed_size=None
- Update command now backfills both uncompressed_size and compressed_size

Also includes bug fixes and dedup from prior review:
- Fix stream_raw_content_response using uncompressed_size for raw byte Content-Length
- ApiResponse::ok()/empty() constructors, TryFrom<ItemWithMeta> for ItemInfo
- tag_names() method on ItemWithMeta, meta_filter() on Settings
- Fix .unwrap() panics in compression engine Read/Write impls
- Fix TOCTOU race in stream_raw_content_response (now uses compressed_size)
- Fix swallowed write errors in meta plugins (digest, magic_file, exec)
- Fix term::stderr().unwrap() panic in item_service
- Deduplicate ItemService::new() calls across 20 API handlers
- ImportMeta supports #[serde(alias = "size")] for backward compat

All 75 tests, 67 doc tests pass. Clippy clean.
2026-03-18 10:58:26 -03:00
49793a0f94 feat: add streaming tar export/import and rename "none" to "raw"
- Add streaming tar-based export (--export produces .keep.tar)
- Add streaming tar import (--import reads .keep.tar archives)
- Add server endpoints GET /api/export and POST /api/import
- Rename CompressionType::None to CompressionType::Raw with "none" as alias
- Add DB migration to update existing "none" compression values to "raw"
- Fix export endpoint to propagate errors to client instead of swallowing
- Fix import endpoint to return 413 on max_body_size instead of truncating

Export streams items as tar archives without loading entire files into memory.
Import extracts items with new IDs, preserving original order. Both work
locally and via client/server mode.

Co-Authored-By: opencode <noreply@opencode.ai>
2026-03-17 21:24:39 -03:00
074ba64805 feat: allow --list to accept item IDs for filtering
- Local and client/server modes now support ID-based filtering
- keep -l 1 2 3 lists specific items by ID
- keep -l --ids-only 1 2 3 outputs just those IDs
- Server API adds optional 'ids' query parameter to GET /api/item/
- KeepClient.list_items gains ids parameter
2026-03-17 17:56:35 -03:00
02f0c8d453 fix: use XDG config directory for default config file location
Changes from manual HOME/.config/keep/config.yml construction to
dirs::config_dir(), which respects XDG_CONFIG_HOME.
2026-03-17 16:07:13 -03:00
c29e37c03e fix: use XDG data directory as default storage location
Changes the default from ~/.keep to the platform data directory plus
/keep (e.g. ~/.local/share/keep on Linux). Uses dirs::data_dir(), which
respects the XDG_DATA_HOME environment variable.
2026-03-17 15:37:25 -03:00
28c3deaeca fix: expand tilde (~) in config file paths to home directory
Applies to dir, import_data_file, and all server certificate/secret file
paths. Uses existing dirs crate for home directory resolution.
2026-03-17 15:32:30 -03:00
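The tilde expansion described above amounts to replacing a leading `~/` with the user's home directory. A minimal sketch of the idea (the project uses the `dirs` crate for the home lookup; the `HOME` environment variable is used here only to stay std-only, and the function name is hypothetical):

```rust
use std::env;

// Expand a leading "~/" to the home directory; leave other paths alone.
fn expand_tilde(path: &str) -> String {
    match path.strip_prefix("~/") {
        Some(rest) => match env::var("HOME") {
            Ok(home) => format!("{}/{}", home, rest),
            Err(_) => path.to_string(), // no home known: pass through
        },
        None => path.to_string(),
    }
}

fn main() {
    assert_eq!(expand_tilde("/etc/keep.yml"), "/etc/keep.yml");
    if env::var("HOME").is_ok() {
        assert!(!expand_tilde("~/keep/config.yml").starts_with('~'));
    }
    println!("ok");
}
```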
cb56a398fa feat: add --ids-only flag to --list mode for scripting
Outputs one ID per line with no header. Errors if used with any mode
other than --list. Works with both local and client (remote) list.
2026-03-17 15:04:10 -03:00
2452da52ef chore: add license, repository, keywords, and rust-version to Cargo.toml 2026-03-17 14:50:45 -03:00
6347427536 chore: remove bin/keep binary from tracking, add bin/ to gitignore 2026-03-17 14:47:57 -03:00
a8759c4b83 feat: add infer and tree_magic_mini meta plugins, make zstd internal by default
- Add infer crate as meta plugin for MIME type detection
- Add tree_magic_mini crate as alternative meta plugin for MIME type detection
- Add zstd, infer, tree_magic_mini to default features
- Fix static build script to use musl target instead of glibc+crt-static
- Remove hardcoded shell list from --generate-completion help text
- Fix update() in both new plugins to emit MIME metadata when buffer fills
2026-03-17 14:46:51 -03:00
a90c19efc1 feat: add native zstd compression plugin and deduplicate shared compression/meta utilities
- Add zstd crate (v0.13) with native Rust compression engine (level 3)
- Gate behind 'zstd' feature flag, fall back to program-based when disabled
- Extract CompressionService::decompressing_reader/compressing_writer with zstd support
- Extract MetaService::with_collector() to eliminate Arc<Mutex<Vec>> boilerplate
- Extract read_with_bounds() helper for skip+read pattern
- Add input validation for mutually exclusive --id and --tags flags
- Add zstd round-trip tests
2026-03-16 20:03:30 -03:00
35ee71c3cf feat: add export/import modes, unify service layer, fix binary detection
Export/import:
- Add --export and --import modes for both local and client paths
- Use strfmt crate for --export-filename-format templates ({id}, {tags}, {ts}, {compression})
- Import preserves original timestamps via server ?ts= param
- --import-data-file for file-based import; stdin fallback streams with PIPESIZE buffers

Service unification:
- Merge SyncDataService unique methods into ItemService (delete_item now returns Result<Item>)
- Delete AsyncDataService, AsyncItemService, DataService trait (dead code / async-blocking anti-pattern)
- All server handlers use spawn_blocking + ItemService directly
- Extract shared types (ExportMeta, ImportMeta) and helpers (resolve_item_id(s), check_binary_tty)

Binary detection fix:
- Replace broken metadata.get("map") + is_binary(&[]) with actual content sampling
- Both as_meta and allow_binary paths read PIPESIZE sample before deciding
- Never load entire item into memory for binary check

Other fixes:
- Fix lock consistency: all handlers use blocking_lock() in spawn_blocking (no mixed lock().await)
- Use ISO 8601 format for {ts} in export filenames
- Fix resolve_item_ids returning only 1 item for tag lookups
- Fix client get.rs triple-buffering and export.rs whole-file buffering
- Add KeepClient::get_item_content_stream() for streaming reads
- Pass all clippy --features server lints (Path vs PathBuf, &mut conn, etc.)
2026-03-16 08:43:26 -03:00
0a3d61a875 fix: client save with --compression none stored lz4 instead of none
- server_compress was true when compression_type=None, telling server to
  recompress with its default (lz4) instead of storing raw
- compression_type query param was only sent when !server_compress,
  so 'none' was never sent to server
- Fix: server_compress always false in client mode (client handles all
  compression), compression_type always sent to server

Tested: save/get/list/info/filters/delete for lz4, none, gzip on both
local and client/server modes. All operations produce matching results.
2026-03-15 12:46:29 -03:00
eca17b36ee fix: client save logs item ID early, stores compression via proper field and size via update endpoint
- Client save now logs 'New item: {id}' immediately after server response
- Compression type sent as query param, stored in DB compression field (not _client_compression metadata)
- Client set_item_size() sends uncompressed size via POST /api/item/{id}/update?size=N
- Server raw content GET uses actual file size for Content-Length (not uncompressed item.size)
- Removed _client_compression metadata hack from client save and get
- Fixed server handle_update_item to support size-only updates
- Fixed clippy: collapsible_if, too_many_arguments, unnecessary mut refs
- Fixed ListItemsQuery doctest missing meta field
2026-03-15 10:14:55 -03:00
5bad7ac7a6 refactor: decouple meta plugins from DB via SaveMetaFn callback, extract shared utilities
- Add SaveMetaFn callback pattern: meta plugins receive a closure instead of
  &Connection, enabling the same plugin code to work in local, client, and
  server contexts (collect-to-Vec, collect-to-HashMap, or direct DB write)
- Client save now runs meta plugins locally during streaming (smart client
  sets meta=false, server skips its own plugins)
- Add POST /api/item/{id}/update endpoint for re-running plugins on stored
  content without downloading compressed data
- Add client update mode (--update with --meta-plugin flags)
- Extract shared utilities: stream_copy, print_serialized, build_path_table,
  ensure_default_tag to reduce duplication across modes
- Add upsert_tag for idempotent tag addition (INSERT OR IGNORE)
- Add warn logging on save_meta lock failure in BaseMetaPlugin and MetaService
2026-03-14 22:36:59 -03:00
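The SaveMetaFn pattern above can be sketched as follows: the plugin calls an injected closure instead of holding a database handle, so the same plugin body can collect metadata into a map (client), a Vec, or write straight to the DB (local/server). All names here are illustrative stand-ins, not the project's actual signatures:

```rust
use std::collections::HashMap;

// Plugins receive a callback rather than a &Connection.
type SaveMetaFn<'a> = &'a mut dyn FnMut(&str, &str);

fn digest_plugin(content: &[u8], save: SaveMetaFn) {
    // A real plugin would hash the stream; this just records length.
    save("length", &content.len().to_string());
}

fn main() {
    // Collect-to-HashMap context; a DB-writing closure works identically.
    let mut collected: HashMap<String, String> = HashMap::new();
    digest_plugin(b"hello", &mut |k, v| {
        collected.insert(k.into(), v.into());
    });
    assert_eq!(collected.get("length").map(String::as_str), Some("5"));
    println!("ok");
}
```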
fdc5f1d744 fix: client --list uses list_format from config like local mode
Move apply_color/apply_table_attribute to common.rs for sharing.
Add render_list_table_with_format() that takes ColumnConfig slice
and pre-computed row values. Client list now renders columns based
on settings.list_format, showing empty for columns where server
data is unavailable (e.g. text_line_count, token_count).
2026-03-14 20:01:58 -03:00
f5bae46620 fix: all tables respect table_config from settings
Extract shared render_item_info_table() and render_list_table() in
modes/common.rs. Update client/info, client/list, client/status,
info, status, and status_plugins to use create_table_with_config
with settings.table_config instead of hardcoded presets.

Previously only local --list used table_config; all other tables
(client modes, status, status-plugins) ignored it.
2026-03-14 19:49:31 -03:00
0bc8d9c909 fix: surface server error in get_status and trim table output
- Include error field in get_status() ApiResponse so server error
  messages are surfaced instead of generic 'No status data returned'
- Use trim_lines_end() on table output to match local mode formatting
2026-03-14 19:32:39 -03:00
1a942b4d23 fix: format client --status output as tables instead of raw JSON
Change client get_status() to return StatusInfo struct instead of
serde_json::Value, then render paths, meta plugins, and compression
tables matching the local mode's output style.
2026-03-14 19:25:53 -03:00
886ac98b21 fix: URL-encode query params in client and pass --meta to server on save
- URL-encode all query parameter keys and values in get_json_with_query
  and post_stream. Previously raw JSON like {"project":"alpha"} was
  sent unencoded, causing 'invalid uri character' errors.
- Pass settings.meta (key=value pairs) from client save to server as
  metadata. Previously always passed empty HashMap, so --meta was
  silently ignored in client save mode.
2026-03-14 19:16:39 -03:00
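The encoding fix above exists because characters like `{`, `"`, and `:` are invalid in a URI query unless percent-encoded. A std-only sketch of the principle (the project may well use a crate for this; the encoder below is illustrative and keeps only the RFC 3986 unreserved set):

```rust
// Percent-encode everything outside ALPHA / DIGIT / "-" / "_" / "." / "~".
fn percent_encode(s: &str) -> String {
    s.bytes()
        .map(|b| match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9'
            | b'-' | b'_' | b'.' | b'~' => (b as char).to_string(),
            _ => format!("%{:02X}", b),
        })
        .collect()
}

fn main() {
    // The raw JSON from the bug report becomes a valid query value.
    assert_eq!(
        percent_encode(r#"{"project":"alpha"}"#),
        "%7B%22project%22%3A%22alpha%22%7D"
    );
    println!("ok");
}
```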
0658d8378f fix: group all server options under Server Options help heading
The --server-password, --server-password-hash, --server-username,
--server-jwt-secret, --server-jwt-secret-file, and --server-max-body-size
options were appearing in the generic Options section instead of the
Server Options section.
2026-03-14 18:56:32 -03:00
ffe71440d9 fix: use explicit snake_case serialization for CompressionType
Per project convention, enum string representations should use
snake_case. Use explicit strum serialize attributes instead of
serialize_all to avoid incorrect splitting of acronyms like
GZip → g_zip and ZStd → z_std.
2026-03-14 18:26:58 -03:00
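The problem described above is inherent to mechanical CamelCase-to-snake_case conversion: it treats every interior capital as a word boundary, so acronym-style names split wrongly. Explicit per-variant strings (what strum's `serialize` attribute provides) keep the intended spelling. A std-only sketch with a hand-written equivalent (the enum shape is illustrative):

```rust
#[derive(Debug, PartialEq)]
enum CompressionType { GZip, ZStd, Lz4 }

impl CompressionType {
    // Explicit names; automatic conversion would emit "g_zip"/"z_std".
    fn as_str(&self) -> &'static str {
        match self {
            CompressionType::GZip => "gzip",
            CompressionType::ZStd => "zstd",
            CompressionType::Lz4 => "lz4",
        }
    }
}

fn main() {
    assert_eq!(CompressionType::GZip.as_str(), "gzip");
    assert_eq!(CompressionType::ZStd.as_str(), "zstd");
    println!("ok");
}
```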
8acbd34150 fix: add --meta filtering support to client/server list mode
Plumb metadata filter from client CLI through the HTTP API to the
server's data_service.list_items(). The server accepts a JSON-encoded
meta query parameter where null values mean 'key exists' and string
values mean 'exact match'.

Also fix LZ4 compression round-trip for client mode:
- Explicit flush FrameEncoder before drop to avoid sending only the
  frame header when compress=false
- Send _client_compression metadata so client knows actual compression
  on retrieval (server records compression=None when compress=false)
- Use FrameDecoder (frame format) instead of decompress_size_prepended
  (size-prepended format) to match server storage format
2026-03-14 18:22:07 -03:00
f2d93a2812 fix: skip_lines/skip_bytes filters producing empty output on large files
FilteringReader::read() returned Ok(0) (EOF) when a filter consumed a
chunk without producing output. Filters like skip_lines need to see
multiple chunks before outputting anything — returning 0 prematurely
truncated the stream. Loop until the filter produces output or the
underlying reader is truly exhausted.
2026-03-14 16:20:30 -03:00
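The fix above can be sketched as a wrapping reader that loops instead of reporting EOF when one inner chunk filters down to nothing. The `Filter` trait and `SkipBytes` below are hypothetical stand-ins for the project's FilterPlugin machinery:

```rust
use std::io::{self, Read};

trait Filter {
    fn process(&mut self, chunk: &[u8]) -> Vec<u8>;
}

struct FilteringReader<R: Read, F: Filter> {
    inner: R,
    filter: F,
    pending: Vec<u8>,
}

impl<R: Read, F: Filter> Read for FilteringReader<R, F> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        while self.pending.is_empty() {
            let mut chunk = [0u8; 8192]; // PIPESIZE-style buffer
            let n = self.inner.read(&mut chunk)?;
            if n == 0 {
                return Ok(0); // inner reader truly exhausted
            }
            // The filter may consume a whole chunk and emit nothing;
            // the loop keeps reading instead of returning Ok(0) early.
            self.pending = self.filter.process(&chunk[..n]);
        }
        let n = self.pending.len().min(buf.len());
        buf[..n].copy_from_slice(&self.pending[..n]);
        self.pending.drain(..n);
        Ok(n)
    }
}

// A skip_bytes-style filter: silent until it has skipped enough input.
struct SkipBytes { remaining: usize }

impl Filter for SkipBytes {
    fn process(&mut self, chunk: &[u8]) -> Vec<u8> {
        let skip = self.remaining.min(chunk.len());
        self.remaining -= skip;
        chunk[skip..].to_vec()
    }
}

fn main() {
    let data = vec![b'x'; 20000];
    let mut r = FilteringReader {
        inner: &data[..],
        filter: SkipBytes { remaining: 10000 },
        pending: Vec::new(),
    };
    let mut out = Vec::new();
    r.read_to_end(&mut out).unwrap();
    assert_eq!(out.len(), 10000);
    println!("ok");
}
```

With the early `return Ok(0)` inside the loop body instead, the first all-skipped chunk would end the stream, which is exactly the truncation the commit describes.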
0af74000d2 fix: eliminate unsafe code via nix, command-fds, and thread-local cookie
Replace 4 unsafe sites with safe wrappers:

- libc::pipe2 → nix::unistd::pipe2 (safe OwnedFd return)
- File::from_raw_fd → File::from(OwnedFd) (safe ownership transfer)
- unsafe impl Send for SendCookie → thread_local! lazy Cookie
  (each thread gets its own independent Cookie, no Send needed)
- pre_exec + libc::fcntl → command-fds crate fd_mappings()
  (handles CLOEXEC clearing safely, also fixes potential fd leak
  on spawn failure via OwnedFd RAII)

Only libc::umask remains as a single unavoidable unsafe site
(no safe Rust wrapper exists for the umask syscall).

Also updates AGENTS.md to remove stale SendCookie exception.
2026-03-14 16:01:54 -03:00
9a1e23e85f fix: use tempdir for db doctests instead of project root
All 27 doctests in db.rs wrote keep.db to the project root via
PathBuf::from("keep.db"). Now use tempfile::tempdir() so the
database is created in a temp directory and cleaned up automatically.
2026-03-14 15:10:47 -03:00
b3ca673b52 feat: add --update mode, --meta/--meta-plugin flags, streaming diff
- Add --update mode to modify tags and metadata for existing items by ID
- Add --meta key=value flag to set metadata during save/update
- Add --meta key (bare) to delete metadata keys or filter by existence
- Add --meta-plugin/-M name:{json} flag for plugin options via CLI
- Env meta plugin now uses options from --meta-plugin instead of only env vars
- Stream decompressed content to diff via /dev/fd pipes (no temp files)
- Wire --list-format CLI arg to settings (was parsed but ignored)
- Allow --info to accept tags (was restricted to numeric IDs only)
- Change DB meta filtering to HashMap<String, Option<String>> for exact match + key existence
- Fix fcntl error checking in diff pre_exec
- Fix README inaccuracies (delete by tag, nonexistent --digest flag, meta plugin key names)
2026-03-14 15:02:16 -03:00
4b51825917 docs: document default mode shortcuts for save and get
- Quick Start: show bare keep <tag> (save) and keep <#> (get) shortcuts
- Save Mode: note that --save is optional when piping content
- Get Mode: clarify that only numeric IDs default to Get mode;
  fix incorrect keep <tag> example that would actually save
2026-03-14 11:48:37 -03:00
2ffa2a977a feat: add shell profiles for zsh, sh, csh/tcsh
- profile.bash: simplified preexec_init (early return), extracted
  ___keep_complete helper for @/@@ completion wrappers
- profile.zsh: add-zsh-hook preexec, wrapper function, @/@@ aliases,
  completions via compdef
- profile.sh: POSIX-compatible for sh/dash/ksh. Wrapper function,
  @/@@ aliases. No preexec or completions.
- profile.csh: alias-based keep wrapper, @/@@ aliases. No preexec
  or completions.
- modulefile: adds KEEP_SH_PROFILE, KEEP_ZSH_PROFILE, KEEP_CSH_PROFILE
- README: updated Shell Integration table and Shell Completion section
2026-03-14 11:36:29 -03:00
1a8ed56b68 feat: add --generate-completion for shell tab completion
- Add clap_complete dependency for bash/zsh/fish/elvish/powershell
- Add --generate-completion <shell> flag that prints completion script to stdout
- profile.bash sources completions via command keep --generate-completion bash
- @ and @@ aliases get completions via wrapper functions that delegate to _keep
- README updated with Shell Completion section
2026-03-14 11:02:38 -03:00
158bf50864 docs: add environment modulefile instructions to README 2026-03-14 10:36:57 -03:00
17be6abaab refactor: streaming, security hardening, and MCP removal
Major overhaul of server architecture and security posture:

- Streaming: Unified all I/O through PIPESIZE (8192-byte) buffers.
  POST bodies stream via MpscReader through the save pipeline. GET
  content streams from disk via decompression to client. Removed
  save_item_with_reader, get_item_content_info, ChannelReader.
  413 responses keep partial items (nonfatal by design).

- Security: XSS protection in all HTML pages via html_escape crate.
  Security headers middleware (nosniff, frame deny, referrer policy).
  CORS tightened to explicit headers. Input validation for tags
  (256 chars), metadata (128/4096), pagination (10k cap). Config
  file reads use from_utf8_lossy. Generic error messages in HTML.
  Diff endpoint has 10 MB per-item cap. max_body_size config option.

- Panics eliminated: Path unwraps → proper error propagation.
  Mutex unwraps → map_err (registries) / expect with message (local).

- MCP removed: Deleted all MCP code, rmcp dependency, mcp feature.

- Docs: Updated README, DESIGN, AGENTS to reflect all changes.
2026-03-14 00:03:42 -03:00
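The MpscReader mentioned above can be sketched as a channel receiver exposed through `io::Read`: the request handler sends body chunks into the channel while the save pipeline reads from the other end, so no more than a chunk is in memory at once. This is an illustrative reconstruction of the idea, not the project's actual type:

```rust
use std::io::{self, Read};
use std::sync::mpsc;

struct MpscReader {
    rx: mpsc::Receiver<Vec<u8>>,
    leftover: Vec<u8>, // partially consumed chunk
}

impl Read for MpscReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.leftover.is_empty() {
            match self.rx.recv() {
                Ok(chunk) => self.leftover = chunk,
                Err(_) => return Ok(0), // sender dropped: EOF
            }
        }
        let n = self.leftover.len().min(buf.len());
        buf[..n].copy_from_slice(&self.leftover[..n]);
        self.leftover.drain(..n);
        Ok(n)
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    // Stand-in for the HTTP handler feeding body chunks.
    let writer = std::thread::spawn(move || {
        for _ in 0..3 {
            tx.send(vec![b'a'; 8192]).unwrap(); // PIPESIZE-sized chunks
        }
    });
    let mut out = Vec::new();
    MpscReader { rx, leftover: Vec::new() }
        .read_to_end(&mut out)
        .unwrap();
    writer.join().unwrap();
    assert_eq!(out.len(), 3 * 8192);
    println!("ok");
}
```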
560ba6e20c fix: count_bounded error counting, clippy if-let, auth test dedup, doc tests
- count_bounded: break on iterator error instead of counting errors as tokens
- collapse nested if-let chains with let-chains in auth middleware
- document JWT/Basic Auth as mutually exclusive
- TailTokensFilter::clone uses empty buffer (always pre-filter)
- fix 9 broken doc examples in server/common.rs
- remove 7 duplicate auth tests from auth.rs (covered by auth_tests.rs)
2026-03-13 22:04:38 -03:00
a07bb6b350 feat: plugin-declared parallel execution, switch to env_logger, update deps
Parallel execution (opt-in via MetaPlugin::parallel_safe):
- Add Send bound to MetaPlugin, parallel_safe() method (default false)
- Override to true in digest, tokens, exec, magic_file plugins
- MetaService: std::thread::scope for initialize_plugins and process_chunk
- Extract plugins via NullMetaPlugin sentinel + std::mem::replace (no unsafe)
- Panic tracking: join errors logged, NullMetaPlugin restored and finalized
- MetaPluginExec: Box<dyn Write> -> Box<dyn Write + Send>
- SendCookie wrapper for libmagic Cookie with unsafe impl Send

Logging (stderrlog -> env_logger):
- Custom format: [SSSSSS.mmm] LEVEL [module:] message (time-since-start ms)
- Default level: Warn (matches previous behavior)
- -v: Debug, -vv+: Trace, -q: off
- -vv+ shows module path

Maintenance:
- Bump deps: thiserror 2.0, config 0.15, dns-lookup 3.0, lz4_flex 0.12,
  ringbuf 0.4, rand 0.9, lazy_static 1.5, env_logger 0.11
- Update Cargo.lock (186 transitive packages)
- Clippy fixes: is_multiple_of, to_string_in_format_args, collapsible_if
- Fix double-counting bug in TokensMetaPlugin::update
- Fix schema description using plugin.description()

Co-Authored-By: opencode <noreply@opencode.ai>
2026-03-13 21:49:51 -03:00
e7d8a83369 feat: add plugin schema system, tokenizer cache, and config validation
- Add plugin schema types and runtime discovery for meta/filter plugins
- Rewrite --generate-config to use schema system instead of hardcoded types
- Add Settings::validate_config() for startup validation
- Cache tokenizer instances via static Lazy to avoid repeated BPE loading
- Add split_by_token_iter() and count_bounded() to Tokenizer
- Fix double-counting bug in TokensMetaPlugin when buffer < max_buffer_size
- Eliminate unnecessary allocations in token count methods
- Refactor token filters: remove Option<Tokenizer>, use iterator API
- Fix TailTokensFilter correctness: unbounded buffer instead of ring buffer
- Add encoding option to all token filters
- Add description() to MetaPlugin and FilterPlugin traits
- Fix unused_mut warning in compression engine (feature-gated code)

Co-Authored-By: code-review-bot <noreply@anthropic.com>
2026-03-13 20:23:17 -03:00
914190e119 feat: add LLM token counting meta plugin and token filters
Add tiktoken-based token counting via new 'tokens' feature flag.

New components:
- Shared tokenizer module wrapping tiktoken CoreBPE (cl100k_base, o200k_base)
- TokensMetaPlugin: streaming token counter, tokenizes each chunk independently
- head_tokens(N): stream first N tokens, split at exact boundary when mid-chunk
- skip_tokens(N): skip first N tokens, stream the rest
- tail_tokens(N): bounded ring buffer (~16KB), outputs last N tokens at finalize

All filters are fully streaming — no full-stream buffering.
Meta plugin accuracy: exact for normal text, ±1-2 tokens if a long
whitespace sequence spans a chunk boundary.

Also: add 'client' and 'tokens' to default features, add curl to Dockerfile builder stage.
2026-03-13 16:48:31 -03:00
e672ec751e feat: add JWT auth, configurable username, switch password auth to Basic
Add server-side JWT authentication with permission-based access control
(read/write/delete claims). Password authentication now uses HTTP Basic
auth only (replacing Bearer). Add configurable username for both server
and client (--server-username/--client-username, defaults to "keep").

JWT secret supports file-based loading via --server-jwt-secret-file for
Docker secrets. OPTIONS preflight requests bypass auth. HEAD mapped to
read permission.

Co-Authored-By: opencode <noreply@opencode.ai>
2026-03-13 13:56:35 -03:00
af1e0ca570 feat: expand Docker build to all features, add docker-compose.yml
- Build with server, mcp, swagger, client, tls features (all except magic)
- Add KEEP_* environment variable documentation and defaults
- Copy CA certificates for HTTPS client support in scratch image
- Add docker-compose.yml with keep-data and keep-config volumes
2026-03-13 10:08:28 -03:00
d5d58bc52c feat: add lz4 command fallback, remove unused magic.rs
- Add program-based lz4 command fallback when lz4 feature is disabled
- Feature-gate lz4.rs and lz4 tests to compile without lz4_flex
- Delete legacy magic.rs (unused, no feature gating, superseded by magic_file.rs)
2026-03-13 08:51:10 -03:00
b166477202 fix: harden security, eliminate panics, remove dead code, add Dockerfile
Security:
- Use constant-time password comparison (subtle crate) to prevent timing attacks
- Replace permissive CORS with configurable origin-restricted CORS
- Add TLS warning when password auth is used without HTTPS

Bug fixes:
- Convert MetaPlugin panics to anyhow::Result (get_meta_plugin, outputs_mut, options_mut)
- Replace item.id.unwrap() with proper error handling across 15 call sites
- Fix panic on unknown column type in list mode
- Fix conflicting PIPESIZE constant (was 8192 vs 65536, now unified to 8192)
- Add 256MB filter chain buffer limit to prevent OOM
- Gracefully skip unregistered plugins instead of panicking

Dead code removal:
- Delete unused filter parser files (filter_parser.rs, filter.pest, parser/ module)
- ~260 lines of dead PEG parser code removed

Code consolidation:
- Add is_content_binary_from_metadata() helper (was duplicated in 4 places)
- Simplify save_item_raw() to delegate to save_item_raw_streaming() (~90 lines removed)

Incomplete features:
- Populate filter_plugins in status output from global registry
- Add FallbackMagicFileMetaPlugin (was referenced but never implemented)
- Document init_plugins() as intentional no-op

Infrastructure:
- Add Dockerfile (static musl binary on scratch, 4.8MB)
- Add .dockerignore
- Add cors_origin to ServerConfig and config.rs
2026-03-13 07:57:36 -03:00
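The constant-time comparison above defeats timing attacks by examining every byte and folding the differences together, so runtime does not reveal where the first mismatch occurs. The commit uses the `subtle` crate; this std-only sketch shows the same principle:

```rust
// Compare without early exit: XOR accumulates any byte difference.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

fn main() {
    assert!(constant_time_eq(b"hunter2", b"hunter2"));
    assert!(!constant_time_eq(b"hunter2", b"hunter3"));
    println!("ok");
}
```

A naive `==` on byte slices may return as soon as bytes differ, letting an attacker measure how many leading password bytes were correct; in production, prefer a vetted implementation such as `subtle::ConstantTimeEq`, since compilers can sometimes optimize hand-rolled versions.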
126 changed files with 11353 additions and 6191 deletions

.dockerignore Normal file

@@ -0,0 +1,5 @@
target/
.git/
*.db
keep.db
bin/

.gitignore vendored

@@ -2,3 +2,4 @@
.aider*
.crush
keep.db
bin/

AGENTS.md

@@ -30,10 +30,36 @@ TERM=dumb cargo build --features server # With server feature
- Meta plugins extend `BaseMetaPlugin` for boilerplate reduction
- Enum string representations: `#[strum(serialize_all = "snake_case")]`
- Lint rules: `deny(clippy::all)`, `deny(unsafe_code)` (except `libc::umask` in main.rs)
-- Feature flags: `default = ["magic", "lz4", "gzip"]`; optional: `server`, `mcp`, `swagger`
+- Feature flags: `default = ["magic", "lz4", "gzip"]`; optional: `server`, `swagger`
## Testing
- Tests in `src/tests/` mirroring `src/` structure; shared helpers in `src/tests/common/test_helpers.rs`
- Key helpers: `create_temp_dir()`, `create_temp_db()`, `test_compression_engine()`
- Test naming: `test_<feature>_<scenario>`
## Streaming Constraint
**At no point should the whole file be in memory at once.** All I/O must use fixed-size buffers:
- `PIPESIZE` = 8192 bytes (`src/common/mod.rs:10`)
- Server POST body streams through `save_item_raw_streaming` via `MpscReader`
- Server GET content streams via streaming reader (not `read_to_end`)
- When `max_body_size` is exceeded, return `413` but keep the partial item (nonfatal by design)
- Filter/meta plugins use `PIPESIZE`-sized buffers
## HTML Rendering
- Use `html_escape` crate for all user-controlled data in HTML pages
- `esc()` for text content, `esc_attr()` for HTML attributes
- Security headers middleware: `X-Content-Type-Options: nosniff`, `X-Frame-Options: DENY`, `Referrer-Policy: strict-origin-when-cross-origin`
## Changelog
The project uses [Keep a Changelog](https://keepachangelog.com/). The changelog lives at `CHANGELOG.md` in the project root.
- **Always update `CHANGELOG.md`** when making changes that affect users (new features, breaking changes, bug fixes, etc.)
- Add entries under the `[Unreleased]` section using these categories: `Added`, `Changed`, `Deprecated`, `Removed`, `Fixed`, `Security`
- Keep descriptions concise and user-focused — what changed from the user's perspective, not implementation details
- Commit changelog updates in the same commit as the feature/fix they document
- Before releasing a new version, move `[Unreleased]` entries to a versioned section (e.g., `[0.2.0] - YYYY-MM-DD`) and add a new empty `[Unreleased]` above it

CHANGELOG.md Normal file

@@ -0,0 +1,107 @@
# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
### Added
- New `filter_grep` feature to optionally include the grep filter plugin (regex-based line filtering). Disabling this feature removes the `regex` crate and its ~800 KiB dependency stack from the binary.
- New `meta_all_musl` feature for all MUSL-compatible meta plugins (excludes `meta_magic` which requires libmagic)
- New `filter_all_musl` feature for all MUSL-compatible filter plugins
- Database index on `items(ts)` column for faster ORDER BY sorting
- Server API `ItemInfo` now includes `file_size` — actual filesystem-reported size of the item data file
### Changed
- CLI args now feature-gated: `--server` and related options hidden when built without `server` feature; `--client-*` options hidden when built without `client` feature. Run `--help` only shows relevant options.
- `server` Cargo feature now includes TLS support by default (`axum-server`); `tls` feature removed
- Clap `conflicts_with_all` removed from all mode args — exclusivity now handled by implicit `group("mode")`
- Filter plugins check size before loading content into memory (prevents OOM on large inputs)
- Status page pre-allocates collections with known capacities (meta plugins, compression info)
- `#[inline]` on HTML escape helper functions (`esc`, `esc_attr`) for hot path performance
- Removed `once_cell` crate (replaced with `std::sync::LazyLock` from Rust 1.80)
- Removed `lazy_static` crate (replaced with `std::sync::LazyLock`)
### Breaking
- Plugin feature flags renamed with type prefix for consistency:
- `magic` → `meta_magic`
- `infer` → `meta_infer`
- `tree_magic_mini` → `meta_tree_magic_mini`
- `tokens` → `meta_tokens`
- `grep` → `filter_grep`
- `all-meta-plugins` → `meta_all`
- `all-filter-plugins` → `filter_all`
### Fixed
- CLI help text typo: "metatdata" → "metadata" in `--get` and `--info` descriptions
### Refactored
- Added module-level documentation to `services/` module
### Documentation
- README.md: Fixed compression table — zstd is native (not external), "none" renamed to "raw"
- DESIGN.md: Updated schema to reflect current `items` table columns and meta plugin inventory
## [0.1.0] - 2026-03-21
### Added
- Streaming tar-based export (`--export`) producing `.keep.tar` archives without loading entire files into memory
- Streaming tar-based import (`--import`) extracting `.keep.tar` archives with new IDs
- Server endpoints `GET /api/export` and `POST /api/import`
- ID-based filtering for `--list` (`keep -l 1 2 3` lists specific items by ID)
- Server API accepts optional `ids` query parameter on `GET /api/item/`
- `--ids-only` flag for `--list` mode for scripting
- `infer` and `tree_magic_mini` meta plugins for MIME type detection
- Native `zstd` compression plugin as default
- Configurable compression via `--compression` flag
- Export/import modes with format detection (JSON, YAML, binary)
- `XDG_CONFIG_HOME` support for default config file location
- `XDG_DATA_HOME` support for default storage location
- Tilde (`~`) expansion in config file paths
### Changed
- `CompressionType::None` renamed to `CompressionType::Raw` (with `"none"` as alias for backward compatibility)
- `items.size` column renamed to `items.uncompressed_size`
- Added `items.compressed_size` column tracking compressed file size on disk
- Added `items.closed` column tracking whether an item is fully written
- Default `list_format` in config now matches CLI default (7 vs 5 columns)
- All filter plugins share deduplicated option implementations
### Refactored
- Extracted `spawn_body_reader()` and `check_binary_content()` helpers for streaming uploads
- Extracted `yaml_value_to_string()` helper for meta plugins
- Extracted `item_path()` helper in `ItemService` to reduce path duplication
- Unified `get_item_meta_name`/`value` to take `&str` instead of `String`
- Shared `ItemInfo` struct between client and server
- Compression service now returns `Result` types instead of panicking via `.expect()`
- `ApiResponse::ok()` and `ApiResponse::empty()` constructors
- `meta_filter()` helper on `Settings` for consistent filtering
- Added `tag_names()` method on `ItemWithMeta`
- `filter_clone_box!` macro for filter plugin cloning
### Fixed
- Panic guards in diff, compression engine, and spawned threads
- Pre-existing borrow errors in export handler and `TryFrom` implementation
- TOCTOU race in `stream_raw_content_response`
- Swallowed write errors in meta plugins (digest, magic_file, exec)
- Truncated uploads (413) now properly store compressed data
- `term::stderr().unwrap()` panic in `item_service`
- `.unwrap()` panics in compression engine `Read`/`Write` impls
- Client API errors now propagate to user instead of being swallowed
- Import endpoint returns 413 on `max_body_size` instead of truncating
- `keep --list` uses `list_format` from config in all modes
- All tables respect `table_config` from settings
- `DisplayListItem` struct removed (was unused)
- `#[serde(alias = "size")]` on `ImportMeta` for backward compatibility

Cargo.lock generated

File diff suppressed because it is too large

Cargo.toml

@@ -2,105 +2,123 @@
name = "keep"
version = "0.1.0"
edition = "2024"
rust-version = "1.85"
description = "Keep and manage temporary files with automatic compression and metadata generation"
readme = "README.md"
license = "MIT"
repository = "https://gitea.gt0.ca/asp/keep"
keywords = ["cli", "files", "compression", "metadata"]
categories = ["command-line-utilities"]
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
anyhow = "1.0.72"
axum = { version = "0.8.4", optional = true }
anyhow = "1.0"
axum = { version = "0.8", optional = true }
derive_more = { version = "2.0", features = ["full"] }
smart-default = "0.7"
thiserror = "1.0"
base64 = "0.22.1"
chrono = { version = "0.4.26", features = ["serde"] }
clap = { version = "4.3.10", features = ["derive", "env"] }
config = "0.14.0"
thiserror = "2.0"
base64 = "0.22"
chrono = { version = "0.4", features = ["serde"] }
clap = { version = "4.6", features = ["derive", "env"] }
clap_complete = "4"
command-fds = "0.3"
config = "0.15"
ctor = "0.2"
directories = "6.0.0"
dns-lookup = "2.0.2"
enum-map = "2.6.1"
flate2 = { version = "1.0.27", features = ["zlib-ng-compat"], optional = true }
directories = "6.0"
dns-lookup = "3.0"
enum-map = "2.7"
flate2 = { version = "1.0", features = ["zlib-ng-compat"], optional = true }
futures = "0.3"
gethostname = "1.0.2"
humansize = "2.1.3"
gethostname = "1.0"
humansize = "2.1"
async-stream = "0.3"
hyper = { version = "1.0", features = ["full"] }
http-body-util = "0.1"
inventory = "0.3"
is-terminal = "0.4.9"
lazy_static = "1.4.0"
libc = "0.2.147"
local-ip-address = "0.6.5"
log = "0.4.19"
lz4_flex = { version = "0.11.1", optional = true }
magic = { version = "0.13.0", optional = true }
nix = "0.30.1"
once_cell = "1.19.0"
comfy-table = "7.2.0"
pwhash = "1.0.0"
regex = "1.9.5"
ringbuf = "0.3"
rmcp = { version = "0.2.0", features = ["server"], optional = true }
rusqlite = { version = "0.37.0", features = ["bundled", "array", "chrono"] }
rusqlite_migration = "2.3.0"
serde = { version = "1.0.219", features = ["derive"] }
serde_json = "1.0.142"
serde_yaml = "0.9.34"
sha2 = "0.10.0"
md5 = "0.7.0"
stderrlog = "0.6.0"
strum = { version = "0.27.2", features = ["derive"] }
term = "1.1.0"
is-terminal = "0.4"
libc = "0.2"
local-ip-address = "0.6"
log = "0.4"
lz4_flex = { version = "0.12", optional = true }
zstd = { version = "0.13", optional = true }
magic = { version = "0.13", optional = true }
infer = { version = "0.19", optional = true }
tree_magic_mini = { version = "3.2", optional = true }
nix = { version = "0.30", features = ["fs", "process"] }
comfy-table = "7.2"
pwhash = "1.0"
regex = { version = "1.10", optional = true }
ringbuf = "0.4"
rusqlite = { version = "0.37", features = ["bundled", "array", "chrono"] }
rusqlite_migration = "2.3"
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
serde_yaml = "0.9"
sha2 = "0.10"
md5 = "0.7"
subtle = "2.6"
env_logger = "0.11"
strfmt = "0.2"
strum = { version = "0.27", features = ["derive"] }
term = "1.2"
tokio = { version = "1.0", features = ["full"] }
tokio-stream = "0.1"
tokio-util = "0.7.16"
tower = { version = "0.5.2", optional = true }
tower-http = { version = "0.6.6", features = ["cors", "fs", "trace"], optional = true }
utoipa = { version = "5.4.0", features = ["axum_extras"], optional = true }
utoipa-swagger-ui = { version = "9.0.2", features = ["axum"], optional = true }
uzers = "0.12.1"
which = "8.0.0"
xdg = "2.5.2"
strip-ansi-escapes = "0.2.1"
pest = "2.8.1"
pest_derive = "2.8.1"
dirs = "6.0.0"
similar = { version = "2.7.0", default-features = false, features = ["text"] }
tokio-util = "0.7"
tower = { version = "0.5", optional = true }
tower-http = { version = "0.6", features = ["cors", "fs", "trace"], optional = true }
utoipa = { version = "5.4", features = ["axum_extras"], optional = true }
utoipa-swagger-ui = { version = "9.0", features = ["axum"], optional = true }
uzers = "0.12"
which = "8.0"
xdg = "2.5"
strip-ansi-escapes = "0.2"
tar = "0.4"
pest = "2.8"
pest_derive = "2.8"
dirs = "6.0"
similar = { version = "2.7", default-features = false, features = ["text"] }
html-escape = "0.2"
ureq = { version = "3", features = ["json"], optional = true }
os_pipe = { version = "1", optional = true }
axum-server = { version = "0.8", features = ["tls-rustls"], optional = true }
jsonwebtoken = { version = "10", optional = true, features = ["aws_lc_rs"] }
tiktoken-rs = { version = "0.9", optional = true }
tempfile = "3.3"
[features]
# Default features include core compression engines and swagger UI
default = ["magic", "lz4", "gzip"]
# Default features include core compression engines and plugins that support MUSL
default = [
"client",
"gzip",
"filter_grep",
"meta_infer",
"lz4",
"meta_tokens",
"meta_tree_magic_mini",
"zstd"
]
# Full
#default = ["server", "magic", "lz4", "swagger"]
# Server feature (includes axum and related dependencies)
server = ["dep:axum", "dep:tower", "dep:tower-http", "dep:utoipa"]
# Server feature (includes axum and TLS/HTTPS via axum-server; rustls already available via client/ureq)
server = ["dep:axum", "dep:tower", "dep:tower-http", "dep:utoipa", "dep:jsonwebtoken", "dep:axum-server"]
# Compression features
gzip = ["flate2"]
lz4 = ["lz4_flex"]
bzip2 = []
xz = []
zstd = []
zstd = ["dep:zstd"]
# Plugin features (meta and filter)
all-meta-plugins = ["dep:magic"]
all-filter-plugins = []
# Meta plugin features
meta_magic = ["dep:magic"]
meta_infer = ["dep:infer"]
meta_tree_magic_mini = ["dep:tree_magic_mini"]
meta_tokens = ["dep:tiktoken-rs"]
meta_all = ["meta_magic", "meta_infer", "meta_tree_magic_mini", "meta_tokens"]
meta_all_musl = ["meta_infer", "meta_tree_magic_mini", "meta_tokens"]
# Individual plugin features
magic = ["dep:magic"]
# MCP feature (Model Context Protocol support)
mcp = ["dep:rmcp"]
# Filter plugin features
filter_grep = ["dep:regex"]
filter_all = ["filter_grep"]
filter_all_musl = ["filter_grep"]
# Swagger UI feature
swagger = ["dep:utoipa-swagger-ui"]
@@ -108,9 +126,5 @@ swagger = ["dep:utoipa-swagger-ui"]
# Client feature (HTTP client for remote server)
client = ["dep:ureq", "dep:os_pipe"]
# TLS feature (HTTPS server support)
tls = ["dep:axum-server"]
[dev-dependencies]
tempfile = "3.3.0"
rand = "0.8.5"
rand = "0.9"

DESIGN.md

@@ -33,7 +33,7 @@
- `modes/status.rs` - Show system status and capabilities
- `modes/server.rs` - REST HTTP/HTTPS server mode with OpenAPI documentation
- `modes/client.rs` - Client mode for remote server (streaming save, local decompression)
- `modes/common.rs` - Shared utilities for all modes
- `modes/common.rs` - Shared utilities for all modes (OutputFormat, table creation, `print_serialized`, `build_path_table`, `ensure_default_tag`, `render_item_info_table`, `render_list_table_with_format`)
### Database Module
- `db.rs` - SQLite database operations
@@ -49,24 +49,31 @@
- `compression_engine/program.rs` - External program wrapper
### Meta Plugin Module
- `meta_plugin.rs` - Trait and type definitions
- `meta_plugin.rs` - Trait and type definitions, `SaveMetaFn` callback type
- `meta_plugin/program.rs` - External program wrapper
- `meta_plugin/digest.rs` - Internal digest implementations
- `meta_plugin/system.rs` - System information metadata plugins
**SaveMetaFn Architecture**: Meta plugins are decoupled from direct DB access via a `SaveMetaFn` callback (`Arc<Mutex<dyn FnMut(&str, &str) + Send>>`). The callback is injected at `MetaService` construction and propagated to all plugins via `BaseMetaPlugin`. This enables:
- **Local mode**: Callback collects metadata into a `Vec`, written to DB after plugins finish
- **Client mode**: Callback collects into a `HashMap`, sent to server after streaming completes
- **Server mode**: Callback collects into a `Vec`, written to DB after plugins finish (same as local)
### Common Modules
- `common/is_binary.rs` - Binary file detection utilities
- `common/status.rs` - Status information generation
- `common/mod.rs` - `PIPESIZE` constant (8192), `stream_copy()` streaming utility
### Client Module
- `client.rs` - HTTP client wrapper (ureq-based, supports streaming POST)
- `modes/client/save.rs` - 3-thread streaming save (stdin → tee → compress → pipe → HTTP POST)
- `modes/client/save.rs` - 3-thread streaming save with local meta plugins (stdin → tee → compress → meta plugins → pipe → HTTP POST)
- `modes/client/get.rs` - Get with server-side raw fetch + local decompression
- `modes/client/list.rs` - List delegation to server
- `modes/client/info.rs` - Info delegation to server
- `modes/client/delete.rs` - Delete delegation to server
- `modes/client/diff.rs` - Diff delegation to server
- `modes/client/status.rs` - Status delegation to server
- `modes/client/update.rs` - Update delegation to server (sends plugin names/metadata/tags)
### Utility Modules
- `plugins.rs` - Shared plugin utilities
@@ -110,7 +117,7 @@
## Data Storage
### Database Schema
- `items` table: id (primary key), ts (timestamp), size (optional), compression
- `items` table: id (primary key), ts (timestamp), uncompressed_size (optional), compressed_size (optional), closed (boolean), compression
- `tags` table: id (foreign key to items), name (tag name)
- `metas` table: id (foreign key to items), name (meta key), value (meta value)
- Indexes on tag names and meta names for faster queries
@@ -128,16 +135,20 @@
### Item Operations
- `GET /api/item/` - Get a list of items as JSON. Optional params: `order=newest|oldest`, `start=0`, `count=100`, `tags=tag1,tag2`
- `POST /api/item/` - Add a new item (body: raw content). Query params: `tags`, `metadata` (JSON), `compress=true|false`, `meta=true|false`
- `POST /api/item/` - Add a new item (body: raw content, **streamed** through fixed-size 8192-byte buffers). Query params: `tags`, `metadata` (JSON), `compress=true|false`, `meta=true|false`
- `POST /api/item/<#>/meta` - Add metadata to an existing item (body: JSON object)
- `POST /api/item/<#>/update` - Re-run meta plugins on stored content. Query params: `plugins` (comma-separated), `metadata` (JSON), `tags` (comma-separated, idempotent)
- `DELETE /api/item/<#>` - Delete an item
- `GET /api/item/latest` - Return the latest item as JSON. Optional params: `tags=tag1,tag2`, `allow_binary=true|false`
- `GET /api/item/latest/meta` - Return the latest item metadata as JSON. Optional params: `tags=tag1,tag2`
- `GET /api/item/latest/content` - Return the raw content of the latest item. Optional params: `tags=tag1,tag2`, `decompress=true|false`
- `GET /api/item/latest/content` - Return the raw content of the latest item (**streamed**). Optional params: `tags=tag1,tag2`, `decompress=true|false`
- `GET /api/item/<#>` - Return the item as JSON. Optional params: `allow_binary=true|false`
- `GET /api/item/<#>/meta` - Return the item metadata as JSON
- `GET /api/item/<#>/content` - Return the raw content of the item. Optional params: `decompress=true|false`
- `GET /api/diff` - Diff two items. Params: `id_a`, `id_b`
- `GET /api/item/<#>/content` - Return the raw content of the item (**streamed**). Optional params: `decompress=true|false`
- `GET /api/diff` - Diff two items. Params: `id_a`, `id_b` (individual items capped at 10 MB)
### Server Configuration
- `max_body_size` - Maximum POST body size in bytes (default: unlimited). When exceeded, server returns `413 PAYLOAD_TOO_LARGE` while keeping the partial item already saved through the streaming pipeline. Set to `0` for unlimited.
### Server Modes
- **Plain HTTP** (default): `tokio::net::TcpListener` + `axum::serve()`
@@ -145,10 +156,13 @@
- Conditional selection at startup: cert+key present → HTTPS, otherwise → HTTP
### Client/Server Protocol
- Smart clients (keep CLI) set `compress=false` and `meta=false` on POST, handling compression/metadata locally
- Smart clients (keep CLI) set `compress=false` and `meta=false` on POST, handling compression and meta plugins locally
- Dumb clients (curl) use defaults (`compress=true`, `meta=true`), server handles everything
- Smart client update: sends `plugins` param to server, server runs plugins on stored content (avoids downloading compressed data)
- GET responses include `X-Keep-Compression` header when `decompress=false`
- Streaming save uses chunked transfer encoding for constant memory usage
- **Universal streaming**: All server paths (POST, GET, diff) use `PIPESIZE` (8192) byte buffers
- **413 partial item**: When `max_body_size` is exceeded, the server returns `413` but keeps the partial item already saved through the pipeline (nonfatal design — pipes continue normally)
### Authentication
- Bearer token authentication: `Authorization: Bearer <password>`
@@ -164,26 +178,25 @@
- None (no compression)
## Supported Meta Plugins
- FileMagic - File type detection using file command
- FileMime - MIME type detection using file command
- FileEncoding - File encoding detection using file command
- LineCount - Line count using wc command
- WordCount - Word count using wc command
- Cwd - Current working directory
- Binary - Binary file detection
- Uid - Current user ID
- User - Current username
- Gid - Current group ID
- Group - Current group name
- Shell - Shell path from SHELL environment variable
- ShellPid - Shell process ID from PPID environment variable
- KeepPid - Keep process ID
- DigestSha256 - SHA-256 digest
- DigestMd5 - MD5 digest using md5sum command
- ReadTime - Time taken to read data
- ReadRate - Rate of data reading
- Hostname - System hostname
- FullHostname - Fully qualified domain name
Meta plugins collect metadata during item save. Each plugin produces one or more key-value pairs:
- `magic_file` - File type detection using libmagic (when `magic` feature enabled)
- `infer` - MIME type detection using infer crate (when `infer` feature enabled)
- `tree_magic_mini` - MIME type detection using tree_magic_mini (when `tree_magic_mini` feature enabled)
- `tokens` - LLM token counting using tiktoken (when `tokens` feature enabled)
- `text` - Text analysis: line count, word count, char count, line average length
- `digest` - SHA-256 and MD5 checksums
- `hostname` - System hostname (full and short)
- `cwd` - Current working directory
- `user` - Current username and UID
- `shell` - Shell path from SHELL environment variable
- `shell_pid` - Shell process ID from PPID
- `keep_pid` - Keep process ID
- `env` - Arbitrary environment variables (via `KEEP_META_ENV_*` prefix)
- `exec` - Execute external commands for custom metadata
- `read_time` - Time taken to read content
- `read_rate` - Content read rate (bytes/second)
## Testing Strategy
- Unit tests for each module in `src/tests/`
@@ -207,12 +220,19 @@
- TLS/HTTPS support via rustls when certificate and key are provided
- Proper resource cleanup using RAII patterns
- Safe handling of external processes with proper stdin/stdout management
- **Streaming architecture**: All server I/O uses fixed-size 8192-byte buffers; no full file contents held in memory
- **XSS protection**: All user-controlled data in HTML pages is escaped via `html-escape`
- **Security headers**: `X-Content-Type-Options: nosniff`, `X-Frame-Options: DENY`, `Referrer-Policy: strict-origin-when-cross-origin`
- **CORS**: Explicit allowed headers only (`Content-Type`, `Authorization`, `Accept`); no wildcard headers
- **Input limits**: Tags (256 chars), metadata keys (128 chars), metadata values (4096 chars), pagination (10,000 max)
- **Config file size**: 4 KB cap with `from_utf8_lossy` for safe UTF-8 handling
- **Error sanitization**: Internal errors never exposed in HTML responses
- **No `unsafe_code`**: Enforced via `#![deny(unsafe_code)]` (exceptions: `libc::umask` in main.rs, `unsafe impl Send` for `SendCookie` in magic_file.rs)
## Feature Flags
- `server` - HTTP REST API server (axum-based)
- `tls` - HTTPS/TLS support for server (axum-server + rustls)
- `client` - HTTP client for remote server (ureq-based, includes streaming save)
- `mcp` - Model Context Protocol for AI assistant integration
- `swagger` - OpenAPI/Swagger UI documentation
- `magic` - File type detection via libmagic
- `lz4` - LZ4 compression (internal)

Dockerfile Normal file

@@ -0,0 +1,67 @@
# Build stage
FROM rust:1.88-slim AS builder
RUN apt-get update && apt-get install -y --no-install-recommends \
cmake \
curl \
make \
gcc \
musl-tools \
pkg-config \
&& rm -rf /var/lib/apt/lists/*
RUN rustup target add x86_64-unknown-linux-musl
WORKDIR /app
# Copy manifests and fetch dependencies (cached layer)
COPY Cargo.toml Cargo.lock ./
RUN mkdir src && echo 'fn main() {}' > src/main.rs && echo '' > src/lib.rs
RUN cargo fetch --target x86_64-unknown-linux-musl
# Copy real source and build static binary
# magic feature excluded (requires shared libmagic; fallback uses `file` command)
COPY src/ src/
RUN cargo build --release --target x86_64-unknown-linux-musl \
--no-default-features --features lz4,gzip,server,swagger,client,tls \
&& strip target/x86_64-unknown-linux-musl/release/keep
# Runtime stage - scratch since binary is fully static
FROM scratch
COPY --from=builder /app/target/x86_64-unknown-linux-musl/release/keep /keep
COPY --from=builder /etc/ssl/certs/ /etc/ssl/certs/
EXPOSE 21080
# General options
# ENV KEEP_CONFIG=/config/config.yml
# Mount a volume for persistent storage: -v keep-data:/data
ENV KEEP_DIR=/data
ENV KEEP_LIST_FORMAT="id,time,size,tags,meta:hostname"
# Item options
# ENV KEEP_COMPRESSION=lz4
# ENV KEEP_META_PLUGINS=""
# ENV KEEP_FILTERS=""
# Server options
ENV KEEP_SERVER_ADDRESS=0.0.0.0
ENV KEEP_SERVER_PORT=21080
# ENV KEEP_SERVER_USERNAME="keep"
# ENV KEEP_SERVER_PASSWORD=""
# ENV KEEP_SERVER_PASSWORD_HASH=""
# ENV KEEP_SERVER_JWT_SECRET=""
# ENV KEEP_SERVER_JWT_SECRET_FILE=/config/jwt_secret
# TLS options
# ENV KEEP_SERVER_CERT=/certs/cert.pem
# ENV KEEP_SERVER_KEY=/certs/key.pem
# Client options
# ENV KEEP_CLIENT_URL=""
# ENV KEEP_CLIENT_USERNAME="keep"
# ENV KEEP_CLIENT_PASSWORD=""
# ENV KEEP_CLIENT_JWT=""
ENTRYPOINT ["/keep", "--server"]

README.md

@@ -33,7 +33,6 @@ keep --get api-data
- [Server Mode](#server-mode)
- [Client Mode](#client-mode)
- [API Endpoints](#api-endpoints)
- [MCP (Model Context Protocol)](#mcp-model-context-protocol)
- [Shell Integration](#shell-integration)
- [Feature Flags](#feature-flags)
- [License](#license)
@@ -46,7 +45,6 @@ keep --get api-data
- **Filters** — Apply transformations (head, tail, grep, strip ANSI) on retrieval
- **Querying** — List, search, diff items with flexible formatting
- **Client/server architecture** — Optional HTTP server with streaming support
- **MCP support** — Model Context Protocol integration for AI assistants
- **Modular design** — Extensible plugin system for compression, metadata, and filtering
## Installation
@@ -72,6 +70,54 @@ cargo install --path .
# Binary at bin/keep
```
### Environment Module
A TCL modulefile is provided at `modulefile`. To use it, copy or symlink the project directory into your modules path:
```sh
# Symlink into an existing module path (e.g., /usr/local/modules)
ln -s /path/to/keep /usr/local/modules/keep
# Load the module
module load keep
# Verify
keep --status
# Source the shell profile (optional, for shell integration)
source $KEEP_BASH_PROFILE # bash
source $KEEP_ZSH_PROFILE # zsh
source $KEEP_SH_PROFILE # sh/dash/ksh
source $KEEP_CSH_PROFILE # csh/tcsh
```
The modulefile prepends `keep/bin` to `PATH` and sets shell-specific profile variables:
| Variable | Profile | Shell |
|----------|---------|-------|
| `KEEP_BASH_PROFILE` | `profile.bash` | bash |
| `KEEP_ZSH_PROFILE` | `profile.zsh` | zsh |
| `KEEP_SH_PROFILE` | `profile.sh` | sh, dash, ksh93, pdksh, mksh |
| `KEEP_CSH_PROFILE` | `profile.csh` | csh, tcsh |
### Shell Completion
Tab completion is available for `bash`, `zsh`, `fish`, `elvish`, and `powershell`. Completions for `@` (save) and `@@` (get) are available for `bash` and `zsh` only.
**Bash** — add to `~/.bashrc`:
```sh
. <(keep --generate-completion bash)
```
**Zsh** — add to `~/.zshrc`:
```sh
. <(keep --generate-completion zsh)
```
**With `profile.bash` or `profile.zsh`**: Completions for `keep`, `@` (save), and `@@` (get) are loaded automatically when sourcing the profile.
### Build with Server/Client Features
```sh
@@ -82,16 +128,19 @@ cargo build --release --features server
cargo build --release --features client
# Server + client + all optional features
cargo build --release --features server,tls,client,swagger,mcp
cargo build --release --features server,client,swagger
```
## Quick Start
```sh
# Save content with a tag
echo "Hello, world!" | keep --save greeting
# Save content with a tag (--save is optional when piping)
echo "Hello, world!" | keep greeting
# Retrieve by tag
# Retrieve by ID (--get is optional for numeric IDs)
keep 1
# Retrieve by tag (--get is required for tags)
keep --get greeting
# List all stored items
@@ -100,8 +149,8 @@ keep --list
# Get item details
keep --info greeting
# Delete by tag
keep --delete greeting
# Delete by ID
keep --delete 1
```
### Real-World Examples
@@ -130,36 +179,36 @@ keep --list --meta project=myapp
### Save Mode
Save stdin content with tags and metadata.
Save stdin content with tags and metadata. The `--save` flag is optional when piping content.
```sh
# Save (auto-assigned ID, no tag)
echo "data" | keep --save
# Save with a tag
# Save with a tag (--save is optional when piping)
echo "data" | keep --save my-tag
echo "data" | keep my-tag
# Save with multiple tags and metadata
cat report.pdf | keep --save report --meta project=alpha --meta env=prod
# Specify compression and digest algorithm
echo "data" | keep --save my-tag --compression gzip --digest sha256
# Specify compression
echo "data" | keep --save my-tag --compression gzip
```
Tags and metadata make items easy to find later. Tags are simple identifiers; metadata is key-value pairs.
### Get Mode
Retrieve items by ID or tags. This is the default mode when IDs are provided.
Retrieve items by ID. This is the default mode when numeric IDs are provided.
```sh
# Get by ID
# Get by ID (no --get needed for numeric IDs)
keep --get 1
keep 1
# Get by tag
# Get by tag (requires --get flag)
keep --get my-tag
keep my-tag
# Get with filters applied
keep --get 1 --filters "head_lines(10)"
@@ -207,7 +256,7 @@ keep --info --meta key=value
### Update Mode
Update an item's tags and metadata.
Update an item's tags and metadata, and re-run meta plugins on stored content.
```sh
# Replace tags
@@ -218,6 +267,9 @@ keep --update 1 --meta key=newvalue
# Remove a metadata key
keep --update 1 --meta key
# Re-run meta plugins on stored content
keep --update 1 --meta-plugin digest --meta-plugin text
```
### Delete Mode
@@ -293,8 +345,8 @@ Items are compressed automatically on save. Default: LZ4.
| `gzip` | Internal | Fast | Good |
| `bzip2` | External | Slow | Better |
| `xz` | External | Slowest | Best |
| `zstd` | External | Fast | Good |
| `none` | Internal | N/A | N/A |
| `zstd` | Internal | Fast | Good |
| `raw` | Internal | N/A | N/A |
```sh
# Specify compression per item
@@ -315,7 +367,7 @@ Metadata is automatically extracted when saving items.
| `env` | `*` | Capture `KEEP_META_*` environment variables |
| `magic_file` | `file_type` | File type detection (requires `magic` feature) |
| `text` | `text_line_count`, `text_word_count` | Line and word counts |
| `user` | `uid`, `user`, `gid`, `group` | Current user info |
| `user` | `user_uid`, `user_name`, `user_gid`, `user_group` | Current user info |
| `shell` | `shell` | Current shell path |
| `shell_pid` | `shell_pid` | Shell process ID |
| `keep_pid` | `keep_pid` | Keep process ID |
@@ -327,8 +379,11 @@ Metadata is automatically extracted when saving items.
| `cwd` | `cwd` | Current working directory |
```sh
# Use specific plugins
echo "data" | keep --save tag --meta-plugins "digest,text,user"
# Use specific plugins (repeatable)
echo "data" | keep --save tag --meta-plugin digest --meta-plugin text --meta-plugin user
# Pass options to a plugin via JSON
echo "data" | keep --save tag --meta-plugin 'tokens:{"options":{"min_length":"2"}}'
# Capture custom metadata via environment
echo "data" | KEEP_META_project=alpha keep --save tag
@@ -346,17 +401,23 @@ KEEP_META_build=1234 echo "data" | keep --save tag --meta env=staging
| `KEEP_DIR` | Storage directory | `~/.keep` |
| `KEEP_CONFIG` | Config file path | `~/.config/keep/config.yml` |
| `KEEP_COMPRESSION` | Compression algorithm | `lz4` |
| `KEEP_META_PLUGINS` | Meta plugins to use | `env` |
| `KEEP_META_PLUGINS` | Meta plugins to use (JSON format: `name[:{json}]`, comma-separated) | `env` |
| `KEEP_FILTERS` | Default filter chain | none |
| `KEEP_LIST_FORMAT` | List column format | built-in defaults |
| `KEEP_SERVER_ADDRESS` | Server bind address | `127.0.0.1` |
| `KEEP_SERVER_PORT` | Server port | `21080` |
| `KEEP_SERVER_USERNAME` | Server Basic auth username | `keep` |
| `KEEP_SERVER_PASSWORD` | Server password | none |
| `KEEP_SERVER_PASSWORD_HASH` | Server password hash | none |
| `KEEP_SERVER_JWT_SECRET` | JWT secret for token auth | none |
| `KEEP_SERVER_JWT_SECRET_FILE` | Path to JWT secret file | none |
| `KEEP_SERVER_MAX_BODY_SIZE` | Maximum POST body size in bytes (0=unlimited) | unlimited |
| `KEEP_SERVER_CERT` | TLS certificate file path (PEM) | none |
| `KEEP_SERVER_KEY` | TLS private key file path (PEM) | none |
| `KEEP_CLIENT_URL` | Remote keep server URL | none |
| `KEEP_CLIENT_USERNAME` | Remote server username | `keep` |
| `KEEP_CLIENT_PASSWORD` | Remote server password | none |
| `KEEP_CLIENT_JWT` | JWT token for remote server | none |
Any config setting can be overridden with `KEEP__<SETTING>` environment variables (double underscore separator).
@@ -409,7 +470,13 @@ meta_plugins:
server:
address: "127.0.0.1"
port: 21080
username: "keep"
password: "secret"
# Maximum POST body size in bytes (0 = unlimited)
# max_body_size: 52428800 # 50 MB
# JWT authentication (takes priority over password)
# jwt_secret: "my-secret-key"
# jwt_secret_file: /path/to/jwt_secret
# TLS (requires tls feature)
# cert_file: /path/to/cert.pem
# key_file: /path/to/key.pem
@@ -417,7 +484,10 @@ server:
# Client settings
client:
url: "http://localhost:21080"
username: "keep"
password: "secret"
# Or use JWT token
# jwt: "eyJhbGciOiJIUzI1NiIs..."
human_readable: true
quiet: false
@@ -444,10 +514,117 @@ keep --server
# Custom address and port
keep --server --server-address 0.0.0.0 --server-port 8080
# With authentication
# With password authentication
keep --server --server-password mypassword
# With custom username
keep --server --server-username admin --server-password mypassword
# With JWT authentication
keep --server --server-jwt-secret my-secret-key
```
#### JWT Authentication
JWT (JSON Web Token) authentication provides permission-based access control. When a JWT secret is configured, the server validates tokens and checks permission claims for each request.
**Configuration:**
```sh
# Via CLI flag
keep --server --server-jwt-secret my-secret-key
# Via environment variable
export KEEP_SERVER_JWT_SECRET=my-secret-key
keep --server
# Via config file (config.yml)
server:
jwt_secret: "my-secret-key"
# Via secret file (for Docker/secrets management)
keep --server --server-jwt-secret-file /path/to/secret
```
**Token format:**
JWTs must use the HS256 algorithm with the following claims:
| Claim | Type | Required | Description |
|-------|------|----------|-------------|
| `sub` | string | Yes | Subject (client identifier) |
| `exp` | number | Yes | Expiration time (Unix timestamp) |
| `read` | boolean | No | Permission for GET requests (default: false) |
| `write` | boolean | No | Permission for POST/PUT requests (default: false) |
| `delete` | boolean | No | Permission for DELETE requests (default: false) |
**Permission mapping:**
| HTTP Method | Required Permission |
|-------------|-------------------|
| `GET` | `read` |
| `POST`, `PUT`, `PATCH` | `write` |
| `DELETE` | `delete` |
**Example token payload:**
```json
{
"sub": "ci-pipeline",
"exp": 1735689600,
"read": true,
"write": true,
"delete": false
}
```
**Generating tokens:**
The server does not generate tokens — use any JWT library or tool:
```sh
# Using jwt-cli (https://github.com/mike-engel/jwt-cli)
jwt encode --secret my-secret-key \
--exp=$(date -d '+24 hours' +%s) \
'{"sub":"my-client","read":true,"write":true,"delete":false}'
# Using Python
python3 -c "
import jwt, time
token = jwt.encode({
'sub': 'my-client',
'exp': int(time.time()) + 86400,
'read': True, 'write': True, 'delete': False
}, 'my-secret-key', algorithm='HS256')
print(token)
"
```
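If neither tool is available, an HS256 token can be assembled with nothing but `openssl` and `base64`. This is a minimal sketch; the secret and claims are placeholders, and no claim validation is performed:

```shell
# Build an HS256 JWT by hand: base64url(header).base64url(payload).signature
b64url() { base64 | tr '+/' '-_' | tr -d '=\n'; }

header=$(printf '%s' '{"alg":"HS256","typ":"JWT"}' | b64url)
payload=$(printf '%s' '{"sub":"my-client","exp":1735689600,"read":true}' | b64url)
# Signature is HMAC-SHA256 over "header.payload", keyed with the shared secret
sig=$(printf '%s.%s' "$header" "$payload" \
    | openssl dgst -sha256 -hmac 'my-secret-key' -binary | b64url)
printf '%s.%s.%s\n' "$header" "$payload" "$sig"
```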
**Using tokens:**
```sh
# With curl
curl -H "Authorization: Bearer <jwt-token>" http://localhost:21080/api/item/
# The keep client uses --client-jwt for JWT tokens
keep --client-url http://server:21080 --client-jwt <jwt-token> --save my-tag
```
**Response codes:**
| Code | Meaning |
|------|---------|
| `200` | Authorized |
| `401` | Missing, invalid, or expired token |
| `403` | Valid token but insufficient permissions |
**Notes:**
- When `jwt_secret` is set, password authentication is disabled — all requests must present a valid JWT Bearer token
- JWT and password authentication are mutually exclusive — when both `jwt_secret` and `password` are configured, only JWT is used
- Permission fields default to `false` if omitted — tokens must explicitly grant permissions
- JWT authentication requires the `server` feature (jsonwebtoken is included automatically)
#### HTTPS / TLS
Build with the `tls` feature to enable HTTPS:
@@ -493,6 +670,33 @@ keep --client-url https://localhost:21080 --save my-tag
The server accepts data from both dumb clients (raw HTTP/curl) and smart clients (the keep CLI).
#### Server Streaming
The server streams all data through fixed-size buffers (8192 bytes). At no point is the entire file content held in memory.
- **POST**: Body streams through the compression and storage pipeline in chunks. When `max_body_size` is exceeded, the server returns `413 PAYLOAD_TOO_LARGE` while keeping the partial item already saved through the pipeline.
- **GET**: Content streams from disk through decompression to the client using the same fixed-size buffers.
- **Diff**: Individual items are capped at 10 MB for the diff endpoint to prevent unbounded memory use.
##### Max Body Size
Control the maximum accepted body size with:
```sh
# Via CLI flag (bytes)
keep --server --server-max-body-size 52428800
# Via environment variable
export KEEP_SERVER_MAX_BODY_SIZE=52428800
keep --server
# Via config file (config.yml)
server:
max_body_size: 52428800 # 50 MB
```
When set to `0` or omitted, no limit is enforced.
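The value is plain bytes, so shell arithmetic helps avoid miscounting zeros when computing limits like the 50 MB above:

```shell
# 50 MB expressed in bytes, matching the config example above
echo $((50 * 1024 * 1024))
# → 52428800
```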
#### Server Query Parameters
The server supports query parameters that control processing:
@@ -505,6 +709,14 @@ The server supports query parameters that control processing:
| `meta` | `true` | `false` = client handles metadata, skip server-side plugins |
| `decompress` | `true` | `false` = return raw compressed bytes on GET |
The `POST /api/item/{id}/update` endpoint accepts additional parameters:
| Parameter | Default | Description |
|-----------|---------|-------------|
| `plugins` | none | Comma-separated plugin names to re-run on stored content |
| `metadata` | none | JSON-encoded metadata overrides to apply |
| `tags` | none | Comma-separated tags to add (idempotent) |
When using a smart client, these are set automatically. For curl, the server handles everything by default.
#### Example: Curl as a Dumb Client
@@ -533,18 +745,26 @@ cargo build --release --features client
keep --client-url http://server:21080 --save my-tag
export KEEP_CLIENT_URL=http://server:21080
# With password authentication
keep --client-url http://server:21080 --client-password mypassword --save my-tag
export KEEP_CLIENT_PASSWORD=mypassword
# With custom username
keep --client-url http://server:21080 --client-username admin --client-password mypassword --save my-tag
# With JWT authentication
keep --client-url http://server:21080 --client-jwt <jwt-token> --save my-tag
export KEEP_CLIENT_JWT=<jwt-token>
```
#### How Client Mode Works
Client mode uses **local plugins** and **remote storage**:
1. **Save**: Local compression and meta plugins run on the client; compressed data streams to the server. Smart clients set `meta=false` so the server skips its own plugins.
2. **Get**: Server sends raw compressed data; client decompresses locally and applies filters
3. **Update**: Meta plugins run on the server to avoid downloading compressed data for re-processing
4. **Other operations** (list, info, delete, diff): Delegated directly to the server
This means client behavior is consistent with local mode — the same compression settings and filters apply.
@@ -553,24 +773,25 @@ This means client behavior is consistent with local mode — the same compressio
Client save uses a 3-thread streaming pipeline for constant memory usage regardless of data size:
```
┌────────────────────┐    OS pipe    ┌────────────────┐
│ Reader thread      ├───────────────┤ Streamer thread│
│                    │  (compressed  │                │
│ stdin → tee        │   bytes)      │  pipe → POST   │
│      → hash        │               │  (chunked)     │
│      → compress    │               │                │
│      → meta plugins│               │                │
└────────────────────┘               └────────────────┘
        │                                    │
        ▼                                    ▼
   stdout +                         Server stores blob
   computed metadata
```
- **Reader thread**: Reads stdin, tees output to stdout, computes SHA-256 via digest plugin, compresses data, runs meta plugins (hostname, text, etc.), writes to OS pipe
- **Streamer thread**: Reads compressed bytes from pipe, streams to server via chunked HTTP POST
- **Main thread**: After streaming completes, sends plugin-collected metadata to server
Memory usage is O(PIPESIZE) — typically 8 KB — regardless of how much data is being stored.
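The constant-memory property can be illustrated from the shell: however much data flows through, it moves in fixed-size blocks without ever being held whole. A sketch using `dd` with an 8 KB block size (the byte counts here are arbitrary):

```shell
# Stream 1 MiB through a pipe in 8 KB blocks; total throughput is unchanged,
# and at no point does more than one block sit in userspace memory
head -c 1048576 /dev/zero | dd bs=8192 2>/dev/null | wc -c
# → 1048576
```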
#### Example: Remote Pipeline
@@ -603,20 +824,36 @@ keep --client-url http://logserver:21080 --list --meta project=myapp
| `GET` | `/api/item/{id}/meta` | Item metadata by ID |
| `GET` | `/api/item/{id}/info` | Item info by ID |
| `POST` | `/api/item/{id}/meta` | Add metadata to existing item (body: JSON object) |
| `POST` | `/api/item/{id}/update` | Re-run meta plugins on stored content (params: `plugins`, `metadata`, `tags`) |
| `DELETE` | `/api/item/{id}` | Delete item by ID |
| `GET` | `/api/diff` | Diff two items (`id_a`, `id_b` params) |
#### Authentication
The server supports three authentication modes:
**1. Password (HTTP Basic auth):**
```sh
# Default username is "keep"
curl -u keep:mypassword http://localhost:21080/api/status
# Custom username
curl -u admin:mypassword http://localhost:21080/api/status
```
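Basic auth is simply the base64 encoding of `username:password`, so the header curl sends can be constructed by hand (a sketch; `mypassword` is a placeholder):

```shell
# Construct the Authorization header value that `curl -u keep:mypassword` sends
cred=$(printf '%s' 'keep:mypassword' | base64)
echo "Authorization: Basic $cred"
# → Authorization: Basic a2VlcDpteXBhc3N3b3Jk
```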
**2. JWT (permission-based):**
```sh
# Valid JWT with read permission allows GET requests
curl -H "Authorization: Bearer <jwt-token>" http://localhost:21080/api/item/
```
See [JWT Authentication](#jwt-authentication) for token format and configuration.
**3. No authentication:**
When neither password nor JWT secret is configured, authentication is disabled.
#### Swagger UI
@@ -628,42 +865,54 @@ cargo build --features server,swagger
Swagger UI available at `/swagger`, OpenAPI spec at `/openapi.json`.
#### Security
The server applies the following security measures:
- **Input validation**: Item IDs are validated as positive integers; tags and metadata have length limits (256 and 128 characters respectively).
- **XSS protection**: All user-controlled data rendered into HTML pages is escaped.
- **Security headers**: Responses include `X-Content-Type-Options: nosniff`, `X-Frame-Options: DENY`, and `Referrer-Policy: strict-origin-when-cross-origin`.
- **CORS**: Explicit allowed headers (`Content-Type`, `Authorization`, `Accept`); no wildcard headers.
- **Path traversal**: Item IDs are validated to prevent directory traversal attacks.
- **Internal errors**: Internal error details are never exposed in HTML responses — only generic messages are shown.
## Shell Integration
Profile scripts are provided for several shells. Source the appropriate one to enable shell integration:
| Profile | Shells | Features |
|---------|--------|----------|
| `profile.bash` | bash | Preexec hook, wrapper function, `@`/`@@` aliases, tab completions |
| `profile.zsh` | zsh | Preexec hook, wrapper function, `@`/`@@` aliases, tab completions |
| `profile.sh` | sh, dash, ksh93, pdksh, mksh | Wrapper function, `@`/`@@` aliases |
| `profile.csh` | csh, tcsh | Alias-based `keep` wrapper, `@`/`@@` aliases |
```sh
# bash
source /path/to/keep/profile.bash
# zsh
source /path/to/keep/profile.zsh
# sh, dash, ksh
source /path/to/keep/profile.sh
# csh/tcsh
source /path/to/keep/profile.csh
```
All profiles provide:
- **`keep` function** — Captures the current command in metadata automatically
- **`@` alias** — Shorthand for `keep --save`
- **`@@` alias** — Shorthand for `keep --get`
Bash and zsh profiles additionally provide:
- **Tab completion** — For `keep`, `@`, and `@@`
```sh
# Save with automatic command capture (bash/zsh)
curl -s api.example.com | @ api-response
# Quick retrieve
@@ -680,7 +929,6 @@ curl -s api.example.com | @ api-response
| `server` | No | HTTP REST API server |
| `tls` | No | HTTPS/TLS server support (requires `server`) |
| `client` | No | HTTP client for remote server |
| `swagger` | No | Swagger UI for API docs |
| `bzip2` | No | BZip2 compression (external program) |
| `xz` | No | XZ compression (external program) |
@@ -697,7 +945,7 @@ cargo build --features server,tls
cargo build --features client
# Everything
cargo build --features server,tls,client,swagger,magic
```
## License

View File

@@ -2,7 +2,6 @@
set -ex
export RUSTFLAGS='-C target-feature=+crt-static'
cargo build --release --target x86_64-unknown-linux-musl
mkdir -p bin
cp target/x86_64-unknown-linux-musl/release/keep ./bin/

32
docker-compose.yml Normal file
View File

@@ -0,0 +1,32 @@
services:
keep:
build: .
ports:
- "21080:21080"
volumes:
- keep-data:/data
- keep-config:/config
environment:
- KEEP_SERVER_ADDRESS=0.0.0.0
- KEEP_SERVER_PORT=21080
# - KEEP_SERVER_USERNAME=keep
# - KEEP_SERVER_PASSWORD=changeme
# - KEEP_SERVER_PASSWORD_HASH=
# - KEEP_SERVER_JWT_SECRET=
# - KEEP_SERVER_JWT_SECRET_FILE=/config/jwt_secret
# - KEEP_COMPRESSION=lz4
# - KEEP_META_PLUGINS=
# - KEEP_FILTERS=
- KEEP_CONFIG=/config/config.yml
# - KEEP_SERVER_CERT=/certs/cert.pem
# - KEEP_SERVER_KEY=/certs/key.pem
# - KEEP_CLIENT_USERNAME=keep
# - KEEP_CLIENT_JWT=""
restart: unless-stopped
# For TLS, mount certificate files:
# volumes:
# - ./certs:/certs:ro
volumes:
keep-data:
keep-config:

View File

@@ -15,3 +15,6 @@ module-whatis Keep
prepend-path PATH $mydir/bin
setenv KEEP_BASH_PROFILE ${mydir}/profile.bash
setenv KEEP_ZSH_PROFILE ${mydir}/profile.zsh
setenv KEEP_SH_PROFILE ${mydir}/profile.sh
setenv KEEP_CSH_PROFILE ${mydir}/profile.csh

View File

@@ -6,18 +6,10 @@ function __keep_preexec {
}
function __keep_preexec_init {
    local f
    for f in "${preexec_functions[@]}"; do
        [[ $f = __keep_preexec ]] && return
    done
    preexec_functions+=(__keep_preexec)
}
function keep {
@@ -40,4 +32,20 @@ function @@ {
keep --get "$@"
}
# Shell completions
. <(command keep --generate-completion bash)
___keep_complete() {
local mode="$1"
COMP_WORDS=(keep "$mode" "${COMP_WORDS[@]:1}")
COMP_CWORD=$((COMP_CWORD + 1))
_keep
}
___keep_save_completion() { ___keep_complete --save; }
___keep_get_completion() { ___keep_complete --get; }
complete -F ___keep_save_completion @
complete -F ___keep_get_completion @@
__keep_preexec_init

11
profile.csh Normal file
View File

@@ -0,0 +1,11 @@
#!/bin/csh
# Profile for csh and tcsh.
# Preexec hooks are not available; KEEP_META_command is not set.
if ( ! $?KEEP_META_tty ) then
setenv KEEP_META_tty `tty`
endif
alias keep 'env KEEP_META_tty=${KEEP_META_tty} command keep \!*'
alias @ 'keep --save \!*'
alias @@ 'keep --get \!*'

13
profile.sh Normal file
View File

@@ -0,0 +1,13 @@
#!/bin/sh
# POSIX-compatible profile for sh, dash, ksh93, pdksh, mksh, and other POSIX shells.
# Preexec hooks are not available in these shells; KEEP_META_command is not set.
KEEP_META_tty=${KEEP_META_tty:-$(tty)}
keep() {
export KEEP_META_tty
command keep "$@"
}
alias @='keep --save'
alias @@='keep --get'

38
profile.zsh Normal file
View File

@@ -0,0 +1,38 @@
#!/bin/zsh
autoload -U add-zsh-hook
__keep_preexec() {
KEEP_META_command="$1"
KEEP_META_tty=${KEEP_META_tty:-$(tty)}
}
add-zsh-hook preexec __keep_preexec
keep() {
if [[ $ZSH_SUBSHELL -le 2 ]]; then
export KEEP_META_command
fi
export KEEP_META_tty
command keep "$@"
}
alias @='keep --save'
alias @@='keep --get'
# Shell completions
. <(command keep --generate-completion zsh)
___keep_complete() {
local mode="$1"
local -a words
words=(keep "$mode" "${words[@]:1}")
((CURRENT++))
_keep
}
___keep_save_completion() { ___keep_complete --save; }
___keep_get_completion() { ___keep_complete --get; }
compdef ___keep_save_completion @
compdef ___keep_get_completion @@

View File

@@ -2,6 +2,7 @@ use std::path::PathBuf;
use std::str::FromStr;
use clap::*;
use clap_complete::Shell;
/// Main struct for command-line arguments, parsed via Clap.
#[derive(Parser, Debug, Clone)]
@@ -23,70 +24,157 @@ pub struct Args {
/// Struct for mode-specific arguments, defining CLI flags for different operations.
#[derive(Parser, Debug, Clone)]
pub struct ModeArgs {
#[arg(group("mode"), help_heading("Mode Options"), short, long)]
#[arg(help("Save an item using any tags or metadata provided"))]
pub save: bool,
#[arg(group("mode"), help_heading("Mode Options"), short, long)]
#[arg(help("Get an item either by its ID or by a combination of matching tags and metadata"))]
pub get: bool,
#[arg(group("mode"), help_heading("Mode Options"), long)]
#[arg(help("Show a diff between two items by ID"))]
pub diff: bool,
#[arg(group("mode"), help_heading("Mode Options"), short, long)]
#[arg(help("List items, filtering on tags or metadata if given"))]
pub list: bool,
#[arg(group("mode"), help_heading("Mode Options"), short, long)]
#[arg(help("Delete items either by ID or by matching tags"))]
#[arg(requires = "ids_or_tags")]
pub delete: bool,
#[arg(group("mode"), help_heading("Mode Options"), short, long)]
#[arg(help("Get an item either by its ID or by a combination of matching tags and metadata"))]
pub info: bool,
#[arg(group("mode"), help_heading("Mode Options"), short('u'), long)]
#[arg(help("Update an item's tags and metadata by ID"))]
pub update: bool,
#[arg(group("mode"), help_heading("Mode Options"), short('S'), long)]
#[arg(help("Show status of directories and supported compression algorithms"))]
pub status: bool,
#[arg(group("mode"), help_heading("Mode Options"), long)]
#[arg(help("Show available plugins and their configurations"))]
pub status_plugins: bool,
#[arg(group("mode"), help_heading("Mode Options"), long)]
#[arg(help("Export items to a .keep.tar archive (requires IDs or tags)"))]
pub export: bool,
#[arg(group("mode"), help_heading("Mode Options"), long, value_name("FILE"))]
#[arg(help("Import items from a .keep.tar archive or legacy .meta.yml file"))]
pub import: Option<String>,
#[cfg(feature = "server")]
#[arg(group("mode"), help_heading("Mode Options"), long)]
#[arg(help("Start REST HTTP server"))]
pub server: bool,
#[arg(group("mode"), help_heading("Mode Options"), long)]
#[arg(help("Generate default configuration and output to stdout"))]
pub generate_config: bool,
#[arg(help_heading("Mode Options"), long)]
#[arg(help("Generate shell completion script"))]
pub generate_completion: Option<Shell>,
#[cfg(feature = "server")]
#[arg(help_heading("Server Options"), long, env("KEEP_SERVER_ADDRESS"))]
#[arg(help("Server address to bind to"))]
pub server_address: Option<String>,
#[cfg(feature = "server")]
#[arg(help_heading("Server Options"), long, env("KEEP_SERVER_PORT"))]
#[arg(help("Server port to bind to"))]
pub server_port: Option<u16>,
#[cfg(feature = "server")]
#[arg(help_heading("Server Options"), long, env("KEEP_SERVER_CERT"))]
#[arg(help("Path to TLS certificate file (PEM) for HTTPS"))]
pub server_cert: Option<PathBuf>,
#[cfg(feature = "server")]
#[arg(help_heading("Server Options"), long, env("KEEP_SERVER_KEY"))]
#[arg(help("Path to TLS private key file (PEM) for HTTPS"))]
pub server_key: Option<PathBuf>,
}
/// Represents a meta plugin argument with optional JSON config.
///
/// Parsed from `name` or `name:{"options":{...},"outputs":{...}}` syntax.
#[derive(Debug, Clone)]
pub struct MetaPluginArg {
pub name: String,
pub options: Option<serde_json::Value>,
}
impl FromStr for MetaPluginArg {
type Err = anyhow::Error;
fn from_str(s: &str) -> Result<Self, Self::Err> {
if let Some((name, json_str)) = s.split_once(':') {
let value: serde_json::Value = serde_json::from_str(json_str)
.map_err(|e| anyhow::anyhow!("Invalid JSON for meta plugin '{}': {}", name, e))?;
Ok(MetaPluginArg {
name: name.to_string(),
options: Some(value),
})
} else {
Ok(MetaPluginArg {
name: s.to_string(),
options: None,
})
}
}
}
/// Represents a metadata key-value argument.
///
/// Parsed from `key=value` (set) or `key` (delete/filter by existence).
#[derive(Debug, Clone)]
pub enum MetaArg {
/// Set metadata with a value.
Set { key: String, value: String },
/// Bare key without a value (delete in update mode, filter by existence otherwise).
Key(String),
}
impl MetaArg {
/// Returns the key.
pub fn key(&self) -> &str {
match self {
MetaArg::Set { key, .. } | MetaArg::Key(key) => key,
}
}
/// Returns the value if this is a Set variant.
pub fn value(&self) -> Option<&str> {
match self {
MetaArg::Set { value, .. } => Some(value),
MetaArg::Key(_) => None,
}
}
}
impl FromStr for MetaArg {
type Err = anyhow::Error;
fn from_str(s: &str) -> Result<Self, Self::Err> {
if let Some((key, value)) = s.split_once('=') {
Ok(MetaArg::Set {
key: key.to_string(),
value: value.to_string(),
})
} else {
Ok(MetaArg::Key(s.to_string()))
}
}
}
/// Struct for item-specific arguments, such as compression and plugins.
#[derive(Parser, Debug, Clone)]
pub struct ItemArgs {
@@ -97,15 +185,32 @@ pub struct ItemArgs {
#[arg(
help_heading("Item Options"),
short('M'),
long = "meta-plugin",
value_parser = clap::value_parser!(MetaPluginArg),
env("KEEP_META_PLUGINS")
)]
#[arg(help("Meta plugin to use (repeatable): name or name:{json}"))]
pub meta_plugins: Vec<MetaPluginArg>,
#[arg(help_heading("Item Options"), long)]
#[arg(help("Metadata key=value to set (or key to delete in --update)"))]
pub meta: Vec<String>,
#[arg(help_heading("Item Options"), long, env("KEEP_FILTERS"))]
#[arg(help("Filter string to apply to content when getting items"))]
pub filters: Option<String>,
#[arg(help_heading("Export Options"), long, default_value = "{name}_{ts}")]
#[arg(help("Template for export tar filename (appends .keep.tar). Variables: {name} {ts}"))]
pub export_filename_format: String,
#[arg(help_heading("Export Options"), long, value_name("NAME"))]
#[arg(help("Export name used for {name} variable (default: export_<common-tags>)"))]
pub export_name: Option<String>,
#[arg(help_heading("Import Options"), long, value_name("DATA_FILE"))]
#[arg(help("Data file for import (reads from stdin if omitted)"))]
pub import_data_file: Option<PathBuf>,
}
/// Struct for general options, including verbosity, paths, and output settings.
@@ -122,7 +227,7 @@ pub struct OptionsArgs {
#[arg(
long,
env("KEEP_LIST_FORMAT"),
default_value("id,time,size,meta:text_line_count,tags,meta:hostname_short,meta:command")
)]
#[arg(help("A comma separated list of columns to display with --list"))]
pub list_format: String,
@@ -131,6 +236,10 @@ pub struct OptionsArgs {
#[arg(help("Display file sizes with units"))]
pub human_readable: bool,
#[arg(long)]
#[arg(help("Only output item IDs (for scripting)"))]
pub ids_only: bool,
#[arg(short, long, action = clap::ArgAction::Count, conflicts_with("quiet"))]
#[arg(help("Increase message verbosity, can be given more than once"))]
pub verbose: u8,
@@ -143,14 +252,42 @@ pub struct OptionsArgs {
#[arg(help("Output format (only works with --info, --status, --list)"))]
pub output_format: Option<String>,
#[cfg(feature = "server")]
#[arg(help_heading("Server Options"), long, env("KEEP_SERVER_PASSWORD"))]
#[arg(help("Password for server authentication (requires --server)"))]
pub server_password: Option<String>,
#[cfg(feature = "server")]
#[arg(help_heading("Server Options"), long, env("KEEP_SERVER_PASSWORD_HASH"))]
#[arg(help("Password hash for server authentication (requires --server)"))]
pub server_password_hash: Option<String>,
#[cfg(feature = "server")]
#[arg(help_heading("Server Options"), long, env("KEEP_SERVER_USERNAME"))]
#[arg(help(
"Username for server Basic authentication (requires --server, defaults to 'keep')"
))]
pub server_username: Option<String>,
#[cfg(feature = "server")]
#[arg(help_heading("Server Options"), long, env("KEEP_SERVER_JWT_SECRET"))]
#[arg(help("JWT secret for token-based authentication (requires --server)"))]
pub server_jwt_secret: Option<String>,
#[cfg(feature = "server")]
#[arg(
help_heading("Server Options"),
long,
env("KEEP_SERVER_JWT_SECRET_FILE")
)]
#[arg(help("Path to file containing JWT secret (requires --server)"))]
pub server_jwt_secret_file: Option<PathBuf>,
#[cfg(feature = "server")]
#[arg(help_heading("Server Options"), long, env("KEEP_SERVER_MAX_BODY_SIZE"))]
#[arg(help("Maximum request body size in bytes (requires --server, default: unlimited)"))]
pub server_max_body_size: Option<u64>,
#[cfg(feature = "client")]
#[arg(long, env("KEEP_CLIENT_URL"), help_heading("Client Options"))]
#[arg(help("Remote keep server URL for client mode"))]
@@ -161,6 +298,16 @@ pub struct OptionsArgs {
#[arg(help("Password for remote keep server authentication"))]
pub client_password: Option<String>,
#[cfg(feature = "client")]
#[arg(long, env("KEEP_CLIENT_USERNAME"), help_heading("Client Options"))]
#[arg(help("Username for remote keep server authentication (defaults to 'keep')"))]
pub client_username: Option<String>,
#[cfg(feature = "client")]
#[arg(long, env("KEEP_CLIENT_JWT"), help_heading("Client Options"))]
#[arg(help("JWT token for remote keep server authentication"))]
pub client_jwt: Option<String>,
#[arg(
long,
help("Force output even when binary data would be sent to a TTY")

View File

@@ -1,33 +1,62 @@
use crate::services::{ItemInfo, error::CoreError};
use base64::Engine;
use serde::de::DeserializeOwned;
use std::collections::HashMap;
use std::io::Read;
/// Percent-encode a value for use in a URL query string.
fn url_encode(s: &str) -> String {
let mut result = String::with_capacity(s.len() * 3);
for byte in s.bytes() {
match byte {
b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => {
result.push(byte as char);
}
_ => {
result.push('%');
result.push(char::from_digit((byte >> 4) as u32, 16).unwrap());
result.push(char::from_digit((byte & 0xF) as u32, 16).unwrap());
}
}
}
result
}
fn append_query_params(url: &mut String, params: &[(&str, &str)]) {
if !params.is_empty() {
url.push('?');
for (i, (key, value)) in params.iter().enumerate() {
if i > 0 {
url.push('&');
}
url.push_str(&format!("{}={}", url_encode(key), url_encode(value)));
}
}
}
pub struct KeepClient {
base_url: String,
agent: ureq::Agent,
username: Option<String>,
password: Option<String>,
jwt: Option<String>,
}
impl KeepClient {
pub fn new(
base_url: &str,
username: Option<String>,
password: Option<String>,
jwt: Option<String>,
) -> Result<Self, CoreError> {
let base_url = base_url.trim_end_matches('/').to_string();
let agent = ureq::Agent::new_with_defaults();
Ok(Self {
base_url,
agent,
username,
password,
jwt,
})
}
@@ -35,14 +64,40 @@ impl KeepClient {
&self.base_url
}
pub fn username(&self) -> Option<&String> {
self.username.as_ref()
}
pub fn password(&self) -> Option<&String> {
self.password.as_ref()
}
pub fn jwt(&self) -> Option<&String> {
self.jwt.as_ref()
}
fn url(&self, path: &str) -> String {
format!("{}{}", self.base_url, path)
}
/// Get the Authorization header value for the current credentials.
///
/// JWT token is sent as `Bearer <token>`.
/// Password is sent as `Basic base64(username:password)`
/// where username defaults to "keep".
fn auth_header(&self) -> Option<String> {
if let Some(ref jwt) = self.jwt {
Some(format!("Bearer {jwt}"))
} else if let Some(ref password) = self.password {
let username = self.username.as_deref().unwrap_or("keep");
let credentials = format!("{username}:{password}");
let encoded = base64::engine::general_purpose::STANDARD.encode(&credentials);
Some(format!("Basic {encoded}"))
} else {
None
}
}
fn handle_error<T>(&self, result: Result<T, ureq::Error>) -> Result<T, CoreError> {
match result {
Ok(v) => Ok(v),
@@ -57,8 +112,8 @@ impl KeepClient {
pub fn get_json<T: DeserializeOwned>(&self, path: &str) -> Result<T, CoreError> {
let url = self.url(path);
let mut req = self.agent.get(&url);
if let Some(ref auth) = self.auth_header() {
req = req.header("Authorization", auth);
}
let response = self.handle_error(req.call())?;
let body: T = self.handle_error(response.into_body().read_json())?;
@@ -71,18 +126,10 @@ impl KeepClient {
params: &[(&str, &str)],
) -> Result<T, CoreError> {
let mut url = self.url(path);
append_query_params(&mut url, params);
let mut req = self.agent.get(&url);
if let Some(ref auth) = self.auth_header() {
req = req.header("Authorization", auth);
}
let response = self.handle_error(req.call())?;
let body: T = self.handle_error(response.into_body().read_json())?;
@@ -92,8 +139,8 @@ impl KeepClient {
pub fn get_bytes(&self, path: &str) -> Result<Vec<u8>, CoreError> {
let url = self.url(path);
let mut req = self.agent.get(&url);
if let Some(ref auth) = self.auth_header() {
req = req.header("Authorization", auth);
}
let response = self.handle_error(req.call())?;
let mut body = response.into_body();
@@ -124,19 +171,11 @@ impl KeepClient {
params: &[(&str, &str)],
) -> Result<ItemInfo, CoreError> {
let mut url = self.url(path);
append_query_params(&mut url, params);
let mut req = self.agent.post(&url);
if let Some(ref auth) = self.auth_header() {
req = req.header("Authorization", auth);
}
req = req.header("Content-Type", "application/octet-stream");
@@ -162,47 +201,84 @@ impl KeepClient {
pub fn delete(&self, path: &str) -> Result<(), CoreError> {
let url = self.url(path);
let mut req = self.agent.delete(&url);
if let Some(ref auth) = self.auth_header() {
req = req.header("Authorization", auth);
}
self.handle_error(req.call())?;
Ok(())
}
pub fn get_status(&self) -> Result<crate::common::status::StatusInfo, CoreError> {
#[derive(serde::Deserialize)]
struct ApiResponse {
data: Option<crate::common::status::StatusInfo>,
error: Option<String>,
}
let response: ApiResponse = self.get_json("/api/status")?;
response.data.ok_or_else(|| {
CoreError::Other(anyhow::anyhow!(
"{}",
response
.error
.unwrap_or_else(|| "No status data returned".to_string())
))
})
}
pub fn get_item_info(&self, id: i64) -> Result<ItemInfo, CoreError> {
#[derive(serde::Deserialize)]
struct ApiResponse {
data: Option<ItemInfo>,
error: Option<String>,
}
let response: ApiResponse = self.get_json(&format!("/api/item/{id}/info"))?;
response.data.ok_or_else(|| {
CoreError::Other(anyhow::anyhow!(
"{}",
response
.error
.unwrap_or_else(|| "Item not found".to_string())
))
})
}
pub fn list_items(
&self,
ids: &[i64],
tags: &[String],
order: &str,
start: u64,
count: u64,
meta: &HashMap<String, Option<String>>,
) -> Result<Vec<ItemInfo>, CoreError> {
#[derive(serde::Deserialize)]
struct ApiResponse {
data: Option<Vec<ItemInfo>>,
error: Option<String>,
}
let mut params: Vec<(String, String)> = Vec::new();
params.push(("order".to_string(), order.to_string()));
params.push(("start".to_string(), start.to_string()));
params.push(("count".to_string(), count.to_string()));
if !ids.is_empty() {
params.push((
"ids".to_string(),
ids.iter()
.map(|i| i.to_string())
.collect::<Vec<_>>()
.join(","),
));
}
if !tags.is_empty() {
params.push(("tags".to_string(), tags.join(",")));
}
if !meta.is_empty() {
let meta_json = serde_json::to_string(meta).map_err(|e| {
CoreError::Other(anyhow::anyhow!("Failed to serialize meta filter: {}", e))
})?;
params.push(("meta".to_string(), meta_json));
}
let param_refs: Vec<(&str, &str)> = params
.iter()
@@ -210,7 +286,13 @@ impl KeepClient {
.collect();
let response: ApiResponse = self.get_json_with_query("/api/item/", &param_refs)?;
Ok(response.data.unwrap_or_default())
if let Some(data) = response.data {
return Ok(data);
}
if let Some(err) = response.error {
return Err(CoreError::Other(anyhow::anyhow!("Server error: {err}")));
}
Ok(Vec::new())
}
pub fn save_item(
@@ -254,8 +336,8 @@ impl KeepClient {
) -> Result<(), CoreError> {
let url = self.url(&format!("/api/item/{id}/meta"));
let mut req = self.agent.post(&url);
if let Some(ref password) = self.password {
req = req.header("Authorization", &format!("Bearer {password}"));
if let Some(ref auth) = self.auth_header() {
req = req.header("Authorization", auth);
}
req = req.header("Content-Type", "application/json");
@@ -267,15 +349,44 @@ impl KeepClient {
Ok(())
}
/// Set the uncompressed size for an item.
pub fn set_item_size(&self, id: i64, size: u64) -> Result<(), CoreError> {
let url = format!(
"{}?uncompressed_size={}",
self.url(&format!("/api/item/{id}/update")),
url_encode(&size.to_string())
);
let mut req = self.agent.post(&url);
if let Some(ref auth) = self.auth_header() {
req = req.header("Authorization", auth);
}
self.handle_error(req.send(ureq::SendBody::from_reader(&mut std::io::empty())))?;
Ok(())
}
pub fn get_item_content_raw(&self, id: i64) -> Result<(Vec<u8>, String), CoreError> {
let (mut reader, compression) = self.get_item_content_stream(id)?;
let mut bytes = Vec::new();
reader
.read_to_end(&mut bytes)
.map_err(|e| CoreError::Other(anyhow::anyhow!("{}", e)))?;
Ok((bytes, compression))
}
/// Get a streaming reader for item content without decompression.
///
/// Returns a reader over the HTTP response body and the compression type
/// from the `X-Keep-Compression` header. The caller can stream through
/// decompression readers without buffering the entire file in memory.
pub fn get_item_content_stream(&self, id: i64) -> Result<(Box<dyn Read>, String), CoreError> {
let url = format!(
"{}?decompress=false",
self.url(&format!("/api/item/{id}/content"))
);
let mut req = self.agent.get(&url);
if let Some(ref password) = self.password {
req = req.header("Authorization", &format!("Bearer {password}"));
if let Some(ref auth) = self.auth_header() {
req = req.header("Authorization", auth);
}
let response = self.handle_error(req.call())?;
@@ -284,15 +395,11 @@ impl KeepClient {
.headers()
.get("X-Keep-Compression")
.and_then(|v| v.to_str().ok())
.unwrap_or("none")
.unwrap_or("raw")
.to_string();
let mut body = response.into_body();
let bytes = body
.read_to_vec()
.map_err(|e| CoreError::Other(anyhow::anyhow!("{}", e)))?;
Ok((bytes, compression))
let reader = response.into_body().into_reader();
Ok((Box::new(reader), compression))
}
pub fn diff_items(&self, id_a: i64, id_b: i64) -> Result<Vec<String>, CoreError> {
@@ -307,4 +414,101 @@ impl KeepClient {
let response: ApiResponse = self.get_json_with_query("/api/diff", &param_refs)?;
Ok(response.data.unwrap_or_default())
}
/// Export items to a tar archive, streaming the response to a file.
///
/// # Arguments
///
/// * `ids` - Item IDs to export (mutually exclusive with tags).
/// * `tags` - Tags to search for items (mutually exclusive with ids).
/// * `dest` - Destination file path.
pub fn export_items_to_file(
&self,
ids: &[i64],
tags: &[String],
dest: &std::path::Path,
) -> Result<(), CoreError> {
let mut params: Vec<(String, String)> = Vec::new();
if !ids.is_empty() {
let id_strs: Vec<String> = ids.iter().map(|id| id.to_string()).collect();
params.push(("ids".to_string(), id_strs.join(",")));
}
if !tags.is_empty() {
params.push(("tags".to_string(), tags.join(",")));
}
let param_refs: Vec<(&str, &str)> = params
.iter()
.map(|(k, v)| (k.as_str(), v.as_str()))
.collect();
let mut url = self.url("/api/export");
append_query_params(&mut url, &param_refs);
let mut req = self.agent.get(&url);
if let Some(ref auth) = self.auth_header() {
req = req.header("Authorization", auth);
}
let response = self.handle_error(req.call())?;
let mut reader = response.into_body().into_reader();
let mut file = std::fs::File::create(dest).map_err(CoreError::Io)?;
let mut buf = [0u8; crate::common::PIPESIZE];
loop {
let n = reader.read(&mut buf).map_err(CoreError::Io)?;
if n == 0 {
break;
}
std::io::Write::write_all(&mut file, &buf[..n]).map_err(CoreError::Io)?;
}
Ok(())
}
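The manual `PIPESIZE` loop above is equivalent to `std::io::copy`, which also copies through a fixed-size internal buffer. A minimal standalone sketch (the function name here is illustrative, not from the crate):

```rust
use std::io::{self, Read, Write};

// Stream any reader into any writer in bounded memory, as the export
// loop above does with its PIPESIZE buffer. std::io::copy uses a
// fixed-size internal buffer, so memory stays bounded regardless of
// body size.
fn stream_to_writer<R: Read, W: Write>(reader: &mut R, writer: &mut W) -> io::Result<u64> {
    io::copy(reader, writer)
}
```

`io::copy` also returns the number of bytes written, which the caller could log or compare against an expected content length.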
/// Import items from a tar archive, streaming the file to the server.
///
/// # Arguments
///
/// * `tar_path` - Path to the `.keep.tar` file.
///
/// # Returns
///
/// A list of newly assigned item IDs.
pub fn import_tar_file(&self, tar_path: &std::path::Path) -> Result<Vec<i64>, CoreError> {
#[derive(serde::Deserialize)]
struct ApiResponse {
data: Option<ImportResponse>,
error: Option<String>,
}
#[derive(serde::Deserialize)]
struct ImportResponse {
ids: Vec<i64>,
}
let mut file = std::fs::File::open(tar_path).map_err(CoreError::Io)?;
let url = self.url("/api/import");
let mut req = self.agent.post(&url);
if let Some(ref auth) = self.auth_header() {
req = req.header("Authorization", auth);
}
req = req.header("Content-Type", "application/x-tar");
let response = self.handle_error(req.send(ureq::SendBody::from_reader(&mut file)))?;
let body = response
.into_body()
.read_to_string()
.map_err(|e| CoreError::InvalidInput(format!("Cannot read response: {e}")))?;
let api_response: ApiResponse = serde_json::from_str(&body)
.map_err(|e| CoreError::InvalidInput(format!("Cannot parse response: {e}")))?;
if let Some(error) = api_response.error {
return Err(CoreError::InvalidInput(error));
}
Ok(api_response.data.map(|d| d.ids).unwrap_or_default())
}
}


@@ -229,3 +229,25 @@ fn calculate_printable_ratio(data: &[u8]) -> f64 {
printable_count as f64 / data.len() as f64
}
/// Check if content is binary, using metadata as a fast path.
///
/// First checks for a "text" metadata field:
/// - "false" means binary
/// - "true" means text
/// - Absent or other values fall back to byte sampling
///
/// # Arguments
///
/// * `metadata` - Key-value metadata map (e.g., from `meta_as_map()`)
/// * `data` - Byte sample to analyze if metadata is inconclusive
pub fn is_content_binary_from_metadata(
metadata: &std::collections::HashMap<String, String>,
data: &[u8],
) -> bool {
match metadata.get("text").map(String::as_str) {
Some("false") => true,
Some("true") => false,
_ => is_binary(data),
}
}
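A self-contained sketch of the fast path as documented. The `is_binary` stub below is an assumption standing in for this module's real byte-sampling heuristic:

```rust
use std::collections::HashMap;

// Stand-in for the module's real sampling heuristic (assumption here:
// any NUL byte means binary).
fn is_binary(data: &[u8]) -> bool {
    data.contains(&0)
}

// Metadata fast path: explicit "text" values decide immediately,
// anything else falls back to sampling the bytes.
fn is_content_binary_from_metadata(metadata: &HashMap<String, String>, data: &[u8]) -> bool {
    match metadata.get("text").map(String::as_str) {
        Some("false") => true,
        Some("true") => false,
        _ => is_binary(data),
    }
}
```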


@@ -3,5 +3,89 @@ pub mod is_binary;
/// Detects if data is binary or text based on signatures and printable ratios.
pub mod status;
/// Plugin schema types and discovery functions.
pub mod schema;
/// Standard buffer size for I/O operations (8 KiB).
pub const PIPESIZE: usize = 8192;
/// Reads chunks from `reader` until EOF, passing each chunk to `f`.
///
/// Uses a fixed PIPESIZE buffer to ensure bounded memory usage.
pub fn stream_copy<R: std::io::Read + ?Sized>(
reader: &mut R,
mut f: impl FnMut(&[u8]) -> std::io::Result<()>,
) -> std::io::Result<()> {
let mut buffer = [0u8; PIPESIZE];
loop {
let n = reader.read(&mut buffer)?;
if n == 0 {
break;
}
f(&buffer[..n])?;
}
Ok(())
}
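The callback shape lets callers hash, count, or forward each chunk without buffering the whole stream. A hedged restatement under an illustrative name:

```rust
use std::io::Read;

const PIPESIZE: usize = 8192;

// Chunked copy as in `stream_copy` above: the closure sees each chunk,
// so at most PIPESIZE bytes are resident in the chunk buffer at a time.
fn stream_chunks<R: Read + ?Sized>(
    reader: &mut R,
    mut f: impl FnMut(&[u8]) -> std::io::Result<()>,
) -> std::io::Result<()> {
    let mut buffer = [0u8; PIPESIZE];
    loop {
        let n = reader.read(&mut buffer)?;
        if n == 0 {
            break;
        }
        f(&buffer[..n])?;
    }
    Ok(())
}
```

Passing a closure that adds `chunk.len()` to a counter, for example, yields the stream length in bounded memory.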
/// Reads content from a reader with offset and length bounds.
///
/// Skips `offset` bytes from the reader, then reads up to `length` bytes
/// (or all remaining if `length` is 0). Uses PIPESIZE buffers throughout.
///
/// # Arguments
///
/// * `reader` - The source reader positioned at the start.
/// * `offset` - Number of bytes to skip before reading.
/// * `length` - Maximum bytes to read (0 = read all remaining).
/// * `content_len` - Total content size (used to cap skip/read amounts).
///
/// # Returns
///
/// A `Vec<u8>` containing the requested byte range.
pub fn read_with_bounds<R: std::io::Read>(
reader: &mut R,
offset: u64,
length: u64,
content_len: u64,
) -> std::io::Result<Vec<u8>> {
// Skip offset bytes
let skip = std::cmp::min(offset, content_len);
let mut remaining = skip;
let mut buf = [0u8; PIPESIZE];
while remaining > 0 {
let to_read = std::cmp::min(remaining, buf.len() as u64) as usize;
match reader.read(&mut buf[..to_read]) {
Ok(0) => break,
Ok(n) => remaining -= n as u64,
Err(e) => return Err(e),
}
}
// Read bounded content
let max_bytes = if length > 0 {
std::cmp::min(length, content_len.saturating_sub(offset))
} else {
content_len.saturating_sub(offset)
};
let mut result = Vec::with_capacity(std::cmp::min(max_bytes, 64 * 1024) as usize);
let mut bytes_read = 0u64;
while bytes_read < max_bytes {
let to_read = std::cmp::min(max_bytes - bytes_read, buf.len() as u64) as usize;
match reader.read(&mut buf[..to_read]) {
Ok(0) => break,
Ok(n) => {
result.extend_from_slice(&buf[..n]);
bytes_read += n as u64;
}
Err(e) => return Err(e),
}
}
Ok(result)
}
/// Sanitize a timestamp string for use in filenames.
///
/// Replaces colons with hyphens (e.g., `2026-03-17T12:00:00Z` → `2026-03-17T12-00-00Z`).
pub fn sanitize_ts_string(ts: &str) -> String {
ts.replace(':', "-")
}

src/common/schema.rs (new file, 166 lines)

@@ -0,0 +1,166 @@
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
use strum::IntoEnumIterator;
/// Value type for a plugin option.
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
#[serde(rename_all = "lowercase")]
pub enum OptionType {
String,
Integer,
Boolean,
Any,
}
impl OptionType {
/// Infer the option type from a YAML value.
pub fn from_yaml_value(value: &serde_yaml::Value) -> Self {
match value {
serde_yaml::Value::Bool(_) => OptionType::Boolean,
serde_yaml::Value::Number(_) => OptionType::Integer,
serde_yaml::Value::String(_) => OptionType::String,
_ => OptionType::Any,
}
}
}
/// Schema for a single plugin option.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct OptionSchema {
pub name: String,
pub option_type: OptionType,
pub default: Option<serde_yaml::Value>,
pub required: bool,
}
/// Schema for a single plugin output.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct OutputSchema {
pub name: String,
pub description: String,
}
/// Schema describing a plugin's configuration requirements.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct PluginSchema {
pub name: String,
pub description: String,
pub options: Vec<OptionSchema>,
pub outputs: Vec<OutputSchema>,
}
/// Gathers schemas from all registered meta plugins.
///
/// Iterates all `MetaPluginType` variants, attempts to create a default instance,
/// and collects their schemas. Plugins that fail to register (e.g., those
/// compiled out behind a feature gate) are silently skipped.
pub fn gather_meta_plugin_schemas() -> Vec<PluginSchema> {
use crate::meta_plugin::{MetaPluginType, get_meta_plugin};
let mut schemas = Vec::new();
let mut sorted_types: Vec<MetaPluginType> = MetaPluginType::iter().collect();
sorted_types.sort_by_key(|t| t.to_string());
for plugin_type in sorted_types {
let plugin = match get_meta_plugin(plugin_type.clone(), None, None) {
Ok(p) => p,
Err(_) => continue,
};
let name = plugin.meta_type().to_string();
let options: Vec<OptionSchema> = plugin
.options()
.iter()
.map(|(key, value)| {
let option_type = OptionType::from_yaml_value(value);
let (default, required) = if value.is_null() {
(None, true)
} else {
(Some(value.clone()), false)
};
OptionSchema {
name: key.clone(),
option_type,
default,
required,
}
})
.collect();
let mut outputs: Vec<OutputSchema> = Vec::new();
for (key, value) in plugin.outputs() {
if !value.is_null() {
outputs.push(OutputSchema {
name: key.clone(),
description: key.clone(),
});
}
}
// Also include default outputs if outputs map is empty
if outputs.is_empty() {
for output_name in plugin.default_outputs() {
outputs.push(OutputSchema {
name: output_name.clone(),
description: output_name,
});
}
}
schemas.push(PluginSchema {
name,
description: plugin.description().to_string(),
options,
outputs,
});
}
schemas
}
/// Gathers schemas from all registered filter plugins.
///
/// Uses the global filter plugin registry to discover all registered filters,
/// creates a default instance of each, and collects their option schemas.
pub fn gather_filter_plugin_schemas() -> Vec<PluginSchema> {
use crate::services::filter_service::get_available_filter_plugins;
let plugins = get_available_filter_plugins().unwrap_or_default();
let mut schemas: Vec<PluginSchema> = plugins
.into_iter()
.map(|(name, creator)| {
let plugin = creator();
let options: Vec<OptionSchema> = plugin
.options()
.iter()
.map(|opt| {
let option_type = match &opt.default {
Some(serde_json::Value::Bool(_)) => OptionType::Boolean,
Some(serde_json::Value::Number(_)) => OptionType::Integer,
Some(serde_json::Value::String(_)) => OptionType::String,
_ => OptionType::Any,
};
OptionSchema {
name: opt.name.clone(),
option_type,
default: opt.default.as_ref().map(|v| {
// Convert serde_json::Value to serde_yaml::Value
serde_yaml::to_value(v).unwrap_or(serde_yaml::Value::Null)
}),
required: opt.required,
}
})
.collect();
PluginSchema {
name: name.clone(),
description: plugin.description().to_string(),
options,
outputs: Vec::new(),
}
})
.collect();
schemas.sort_by(|a, b| a.name.cmp(&b.name));
schemas
}


@@ -27,6 +27,22 @@ pub struct StatusInfo {
pub configured_meta_plugins: Option<Vec<crate::config::MetaPluginConfig>>,
}
impl Default for StatusInfo {
fn default() -> Self {
Self {
paths: PathInfo {
data: String::new(),
database: String::new(),
},
compression: Vec::new(),
meta_plugins: std::collections::HashMap::new(),
enabled_meta_plugins: Vec::new(),
filter_plugins: Vec::new(),
configured_meta_plugins: None,
}
}
}
#[derive(serde::Serialize, serde::Deserialize)]
#[cfg_attr(feature = "server", derive(ToSchema))]
pub struct PathInfo {
@@ -59,21 +75,21 @@ pub fn generate_status_info(
db_path: PathBuf,
enabled_meta_plugins: &[MetaPluginType],
enabled_compression_type: Option<CompressionType>,
) -> StatusInfo {
) -> anyhow::Result<StatusInfo> {
log::debug!("STATUS: Starting status info generation");
let path_info = PathInfo {
data: data_path
.into_os_string()
.into_string()
.expect("Unable to convert data path to string"),
.map_err(|_| anyhow::anyhow!("Unable to convert data path to string"))?,
database: db_path
.into_os_string()
.into_string()
.expect("Unable to convert DB path to string"),
.map_err(|_| anyhow::anyhow!("Unable to convert DB path to string"))?,
};
let _default_type = crate::compression_engine::default_compression_type();
let mut compression_info = Vec::new();
let mut compression_info = Vec::with_capacity(CompressionType::iter().count());
// Sort compression types by their string representation
let mut sorted_compression_types: Vec<CompressionType> = CompressionType::iter().collect();
@@ -125,7 +141,8 @@ pub fn generate_status_info(
});
}
let mut meta_plugins_map = std::collections::HashMap::new();
let mut meta_plugins_map =
std::collections::HashMap::with_capacity(MetaPluginType::iter().count());
let mut enabled_meta_plugins_vec = Vec::new();
// Sort meta plugin types by their string representation to avoid creating plugins just for sorting
@@ -134,9 +151,16 @@ pub fn generate_status_info(
for meta_plugin_type in sorted_meta_plugins {
log::debug!("STATUS: Processing meta plugin type: {meta_plugin_type:?}");
log::debug!("STATUS: About to call get_meta_plugin");
let meta_plugin = crate::meta_plugin::get_meta_plugin(meta_plugin_type.clone(), None, None);
log::debug!("STATUS: Created meta plugin instance");
let meta_plugin =
match crate::meta_plugin::get_meta_plugin(meta_plugin_type.clone(), None, None) {
Ok(p) => p,
Err(e) => {
log::warn!(
"STATUS: Skipping unregistered meta plugin {meta_plugin_type:?}: {e}"
);
continue;
}
};
// Get meta name first to avoid borrowing issues
log::debug!("STATUS: Getting meta name...");
@@ -175,12 +199,26 @@ pub fn generate_status_info(
);
}
StatusInfo {
// Populate filter plugin info from the global registry
let filter_plugins_map = crate::services::filter_service::get_available_filter_plugins()?;
let filter_plugins_info: Vec<FilterPluginInfo> = filter_plugins_map
.into_iter()
.map(|(name, creator)| {
let plugin = creator();
FilterPluginInfo {
name: name.clone(),
options: plugin.options(),
description: format!("{name} filter plugin"),
}
})
.collect();
Ok(StatusInfo {
paths: path_info,
compression: compression_info,
meta_plugins: meta_plugins_map,
enabled_meta_plugins: enabled_meta_plugins_vec,
filter_plugins: Vec::new(),
filter_plugins: filter_plugins_info,
configured_meta_plugins: None,
}
})
}


@@ -93,10 +93,22 @@ impl<W: Write> Drop for AutoFinishGzEncoder<W> {
#[cfg(feature = "gzip")]
impl<W: Write> Write for AutoFinishGzEncoder<W> {
fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
self.encoder.as_mut().unwrap().write(buf)
match self.encoder.as_mut() {
Some(encoder) => encoder.write(buf),
None => Err(io::Error::new(
io::ErrorKind::BrokenPipe,
"encoder already finished",
)),
}
}
fn flush(&mut self) -> io::Result<()> {
self.encoder.as_mut().unwrap().flush()
match self.encoder.as_mut() {
Some(encoder) => encoder.flush(),
None => Err(io::Error::new(
io::ErrorKind::BrokenPipe,
"encoder already finished",
)),
}
}
}
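The same Option-guarded pattern generalizes to any writer whose inner sink is consumed on finish or drop. A hedged generic sketch (type and names are illustrative):

```rust
use std::io::{self, Write};

// Generic form of the guarded writer above: the inner writer is taken
// out once on finish (or drop), and any later write or flush reports
// BrokenPipe instead of panicking on unwrap().
struct FinishableWriter<W: Write> {
    inner: Option<W>,
}

impl<W: Write> FinishableWriter<W> {
    fn new(inner: W) -> Self {
        Self { inner: Some(inner) }
    }
    // Consume the inner writer, ending the stream.
    fn finish(&mut self) -> Option<W> {
        self.inner.take()
    }
}

impl<W: Write> Write for FinishableWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        match self.inner.as_mut() {
            Some(w) => w.write(buf),
            None => Err(io::Error::new(io::ErrorKind::BrokenPipe, "writer already finished")),
        }
    }
    fn flush(&mut self) -> io::Result<()> {
        match self.inner.as_mut() {
            Some(w) => w.flush(),
            None => Err(io::Error::new(io::ErrorKind::BrokenPipe, "writer already finished")),
        }
    }
}
```

After `finish()`, further writes fail cleanly rather than aborting the process.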


@@ -1,23 +1,34 @@
#[cfg(feature = "lz4")]
use anyhow::Result;
#[cfg(feature = "lz4")]
use log::*;
#[cfg(feature = "lz4")]
use std::io::Write;
#[cfg(feature = "lz4")]
use lz4_flex::frame::{FrameDecoder, FrameEncoder};
#[cfg(feature = "lz4")]
use std::fs::File;
#[cfg(feature = "lz4")]
use std::io::Read;
#[cfg(feature = "lz4")]
use std::path::PathBuf;
#[cfg(feature = "lz4")]
use crate::compression_engine::CompressionEngine;
#[cfg(feature = "lz4")]
#[derive(Debug, Eq, PartialEq, Clone, Default)]
pub struct CompressionEngineLZ4 {}
#[cfg(feature = "lz4")]
impl CompressionEngineLZ4 {
pub fn new() -> CompressionEngineLZ4 {
CompressionEngineLZ4 {}
}
}
#[cfg(feature = "lz4")]
impl CompressionEngine for CompressionEngineLZ4 {
fn open(&self, file_path: PathBuf) -> Result<Box<dyn Read + Send>> {
debug!("COMPRESSION: Opening {:?} using {:?}", file_path, *self);


@@ -7,16 +7,15 @@ use strum::{Display, EnumIter, EnumString};
use log::*;
use lazy_static::lazy_static;
extern crate enum_map;
use enum_map::enum_map;
use enum_map::{Enum, EnumMap};
pub mod gzip;
pub mod lz4;
pub mod none;
pub mod program;
pub mod raw;
pub mod zstd;
use crate::compression_engine::program::CompressionEngineProgram;
@@ -34,12 +33,18 @@ use crate::compression_engine::program::CompressionEngineProgram;
#[derive(Debug, Eq, PartialEq, Clone, EnumIter, Display, EnumString, enum_map::Enum)]
#[strum(ascii_case_insensitive)]
pub enum CompressionType {
#[strum(serialize = "lz4")]
LZ4,
#[strum(serialize = "gzip")]
GZip,
#[strum(serialize = "bzip2")]
BZip2,
#[strum(serialize = "xz")]
XZ,
#[strum(serialize = "zstd")]
ZStd,
None,
#[strum(to_string = "raw", serialize = "raw", serialize = "none")]
Raw,
}
/// Trait defining the interface for compression engines.
@@ -173,10 +178,14 @@ impl Clone for Box<dyn CompressionEngine> {
}
}
lazy_static! {
static ref COMPRESSION_ENGINES: EnumMap<CompressionType, Box<dyn CompressionEngine>> = {
let mut em = enum_map! {
CompressionType::LZ4 => Box::new(crate::compression_engine::lz4::CompressionEngineLZ4::new()) as Box<dyn CompressionEngine>,
fn init_compression_engines() -> EnumMap<CompressionType, Box<dyn CompressionEngine>> {
#[allow(unused_mut)]
let mut em: EnumMap<CompressionType, Box<dyn CompressionEngine>> = enum_map! {
CompressionType::LZ4 => Box::new(crate::compression_engine::program::CompressionEngineProgram::new(
"lz4",
vec!["-c"],
vec!["-d", "-c"]
)) as Box<dyn CompressionEngine>,
CompressionType::GZip => Box::new(crate::compression_engine::program::CompressionEngineProgram::new(
"gzip",
vec!["-c"],
@@ -197,7 +206,7 @@ lazy_static! {
vec!["-c"],
vec!["-d", "-c"]
)) as Box<dyn CompressionEngine>,
CompressionType::None => Box::new(crate::compression_engine::none::CompressionEngineNone::new()) as Box<dyn CompressionEngine>
CompressionType::Raw => Box::new(crate::compression_engine::raw::CompressionEngineRaw::new()) as Box<dyn CompressionEngine>
};
#[cfg(feature = "gzip")]
@@ -207,10 +216,27 @@ lazy_static! {
as Box<dyn CompressionEngine>;
}
#[cfg(feature = "lz4")]
{
em[CompressionType::LZ4] =
Box::new(crate::compression_engine::lz4::CompressionEngineLZ4::new())
as Box<dyn CompressionEngine>;
}
#[cfg(feature = "zstd")]
{
em[CompressionType::ZStd] =
Box::new(crate::compression_engine::zstd::CompressionEngineZstd::new())
as Box<dyn CompressionEngine>;
}
em
};
}
static COMPRESSION_ENGINES: std::sync::LazyLock<
EnumMap<CompressionType, Box<dyn CompressionEngine>>,
> = std::sync::LazyLock::new(init_compression_engines);
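The `lazy_static` to `std::sync::LazyLock` migration in miniature (`LazyLock` is in std as of Rust 1.80). The map contents below are illustrative, not the crate's registry:

```rust
use std::collections::HashMap;
use std::sync::LazyLock;

// A global map built once, on first access, with no external crate.
static EXTENSIONS: LazyLock<HashMap<&'static str, &'static str>> = LazyLock::new(|| {
    let mut m = HashMap::new();
    m.insert("lz4", ".lz4");
    m.insert("gzip", ".gz");
    m.insert("zstd", ".zst");
    m
});

fn extension_for(name: &str) -> Option<&'static str> {
    EXTENSIONS.get(name).copied()
}
```

Dereferencing the static triggers the closure exactly once, exactly as `lazy_static!` did, but with no macro and no extra dependency.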
pub fn default_compression_type() -> CompressionType {
CompressionType::LZ4
}
@@ -220,9 +246,6 @@ pub fn get_compression_engine(ct: CompressionType) -> Result<Box<dyn Compression
if engine.is_supported() {
Ok(engine.clone())
} else {
Err(anyhow!(
"Compression engine for {} is not supported",
ct.to_string()
))
Err(anyhow!("Compression engine for {ct} is not supported"))
}
}


@@ -15,7 +15,13 @@ pub struct ProgramReader {
impl Read for ProgramReader {
fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
self.stdout.as_mut().unwrap().read(buf)
match self.stdout.as_mut() {
Some(stdout) => stdout.read(buf),
None => Err(std::io::Error::new(
std::io::ErrorKind::BrokenPipe,
"stdout already taken",
)),
}
}
}
@@ -33,11 +39,23 @@ pub struct ProgramWriter {
impl Write for ProgramWriter {
fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
self.stdin.as_mut().unwrap().write(buf)
match self.stdin.as_mut() {
Some(stdin) => stdin.write(buf),
None => Err(std::io::Error::new(
std::io::ErrorKind::BrokenPipe,
"stdin already taken",
)),
}
}
fn flush(&mut self) -> std::io::Result<()> {
self.stdin.as_mut().unwrap().flush()
match self.stdin.as_mut() {
Some(stdin) => stdin.flush(),
None => Err(std::io::Error::new(
std::io::ErrorKind::BrokenPipe,
"stdin already taken",
)),
}
}
}


@@ -7,15 +7,15 @@ use std::path::PathBuf;
use crate::compression_engine::CompressionEngine;
#[derive(Debug, Eq, PartialEq, Clone, Default)]
pub struct CompressionEngineNone {}
pub struct CompressionEngineRaw {}
impl CompressionEngineNone {
pub fn new() -> CompressionEngineNone {
CompressionEngineNone {}
impl CompressionEngineRaw {
pub fn new() -> CompressionEngineRaw {
CompressionEngineRaw {}
}
}
impl CompressionEngine for CompressionEngineNone {
impl CompressionEngine for CompressionEngineRaw {
fn is_supported(&self) -> bool {
true
}


@@ -0,0 +1,54 @@
#[cfg(feature = "zstd")]
use anyhow::Result;
#[cfg(feature = "zstd")]
use log::*;
#[cfg(feature = "zstd")]
use std::io::Write;
#[cfg(feature = "zstd")]
use std::fs::File;
#[cfg(feature = "zstd")]
use std::io::Read;
#[cfg(feature = "zstd")]
use std::path::PathBuf;
#[cfg(feature = "zstd")]
use zstd::stream::read::Decoder;
#[cfg(feature = "zstd")]
use zstd::stream::write::Encoder;
#[cfg(feature = "zstd")]
use crate::compression_engine::CompressionEngine;
#[cfg(feature = "zstd")]
#[derive(Debug, Eq, PartialEq, Clone, Default)]
pub struct CompressionEngineZstd {}
#[cfg(feature = "zstd")]
impl CompressionEngineZstd {
pub fn new() -> CompressionEngineZstd {
CompressionEngineZstd {}
}
}
#[cfg(feature = "zstd")]
impl CompressionEngine for CompressionEngineZstd {
fn open(&self, file_path: PathBuf) -> Result<Box<dyn Read + Send>> {
debug!("COMPRESSION: Opening {:?} using {:?}", file_path, *self);
let file = File::open(file_path)?;
Ok(Box::new(Decoder::new(file)?))
}
fn create(&self, file_path: PathBuf) -> Result<Box<dyn Write>> {
debug!("COMPRESSION: Writing to {:?} using {:?}", file_path, *self);
let file = File::create(file_path)?;
let zstd_write = Encoder::new(file, 3)?.auto_finish();
Ok(Box::new(zstd_write))
}
fn clone_box(&self) -> Box<dyn CompressionEngine> {
Box::new(self.clone())
}
}


@@ -4,7 +4,7 @@ use dirs;
use log::{debug, error};
use serde::{Deserialize, Serialize};
use std::fs;
use std::path::PathBuf;
use std::path::{Path, PathBuf};
#[derive(Debug, Clone, Serialize, Deserialize, Default)]
#[serde(rename_all = "lowercase")]
@@ -143,11 +143,16 @@ impl<'de> serde::Deserialize<'de> for ColumnConfig {
pub struct ServerConfig {
pub address: Option<String>,
pub port: Option<u16>,
pub username: Option<String>,
pub password_file: Option<PathBuf>,
pub password: Option<String>,
pub password_hash: Option<String>,
pub jwt_secret: Option<String>,
pub jwt_secret_file: Option<PathBuf>,
pub cert_file: Option<PathBuf>,
pub key_file: Option<PathBuf>,
pub cors_origin: Option<String>,
pub max_body_size: Option<u64>,
}
#[derive(Debug, Clone, Deserialize, Serialize)]
@@ -158,7 +163,9 @@ pub struct CompressionPluginConfig {
#[derive(Debug, Clone, Deserialize, Serialize)]
pub struct ClientConfig {
pub url: Option<String>,
pub username: Option<String>,
pub password: Option<String>,
pub jwt: Option<String>,
}
#[derive(Debug, Clone, Deserialize, Serialize)]
@@ -184,6 +191,8 @@ pub struct Settings {
pub table_config: TableConfig,
#[serde(default)]
pub human_readable: bool,
#[serde(default)]
pub ids_only: bool,
pub output_format: Option<String>,
#[serde(default)]
pub quiet: bool,
@@ -197,7 +206,23 @@ pub struct Settings {
#[serde(skip)]
pub client_url: Option<String>,
#[serde(skip)]
pub client_username: Option<String>,
#[serde(skip)]
pub client_password: Option<String>,
#[serde(skip)]
pub client_jwt: Option<String>,
// Metadata key-value pairs from --meta CLI flag
#[serde(skip)]
pub meta: Vec<(String, Option<String>)>,
// Export filename format template (--export-filename-format)
#[serde(skip)]
pub export_filename_format: String,
// Export name for {name} variable (--export-name)
#[serde(skip)]
pub export_name: Option<String>,
// Import data file path (--import-data-file)
#[serde(skip)]
pub import_data_file: Option<std::path::PathBuf>,
}
impl Settings {
@@ -210,15 +235,13 @@ impl Settings {
} else if let Ok(env_config) = std::env::var("KEEP_CONFIG") {
PathBuf::from(env_config)
} else {
let default_path = if let Ok(home_dir) = std::env::var("HOME") {
let mut path = PathBuf::from(home_dir);
path.push(".config");
path.push("keep");
path.push("config.yml");
path
} else {
PathBuf::from("~/.config/keep/config.yml")
};
let default_path = dirs::config_dir()
.map(|mut p| {
p.push("keep");
p.push("config.yml");
p
})
.unwrap_or_else(|| PathBuf::from("~/.config/keep/config.yml"));
debug!("CONFIG: Using default config path: {default_path:?}");
default_path
};
@@ -246,13 +269,21 @@ impl Settings {
// Override with CLI args
if let Some(dir) = &args.options.dir {
debug!("CONFIG: Overriding dir with CLI arg: {dir:?}");
config_builder = config_builder.set_override("dir", dir.to_str().unwrap())?;
config_builder = config_builder.set_override(
"dir",
dir.to_str()
.ok_or_else(|| anyhow::anyhow!("non-UTF-8 directory path"))?,
)?;
}
if args.options.human_readable {
config_builder = config_builder.set_override("human_readable", true)?;
}
if args.options.ids_only {
config_builder = config_builder.set_override("ids_only", true)?;
}
if let Some(output_format) = &args.options.output_format {
config_builder =
config_builder.set_override("output_format", output_format.as_str())?;
@@ -270,55 +301,59 @@ impl Settings {
config_builder = config_builder.set_override("force", true)?;
}
#[cfg(feature = "server")]
if let Some(server_password) = &args.options.server_password {
config_builder =
config_builder.set_override("server.password", server_password.as_str())?;
}
#[cfg(feature = "server")]
if let Some(server_password_hash) = &args.options.server_password_hash {
config_builder = config_builder
.set_override("server.password_hash", server_password_hash.as_str())?;
}
#[cfg(feature = "server")]
if let Some(server_username) = &args.options.server_username {
config_builder =
config_builder.set_override("server.username", server_username.as_str())?;
}
#[cfg(feature = "server")]
if let Some(server_address) = &args.mode.server_address {
config_builder =
config_builder.set_override("server.address", server_address.as_str())?;
}
#[cfg(feature = "server")]
if let Some(server_port) = args.mode.server_port {
config_builder = config_builder.set_override("server.port", server_port)?;
}
#[cfg(feature = "tls")]
#[cfg(feature = "server")]
if let Some(server_cert) = &args.mode.server_cert {
config_builder = config_builder
.set_override("server.cert_file", server_cert.to_string_lossy().as_ref())?;
}
#[cfg(feature = "tls")]
#[cfg(feature = "server")]
if let Some(server_key) = &args.mode.server_key {
config_builder = config_builder
.set_override("server.key_file", server_key.to_string_lossy().as_ref())?;
}
#[cfg(feature = "server")]
if let Some(max_body_size) = args.options.server_max_body_size {
config_builder = config_builder.set_override("server.max_body_size", max_body_size)?;
}
if let Some(compression) = &args.item.compression {
config_builder =
config_builder.set_override("compression_plugin.name", compression.as_str())?;
}
if !args.item.meta_plugins.is_empty() {
let meta_plugins: Vec<std::collections::HashMap<String, String>> = args
.item
.meta_plugins
.iter()
.map(|name| {
let mut map = std::collections::HashMap::new();
map.insert("name".to_string(), name.clone());
map
})
.collect();
config_builder = config_builder.set_override("meta_plugins", meta_plugins)?;
}
// Build MetaPluginConfig entries from --meta-plugin args (name[:json])
// These are handled after config deserialization (see below).
let config = config_builder.build()?;
debug!("CONFIG: Built config, attempting to deserialize");
@@ -414,6 +449,59 @@ impl Settings {
}]);
}
// Override meta_plugins from --meta-plugin CLI args
if !args.item.meta_plugins.is_empty() {
debug!("CONFIG: Overriding meta_plugins from --meta-plugin CLI args");
let cli_plugins: Vec<MetaPluginConfig> = args
.item
.meta_plugins
.iter()
.map(|arg| {
let mut options = std::collections::HashMap::new();
let mut outputs = std::collections::HashMap::new();
if let Some(serde_json::Value::Object(obj)) = &arg.options {
// Extract options and outputs from JSON value
if let Some(serde_json::Value::Object(opts_obj)) =
obj.get("options")
{
for (k, v) in opts_obj {
let yaml_str = serde_json::to_string(v).unwrap_or_default();
let yaml_val: serde_yaml::Value =
serde_yaml::from_str(&yaml_str)
.unwrap_or(serde_yaml::Value::Null);
options.insert(k.clone(), yaml_val);
}
}
if let Some(serde_json::Value::Object(outs_obj)) =
obj.get("outputs")
{
for (k, v) in outs_obj {
let val_str = match v {
serde_json::Value::String(s) => s.clone(),
_ => v.to_string(),
};
outputs.insert(k.clone(), val_str);
}
}
}
MetaPluginConfig {
name: arg.name.clone(),
options,
outputs,
}
})
.collect();
settings.meta_plugins = Some(cli_plugins);
}
// Override list_format from --list-format CLI arg
if args.options.list_format
!= "id,time,size,meta:text_line_count,tags,meta:hostname_short,meta:command"
{
debug!("CONFIG: Overriding list_format from --list-format CLI arg");
settings.list_format = Settings::parse_list_format(&args.options.list_format);
}
// Set dir to default if not provided or is empty
if settings.dir == PathBuf::new() {
debug!("CONFIG: Setting default dir: {default_dir:?}");
@@ -428,11 +516,59 @@ impl Settings {
.client_url
.clone()
.or_else(|| settings.client.as_ref().and_then(|c| c.url.clone()));
settings.client_username = args
.options
.client_username
.clone()
.or_else(|| settings.client.as_ref().and_then(|c| c.username.clone()));
settings.client_password = args
.options
.client_password
.clone()
.or_else(|| settings.client.as_ref().and_then(|c| c.password.clone()));
settings.client_jwt = args
.options
.client_jwt
.clone()
.or_else(|| settings.client.as_ref().and_then(|c| c.jwt.clone()));
}
// Parse --meta key=value and bare key arguments
settings.meta = args
.item
.meta
.iter()
.map(|s| {
if let Some((key, value)) = s.split_once('=') {
(key.to_string(), Some(value.to_string()))
} else {
(s.to_string(), None)
}
})
.collect();
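The `--meta` parsing above can be sketched in isolation: `key=value` becomes `(key, Some(value))`, a bare `key` becomes `(key, None)`. A minimal, self-contained version (function name is illustrative):

```rust
// Sketch of the `--meta` argument parsing: split on the FIRST '=',
// so values themselves may contain '=' characters.
fn parse_meta_arg(s: &str) -> (String, Option<String>) {
    match s.split_once('=') {
        Some((key, value)) => (key.to_string(), Some(value.to_string())),
        None => (s.to_string(), None),
    }
}

fn main() {
    assert_eq!(
        parse_meta_arg("status=active"),
        ("status".to_string(), Some("active".to_string()))
    );
    // Bare keys carry no value; matching treats them as presence checks.
    assert_eq!(parse_meta_arg("pinned"), ("pinned".to_string(), None));
    // Only the first '=' splits.
    assert_eq!(
        parse_meta_arg("expr=a=b"),
        ("expr".to_string(), Some("a=b".to_string()))
    );
}
```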
// Set export filename format from CLI args
settings.export_filename_format = args.item.export_filename_format.clone();
settings.export_name = args.item.export_name.clone();
settings.import_data_file = args.item.import_data_file.clone();
// Expand ~ in all path fields
settings.dir = Settings::expand_tilde(&settings.dir);
settings.import_data_file = settings
.import_data_file
.as_ref()
.map(|p| Settings::expand_tilde(p));
if let Some(ref mut server) = settings.server {
server.password_file = server
.password_file
.as_ref()
.map(|p| Settings::expand_tilde(p));
server.jwt_secret_file = server
.jwt_secret_file
.as_ref()
.map(|p| Settings::expand_tilde(p));
server.cert_file = server.cert_file.as_ref().map(|p| Settings::expand_tilde(p));
server.key_file = server.key_file.as_ref().map(|p| Settings::expand_tilde(p));
}
debug!("CONFIG: Final settings: {settings:?}");
@@ -447,24 +583,42 @@ impl Settings {
pub fn default_dir() -> anyhow::Result<PathBuf> {
let mut path =
dirs::home_dir().ok_or_else(|| anyhow::anyhow!("No home directory found"))?;
path.push(".keep");
dirs::data_dir().ok_or_else(|| anyhow::anyhow!("No data directory found"))?;
path.push("keep");
if !path.exists() {
std::fs::create_dir_all(&path)?;
}
Ok(path)
}
/// Expand a leading `~` in a path to the user's home directory.
///
/// Returns the path unchanged if it doesn't start with `~` or if the
/// home directory cannot be determined.
fn expand_tilde(path: &Path) -> PathBuf {
let path_str = path.to_string_lossy();
if let Some(rest) = path_str.strip_prefix("~/") {
if let Some(home) = dirs::home_dir() {
return home.join(rest);
}
} else if path_str == "~" {
if let Some(home) = dirs::home_dir() {
return home;
}
}
path.to_path_buf()
}
/// Get server password from password_file or directly from config if configured
pub fn get_server_password(&self) -> Result<Option<String>> {
if let Some(server) = &self.server {
// First check for password_file
if let Some(password_file) = &server.password_file {
debug!("CONFIG: Reading password from file: {password_file:?}");
let password = fs::read_to_string(password_file)
.with_context(|| format!("Failed to read password file: {password_file:?}"))?
.trim()
.to_string();
let password = fs::read(password_file)
.with_context(|| format!("Failed to read password file: {password_file:?}"))?;
let end = password.len().min(4096);
let password = String::from_utf8_lossy(&password[..end]).trim().to_string();
return Ok(Some(password));
}
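The bounded read above (cap at 4096 bytes, decode lossily, trim) guards against both oversized files and non-UTF-8 content. In isolation, the decode-and-trim step looks like this (function name is illustrative):

```rust
// Take at most 4096 bytes, decode lossily, and trim surrounding
// whitespace such as a trailing newline left by `echo`.
fn secret_from_bytes(bytes: &[u8]) -> String {
    let end = bytes.len().min(4096);
    String::from_utf8_lossy(&bytes[..end]).trim().to_string()
}

fn main() {
    assert_eq!(secret_from_bytes(b"hunter2\n"), "hunter2");
    // Invalid UTF-8 is replaced with U+FFFD, not rejected.
    assert_eq!(secret_from_bytes(&[0x66, 0x6f, 0xff]), "fo\u{FFFD}");
    // Oversized input is silently truncated to the 4096-byte cap.
    let big = vec![b'a'; 10_000];
    assert_eq!(secret_from_bytes(&big).len(), 4096);
}
```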
@@ -486,6 +640,37 @@ impl Settings {
self.server.as_ref().and_then(|s| s.password_hash.clone())
}
pub fn server_username(&self) -> Option<String> {
self.server.as_ref().and_then(|s| s.username.clone())
}
/// Get JWT secret from jwt_secret_file or directly from config if configured
pub fn get_server_jwt_secret(&self) -> Result<Option<String>> {
if let Some(server) = &self.server {
// First check for jwt_secret_file
if let Some(jwt_secret_file) = &server.jwt_secret_file {
debug!("CONFIG: Reading JWT secret from file: {jwt_secret_file:?}");
let secret = fs::read(jwt_secret_file).with_context(|| {
format!("Failed to read JWT secret file: {jwt_secret_file:?}")
})?;
let end = secret.len().min(4096);
let secret = String::from_utf8_lossy(&secret[..end]).trim().to_string();
return Ok(Some(secret));
}
// Fall back to direct jwt_secret field
if let Some(secret) = &server.jwt_secret {
debug!("CONFIG: Using JWT secret from config");
return Ok(Some(secret.clone()));
}
}
Ok(None)
}
pub fn server_jwt_secret(&self) -> Option<String> {
self.get_server_jwt_secret().ok().flatten()
}
pub fn server_address(&self) -> Option<String> {
self.server.as_ref().and_then(|s| s.address.clone())
}
@@ -502,6 +687,10 @@ impl Settings {
self.server.as_ref().and_then(|s| s.key_file.clone())
}
pub fn server_cors_origin(&self) -> Option<String> {
self.server.as_ref().and_then(|s| s.cors_origin.clone())
}
pub fn compression(&self) -> Option<String> {
self.compression_plugin.as_ref().map(|c| c.name.clone())
}
@@ -512,4 +701,142 @@ impl Settings {
.map(|plugins| plugins.iter().map(|p| p.name.clone()).collect())
.unwrap_or_default()
}
/// Returns the metadata filter as a HashMap.
///
/// Converts the `meta` field (list of key-value pairs from CLI --meta flags)
/// into a `HashMap<String, Option<String>>` suitable for filtering.
pub fn meta_filter(&self) -> std::collections::HashMap<String, Option<String>> {
self.meta.iter().cloned().collect()
}
/// Validates the configuration against plugin schemas.
///
/// Checks that:
/// - All configured meta plugin names are valid and registered
/// - Required options are present for each meta plugin
/// - Compression plugin name (if set) is a valid compression type
///
/// Returns a list of warning strings. An empty list means the config is valid.
pub fn validate_config(&self) -> Vec<String> {
use crate::common::schema::gather_meta_plugin_schemas;
use crate::compression_engine::CompressionType;
use strum::IntoEnumIterator;
let mut warnings = Vec::new();
// Validate compression plugin
if let Some(ref comp) = self.compression_plugin {
let valid_types: Vec<String> =
CompressionType::iter().map(|ct| ct.to_string()).collect();
if !valid_types.contains(&comp.name) {
warnings.push(format!(
"Unknown compression_plugin.name: '{}'. Valid types: {}",
comp.name,
valid_types.join(", ")
));
}
}
// Validate meta plugins
if let Some(ref plugins) = self.meta_plugins {
let schemas = gather_meta_plugin_schemas();
let schema_map: std::collections::HashMap<&str, &crate::common::schema::PluginSchema> =
schemas.iter().map(|s| (s.name.as_str(), s)).collect();
for plugin in plugins {
match schema_map.get(plugin.name.as_str()) {
Some(schema) => {
// Check required options
for opt in &schema.options {
if opt.required && !plugin.options.contains_key(&opt.name) {
warnings.push(format!(
"Meta plugin '{}': missing required option '{}'",
plugin.name, opt.name
));
}
}
}
None => {
warnings.push(format!(
"Unknown meta plugin: '{}'. Available: {}",
plugin.name,
schema_map.keys().copied().collect::<Vec<_>>().join(", ")
));
}
}
}
}
warnings
}
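The compression-plugin check above reduces to validating a name against a known set and emitting a warning string on mismatch. A std-only sketch with the valid set hard-coded (the real code derives it from `CompressionType::iter()`):

```rust
// Returns None when the name is valid, or a warning string otherwise,
// mirroring the warning format used by `validate_config`.
fn check_name(name: &str, valid: &[&str]) -> Option<String> {
    if valid.contains(&name) {
        None
    } else {
        Some(format!(
            "Unknown compression_plugin.name: '{}'. Valid types: {}",
            name,
            valid.join(", ")
        ))
    }
}

fn main() {
    let valid = ["raw", "lz4", "gzip"];
    assert!(check_name("lz4", &valid).is_none());
    let warn = check_name("zstd", &valid).unwrap();
    assert!(warn.contains("Valid types: raw, lz4, gzip"));
}
```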
/// Parse a comma-separated column list string into Vec<ColumnConfig>.
///
/// Maps known column names to their default labels and alignment.
/// For unknown names (including meta:* columns), uses the name as its own label.
fn parse_list_format(input: &str) -> Vec<ColumnConfig> {
input
.split(',')
.map(|s| s.trim())
.filter(|s| !s.is_empty())
.map(|name| {
let (label, align) = match name {
"id" => ("Item", ColumnAlignment::Right),
"time" => ("Time", ColumnAlignment::Right),
"size" => ("Size", ColumnAlignment::Right),
"meta:text_line_count" => ("Lines", ColumnAlignment::Right),
"meta:token_count" => ("Tokens", ColumnAlignment::Right),
"tags" => ("Tags", ColumnAlignment::Left),
"meta:hostname_short" => ("Host", ColumnAlignment::Left),
"meta:hostname" => ("Host", ColumnAlignment::Left),
"meta:command" => ("Command", ColumnAlignment::Left),
"compression" => ("Compression", ColumnAlignment::Left),
other if other.starts_with("meta:") => {
let sub = other.strip_prefix("meta:").unwrap_or(other);
(sub, ColumnAlignment::Left)
}
other => (other, ColumnAlignment::Left),
};
ColumnConfig {
name: name.to_string(),
label: label.to_string(),
align,
..Default::default()
}
})
.collect()
}
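The split/trim/filter pipeline and the `meta:*` fallback above can be reproduced with local stand-ins for `ColumnConfig`/`ColumnAlignment` (the real types live elsewhere in the crate; these names are simplified):

```rust
#[derive(Debug, PartialEq)]
enum Align { Left, Right }

// Map a column name to (label, alignment); unknown `meta:*` names use
// their suffix as the label, like `parse_list_format` above.
fn column_label(name: &str) -> (&str, Align) {
    match name {
        "id" => ("Item", Align::Right),
        "size" => ("Size", Align::Right),
        "tags" => ("Tags", Align::Left),
        other if other.starts_with("meta:") => {
            (other.strip_prefix("meta:").unwrap_or(other), Align::Left)
        }
        other => (other, Align::Left),
    }
}

fn main() {
    let cols: Vec<_> = "id, size,,meta:custom"
        .split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty()) // empty segments (",,") are dropped
        .map(column_label)
        .collect();
    assert_eq!(cols.len(), 3);
    assert_eq!(cols[0], ("Item", Align::Right));
    assert_eq!(cols[2], ("custom", Align::Left));
}
```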
}
#[cfg(test)]
mod tests {
use super::*;
use std::path::Path;
#[test]
fn test_expand_tilde_with_slash() {
let home = dirs::home_dir().unwrap();
let result = Settings::expand_tilde(Path::new("~/foo/bar"));
assert_eq!(result, home.join("foo/bar"));
}
#[test]
fn test_expand_tilde_bare() {
let home = dirs::home_dir().unwrap();
let result = Settings::expand_tilde(Path::new("~"));
assert_eq!(result, home);
}
#[test]
fn test_expand_tilde_absolute() {
let result = Settings::expand_tilde(Path::new("/etc/keep"));
assert_eq!(result, PathBuf::from("/etc/keep"));
}
#[test]
fn test_expand_tilde_relative() {
let result = Settings::expand_tilde(Path::new("foo/bar"));
assert_eq!(result, PathBuf::from("foo/bar"));
}
}

src/db.rs (308 changed lines)

@@ -1,8 +1,7 @@
use anyhow::{Context, Error, Result, anyhow};
use chrono::prelude::*;
use lazy_static::lazy_static;
use log::*;
use rusqlite::{Connection, OpenFlags, params};
use rusqlite::{Connection, OpenFlags, Row, params};
use rusqlite_migration::{M, Migrations};
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
@@ -19,7 +18,7 @@ and query utilities for efficient data access.
# Schema
The database uses three main tables:
- `items`: Core item information (ID, timestamp, size, compression).
- `items`: Core item information (ID, timestamp, uncompressed_size, compressed_size, closed, compression).
- `tags`: Item-tag associations (many-to-many).
- `metas`: Item-metadata associations (many-to-many).
@@ -42,30 +41,26 @@ let conn = db::open(PathBuf::from("keep.db"))?;
```
Insert an item:
```ignore
let item = db::Item { id: None, ts: Utc::now(), size: None, compression: "lz4".to_string() };
let item = db::Item { id: None, ts: Utc::now(), uncompressed_size: None, compressed_size: None, closed: false, compression: "lz4".to_string() };
let id = db::insert_item(&conn, item)?;
```
*/
lazy_static! {
// Database schema migrations for the Keep application.
//
// Defines the sequence of migrations to create and update the schema.
// Applied automatically when opening a database connection.
static ref MIGRATIONS: Migrations<'static> = Migrations::new(vec![
static MIGRATIONS: std::sync::LazyLock<Migrations<'static>> = std::sync::LazyLock::new(|| {
Migrations::new(vec![
M::up(
"CREATE TABLE items(
id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
ts TEXT NOT NULL,
size INTEGER NULL,
compression TEXT NOT NULL)"
compression TEXT NOT NULL)",
),
M::up(
"CREATE TABLE tags (
id INTEGER NOT NULL,
name TEXT NOT NULL,
FOREIGN KEY(id) REFERENCES items(id) ON DELETE CASCADE,
PRIMARY KEY(id, name));"
PRIMARY KEY(id, name));",
),
M::up(
"CREATE TABLE metas (
@@ -73,12 +68,17 @@ lazy_static! {
name TEXT NOT NULL,
value TEXT NOT NULL,
FOREIGN KEY(id) REFERENCES items(id) ON DELETE CASCADE,
PRIMARY KEY(id, name));"
PRIMARY KEY(id, name));",
),
M::up("CREATE INDEX idx_tags_name ON tags(name)"),
M::up("CREATE INDEX idx_metas_name ON metas(name)"),
]);
}
M::up("CREATE INDEX idx_items_ts ON items(ts)"),
M::up("UPDATE items SET compression = 'raw' WHERE compression = 'none'"),
M::up("ALTER TABLE items RENAME COLUMN size TO uncompressed_size"),
M::up("ALTER TABLE items ADD COLUMN compressed_size INTEGER NULL"),
M::up("ALTER TABLE items ADD COLUMN closed BOOLEAN NOT NULL DEFAULT 1"),
])
});
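The `lazy_static!` to `std::sync::LazyLock` migration above follows a general pattern: `LazyLock` (in std since Rust 1.80) runs its closure once, on first access. A minimal illustration with a plain value in place of `Migrations`:

```rust
use std::sync::LazyLock;

// Initialized at most once, on first dereference; thread-safe.
static GREETING: LazyLock<String> = LazyLock::new(|| {
    format!("hello, {}", "keep")
});

fn main() {
    // Deref coercion gives &str access just like a normal static String.
    assert_eq!(&*GREETING, "hello, keep");
    assert_eq!(GREETING.len(), 11);
}
```

Unlike `lazy_static!`, this needs no macro or extra crate, which is why the `once_cell` and `lazy_static` dependencies could be dropped.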
/// Represents an item stored in the database.
///
@@ -88,7 +88,9 @@ lazy_static! {
///
/// * `id` - Unique identifier, `None` for new items.
/// * `ts` - Creation timestamp in UTC.
/// * `size` - Content size in bytes, `None` if not set.
/// * `uncompressed_size` - Uncompressed content size in bytes, `None` if not set.
/// * `compressed_size` - Compressed file size on disk, `None` if not set.
/// * `closed` - Whether the item has been fully written and closed.
/// * `compression` - Compression algorithm used.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Item {
@@ -96,12 +98,27 @@ pub struct Item {
pub id: Option<i64>,
/// Timestamp when the item was created.
pub ts: DateTime<Utc>,
/// Size of the item content in bytes, None if not set.
pub size: Option<i64>,
/// Uncompressed size of the item content in bytes, None if not set.
pub uncompressed_size: Option<i64>,
/// Compressed file size on disk in bytes, None if not set.
pub compressed_size: Option<i64>,
/// Whether the item has been fully written and closed.
pub closed: bool,
/// Compression algorithm used for the item content.
pub compression: String,
}
fn item_from_row(row: &Row) -> Result<Item> {
Ok(Item {
id: row.get(0)?,
ts: row.get(1)?,
uncompressed_size: row.get(2)?,
compressed_size: row.get(3)?,
closed: row.get(4)?,
compression: row.get(5)?,
})
}
/// Represents a tag associated with an item.
///
/// Defines the relationship between items and tags in a many-to-many structure.
@@ -162,8 +179,10 @@ pub struct Meta {
/// # use keep::db;
/// # use keep::db::*;
/// # use std::path::PathBuf;
/// # use tempfile;
/// # fn main() -> anyhow::Result<()> {
/// let db_path = PathBuf::from("keep.db");
/// let _tmp = tempfile::tempdir()?;
/// let db_path = _tmp.path().join("keep.db");
/// let conn = db::open(db_path)?;
/// # Ok(())
/// # }
@@ -213,13 +232,17 @@ pub fn open(path: PathBuf) -> Result<Connection, Error> {
/// # use keep::db::*;
/// # use chrono::Utc;
/// # use std::path::PathBuf;
/// # use tempfile;
/// # fn main() -> anyhow::Result<()> {
/// let db_path = PathBuf::from("keep.db");
/// let _tmp = tempfile::tempdir()?;
/// let db_path = _tmp.path().join("keep.db");
/// let conn = db::open(db_path)?;
/// let item = Item {
/// id: None,
/// ts: Utc::now(),
/// size: None,
/// uncompressed_size: None,
/// compressed_size: None,
/// closed: false,
/// compression: "lz4".to_string(),
/// };
/// let id = db::insert_item(&conn, item)?;
@@ -230,8 +253,8 @@ pub fn open(path: PathBuf) -> Result<Connection, Error> {
pub fn insert_item(conn: &Connection, item: Item) -> Result<i64> {
debug!("DB: Inserting item: {item:?}");
conn.execute(
"INSERT INTO items (ts, size, compression) VALUES (?1, ?2, ?3)",
params![item.ts, item.size, item.compression],
"INSERT INTO items (ts, uncompressed_size, compressed_size, closed, compression) VALUES (?1, ?2, ?3, ?4, ?5)",
params![item.ts, item.uncompressed_size, item.compressed_size, item.closed, item.compression],
)?;
Ok(conn.last_insert_rowid())
}
@@ -260,8 +283,10 @@ pub fn insert_item(conn: &Connection, item: Item) -> Result<i64> {
/// # use keep::db::*;
/// # use keep::compression_engine::CompressionType;
/// # use std::path::PathBuf;
/// # use tempfile;
/// # fn main() -> anyhow::Result<()> {
/// let db_path = PathBuf::from("keep.db");
/// let _tmp = tempfile::tempdir()?;
/// let db_path = _tmp.path().join("keep.db");
/// let conn = db::open(db_path)?;
/// let compression = CompressionType::LZ4;
/// let item = db::create_item(&conn, compression)?;
@@ -276,7 +301,9 @@ pub fn create_item(
let item = Item {
id: None,
ts: chrono::Utc::now(),
size: None,
uncompressed_size: None,
compressed_size: None,
closed: false,
compression: compression_type.to_string(),
};
let item_id = insert_item(conn, item.clone())?;
@@ -286,6 +313,37 @@ pub fn create_item(
})
}
/// Creates a new item with a specific timestamp (for import).
///
/// # Arguments
///
/// * `conn` - Database connection.
/// * `ts` - Timestamp to use for the item.
/// * `compression` - Compression type string (e.g., "lz4", "gzip", "raw").
///
/// # Returns
///
/// * `Result<Item>` - The created item with its ID set.
pub fn insert_item_with_ts(
conn: &Connection,
ts: chrono::DateTime<chrono::Utc>,
compression: &str,
) -> Result<Item> {
let item = Item {
id: None,
ts,
uncompressed_size: None,
compressed_size: None,
closed: false,
compression: compression.to_string(),
};
let item_id = insert_item(conn, item.clone())?;
Ok(Item {
id: Some(item_id),
..item
})
}
/// Adds a tag to an item.
///
/// Inserts a new tag association in the `tags` table.
@@ -312,10 +370,12 @@ pub fn create_item(
/// # use keep::db::*;
/// # use chrono::Utc;
/// # use std::path::PathBuf;
/// # use tempfile;
/// # fn main() -> anyhow::Result<()> {
/// let db_path = PathBuf::from("keep.db");
/// let _tmp = tempfile::tempdir()?;
/// let db_path = _tmp.path().join("keep.db");
/// let conn = db::open(db_path)?;
/// let item = Item { id: None, ts: Utc::now(), size: None, compression: "lz4".to_string() };
/// let item = Item { id: None, ts: Utc::now(), uncompressed_size: None, compressed_size: None, closed: false, compression: "lz4".to_string() };
/// let item_id = db::insert_item(&conn, item)?;
/// db::add_tag(&conn, item_id, "important")?;
/// # Ok(())
@@ -329,6 +389,18 @@ pub fn add_tag(conn: &Connection, item_id: i64, tag_name: &str) -> Result<()> {
insert_tag(conn, tag)
}
/// Adds a tag to an item, ignoring if the tag already exists.
///
/// Uses `INSERT OR IGNORE` to make the operation idempotent.
pub fn upsert_tag(conn: &Connection, item_id: i64, tag_name: &str) -> Result<()> {
debug!("DB: Upserting tag: item={item_id}, tag={tag_name}");
conn.execute(
"INSERT OR IGNORE INTO tags (id, name) VALUES (?1, ?2)",
params![item_id, tag_name],
)?;
Ok(())
}
/// Adds metadata to an item.
///
/// Inserts a new metadata entry in the `metas` table.
@@ -356,10 +428,12 @@ pub fn add_tag(conn: &Connection, item_id: i64, tag_name: &str) -> Result<()> {
/// # use keep::db::*;
/// # use chrono::Utc;
/// # use std::path::PathBuf;
/// # use tempfile;
/// # fn main() -> anyhow::Result<()> {
/// let db_path = PathBuf::from("keep.db");
/// let _tmp = tempfile::tempdir()?;
/// let db_path = _tmp.path().join("keep.db");
/// let conn = db::open(db_path)?;
/// let item = Item { id: None, ts: Utc::now(), size: None, compression: "lz4".to_string() };
/// let item = Item { id: None, ts: Utc::now(), uncompressed_size: None, compressed_size: None, closed: false, compression: "lz4".to_string() };
/// let item_id = db::insert_item(&conn, item)?;
/// db::add_meta(&conn, item_id, "mime_type", "text/plain")?;
/// # Ok(())
@@ -399,10 +473,12 @@ pub fn add_meta(conn: &Connection, item_id: i64, name: &str, value: &str) -> Res
/// # use keep::db::*;
/// # use chrono::Utc;
/// # use std::path::PathBuf;
/// # use tempfile;
/// # fn main() -> anyhow::Result<()> {
/// let db_path = PathBuf::from("keep.db");
/// let _tmp = tempfile::tempdir()?;
/// let db_path = _tmp.path().join("keep.db");
/// let conn = db::open(db_path)?;
/// let item = Item { id: Some(1), size: Some(1024), compression: "lz4".to_string(), ts: Utc::now() };
/// let item = Item { id: Some(1), ts: Utc::now(), uncompressed_size: Some(1024), compressed_size: Some(512), closed: true, compression: "lz4".to_string() };
/// db::update_item(&conn, item)?;
/// # Ok(())
/// # }
@@ -410,8 +486,8 @@ pub fn add_meta(conn: &Connection, item_id: i64, name: &str, value: &str) -> Res
pub fn update_item(conn: &Connection, item: Item) -> Result<()> {
debug!("DB: Updating item: {item:?}");
conn.execute(
"UPDATE items SET size=?2, compression=?3 WHERE id=?1",
params![item.id, item.size, item.compression,],
"UPDATE items SET uncompressed_size=?2, compressed_size=?3, closed=?4, compression=?5 WHERE id=?1",
params![item.id, item.uncompressed_size, item.compressed_size, item.closed, item.compression,],
)?;
Ok(())
}
@@ -441,17 +517,22 @@ pub fn update_item(conn: &Connection, item: Item) -> Result<()> {
/// # use keep::db::*;
/// # use chrono::Utc;
/// # use std::path::PathBuf;
/// # use tempfile;
/// # fn main() -> anyhow::Result<()> {
/// let db_path = PathBuf::from("keep.db");
/// let _tmp = tempfile::tempdir()?;
/// let db_path = _tmp.path().join("keep.db");
/// let conn = db::open(db_path)?;
/// let item = Item { id: Some(1), ts: Utc::now(), size: None, compression: "lz4".to_string() };
/// let item = Item { id: Some(1), ts: Utc::now(), uncompressed_size: None, compressed_size: None, closed: false, compression: "lz4".to_string() };
/// db::delete_item(&conn, item)?;
/// # Ok(())
/// # }
/// ```
pub fn delete_item(conn: &Connection, item: Item) -> Result<()> {
debug!("DB: Deleting item: {item:?}");
conn.execute("DELETE FROM items WHERE id=?1", params![item.id])?;
let id = item
.id
.ok_or_else(|| anyhow::anyhow!("Cannot delete item: ID is None"))?;
conn.execute("DELETE FROM items WHERE id=?1", params![id])?;
Ok(())
}
@@ -479,8 +560,10 @@ pub fn delete_item(conn: &Connection, item: Item) -> Result<()> {
/// # use keep::db;
/// # use keep::db::*;
/// # use std::path::PathBuf;
/// # use tempfile;
/// # fn main() -> anyhow::Result<()> {
/// let db_path = PathBuf::from("keep.db");
/// let _tmp = tempfile::tempdir()?;
/// let db_path = _tmp.path().join("keep.db");
/// let conn = db::open(db_path)?;
/// let meta = Meta { id: 1, name: "temp".to_string(), value: "".to_string() };
/// db::query_delete_meta(&conn, meta)?;
@@ -521,10 +604,12 @@ pub fn query_delete_meta(conn: &Connection, meta: Meta) -> Result<()> {
/// # use keep::db::*;
/// # use chrono::Utc;
/// # use std::path::PathBuf;
/// # use tempfile;
/// # fn main() -> anyhow::Result<()> {
/// let db_path = PathBuf::from("keep.db");
/// let _tmp = tempfile::tempdir()?;
/// let db_path = _tmp.path().join("keep.db");
/// let conn = db::open(db_path)?;
/// let item = Item { id: None, ts: Utc::now(), size: None, compression: "lz4".to_string() };
/// let item = Item { id: None, ts: Utc::now(), uncompressed_size: None, compressed_size: None, closed: false, compression: "lz4".to_string() };
/// let item_id = db::insert_item(&conn, item)?;
/// let meta = Meta { id: item_id, name: "mime_type".to_string(), value: "text/plain".to_string() };
/// db::query_upsert_meta(&conn, meta)?;
@@ -565,10 +650,12 @@ pub fn query_upsert_meta(conn: &Connection, meta: Meta) -> Result<()> {
/// # use keep::db::*;
/// # use chrono::Utc;
/// # use std::path::PathBuf;
/// # use tempfile;
/// # fn main() -> anyhow::Result<()> {
/// let db_path = PathBuf::from("keep.db");
/// let _tmp = tempfile::tempdir()?;
/// let db_path = _tmp.path().join("keep.db");
/// let conn = db::open(db_path)?;
/// let item = Item { id: None, ts: Utc::now(), size: None, compression: "lz4".to_string() };
/// let item = Item { id: None, ts: Utc::now(), uncompressed_size: None, compressed_size: None, closed: false, compression: "lz4".to_string() };
/// let item_id = db::insert_item(&conn, item)?;
/// // Insert new metadata
/// let meta = Meta { id: item_id, name: "source".to_string(), value: "cli".to_string() };
@@ -614,10 +701,12 @@ pub fn store_meta(conn: &Connection, meta: Meta) -> Result<()> {
/// # use keep::db::*;
/// # use chrono::Utc;
/// # use std::path::PathBuf;
/// # use tempfile;
/// # fn main() -> anyhow::Result<()> {
/// let db_path = PathBuf::from("keep.db");
/// let _tmp = tempfile::tempdir()?;
/// let db_path = _tmp.path().join("keep.db");
/// let conn = db::open(db_path)?;
/// let item = Item { id: None, ts: Utc::now(), size: None, compression: "lz4".to_string() };
/// let item = Item { id: None, ts: Utc::now(), uncompressed_size: None, compressed_size: None, closed: false, compression: "lz4".to_string() };
/// let item_id = db::insert_item(&conn, item)?;
/// let tag = Tag { id: item_id, name: "work".to_string() };
/// db::insert_tag(&conn, tag)?;
@@ -657,10 +746,12 @@ pub fn insert_tag(conn: &Connection, tag: Tag) -> Result<()> {
/// # use keep::db::*;
/// # use chrono::Utc;
/// # use std::path::PathBuf;
/// # use tempfile;
/// # fn main() -> anyhow::Result<()> {
/// let db_path = PathBuf::from("keep.db");
/// let _tmp = tempfile::tempdir()?;
/// let db_path = _tmp.path().join("keep.db");
/// let conn = db::open(db_path)?;
/// let item = Item { id: Some(1), ts: Utc::now(), size: None, compression: "lz4".to_string() };
/// let item = Item { id: Some(1), ts: Utc::now(), uncompressed_size: None, compressed_size: None, closed: false, compression: "lz4".to_string() };
/// db::delete_item_tags(&conn, item)?;
/// # Ok(())
/// # }
@@ -697,12 +788,14 @@ pub fn delete_item_tags(conn: &Connection, item: Item) -> Result<()> {
/// # use keep::db::*;
/// # use chrono::Utc;
/// # use std::path::PathBuf;
/// # use tempfile;
/// # fn main() -> anyhow::Result<()> {
/// let db_path = PathBuf::from("keep.db");
/// let _tmp = tempfile::tempdir()?;
/// let db_path = _tmp.path().join("keep.db");
/// let conn = db::open(db_path)?;
/// let item = Item { id: None, ts: Utc::now(), size: None, compression: "lz4".to_string() };
/// let item = Item { id: None, ts: Utc::now(), uncompressed_size: None, compressed_size: None, closed: false, compression: "lz4".to_string() };
/// let item_id = db::insert_item(&conn, item)?;
/// let item = Item { id: Some(item_id), ts: Utc::now(), size: None, compression: "lz4".to_string() };
/// let item = Item { id: Some(item_id), ts: Utc::now(), uncompressed_size: None, compressed_size: None, closed: false, compression: "lz4".to_string() };
/// let tags = vec!["project_a".to_string(), "urgent".to_string()];
/// db::set_item_tags(&conn, item, &tags)?;
/// # Ok(())
@@ -750,8 +843,10 @@ pub fn set_item_tags(conn: &Connection, item: Item, tags: &Vec<String>) -> Resul
/// # use keep::db;
/// # use keep::db::*;
/// # use std::path::PathBuf;
/// # use tempfile;
/// # fn main() -> anyhow::Result<()> {
/// let db_path = PathBuf::from("keep.db");
/// let _tmp = tempfile::tempdir()?;
/// let db_path = _tmp.path().join("keep.db");
/// let conn = db::open(db_path)?;
/// let all_items = db::query_all_items(&conn)?;
/// let _count = all_items.len(); // `len()` is a usize, so `>= 0` is always true
@@ -761,19 +856,13 @@ pub fn set_item_tags(conn: &Connection, item: Item, tags: &Vec<String>) -> Resul
pub fn query_all_items(conn: &Connection) -> Result<Vec<Item>> {
debug!("DB: Querying all items");
let mut statement = conn
.prepare("SELECT id, ts, size, compression FROM items ORDER BY id ASC")
.prepare("SELECT id, ts, uncompressed_size, compressed_size, closed, compression FROM items ORDER BY id ASC")
.context("Problem preparing SQL statement")?;
let mut rows = statement.query(params![])?;
let mut items = Vec::new();
while let Some(row) = rows.next()? {
let item = Item {
id: row.get(0)?,
ts: row.get(1)?,
size: row.get(2)?,
compression: row.get(3)?,
};
items.push(item);
items.push(item_from_row(row)?);
}
Ok(items)
@@ -802,8 +891,10 @@ pub fn query_all_items(conn: &Connection) -> Result<Vec<Item>> {
/// # use keep::db;
/// # use keep::db::*;
/// # use std::path::PathBuf;
/// # use tempfile;
/// # fn main() -> anyhow::Result<()> {
/// let db_path = PathBuf::from("keep.db");
/// let _tmp = tempfile::tempdir()?;
/// let db_path = _tmp.path().join("keep.db");
/// let conn = db::open(db_path)?;
/// let tags = vec!["work".to_string(), "urgent".to_string()];
/// let tagged_items = db::query_tagged_items(&conn, &tags)?;
@@ -817,7 +908,9 @@ pub fn query_tagged_items<'a>(conn: &'a Connection, tags: &'a Vec<String>) -> Re
"
SELECT items.id,
items.ts,
items.size,
items.uncompressed_size,
items.compressed_size,
items.closed,
items.compression,
count(tags_match.id) as tags_score
FROM items,
@@ -840,13 +933,7 @@ pub fn query_tagged_items<'a>(conn: &'a Connection, tags: &'a Vec<String>) -> Re
let mut items = Vec::new();
while let Some(row) = rows.next()? {
let item = Item {
id: row.get(0)?,
ts: row.get(1)?,
size: row.get(2)?,
compression: row.get(3)?,
};
items.push(item);
items.push(item_from_row(row)?);
}
Ok(items)
@@ -870,8 +957,10 @@ pub fn query_tagged_items<'a>(conn: &'a Connection, tags: &'a Vec<String>) -> Re
/// # use keep::db;
/// # use keep::db::*;
/// # use std::path::PathBuf;
/// # use tempfile;
/// # fn main() -> anyhow::Result<()> {
/// let db_path = PathBuf::from("keep.db");
/// let _tmp = tempfile::tempdir()?;
/// let db_path = _tmp.path().join("keep.db");
/// let conn = db::open(db_path)?;
/// let items = db::get_items(&conn)?;
/// # Ok(())
@@ -908,11 +997,13 @@ pub fn get_items(conn: &Connection) -> Result<Vec<Item>> {
/// # use keep::db::*;
/// # use std::collections::HashMap;
/// # use std::path::PathBuf;
/// # use tempfile;
/// # fn main() -> anyhow::Result<()> {
/// let db_path = PathBuf::from("keep.db");
/// let _tmp = tempfile::tempdir()?;
/// let db_path = _tmp.path().join("keep.db");
/// let conn = db::open(db_path)?;
/// let tags = vec!["project".to_string()];
/// let meta = HashMap::from([("status".to_string(), "active".to_string())]);
/// let meta = HashMap::from([("status".to_string(), Some("active".to_string()))]);
/// let matching = db::get_items_matching(&conn, &tags, &meta)?;
/// # Ok(())
/// # }
@@ -920,7 +1011,7 @@ pub fn get_items(conn: &Connection) -> Result<Vec<Item>> {
pub fn get_items_matching(
conn: &Connection,
tags: &Vec<String>,
meta: &HashMap<String, String>,
meta: &HashMap<String, Option<String>>,
) -> Result<Vec<Item>> {
debug!("DB: Getting items matching: tags={tags:?} meta={meta:?}");
@@ -947,7 +1038,10 @@ pub fn get_items_matching(
Some(m) => m,
None => return false,
};
meta.iter().all(|(k, v)| item_meta.get(k) == Some(v))
meta.iter().all(|(k, v)| match v {
Some(val) => item_meta.get(k) == Some(val),
None => item_meta.contains_key(k),
})
})
.collect();
Ok(filtered_items)
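The match closure above encodes the new `Option<String>` filter semantics: `Some(v)` demands an exact `key=value` match, while `None` (a bare `--meta key`) only demands the key's presence. Extracted as a standalone predicate (name is illustrative):

```rust
use std::collections::HashMap;

// `Some(v)`: exact key=value match. `None`: key must merely exist.
fn meta_matches(
    item_meta: &HashMap<String, String>,
    filter: &HashMap<String, Option<String>>,
) -> bool {
    filter.iter().all(|(k, v)| match v {
        Some(val) => item_meta.get(k) == Some(val),
        None => item_meta.contains_key(k),
    })
}

fn main() {
    let item: HashMap<String, String> =
        [("status".to_string(), "active".to_string())].into_iter().collect();

    let exact: HashMap<String, Option<String>> =
        [("status".to_string(), Some("active".to_string()))].into_iter().collect();
    assert!(meta_matches(&item, &exact));

    // Presence-only filter: value is ignored.
    let presence: HashMap<String, Option<String>> =
        [("status".to_string(), None)].into_iter().collect();
    assert!(meta_matches(&item, &presence));

    let wrong: HashMap<String, Option<String>> =
        [("status".to_string(), Some("closed".to_string()))].into_iter().collect();
    assert!(!meta_matches(&item, &wrong));
}
```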
@@ -979,8 +1073,10 @@ pub fn get_items_matching(
/// # use keep::db::*;
/// # use std::collections::HashMap;
/// # use std::path::PathBuf;
/// # use tempfile;
/// # fn main() -> anyhow::Result<()> {
/// let db_path = PathBuf::from("keep.db");
/// let _tmp = tempfile::tempdir()?;
/// let db_path = _tmp.path().join("keep.db");
/// let conn = db::open(db_path)?;
/// let tags = vec!["latest".to_string()];
/// let item = db::get_item_matching(&conn, &tags, &HashMap::new())?;
@@ -990,7 +1086,7 @@ pub fn get_items_matching(
pub fn get_item_matching(
conn: &Connection,
tags: &Vec<String>,
meta: &HashMap<String, String>,
meta: &HashMap<String, Option<String>>,
) -> Result<Option<Item>> {
debug!("DB: Get item matching tags: {tags:?}, meta: {meta:?}");
let items = get_items_matching(conn, tags, meta)?;
@@ -1021,10 +1117,12 @@ pub fn get_item_matching(
/// # use keep::db::*;
/// # use chrono::Utc;
/// # use std::path::PathBuf;
/// # use tempfile;
/// # fn main() -> anyhow::Result<()> {
/// let db_path = PathBuf::from("keep.db");
/// let _tmp = tempfile::tempdir()?;
/// let db_path = _tmp.path().join("keep.db");
/// let conn = db::open(db_path)?;
/// let item = Item { id: None, ts: Utc::now(), size: None, compression: "lz4".to_string() };
/// let item = Item { id: None, ts: Utc::now(), uncompressed_size: None, compressed_size: None, closed: false, compression: "lz4".to_string() };
/// let item_id = db::insert_item(&conn, item)?;
/// let item = db::get_item(&conn, item_id)?;
/// assert!(item.is_some());
@@ -1036,7 +1134,7 @@ pub fn get_item(conn: &Connection, item_id: i64) -> Result<Option<Item>> {
let mut statement = conn
.prepare_cached(
"
SELECT id, ts, size, compression
SELECT id, ts, uncompressed_size, compressed_size, closed, compression
FROM items
WHERE items.id = ?1",
)
@@ -1048,8 +1146,10 @@ pub fn get_item(conn: &Connection, item_id: i64) -> Result<Option<Item>> {
Some(row) => Ok(Some(Item {
id: row.get(0)?,
ts: row.get(1)?,
size: row.get(2)?,
compression: row.get(3)?,
uncompressed_size: row.get(2)?,
compressed_size: row.get(3)?,
closed: row.get(4)?,
compression: row.get(5)?,
})),
None => Ok(None),
}
@@ -1077,8 +1177,10 @@ pub fn get_item(conn: &Connection, item_id: i64) -> Result<Option<Item>> {
/// # use keep::db;
/// # use keep::db::*;
/// # use std::path::PathBuf;
/// # use tempfile;
/// # fn main() -> anyhow::Result<()> {
/// let db_path = PathBuf::from("keep.db");
/// let _tmp = tempfile::tempdir()?;
/// let db_path = _tmp.path().join("keep.db");
/// let conn = db::open(db_path)?;
/// let latest = db::get_item_last(&conn)?;
/// # Ok(())
@@ -1089,7 +1191,7 @@ pub fn get_item_last(conn: &Connection) -> Result<Option<Item>> {
let mut statement = conn
.prepare_cached(
"
SELECT id, ts, size, compression
SELECT id, ts, uncompressed_size, compressed_size, closed, compression
FROM items
ORDER BY id DESC
LIMIT 1",
@@ -1102,8 +1204,10 @@ pub fn get_item_last(conn: &Connection) -> Result<Option<Item>> {
Some(row) => Ok(Some(Item {
id: row.get(0)?,
ts: row.get(1)?,
size: row.get(2)?,
compression: row.get(3)?,
uncompressed_size: row.get(2)?,
compressed_size: row.get(3)?,
closed: row.get(4)?,
compression: row.get(5)?,
})),
None => Ok(None),
}
@@ -1133,10 +1237,12 @@ pub fn get_item_last(conn: &Connection) -> Result<Option<Item>> {
/// # use keep::db::*;
/// # use chrono::Utc;
/// # use std::path::PathBuf;
/// # use tempfile;
/// # fn main() -> anyhow::Result<()> {
/// let db_path = PathBuf::from("keep.db");
/// let _tmp = tempfile::tempdir()?;
/// let db_path = _tmp.path().join("keep.db");
/// let conn = db::open(db_path)?;
/// let item = Item { id: Some(1), ts: Utc::now(), size: None, compression: "lz4".to_string() };
/// let item = Item { id: Some(1), ts: Utc::now(), uncompressed_size: None, compressed_size: None, closed: false, compression: "lz4".to_string() };
/// let tags = db::get_item_tags(&conn, &item)?;
/// # Ok(())
/// # }
@@ -1184,10 +1290,12 @@ pub fn get_item_tags(conn: &Connection, item: &Item) -> Result<Vec<Tag>> {
/// # use keep::db::*;
/// # use chrono::Utc;
/// # use std::path::PathBuf;
/// # use tempfile;
/// # fn main() -> anyhow::Result<()> {
/// let db_path = PathBuf::from("keep.db");
/// let _tmp = tempfile::tempdir()?;
/// let db_path = _tmp.path().join("keep.db");
/// let conn = db::open(db_path)?;
/// let item = Item { id: Some(1), ts: Utc::now(), size: None, compression: "lz4".to_string() };
/// let item = Item { id: Some(1), ts: Utc::now(), uncompressed_size: None, compressed_size: None, closed: false, compression: "lz4".to_string() };
/// let meta = db::get_item_meta(&conn, &item)?;
/// # Ok(())
/// # }
@@ -1237,15 +1345,17 @@ pub fn get_item_meta(conn: &Connection, item: &Item) -> Result<Vec<Meta>> {
/// # use keep::db::*;
/// # use chrono::Utc;
/// # use std::path::PathBuf;
/// # use tempfile;
/// # fn main() -> anyhow::Result<()> {
/// let db_path = PathBuf::from("keep.db");
/// let _tmp = tempfile::tempdir()?;
/// let db_path = _tmp.path().join("keep.db");
/// let conn = db::open(db_path)?;
/// let item = Item { id: Some(1), ts: Utc::now(), size: None, compression: "lz4".to_string() };
/// let meta = db::get_item_meta_name(&conn, &item, "mime_type".to_string())?;
/// let item = Item { id: Some(1), ts: Utc::now(), uncompressed_size: None, compressed_size: None, closed: false, compression: "lz4".to_string() };
/// let meta = db::get_item_meta_name(&conn, &item, "mime_type")?;
/// # Ok(())
/// # }
/// ```
pub fn get_item_meta_name(conn: &Connection, item: &Item, name: String) -> Result<Option<Meta>> {
pub fn get_item_meta_name(conn: &Connection, item: &Item, name: &str) -> Result<Option<Meta>> {
debug!("DB: Getting item meta name: {item:?} {name:?}");
let mut statement = conn
.prepare_cached("SELECT id, name, value FROM metas WHERE id=?1 AND name=?2")
@@ -1287,15 +1397,17 @@ pub fn get_item_meta_name(conn: &Connection, item: &Item, name: String) -> Resul
/// # use keep::db::*;
/// # use chrono::Utc;
/// # use std::path::PathBuf;
/// # use tempfile;
/// # fn main() -> anyhow::Result<()> {
/// let db_path = PathBuf::from("keep.db");
/// let _tmp = tempfile::tempdir()?;
/// let db_path = _tmp.path().join("keep.db");
/// let conn = db::open(db_path)?;
/// let item = Item { id: Some(1), ts: Utc::now(), size: None, compression: "lz4".to_string() };
/// let value = db::get_item_meta_value(&conn, &item, "source".to_string())?;
/// let item = Item { id: Some(1), ts: Utc::now(), uncompressed_size: None, compressed_size: None, closed: false, compression: "lz4".to_string() };
/// let value = db::get_item_meta_value(&conn, &item, "source")?;
/// # Ok(())
/// # }
/// ```
pub fn get_item_meta_value(conn: &Connection, item: &Item, name: String) -> Result<Option<String>> {
pub fn get_item_meta_value(conn: &Connection, item: &Item, name: &str) -> Result<Option<String>> {
debug!("DB: Getting item meta value: {item:?} {name:?}");
let mut statement = conn
.prepare_cached("SELECT value FROM metas WHERE id=?1 AND name=?2")
@@ -1331,8 +1443,10 @@ pub fn get_item_meta_value(conn: &Connection, item: &Item, name: String) -> Resu
/// # use keep::db;
/// # use keep::db::*;
/// # use std::path::PathBuf;
/// # use tempfile;
/// # fn main() -> anyhow::Result<()> {
/// let db_path = PathBuf::from("keep.db");
/// let _tmp = tempfile::tempdir()?;
/// let db_path = _tmp.path().join("keep.db");
/// let conn = db::open(db_path)?;
/// let ids = vec![1, 2, 3];
/// let tags_map = db::get_tags_for_items(&conn, &ids)?;
@@ -1398,8 +1512,10 @@ pub fn get_tags_for_items(
/// # use keep::db;
/// # use keep::db::*;
/// # use std::path::PathBuf;
/// # use tempfile;
/// # fn main() -> anyhow::Result<()> {
/// let db_path = PathBuf::from("keep.db");
/// let _tmp = tempfile::tempdir()?;
/// let db_path = _tmp.path().join("keep.db");
/// let conn = db::open(db_path)?;
/// let ids = vec![1, 2, 3];
/// let meta_map = db::get_meta_for_items(&conn, &ids)?;

src/export_tar.rs (new file, 167 lines)

@@ -0,0 +1,167 @@
use anyhow::{Context, Result, anyhow};
use log::debug;
use std::collections::HashSet;
use std::fs;
use std::io::{Read, Seek, Write};
use std::path::Path;
use tar::{Builder, Header};
use crate::filter_plugin::FilterChain;
use crate::modes::common::ExportMeta;
use crate::services::item_service::ItemService;
use crate::services::types::ItemWithMeta;
/// Compute the intersection of all items' tag sets.
///
/// Returns sorted tags that are present on ALL items.
pub fn common_tags(items: &[ItemWithMeta]) -> Vec<String> {
if items.is_empty() {
return Vec::new();
}
let mut common: HashSet<String> = items[0].tag_names().into_iter().collect();
for item in items.iter().skip(1) {
let item_tags: HashSet<String> = item.tag_names().into_iter().collect();
common = common.intersection(&item_tags).cloned().collect();
}
let mut result: Vec<String> = common.into_iter().collect();
result.sort();
result
}
/// Resolve the export name from the CLI arg or compute default from common tags.
///
/// If `arg` is Some, uses that value directly.
/// Otherwise, computes `export_<common-tags>` or just `export` if no common tags.
pub fn export_name(arg: &Option<String>, items: &[ItemWithMeta]) -> String {
if let Some(name) = arg {
return name.clone();
}
let tags = common_tags(items);
if tags.is_empty() {
"export".to_string()
} else {
format!("export_{}", tags.join("_"))
}
}
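The naming logic above can be exercised on its own; this is a minimal sketch that mirrors `common_tags` and `export_name` but replaces `ItemWithMeta` with plain tag lists (the function names here are local to the sketch, not the crate's API):

```rust
use std::collections::HashSet;

// Intersection of all items' tag sets, sorted (mirrors common_tags).
fn common_tags(items: &[Vec<String>]) -> Vec<String> {
    let Some((first, rest)) = items.split_first() else {
        return Vec::new();
    };
    let mut common: HashSet<String> = first.iter().cloned().collect();
    for tags in rest {
        let set: HashSet<String> = tags.iter().cloned().collect();
        common = common.intersection(&set).cloned().collect();
    }
    let mut result: Vec<String> = common.into_iter().collect();
    result.sort();
    result
}

// Default export name: "export_<common-tags>" or "export" (mirrors export_name).
fn export_name(arg: Option<&str>, items: &[Vec<String>]) -> String {
    if let Some(name) = arg {
        return name.to_string();
    }
    let tags = common_tags(items);
    if tags.is_empty() {
        "export".to_string()
    } else {
        format!("export_{}", tags.join("_"))
    }
}

fn main() {
    let items = vec![
        vec!["logs".to_string(), "prod".to_string()],
        vec!["prod".to_string(), "logs".to_string(), "web".to_string()],
    ];
    // Common tags are sorted before joining, so the default name is deterministic.
    println!("{}", export_name(None, &items)); // export_logs_prod
    println!("{}", export_name(Some("custom"), &items)); // custom
}
```

Because the common set is sorted before joining, two exports of the same items always produce the same archive name regardless of tag insertion order.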
/// Write items to a tar archive, streaming data without loading files into memory.
///
/// The archive contains `<dir_name>/<id>.data.<compression>` and
/// `<dir_name>/<id>.meta.yml` for each item.
///
/// # Arguments
///
/// * `writer` - The output writer (e.g., a File).
/// * `dir_name` - Top-level directory name inside the tar.
/// * `items` - Items to export.
/// * `data_path` - Path to the data storage directory.
/// * `filter_chain` - Optional filter chain for transforming content on export.
/// * `item_service` - Item service for streaming content.
/// * `conn` - Database connection for filter chain operations.
pub fn write_export_tar<W: Write>(
writer: W,
dir_name: &str,
items: &[ItemWithMeta],
data_path: &Path,
filter_chain: Option<&FilterChain>,
item_service: &ItemService,
conn: &rusqlite::Connection,
) -> Result<()> {
let mut builder = Builder::new(writer);
for item_with_meta in items {
let item_id = item_with_meta.item.id.context("Item missing ID")?;
let compression = &item_with_meta.item.compression;
let item_tags = item_with_meta.tag_names();
let meta_map = item_with_meta.meta_as_map();
let data_path_entry = format!("{dir_name}/{item_id}.data.{compression}");
let meta_path_entry = format!("{dir_name}/{item_id}.meta.yml");
// Meta entry (small, in-memory is fine)
let export_meta = ExportMeta {
ts: item_with_meta.item.ts,
compression: compression.clone(),
uncompressed_size: item_with_meta.item.uncompressed_size,
tags: item_tags,
metadata: meta_map,
};
let meta_yaml = serde_yaml::to_string(&export_meta)?;
let meta_bytes = meta_yaml.into_bytes();
let meta_len = meta_bytes.len() as u64;
let mut meta_header = Header::new_gnu();
meta_header.set_size(meta_len);
meta_header.set_mode(0o644);
meta_header.set_path(&meta_path_entry)?;
meta_header.set_cksum();
builder
.append(&meta_header, meta_bytes.as_slice())
.with_context(|| format!("Cannot write meta entry for item {item_id}"))?;
debug!("EXPORT_TAR: Wrote meta entry {meta_path_entry}");
// Data entry
let mut item_file_path = data_path.to_path_buf();
item_file_path.push(item_id.to_string());
if let Some(chain) = filter_chain {
// Filtered export: spool through filter chain to a temp file,
// then stream the temp file into the tar with known size.
let (mut reader, _, _) = item_service.get_item_content_info_streaming_with_chain(
conn,
item_id,
Some(chain),
)?;
let mut tmp = tempfile::NamedTempFile::new()
.context("Cannot create temp file for filtered export")?;
let mut buf = [0u8; crate::common::PIPESIZE];
loop {
let n = reader.read(&mut buf)?;
if n == 0 {
break;
}
tmp.write_all(&buf[..n])?;
}
tmp.flush()?;
let total_size = tmp.as_file().metadata()?.len();
tmp.rewind()?;
let mut data_header = Header::new_gnu();
data_header.set_size(total_size);
data_header.set_mode(0o644);
data_header.set_path(&data_path_entry)?;
data_header.set_cksum();
builder
.append(&data_header, &mut tmp)
.with_context(|| format!("Cannot write data entry for item {item_id}"))?;
debug!("EXPORT_TAR: Wrote filtered data entry {data_path_entry} ({total_size} bytes)");
} else {
// Unfiltered export: stream raw compressed file
let file = fs::File::open(&item_file_path)
.with_context(|| format!("Cannot open data file: {}", item_file_path.display()))?;
let file_size = file.metadata()?.len();
let mut data_header = Header::new_gnu();
data_header.set_size(file_size);
data_header.set_mode(0o644);
data_header.set_path(&data_path_entry)?;
data_header.set_cksum();
builder
.append(&data_header, file)
.with_context(|| format!("Cannot write data entry for item {item_id}"))?;
debug!("EXPORT_TAR: Wrote data entry {data_path_entry} ({file_size} bytes)");
}
}
builder.finish().context("Cannot finalize tar archive")?;
debug!("EXPORT_TAR: Archive finalized");
Ok(())
}


@@ -1,47 +0,0 @@
# This Pest grammar defines the syntax for filter chains used in the Keep application.
# Filters can be chained with commas and may have named or unnamed options with JSON-like values.
WHITESPACE = _{ " " | "\t" | "\n" | "\r" }
# Top-level rule for parsing multiple filters separated by commas.
filters = { filter ~ ("," ~ filters)? }
# A single filter consisting of a name optionally followed by parenthesized options.
filter = { filter_name ~ ("(" ~ options ~ ")")? }
# The name of a filter, starting with an ASCII letter followed by alphanumeric characters or underscores.
filter_name = @{ ASCII_ALPHA ~ (ASCII_ALPHANUMERIC | "_")* }
# A list of comma-separated options within parentheses.
options = { option ~ ("," ~ options)? }
# A single option, optionally with a name followed by an equals sign and a value.
option = { (option_name ~ "=")? ~ option_value }
# The name of an option, starting with an ASCII letter followed by alphanumeric characters or underscores.
option_name = @{ ASCII_ALPHA ~ (ASCII_ALPHANUMERIC | "_")* }
# The value of an option, which can be a JSON number, string, or boolean.
option_value = {
JSON_NUMBER |
JSON_STRING |
JSON_BOOLEAN
}
# JSON number format supporting integers, decimals, and scientific notation.
JSON_NUMBER = @{
("-")? ~
("0" | ASCII_NONZERO_DIGIT ~ ASCII_DIGIT*) ~
("." ~ ASCII_DIGIT*)? ~
(("e" | "E") ~ ("+" | "-")? ~ ASCII_DIGIT+)?
}
# JSON string format with escaped characters.
JSON_STRING = ${
"\"" ~
(("\\" ~ ANY) | (!("\"" | "\\") ~ ANY))* ~
"\""
}
# JSON boolean values: true or false.
JSON_BOOLEAN = ${ "true" | "false" }


@@ -1,131 +0,0 @@
use pest::Parser;
use pest_derive::Parser;
use std::collections::HashMap;
#[derive(Parser)]
#[grammar = "filter.pest"]
pub struct FilterParser;
#[derive(Debug)]
pub struct Filter {
pub name: String,
pub options: HashMap<String, serde_json::Value>,
}
pub fn parse_filter_string(input: &str) -> Result<Vec<Filter>, Box<dyn std::error::Error>> {
let mut filters = Vec::new();
let pairs = FilterParser::parse(Rule::filters, input)?;
for pair in pairs {
if pair.as_rule() == Rule::filter {
let mut name = String::new();
let mut options = HashMap::new();
for inner_pair in pair.into_inner() {
match inner_pair.as_rule() {
Rule::filter_name => {
name = inner_pair.as_str().to_string();
}
Rule::options => {
for option_pair in inner_pair.into_inner() {
if option_pair.as_rule() == Rule::option {
let mut option_name = None;
let mut option_value = None;
for option_inner in option_pair.into_inner() {
match option_inner.as_rule() {
Rule::option_name => {
option_name = Some(option_inner.as_str().to_string());
}
Rule::option_value => {
option_value = Some(parse_option_value(option_inner.as_str())?);
}
_ => {}
}
}
if let Some(value) = option_value {
// If no name is provided, use the filter name as the key
let key = option_name.unwrap_or_else(|| name.clone());
options.insert(key, value);
}
}
}
}
_ => {}
}
}
filters.push(Filter { name, options });
}
}
Ok(filters)
}
fn parse_option_value(input: &str) -> Result<serde_json::Value, Box<dyn std::error::Error>> {
// Try to parse as number
if let Ok(num) = input.parse::<i64>() {
return Ok(serde_json::Value::Number(num.into()));
}
if let Ok(num) = input.parse::<f64>() {
if let Some(number) = serde_json::Number::from_f64(num) {
return Ok(serde_json::Value::Number(number));
}
}
// Try to parse as boolean
if let Ok(boolean) = input.parse::<bool>() {
return Ok(serde_json::Value::Bool(boolean));
}
// Treat as string (remove quotes if present)
let value = if input.starts_with('"') && input.ends_with('"') {
input[1..input.len()-1].to_string()
} else if input.starts_with('\'') && input.ends_with('\'') {
input[1..input.len()-1].to_string()
} else {
input.to_string()
};
Ok(serde_json::Value::String(value))
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_parse_simple_filter() {
let result = parse_filter_string("grep").unwrap();
assert_eq!(result.len(), 1);
assert_eq!(result[0].name, "grep");
assert!(result[0].options.is_empty());
}
#[test]
fn test_parse_filter_with_options() {
let result = parse_filter_string("head_lines(10)").unwrap();
assert_eq!(result.len(), 1);
assert_eq!(result[0].name, "head_lines");
assert_eq!(result[0].options["head_lines"], 10);
}
#[test]
fn test_parse_filter_with_named_options() {
let result = parse_filter_string("grep(pattern=\"error\")").unwrap();
assert_eq!(result.len(), 1);
assert_eq!(result[0].name, "grep");
assert_eq!(result[0].options["pattern"], "error");
}
#[test]
fn test_parse_multiple_filters() {
let result = parse_filter_string("head_lines(10), grep(pattern=\"error\")").unwrap();
assert_eq!(result.len(), 2);
assert_eq!(result[0].name, "head_lines");
assert_eq!(result[0].options["head_lines"], 10);
assert_eq!(result[1].name, "grep");
assert_eq!(result[1].options["pattern"], "error");
}
}


@@ -1,8 +1,8 @@
use super::{FilterPlugin, FilterOption};
use std::io::{Result, Read, Write};
use std::process::{Command, Stdio, Child};
use which::which;
use super::{FilterOption, FilterPlugin};
use log::*;
use std::io::{Read, Result, Write};
use std::process::{Child, Command, Stdio};
use which::which;
/// A filter that executes an external program and pipes input through it.
///
@@ -43,16 +43,13 @@ impl ExecFilter {
/// let filter = ExecFilter::new("grep", vec!["-i", "error"], false);
/// assert!(filter.supported);
/// ```
pub fn new(
program: &str,
args: Vec<&str>,
split_whitespace: bool,
) -> ExecFilter {
pub fn new(program: &str, args: Vec<&str>, split_whitespace: bool) -> ExecFilter {
let program_path = which(program);
let supported = program_path.is_ok();
ExecFilter {
program: program_path.map_or_else(|| program.to_string(), |p| p.to_string_lossy().to_string()),
program: program_path
.map_or_else(|| program.to_string(), |p| p.to_string_lossy().to_string()),
args: args.iter().map(|s| s.to_string()).collect(),
supported,
split_whitespace,
@@ -101,7 +98,10 @@ impl FilterPlugin for ExecFilter {
));
}
debug!("FILTER_EXEC: Executing command: {} {:?}", self.program, self.args);
debug!(
"FILTER_EXEC: Executing command: {} {:?}",
self.program, self.args
);
// Read all input first
let mut input_data = Vec::new();
@@ -142,8 +142,7 @@ impl FilterPlugin for ExecFilter {
std::io::copy(&mut stdout, writer)?;
// Wait for the child process to finish
let output = child.wait_with_output()
.map_err(|e| {
let output = child.wait_with_output().map_err(|e| {
std::io::Error::new(
std::io::ErrorKind::Other,
format!("Failed to wait on child process: {}", e),
@@ -165,13 +164,6 @@ impl FilterPlugin for ExecFilter {
Ok(())
}
/// Clones this filter into a new boxed instance.
///
/// Creates a new instance without active process handles.
///
/// # Returns
///
/// A new `Box<dyn FilterPlugin>` representing a clone of this filter.
fn clone_box(&self) -> Box<dyn FilterPlugin> {
Box::new(ExecFilter {
program: self.program.clone(),
@@ -205,6 +197,10 @@ impl FilterPlugin for ExecFilter {
},
]
}
fn description(&self) -> &str {
"Pipe input through an external command"
}
}
// Register the plugin at module initialization time
@@ -221,5 +217,6 @@ fn register_exec_filter() {
stdin_writer: None,
stdout_reader: None,
})
});
})
.expect("Failed to register exec filter");
}


@@ -87,21 +87,6 @@ impl FilterPlugin for GrepFilter {
Ok(())
}
/// Clones this filter into a new boxed instance.
///
/// Creates a new GrepFilter with the same regex pattern.
///
/// # Returns
///
/// A new `Box<dyn FilterPlugin>` representing a clone of this filter.
///
/// # Examples
///
/// ```
/// # use keep::filter_plugin::{FilterPlugin, GrepFilter};
/// let filter = GrepFilter::new("test".to_string()).unwrap();
/// let cloned = filter.clone_box();
/// ```
fn clone_box(&self) -> Box<dyn FilterPlugin> {
Box::new(Self {
regex: self.regex.clone(),
@@ -126,10 +111,10 @@ impl FilterPlugin for GrepFilter {
/// assert!(opts[0].required);
/// ```
fn options(&self) -> Vec<FilterOption> {
vec![FilterOption {
name: "pattern".to_string(),
default: None,
required: true,
}]
crate::filter_plugin::pattern_option()
}
fn description(&self) -> &str {
"Filter lines matching a regex pattern"
}
}


@@ -3,14 +3,7 @@ use crate::common::PIPESIZE;
use crate::services::filter_service::register_filter_plugin;
use std::io::{BufRead, Read, Result, Write};
/// A filter that reads the first N bytes from the input stream.
///
/// Limits the output to the initial bytes specified in the configuration.
/// Useful for previewing file contents without reading everything.
///
/// # Fields
///
/// * `remaining` - Number of bytes left to read before stopping.
#[derive(Clone)]
pub struct HeadBytesFilter {
remaining: usize,
}
@@ -94,21 +87,6 @@ impl FilterPlugin for HeadBytesFilter {
Ok(())
}
/// Clones this filter into a new boxed instance.
///
/// Creates an independent copy with the same configuration.
///
/// # Returns
///
/// A new `Box<dyn FilterPlugin>` clone.
///
/// # Examples
///
/// ```
/// # use keep::filter_plugin::{FilterPlugin, HeadBytesFilter};
/// let filter = HeadBytesFilter::new(100);
/// let cloned = filter.clone_box();
/// ```
fn clone_box(&self) -> Box<dyn FilterPlugin> {
Box::new(Self {
remaining: self.remaining,
@@ -134,15 +112,15 @@ impl FilterPlugin for HeadBytesFilter {
/// assert!(opts[0].required);
/// ```
fn options(&self) -> Vec<FilterOption> {
vec![FilterOption {
name: "count".to_string(),
default: None,
required: true,
}]
crate::filter_plugin::count_option()
}
fn description(&self) -> &str {
"Read the first N bytes"
}
}
/// A filter that reads the first N lines from the input stream.
#[derive(Clone)]
pub struct HeadLinesFilter {
remaining: usize,
}
@@ -224,21 +202,6 @@ impl FilterPlugin for HeadLinesFilter {
Ok(())
}
/// Clones this filter into a new boxed instance.
///
/// Creates an independent copy with the same configuration.
///
/// # Returns
///
/// A new `Box<dyn FilterPlugin>` clone.
///
/// # Examples
///
/// ```
/// # use keep::filter_plugin::{FilterPlugin, HeadLinesFilter};
/// let filter = HeadLinesFilter::new(5);
/// let cloned = filter.clone_box();
/// ```
fn clone_box(&self) -> Box<dyn FilterPlugin> {
Box::new(Self {
remaining: self.remaining,
@@ -246,35 +209,20 @@ impl FilterPlugin for HeadLinesFilter {
}
/// Returns the configuration options for this filter.
///
/// Defines the "count" parameter as required with no default.
///
/// # Returns
///
/// Vector of `FilterOption` describing parameters.
///
/// # Examples
///
/// ```
/// # use keep::filter_plugin::{FilterPlugin, HeadLinesFilter};
/// let filter = HeadLinesFilter::new(5);
/// let opts = filter.options();
/// assert_eq!(opts.len(), 1);
/// assert_eq!(opts[0].name, "count");
/// assert!(opts[0].required);
/// ```
fn options(&self) -> Vec<FilterOption> {
vec![FilterOption {
name: "count".to_string(),
default: None,
required: true,
}]
crate::filter_plugin::count_option()
}
fn description(&self) -> &str {
"Read the first N lines"
}
}
// Register the plugin at module initialization time
#[ctor::ctor]
fn register_head_filters() {
register_filter_plugin("head_bytes", || Box::new(HeadBytesFilter::new(0)));
register_filter_plugin("head_lines", || Box::new(HeadLinesFilter::new(0)));
register_filter_plugin("head_bytes", || Box::new(HeadBytesFilter::new(0)))
.expect("Failed to register head_bytes filter");
register_filter_plugin("head_lines", || Box::new(HeadLinesFilter::new(0)))
.expect("Failed to register head_lines filter");
}


@@ -2,6 +2,7 @@ use std::io::{Read, Result, Write};
use std::str::FromStr;
use strum::EnumString;
#[cfg(feature = "filter_grep")]
pub mod grep;
/// Filter plugin module for processing input streams.
///
@@ -16,7 +17,7 @@ pub mod grep;
/// ```
/// # use std::io::{Read, Write};
/// # use keep::filter_plugin::parse_filter_string;
/// let mut chain = parse_filter_string("head_lines(10)|grep(pattern=error)")?;
/// let mut chain = parse_filter_string("head_lines(10)|tail_lines(5)")?;
/// # let mut reader: &mut dyn Read = &mut std::io::empty();
/// # let mut writer: Vec<u8> = Vec::new();
/// # chain.filter(&mut reader, &mut writer)?;
@@ -26,10 +27,13 @@ pub mod head;
pub mod skip;
pub mod strip_ansi;
pub mod tail;
#[cfg(feature = "meta_tokens")]
pub mod tokens;
pub mod utils;
use std::collections::HashMap;
#[cfg(feature = "filter_grep")]
pub use grep::GrepFilter;
pub use head::{HeadBytesFilter, HeadLinesFilter};
pub use skip::{SkipBytesFilter, SkipLinesFilter};
@@ -106,18 +110,16 @@ pub trait FilterPlugin: Send {
/// struct MyFilter;
/// impl FilterPlugin for MyFilter {
/// fn filter(&mut self, reader: &mut dyn Read, writer: &mut dyn Write) -> Result<()> {
/// // Read and filter data
/// let mut buf = [0; 1024];
/// loop {
/// let n = reader.read(&mut buf)?;
/// if n == 0 { break; }
/// // Apply filter logic to buf[0..n]
/// writer.write_all(&buf[0..n])?;
/// }
/// Ok(())
/// }
/// fn clone_box(&self) -> Box<dyn FilterPlugin> {
/// Box::new(MyFilter)
/// Box::new(Self)
/// }
/// fn options(&self) -> Vec<FilterOption> {
/// vec![]
@@ -129,22 +131,6 @@ pub trait FilterPlugin: Send {
Ok(())
}
/// Clones this plugin into a new boxed instance.
///
/// This method is required for dynamic dispatch and cloning in filter chains.
///
/// # Returns
///
/// A new `Box<dyn FilterPlugin>` clone of the current plugin.
///
/// # Examples
///
/// ```
/// # use keep::filter_plugin::FilterPlugin;
/// fn example_clone_box(filter: &dyn FilterPlugin) -> Box<dyn FilterPlugin> {
/// filter.clone_box()
/// }
/// ```
fn clone_box(&self) -> Box<dyn FilterPlugin>;
/// Returns the configuration options for this plugin.
@@ -170,6 +156,31 @@ pub trait FilterPlugin: Send {
/// }
/// ```
fn options(&self) -> Vec<FilterOption>;
/// Returns a human-readable description of this filter.
///
/// # Returns
///
/// A description string (empty by default).
fn description(&self) -> &str {
""
}
}
pub fn count_option() -> Vec<FilterOption> {
vec![FilterOption {
name: "count".to_string(),
default: None,
required: true,
}]
}
pub fn pattern_option() -> Vec<FilterOption> {
vec![FilterOption {
name: "pattern".to_string(),
default: None,
required: true,
}]
}
/// Enum representing the different types of filters.
@@ -190,8 +201,57 @@ pub enum FilterType {
TailLines,
SkipBytes,
SkipLines,
#[cfg(feature = "filter_grep")]
Grep,
StripAnsi,
#[cfg(feature = "meta_tokens")]
HeadTokens,
#[cfg(feature = "meta_tokens")]
SkipTokens,
#[cfg(feature = "meta_tokens")]
TailTokens,
}
/// Maximum buffer size (256 MB) for filter chain intermediate results.
/// Prevents OOM on large files by rejecting inputs that exceed this limit.
const MAX_FILTER_BUFFER_SIZE: usize = 256 * 1024 * 1024;
struct BoundedVecWriter {
data: Vec<u8>,
limit: usize,
}
impl BoundedVecWriter {
fn new(limit: usize) -> Self {
Self {
data: Vec::new(),
limit,
}
}
fn into_inner(self) -> Vec<u8> {
self.data
}
}
impl std::io::Write for BoundedVecWriter {
fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
if self.data.len() + buf.len() > self.limit {
return Err(std::io::Error::new(
std::io::ErrorKind::InvalidData,
format!(
"Input size exceeds maximum filter buffer size ({} bytes)",
MAX_FILTER_BUFFER_SIZE
),
));
}
self.data.write_all(buf)?;
Ok(buf.len())
}
fn flush(&mut self) -> std::io::Result<()> {
Ok(())
}
}
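The size cap can be verified in isolation: this standalone sketch re-declares the same `BoundedVecWriter` shape (with a small limit so the rejection path is easy to hit) and shows that a write crossing the limit fails instead of growing the buffer:

```rust
use std::io::Write;

// A Vec-backed writer that errors once the total written size would
// exceed `limit` (mirrors the BoundedVecWriter in the diff above).
struct BoundedVecWriter {
    data: Vec<u8>,
    limit: usize,
}

impl BoundedVecWriter {
    fn new(limit: usize) -> Self {
        Self { data: Vec::new(), limit }
    }
}

impl Write for BoundedVecWriter {
    fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
        if self.data.len() + buf.len() > self.limit {
            return Err(std::io::Error::new(
                std::io::ErrorKind::InvalidData,
                "input exceeds maximum filter buffer size",
            ));
        }
        self.data.extend_from_slice(buf);
        Ok(buf.len())
    }
    fn flush(&mut self) -> std::io::Result<()> {
        Ok(())
    }
}

fn main() {
    let mut w = BoundedVecWriter::new(8);
    assert!(w.write_all(b"12345678").is_ok()); // exactly at the limit: accepted
    assert!(w.write_all(b"9").is_err()); // one byte over: rejected
}
```

Checking before copying (rather than after) means a single oversized `write` never allocates past the cap, which is the point of routing `std::io::copy` through this writer instead of a bare `Vec`.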
/// A chain of filter plugins applied sequentially.
@@ -241,16 +301,27 @@ impl Clone for FilterChain {
}
impl Clone for Box<dyn FilterPlugin> {
/// Clones the boxed filter plugin.
///
/// # Returns
///
/// A new boxed clone of the filter plugin.
fn clone(&self) -> Self {
self.clone_box()
}
}
#[macro_export]
macro_rules! filter_clone_box {
($self:expr) => {
Box::new($self.clone())
};
($self:expr, $field:ident) => {
Box::new(Self { $field: $self.$field.clone() })
};
($self:expr, $field:ident, $($rest:ident),+) => {
Box::new(Self {
$field: $self.$field.clone(),
$($rest: $self.$rest.clone()),+
})
};
}
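The `filter_clone_box!` macro exists to collapse the repetitive `clone_box` bodies seen throughout the plugin files. A self-contained sketch of how a plugin might use it — with a local stand-in trait and a condensed two-arm copy of the macro, since the real trait lives in the crate:

```rust
// Minimal stand-in for the plugin trait, local to this sketch.
trait FilterPlugin {
    fn clone_box(&self) -> Box<dyn FilterPlugin>;
    fn name(&self) -> &str;
}

// Condensed local copy of filter_clone_box!: build a boxed clone
// either from the whole value or from the listed fields.
macro_rules! filter_clone_box {
    ($self:expr) => { Box::new($self.clone()) };
    ($self:expr, $($field:ident),+) => {
        Box::new(Self { $($field: $self.$field.clone()),+ })
    };
}

#[derive(Clone)]
struct HeadLines {
    remaining: usize,
}

impl FilterPlugin for HeadLines {
    fn clone_box(&self) -> Box<dyn FilterPlugin> {
        // Expands to: Box::new(Self { remaining: self.remaining.clone() })
        filter_clone_box!(self, remaining)
    }
    fn name(&self) -> &str {
        "head_lines"
    }
}

fn main() {
    let f = HeadLines { remaining: 10 };
    let boxed: Box<dyn FilterPlugin> = f.clone_box();
    assert_eq!(boxed.name(), "head_lines");
}
```

The field-list arms matter for plugins that hold non-`Clone` runtime state (such as `ExecFilter`'s process handles), where the clone must rebuild only the configuration fields.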
impl Default for FilterChain {
fn default() -> Self {
Self::new()
@@ -288,9 +359,8 @@ impl FilterChain {
/// # Examples
///
/// ```
/// # use keep::filter_plugin::{FilterChain, GrepFilter};
/// # use keep::filter_plugin::FilterChain;
/// let mut chain = FilterChain::new();
/// chain.add_plugin(Box::new(GrepFilter::new("error".to_string()).unwrap()));
/// ```
pub fn add_plugin(&mut self, plugin: Box<dyn FilterPlugin>) {
self.plugins.push(plugin);
@@ -330,9 +400,10 @@ impl FilterChain {
}
// For multiple plugins, we need to chain them together
// We'll use a temporary buffer to hold intermediate results
let mut current_data = Vec::new();
std::io::copy(reader, &mut current_data)?;
// We'll use a bounded buffer to hold intermediate results
let mut bounded_writer = BoundedVecWriter::new(MAX_FILTER_BUFFER_SIZE);
std::io::copy(reader, &mut bounded_writer)?;
let mut current_data = bounded_writer.into_inner();
// Store the plugins length to avoid borrowing issues
let plugins_len = self.plugins.len();
@@ -348,6 +419,18 @@ impl FilterChain {
// For intermediate plugins, write to a buffer
let mut output_vec = Vec::new();
self.plugins[i].filter(&mut input, &mut output_vec)?;
if output_vec.len() > MAX_FILTER_BUFFER_SIZE {
return Err(std::io::Error::new(
std::io::ErrorKind::InvalidData,
format!(
"Filter output size ({} bytes) exceeds maximum filter buffer size ({} bytes).",
output_vec.len(),
MAX_FILTER_BUFFER_SIZE
),
));
}
current_data = output_vec;
}
}
@@ -454,6 +537,7 @@ fn create_filter_with_options(
// Get the default options for this filter type by creating a temporary instance
// To do this, we need to create a default instance of the appropriate filter
let option_defs = match filter_type {
#[cfg(feature = "filter_grep")]
FilterType::Grep => grep::GrepFilter::new("".to_string())?.options(),
FilterType::HeadBytes => head::HeadBytesFilter::new(0).options(),
FilterType::HeadLines => head::HeadLinesFilter::new(0).options(),
@@ -462,6 +546,12 @@ fn create_filter_with_options(
FilterType::SkipBytes => skip::SkipBytesFilter::new(0).options(),
FilterType::SkipLines => skip::SkipLinesFilter::new(0).options(),
FilterType::StripAnsi => strip_ansi::StripAnsiFilter::new().options(),
#[cfg(feature = "meta_tokens")]
FilterType::HeadTokens => tokens::HeadTokensFilter::new(0).options(),
#[cfg(feature = "meta_tokens")]
FilterType::SkipTokens => tokens::SkipTokensFilter::new(0).options(),
#[cfg(feature = "meta_tokens")]
FilterType::TailTokens => tokens::TailTokensFilter::new(0).options(),
};
let mut options = HashMap::new();
@@ -530,6 +620,7 @@ fn create_specific_filter(
options: &HashMap<String, serde_json::Value>,
) -> Result<Box<dyn FilterPlugin>> {
match filter_type {
#[cfg(feature = "filter_grep")]
FilterType::Grep => {
let pattern = options
.get("pattern")
@@ -630,7 +721,74 @@ fn create_specific_filter(
}
Ok(Box::new(strip_ansi::StripAnsiFilter::new()))
}
#[cfg(feature = "meta_tokens")]
FilterType::HeadTokens => {
let count = options
.get("count")
.and_then(|v| v.as_u64())
.map(|n| n as usize)
.ok_or_else(|| {
std::io::Error::new(
std::io::ErrorKind::InvalidInput,
"head_tokens filter requires 'count' parameter",
)
})?;
let (encoding, tokenizer) = parse_encoding_option(options);
let mut f = tokens::HeadTokensFilter::new(count);
f.tokenizer = tokenizer;
f.encoding = encoding;
Ok(Box::new(f))
}
#[cfg(feature = "meta_tokens")]
FilterType::SkipTokens => {
let count = options
.get("count")
.and_then(|v| v.as_u64())
.map(|n| n as usize)
.ok_or_else(|| {
std::io::Error::new(
std::io::ErrorKind::InvalidInput,
"skip_tokens filter requires 'count' parameter",
)
})?;
let (encoding, tokenizer) = parse_encoding_option(options);
let mut f = tokens::SkipTokensFilter::new(count);
f.tokenizer = tokenizer;
f.encoding = encoding;
Ok(Box::new(f))
}
#[cfg(feature = "meta_tokens")]
FilterType::TailTokens => {
let count = options
.get("count")
.and_then(|v| v.as_u64())
.map(|n| n as usize)
.ok_or_else(|| {
std::io::Error::new(
std::io::ErrorKind::InvalidInput,
"tail_tokens filter requires 'count' parameter",
)
})?;
let (encoding, tokenizer) = parse_encoding_option(options);
let mut f = tokens::TailTokensFilter::new(count);
f.tokenizer = tokenizer;
f.encoding = encoding;
Ok(Box::new(f))
}
}
}
#[cfg(feature = "meta_tokens")]
fn parse_encoding_option(
options: &std::collections::HashMap<String, serde_json::Value>,
) -> (crate::tokenizer::TokenEncoding, crate::tokenizer::Tokenizer) {
let encoding = options
.get("encoding")
.and_then(|v| v.as_str())
.and_then(|s| s.parse::<crate::tokenizer::TokenEncoding>().ok())
.unwrap_or_default();
let tokenizer = crate::tokenizer::get_tokenizer(encoding).clone();
(encoding, tokenizer)
}
/// Parses an option value from a string into a JSON value.


@@ -4,6 +4,7 @@ use crate::services::filter_service::register_filter_plugin;
use std::io::{BufRead, Read, Result, Write};
/// A filter that skips the first N bytes from the input stream.
#[derive(Clone)]
pub struct SkipBytesFilter {
remaining: usize,
}
@@ -49,11 +50,6 @@ impl FilterPlugin for SkipBytesFilter {
Ok(())
}
/// Clones this filter into a new boxed instance.
///
/// # Returns
///
/// A new `Box<dyn FilterPlugin>` representing a clone of this filter.
fn clone_box(&self) -> Box<dyn FilterPlugin> {
Box::new(Self {
remaining: self.remaining,
@@ -61,20 +57,17 @@ impl FilterPlugin for SkipBytesFilter {
}
/// Returns the configuration options for this filter.
///
/// # Returns
///
/// A vector of `FilterOption` describing the filter's configurable parameters.
fn options(&self) -> Vec<FilterOption> {
vec![FilterOption {
name: "count".to_string(),
default: None,
required: true,
}]
crate::filter_plugin::count_option()
}
fn description(&self) -> &str {
"Skip the first N bytes"
}
}
/// A filter that skips the first N lines from the input stream.
#[derive(Clone)]
pub struct SkipLinesFilter {
remaining: usize,
}
@@ -114,11 +107,6 @@ impl FilterPlugin for SkipLinesFilter {
Ok(())
}
/// Clones this filter into a new boxed instance.
///
/// # Returns
///
/// A new `Box<dyn FilterPlugin>` representing a clone of this filter.
fn clone_box(&self) -> Box<dyn FilterPlugin> {
Box::new(Self {
remaining: self.remaining,
@@ -126,22 +114,20 @@ impl FilterPlugin for SkipLinesFilter {
}
/// Returns the configuration options for this filter.
///
/// # Returns
///
/// A vector of `FilterOption` describing the filter's configurable parameters.
fn options(&self) -> Vec<FilterOption> {
vec![FilterOption {
name: "count".to_string(),
default: None,
required: true,
}]
crate::filter_plugin::count_option()
}
fn description(&self) -> &str {
"Skip the first N lines"
}
}
// Register the plugin at module initialization time
#[ctor::ctor]
fn register_skip_filters() {
register_filter_plugin("skip_bytes", || Box::new(SkipBytesFilter::new(0)));
register_filter_plugin("skip_lines", || Box::new(SkipLinesFilter::new(0)));
register_filter_plugin("skip_bytes", || Box::new(SkipBytesFilter::new(0)))
.expect("Failed to register skip_bytes filter");
register_filter_plugin("skip_lines", || Box::new(SkipLinesFilter::new(0)))
.expect("Failed to register skip_lines filter");
}
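The `.expect(...)` calls above imply `register_filter_plugin` now returns a `Result`, failing loudly on duplicate names instead of silently overwriting. A minimal stdlib sketch of such a fallible registry — the simplified `Factory` type and error type here are illustrative, not the crate's actual signatures:

```rust
use std::collections::HashMap;
use std::sync::{LazyLock, Mutex};

// Stand-in for the crate's boxed FilterPlugin factory (hypothetical type).
type Factory = fn() -> String;

static REGISTRY: LazyLock<Mutex<HashMap<&'static str, Factory>>> =
    LazyLock::new(|| Mutex::new(HashMap::new()));

// Mirrors the diff's shape: Err on a duplicate name, so the #[ctor]
// call sites can .expect(...) at startup.
fn register_filter_plugin(name: &'static str, factory: Factory) -> Result<(), String> {
    let mut map = REGISTRY.lock().unwrap();
    if map.contains_key(name) {
        return Err(format!("filter '{name}' already registered"));
    }
    map.insert(name, factory);
    Ok(())
}

fn main() {
    register_filter_plugin("skip_bytes", || "SkipBytesFilter".to_string())
        .expect("Failed to register skip_bytes filter");
    // A second registration under the same name is now a hard error.
    assert!(register_filter_plugin("skip_bytes", || "dup".to_string()).is_err());
}
```

The duplicate check is the design point: with `#[ctor]`-driven registration there is no single call site to audit, so the registry itself must reject collisions.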

View File

@@ -7,7 +7,7 @@ use strip_ansi_escapes::Writer;
/// # Fields
///
/// None, stateless filter.
#[derive(Default)]
#[derive(Default, Clone)]
pub struct StripAnsiFilter;
impl StripAnsiFilter {
@@ -39,21 +39,15 @@ impl FilterPlugin for StripAnsiFilter {
Ok(())
}
/// Clones this filter into a new boxed instance.
///
/// # Returns
///
/// A new `Box<dyn FilterPlugin>` representing a clone of this filter.
fn clone_box(&self) -> Box<dyn FilterPlugin> {
Box::new(Self)
}
/// Returns the configuration options for this filter (none required).
///
/// # Returns
///
/// An empty vector since this filter has no configurable options.
fn options(&self) -> Vec<FilterOption> {
Vec::new() // strip_ansi doesn't take any options
Vec::new()
}
fn description(&self) -> &str {
"Strip ANSI escape sequences"
}
}

View File

@@ -4,7 +4,7 @@ use crate::services::filter_service::register_filter_plugin;
use std::collections::VecDeque;
use std::io::{BufRead, Read, Result, Write};
/// A filter that reads the last N bytes from the input stream.
#[derive(Clone)]
pub struct TailBytesFilter {
buffer: VecDeque<u8>,
count: usize,
@@ -58,11 +58,6 @@ impl FilterPlugin for TailBytesFilter {
Ok(())
}
/// Clones this filter into a new boxed instance.
///
/// # Returns
///
/// A new `Box<dyn FilterPlugin>` representing a clone of this filter.
fn clone_box(&self) -> Box<dyn FilterPlugin> {
Box::new(Self {
buffer: self.buffer.clone(),
@@ -71,20 +66,17 @@ impl FilterPlugin for TailBytesFilter {
}
/// Returns the configuration options for this filter.
///
/// # Returns
///
/// A vector of `FilterOption` describing the filter's configurable parameters.
fn options(&self) -> Vec<FilterOption> {
vec![FilterOption {
name: "count".to_string(),
default: None,
required: true,
}]
crate::filter_plugin::count_option()
}
fn description(&self) -> &str {
"Read the last N bytes"
}
}
/// A filter that reads the last N lines from the input stream.
#[derive(Clone)]
pub struct TailLinesFilter {
lines: VecDeque<String>,
count: usize,
@@ -132,11 +124,6 @@ impl FilterPlugin for TailLinesFilter {
Ok(())
}
/// Clones this filter into a new boxed instance.
///
/// # Returns
///
/// A new `Box<dyn FilterPlugin>` representing a clone of this filter.
fn clone_box(&self) -> Box<dyn FilterPlugin> {
Box::new(Self {
lines: self.lines.clone(),
@@ -145,22 +132,20 @@ impl FilterPlugin for TailLinesFilter {
}
/// Returns the configuration options for this filter.
///
/// # Returns
///
/// A vector of `FilterOption` describing the filter's configurable parameters.
fn options(&self) -> Vec<FilterOption> {
vec![FilterOption {
name: "count".to_string(),
default: None,
required: true,
}]
crate::filter_plugin::count_option()
}
fn description(&self) -> &str {
"Read the last N lines"
}
}
// Register the plugin at module initialization time
#[ctor::ctor]
fn register_tail_filters() {
register_filter_plugin("tail_bytes", || Box::new(TailBytesFilter::new(0)));
register_filter_plugin("tail_lines", || Box::new(TailLinesFilter::new(0)));
register_filter_plugin("tail_bytes", || Box::new(TailBytesFilter::new(0)))
.expect("Failed to register tail_bytes filter");
register_filter_plugin("tail_lines", || Box::new(TailLinesFilter::new(0)))
.expect("Failed to register tail_lines filter");
}
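The tail filters above buffer into a bounded `VecDeque`, evicting the oldest entry once more than N items are held. A self-contained sketch of that pattern for lines (the function name and signature are hypothetical, not the plugin's API):

```rust
use std::collections::VecDeque;
use std::io::{BufRead, BufReader, Cursor, Write};

// Keep only the last `count` lines in a bounded VecDeque, then flush.
fn tail_lines(reader: impl BufRead, writer: &mut impl Write, count: usize) -> std::io::Result<()> {
    let mut lines: VecDeque<String> = VecDeque::with_capacity(count + 1);
    for line in reader.lines() {
        lines.push_back(line?);
        if lines.len() > count {
            lines.pop_front(); // drop the oldest once over capacity
        }
    }
    for line in lines {
        writeln!(writer, "{line}")?;
    }
    Ok(())
}

fn main() {
    let mut out = Vec::new();
    tail_lines(BufReader::new(Cursor::new("a\nb\nc\nd\n")), &mut out, 2).unwrap();
    assert_eq!(String::from_utf8(out).unwrap(), "c\nd\n");
}
```

Memory stays proportional to N rather than to the stream length, which is why the filter can run over arbitrarily large inputs.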

src/filter_plugin/tokens.rs (new file, 500 lines)
View File

@@ -0,0 +1,500 @@
use super::{FilterOption, FilterPlugin};
use crate::common::PIPESIZE;
use crate::services::filter_service::register_filter_plugin;
use crate::tokenizer::{TokenEncoding, Tokenizer, get_tokenizer};
use std::io::{Read, Result, Write};
// ---------------------------------------------------------------------------
// head_tokens
// ---------------------------------------------------------------------------
#[derive(Clone)]
pub struct HeadTokensFilter {
pub remaining: usize,
pub tokenizer: Tokenizer,
pub encoding: TokenEncoding,
}
impl HeadTokensFilter {
pub fn new(count: usize) -> Self {
let encoding = TokenEncoding::default();
Self {
remaining: count,
tokenizer: get_tokenizer(encoding).clone(),
encoding,
}
}
}
impl FilterPlugin for HeadTokensFilter {
fn filter(&mut self, reader: &mut dyn Read, writer: &mut dyn Write) -> Result<()> {
if self.remaining == 0 {
return Ok(());
}
let tokenizer = &self.tokenizer;
let mut buffer = vec![0u8; PIPESIZE];
let mut total_tokens = 0usize;
loop {
let n = reader.read(&mut buffer)?;
if n == 0 {
break;
}
let chunk = &buffer[..n];
let text = String::from_utf8_lossy(chunk);
let chunk_tokens = tokenizer.count(&text);
if total_tokens + chunk_tokens <= self.remaining {
// Entire chunk fits — write it directly
writer.write_all(chunk)?;
total_tokens += chunk_tokens;
if total_tokens >= self.remaining {
break;
}
} else {
// Cutoff is within this chunk — use iterator to find exact
// boundary without allocating all token strings
let tokens_to_write = self.remaining - total_tokens;
let mut byte_pos = 0usize;
for token_str in tokenizer.split_by_token_iter(&text).take(tokens_to_write) {
byte_pos += token_str
.map_err(|e| std::io::Error::other(e.to_string()))?
.len();
}
let write_len = map_lossy_pos_to_bytes(chunk, &text, byte_pos);
writer.write_all(&chunk[..write_len])?;
break;
}
}
Ok(())
}
fn clone_box(&self) -> Box<dyn FilterPlugin> {
Box::new(Self {
remaining: self.remaining,
tokenizer: self.tokenizer.clone(),
encoding: self.encoding,
})
}
fn options(&self) -> Vec<FilterOption> {
vec![
FilterOption {
name: "count".to_string(),
default: None,
required: true,
},
FilterOption {
name: "encoding".to_string(),
default: Some(serde_json::Value::String("cl100k_base".to_string())),
required: false,
},
]
}
fn description(&self) -> &str {
"Read the first N LLM tokens"
}
}
// ---------------------------------------------------------------------------
// skip_tokens
// ---------------------------------------------------------------------------
#[derive(Clone)]
pub struct SkipTokensFilter {
pub remaining: usize,
pub tokenizer: Tokenizer,
pub encoding: TokenEncoding,
}
impl SkipTokensFilter {
pub fn new(count: usize) -> Self {
let encoding = TokenEncoding::default();
Self {
remaining: count,
tokenizer: get_tokenizer(encoding).clone(),
encoding,
}
}
}
impl FilterPlugin for SkipTokensFilter {
fn filter(&mut self, reader: &mut dyn Read, writer: &mut dyn Write) -> Result<()> {
if self.remaining == 0 {
return std::io::copy(reader, writer).map(|_| ());
}
let tokenizer = &self.tokenizer;
let mut buffer = vec![0u8; PIPESIZE];
let mut total_tokens = 0usize;
let mut done_skipping = false;
loop {
let n = reader.read(&mut buffer)?;
if n == 0 {
break;
}
if done_skipping {
writer.write_all(&buffer[..n])?;
continue;
}
let chunk = &buffer[..n];
let text = String::from_utf8_lossy(chunk);
let chunk_tokens = tokenizer.count(&text);
if total_tokens + chunk_tokens <= self.remaining {
// Entire chunk is skipped
total_tokens += chunk_tokens;
if total_tokens >= self.remaining {
done_skipping = true;
}
} else {
// Cutoff is within this chunk — use iterator to skip past
// the boundary without allocating all token strings
let tokens_to_skip = self.remaining - total_tokens;
let mut byte_pos = 0usize;
for token_str in tokenizer.split_by_token_iter(&text).take(tokens_to_skip) {
byte_pos += token_str
.map_err(|e| std::io::Error::other(e.to_string()))?
.len();
}
let skip_len = map_lossy_pos_to_bytes(chunk, &text, byte_pos);
if skip_len < n {
writer.write_all(&chunk[skip_len..])?;
}
done_skipping = true;
}
}
Ok(())
}
fn clone_box(&self) -> Box<dyn FilterPlugin> {
Box::new(Self {
remaining: self.remaining,
tokenizer: self.tokenizer.clone(),
encoding: self.encoding,
})
}
fn options(&self) -> Vec<FilterOption> {
vec![
FilterOption {
name: "count".to_string(),
default: None,
required: true,
},
FilterOption {
name: "encoding".to_string(),
default: Some(serde_json::Value::String("cl100k_base".to_string())),
required: false,
},
]
}
fn description(&self) -> &str {
"Skip the first N LLM tokens"
}
}
// ---------------------------------------------------------------------------
// tail_tokens
// ---------------------------------------------------------------------------
/// A filter that outputs only the last N tokens of the input stream.
///
#[derive(Clone)]
pub struct TailTokensFilter {
pub count: usize,
/// Buffer holding all bytes from the stream.
buffer: Vec<u8>,
pub tokenizer: Tokenizer,
pub encoding: TokenEncoding,
}
impl TailTokensFilter {
pub fn new(count: usize) -> Self {
let encoding = TokenEncoding::default();
Self {
count,
buffer: Vec::with_capacity(PIPESIZE),
tokenizer: get_tokenizer(encoding).clone(),
encoding,
}
}
}
impl FilterPlugin for TailTokensFilter {
fn filter(&mut self, reader: &mut dyn Read, writer: &mut dyn Write) -> Result<()> {
if self.count == 0 {
return Ok(());
}
let tokenizer = &self.tokenizer;
// Buffer all bytes from the stream
std::io::copy(reader, &mut self.buffer)?;
if self.buffer.is_empty() {
return Ok(());
}
let text = String::from_utf8_lossy(&self.buffer);
let token_strs = tokenizer
.split_by_token(&text)
.map_err(|e| std::io::Error::other(e.to_string()))?;
if token_strs.len() <= self.count {
// All tokens fit — write everything
writer.write_all(&self.buffer)?;
} else {
// Write only the last N tokens
let skip = token_strs.len() - self.count;
let mut byte_offset = 0usize;
for token_str in token_strs.iter().take(skip) {
byte_offset += token_str.len();
}
let write_len = map_lossy_pos_to_bytes(&self.buffer, &text, byte_offset);
if write_len < self.buffer.len() {
writer.write_all(&self.buffer[write_len..])?;
}
}
Ok(())
}
fn clone_box(&self) -> Box<dyn FilterPlugin> {
Box::new(Self {
count: self.count,
buffer: Vec::new(),
tokenizer: self.tokenizer.clone(),
encoding: self.encoding,
})
}
fn options(&self) -> Vec<FilterOption> {
vec![
FilterOption {
name: "count".to_string(),
default: None,
required: true,
},
FilterOption {
name: "encoding".to_string(),
default: Some(serde_json::Value::String("cl100k_base".to_string())),
required: false,
},
]
}
fn description(&self) -> &str {
"Read the last N LLM tokens"
}
}
// ---------------------------------------------------------------------------
// Helpers
// ---------------------------------------------------------------------------
/// Map a byte position in a lossy string back to a position in the original byte slice.
///
/// `String::from_utf8_lossy` replaces invalid UTF-8 bytes with the Unicode
/// replacement character (U+FFFD), which encodes to 3 bytes in UTF-8. This
/// function walks both the original bytes and the lossy string in lockstep,
/// finding the original byte position that corresponds to `lossy_pos`.
fn map_lossy_pos_to_bytes(original: &[u8], lossy: &str, lossy_pos: usize) -> usize {
if lossy_pos == 0 {
return 0;
}
let replacement = '\u{FFFD}';
let replacement_len = replacement.len_utf8(); // 3 bytes
let mut orig_idx = 0usize;
let mut lossy_idx = 0usize;
let lossy_bytes = lossy.as_bytes();
while lossy_idx < lossy_pos && orig_idx < original.len() {
// Try to decode the next character from the original bytes
match std::str::from_utf8(&original[orig_idx..]) {
Ok("") => break,
Ok(s) => {
let ch = s.chars().next().unwrap();
let ch_len = ch.len_utf8();
// Check if this is a replacement character in the lossy string
if ch == replacement
&& lossy_idx + replacement_len <= lossy_pos
&& lossy_bytes[lossy_idx..].starts_with(
&replacement.encode_utf8(&mut [0; 4]).as_bytes()[..replacement_len],
)
{
// Could be a real U+FFFD or a replacement of invalid bytes.
// If the original byte at this position is valid UTF-8 start, it's real.
if original[orig_idx] < 0x80 || original[orig_idx] >= 0xC0 {
// Real character
orig_idx += ch_len;
lossy_idx += ch_len;
} else {
// Invalid byte that was replaced — advance original by 1
orig_idx += 1;
lossy_idx += replacement_len;
}
} else {
orig_idx += ch_len;
lossy_idx += ch_len;
}
}
Err(e) => {
let valid = e.valid_up_to();
if valid > 0 {
// Some valid bytes, then invalid
orig_idx += valid;
lossy_idx += valid;
} else {
// Invalid byte — in lossy it becomes 3-byte replacement char
orig_idx += 1;
lossy_idx += replacement_len;
}
}
}
}
orig_idx.min(original.len())
}
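The lockstep walk above hinges on a single invalid byte becoming a three-byte U+FFFD in the lossy string, so byte positions after it diverge between the two views. A quick stdlib check of that premise:

```rust
fn main() {
    // One invalid byte (0x80) is replaced by one 3-byte U+FFFD,
    // shifting every subsequent byte position by +2 in the lossy view.
    let original: &[u8] = b"Hello\x80world";
    let lossy = String::from_utf8_lossy(original);
    assert_eq!(original.len(), 11); // 10 ASCII bytes + 1 invalid byte
    assert_eq!(lossy.len(), 13);    // 10 ASCII bytes + 3-byte replacement char
    assert_eq!('\u{FFFD}'.len_utf8(), 3);
}
```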
// ---------------------------------------------------------------------------
// Registration
// ---------------------------------------------------------------------------
#[ctor::ctor]
fn register_token_filters() {
register_filter_plugin("head_tokens", || Box::new(HeadTokensFilter::new(0)))
.expect("Failed to register head_tokens filter");
register_filter_plugin("skip_tokens", || Box::new(SkipTokensFilter::new(0)))
.expect("Failed to register skip_tokens filter");
register_filter_plugin("tail_tokens", || Box::new(TailTokensFilter::new(0)))
.expect("Failed to register tail_tokens filter");
}
#[cfg(test)]
mod tests {
use super::*;
use std::io::Cursor;
fn make_tokenizer() -> Tokenizer {
get_tokenizer(TokenEncoding::Cl100kBase).clone()
}
#[test]
fn test_head_tokens_basic() {
let mut filter = HeadTokensFilter::new(3);
filter.tokenizer = make_tokenizer();
let input = b"The quick brown fox";
let mut output = Vec::new();
filter.filter(&mut Cursor::new(input), &mut output).unwrap();
let result = String::from_utf8_lossy(&output);
// "The quick brown" is typically 3 tokens
assert!(!result.is_empty());
assert!(result.len() <= input.len());
}
#[test]
fn test_head_tokens_zero() {
let mut filter = HeadTokensFilter::new(0);
filter.tokenizer = make_tokenizer();
let input = b"The quick brown fox";
let mut output = Vec::new();
filter.filter(&mut Cursor::new(input), &mut output).unwrap();
assert!(output.is_empty());
}
#[test]
fn test_head_tokens_more_than_available() {
let mut filter = HeadTokensFilter::new(1000);
filter.tokenizer = make_tokenizer();
let input = b"Hello world";
let mut output = Vec::new();
filter.filter(&mut Cursor::new(input), &mut output).unwrap();
assert_eq!(output, input);
}
#[test]
fn test_skip_tokens_basic() {
let mut filter = SkipTokensFilter::new(2);
filter.tokenizer = make_tokenizer();
let input = b"The quick brown fox";
let mut output = Vec::new();
filter.filter(&mut Cursor::new(input), &mut output).unwrap();
let result = String::from_utf8_lossy(&output);
// Should have skipped some tokens
assert!(result.len() < input.len());
}
#[test]
fn test_skip_tokens_zero() {
let mut filter = SkipTokensFilter::new(0);
filter.tokenizer = make_tokenizer();
let input = b"Hello world";
let mut output = Vec::new();
filter.filter(&mut Cursor::new(input), &mut output).unwrap();
assert_eq!(output, input);
}
#[test]
fn test_tail_tokens_basic() {
let mut filter = TailTokensFilter::new(2);
filter.tokenizer = make_tokenizer();
let input = b"The quick brown fox jumps over the lazy dog";
let mut output = Vec::new();
filter.filter(&mut Cursor::new(input), &mut output).unwrap();
let result = String::from_utf8_lossy(&output);
// Should only have last 2 tokens
assert!(!result.is_empty());
assert!(result.len() < input.len());
}
#[test]
fn test_tail_tokens_zero() {
let mut filter = TailTokensFilter::new(0);
filter.tokenizer = make_tokenizer();
let input = b"Hello world";
let mut output = Vec::new();
filter.filter(&mut Cursor::new(input), &mut output).unwrap();
assert!(output.is_empty());
}
#[test]
fn test_map_lossy_pos_ascii() {
let original = b"Hello world";
let lossy = String::from_utf8_lossy(original);
assert_eq!(map_lossy_pos_to_bytes(original, &lossy, 5), 5);
assert_eq!(map_lossy_pos_to_bytes(original, &lossy, 0), 0);
assert_eq!(map_lossy_pos_to_bytes(original, &lossy, 11), 11);
}
#[test]
fn test_map_lossy_pos_with_invalid_utf8() {
let original = b"Hello\x80world";
let lossy = String::from_utf8_lossy(original);
// lossy = "Hello\u{FFFD}world" (13 bytes)
// Position 5 in lossy = after "Hello" = position 5 in original
assert_eq!(map_lossy_pos_to_bytes(original, &lossy, 5), 5);
// Position 8 in lossy = "Hello\u{FFFD}" = position 6 in original
// (the invalid byte \x80 at position 5 was replaced)
assert_eq!(map_lossy_pos_to_bytes(original, &lossy, 8), 6);
}
}

src/import_tar.rs (new file, 225 lines)
View File

@@ -0,0 +1,225 @@
use anyhow::{Context, Result, anyhow};
use log::debug;
use std::collections::HashMap;
use std::fs;
use std::io::{Read, Write};
use std::path::Path;
use std::str::FromStr;
use tempfile::TempDir;
use tar::Archive;
use crate::common::PIPESIZE;
use crate::compression_engine::CompressionType;
use crate::db;
use crate::modes::common::ImportMeta;
/// Represents a parsed tar entry from an export archive.
struct TarEntry {
/// Path to the extracted data file in the temp directory.
data_path: Option<std::path::PathBuf>,
/// Path to the extracted meta file in the temp directory.
meta_path: Option<std::path::PathBuf>,
}
/// Import all items from a `.keep.tar` archive.
///
/// Items are imported in ascending order of their original IDs,
/// ensuring chronological ordering is preserved. Each imported item
/// receives a new auto-incremented ID from the target database.
///
/// # Arguments
///
/// * `tar_path` - Path to the `.keep.tar` file.
/// * `conn` - Mutable database connection.
/// * `data_path` - Path to the data storage directory.
///
/// # Returns
///
/// A list of newly assigned item IDs.
pub fn import_from_tar(
tar_path: &Path,
conn: &mut rusqlite::Connection,
data_path: &Path,
) -> Result<Vec<i64>> {
let file = fs::File::open(tar_path)
.with_context(|| format!("Cannot open tar file: {}", tar_path.display()))?;
let mut archive = Archive::new(file);
let tmp_dir = TempDir::new().context("Cannot create temporary directory for import")?;
let tmp_path = tmp_dir.path();
// Extract entries to temp dir
let mut entries_map: HashMap<i64, TarEntry> = HashMap::new();
for entry_result in archive.entries().context("Cannot read tar entries")? {
let mut entry = entry_result.context("Cannot read tar entry")?;
let entry_path = entry.path().context("Cannot get entry path")?.to_path_buf();
let path_str = entry_path.to_string_lossy().replace('\\', "/");
// Reject path traversal attempts
if path_str.starts_with('/') || path_str.starts_with("..") || path_str.contains("/../") {
return Err(anyhow!("Rejected path traversal entry: {path_str}"));
}
// Skip directory entries
if entry.header().entry_type().is_dir() {
debug!("IMPORT_TAR: Skipping directory entry: {path_str}");
continue;
}
// Parse: <dir>/<id>.data.<compression> or <dir>/<id>.meta.yml
let filename = entry_path
.file_name()
.ok_or_else(|| anyhow!("Invalid entry path: {path_str}"))?
.to_string_lossy();
let (orig_id, is_data) = if let Some(id_str) = filename.strip_suffix(".meta.yml") {
let id: i64 = id_str
.parse()
.with_context(|| format!("Invalid ID in entry: {path_str}"))?;
(id, false)
} else if let Some(dot_pos) = filename.find(".data.") {
let id_str = &filename[..dot_pos];
let id: i64 = id_str
.parse()
.with_context(|| format!("Invalid ID in entry: {path_str}"))?;
(id, true)
} else {
debug!("IMPORT_TAR: Skipping unrecognized entry: {path_str}");
continue;
};
let entry_ref = entries_map.entry(orig_id).or_insert_with(|| TarEntry {
data_path: None,
meta_path: None,
});
if is_data {
let dest = tmp_path.join(format!("{orig_id}.data"));
let mut dest_file = fs::File::create(&dest).context("Cannot create temp data file")?;
let mut buf = [0u8; PIPESIZE];
loop {
let n = entry.read(&mut buf)?;
if n == 0 {
break;
}
dest_file.write_all(&buf[..n])?;
}
entry_ref.data_path = Some(dest);
debug!("IMPORT_TAR: Extracted data for original ID {orig_id}");
} else {
let dest = tmp_path.join(format!("{orig_id}.meta.yml"));
let mut dest_file = fs::File::create(&dest).context("Cannot create temp meta file")?;
let mut buf = [0u8; PIPESIZE];
loop {
let n = entry.read(&mut buf)?;
if n == 0 {
break;
}
dest_file.write_all(&buf[..n])?;
}
entry_ref.meta_path = Some(dest);
debug!("IMPORT_TAR: Extracted meta for original ID {orig_id}");
}
}
if entries_map.is_empty() {
return Err(anyhow!("No items found in archive"));
}
// Sort by original ID ascending
let mut sorted_ids: Vec<i64> = entries_map.keys().copied().collect();
sorted_ids.sort_unstable();
let mut imported_ids = Vec::new();
for orig_id in sorted_ids {
let entry = entries_map.get(&orig_id).expect("ID should exist in map");
let meta_path = entry
.meta_path
.as_ref()
.ok_or_else(|| anyhow!("Item {orig_id} missing .meta.yml entry"))?;
let data_path_entry = entry
.data_path
.as_ref()
.ok_or_else(|| anyhow!("Item {orig_id} missing .data entry"))?;
// Parse metadata
let meta_yaml = fs::read_to_string(meta_path)
.with_context(|| format!("Cannot read meta file for item {orig_id}"))?;
let import_meta: ImportMeta = serde_yaml::from_str(&meta_yaml)
.with_context(|| format!("Cannot parse meta file for item {orig_id}"))?;
// Validate compression type
CompressionType::from_str(&import_meta.compression).map_err(|_| {
anyhow!(
"Invalid compression type '{}' for item {}",
import_meta.compression,
orig_id
)
})?;
// Create item with original timestamp
let item = db::insert_item_with_ts(conn, import_meta.ts, &import_meta.compression)?;
let new_id = item.id.context("New item missing ID")?;
// Set tags
let tags = if !import_meta.tags.is_empty() {
db::set_item_tags(conn, item.clone(), &import_meta.tags)?;
import_meta.tags.clone()
} else {
Vec::new()
};
// Stream data to storage
let mut storage_path = data_path.to_path_buf();
storage_path.push(new_id.to_string());
let mut reader = fs::File::open(data_path_entry)
.with_context(|| format!("Cannot read data file for item {orig_id}"))?;
let mut writer = fs::File::create(&storage_path)
.with_context(|| format!("Cannot create storage file for item {new_id}"))?;
let mut buf = [0u8; PIPESIZE];
let mut total = 0i64;
loop {
let n = reader.read(&mut buf)?;
if n == 0 {
break;
}
writer.write_all(&buf[..n])?;
total += n as i64;
}
if total == 0 {
return Err(anyhow!("Item {orig_id} has empty data file"));
}
// Set metadata
for (key, value) in &import_meta.metadata {
db::query_upsert_meta(
conn,
db::Meta {
id: new_id,
name: key.clone(),
value: value.clone(),
},
)?;
}
// Update item sizes
let size_to_record = import_meta.uncompressed_size.unwrap_or(total);
let mut updated_item = item;
updated_item.uncompressed_size = Some(size_to_record);
updated_item.compressed_size = Some(std::fs::metadata(&storage_path)?.len() as i64);
updated_item.closed = true;
db::update_item(conn, updated_item)?;
log::info!("KEEP: Imported item {new_id} (was {orig_id}) tags: {tags:?}");
imported_ids.push(new_id);
}
Ok(imported_ids)
}
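The path-traversal guard near the top of `import_from_tar` can be exercised in isolation. A small sketch of the same string checks — the helper name is illustrative, and this only covers the textual checks shown in the diff, not tar-specific path canonicalization:

```rust
// Mirrors the archive entry check: reject absolute paths and any ".."
// component that could escape the extraction directory.
fn is_traversal(path: &str) -> bool {
    let normalized = path.replace('\\', "/"); // treat Windows separators uniformly
    normalized.starts_with('/')
        || normalized.starts_with("..")
        || normalized.contains("/../")
}

fn main() {
    assert!(is_traversal("/etc/passwd"));
    assert!(is_traversal("../secrets"));
    assert!(is_traversal("export/../../x"));
    assert!(!is_traversal("export/1.meta.yml"));
}
```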

View File

@@ -35,7 +35,9 @@ pub mod common;
pub mod compression_engine;
pub mod config;
pub mod db;
pub mod export_tar;
pub mod filter_plugin;
pub mod import_tar;
pub mod meta_plugin;
pub mod modes;
pub mod services;
@@ -43,27 +45,55 @@ pub mod services;
#[cfg(feature = "client")]
pub mod client;
#[cfg(feature = "meta_tokens")]
pub mod tokenizer;
// Re-export Args struct for library usage
pub use args::Args;
// Re-export PIPESIZE constant
pub use common::PIPESIZE;
pub use services::CoreError;
// Import all filter plugins to ensure they register themselves
#[allow(unused_imports)]
use filter_plugin::{grep, head, skip, strip_ansi, tail};
#[cfg(feature = "filter_grep")]
use filter_plugin::grep;
#[allow(unused_imports)]
use filter_plugin::{head, skip, strip_ansi, tail};
#[cfg(feature = "meta_tokens")]
#[allow(unused_imports)]
use filter_plugin::tokens as token_filters;
use crate::meta_plugin::{
cwd, digest, env, exec, hostname, keep_pid, read_rate, read_time, shell, shell_pid, user,
};
#[cfg(feature = "magic")]
#[cfg(feature = "meta_magic")]
#[allow(unused_imports)]
use crate::meta_plugin::magic_file;
#[cfg(feature = "meta_tokens")]
#[allow(unused_imports)]
use crate::meta_plugin::tokens;
#[cfg(feature = "meta_infer")]
#[allow(unused_imports)]
use crate::meta_plugin::infer_plugin;
#[cfg(feature = "meta_tree_magic_mini")]
#[allow(unused_imports)]
use crate::meta_plugin::tree_magic_mini;
/// Initializes plugins at library load time.
///
/// Ensures all filter and meta plugins are registered via their ctors.
/// Call this early in application startup if needed (though ctors handle most cases).
/// Plugin registration happens automatically via `#[ctor]` constructors
/// when each plugin module is loaded. The explicit module imports in
/// `lib.rs` guarantee this happens at library initialization time.
///
/// This function exists as a public API entry point for callers that
/// want to explicitly ensure plugins are ready. It intentionally does
/// no additional work.
///
/// # Examples
///
@@ -71,8 +101,8 @@ use crate::meta_plugin::magic_file;
/// keep::init_plugins();
/// ```
pub fn init_plugins() {
// This will be expanded in Step 3 implementation
// For now, the ctors handle registration
// Plugins self-register via #[ctor] on module load.
// The use-statements in lib.rs guarantee module inclusion.
}
#[cfg(test)]

View File

@@ -1,3 +1,6 @@
use std::io::Write;
use std::time::Instant;
use anyhow::{Context, Error, Result, anyhow};
use clap::error::ErrorKind;
use clap::*;
@@ -25,13 +28,42 @@ fn main() -> Result<(), Error> {
cmd.error(ErrorKind::ValueValidation, e).exit();
}
stderrlog::new()
.module(module_path!())
.quiet(args.options.quiet)
.verbosity(usize::from(args.options.verbose + 2))
//.timestamp(stderrlog::Timestamp::Second)
.init()
.unwrap();
// Handle --generate-completion early (prints to stdout and exits)
if let Some(shell) = args.mode.generate_completion {
clap_complete::generate(shell, &mut Args::command(), "keep", &mut std::io::stdout());
std::process::exit(0);
}
let start = Instant::now();
let mut builder = env_logger::Builder::new();
let show_module = args.options.verbose >= 2;
builder.format(move |buf, record| {
let elapsed = start.elapsed();
let ts = format!("[{:>6}.{:03}]", elapsed.as_secs(), elapsed.subsec_millis());
if show_module {
writeln!(
buf,
"{} {:<5} {}: {}",
ts,
record.level(),
record.module_path().unwrap_or("?"),
record.args()
)
} else {
writeln!(buf, "{} {:<5} {}", ts, record.level(), record.args())
}
});
let max_level = if args.options.quiet {
LevelFilter::Off
} else {
match args.options.verbose {
0 => LevelFilter::Warn,
1 => LevelFilter::Debug,
_ => LevelFilter::Trace,
}
};
builder.filter_module("keep", max_level);
builder.init();
debug!("MAIN: Start");
@@ -49,7 +81,7 @@ fn main() -> Result<(), Error> {
let ids = &mut Vec::new();
let tags = &mut Vec::new();
// For --info and --get modes, treat numeric strings as IDs
// For --info, --get, --export, and --list modes, treat numeric strings as IDs
for v in args.ids_or_tags.iter() {
debug!("MAIN: Parsed value: {v:?}");
match v.clone() {
@@ -58,22 +90,15 @@ fn main() -> Result<(), Error> {
ids.push(num)
}
NumberOrString::Str(str) => {
// For --info and --get, try to parse strings as numbers to treat them as IDs
if args.mode.info || args.mode.get {
if let Ok(num) = str.parse::<i64>() {
// For --info, --get, --export, and --list, try to parse strings as numbers to treat them as IDs
if (args.mode.info || args.mode.get || args.mode.export || args.mode.list)
&& let Ok(num) = str.parse::<i64>()
{
debug!("MAIN: Adding parsed string to ids: {num}");
ids.push(num);
continue;
} else if args.mode.info {
// --info only accepts numeric IDs
cmd.error(
ErrorKind::InvalidValue,
format!("--info requires numeric IDs, found: '{str}'"),
)
.exit();
}
}
// If not a number, or not using --info/--get, treat as tag
// If not a number, or not using --info/--get/--export/--list, treat as tag
debug!("MAIN: Adding to tags: {str}");
tags.push(str)
}
@@ -92,8 +117,12 @@ fn main() -> Result<(), Error> {
List,
Delete,
Info,
Update,
Export,
Import,
Status,
StatusPlugins,
#[cfg(feature = "server")]
Server,
GenerateConfig,
}
@@ -112,13 +141,24 @@ fn main() -> Result<(), Error> {
mode = KeepModes::Delete;
} else if args.mode.info {
mode = KeepModes::Info;
} else if args.mode.update {
mode = KeepModes::Update;
} else if args.mode.export {
mode = KeepModes::Export;
} else if args.mode.import.is_some() {
mode = KeepModes::Import;
} else if args.mode.status {
mode = KeepModes::Status;
} else if args.mode.status_plugins {
mode = KeepModes::StatusPlugins;
} else if args.mode.server {
}
#[cfg(feature = "server")]
{
if args.mode.server {
mode = KeepModes::Server;
} else if args.mode.generate_config {
}
}
if args.mode.generate_config {
mode = KeepModes::GenerateConfig;
}
@@ -154,6 +194,7 @@ fn main() -> Result<(), Error> {
}
// Validate server password usage
#[cfg(feature = "server")]
if settings.server_password().is_some() && mode != KeepModes::Server {
cmd.error(
ErrorKind::InvalidValue,
@@ -162,6 +203,15 @@ fn main() -> Result<(), Error> {
.exit();
}
// Validate ids-only usage
if settings.ids_only && mode != KeepModes::List {
cmd.error(
ErrorKind::InvalidValue,
"--ids-only can only be used with --list mode",
)
.exit();
}
debug!("MAIN: args: {args:?}");
debug!("MAIN: ids: {ids:?}");
debug!("MAIN: tags: {tags:?}");
@@ -188,12 +238,20 @@ fn main() -> Result<(), Error> {
#[cfg(feature = "client")]
{
if let Some(ref client_url) = settings.client_url {
let client =
keep::client::KeepClient::new(client_url, settings.client_password.clone())?;
let client = keep::client::KeepClient::new(
client_url,
settings.client_username.clone(),
settings.client_password.clone(),
settings.client_jwt.clone(),
)?;
return match mode {
KeepModes::Save => {
let metadata = std::collections::HashMap::new();
let metadata: std::collections::HashMap<String, String> = settings
.meta
.iter()
.filter_map(|(k, v)| v.as_ref().map(|val| (k.clone(), val.clone())))
.collect();
keep::modes::client::save::mode(&client, &mut cmd, &settings, tags, metadata)
}
KeepModes::Get => keep::modes::client::get::mode(
@@ -205,7 +263,7 @@ fn main() -> Result<(), Error> {
filter_chain,
),
KeepModes::List => {
keep::modes::client::list::mode(&client, &mut cmd, &settings, tags)
keep::modes::client::list::mode(&client, &mut cmd, &settings, ids, tags)
}
KeepModes::Delete => {
keep::modes::client::delete::mode(&client, &mut cmd, &settings, ids)
@@ -219,6 +277,16 @@ fn main() -> Result<(), Error> {
KeepModes::Status => {
keep::modes::client::status::mode(&client, &mut cmd, &settings)
}
KeepModes::Update => {
keep::modes::client::update::mode(&client, &mut cmd, &settings, ids, tags)
}
KeepModes::Export => {
keep::modes::client::export::mode(&client, &mut cmd, &settings, ids, tags)
}
KeepModes::Import => {
let meta_file = args.mode.import.as_ref().unwrap();
keep::modes::client::import::mode(&client, &mut cmd, &settings, meta_file)
}
_ => {
cmd.error(
ErrorKind::InvalidValue,
@@ -230,6 +298,9 @@ fn main() -> Result<(), Error> {
}
}
// SAFETY: umask is thread-safe by POSIX spec, and we invoke it exactly once
// before any file operations to set a secure default mask. No other threads
// exist yet at this point in main(), so there is no data race.
unsafe {
libc::umask(0o077);
}
@@ -271,23 +342,28 @@ fn main() -> Result<(), Error> {
KeepModes::Info => {
modes::info::mode_info(&mut cmd, &settings, ids, tags, &mut conn, data_path)
}
KeepModes::Update => {
modes::update::mode_update(&mut cmd, &settings, ids, tags, &mut conn, data_path)
}
KeepModes::Export => modes::export::mode_export(
&mut cmd,
&settings,
ids,
tags,
&mut conn,
data_path,
filter_chain,
),
KeepModes::Import => {
let meta_file = args.mode.import.as_ref().unwrap();
modes::import::mode_import(&mut cmd, &settings, meta_file, &mut conn, data_path)
}
KeepModes::Status => modes::status::mode_status(&mut cmd, &settings, data_path, db_path),
KeepModes::StatusPlugins => {
modes::status_plugins::mode_status_plugins(&mut cmd, &settings, data_path, db_path)
}
KeepModes::Server => {
#[cfg(feature = "server")]
{
modes::server::mode_server(&mut cmd, &settings, &mut conn, data_path)
}
#[cfg(not(feature = "server"))]
{
cmd.error(
ErrorKind::MissingRequiredArgument,
"This binary was not compiled with server support. Recompile with --features server"
).exit();
}
}
KeepModes::Server => modes::server::mode_server(&mut cmd, &settings, &mut conn, data_path),
KeepModes::GenerateConfig => {
modes::generate_config::mode_generate_config(&mut cmd, &settings)
}
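The server-mode gating in the match arms above can be reduced to a small standalone sketch. Names here are illustrative, not the crate's real API; `cfg!` is used instead of `#[cfg]` blocks so both branches type-check in a plain build, and the `server` feature name is taken from the diff:

```rust
// Sketch of dispatching a mode that only exists behind a Cargo feature.
// Without `--features server`, cfg! evaluates to false and the error
// branch runs, mirroring the cmd.error(...) fallback in main.rs.
fn run_server_mode() -> Result<(), String> {
    if cfg!(feature = "server") {
        // A real build would dispatch to the server entry point here.
        Ok(())
    } else {
        Err("This binary was not compiled with server support. \
             Recompile with --features server"
            .to_string())
    }
}

fn main() {
    match run_server_mode() {
        Ok(()) => println!("server started"),
        Err(e) => eprintln!("{e}"),
    }
}
```

Unlike `#[cfg]`, which removes the unselected block before type-checking, `cfg!` compiles both branches; the diff uses `#[cfg]` so the server code and its dependencies can be dropped entirely from client-only builds.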


@@ -49,6 +49,14 @@ impl MetaPlugin for CwdMetaPlugin {
self.is_finalized = finalized;
}
fn set_save_meta(&mut self, save_meta: crate::meta_plugin::SaveMetaFn) {
self.base.set_save_meta(save_meta);
}
fn save_meta(&self, name: &str, value: &str) {
self.base.save_meta(name, value);
}
fn finalize(&mut self) -> crate::meta_plugin::MetaPluginResponse {
// If already finalized, don't process again
if self.is_finalized {
@@ -105,16 +113,20 @@ impl MetaPlugin for CwdMetaPlugin {
self.base.outputs()
}
fn outputs_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.outputs_mut()
fn outputs_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.outputs_mut())
}
fn options(&self) -> &std::collections::HashMap<String, serde_yaml::Value> {
self.base.options()
}
fn options_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.options_mut()
fn options_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.options_mut())
}
}
use crate::meta_plugin::register_meta_plugin;
@@ -124,5 +136,6 @@ use crate::meta_plugin::register_meta_plugin;
fn register_cwd_plugin() {
register_meta_plugin(MetaPluginType::Cwd, |options, outputs| {
Box::new(CwdMetaPlugin::new(options, outputs))
});
})
.expect("Failed to register CwdMetaPlugin");
}
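The accessor change running through these impls — `outputs_mut`/`options_mut` going from a plain `&mut` return to a `Result`-wrapped one — can be sketched dependency-free. `anyhow::Result` is replaced with a `String` error here since `anyhow` isn't assumed available, and the trait is a minimal stand-in, not the crate's real `MetaPlugin`:

```rust
use std::collections::HashMap;

// Minimal stand-in for the plugin trait's fallible accessor. The real
// code returns anyhow::Result; a String error keeps this sketch std-only.
trait Plugin {
    fn outputs_mut(&mut self) -> Result<&mut HashMap<String, String>, String>;
}

struct Base {
    outputs: HashMap<String, String>,
}

impl Plugin for Base {
    fn outputs_mut(&mut self) -> Result<&mut HashMap<String, String>, String> {
        // Infallible for this plugin, but the Result signature lets other
        // implementations report an error instead of panicking.
        Ok(&mut self.outputs)
    }
}

fn main() {
    let mut p = Base { outputs: HashMap::new() };
    p.outputs_mut()
        .expect("base plugin always has outputs")
        .insert("mime_type".into(), "text/plain".into());
    println!("{}", p.outputs.len());
}
```

Wrapping the return in `Result` is what lets plugins that previously documented "Panics with `options_mut() not implemented`" surface that condition as an error instead.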


@@ -32,7 +32,7 @@ impl Hasher {
match self {
Hasher::Sha256(hasher) => hasher.update(data),
Hasher::Md5(hasher) => {
let _ = hasher.write(data);
hasher.consume(data);
}
Hasher::Sha512(hasher) => hasher.update(data),
}
@@ -159,6 +159,14 @@ impl MetaPlugin for DigestMetaPlugin {
self.is_finalized = finalized;
}
fn set_save_meta(&mut self, save_meta: crate::meta_plugin::SaveMetaFn) {
self.base.set_save_meta(save_meta);
}
fn save_meta(&self, name: &str, value: &str) {
self.base.save_meta(name, value);
}
fn initialize(&mut self) -> crate::meta_plugin::MetaPluginResponse {
crate::meta_plugin::MetaPluginResponse {
metadata: Vec::new(),
@@ -235,8 +243,10 @@ impl MetaPlugin for DigestMetaPlugin {
self.base.outputs()
}
fn outputs_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.outputs_mut()
fn outputs_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.outputs_mut())
}
fn default_outputs(&self) -> Vec<String> {
@@ -251,8 +261,14 @@ impl MetaPlugin for DigestMetaPlugin {
self.base.options()
}
fn options_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.options_mut()
fn options_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.options_mut())
}
fn parallel_safe(&self) -> bool {
true
}
}
@@ -263,5 +279,6 @@ use crate::meta_plugin::register_meta_plugin;
fn register_digest_plugin() {
register_meta_plugin(MetaPluginType::Digest, |options, outputs| {
Box::new(DigestMetaPlugin::new(options, outputs))
});
})
.expect("Failed to register DigestMetaPlugin");
}


@@ -22,24 +22,40 @@ impl EnvMetaPlugin {
///
/// A new instance of `EnvMetaPlugin`.
pub fn new(
_options: Option<std::collections::HashMap<String, serde_yaml::Value>>,
options: Option<std::collections::HashMap<String, serde_yaml::Value>>,
outputs: Option<std::collections::HashMap<String, serde_yaml::Value>>,
) -> Self {
// Collect environment variables starting with KEEP_META_
let mut env_vars = Vec::new();
let mut outputs_map = std::collections::HashMap::new();
// Use options from --meta-plugin JSON if provided and non-empty,
// otherwise fall back to KEEP_META_* environment variables.
let use_options = options.as_ref().map(|o| !o.is_empty()).unwrap_or(false);
if use_options {
let opts = options.as_ref().unwrap();
for (key, value) in opts {
let value_str = match value {
serde_yaml::Value::String(s) => s.clone(),
serde_yaml::Value::Number(n) => n.to_string(),
serde_yaml::Value::Bool(b) => b.to_string(),
_ => serde_yaml::to_string(value).unwrap_or_default(),
};
env_vars.push((key.clone(), value_str));
outputs_map.insert(key.clone(), serde_yaml::Value::String(key.clone()));
}
} else {
// Fall back to KEEP_META_* environment variables
for (key, value) in std::env::vars() {
if let Some(stripped_key) = key.strip_prefix("KEEP_META_") {
// Add to env_vars to process later
env_vars.push((stripped_key.to_string(), value));
// Add to outputs with default mapping to the stripped name
outputs_map.insert(
stripped_key.to_string(),
serde_yaml::Value::String(stripped_key.to_string()),
);
}
}
}
// Override with provided outputs
if let Some(provided_outputs) = outputs {
@@ -87,6 +103,14 @@ impl MetaPlugin for EnvMetaPlugin {
self.is_finalized = finalized;
}
fn set_save_meta(&mut self, save_meta: crate::meta_plugin::SaveMetaFn) {
self.base.set_save_meta(save_meta);
}
fn save_meta(&self, name: &str, value: &str) {
self.base.save_meta(name, value);
}
/// Initializes the plugin, processing environment variables.
///
/// Processes all KEEP_META_* variables and generates metadata using output mappings.
@@ -183,8 +207,10 @@ impl MetaPlugin for EnvMetaPlugin {
/// # Returns
///
/// A mutable reference to the `HashMap` of outputs.
fn outputs_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.outputs_mut()
fn outputs_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.outputs_mut())
}
/// Returns the default output names based on collected env vars.
@@ -212,8 +238,10 @@ impl MetaPlugin for EnvMetaPlugin {
/// # Returns
///
/// A mutable reference to the `HashMap` of options.
fn options_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.options_mut()
fn options_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.options_mut())
}
}
use crate::meta_plugin::register_meta_plugin;
@@ -223,5 +251,6 @@ use crate::meta_plugin::register_meta_plugin;
fn register_env_plugin() {
register_meta_plugin(MetaPluginType::Env, |options, outputs| {
Box::new(EnvMetaPlugin::new(options, outputs))
});
})
.expect("Failed to register EnvMetaPlugin");
}
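The options-vs-environment fallback in `EnvMetaPlugin::new` boils down to: prefer explicit non-empty options, otherwise scan the environment for a prefixed variable and key it by the stripped name. A minimal std-only sketch, with plain `String` values in place of the plugin's `serde_yaml::Value`s:

```rust
use std::collections::HashMap;

// Prefer explicitly passed options; otherwise fall back to environment
// variables carrying the given prefix, keyed by the stripped name.
fn collect_meta(
    options: Option<HashMap<String, String>>,
    prefix: &str,
) -> Vec<(String, String)> {
    match options {
        Some(opts) if !opts.is_empty() => opts.into_iter().collect(),
        _ => std::env::vars()
            .filter_map(|(k, v)| {
                k.strip_prefix(prefix).map(|s| (s.to_string(), v))
            })
            .collect(),
    }
}

fn main() {
    let mut opts = HashMap::new();
    opts.insert("explicit".to_string(), "1".to_string());
    // Non-empty options win over the environment scan.
    println!("{}", collect_meta(Some(opts), "KEEP_META_").len());
    // Empty or absent options fall back to scanning prefixed env vars.
    println!("{}", collect_meta(None, "KEEP_META_").len());
}
```

Note the guard is on *non-empty* options: an empty `--meta-plugin` options map still falls through to the `KEEP_META_*` scan, matching the `use_options` check in the diff.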


@@ -20,7 +20,7 @@ pub struct MetaPluginExec {
pub supported: bool,
pub split_whitespace: bool,
process: Option<Child>,
writer: Option<Box<dyn Write>>,
writer: Option<Box<dyn Write + Send>>,
result: Option<String>,
base: BaseMetaPlugin,
}
@@ -131,7 +131,19 @@ impl MetaPluginExec {
match cmd.spawn() {
Ok(mut child) => {
let stdin = child.stdin.take().unwrap();
let stdin = match child.stdin.take() {
Some(s) => s,
None => {
error!(
"META: Exec plugin: failed to capture stdin for '{}'",
self.program
);
return MetaPluginResponse {
metadata: Vec::new(),
is_finalized: true,
};
}
};
self.writer = Some(Box::new(stdin));
self.process = Some(child);
debug!("META: Exec plugin: started process for '{}'", self.program);
@@ -167,6 +179,14 @@ impl MetaPlugin for MetaPluginExec {
false
}
fn set_save_meta(&mut self, save_meta: crate::meta_plugin::SaveMetaFn) {
self.base.set_save_meta(save_meta);
}
fn save_meta(&self, name: &str, value: &str) {
self.base.save_meta(name, value);
}
fn initialize(&mut self) -> MetaPluginResponse {
self.start_process()
}
@@ -244,21 +264,29 @@ impl MetaPlugin for MetaPluginExec {
&self.base.outputs
}
fn outputs_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
&mut self.base.outputs
fn outputs_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(&mut self.base.outputs)
}
fn options(&self) -> &std::collections::HashMap<String, serde_yaml::Value> {
&self.base.options
}
fn options_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
&mut self.base.options
fn options_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(&mut self.base.options)
}
fn default_outputs(&self) -> Vec<String> {
vec!["exec".to_string()]
}
fn parallel_safe(&self) -> bool {
true
}
}
use crate::meta_plugin::register_meta_plugin;
@@ -303,5 +331,6 @@ fn register_exec_plugin() {
options,
outputs,
))
});
})
.expect("Failed to register ExecMetaPlugin");
}
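The exec plugin's guarded `child.stdin.take()` — replacing a bare `unwrap` — is an instance of the spawn/pipe pattern below. This sketch uses `cat` as a stand-in program (a Unix-like environment is assumed); the plugin itself runs a user-configured command:

```rust
use std::io::{Read, Write};
use std::process::{Command, Stdio};

// Spawn a child, feed bytes through its captured stdin, read stdout back.
// Taking stdin can fail (e.g. if it was not piped), so handle the None
// case instead of unwrapping.
fn pipe_through(program: &str, input: &[u8]) -> Result<String, String> {
    let mut child = Command::new(program)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()
        .map_err(|e| format!("spawn failed: {e}"))?;
    let mut stdin = child
        .stdin
        .take()
        .ok_or_else(|| format!("failed to capture stdin for '{program}'"))?;
    stdin
        .write_all(input)
        .map_err(|e| format!("write failed: {e}"))?;
    drop(stdin); // close the pipe so the child sees EOF
    let mut out = String::new();
    child
        .stdout
        .take()
        .ok_or_else(|| "failed to capture stdout".to_string())?
        .read_to_string(&mut out)
        .map_err(|e| format!("read failed: {e}"))?;
    child.wait().map_err(|e| format!("wait failed: {e}"))?;
    Ok(out)
}

fn main() {
    match pipe_through("cat", b"hello\n") {
        Ok(out) => print!("{out}"),
        Err(e) => eprintln!("{e}"),
    }
}
```

Dropping the writer before reading is load-bearing: without the EOF, a program that consumes all of stdin (like `cat` or `file -`) would block forever.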


@@ -211,6 +211,14 @@ impl MetaPlugin for HostnameMetaPlugin {
self.is_finalized = finalized;
}
fn set_save_meta(&mut self, save_meta: crate::meta_plugin::SaveMetaFn) {
self.base.set_save_meta(save_meta);
}
fn save_meta(&self, name: &str, value: &str) {
self.base.save_meta(name, value);
}
fn finalize(&mut self) -> crate::meta_plugin::MetaPluginResponse {
// If already finalized, don't process again
if self.is_finalized {
@@ -375,8 +383,10 @@ impl MetaPlugin for HostnameMetaPlugin {
self.base.outputs()
}
fn outputs_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.outputs_mut()
fn outputs_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.outputs_mut())
}
fn default_outputs(&self) -> Vec<String> {
@@ -391,8 +401,10 @@ impl MetaPlugin for HostnameMetaPlugin {
self.base.options()
}
fn options_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.options_mut()
fn options_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.options_mut())
}
}
use crate::meta_plugin::register_meta_plugin;
@@ -402,5 +414,6 @@ use crate::meta_plugin::register_meta_plugin;
fn register_hostname_plugin() {
register_meta_plugin(MetaPluginType::Hostname, |options, outputs| {
Box::new(HostnameMetaPlugin::new(options, outputs))
});
})
.expect("Failed to register HostnameMetaPlugin");
}


@@ -0,0 +1,177 @@
use crate::common::PIPESIZE;
use crate::meta_plugin::{
BaseMetaPlugin, MetaPlugin, MetaPluginResponse, MetaPluginType, process_metadata_outputs,
register_meta_plugin,
};
#[derive(Debug, Default)]
pub struct InferMetaPlugin {
buffer: Vec<u8>,
max_buffer_size: usize,
is_finalized: bool,
base: BaseMetaPlugin,
}
impl InferMetaPlugin {
pub fn new(
options: Option<std::collections::HashMap<String, serde_yaml::Value>>,
outputs: Option<std::collections::HashMap<String, serde_yaml::Value>>,
) -> InferMetaPlugin {
let mut base = BaseMetaPlugin::new();
if let Some(opts) = options {
for (key, value) in opts {
base.options.insert(key, value);
}
}
let max_buffer_size = base
.options
.get("max_buffer_size")
.and_then(|v| v.as_u64())
.unwrap_or(PIPESIZE as u64) as usize;
base.outputs.insert(
"infer_mime_type".to_string(),
serde_yaml::Value::String("infer_mime_type".to_string()),
);
if let Some(outs) = outputs {
for (key, value) in outs {
base.outputs.insert(key, value);
}
}
InferMetaPlugin {
buffer: Vec::new(),
max_buffer_size,
is_finalized: false,
base,
}
}
}
impl MetaPlugin for InferMetaPlugin {
fn meta_type(&self) -> MetaPluginType {
MetaPluginType::Infer
}
fn is_finalized(&self) -> bool {
self.is_finalized
}
fn set_finalized(&mut self, finalized: bool) {
self.is_finalized = finalized;
}
fn set_save_meta(&mut self, save_meta: crate::meta_plugin::SaveMetaFn) {
self.base.set_save_meta(save_meta);
}
fn save_meta(&self, name: &str, value: &str) {
self.base.save_meta(name, value);
}
fn update(&mut self, data: &[u8]) -> MetaPluginResponse {
if self.is_finalized {
return MetaPluginResponse {
metadata: Vec::new(),
is_finalized: true,
};
}
let remaining = self.max_buffer_size.saturating_sub(self.buffer.len());
let to_add = &data[..data.len().min(remaining)];
self.buffer.extend_from_slice(to_add);
if self.buffer.len() >= self.max_buffer_size {
let mime_type = infer::get(&self.buffer)
.map(|kind| kind.mime_type().to_string())
.unwrap_or_else(|| "application/octet-stream".to_string());
self.is_finalized = true;
let metadata = process_metadata_outputs(
"infer_mime_type",
serde_yaml::Value::String(mime_type),
self.base.outputs(),
)
.map(|m| vec![m])
.unwrap_or_default();
return MetaPluginResponse {
metadata,
is_finalized: true,
};
}
MetaPluginResponse {
metadata: Vec::new(),
is_finalized: false,
}
}
fn finalize(&mut self) -> MetaPluginResponse {
if self.is_finalized {
return MetaPluginResponse {
metadata: Vec::new(),
is_finalized: true,
};
}
let mime_type = infer::get(&self.buffer)
.map(|kind| kind.mime_type().to_string())
.unwrap_or_else(|| "application/octet-stream".to_string());
self.is_finalized = true;
let metadata = process_metadata_outputs(
"infer_mime_type",
serde_yaml::Value::String(mime_type),
self.base.outputs(),
)
.map(|m| vec![m])
.unwrap_or_default();
MetaPluginResponse {
metadata,
is_finalized: true,
}
}
fn outputs(&self) -> &std::collections::HashMap<String, serde_yaml::Value> {
self.base.outputs()
}
fn outputs_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.outputs_mut())
}
fn default_outputs(&self) -> Vec<String> {
vec!["infer_mime_type".to_string()]
}
fn options(&self) -> &std::collections::HashMap<String, serde_yaml::Value> {
self.base.options()
}
fn options_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.options_mut())
}
fn parallel_safe(&self) -> bool {
true
}
}
#[ctor::ctor]
fn register_infer_plugin() {
register_meta_plugin(MetaPluginType::Infer, |options, outputs| {
Box::new(InferMetaPlugin::new(options, outputs))
})
.expect("Failed to register InferMetaPlugin");
}
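The capped buffering in `InferMetaPlugin::update` — take only what fits, run detection once the cap is reached, ignore further input — works as a standalone pattern. A sketch with the detection step stubbed out, since the `infer` crate isn't assumed here:

```rust
// Accumulate incoming chunks up to a fixed cap; report once full.
// The real plugin runs infer::get() at that point; here "detection"
// is just the filled length.
struct CappedBuffer {
    buf: Vec<u8>,
    cap: usize,
    done: bool,
}

impl CappedBuffer {
    fn new(cap: usize) -> Self {
        Self { buf: Vec::new(), cap, done: false }
    }

    /// Returns Some(len) once the cap is reached; later calls are no-ops.
    fn update(&mut self, data: &[u8]) -> Option<usize> {
        if self.done {
            return Some(self.buf.len());
        }
        let remaining = self.cap.saturating_sub(self.buf.len());
        self.buf.extend_from_slice(&data[..data.len().min(remaining)]);
        if self.buf.len() >= self.cap {
            self.done = true;
            return Some(self.buf.len());
        }
        None
    }
}

fn main() {
    let mut b = CappedBuffer::new(4);
    assert_eq!(b.update(b"ab"), None);
    assert_eq!(b.update(b"cdef"), Some(4)); // only 2 more bytes are kept
    println!("{}", b.buf.len());
}
```

`saturating_sub` plus the slice `min` keeps the code panic-free even when a single chunk is larger than the remaining capacity, the same shape used in both the infer and magic plugins' `update`.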


@@ -54,6 +54,14 @@ impl MetaPlugin for KeepPidMetaPlugin {
self.is_finalized = finalized;
}
fn set_save_meta(&mut self, save_meta: crate::meta_plugin::SaveMetaFn) {
self.base.set_save_meta(save_meta);
}
fn save_meta(&self, name: &str, value: &str) {
self.base.save_meta(name, value);
}
/// Finalizes the plugin, processing any remaining data if needed.
///
/// # Returns
@@ -162,8 +170,10 @@ impl MetaPlugin for KeepPidMetaPlugin {
/// # Returns
///
/// A mutable reference to the `HashMap` of outputs.
fn outputs_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.outputs_mut()
fn outputs_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.outputs_mut())
}
/// Returns the default output names for this plugin.
@@ -189,8 +199,10 @@ impl MetaPlugin for KeepPidMetaPlugin {
/// # Returns
///
/// A mutable reference to the `HashMap` of options.
fn options_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.options_mut()
fn options_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.options_mut())
}
}
use crate::meta_plugin::register_meta_plugin;
@@ -200,5 +212,6 @@ use crate::meta_plugin::register_meta_plugin;
fn register_keep_pid_plugin() {
register_meta_plugin(MetaPluginType::KeepPid, |options, outputs| {
Box::new(KeepPidMetaPlugin::new(options, outputs))
});
})
.expect("Failed to register KeepPidMetaPlugin");
}


@@ -1,337 +0,0 @@
use magic::{Cookie, CookieFlags};
use std::io;
use crate::common::PIPESIZE;
use crate::meta_plugin::{MetaPlugin, MetaPluginType};
#[derive(Debug)]
pub struct MagicFileMetaPlugin {
buffer: Vec<u8>,
max_buffer_size: usize,
is_finalized: bool,
cookie: Option<Cookie>,
base: crate::meta_plugin::BaseMetaPlugin,
}
impl MagicFileMetaPlugin {
pub fn new(
options: Option<std::collections::HashMap<String, serde_yaml::Value>>,
outputs: Option<std::collections::HashMap<String, serde_yaml::Value>>,
) -> MagicFileMetaPlugin {
// Start with default options
let mut final_options = std::collections::HashMap::new();
final_options.insert("max_buffer_size".to_string(), serde_yaml::Value::Number(PIPESIZE.into()));
if let Some(opts) = options {
for (key, value) in opts {
final_options.insert(key, value);
}
}
// Start with default outputs
let mut final_outputs = std::collections::HashMap::new();
let default_outputs = vec!["mime_type".to_string(), "mime_encoding".to_string(), "file_type".to_string()];
for output_name in default_outputs {
final_outputs.insert(output_name.clone(), serde_yaml::Value::String(output_name));
}
if let Some(outs) = outputs {
for (key, value) in outs {
final_outputs.insert(key, value);
}
}
let max_buffer_size = final_options.get("max_buffer_size")
.and_then(|v| v.as_u64())
.unwrap_or(PIPESIZE as u64) as usize;
// Ensure the default max_buffer_size is in the options
if !final_options.contains_key("max_buffer_size") {
final_options.insert("max_buffer_size".to_string(), serde_yaml::Value::Number(PIPESIZE.into()));
}
let mut base = crate::meta_plugin::BaseMetaPlugin::new();
base.outputs = final_outputs;
base.options = final_options;
MagicFileMetaPlugin {
buffer: Vec::new(),
max_buffer_size,
is_finalized: false,
cookie: None,
base,
}
}
fn get_magic_result(&self, flags: CookieFlags) -> io::Result<String> {
// Use the existing cookie and just change flags
if let Some(cookie) = &self.cookie {
cookie.set_flags(flags)
.map_err(|e| io::Error::new(io::ErrorKind::Other, format!("Failed to set magic flags: {}", e)))?;
let result = cookie.buffer(&self.buffer)
.map_err(|e| io::Error::new(io::ErrorKind::Other, format!("Failed to analyze buffer: {}", e)))?;
// Clean up the result - remove extra whitespace and take first part if needed
let trimmed = result.trim();
// For some magic results, we might want just the first part before semicolon or comma
let cleaned = if trimmed.contains(';') {
trimmed.split(';').next().unwrap_or(trimmed).trim()
} else if trimmed.contains(',') && flags.contains(CookieFlags::MIME_TYPE | CookieFlags::MIME_ENCODING) {
trimmed.split(',').next().unwrap_or(trimmed).trim()
} else {
trimmed
};
Ok(cleaned.to_string())
} else {
Err(io::Error::new(io::ErrorKind::Other, "Magic cookie not initialized"))
}
}
/// Helper function to process all magic types and collect metadata
fn process_magic_types(&self) -> Vec<crate::meta_plugin::MetaData> {
let mut metadata = Vec::new();
// Define the types to process with their corresponding flags
let types_to_process = [
("mime_type", CookieFlags::MIME_TYPE),
("mime_encoding", CookieFlags::MIME_ENCODING),
("file_type", CookieFlags::default()),
];
for (name, flags) in types_to_process.iter() {
if let Ok(result) = self.get_magic_result(*flags) {
if !result.is_empty() {
// Use process_metadata_outputs to handle output mapping
if let Some(meta_data) = crate::meta_plugin::process_metadata_outputs(
name,
serde_yaml::Value::String(result),
self.base.outputs()
) {
metadata.push(meta_data);
}
}
}
}
metadata
}
}
impl MetaPlugin for MagicFileMetaPlugin {
/// Checks if the plugin has been finalized.
///
/// # Returns
///
/// `true` if finalized, `false` otherwise.
fn is_finalized(&self) -> bool {
self.is_finalized
}
/// Sets the finalized state of the plugin.
///
/// # Arguments
///
/// * `finalized` - The new finalized state.
fn set_finalized(&mut self, finalized: bool) {
self.is_finalized = finalized;
}
/// Initializes the magic cookie for file type detection.
///
/// Loads the magic database; finalizes if initialization fails.
///
/// # Returns
///
/// A `MetaPluginResponse` with empty metadata; `is_finalized` is `true` on failure.
///
/// # Errors
///
/// Logs errors; returns finalized response on cookie or load failure.
///
/// # Examples
///
/// ```
/// let mut plugin = MagicFileMetaPlugin::new(None, None);
/// let response = plugin.initialize();
/// ```
fn initialize(&mut self) -> crate::meta_plugin::MetaPluginResponse {
// Initialize the magic cookie once
let cookie = match Cookie::open(Default::default()) {
Ok(cookie) => cookie,
Err(_e) => {
return crate::meta_plugin::MetaPluginResponse {
metadata: Vec::new(),
is_finalized: true,
};
}
};
if let Err(_e) = cookie.load(&[] as &[&str]) {
return crate::meta_plugin::MetaPluginResponse {
metadata: Vec::new(),
is_finalized: true,
};
}
self.cookie = Some(cookie);
crate::meta_plugin::MetaPluginResponse {
metadata: Vec::new(),
is_finalized: false,
}
}
/// Finalizes the plugin and performs file type detection.
///
/// Analyzes the accumulated buffer and outputs detected types.
///
/// # Returns
///
/// A `MetaPluginResponse` with detection metadata and finalized state set to `true`.
///
/// # Examples
///
/// ```
/// let mut plugin = MagicFileMetaPlugin::new(None, None);
/// // ... after updates
/// let response = plugin.finalize();
/// assert!(response.is_finalized);
/// ```
fn finalize(&mut self) -> crate::meta_plugin::MetaPluginResponse {
// If already finalized, don't process again
if self.is_finalized {
return crate::meta_plugin::MetaPluginResponse {
metadata: Vec::new(),
is_finalized: true,
};
}
let metadata = self.process_magic_types();
// Mark as finalized
self.is_finalized = true;
crate::meta_plugin::MetaPluginResponse {
metadata,
is_finalized: true,
}
}
/// Updates the plugin with new data, accumulating for analysis.
///
/// Buffers data up to `max_buffer_size`; triggers detection when full.
///
/// # Arguments
///
/// * `data` - Content chunk to buffer.
///
/// # Returns
///
/// A `MetaPluginResponse` with metadata on buffer full; finalizes then.
///
/// # Examples
///
/// ```
/// let mut plugin = MagicFileMetaPlugin::new(None, None);
/// let response = plugin.update(b"content");
/// ```
fn update(&mut self, data: &[u8]) -> crate::meta_plugin::MetaPluginResponse {
// If already finalized, don't process more data
if self.is_finalized {
return crate::meta_plugin::MetaPluginResponse {
metadata: Vec::new(),
is_finalized: true,
};
}
let mut metadata = Vec::new();
// Only collect up to max_buffer_size
let remaining_capacity = self.max_buffer_size.saturating_sub(self.buffer.len());
if remaining_capacity > 0 {
let bytes_to_copy = std::cmp::min(data.len(), remaining_capacity);
self.buffer.extend_from_slice(&data[..bytes_to_copy]);
// Check if we've reached our buffer limit and return metadata
if self.buffer.len() >= self.max_buffer_size {
metadata = self.process_magic_types();
// Mark as finalized when we've processed enough data
self.is_finalized = true;
}
}
let is_finalized = !metadata.is_empty();
crate::meta_plugin::MetaPluginResponse {
metadata,
is_finalized,
}
}
/// Returns the type of this meta plugin.
///
/// # Returns
///
/// `MetaPluginType::MagicFile`.
fn meta_type(&self) -> MetaPluginType {
MetaPluginType::MagicFile
}
/// Returns a reference to the outputs mapping.
///
/// # Returns
///
/// A reference to the `HashMap` of outputs.
fn outputs(&self) -> &std::collections::HashMap<String, serde_yaml::Value> {
self.base.outputs()
}
/// Returns a mutable reference to the outputs mapping.
///
/// # Returns
///
/// A mutable reference to the `HashMap` of outputs.
fn outputs_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.outputs_mut()
}
/// Returns the default output names for this plugin.
///
/// # Returns
///
/// Vector of default output field names.
fn default_outputs(&self) -> Vec<String> {
vec!["mime_type".to_string(), "mime_encoding".to_string(), "file_type".to_string()]
}
/// Returns a reference to the options mapping.
///
/// # Returns
///
/// A reference to the `HashMap` of options.
fn options(&self) -> &std::collections::HashMap<String, serde_yaml::Value> {
self.base.options()
}
/// Returns a mutable reference to the options mapping.
///
/// # Returns
///
/// A mutable reference to the `HashMap` of options.
fn options_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.options_mut()
}
}
use crate::meta_plugin::register_meta_plugin;
// Register the plugin at module initialization time
#[ctor::ctor]
fn register_magic_file_plugin() {
register_meta_plugin(MetaPluginType::MagicFile, |options, outputs| {
Box::new(MagicFileMetaPlugin::new(options, outputs))
});
}


@@ -1,9 +1,8 @@
#[cfg(feature = "magic")]
#[cfg(feature = "meta_magic")]
use magic::{Cookie, CookieFlags};
#[cfg(not(feature = "magic"))]
#[cfg(not(feature = "meta_magic"))]
use std::process::{Command, Stdio};
use log::debug;
use std::io::{self, Write};
use std::path::Path;
@@ -12,17 +11,26 @@ use crate::meta_plugin::{
process_metadata_outputs,
};
#[cfg(feature = "magic")]
// Thread-local libmagic cookie, lazily initialized on first access per thread.
// Each thread gets its own independent Cookie instance. Libmagic documents that
// separate cookies can be used from different threads concurrently without
// synchronization. Using thread_local! avoids unsafe impl Send since the
// storage is inherently !Send.
#[cfg(feature = "meta_magic")]
thread_local! {
static MAGIC_COOKIE: std::cell::RefCell<Option<Cookie>> = const { std::cell::RefCell::new(None) };
}
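The comment above describes lazy per-thread initialization via `thread_local!`. The same shape, with a plain counter standing in for the libmagic `Cookie` (which isn't assumed available in this sketch):

```rust
use std::cell::RefCell;

// Per-thread lazily initialized state: each thread builds its own
// instance on first access, so no Send/Sync bounds are needed on it.
thread_local! {
    static STATE: RefCell<Option<u32>> = const { RefCell::new(None) };
}

fn with_state<R>(f: impl FnOnce(&mut u32) -> R) -> R {
    STATE.with(|cell| {
        let mut opt = cell.borrow_mut();
        // Lazy init on first access for this thread (stands in for
        // Cookie::open + load in the real plugin).
        let state = opt.get_or_insert(0);
        f(state)
    })
}

fn main() {
    with_state(|s| *s += 1);
    let here = with_state(|s| *s);
    // A spawned thread lazily initializes its own independent copy.
    let other = std::thread::spawn(|| with_state(|s| *s)).join().unwrap();
    println!("{here} {other}");
}
```

Because the storage is per-thread and `!Send`, the pattern sidesteps any `unsafe impl Send` on the wrapped type, which is the trade-off the comment calls out.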
#[cfg(feature = "meta_magic")]
#[derive(Debug)]
pub struct MagicFileMetaPluginImpl {
buffer: Vec<u8>,
max_buffer_size: usize,
is_finalized: bool,
cookie: Option<Cookie>,
base: BaseMetaPlugin,
}
#[cfg(feature = "magic")]
#[cfg(feature = "meta_magic")]
impl MagicFileMetaPluginImpl {
pub fn new(
options: Option<std::collections::HashMap<String, serde_yaml::Value>>,
@@ -45,13 +53,28 @@ impl MagicFileMetaPluginImpl {
buffer: Vec::new(),
max_buffer_size,
is_finalized: false,
cookie: None,
base,
}
}
fn get_magic_result(&self, flags: CookieFlags) -> io::Result<String> {
if let Some(cookie) = &self.cookie {
MAGIC_COOKIE.with(|cell| {
// Lazy init: create cookie on first access per thread
{
let mut opt = cell.borrow_mut();
if opt.is_none() {
let cookie = Cookie::open(CookieFlags::default())
.map_err(|e| io::Error::other(format!("Failed to open magic: {e}")))?;
cookie.load(&[] as &[&Path]).map_err(|e| {
io::Error::other(format!("Failed to load magic database: {e}"))
})?;
*opt = Some(cookie);
}
}
let cookie_ref = cell.borrow();
let cookie = cookie_ref.as_ref().expect("cookie initialized above");
cookie
.set_flags(flags)
.map_err(|e| io::Error::other(format!("Failed to set magic flags: {e}")))?;
@@ -60,13 +83,8 @@ impl MagicFileMetaPluginImpl {
.buffer(&self.buffer)
.map_err(|e| io::Error::other(format!("Failed to analyze buffer: {e}")))?;
// Clean up the result - remove extra whitespace
let trimmed = result.trim().to_string();
Ok(trimmed)
} else {
Err(io::Error::other("Magic cookie not initialized"))
}
Ok(result.trim().to_string())
})
}
fn process_magic_types(&self) -> Vec<MetaData> {
@@ -95,7 +113,7 @@ impl MagicFileMetaPluginImpl {
}
}
#[cfg(feature = "magic")]
#[cfg(feature = "meta_magic")]
impl MetaPlugin for MagicFileMetaPluginImpl {
fn is_finalized(&self) -> bool {
self.is_finalized
@@ -105,28 +123,16 @@ impl MetaPlugin for MagicFileMetaPluginImpl {
self.is_finalized = finalized;
}
fn set_save_meta(&mut self, save_meta: crate::meta_plugin::SaveMetaFn) {
self.base.set_save_meta(save_meta);
}
fn save_meta(&self, name: &str, value: &str) {
self.base.save_meta(name, value);
}
fn initialize(&mut self) -> MetaPluginResponse {
let cookie = match Cookie::open(CookieFlags::default()) {
Ok(cookie) => cookie,
Err(e) => {
debug!("META: MagicFile plugin: failed to create cookie: {e}");
return MetaPluginResponse {
metadata: Vec::new(),
is_finalized: true,
};
}
};
if let Err(e) = cookie.load(&[] as &[&Path]) {
debug!("META: MagicFile plugin: failed to load magic database: {e}");
return MetaPluginResponse {
metadata: Vec::new(),
is_finalized: true,
};
}
self.cookie = Some(cookie);
// Cookie is lazily initialized in the thread-local on first use.
MetaPluginResponse {
metadata: Vec::new(),
is_finalized: false,
@@ -187,8 +193,10 @@ impl MetaPlugin for MagicFileMetaPluginImpl {
self.base.outputs()
}
fn outputs_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.outputs_mut()
fn outputs_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.outputs_mut())
}
fn default_outputs(&self) -> Vec<String> {
@@ -203,12 +211,21 @@ impl MetaPlugin for MagicFileMetaPluginImpl {
self.base.options()
}
fn options_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.options_mut()
fn options_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.options_mut())
}
fn parallel_safe(&self) -> bool {
true
}
}
#[cfg(not(feature = "magic"))]
#[cfg(feature = "meta_magic")]
pub use MagicFileMetaPluginImpl as MagicFileMetaPlugin;
#[cfg(not(feature = "meta_magic"))]
#[derive(Debug)]
pub struct FallbackMagicFileMetaPlugin {
buffer: Vec<u8>,
@@ -217,26 +234,23 @@ pub struct FallbackMagicFileMetaPlugin {
base: BaseMetaPlugin,
}
#[cfg(not(feature = "magic"))]
#[cfg(not(feature = "meta_magic"))]
impl FallbackMagicFileMetaPlugin {
pub fn new(
options: Option<std::collections::HashMap<String, serde_yaml::Value>>,
outputs: Option<std::collections::HashMap<String, serde_yaml::Value>>,
) -> FallbackMagicFileMetaPlugin {
) -> Self {
let mut base = BaseMetaPlugin::new();
// Set default outputs
let default_outputs = &["mime_type", "mime_encoding", "file_type"];
base.initialize_plugin(default_outputs, &options, &outputs);
// Get max_buffer_size from options, default to PIPESIZE
let max_buffer_size = base
.options
.get("max_buffer_size")
.and_then(|v| v.as_u64())
.unwrap_or(crate::common::PIPESIZE as u64) as usize;
FallbackMagicFileMetaPlugin {
Self {
buffer: Vec::new(),
max_buffer_size,
is_finalized: false,
@@ -244,76 +258,85 @@ impl FallbackMagicFileMetaPlugin {
}
}
fn run_file_command(&self, buffer: &[u8]) -> io::Result<String> {
let mut temp_file = tempfile::NamedTempFile::new()?;
temp_file.as_ref().write_all(buffer)?;
fn run_file_command(&self, args: &[&str]) -> Option<String> {
let output = Command::new("file")
.arg("-b")
.arg("-m")
.arg("all")
.arg(temp_file.path())
.output()
.map_err(|e| {
io::Error::new(
io::ErrorKind::Other,
format!("Failed to run file command: {}", e),
)
})?;
.args(args)
.arg("-")
.stdin(Stdio::piped())
.stdout(Stdio::piped())
.spawn()
.and_then(|mut child| {
if let Some(mut stdin) = child.stdin.take() {
if stdin.write_all(&self.buffer).is_err() {
// Ignore write error; child will see EOF and likely fail
// the file detection, returning no output.
}
}
child.wait_with_output()
});
if !output.status.success() {
return Err(io::Error::new(io::ErrorKind::Other, "File command failed"));
output
.ok()
.map(|o| String::from_utf8_lossy(&o.stdout).trim().to_string())
}
let result = String::from_utf8_lossy(&output.stdout).trim().to_string();
Ok(result)
}
fn process_file_output(&self, result: &str) -> Vec<MetaData> {
fn detect_type(&self) -> Vec<MetaData> {
let mut metadata = Vec::new();
// Parse the file command output
// file -m all output format is typically: type; charset=encoding
let parts: Vec<&str> = result.split(';').map(|s| s.trim()).collect();
let file_type = parts.first().cloned().unwrap_or(result);
let mime_encoding = parts
.get(1)
.and_then(|s| s.strip_prefix("charset="))
.cloned()
.unwrap_or("");
// Get mime_type and mime_encoding via --mime
if let Some(mime_line) = self.run_file_command(&["--brief", "--mime"]) {
// Format: "text/plain; charset=us-ascii"
if let Some((mime_type, rest)) = mime_line.split_once(';') {
let mime_type = mime_type.trim().to_string();
let mime_encoding = rest
.trim()
.strip_prefix("charset=")
.unwrap_or("binary")
.to_string();
// For mime_type, try to infer from file type or use a heuristic
let mime_type = if file_type.starts_with("text") {
"text/plain"
} else if file_type.contains("ASCII") || file_type.contains("UTF-8") {
"text/plain"
} else if file_type.contains("empty") {
"application/octet-stream"
} else {
"application/octet-stream" // default
};
let outputs_to_process = [
("mime_type", mime_type),
("mime_encoding", mime_encoding),
("file_type", file_type),
];
for (name, value) in outputs_to_process.iter() {
if let Some(meta_data) = process_metadata_outputs(
name,
serde_yaml::Value::String(value.to_string()),
"mime_type",
serde_yaml::Value::String(mime_type),
self.base.outputs(),
) {
metadata.push(meta_data);
}
if let Some(meta_data) = process_metadata_outputs(
"mime_encoding",
serde_yaml::Value::String(mime_encoding),
self.base.outputs(),
) {
metadata.push(meta_data);
}
} else {
// No charset, just mime type
if let Some(meta_data) = process_metadata_outputs(
"mime_type",
serde_yaml::Value::String(mime_line),
self.base.outputs(),
) {
metadata.push(meta_data);
}
}
}
// Get human-readable file type via --brief
if let Some(file_type) = self.run_file_command(&["--brief"])
&& !file_type.is_empty()
&& let Some(meta_data) = process_metadata_outputs(
"file_type",
serde_yaml::Value::String(file_type),
self.base.outputs(),
)
{
metadata.push(meta_data);
}
metadata
}
}
#[cfg(not(feature = "magic"))]
#[cfg(not(feature = "meta_magic"))]
impl MetaPlugin for FallbackMagicFileMetaPlugin {
fn is_finalized(&self) -> bool {
self.is_finalized
@@ -323,8 +346,15 @@ impl MetaPlugin for FallbackMagicFileMetaPlugin {
self.is_finalized = finalized;
}
fn set_save_meta(&mut self, save_meta: crate::meta_plugin::SaveMetaFn) {
self.base.set_save_meta(save_meta);
}
fn save_meta(&self, name: &str, value: &str) {
self.base.save_meta(name, value);
}
fn initialize(&mut self) -> MetaPluginResponse {
// No initialization needed for fallback
MetaPluginResponse {
metadata: Vec::new(),
is_finalized: false,
@@ -339,27 +369,18 @@ impl MetaPlugin for FallbackMagicFileMetaPlugin {
};
}
let remaining_capacity = self.max_buffer_size.saturating_sub(self.buffer.len());
if remaining_capacity > 0 {
let bytes_to_copy = std::cmp::min(data.len(), remaining_capacity);
self.buffer.extend_from_slice(&data[..bytes_to_copy]);
let remaining = self.max_buffer_size.saturating_sub(self.buffer.len());
if remaining > 0 {
let n = std::cmp::min(data.len(), remaining);
self.buffer.extend_from_slice(&data[..n]);
if self.buffer.len() >= self.max_buffer_size {
if let Ok(result) = self.run_file_command(&self.buffer) {
let metadata = self.process_file_output(&result);
let metadata = self.detect_type();
self.is_finalized = true;
return MetaPluginResponse {
metadata,
is_finalized: true,
};
} else {
// On error, finalize with empty metadata
self.is_finalized = true;
return MetaPluginResponse {
metadata: Vec::new(),
is_finalized: true,
};
}
}
}
@@ -376,21 +397,9 @@ impl MetaPlugin for FallbackMagicFileMetaPlugin {
is_finalized: true,
};
}
let metadata = if !self.buffer.is_empty() {
if let Ok(result) = self.run_file_command(&self.buffer) {
self.process_file_output(&result)
} else {
Vec::new()
}
} else {
Vec::new()
};
self.is_finalized = true;
MetaPluginResponse {
metadata,
metadata: self.detect_type(),
is_finalized: true,
}
}
@@ -403,8 +412,10 @@ impl MetaPlugin for FallbackMagicFileMetaPlugin {
self.base.outputs()
}
fn outputs_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.outputs_mut()
fn outputs_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.outputs_mut())
}
fn default_outputs(&self) -> Vec<String> {
@@ -419,15 +430,18 @@ impl MetaPlugin for FallbackMagicFileMetaPlugin {
self.base.options()
}
fn options_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.options_mut()
fn options_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.options_mut())
}
fn parallel_safe(&self) -> bool {
true
}
}
#[cfg(feature = "magic")]
pub use MagicFileMetaPluginImpl as MagicFileMetaPlugin;
#[cfg(not(feature = "magic"))]
#[cfg(not(feature = "meta_magic"))]
pub use FallbackMagicFileMetaPlugin as MagicFileMetaPlugin;
use crate::meta_plugin::register_meta_plugin;
@@ -436,5 +450,6 @@ use crate::meta_plugin::register_meta_plugin;
fn register_magic_file_plugin() {
register_meta_plugin(MetaPluginType::MagicFile, |options, outputs| {
Box::new(MagicFileMetaPlugin::new(options, outputs))
});
})
.expect("Failed to register MagicFileMetaPlugin");
}

View File

@@ -1,41 +1,48 @@
use log::debug;
use once_cell::sync::Lazy;
use log::{debug, warn};
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
use std::sync::Mutex;
use std::sync::{Arc, Mutex};
pub mod cwd;
pub mod digest;
pub mod env;
pub mod exec;
pub mod hostname;
#[cfg(feature = "meta_infer")]
pub mod infer_plugin;
pub mod keep_pid;
#[cfg(feature = "magic")]
pub mod magic_file;
pub mod read_rate;
pub mod read_time;
pub mod shell;
pub mod shell_pid;
pub mod text;
#[cfg(feature = "meta_tokens")]
pub mod tokens;
#[cfg(feature = "meta_tree_magic_mini")]
pub mod tree_magic_mini;
pub mod user;
// pub mod text; // Removed duplicate
pub use digest::DigestMetaPlugin;
pub use exec::MetaPluginExec;
#[cfg(feature = "magic")]
#[cfg(feature = "meta_magic")]
pub use magic_file::MagicFileMetaPlugin;
// pub use text::TextMetaPlugin; // Removed duplicate
pub use cwd::CwdMetaPlugin;
pub use env::EnvMetaPlugin;
pub use hostname::HostnameMetaPlugin;
#[cfg(feature = "meta_infer")]
pub use infer_plugin::InferMetaPlugin;
pub use keep_pid::KeepPidMetaPlugin;
pub use read_rate::ReadRateMetaPlugin;
pub use read_time::ReadTimeMetaPlugin;
pub use shell::ShellMetaPlugin;
pub use shell_pid::ShellPidMetaPlugin;
#[cfg(feature = "meta_tree_magic_mini")]
pub use tree_magic_mini::TreeMagicMiniMetaPlugin;
pub use user::UserMetaPlugin;
#[cfg(not(feature = "magic"))]
#[cfg(not(feature = "meta_magic"))]
pub use magic_file::FallbackMagicFileMetaPlugin as MagicFileMetaPlugin;
type PluginConstructor = fn(
@@ -61,8 +68,16 @@ pub struct MetaPluginResponse {
pub is_finalized: bool,
}
/// Type alias for the save_meta callback shared by all plugins.
pub type SaveMetaFn = Arc<Mutex<dyn FnMut(&str, &str) + Send>>;
/// Creates a no-op save_meta for plugins not wired through MetaService.
pub fn noop_save_meta() -> SaveMetaFn {
Arc::new(Mutex::new(|_: &str, _: &str| {}))
}
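The `SaveMetaFn` alias above is an `Arc<Mutex<dyn FnMut>>` shared between MetaService and the plugins. A minimal std-only sketch of wiring such a callback to a metadata store (`make_save_meta` and the store shape are hypothetical, not part of this diff):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Same shape as the SaveMetaFn alias introduced in this diff.
type SaveMetaFn = Arc<Mutex<dyn FnMut(&str, &str) + Send>>;

// Build a save_meta callback that writes into a shared store, roughly
// the way MetaService would wire plugins to metadata storage.
fn make_save_meta(store: Arc<Mutex<HashMap<String, String>>>) -> SaveMetaFn {
    Arc::new(Mutex::new(move |name: &str, value: &str| {
        store.lock().unwrap().insert(name.to_string(), value.to_string());
    }))
}

fn main() {
    let store = Arc::new(Mutex::new(HashMap::new()));
    let save_meta = make_save_meta(Arc::clone(&store));

    // A plugin calls the callback during initialize/update/finalize;
    // the lock() guard mirrors BaseMetaPlugin::save_meta in this diff.
    if let Ok(mut f) = save_meta.lock() {
        (*f)("mime_type", "text/plain");
    }

    assert_eq!(
        store.lock().unwrap().get("mime_type").map(String::as_str),
        Some("text/plain")
    );
    println!("stored {} metadata entries", store.lock().unwrap().len());
}
```

The `Mutex` around the `FnMut` is what makes the callback cloneable across plugins while still allowing mutable state inside the closure.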
/// Base implementation for meta plugins to reduce boilerplate.
#[derive(Debug, Clone, Default)]
#[derive(Clone)]
pub struct BaseMetaPlugin {
/// Output mappings for metadata.
pub outputs: std::collections::HashMap<String, serde_yaml::Value>,
@@ -70,6 +85,29 @@ pub struct BaseMetaPlugin {
pub options: std::collections::HashMap<String, serde_yaml::Value>,
/// Whether the plugin is finalized.
pub is_finalized: bool,
/// Callback to store metadata. Called directly by plugins.
pub save_meta: SaveMetaFn,
}
impl std::fmt::Debug for BaseMetaPlugin {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.debug_struct("BaseMetaPlugin")
.field("outputs", &self.outputs)
.field("options", &self.options)
.field("is_finalized", &self.is_finalized)
.finish_non_exhaustive()
}
}
impl Default for BaseMetaPlugin {
fn default() -> Self {
Self {
outputs: HashMap::new(),
options: HashMap::new(),
is_finalized: false,
save_meta: noop_save_meta(),
}
}
}
impl BaseMetaPlugin {
@@ -83,41 +121,39 @@ impl BaseMetaPlugin {
}
/// Returns a reference to the outputs mapping.
///
/// # Returns
///
/// A reference to the `HashMap` of outputs.
pub fn outputs(&self) -> &std::collections::HashMap<String, serde_yaml::Value> {
&self.outputs
}
/// Returns a mutable reference to the outputs mapping.
///
/// # Returns
///
/// A mutable reference to the `HashMap` of outputs.
pub fn outputs_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
&mut self.outputs
}
/// Returns a reference to the options mapping.
///
/// # Returns
///
/// A reference to the `HashMap` of options.
pub fn options(&self) -> &std::collections::HashMap<String, serde_yaml::Value> {
&self.options
}
/// Returns a mutable reference to the options mapping.
///
/// # Returns
///
/// A mutable reference to the `HashMap` of options.
pub fn options_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
&mut self.options
}
/// Sets the save_meta callback on the base plugin.
pub fn set_save_meta(&mut self, save_meta: SaveMetaFn) {
self.save_meta = save_meta;
}
/// Saves a metadata entry via the save_meta callback.
pub fn save_meta(&self, name: &str, value: &str) {
if let Ok(mut f) = self.save_meta.lock() {
f(name, value);
} else {
warn!("META_PLUGIN: save_meta lock poisoned, dropping metadata: {name}={value}");
}
}
/// Helper function to initialize plugin options and outputs.
///
/// # Arguments
@@ -179,8 +215,10 @@ impl MetaPlugin for BaseMetaPlugin {
/// # Returns
///
/// A mutable reference to the `HashMap` of outputs.
fn outputs_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
&mut self.outputs
fn outputs_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(&mut self.outputs)
}
/// Returns a reference to the options mapping.
@@ -197,8 +235,10 @@ impl MetaPlugin for BaseMetaPlugin {
/// # Returns
///
/// A mutable reference to the `HashMap` of options.
fn options_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
&mut self.options
fn options_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(&mut self.options)
}
}
@@ -229,6 +269,9 @@ pub enum MetaPluginType {
Hostname,
Exec,
Env,
Tokens,
TreeMagicMini,
Infer,
}
/// Central function to handle metadata output with name mapping.
@@ -262,22 +305,7 @@ pub fn process_metadata_outputs(
return None;
}
if let Some(custom_name) = mapping.as_str() {
// Convert the value to a string representation
let value_str = match &value {
serde_yaml::Value::Null => "null".to_string(),
serde_yaml::Value::Bool(b) => b.to_string(),
serde_yaml::Value::Number(n) => n.to_string(),
serde_yaml::Value::String(s) => s.clone(),
serde_yaml::Value::Sequence(_) => {
serde_yaml::to_string(&value).unwrap_or_else(|_| "".to_string())
}
serde_yaml::Value::Mapping(_) => {
serde_yaml::to_string(&value).unwrap_or_else(|_| "".to_string())
}
serde_yaml::Value::Tagged(_) => {
serde_yaml::to_string(&value).unwrap_or_else(|_| "".to_string())
}
};
let value_str = yaml_value_to_string(&value);
debug!(
"META: Processing metadata: internal_name={internal_name}, custom_name={custom_name}, value={value_str}"
);
@@ -288,22 +316,7 @@ pub fn process_metadata_outputs(
}
}
// Convert the value to a string representation
let value_str = match &value {
serde_yaml::Value::Null => "null".to_string(),
serde_yaml::Value::Bool(b) => b.to_string(),
serde_yaml::Value::Number(n) => n.to_string(),
serde_yaml::Value::String(s) => s.clone(),
serde_yaml::Value::Sequence(_) => {
serde_yaml::to_string(&value).unwrap_or_else(|_| "".to_string())
}
serde_yaml::Value::Mapping(_) => {
serde_yaml::to_string(&value).unwrap_or_else(|_| "".to_string())
}
serde_yaml::Value::Tagged(_) => {
serde_yaml::to_string(&value).unwrap_or_else(|_| "".to_string())
}
};
let value_str = yaml_value_to_string(&value);
// Default: use internal name as output name
debug!("META: Processing metadata: name={internal_name}, value={value_str}");
@@ -313,7 +326,21 @@ pub fn process_metadata_outputs(
})
}
pub trait MetaPlugin
fn yaml_value_to_string(value: &serde_yaml::Value) -> String {
match value {
serde_yaml::Value::Null => "null".to_string(),
serde_yaml::Value::Bool(b) => b.to_string(),
serde_yaml::Value::Number(n) => n.to_string(),
serde_yaml::Value::String(s) => s.clone(),
serde_yaml::Value::Sequence(_)
| serde_yaml::Value::Mapping(_)
| serde_yaml::Value::Tagged(_) => {
serde_yaml::to_string(value).unwrap_or_else(|_| "".to_string())
}
}
}
pub trait MetaPlugin: Send
where
Self: 'static,
{
@@ -416,19 +443,25 @@ where
///
/// An empty `HashMap` (default implementation).
fn outputs(&self) -> &std::collections::HashMap<String, serde_yaml::Value> {
use once_cell::sync::Lazy;
static EMPTY: Lazy<std::collections::HashMap<String, serde_yaml::Value>> =
Lazy::new(std::collections::HashMap::new);
use std::sync::LazyLock;
static EMPTY: LazyLock<std::collections::HashMap<String, serde_yaml::Value>> =
LazyLock::new(std::collections::HashMap::new);
&EMPTY
}
/// Returns a mutable reference to the outputs mapping.
///
/// # Panics
/// # Returns
///
/// Panics with "outputs_mut() not implemented for this plugin".
fn outputs_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
panic!("outputs_mut() not implemented for this plugin")
/// A mutable reference to the outputs `HashMap`.
///
/// # Errors
///
/// Returns an error if the plugin does not support mutable outputs.
fn outputs_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
anyhow::bail!("outputs_mut() not supported by this plugin")
}
/// Returns a reference to the options mapping.
@@ -437,19 +470,25 @@ where
///
/// An empty `HashMap` (default implementation).
fn options(&self) -> &std::collections::HashMap<String, serde_yaml::Value> {
use once_cell::sync::Lazy;
static EMPTY: Lazy<std::collections::HashMap<String, serde_yaml::Value>> =
Lazy::new(std::collections::HashMap::new);
use std::sync::LazyLock;
static EMPTY: LazyLock<std::collections::HashMap<String, serde_yaml::Value>> =
LazyLock::new(std::collections::HashMap::new);
&EMPTY
}
/// Returns a mutable reference to the options mapping.
///
/// # Panics
/// # Returns
///
/// Panics with "options_mut() not implemented for this plugin".
fn options_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
panic!("options_mut() not implemented for this plugin")
/// A mutable reference to the options `HashMap`.
///
/// # Errors
///
/// Returns an error if the plugin does not support mutable options.
fn options_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
anyhow::bail!("options_mut() not supported by this plugin")
}
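The panic-to-`Result` migration for `outputs_mut`/`options_mut` above follows a common pattern: a default trait method that fails recoverably, overridden only by plugins that actually hold the map. A simplified sketch using `Result<_, String>` in place of `anyhow::Result` (trait and type names here are illustrative, not the crate's):

```rust
use std::collections::HashMap;

// Default trait method returns Err instead of panicking, as in the diff;
// only plugins with real mutable state override it.
trait Plugin {
    fn outputs_mut(&mut self) -> Result<&mut HashMap<String, String>, String> {
        Err("outputs_mut() not supported by this plugin".to_string())
    }
}

struct ReadOnlyPlugin;
impl Plugin for ReadOnlyPlugin {}

struct ConfigurablePlugin {
    outputs: HashMap<String, String>,
}

impl Plugin for ConfigurablePlugin {
    fn outputs_mut(&mut self) -> Result<&mut HashMap<String, String>, String> {
        Ok(&mut self.outputs)
    }
}

fn main() {
    // Callers can now recover instead of aborting the process.
    let mut ro = ReadOnlyPlugin;
    assert!(ro.outputs_mut().is_err());

    let mut cfg = ConfigurablePlugin { outputs: HashMap::new() };
    cfg.outputs_mut().unwrap().insert("mime_type".into(), "file_mime".into());
    println!("configured outputs: {}", cfg.outputs.len());
}
```

Compared with the old `panic!` default, the cost is a `?` or `unwrap()` at each call site, but unsupported operations become a normal error path.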
/// Gets the default output names this plugin can produce.
@@ -462,6 +501,82 @@ where
vec![self.meta_type().to_string()]
}
/// Returns a description of this plugin for display in config templates.
///
/// # Returns
///
/// A description string (empty by default).
fn description(&self) -> &str {
""
}
/// Returns true if this plugin can execute concurrently with other
/// parallel-safe plugins.
///
/// Plugins that do significant per-chunk work (hashing, tokenization,
/// piping to child processes) should return true. The MetaService will
/// run all parallel-safe plugins in separate threads per phase, then
/// process results sequentially.
fn parallel_safe(&self) -> bool {
false
}
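The `parallel_safe` contract described above (fan plugins out to threads per phase, then process results sequentially) can be sketched with `std::thread::scope`. This is a simplified model, not the MetaService implementation; `ChunkPlugin`, `ByteCounter`, and `run_phase` are hypothetical names:

```rust
use std::thread;

// Run every parallel-safe plugin's update() for one chunk on its own
// scoped thread, then join in order so results are handled sequentially.
trait ChunkPlugin: Send {
    fn update(&mut self, data: &[u8]) -> usize; // e.g. bytes hashed or tokens counted
}

struct ByteCounter(usize);

impl ChunkPlugin for ByteCounter {
    fn update(&mut self, data: &[u8]) -> usize {
        self.0 += data.len();
        self.0
    }
}

fn run_phase(plugins: &mut [Box<dyn ChunkPlugin>], chunk: &[u8]) -> Vec<usize> {
    thread::scope(|s| {
        let handles: Vec<_> = plugins
            .iter_mut()
            .map(|p| s.spawn(move || p.update(chunk)))
            .collect();
        // Joining in declaration order keeps result processing deterministic.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let mut plugins: Vec<Box<dyn ChunkPlugin>> =
        vec![Box::new(ByteCounter(0)), Box::new(ByteCounter(10))];
    let results = run_phase(&mut plugins, b"hello");
    assert_eq!(results, vec![5, 15]);
    println!("phase results: {results:?}");
}
```

Scoped threads let each worker borrow its plugin mutably without `Arc`, which is why the trait needs the `Send` supertrait the diff adds to `MetaPlugin`.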
/// Builds the schema for this plugin from its options and outputs.
///
/// Default implementation infers option types from YAML values and
/// collects enabled outputs.
///
/// # Returns
///
/// A `PluginSchema` describing this plugin's configuration.
fn schema(&self) -> crate::common::schema::PluginSchema {
use crate::common::schema::{OptionSchema, OptionType, OutputSchema, PluginSchema};
let options: Vec<OptionSchema> = self
.options()
.iter()
.map(|(key, value)| {
let option_type = OptionType::from_yaml_value(value);
let (default, required) = if value.is_null() {
(None, true)
} else {
(Some(value.clone()), false)
};
OptionSchema {
name: key.clone(),
option_type,
default,
required,
}
})
.collect();
let mut outputs: Vec<OutputSchema> = Vec::new();
for (key, value) in self.outputs() {
if !value.is_null() {
outputs.push(OutputSchema {
name: key.clone(),
description: key.clone(),
});
}
}
if outputs.is_empty() {
for output_name in self.default_outputs() {
outputs.push(OutputSchema {
name: output_name.clone(),
description: output_name,
});
}
}
PluginSchema {
name: self.meta_type().to_string(),
description: self.description().to_string(),
options,
outputs,
}
}
/// Method to downcast to concrete type (for checking finalization state).
///
/// # Returns
@@ -473,11 +588,22 @@ where
{
self
}
/// Sets the save_meta callback for this plugin.
///
/// Called by MetaService to wire the plugin to the metadata storage.
fn set_save_meta(&mut self, _save_meta: SaveMetaFn) {}
/// Saves a metadata entry via the save_meta callback.
///
/// Plugins call this during initialize/update/finalize to persist metadata.
fn save_meta(&self, _name: &str, _value: &str) {}
}
/// Global registry for meta plugins.
static META_PLUGIN_REGISTRY: Lazy<Mutex<HashMap<MetaPluginType, PluginConstructor>>> =
Lazy::new(|| Mutex::new(HashMap::new()));
static META_PLUGIN_REGISTRY: std::sync::LazyLock<
Mutex<HashMap<MetaPluginType, PluginConstructor>>,
> = std::sync::LazyLock::new(|| Mutex::new(HashMap::new()));
/// Register a meta plugin with the global registry.
///
@@ -485,23 +611,45 @@ static META_PLUGIN_REGISTRY: Lazy<Mutex<HashMap<MetaPluginType, PluginConstructo
///
/// * `meta_plugin_type` - The type of the meta plugin to register.
/// * `constructor` - The constructor function for creating plugin instances.
pub fn register_meta_plugin(meta_plugin_type: MetaPluginType, constructor: PluginConstructor) {
pub fn register_meta_plugin(
meta_plugin_type: MetaPluginType,
constructor: PluginConstructor,
) -> anyhow::Result<()> {
META_PLUGIN_REGISTRY
.lock()
.unwrap()
.map_err(|e| anyhow::anyhow!("plugin registry poisoned: {e}"))?
.insert(meta_plugin_type, constructor);
Ok(())
}
pub fn get_meta_plugin(
meta_plugin_type: MetaPluginType,
options: Option<std::collections::HashMap<String, serde_yaml::Value>>,
outputs: Option<std::collections::HashMap<String, serde_yaml::Value>>,
) -> Box<dyn MetaPlugin> {
let registry = META_PLUGIN_REGISTRY.lock().unwrap();
) -> anyhow::Result<Box<dyn MetaPlugin>> {
get_meta_plugin_with_save(meta_plugin_type, options, outputs, None)
}
/// Creates a meta plugin instance with an optional save_meta callback.
///
/// If `save_meta` is provided, it is wired to the plugin so it can
/// store metadata directly during initialize/update/finalize.
pub fn get_meta_plugin_with_save(
meta_plugin_type: MetaPluginType,
options: Option<std::collections::HashMap<String, serde_yaml::Value>>,
outputs: Option<std::collections::HashMap<String, serde_yaml::Value>>,
save_meta: Option<SaveMetaFn>,
) -> anyhow::Result<Box<dyn MetaPlugin>> {
let registry = META_PLUGIN_REGISTRY
.lock()
.map_err(|e| anyhow::anyhow!("plugin registry poisoned: {e}"))?;
if let Some(constructor) = registry.get(&meta_plugin_type) {
return constructor(options, outputs);
let mut plugin = constructor(options, outputs);
if let Some(sm) = save_meta {
plugin.set_save_meta(sm);
}
return Ok(plugin);
}
// Fallback for unknown plugins
panic!("Meta plugin {meta_plugin_type:?} not registered");
anyhow::bail!("Meta plugin {meta_plugin_type:?} not registered")
}
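The registry changes above (std `LazyLock` instead of `once_cell::Lazy`, poisoned-lock errors instead of `unwrap`, `bail!` instead of `panic!`) combine into a pattern that can be shown in isolation. A std-only sketch with `String` errors standing in for `anyhow::Result` and a trivial constructor type:

```rust
use std::collections::HashMap;
use std::sync::{LazyLock, Mutex};

// Simplified stand-in for the plugin constructor signature.
type Constructor = fn() -> String;

// Global registry guarded by a Mutex, initialized lazily via std's LazyLock.
static REGISTRY: LazyLock<Mutex<HashMap<&'static str, Constructor>>> =
    LazyLock::new(|| Mutex::new(HashMap::new()));

fn register(name: &'static str, ctor: Constructor) -> Result<(), String> {
    REGISTRY
        .lock()
        .map_err(|e| format!("plugin registry poisoned: {e}"))? // no unwrap
        .insert(name, ctor);
    Ok(())
}

fn get(name: &str) -> Result<String, String> {
    let registry = REGISTRY
        .lock()
        .map_err(|e| format!("plugin registry poisoned: {e}"))?;
    match registry.get(name) {
        Some(ctor) => Ok(ctor()),
        // Unknown plugins are an Err, not a panic.
        None => Err(format!("Meta plugin {name:?} not registered")),
    }
}

fn main() {
    register("tokens", || "TokensMetaPlugin".to_string()).unwrap();
    assert_eq!(get("tokens").unwrap(), "TokensMetaPlugin");
    assert!(get("magic").is_err());
    println!("registry lookups behaved as expected");
}
```

With `#[ctor::ctor]` registration as in this diff, the `.expect(...)` at each registration site is the one place a failure still aborts, which happens before any stream processing starts.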

View File

@@ -84,6 +84,14 @@ impl MetaPlugin for ReadRateMetaPlugin {
self.is_finalized = finalized;
}
fn set_save_meta(&mut self, save_meta: crate::meta_plugin::SaveMetaFn) {
self.base.set_save_meta(save_meta);
}
fn save_meta(&self, name: &str, value: &str) {
self.base.save_meta(name, value);
}
/// Finalizes the plugin, calculating the read rate.
///
/// Computes KB/s from bytes read and elapsed time. Outputs via mappings.
@@ -193,8 +201,10 @@ impl MetaPlugin for ReadRateMetaPlugin {
/// # Returns
///
/// Mutable reference to the outputs HashMap.
fn outputs_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.outputs_mut()
fn outputs_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.outputs_mut())
}
/// Returns the default output names for this plugin.
@@ -222,8 +232,10 @@ impl MetaPlugin for ReadRateMetaPlugin {
/// # Returns
///
/// Mutable reference to the options HashMap.
fn options_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.options_mut()
fn options_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.options_mut())
}
}
use crate::meta_plugin::register_meta_plugin;
@@ -233,5 +245,6 @@ use crate::meta_plugin::register_meta_plugin;
fn register_read_rate_plugin() {
register_meta_plugin(MetaPluginType::ReadRate, |options, outputs| {
Box::new(ReadRateMetaPlugin::new(options, outputs))
});
})
.expect("Failed to register ReadRateMetaPlugin");
}

View File

@@ -37,6 +37,14 @@ impl MetaPlugin for ReadTimeMetaPlugin {
self.is_finalized = finalized;
}
fn set_save_meta(&mut self, save_meta: crate::meta_plugin::SaveMetaFn) {
self.base.set_save_meta(save_meta);
}
fn save_meta(&self, name: &str, value: &str) {
self.base.save_meta(name, value);
}
fn finalize(&mut self) -> crate::meta_plugin::MetaPluginResponse {
// If already finalized, don't process again
if self.is_finalized {
@@ -97,8 +105,10 @@ impl MetaPlugin for ReadTimeMetaPlugin {
self.base.outputs()
}
fn outputs_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.outputs_mut()
fn outputs_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.outputs_mut())
}
fn default_outputs(&self) -> Vec<String> {
@@ -109,8 +119,10 @@ impl MetaPlugin for ReadTimeMetaPlugin {
self.base.options()
}
fn options_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.options_mut()
fn options_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.options_mut())
}
}
use crate::meta_plugin::register_meta_plugin;
@@ -120,5 +132,6 @@ use crate::meta_plugin::register_meta_plugin;
fn register_read_time_plugin() {
register_meta_plugin(MetaPluginType::ReadTime, |options, outputs| {
Box::new(ReadTimeMetaPlugin::new(options, outputs))
});
})
.expect("Failed to register ReadTimeMetaPlugin");
}

View File

@@ -70,6 +70,14 @@ impl MetaPlugin for ShellMetaPlugin {
self.is_finalized = finalized;
}
fn set_save_meta(&mut self, save_meta: crate::meta_plugin::SaveMetaFn) {
self.base.set_save_meta(save_meta);
}
fn save_meta(&self, name: &str, value: &str) {
self.base.save_meta(name, value);
}
/// Finalizes the plugin without processing data.
///
/// For this plugin, finalization is handled in `initialize`, so this returns empty metadata.
@@ -194,8 +202,10 @@ impl MetaPlugin for ShellMetaPlugin {
/// # Returns
///
/// * `&mut HashMap<String, serde_yaml::Value>` - Mutable outputs map.
fn outputs_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.outputs_mut()
fn outputs_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.outputs_mut())
}
/// Returns the default output names for this plugin.
@@ -221,8 +231,10 @@ impl MetaPlugin for ShellMetaPlugin {
/// # Returns
///
/// * `&mut HashMap<String, serde_yaml::Value>` - Mutable options map.
fn options_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.options_mut()
fn options_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.options_mut())
}
}
/// Registers the shell meta plugin with the global registry.
@@ -236,5 +248,6 @@ use crate::meta_plugin::register_meta_plugin;
fn register_shell_plugin() {
register_meta_plugin(MetaPluginType::Shell, |options, outputs| {
Box::new(ShellMetaPlugin::new(options, outputs))
});
})
.expect("Failed to register ShellMetaPlugin");
}

View File

@@ -35,6 +35,14 @@ impl MetaPlugin for ShellPidMetaPlugin {
self.is_finalized = finalized;
}
fn set_save_meta(&mut self, save_meta: crate::meta_plugin::SaveMetaFn) {
self.base.set_save_meta(save_meta);
}
fn save_meta(&self, name: &str, value: &str) {
self.base.save_meta(name, value);
}
fn finalize(&mut self) -> crate::meta_plugin::MetaPluginResponse {
// If already finalized, don't process again
if self.is_finalized {
@@ -109,16 +117,20 @@ impl MetaPlugin for ShellPidMetaPlugin {
self.base.outputs()
}
fn outputs_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.outputs_mut()
fn outputs_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.outputs_mut())
}
fn options(&self) -> &std::collections::HashMap<String, serde_yaml::Value> {
self.base.options()
}
fn options_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.options_mut()
fn options_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.options_mut())
}
}
use crate::meta_plugin::register_meta_plugin;
@@ -128,5 +140,6 @@ use crate::meta_plugin::register_meta_plugin;
fn register_shell_pid_plugin() {
register_meta_plugin(MetaPluginType::ShellPid, |options, outputs| {
Box::new(ShellPidMetaPlugin::new(options, outputs))
});
})
.expect("Failed to register ShellPidMetaPlugin");
}

View File

@@ -510,6 +510,14 @@ impl MetaPlugin for TextMetaPlugin {
self.is_finalized = finalized;
}
fn set_save_meta(&mut self, save_meta: crate::meta_plugin::SaveMetaFn) {
self.base.set_save_meta(save_meta);
}
fn save_meta(&self, name: &str, value: &str) {
self.base.save_meta(name, value);
}
/// Updates the plugin with new data chunk.
///
/// Accumulates data for binary detection (if pending) or text statistics.
@@ -769,8 +777,10 @@ impl MetaPlugin for TextMetaPlugin {
/// # Returns
///
/// A mutable reference to the `HashMap` of outputs.
fn outputs_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.outputs_mut()
fn outputs_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.outputs_mut())
}
/// Returns the default output names for this plugin.
@@ -803,8 +813,10 @@ impl MetaPlugin for TextMetaPlugin {
/// # Returns
///
/// A mutable reference to the `HashMap` of options.
fn options_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.options_mut()
fn options_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.options_mut())
}
}
use crate::meta_plugin::register_meta_plugin;
@@ -814,5 +826,6 @@ use crate::meta_plugin::register_meta_plugin;
fn register_text_plugin() {
register_meta_plugin(MetaPluginType::Text, |options, outputs| {
Box::new(TextMetaPlugin::new(options, outputs))
});
})
.expect("Failed to register TextMetaPlugin");
}

src/meta_plugin/tokens.rs Normal file
View File

@@ -0,0 +1,325 @@
use crate::common::PIPESIZE;
use crate::common::is_binary::is_binary;
use crate::meta_plugin::{MetaPlugin, MetaPluginResponse, MetaPluginType};
use crate::tokenizer::{TokenEncoding, get_tokenizer};
#[derive(Debug, Clone)]
pub struct TokensMetaPlugin {
/// Buffer for binary detection (up to PIPESIZE bytes).
buffer: Option<Vec<u8>>,
max_buffer_size: usize,
is_finalized: bool,
is_binary_content: Option<bool>,
/// Running token count accumulated across chunks.
token_count: usize,
/// UTF-8 boundary carry buffer.
utf8_buffer: Vec<u8>,
base: crate::meta_plugin::BaseMetaPlugin,
/// The tokenizer encoding.
encoding: TokenEncoding,
}
impl TokensMetaPlugin {
pub fn new(
options: Option<std::collections::HashMap<String, serde_yaml::Value>>,
outputs: Option<std::collections::HashMap<String, serde_yaml::Value>>,
) -> Self {
let mut base = crate::meta_plugin::BaseMetaPlugin::new();
base.initialize_plugin(&["token_count"], &options, &outputs);
// Set default options
let default_options = vec![
(
"token_detect_size",
serde_yaml::Value::Number(PIPESIZE.into()),
),
(
"encoding",
serde_yaml::Value::String("cl100k_base".to_string()),
),
];
for (key, value) in default_options {
if !base.options.contains_key(key) {
base.options.insert(key.to_string(), value);
}
}
let max_buffer_size = base
.options
.get("token_detect_size")
.and_then(|v| v.as_u64())
.unwrap_or(PIPESIZE as u64) as usize;
let encoding = base
.options
.get("encoding")
.and_then(|v| v.as_str())
.and_then(|s| s.parse::<TokenEncoding>().ok())
.unwrap_or_default();
Self {
buffer: Some(Vec::new()),
max_buffer_size,
is_finalized: false,
is_binary_content: None,
token_count: 0,
utf8_buffer: Vec::new(),
base,
encoding,
}
}
/// Tokenize a byte chunk, handling UTF-8 boundaries.
///
/// Combines with any pending UTF-8 carry bytes, converts to text,
/// and adds the token count to the running total.
///
/// Avoids unnecessary allocations when there is no pending UTF-8 carry
/// and the data is valid UTF-8.
fn count_tokens(&mut self, data: &[u8]) {
if data.is_empty() && self.utf8_buffer.is_empty() {
return;
}
let tokenizer = get_tokenizer(self.encoding);
if self.utf8_buffer.is_empty() {
// Fast path: no pending carry — try to use data directly
match std::str::from_utf8(data) {
Ok(text) => {
if !text.is_empty() {
self.token_count += tokenizer.count(text);
}
return;
}
Err(e) => {
let valid_up_to = e.valid_up_to();
if valid_up_to > 0 {
// Count the valid prefix without copying
let text =
std::str::from_utf8(&data[..valid_up_to]).expect("validated prefix");
self.token_count += tokenizer.count(text);
}
// Save invalid trailing bytes for next call
self.utf8_buffer.extend_from_slice(&data[valid_up_to..]);
return;
}
}
}
// Slow path: pending carry bytes — must build combined buffer
let mut combined = std::mem::take(&mut self.utf8_buffer);
combined.extend_from_slice(data);
match std::str::from_utf8(&combined) {
Ok(text) => {
if !text.is_empty() {
self.token_count += tokenizer.count(text);
}
}
Err(e) => {
let valid_up_to = e.valid_up_to();
if valid_up_to > 0 {
let text =
std::str::from_utf8(&combined[..valid_up_to]).expect("validated prefix");
self.token_count += tokenizer.count(text);
}
self.utf8_buffer.extend_from_slice(&combined[valid_up_to..]);
}
}
}
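The fast/slow paths in `count_tokens` above both hinge on `Utf8Error::valid_up_to`: decode the valid prefix now, carry the incomplete tail to the next chunk. A std-only sketch of just that carry technique, with decoded text standing in for token counting (`Utf8Chunker` is a hypothetical name):

```rust
// Carry incomplete UTF-8 sequences across arbitrary chunk boundaries,
// decoding each chunk's valid prefix immediately.
struct Utf8Chunker {
    carry: Vec<u8>,
    decoded: String,
}

impl Utf8Chunker {
    fn new() -> Self {
        Self { carry: Vec::new(), decoded: String::new() }
    }

    fn push(&mut self, data: &[u8]) {
        let mut combined = std::mem::take(&mut self.carry);
        combined.extend_from_slice(data);
        match std::str::from_utf8(&combined) {
            Ok(text) => self.decoded.push_str(text),
            Err(e) => {
                let valid = e.valid_up_to();
                self.decoded.push_str(
                    std::str::from_utf8(&combined[..valid]).expect("validated prefix"),
                );
                // Trailing bytes are an incomplete sequence; keep them for
                // the next chunk (as the utf8_buffer field does above).
                self.carry = combined[valid..].to_vec();
            }
        }
    }
}

fn main() {
    // "é" is two bytes (0xC3 0xA9); split it across two chunks.
    let bytes = "café".as_bytes();
    let mut chunker = Utf8Chunker::new();
    chunker.push(&bytes[..4]); // ends mid-"é": only "caf" is decodable
    assert_eq!(chunker.decoded, "caf");
    chunker.push(&bytes[4..]);
    assert_eq!(chunker.decoded, "café");
    println!("decoded: {}", chunker.decoded);
}
```

Like the plugin, this sketch assumes the trailing bytes are a truncated sequence rather than outright invalid UTF-8; genuinely invalid bytes would sit in `carry` until the final flush.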
/// Perform binary detection on the buffer.
fn detect_binary(&mut self, buffer: &[u8]) -> bool {
let result = is_binary(buffer);
self.is_binary_content = Some(result);
result
}
}
impl MetaPlugin for TokensMetaPlugin {
fn is_finalized(&self) -> bool {
self.is_finalized
}
fn set_finalized(&mut self, finalized: bool) {
self.is_finalized = finalized;
}
fn set_save_meta(&mut self, save_meta: crate::meta_plugin::SaveMetaFn) {
self.base.set_save_meta(save_meta);
}
fn save_meta(&self, name: &str, value: &str) {
self.base.save_meta(name, value);
}
fn update(&mut self, data: &[u8]) -> MetaPluginResponse {
if self.is_finalized {
return MetaPluginResponse {
metadata: Vec::new(),
is_finalized: true,
};
}
let mut metadata = Vec::new();
if self.is_binary_content.is_none() {
// Add data to the buffer
let should_detect = if let Some(ref mut buffer) = self.buffer {
let remaining = self.max_buffer_size.saturating_sub(buffer.len());
let to_take = std::cmp::min(data.len(), remaining);
buffer.extend_from_slice(&data[..to_take]);
buffer.len() >= std::cmp::min(1024, self.max_buffer_size)
} else {
false
};
if should_detect {
let buffer_data = self.buffer.as_ref().unwrap().clone();
let is_binary = self.detect_binary(&buffer_data);
if is_binary {
if let Some(md) = crate::meta_plugin::process_metadata_outputs(
"token_count",
serde_yaml::Value::Null,
self.base.outputs(),
) {
metadata.push(md);
}
self.buffer = None;
self.is_finalized = true;
return MetaPluginResponse {
metadata,
is_finalized: true,
};
}
// It's text — tokenize the full buffer (nothing was counted yet),
// then clear to avoid double-counting in finalize().
self.count_tokens(&buffer_data);
self.buffer = Some(Vec::new());
}
} else if self.is_binary_content == Some(false) {
self.count_tokens(data);
} else if self.is_binary_content == Some(true) {
self.is_finalized = true;
return MetaPluginResponse {
metadata: Vec::new(),
is_finalized: true,
};
}
MetaPluginResponse {
metadata,
is_finalized: self.is_finalized,
}
}
fn finalize(&mut self) -> MetaPluginResponse {
if self.is_finalized {
return MetaPluginResponse {
metadata: Vec::new(),
is_finalized: true,
};
}
let mut metadata = Vec::new();
// If binary detection hasn't completed, do it now
if self.is_binary_content.is_none()
&& let Some(buffer) = &self.buffer
&& !buffer.is_empty()
{
let buffer_data = buffer.clone();
let is_binary = self.detect_binary(&buffer_data);
if is_binary {
if let Some(md) = crate::meta_plugin::process_metadata_outputs(
"token_count",
serde_yaml::Value::Null,
self.base.outputs(),
) {
metadata.push(md);
}
self.buffer = None;
self.is_finalized = true;
return MetaPluginResponse {
metadata,
is_finalized: true,
};
}
}
// Tokenize any bytes in the buffer
if let Some(buffer) = &self.buffer {
let data = buffer.clone();
self.count_tokens(&data);
}
// Process any remaining UTF-8 bytes
if !self.utf8_buffer.is_empty() {
self.count_tokens(&[]);
}
// Emit token count
if let Some(md) = crate::meta_plugin::process_metadata_outputs(
"token_count",
serde_yaml::Value::String(self.token_count.to_string()),
self.base.outputs(),
) {
metadata.push(md);
}
self.buffer = None;
self.is_finalized = true;
MetaPluginResponse {
metadata,
is_finalized: true,
}
}
fn meta_type(&self) -> MetaPluginType {
MetaPluginType::Tokens
}
fn outputs(&self) -> &std::collections::HashMap<String, serde_yaml::Value> {
self.base.outputs()
}
fn outputs_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.outputs_mut())
}
fn default_outputs(&self) -> Vec<String> {
vec!["token_count".to_string()]
}
fn options(&self) -> &std::collections::HashMap<String, serde_yaml::Value> {
self.base.options()
}
fn options_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.options_mut())
}
fn parallel_safe(&self) -> bool {
true
}
}
use crate::meta_plugin::register_meta_plugin;
#[ctor::ctor]
fn register_tokens_plugin() {
register_meta_plugin(MetaPluginType::Tokens, |options, outputs| {
Box::new(TokensMetaPlugin::new(options, outputs))
})
.expect("Failed to register TokensMetaPlugin");
}


@@ -0,0 +1,173 @@
use crate::common::PIPESIZE;
use crate::meta_plugin::{
BaseMetaPlugin, MetaPlugin, MetaPluginResponse, MetaPluginType, process_metadata_outputs,
register_meta_plugin,
};
#[derive(Debug, Default)]
pub struct TreeMagicMiniMetaPlugin {
buffer: Vec<u8>,
max_buffer_size: usize,
is_finalized: bool,
base: BaseMetaPlugin,
}
impl TreeMagicMiniMetaPlugin {
pub fn new(
options: Option<std::collections::HashMap<String, serde_yaml::Value>>,
outputs: Option<std::collections::HashMap<String, serde_yaml::Value>>,
) -> TreeMagicMiniMetaPlugin {
let mut base = BaseMetaPlugin::new();
if let Some(opts) = options {
for (key, value) in opts {
base.options.insert(key, value);
}
}
let max_buffer_size = base
.options
.get("max_buffer_size")
.and_then(|v| v.as_u64())
.unwrap_or(PIPESIZE as u64) as usize;
base.outputs.insert(
"tree_magic_mime_type".to_string(),
serde_yaml::Value::String("tree_magic_mime_type".to_string()),
);
if let Some(outs) = outputs {
for (key, value) in outs {
base.outputs.insert(key, value);
}
}
TreeMagicMiniMetaPlugin {
buffer: Vec::new(),
max_buffer_size,
is_finalized: false,
base,
}
}
}
impl MetaPlugin for TreeMagicMiniMetaPlugin {
fn meta_type(&self) -> MetaPluginType {
MetaPluginType::TreeMagicMini
}
fn is_finalized(&self) -> bool {
self.is_finalized
}
fn set_finalized(&mut self, finalized: bool) {
self.is_finalized = finalized;
}
fn set_save_meta(&mut self, save_meta: crate::meta_plugin::SaveMetaFn) {
self.base.set_save_meta(save_meta);
}
fn save_meta(&self, name: &str, value: &str) {
self.base.save_meta(name, value);
}
fn update(&mut self, data: &[u8]) -> MetaPluginResponse {
if self.is_finalized {
return MetaPluginResponse {
metadata: Vec::new(),
is_finalized: true,
};
}
let remaining = self.max_buffer_size.saturating_sub(self.buffer.len());
let to_add = &data[..data.len().min(remaining)];
self.buffer.extend_from_slice(to_add);
if self.buffer.len() >= self.max_buffer_size {
let mime_type = tree_magic_mini::from_u8(&self.buffer);
self.is_finalized = true;
let metadata = process_metadata_outputs(
"tree_magic_mime_type",
serde_yaml::Value::String(mime_type.to_string()),
self.base.outputs(),
)
.map(|m| vec![m])
.unwrap_or_default();
return MetaPluginResponse {
metadata,
is_finalized: true,
};
}
MetaPluginResponse {
metadata: Vec::new(),
is_finalized: false,
}
}
fn finalize(&mut self) -> MetaPluginResponse {
if self.is_finalized {
return MetaPluginResponse {
metadata: Vec::new(),
is_finalized: true,
};
}
let mime_type = tree_magic_mini::from_u8(&self.buffer);
self.is_finalized = true;
let metadata = process_metadata_outputs(
"tree_magic_mime_type",
serde_yaml::Value::String(mime_type.to_string()),
self.base.outputs(),
)
.map(|m| vec![m])
.unwrap_or_default();
MetaPluginResponse {
metadata,
is_finalized: true,
}
}
fn outputs(&self) -> &std::collections::HashMap<String, serde_yaml::Value> {
self.base.outputs()
}
fn outputs_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.outputs_mut())
}
fn default_outputs(&self) -> Vec<String> {
vec!["tree_magic_mime_type".to_string()]
}
fn options(&self) -> &std::collections::HashMap<String, serde_yaml::Value> {
self.base.options()
}
fn options_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.options_mut())
}
fn parallel_safe(&self) -> bool {
true
}
}
#[ctor::ctor]
fn register_tree_magic_mini_plugin() {
register_meta_plugin(MetaPluginType::TreeMagicMini, |options, outputs| {
Box::new(TreeMagicMiniMetaPlugin::new(options, outputs))
})
.expect("Failed to register TreeMagicMiniMetaPlugin");
}


@@ -105,6 +105,14 @@ impl MetaPlugin for UserMetaPlugin {
MetaPluginType::User
}
fn set_save_meta(&mut self, save_meta: crate::meta_plugin::SaveMetaFn) {
self.base.set_save_meta(save_meta);
}
fn save_meta(&self, name: &str, value: &str) {
self.base.save_meta(name, value);
}
/// Returns a reference to the outputs mapping.
///
/// # Returns
@@ -119,8 +127,10 @@ impl MetaPlugin for UserMetaPlugin {
/// # Returns
///
/// A mutable reference to the `HashMap` of outputs.
fn outputs_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.outputs_mut()
fn outputs_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.outputs_mut())
}
/// Returns the default output names.
@@ -151,8 +161,10 @@ impl MetaPlugin for UserMetaPlugin {
/// # Returns
///
/// A mutable reference to the `HashMap` of options.
fn options_mut(&mut self) -> &mut std::collections::HashMap<String, serde_yaml::Value> {
self.base.options_mut()
fn options_mut(
&mut self,
) -> anyhow::Result<&mut std::collections::HashMap<String, serde_yaml::Value>> {
Ok(self.base.options_mut())
}
}
use crate::meta_plugin::register_meta_plugin;
@@ -162,5 +174,6 @@ use crate::meta_plugin::register_meta_plugin;
fn register_user_plugin() {
register_meta_plugin(MetaPluginType::User, |options, outputs| {
Box::new(UserMetaPlugin::new(options, outputs))
});
})
.expect("Failed to register UserMetaPlugin");
}


@@ -0,0 +1,77 @@
use anyhow::{Context, Result, anyhow};
use chrono::Utc;
use clap::Command;
use log::debug;
use std::collections::HashMap;
use std::fs;
use crate::client::KeepClient;
use crate::common::sanitize_ts_string;
use crate::config;
/// Export items to a `.keep.tar` archive via client.
///
/// Sends a request to the server's `/api/export` endpoint and
/// streams the response to a local tar file.
pub fn mode(
client: &KeepClient,
cmd: &mut Command,
settings: &config::Settings,
ids: &[i64],
tags: &[String],
) -> Result<()> {
// Validate: IDs XOR tags
if !ids.is_empty() && !tags.is_empty() {
cmd.error(
clap::error::ErrorKind::InvalidValue,
"Cannot use both IDs and tags with --export",
)
.exit();
}
if ids.is_empty() && tags.is_empty() {
cmd.error(
clap::error::ErrorKind::InvalidValue,
"Must provide either IDs or tags with --export",
)
.exit();
}
// Build the tar filename from the template {name}_{ts}.keep.tar, where
// name comes from --export-name (default "export") and ts is the current
// UTC timestamp, sanitized for use in a filename.
let dir_name = if let Some(ref name) = settings.export_name {
name.clone()
} else {
"export".to_string()
};
let now = Utc::now();
let ts_str = sanitize_ts_string(&now.format("%Y-%m-%dT%H:%M:%SZ").to_string());
let mut vars = HashMap::new();
vars.insert("name".to_string(), dir_name);
vars.insert("ts".to_string(), ts_str);
let basename = strfmt::strfmt(&settings.export_filename_format, &vars).map_err(|e| {
anyhow!(
"Invalid export filename format '{}': {}",
settings.export_filename_format,
e
)
})?;
let tar_filename = format!("{basename}.keep.tar");
client
.export_items_to_file(ids, tags, std::path::Path::new(&tar_filename))
.map_err(|e| anyhow!("Export failed: {e}"))?;
if !settings.quiet {
eprintln!("{tar_filename}");
}
debug!("CLIENT_EXPORT: Wrote items to {tar_filename}");
Ok(())
}
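The export path expands `export_filename_format` (e.g. `{name}_{ts}`) with the strfmt crate. A std-only sketch of that placeholder substitution, under the assumption that the template only uses known keys (unlike strfmt, this sketch does not error on unknown placeholders):

```rust
use std::collections::HashMap;

/// Std-only sketch of the {name}_{ts} expansion strfmt performs for the
/// export filename. strfmt additionally rejects unknown placeholders.
fn expand(template: &str, vars: &HashMap<&str, &str>) -> String {
    let mut out = template.to_string();
    for (key, value) in vars {
        // Replace each "{key}" occurrence with its value.
        out = out.replace(&format!("{{{key}}}"), value);
    }
    out
}

fn main() {
    let mut vars = HashMap::new();
    vars.insert("name", "export");
    vars.insert("ts", "2026-03-21T12-00-00Z");
    let basename = expand("{name}_{ts}", &vars);
    println!("{basename}.keep.tar");
}
```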


@@ -1,16 +1,17 @@
use crate::client::KeepClient;
use crate::compression_engine::CompressionType;
use crate::filter_plugin::FilterChain;
use crate::modes::common::{check_binary_tty, resolve_item_id};
use crate::services::compression_service::CompressionService;
use anyhow::Result;
use clap::Command;
use is_terminal::IsTerminal;
use log::debug;
use std::io::{Read, Write};
use std::str::FromStr;
pub fn mode(
client: &KeepClient,
_cmd: &mut Command,
cmd: &mut Command,
settings: &crate::config::Settings,
ids: &[i64],
tags: &[String],
@@ -18,78 +19,57 @@ pub fn mode(
) -> Result<(), anyhow::Error> {
debug!("CLIENT_GET: Getting item via remote server");
// Find the item ID
let item_id = if !ids.is_empty() {
ids[0]
} else if !tags.is_empty() {
// Find item by tags
let items = client.list_items(tags, "newest", 0, 1)?;
if items.is_empty() {
return Err(anyhow::anyhow!("No items found matching tags: {:?}", tags));
if !ids.is_empty() && !tags.is_empty() {
cmd.error(
clap::error::ErrorKind::InvalidValue,
"Cannot use both IDs and tags with --get; supply one or the other",
)
.exit();
}
items[0].id
} else {
// Get latest item
let items = client.list_items(&[], "newest", 0, 1)?;
if items.is_empty() {
return Err(anyhow::anyhow!("No items found"));
}
items[0].id
};
// Get item info to determine compression type
let item_id = resolve_item_id(client, ids, tags)?;
// Get item info for metadata
let item_info = client.get_item_info(item_id)?;
let metadata = &item_info.metadata;
// Get raw content from server
let (raw_bytes, compression) = client.get_item_content_raw(item_id)?;
// Get streaming reader for raw content
let (reader, compression) = client.get_item_content_stream(item_id)?;
let compression_type = CompressionType::from_str(&compression).unwrap_or(CompressionType::Raw);
// Check if binary content would be sent to TTY
let is_text = item_info
.metadata
.get("text")
.map(|v| v == "true")
.unwrap_or(false);
// Decompress through streaming readers
let mut decompressed_reader: Box<dyn Read> =
CompressionService::decompressing_reader(reader, &compression_type)?;
if std::io::stdout().is_terminal() && !is_text && !settings.force {
// Check if content is binary
let sample_len = std::cmp::min(raw_bytes.len(), 8192);
if crate::common::is_binary::is_binary(&raw_bytes[..sample_len]) {
return Err(anyhow::anyhow!(
"Refusing to output binary data to a terminal. Use --force to override."
));
}
}
// Binary detection: sample first chunk
let mut sample_buf = [0u8; crate::common::PIPESIZE];
let sample_len = decompressed_reader.read(&mut sample_buf)?;
check_binary_tty(metadata, &sample_buf[..sample_len], settings.force)?;
// Decompress locally
let compression_type = CompressionType::from_str(&compression).unwrap_or(CompressionType::None);
let decompressed = match compression_type {
CompressionType::GZip => {
use flate2::read::GzDecoder;
let mut decoder = GzDecoder::new(&raw_bytes[..]);
let mut content = Vec::new();
decoder.read_to_end(&mut content)?;
content
}
CompressionType::LZ4 => lz4_flex::decompress_size_prepended(&raw_bytes)
.map_err(|e| anyhow::anyhow!("LZ4 decompression failed: {}", e))?,
_ => raw_bytes,
};
// Apply filters if present
let output = if let Some(mut chain) = filter_chain {
let mut filtered = Vec::new();
chain.filter(&mut &decompressed[..], &mut filtered)?;
filtered
} else {
decompressed
};
// Stream to stdout
// If filters present, buffer through filter chain; otherwise stream directly
if let Some(mut chain) = filter_chain {
// Apply filter to sample first, then remaining
let mut output = Vec::new();
chain.filter(&mut &sample_buf[..sample_len], &mut output)?;
crate::common::stream_copy(&mut decompressed_reader, |chunk| {
chain.filter(&mut std::io::Cursor::new(chunk), &mut output)?;
Ok(())
})?;
let stdout = std::io::stdout();
let mut stdout = stdout.lock();
stdout.write_all(&output)?;
stdout.flush()?;
} else {
// Stream decompressed content to stdout
let stdout = std::io::stdout();
let mut stdout = stdout.lock();
stdout.write_all(&sample_buf[..sample_len])?;
crate::common::stream_copy(&mut decompressed_reader, |chunk| {
stdout.write_all(chunk)?;
Ok(())
})?;
stdout.flush()?;
}
Ok(())
}
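The streaming get path reads one `PIPESIZE` sample before writing anything, so `check_binary_tty` can refuse to dump binary data to a terminal. A minimal sketch of such a sample check, assuming a simple NUL-byte heuristic (the real `is_binary` helper and the metadata/`--force` handling may differ):

```rust
/// Sketch of a binary-sample heuristic: treat content as binary if the
/// first sampled chunk contains a NUL byte. Assumed heuristic for
/// illustration; the real check also consults stored metadata and --force.
fn looks_binary(sample: &[u8]) -> bool {
    sample.contains(&0u8)
}

fn main() {
    assert!(!looks_binary(b"plain text\n"));
    assert!(looks_binary(b"\x00\x01\x02"));
    println!("ok");
}
```

Sampling first also explains the two-phase write in the diff above: the sample bytes are emitted (or filtered) first, then the rest of the decompressed stream is copied chunk by chunk.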

src/modes/client/import.rs

@@ -0,0 +1,160 @@
use anyhow::{Context, Result, anyhow};
use clap::Command;
use log::debug;
use std::collections::HashMap;
use std::fs;
use std::io::Read;
use std::path::Path;
use crate::client::KeepClient;
use crate::compression_engine::CompressionType;
use crate::config;
use crate::modes::common::ImportMeta;
use std::str::FromStr;
/// Import items from a `.keep.tar` archive or legacy `.meta.yml` file via client.
///
/// For `.keep.tar` files, streams the archive to the server's `/api/import` endpoint.
/// For `.meta.yml` files, uses the legacy single-item import path.
pub fn mode(
client: &KeepClient,
cmd: &mut Command,
settings: &config::Settings,
import_path: &str,
) -> Result<()> {
if import_path.ends_with(".keep.tar") {
import_tar(client, cmd, settings, import_path)
} else if import_path.ends_with(".meta.yml") {
import_legacy(client, cmd, settings, import_path)
} else {
cmd.error(
clap::error::ErrorKind::InvalidValue,
format!("Unsupported import format: {}", import_path),
)
.exit();
}
}
/// Import from a `.keep.tar` archive via the server API.
fn import_tar(
client: &KeepClient,
_cmd: &mut Command,
settings: &config::Settings,
tar_path: &str,
) -> Result<()> {
let path = Path::new(tar_path);
let imported_ids = client
.import_tar_file(path)
.map_err(|e| anyhow!("Import failed: {e}"))?;
if !settings.quiet {
println!(
"KEEP: Imported {} item(s): {:?}",
imported_ids.len(),
imported_ids
);
}
debug!(
"CLIENT_IMPORT: Imported {} items from {}",
imported_ids.len(),
tar_path
);
Ok(())
}
/// Legacy single-item import from a `.meta.yml` file.
fn import_legacy(
client: &KeepClient,
cmd: &mut Command,
settings: &config::Settings,
meta_file: &str,
) -> Result<()> {
// Read and parse metadata
let meta_yaml = fs::read_to_string(meta_file)
.with_context(|| format!("Cannot read metadata file: {meta_file}"))?;
let import_meta: ImportMeta = serde_yaml::from_str(&meta_yaml)
.with_context(|| format!("Cannot parse metadata file: {meta_file}"))?;
// Validate compression type
CompressionType::from_str(&import_meta.compression).map_err(|_| {
anyhow!(
"Invalid compression type '{}' in metadata file",
import_meta.compression
)
})?;
debug!(
"CLIENT_IMPORT: Parsed meta: ts={}, compression={}, tags={:?}",
import_meta.ts, import_meta.compression, import_meta.tags
);
// Build query parameters
let ts_str = import_meta.ts.to_rfc3339();
let params = [
("compress".to_string(), "false".to_string()),
("meta".to_string(), "false".to_string()),
("tags".to_string(), import_meta.tags.join(",")),
(
"compression_type".to_string(),
import_meta.compression.clone(),
),
("ts".to_string(), ts_str),
];
let param_refs: Vec<(&str, &str)> = params
.iter()
.map(|(k, v)| (k.as_str(), v.as_str()))
.collect();
// Stream data to server without buffering entire file
let item_info = if let Some(ref data_file) = settings.import_data_file {
let mut reader = fs::File::open(data_file)
.with_context(|| format!("Cannot read data file: {}", data_file.display()))?;
client.post_stream("/api/item/", &mut reader, &param_refs)?
} else {
// For stdin, we must buffer the input, since stdin isn't seekable
// and post_stream may need to retry.
let mut buf = Vec::new();
std::io::stdin()
.read_to_end(&mut buf)
.context("Cannot read data from stdin")?;
if buf.is_empty() {
cmd.error(
clap::error::ErrorKind::InvalidValue,
"No data provided (empty stdin)",
)
.exit();
}
let mut cursor = std::io::Cursor::new(&buf);
client.post_stream("/api/item/", &mut cursor, &param_refs)?
};
let item_id = item_info.id;
debug!("CLIENT_IMPORT: Created item {} via server", item_id);
// Set uncompressed size if known from metadata
if let Some(size) = import_meta.uncompressed_size {
client.set_item_size(item_id, size as u64)?;
debug!("CLIENT_IMPORT: Set size to {}", size);
}
// Post metadata
if !import_meta.metadata.is_empty() {
client.post_metadata(item_id, &import_meta.metadata)?;
debug!(
"CLIENT_IMPORT: Set {} metadata entries",
import_meta.metadata.len()
);
}
if !settings.quiet {
println!(
"KEEP: Imported item {} tags: {:?}",
item_id, import_meta.tags
);
}
Ok(())
}


@@ -1,5 +1,8 @@
use crate::client::KeepClient;
use crate::modes::common::{OutputFormat, format_size, settings_output_format};
use crate::modes::common::{
DisplayItemInfo, OutputFormat, format_size, render_item_info_table, resolve_item_ids,
settings_output_format,
};
use clap::Command;
use log::debug;
@@ -13,50 +16,34 @@ pub fn mode(
debug!("CLIENT_INFO: Getting item info via remote server");
let output_format = settings_output_format(settings);
// If tags provided, find matching item first
let item_ids: Vec<i64> = if !tags.is_empty() {
let items = client.list_items(tags, "newest", 0, 1)?;
if items.is_empty() {
return Err(anyhow::anyhow!("No items found matching tags: {:?}", tags));
}
items.into_iter().map(|i| i.id).collect()
} else {
ids.to_vec()
};
let item_ids = resolve_item_ids(client, ids, tags)?;
for &id in &item_ids {
let item = client.get_item_info(id)?;
match output_format {
OutputFormat::Json => {
println!("{}", serde_json::to_string_pretty(&item)?);
}
OutputFormat::Yaml => {
println!("{}", serde_yaml::to_string(&item)?);
OutputFormat::Json | OutputFormat::Yaml => {
crate::modes::common::print_serialized(&item, &output_format)?;
}
OutputFormat::Table => {
use comfy_table::{Table, presets::UTF8_FULL};
let mut table = Table::new();
table.load_preset(UTF8_FULL);
let size_str = item
.size
let display = DisplayItemInfo {
id: item.id,
timestamp: item.ts.clone(),
path: String::new(),
stream_size: item
.uncompressed_size
.map(|s| format_size(s as u64, settings.human_readable))
.unwrap_or_else(|| "N/A".to_string());
table.add_row(vec!["ID".to_string(), item.id.to_string()]);
table.add_row(vec!["Time".to_string(), item.ts.clone()]);
table.add_row(vec!["Size".to_string(), size_str]);
table.add_row(vec!["Compression".to_string(), item.compression.clone()]);
table.add_row(vec!["Tags".to_string(), item.tags.join(", ")]);
for (key, value) in &item.metadata {
table.add_row(vec![format!("Meta: {}", key), value.clone()]);
}
println!("{table}");
.unwrap_or_else(|| "N/A".to_string()),
compression: item.compression.clone(),
file_size: String::new(),
tags: item.tags.clone(),
metadata: item
.metadata
.iter()
.map(|(k, v)| (k.clone(), v.clone()))
.collect(),
};
render_item_info_table(&display, &settings.table_config);
}
}
}


@@ -1,53 +1,69 @@
use crate::client::KeepClient;
use crate::modes::common::{OutputFormat, format_size, settings_output_format};
use crate::modes::common::{
ColumnType, OutputFormat, format_size, render_list_table_with_format, settings_output_format,
};
use clap::Command;
use log::debug;
use std::str::FromStr;
pub fn mode(
client: &KeepClient,
_cmd: &mut Command,
settings: &crate::config::Settings,
ids: &[i64],
tags: &[String],
) -> Result<(), anyhow::Error> {
debug!("CLIENT_LIST: Listing items via remote server");
let items = client.list_items(tags, "newest", 0, 100)?;
let items = client.list_items(ids, tags, "newest", 0, 100, &settings.meta_filter())?;
if settings.ids_only {
for item in &items {
println!("{}", item.id);
}
return Ok(());
}
let output_format = settings_output_format(settings);
match output_format {
OutputFormat::Json => {
println!("{}", serde_json::to_string_pretty(&items)?);
}
OutputFormat::Yaml => {
println!("{}", serde_yaml::to_string(&items)?);
OutputFormat::Json | OutputFormat::Yaml => {
crate::modes::common::print_serialized(&items, &output_format)?;
}
OutputFormat::Table => {
use comfy_table::{Table, presets::UTF8_FULL};
let mut table = Table::new();
table.load_preset(UTF8_FULL);
// Header
let headers = ["ID", "Time", "Size", "Compression", "Tags"];
table.set_header(headers.iter().map(|h| h.to_string()).collect::<Vec<_>>());
for item in &items {
let size_str = item
.size
let rows: Vec<Vec<String>> = items
.iter()
.map(|item| {
let mut row = Vec::new();
for column in &settings.list_format {
let col_type = ColumnType::from_str(&column.name).ok();
let cell = match col_type {
Some(ColumnType::Id) => item.id.to_string(),
Some(ColumnType::Time) => item.ts.clone(),
Some(ColumnType::Size) => item
.uncompressed_size
.map(|s| format_size(s as u64, settings.human_readable))
.unwrap_or_default();
table.add_row(vec![
item.id.to_string(),
item.ts.clone(),
size_str,
item.compression.clone(),
item.tags.join(", "),
]);
.unwrap_or_default(),
Some(ColumnType::Compression) => item.compression.clone(),
Some(ColumnType::Tags) => item.tags.join(" "),
Some(ColumnType::Meta) => {
let meta_key = column.name.strip_prefix("meta:");
match meta_key {
Some(key) => {
item.metadata.get(key).cloned().unwrap_or_default()
}
None => String::new(),
}
}
_ => String::new(),
};
row.push(cell);
}
row
})
.collect();
println!("{table}");
render_list_table_with_format(&settings.list_format, &rows, &settings.table_config);
}
}
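The `meta:` column handling above relies on `str::strip_prefix` rather than a regex (matching the changelog note about dropping the regex crate). A tiny self-contained sketch of that resolution:

```rust
/// Resolve a list-format column name: "meta:<key>" yields the metadata
/// key to look up; any other name yields None.
fn meta_key(column: &str) -> Option<&str> {
    column.strip_prefix("meta:")
}

fn main() {
    assert_eq!(meta_key("meta:digest_sha256"), Some("digest_sha256"));
    assert_eq!(meta_key("tags"), None);
    println!("ok");
}
```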


@@ -1,7 +1,10 @@
pub mod delete;
pub mod diff;
pub mod export;
pub mod get;
pub mod import;
pub mod info;
pub mod list;
pub mod save;
pub mod status;
pub mod update;


@@ -1,12 +1,15 @@
use crate::client::{ItemInfo, KeepClient};
use crate::client::KeepClient;
use crate::compression_engine::CompressionType;
use crate::config::Settings;
use crate::meta_plugin::SaveMetaFn;
use crate::modes::common::settings_compression_type;
use crate::services::ItemInfo;
use crate::services::compression_service::CompressionService;
use crate::services::meta_service::MetaService;
use anyhow::Result;
use clap::Command;
use is_terminal::IsTerminal;
use log::debug;
use sha2::{Digest, Sha256};
use std::collections::HashMap;
use std::io::{Read, Write};
use std::sync::{Arc, Mutex};
@@ -14,11 +17,14 @@ use std::sync::{Arc, Mutex};
/// Streaming save mode for client.
///
/// Uses three threads for true streaming with constant memory:
/// - Reader thread: reads stdin, tees to stdout, computes SHA-256,
/// - Reader thread: reads stdin, tees to stdout, runs meta plugins,
/// compresses data, writes to OS pipe
/// - Pipe: zero-copy transfer of compressed bytes between threads
/// - Streamer thread: reads from pipe, streams to server via chunked HTTP
///
/// Meta plugins run on the client side during streaming. Collected metadata
/// is sent to the server via a separate POST after streaming completes.
///
/// Memory usage is O(PIPESIZE) regardless of data size.
pub fn mode(
client: &KeepClient,
@@ -29,43 +35,48 @@ pub fn mode(
) -> Result<(), anyhow::Error> {
debug!("CLIENT_SAVE: Saving item via remote server (streaming)");
if tags.is_empty() {
tags.push("none".to_string());
}
crate::modes::common::ensure_default_tag(tags);
// Determine compression type from settings
let compression_type = settings_compression_type(cmd, settings);
let server_compress = matches!(compression_type, CompressionType::None);
let compression_type_str = compression_type.to_string();
// In client mode, the client always handles compression (even "raw").
// The server should never re-compress client data.
let server_compress = false;
// Shared metadata collection: plugins write here via save_meta closure
let collected_meta: Arc<Mutex<HashMap<String, String>>> = Arc::new(Mutex::new(HashMap::new()));
let meta_collector = collected_meta.clone();
let save_meta: SaveMetaFn = Arc::new(Mutex::new(move |name: &str, value: &str| {
if let Ok(mut map) = meta_collector.lock() {
map.insert(name.to_string(), value.to_string());
}
}));
// Create MetaService and get plugins (must happen before spawning reader thread)
let meta_service = MetaService::new(save_meta);
let mut plugins = meta_service.get_plugins(cmd, settings);
// Create OS pipe for streaming compressed bytes between threads
let (pipe_reader, pipe_writer) = os_pipe::pipe()?;
// Shared state for reader thread results
let shared = Arc::new(Mutex::new((0u64, String::new())));
let shared_reader = Arc::clone(&shared);
// Reader thread: stdin → tee(stdout) → hash → compress → pipe
// Reader thread: stdin → tee(stdout) → meta plugins → compress → pipe
let compression_type_clone = compression_type.clone();
let reader_handle = std::thread::spawn(move || -> Result<(u64, String)> {
let reader_handle = std::thread::spawn(move || -> Result<u64> {
let stdin = std::io::stdin();
let stdout = std::io::stdout();
let mut stdin_lock = stdin.lock();
let mut stdout_lock = stdout.lock();
let mut hasher = Sha256::new();
let mut total_bytes = 0u64;
let mut buffer = [0u8; 8192];
// Initialize meta plugins
meta_service.initialize_plugins(&mut plugins);
// Wrap pipe writer with appropriate compression
let mut compressor: Box<dyn Write> = match compression_type_clone {
CompressionType::GZip => {
use flate2::Compression;
use flate2::write::GzEncoder;
Box::new(GzEncoder::new(pipe_writer, Compression::default()))
}
CompressionType::LZ4 => Box::new(lz4_flex::frame::FrameEncoder::new(pipe_writer)),
_ => Box::new(pipe_writer),
};
let mut compressor: Box<dyn Write> =
CompressionService::compressing_writer(Box::new(pipe_writer), &compression_type_clone)?;
loop {
let n = stdin_lock.read(&mut buffer)?;
@@ -76,40 +87,46 @@ pub fn mode(
// Tee to stdout
stdout_lock.write_all(&buffer[..n])?;
// Update hash
hasher.update(&buffer[..n]);
// Feed chunk to meta plugins
meta_service.process_chunk(&mut plugins, &buffer[..n]);
total_bytes += n as u64;
// Compress and write to pipe
compressor.write_all(&buffer[..n])?;
}
// Finalize compression (flushes any buffered compressed data)
// Finalize meta plugins (digest, text, tokens produce final output here)
meta_service.finalize_plugins(&mut plugins);
// Explicitly flush and finalize compression before dropping.
compressor.flush()?;
drop(compressor);
// Pipe writer is now dropped (inside compressor), signaling EOF to streamer
let digest = format!("{:x}", hasher.finalize());
// Set shared state for main thread
let mut shared = shared_reader.lock().unwrap();
*shared = (total_bytes, digest.clone());
Ok((total_bytes, digest))
Ok(total_bytes)
});
// Streamer thread: reads compressed bytes from pipe → POST to server
let client_url = client.base_url().to_string();
let client_username = client.username().cloned();
let client_password = client.password().cloned();
let client_jwt = client.jwt().cloned();
let tags_clone = tags.clone();
let compression_type_str_clone = compression_type_str.clone();
let streamer_handle = std::thread::spawn(move || -> Result<ItemInfo> {
let streaming_client = KeepClient::new(&client_url, client_password)?;
let streaming_client =
KeepClient::new(&client_url, client_username, client_password, client_jwt)?;
let params = [
("compress".to_string(), server_compress.to_string()),
("meta".to_string(), "false".to_string()),
("tags".to_string(), tags_clone.join(",")),
// Always send compression_type when compress=false (client handled compression)
("compression_type".to_string(), compression_type_str_clone),
];
// Filter out empty params
let params: Vec<(String, String)> =
params.into_iter().filter(|(_, v)| !v.is_empty()).collect();
let param_refs: Vec<(&str, &str)> = params
.iter()
.map(|(k, v)| (k.as_str(), v.as_str()))
@@ -126,42 +143,34 @@ pub fn mode(
.map_err(|e| anyhow::anyhow!("Streamer thread panicked: {:?}", e))??;
// Wait for reader thread (should complete quickly after pipe is drained)
reader_handle
let uncompressed_size = reader_handle
.join()
.map_err(|e| anyhow::anyhow!("Reader thread panicked: {:?}", e))??;
// Read results from shared state
let (uncompressed_size, digest) = {
let shared = shared.lock().unwrap();
shared.clone()
};
// Build local metadata and send to server
// Merge plugin-collected metadata with CLI metadata
let mut local_metadata = metadata;
local_metadata.insert("digest_sha256".to_string(), digest);
local_metadata.insert(
"uncompressed_size".to_string(),
uncompressed_size.to_string(),
);
// Add hostname
if let Ok(hostname) = gethostname::gethostname().into_string() {
local_metadata.insert("hostname".to_string(), hostname.clone());
let short = hostname.split('.').next().unwrap_or(&hostname).to_string();
local_metadata.insert("hostname_short".to_string(), short);
// Add plugin-collected metadata (digest, hostname, text stats, etc.)
if let Ok(plugin_meta) = collected_meta.lock() {
for (k, v) in plugin_meta.iter() {
local_metadata.entry(k.clone()).or_insert_with(|| v.clone());
}
}
// Send uncompressed size to server (proper field, not metadata)
client.set_item_size(item_info.id, uncompressed_size)?;
// Send metadata to server
if !local_metadata.is_empty() {
client.post_metadata(item_info.id, &local_metadata)?;
}
// Print status to stderr
// Print status to stderr (item ID is known immediately from server response)
if !settings.quiet {
if std::io::stderr().is_terminal() {
eprintln!("KEEP: New item (streaming) tags: {}", tags.join(" "));
eprintln!("KEEP: New item: {} tags: {}", item_info.id, tags.join(" "));
} else {
eprintln!("KEEP: New item (streaming) tags: {tags:?}");
eprintln!("KEEP: New item: {} tags: {tags:?}", item_info.id);
}
}

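The save mode's reader/pipe/streamer layout can be sketched with std only, using a bounded channel in place of the OS pipe and omitting compression, meta plugins, and HTTP (all assumptions for illustration). Dropping the sender plays the role of dropping the pipe writer: it signals EOF to the consumer, and the bounded depth keeps memory O(constant) regardless of input size:

```rust
use std::sync::mpsc;
use std::thread;

/// Std-only sketch of the save pipeline's thread layout: a reader thread
/// pushes chunks into a bounded channel (standing in for os_pipe) and a
/// streamer thread drains it (standing in for the chunked HTTP POST).
fn stream_chunks(input: Vec<Vec<u8>>) -> (u64, Vec<u8>) {
    let (tx, rx) = mpsc::sync_channel::<Vec<u8>>(4);
    let reader = thread::spawn(move || -> u64 {
        let mut total = 0u64;
        for chunk in input {
            total += chunk.len() as u64;
            tx.send(chunk).expect("streamer hung up");
        }
        total // tx dropped here: EOF for the streamer, like the pipe writer
    });
    let streamer = thread::spawn(move || {
        let mut received = Vec::new();
        for chunk in rx {
            received.extend_from_slice(&chunk); // real code streams to server
        }
        received
    });
    let total = reader.join().expect("reader thread panicked");
    let received = streamer.join().expect("streamer thread panicked");
    (total, received)
}

fn main() {
    let (total, data) = stream_chunks(vec![b"abc".to_vec(), b"de".to_vec()]);
    assert_eq!(total, 5);
    assert_eq!(data, b"abcde");
    println!("ok");
}
```

The bounded `sync_channel(4)` mirrors the back-pressure property of the OS pipe: a slow streamer blocks the reader instead of letting chunks pile up in memory.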

@@ -2,6 +2,7 @@ use crate::client::KeepClient;
use crate::modes::common::OutputFormat;
use crate::modes::common::settings_output_format;
use clap::Command;
use comfy_table::{Attribute, Cell, Table};
use log::debug;
pub fn mode(
@@ -11,21 +12,78 @@ pub fn mode(
) -> Result<(), anyhow::Error> {
debug!("CLIENT_STATUS: Getting status from remote server");
let status = client.get_status()?;
let status_info = client.get_status()?;
let output_format = settings_output_format(settings);
match output_format {
OutputFormat::Json => {
println!("{}", serde_json::to_string_pretty(&status)?);
}
OutputFormat::Yaml => {
println!("{}", serde_yaml::to_string(&status)?);
OutputFormat::Json | OutputFormat::Yaml => {
crate::modes::common::print_serialized(&status_info, &output_format)?;
}
OutputFormat::Table => {
println!("Remote Server Status");
println!("====================");
println!("{}", serde_json::to_string_pretty(&status)?);
// Paths
let path_table =
crate::modes::common::build_path_table(&status_info.paths, &settings.table_config);
println!("PATHS:");
println!(
"{}",
crate::modes::common::trim_lines_end(&path_table.trim_fmt())
);
println!();
// Configured meta plugins
if let Some(ref configured) = status_info.configured_meta_plugins
&& !configured.is_empty()
{
let mut sorted = configured.clone();
sorted.sort_by(|a, b| a.name.cmp(&b.name));
let mut table =
crate::modes::common::create_table_with_config(&settings.table_config);
table.set_header(vec![
Cell::new("Plugin Name").add_attribute(Attribute::Bold),
Cell::new("Enabled").add_attribute(Attribute::Bold),
]);
for plugin in &sorted {
let enabled = status_info.enabled_meta_plugins.contains(&plugin.name);
table.add_row(vec![
plugin.name.clone(),
if enabled { "Yes" } else { "No" }.to_string(),
]);
}
println!("META PLUGINS:");
println!(
"{}",
crate::modes::common::trim_lines_end(&table.trim_fmt())
);
println!();
}
// Compression
if !status_info.compression.is_empty() {
let mut table =
crate::modes::common::create_table_with_config(&settings.table_config);
table.set_header(vec![
Cell::new("Type").add_attribute(Attribute::Bold),
Cell::new("Found").add_attribute(Attribute::Bold),
Cell::new("Default").add_attribute(Attribute::Bold),
Cell::new("Binary").add_attribute(Attribute::Bold),
]);
for comp in &status_info.compression {
table.add_row(vec![
comp.compression_type.clone(),
if comp.found { "Yes" } else { "No" }.to_string(),
if comp.default { "Yes" } else { "No" }.to_string(),
comp.binary.clone(),
]);
}
println!("COMPRESSION:");
println!(
"{}",
crate::modes::common::trim_lines_end(&table.trim_fmt())
);
println!();
}
}
}

src/modes/client/update.rs

@@ -0,0 +1,102 @@
use crate::client::KeepClient;
use crate::config::Settings;
use anyhow::Result;
use clap::Command;
use log::debug;
use std::collections::HashMap;
/// Client update mode: runs meta plugins on the server for an existing item.
///
/// Sends the list of plugin names (from --meta-plugin config) and any direct
/// metadata (--meta key=value) to the server. The server reads the stored file,
/// runs the specified plugins, and stores the results.
pub fn mode(
client: &KeepClient,
cmd: &mut Command,
settings: &Settings,
ids: &mut [i64],
tags: &mut [String],
) -> Result<(), anyhow::Error> {
debug!("CLIENT_UPDATE: Updating item via remote server");
if ids.len() != 1 {
cmd.error(
clap::error::ErrorKind::InvalidValue,
"--update requires exactly one numeric ID",
)
.exit();
}
let item_id = ids[0];
// Collect plugin names from settings (--meta-plugin config)
let plugin_names: Vec<String> = settings
.meta_plugins_names()
.into_iter()
.flat_map(|s| {
s.split(',')
.map(|p| p.trim().to_string())
.collect::<Vec<_>>()
})
.filter(|p| !p.is_empty())
.collect();
// Collect direct metadata from --meta flags
let metadata: HashMap<String, String> = settings
.meta
.iter()
.filter_map(|(k, v)| v.as_ref().map(|val| (k.clone(), val.clone())))
.collect();
// Build query params
let mut params: Vec<(String, String)> = Vec::new();
if !plugin_names.is_empty() {
params.push(("plugins".to_string(), plugin_names.join(",")));
}
if !metadata.is_empty() {
let meta_json = serde_json::to_string(&metadata)?;
params.push(("metadata".to_string(), meta_json));
}
if !tags.is_empty() {
params.push(("tags".to_string(), tags.join(",")));
}
// Nothing to update
if params.is_empty() {
if !settings.quiet {
eprintln!("KEEP: No changes specified for item {item_id}");
}
return Ok(());
}
let param_refs: Vec<(&str, &str)> = params
.iter()
.map(|(k, v)| (k.as_str(), v.as_str()))
.collect();
let url_path = format!("/api/item/{item_id}/update");
// POST to update endpoint
let _item_info = client.post_bytes(&url_path, &[], &param_refs)?;
if !settings.quiet {
let mut parts = Vec::new();
if !plugin_names.is_empty() {
parts.push(format!("plugins: {}", plugin_names.join(", ")));
}
if !metadata.is_empty() {
parts.push(format!("{} metadata", metadata.len()));
}
if !tags.is_empty() {
parts.push(format!("tags: {}", tags.join(" ")));
}
let action = parts.join(", ");
eprintln!("KEEP: Updated item {item_id} ({action})");
}
Ok(())
}


@@ -1,3 +1,4 @@
use crate::common::status::PathInfo;
use crate::compression_engine::CompressionType;
/// Common utilities shared across different modes in the Keep application.
///
@@ -15,11 +16,13 @@ use crate::compression_engine::CompressionType;
/// ```
use crate::config;
use crate::meta_plugin::MetaPluginType;
use anyhow::{Result, anyhow};
use chrono::{DateTime, Utc};
use clap::Command;
use clap::error::ErrorKind;
use comfy_table::{ContentArrangement, Table};
use comfy_table::{Attribute, Cell, ContentArrangement, Table};
use log::debug;
use regex::Regex;
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
use std::env;
use std::io::IsTerminal;
@@ -52,38 +55,18 @@ pub enum OutputFormat {
Yaml,
}
/// Extracts metadata from KEEP_META_* environment variables.
///
/// Scans environment for variables prefixed with KEEP_META_ and extracts
/// key-value pairs for initial item metadata. Ignores KEEP_META_PLUGINS.
///
/// # Returns
///
`HashMap<String, String>` - Metadata from environment variables, with the `KEEP_META_` prefix stripped from each key.
///
/// # Errors
///
Never fails; non-matching variables and `KEEP_META_PLUGINS` are silently ignored.
///
/// # Examples
///
/// ```ignore
/// use std::env;
/// env::set_var("KEEP_META_COMMAND", "ls -la");
/// let meta = keep::modes::common::get_meta_from_env();
/// assert_eq!(meta.get("COMMAND"), Some(&"ls -la".to_string()));
/// ```
pub const IMPORT_FORMAT_ERROR: &str =
"Unsupported import format: {} (expected .keep.tar or .meta.yml)";
pub fn get_meta_from_env() -> HashMap<String, String> {
debug!("COMMON: Getting meta from KEEP_META_*");
let re = Regex::new(r"^KEEP_META_(.+)$").unwrap();
let mut meta_env: HashMap<String, String> = HashMap::new();
const PREFIX: &str = "KEEP_META_";
for (key, value) in env::vars() {
if let Some(meta_name_caps) = re.captures(key.as_str()) {
let name = String::from(meta_name_caps.get(1).unwrap().as_str());
// Ignore KEEP_META_PLUGINS
if name != "PLUGINS" {
debug!("COMMON: Found meta: {}={}", name.clone(), value.clone());
meta_env.insert(name, value.clone());
if let Some(name) = key.strip_prefix(PREFIX) {
if !name.is_empty() && name != "PLUGINS" {
debug!("COMMON: Found meta: {}={}", name, value);
meta_env.insert(name.to_string(), value);
}
}
}
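The `strip_prefix` rewrite above can be exercised in isolation. This is a sketch under assumptions: `meta_from_pairs` is a hypothetical name, and the scanning logic is lifted out over an injected iterator so it can be tested without mutating the process environment.

```rust
use std::collections::HashMap;

// Sketch of the strip_prefix pattern that replaced the regex: keep entries
// whose key starts with KEEP_META_, skipping the reserved PLUGINS name.
fn meta_from_pairs(pairs: impl Iterator<Item = (String, String)>) -> HashMap<String, String> {
    const PREFIX: &str = "KEEP_META_";
    let mut meta = HashMap::new();
    for (key, value) in pairs {
        if let Some(name) = key.strip_prefix(PREFIX) {
            if !name.is_empty() && name != "PLUGINS" {
                meta.insert(name.to_string(), value);
            }
        }
    }
    meta
}

fn main() {
    let pairs = vec![
        ("KEEP_META_COMMAND".to_string(), "ls -la".to_string()),
        ("KEEP_META_PLUGINS".to_string(), "env".to_string()),
        ("PATH".to_string(), "/usr/bin".to_string()),
    ];
    let meta = meta_from_pairs(pairs.into_iter());
    assert_eq!(meta.get("COMMAND"), Some(&"ls -la".to_string()));
    assert!(!meta.contains_key("PLUGINS"));
    assert!(!meta.contains_key("PATH"));
    println!("ok");
}
```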
@@ -206,9 +189,10 @@ pub fn settings_meta_plugin_types(
// Try to find the MetaPluginType by meta name
let mut found = false;
for meta_plugin_type in MetaPluginType::iter() {
let meta_plugin =
crate::meta_plugin::get_meta_plugin(meta_plugin_type.clone(), None, None);
if meta_plugin.meta_type().to_string() == trimmed_name {
if let Ok(meta_plugin) =
crate::meta_plugin::get_meta_plugin(meta_plugin_type.clone(), None, None)
&& meta_plugin.meta_type().to_string() == trimmed_name
{
meta_plugin_types.push(meta_plugin_type);
found = true;
break;
@@ -336,26 +320,8 @@ pub fn trim_lines_end(s: &str) -> String {
/// let mut table = create_table(true);
/// table.add_row(vec!["Header1", "Header2"]);
/// ```
pub fn create_table(use_styling: bool) -> Table {
let mut table = Table::new();
table.set_content_arrangement(ContentArrangement::Dynamic);
if use_styling {
if std::io::stdout().is_terminal() {
table
.load_preset(comfy_table::presets::UTF8_FULL)
.apply_modifier(comfy_table::modifiers::UTF8_SOLID_INNER_BORDERS);
} else {
table.load_preset(comfy_table::presets::ASCII_FULL);
}
} else {
table.load_preset(comfy_table::presets::NOTHING);
}
if !std::io::stdout().is_terminal() {
table.force_no_tty();
}
table
pub fn create_table(_use_styling: bool) -> Table {
create_table_with_config(&crate::config::TableConfig::default())
}
/// Creates a table configured from application table settings.
@@ -446,3 +412,292 @@ pub fn create_table_with_config(table_config: &crate::config::TableConfig) -> Ta
table
}
/// Display data for a single item's detail view (used by --info).
pub struct DisplayItemInfo {
pub id: i64,
pub timestamp: String,
pub path: String,
pub stream_size: String,
pub compression: String,
pub file_size: String,
pub tags: Vec<String>,
pub metadata: Vec<(String, String)>,
}
/// Renders item detail table. Shared by local and client info modes.
pub fn render_item_info_table(info: &DisplayItemInfo, table_config: &config::TableConfig) {
use comfy_table::{Attribute, Cell};
let mut table = create_table_with_config(table_config);
table.add_row(vec![
Cell::new("ID").add_attribute(Attribute::Bold),
Cell::new(info.id.to_string()),
]);
table.add_row(vec![
Cell::new("Time").add_attribute(Attribute::Bold),
Cell::new(&info.timestamp),
]);
table.add_row(vec![
Cell::new("Size").add_attribute(Attribute::Bold),
Cell::new(&info.stream_size),
]);
table.add_row(vec![
Cell::new("Compression").add_attribute(Attribute::Bold),
Cell::new(&info.compression),
]);
table.add_row(vec![
Cell::new("Tags").add_attribute(Attribute::Bold),
Cell::new(info.tags.join(" ")),
]);
for (key, value) in &info.metadata {
table.add_row(vec![
Cell::new(format!("Meta: {key}")).add_attribute(Attribute::Bold),
Cell::new(value),
]);
}
println!("{}", trim_lines_end(&table.trim_fmt()));
}
/// Renders list table with column format from config. Shared by local and client list modes.
pub fn render_list_table_with_format(
columns: &[config::ColumnConfig],
rows: &[Vec<String>],
table_config: &config::TableConfig,
) {
let mut table = create_table_with_config(table_config);
let header_cells: Vec<Cell> = columns
.iter()
.map(|col| Cell::new(&col.label).add_attribute(Attribute::Bold))
.collect();
table.set_header(header_cells);
for row in rows {
let cells: Vec<Cell> = row
.iter()
.enumerate()
.map(|(i, val)| {
let mut cell = Cell::new(val);
if let Some(col) = columns.get(i) {
if let Some(ref fg) = col.fg_color {
cell = apply_color(cell, fg, true);
}
if let Some(ref bg) = col.bg_color {
cell = apply_color(cell, bg, false);
}
for attr in &col.attributes {
cell = apply_table_attribute(cell, attr);
}
}
cell
})
.collect();
table.add_row(cells);
}
println!("{}", trim_lines_end(&table.trim_fmt()));
}
/// Applies config TableColor to a comfy-table Cell.
pub fn apply_color(mut cell: Cell, color: &config::TableColor, is_foreground: bool) -> Cell {
use comfy_table::Color;
let comfy_color = match color {
config::TableColor::Black => Color::Black,
config::TableColor::Red => Color::Red,
config::TableColor::Green => Color::Green,
config::TableColor::Yellow => Color::Yellow,
config::TableColor::Blue => Color::Blue,
config::TableColor::Magenta => Color::Magenta,
config::TableColor::Cyan => Color::Cyan,
config::TableColor::White => Color::White,
config::TableColor::Gray => Color::Grey,
config::TableColor::DarkRed => Color::DarkRed,
config::TableColor::DarkGreen => Color::DarkGreen,
config::TableColor::DarkYellow => Color::DarkYellow,
config::TableColor::DarkBlue => Color::DarkBlue,
config::TableColor::DarkMagenta => Color::DarkMagenta,
config::TableColor::DarkCyan => Color::DarkCyan,
config::TableColor::Rgb(r, g, b) => Color::Rgb {
r: *r,
g: *g,
b: *b,
},
};
if is_foreground {
cell = cell.fg(comfy_color);
} else {
cell = cell.bg(comfy_color);
}
cell
}
/// Ensures tags has at least one entry, adding "none" if empty.
pub fn ensure_default_tag(tags: &mut Vec<String>) {
if tags.is_empty() {
tags.push("none".to_string());
}
}
/// Prints a serializable value in JSON or YAML format based on output format.
///
/// Only handles Json and Yaml variants; Table should be handled separately.
pub fn print_serialized<T: serde::Serialize>(
value: &T,
format: &OutputFormat,
) -> anyhow::Result<()> {
match format {
OutputFormat::Json => println!("{}", serde_json::to_string_pretty(value)?),
OutputFormat::Yaml => println!("{}", serde_yaml::to_string(value)?),
OutputFormat::Table => unreachable!(),
}
Ok(())
}
/// Applies config TableAttribute to a comfy-table Cell.
pub fn apply_table_attribute(mut cell: Cell, attribute: &config::TableAttribute) -> Cell {
match attribute {
config::TableAttribute::Bold => cell = cell.add_attribute(Attribute::Bold),
config::TableAttribute::Dim => cell = cell.add_attribute(Attribute::Dim),
config::TableAttribute::Italic => cell = cell.add_attribute(Attribute::Italic),
config::TableAttribute::Underlined => cell = cell.add_attribute(Attribute::Underlined),
config::TableAttribute::SlowBlink => cell = cell.add_attribute(Attribute::SlowBlink),
config::TableAttribute::RapidBlink => cell = cell.add_attribute(Attribute::RapidBlink),
config::TableAttribute::Reverse => cell = cell.add_attribute(Attribute::Reverse),
config::TableAttribute::Hidden => cell = cell.add_attribute(Attribute::Hidden),
config::TableAttribute::CrossedOut => cell = cell.add_attribute(Attribute::CrossedOut),
}
cell
}
/// Builds a table showing data and database path information.
pub fn build_path_table(path_info: &PathInfo, table_config: &config::TableConfig) -> Table {
let mut path_table = create_table_with_config(table_config);
path_table.set_header(vec![
Cell::new("Type").add_attribute(Attribute::Bold),
Cell::new("Path").add_attribute(Attribute::Bold),
]);
path_table.add_row(vec!["Data", &path_info.data]);
path_table.add_row(vec!["Database", &path_info.database]);
path_table
}
/// Sanitize tags for use in filenames.
///
/// Replaces non-alphanumeric characters with underscores and joins with `_`.
/// Empty tags are filtered out to avoid double underscores.
pub fn sanitize_tags(tags: &[String]) -> String {
tags.iter()
.filter(|t| !t.is_empty())
.map(|t| {
t.chars()
.map(|c| if c.is_alphanumeric() { c } else { '_' })
.collect::<String>()
})
.collect::<Vec<_>>()
.join("_")
}
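The filename sanitization above is self-contained enough to demonstrate standalone. This sketch duplicates `sanitize_tags` verbatim so its behavior on punctuation and empty tags can be checked directly.

```rust
// Standalone copy of sanitize_tags for illustration: non-alphanumeric
// characters become underscores, empty tags are dropped before joining.
fn sanitize_tags(tags: &[String]) -> String {
    tags.iter()
        .filter(|t| !t.is_empty())
        .map(|t| {
            t.chars()
                .map(|c| if c.is_alphanumeric() { c } else { '_' })
                .collect::<String>()
        })
        .collect::<Vec<_>>()
        .join("_")
}

fn main() {
    let tags = vec!["web-server".to_string(), String::new(), "v1.2".to_string()];
    // "web-server" -> "web_server", "" is filtered, "v1.2" -> "v1_2"
    assert_eq!(sanitize_tags(&tags), "web_server_v1_2");
    println!("{}", sanitize_tags(&tags));
}
```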
/// Metadata structure for export to YAML. Shared by local and client export modes.
#[derive(Debug, Serialize)]
pub struct ExportMeta {
pub ts: DateTime<Utc>,
pub compression: String,
pub uncompressed_size: Option<i64>,
pub tags: Vec<String>,
pub metadata: HashMap<String, String>,
}
/// Metadata structure for import from YAML. Shared by local and client import modes.
#[derive(Debug, Deserialize)]
pub struct ImportMeta {
pub ts: DateTime<Utc>,
pub compression: String,
#[serde(default, alias = "size")]
pub uncompressed_size: Option<i64>,
#[serde(default)]
pub tags: Vec<String>,
#[serde(default)]
pub metadata: HashMap<String, String>,
}
/// Resolve a single item ID from explicit IDs, tags, or latest item.
///
/// Returns the first ID if provided, the newest item matching tags,
/// or the newest item overall if neither is specified.
#[cfg(feature = "client")]
pub fn resolve_item_id(
client: &crate::client::KeepClient,
ids: &[i64],
tags: &[String],
) -> Result<i64> {
if !ids.is_empty() {
Ok(ids[0])
} else if !tags.is_empty() {
let items = client.list_items(&[], tags, "newest", 0, 1, &HashMap::new())?;
if items.is_empty() {
return Err(anyhow!("No items found matching tags: {:?}", tags));
}
Ok(items[0].id)
} else {
let items = client.list_items(&[], &[], "newest", 0, 1, &HashMap::new())?;
if items.is_empty() {
return Err(anyhow!("No items found"));
}
Ok(items[0].id)
}
}
/// Resolve item IDs from explicit IDs or tags (multi-item variant).
#[cfg(feature = "client")]
pub fn resolve_item_ids(
client: &crate::client::KeepClient,
ids: &[i64],
tags: &[String],
) -> Result<Vec<i64>> {
if !ids.is_empty() {
Ok(ids.to_vec())
} else if !tags.is_empty() {
let items = client.list_items(&[], tags, "newest", 0, 0, &HashMap::new())?;
if items.is_empty() {
return Err(anyhow!("No items found matching tags: {:?}", tags));
}
Ok(items.into_iter().map(|i| i.id).collect())
} else {
let items = client.list_items(&[], &[], "newest", 0, 1, &HashMap::new())?;
if items.is_empty() {
return Err(anyhow!("No items found"));
}
Ok(vec![items[0].id])
}
}
/// Check if binary content should be blocked from TTY output.
///
/// Uses metadata `text` field as fast path, then falls back to byte sampling.
/// Returns Err if content is binary and should not be displayed.
pub fn check_binary_tty(
metadata: &HashMap<String, String>,
data_sample: &[u8],
force: bool,
) -> Result<()> {
if force || !std::io::stdout().is_terminal() {
return Ok(());
}
if crate::common::is_binary::is_content_binary_from_metadata(metadata, data_sample) {
return Err(anyhow!(
"Refusing to output binary data to TTY, use --force to override"
));
}
Ok(())
}
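The byte-sampling fallback mentioned in the doc comment above can be approximated with a common heuristic. This is a sketch only: `looks_binary` is a hypothetical stand-in, and the real `is_content_binary_from_metadata` may use a different test.

```rust
// Hedged sketch of a byte-sampling binary check: treat the content as
// binary if the sample contains a NUL byte. This is a widespread heuristic
// (used by e.g. grep), not necessarily the crate's exact implementation.
fn looks_binary(sample: &[u8]) -> bool {
    sample.contains(&0)
}

fn main() {
    assert!(!looks_binary(b"plain text\n"));
    // ELF magic followed by a NUL byte is flagged as binary
    assert!(looks_binary(&[0x7f, b'E', b'L', b'F', 0x00]));
    println!("ok");
}
```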


@@ -1,12 +1,19 @@
use crate::config;
use crate::services::item_service::ItemService;
/// Diff mode implementation.
///
/// This module provides functionality for comparing two items and displaying their
/// differences using external diff tools.
use anyhow::{Context, Result};
/// differences using external diff tools. Decompressed content is streamed to diff
/// via pipes and /dev/fd file descriptors — no temporary files are created.
use crate::config;
use crate::services::compression_service::CompressionService;
use crate::services::item_service::ItemService;
use anyhow::{Context, Result, anyhow};
use clap::Command;
use command_fds::{CommandFdExt, FdMapping};
use log::debug;
use nix::fcntl::OFlag;
use nix::unistd::pipe2;
use std::io::Read;
use std::os::unix::io::{AsRawFd, OwnedFd};
fn validate_diff_args(_cmd: &mut Command, ids: &[i64], tags: &[String]) -> anyhow::Result<()> {
if !tags.is_empty() {
@@ -23,19 +30,6 @@ fn validate_diff_args(_cmd: &mut Command, ids: &[i64], tags: &[String]) -> anyho
}
/// Fetches and validates items from the database for diff operation.
///
/// This function retrieves two items by their IDs from the database using the
/// item service, which handles validation, and returns them as a tuple.
///
/// # Arguments
///
/// * `conn` - Mutable reference to the database connection.
/// * `ids` - Vector of item IDs to fetch.
/// * `item_service` - Reference to the item service for validation.
///
/// # Returns
///
/// * `Result<(ItemWithMeta, ItemWithMeta)>` - Tuple of items with metadata or error.
fn fetch_and_validate_items(
conn: &mut rusqlite::Connection,
ids: &[i64],
@@ -44,7 +38,6 @@ fn fetch_and_validate_items(
crate::services::types::ItemWithMeta,
crate::services::types::ItemWithMeta,
)> {
// Fetch items using the service, which handles validation
let item_a = item_service
.get_item(conn, ids[0])
.with_context(|| format!("Unable to find first item (ID: {}) in database", ids[0]))?;
@@ -52,48 +45,12 @@ fn fetch_and_validate_items(
.get_item(conn, ids[1])
.with_context(|| format!("Unable to find second item (ID: {}) in database", ids[1]))?;
debug!("MAIN: Found item A {:?}", item_a.item);
debug!("MAIN: Found item B {:?}", item_b.item);
debug!("DIFF: Found item A {:?}", item_a.item);
debug!("DIFF: Found item B {:?}", item_b.item);
Ok((item_a, item_b))
}
/// Sets up file paths and compression for diff operation.
///
/// This function constructs the file paths for the two items and prepares the
/// compression engines needed for reading their contents.
///
/// # Arguments
///
/// * `item_service` - Reference to the item service.
/// * `item_a` - First item with metadata.
/// * `item_b` - Second item with metadata.
///
/// # Returns
///
/// * `Result<(PathBuf, PathBuf)>` - Tuple of item file paths or error.
fn setup_diff_paths_and_compression(
item_service: &ItemService,
item_a: &crate::services::types::ItemWithMeta,
item_b: &crate::services::types::ItemWithMeta,
) -> Result<(std::path::PathBuf, std::path::PathBuf)> {
let item_a_id = item_a
.item
.id
.ok_or_else(|| anyhow::anyhow!("Item A missing ID"))?;
let item_b_id = item_b
.item
.id
.ok_or_else(|| anyhow::anyhow!("Item B missing ID"))?;
// Use the service's data path to construct proper file paths
let data_path = item_service.get_data_path();
let item_a_path = data_path.join(item_a_id.to_string());
let item_b_path = data_path.join(item_b_id.to_string());
Ok((item_a_path, item_b_path))
}
pub fn mode_diff(
cmd: &mut Command,
args: &crate::args::Args,
@@ -125,51 +82,119 @@ pub fn mode_diff(
validate_diff_args(cmd, &ids, &tags)?;
let settings = crate::config::Settings::new(args, crate::config::Settings::default_dir()?)?;
let item_service = crate::services::item_service::ItemService::new(settings.dir.clone());
let settings = config::Settings::new(args, config::Settings::default_dir()?)?;
let item_service = ItemService::new(settings.dir.clone());
let (item_a, item_b) = fetch_and_validate_items(conn, &ids, &item_service)?;
let (path_a, path_b) = setup_diff_paths_and_compression(&item_service, &item_a, &item_b)?;
run_external_diff(&path_a, &path_b)?;
Ok(())
run_external_diff(&item_service, &item_a, &item_b)
}
/// Runs external diff command to compare two files.
/// Creates a pipe with CLOEXEC set atomically, returns (read_fd, write_fd).
fn create_pipe() -> Result<(OwnedFd, OwnedFd)> {
pipe2(OFlag::O_CLOEXEC).context("Failed to create pipe")
}
/// Streams decompressed item content through a pipe fd.
///
/// Uses the system's `diff` command to generate a unified diff output.
/// Returns an error if the diff command is not found.
/// Returns a JoinHandle for the writer thread. The thread writes decompressed
/// data to write_fd and closes it when done (causing EOF for the reader).
fn spawn_writer_thread(
item_service: &ItemService,
item: &crate::services::types::ItemWithMeta,
write_fd: OwnedFd,
) -> std::thread::JoinHandle<Result<()>> {
let data_path = item_service.get_data_path().clone();
let id = match item.item.id {
Some(id) => id,
None => return std::thread::spawn(|| Err(anyhow!("item missing ID"))),
};
let compression = item.item.compression.clone();
let mut item_path = data_path;
item_path.push(id.to_string());
std::thread::spawn(move || -> Result<()> {
let compression_service = CompressionService::new();
let mut reader = compression_service
.stream_item_content(item_path, &compression)
.map_err(|e| anyhow::anyhow!("Failed to stream item {id}: {e}"))?;
// Convert OwnedFd to File — safe, takes ownership, closes on drop
let mut writer = std::fs::File::from(write_fd);
crate::common::stream_copy(&mut reader, |chunk| {
use std::io::Write;
writer.write_all(chunk)
})
.map_err(|e| anyhow::anyhow!("Error reading item {id}: {e}"))?;
// writer dropped here, closing write_fd → diff sees EOF
Ok(())
})
}
/// Runs external diff command, streaming decompressed content via /dev/fd pipes.
///
/// # Arguments
///
/// * `path_a` - Path to the first file.
/// * `path_b` - Path to the second file.
///
/// # Returns
///
/// * `Result<()>` - Success or error.
fn run_external_diff(path_a: &std::path::Path, path_b: &std::path::Path) -> anyhow::Result<()> {
/// Creates two pipes, spawns writer threads to decompress each item into its pipe,
/// and runs `diff -u /dev/fd/N /dev/fd/M` where N and M are the pipe read fds.
/// The `command-fds` crate handles CLOEXEC clearing safely — no unsafe needed.
fn run_external_diff(
item_service: &ItemService,
item_a: &crate::services::types::ItemWithMeta,
item_b: &crate::services::types::ItemWithMeta,
) -> Result<()> {
if which::which_global("diff").is_err() {
return Err(anyhow::anyhow!(
"diff command not found. Please install diffutils."
));
}
let mut child = std::process::Command::new("diff")
let (read_fd_a, write_fd_a) = create_pipe()?;
let (read_fd_b, write_fd_b) = create_pipe()?;
// Spawn writer threads — they take ownership of write fds and close them on exit
let writer_a = spawn_writer_thread(item_service, item_a, write_fd_a);
let writer_b = spawn_writer_thread(item_service, item_b, write_fd_b);
// Get fd numbers for /dev/fd paths (borrows, does not consume)
let raw_read_a = read_fd_a.as_raw_fd();
let raw_read_b = read_fd_b.as_raw_fd();
debug!("DIFF: pipe fds: a(r={raw_read_a}) b(r={raw_read_b})");
// Spawn diff with /dev/fd/N paths. command-fds handles CLOEXEC clearing
// and fd inheritance safely — the fds are released from OwnedFd to the
// child process. If spawn fails, the OwnedFd values in FdMapping are
// dropped and the fds are properly closed.
let mut command = std::process::Command::new("diff");
command
.arg("-u")
.arg(path_a)
.arg(path_b)
.arg(format!("/dev/fd/{raw_read_a}"))
.arg(format!("/dev/fd/{raw_read_b}"))
.stdout(std::process::Stdio::inherit())
.stderr(std::process::Stdio::inherit())
.spawn()
.context("Failed to spawn diff command")?;
.stdin(std::process::Stdio::null())
.fd_mappings(vec![
FdMapping {
parent_fd: read_fd_a,
child_fd: raw_read_a,
},
FdMapping {
parent_fd: read_fd_b,
child_fd: raw_read_b,
},
])
.map_err(|e| anyhow::anyhow!("FD mapping collision: {e}"))?;
let mut child = command.spawn().context("Failed to spawn diff command")?;
let status = child.wait().context("Failed to wait for diff command")?;
// diff returns 0 if files are identical, 1 if different, 2 on error
// Join writer threads and propagate errors
writer_a
.join()
.map_err(|e| anyhow::anyhow!("Writer A panicked: {e:?}"))??;
writer_b
.join()
.map_err(|e| anyhow::anyhow!("Writer B panicked: {e:?}"))??;
// diff returns 0 if identical, 1 if different, 2 on error
if status.code() == Some(2) {
Err(anyhow::anyhow!("diff command failed with an error"))
} else {

src/modes/export.rs Normal file

@@ -0,0 +1,145 @@
use anyhow::{Context, Result, anyhow};
use chrono::Utc;
use clap::Command;
use log::debug;
use std::collections::HashMap;
use std::fs;
use std::path::PathBuf;
use crate::common::sanitize_ts_string;
use crate::config;
use crate::export_tar;
use crate::filter_plugin::FilterChain;
use crate::modes::common::sanitize_tags;
use crate::services::item_service::ItemService;
use crate::services::types::ItemWithMeta;
/// Export items to a `.keep.tar` archive.
///
/// Requires either IDs or tags (mutually exclusive). If IDs are given,
/// ALL must exist. Archives contain per-item data and metadata files.
pub fn mode_export(
cmd: &mut Command,
settings: &config::Settings,
ids: &[i64],
tags: &[String],
conn: &mut rusqlite::Connection,
data_path: PathBuf,
filter_chain: Option<FilterChain>,
) -> Result<()> {
// Validate: IDs XOR tags
if !ids.is_empty() && !tags.is_empty() {
cmd.error(
clap::error::ErrorKind::InvalidValue,
"Cannot use both IDs and tags with --export",
)
.exit();
}
if ids.is_empty() && tags.is_empty() {
cmd.error(
clap::error::ErrorKind::InvalidValue,
"Must provide either IDs or tags with --export",
)
.exit();
}
let item_service = ItemService::new(data_path.clone());
let meta_filter = settings.meta_filter();
// Resolve items
let items: Vec<ItemWithMeta> = if !ids.is_empty() {
// Fetch each ID individually; ALL must exist
let mut result = Vec::new();
for &id in ids {
match item_service.get_item(conn, id) {
Ok(item) => result.push(item),
Err(_) => {
cmd.error(
clap::error::ErrorKind::InvalidValue,
format!("Item {id} not found"),
)
.exit();
}
}
}
result
} else {
// Search by tags
item_service
.list_items(conn, tags, &meta_filter)
.map_err(|e| anyhow!("Unable to find matching items: {}", e))?
};
if items.is_empty() {
cmd.error(
clap::error::ErrorKind::InvalidValue,
"No items found matching the given criteria",
)
.exit();
}
// Validate: --export-filename-format doesn't use per-item vars with multiple items
if items.len() > 1 {
let fmt = &settings.export_filename_format;
if fmt.contains("{id}") || fmt.contains("{tags}") || fmt.contains("{compression}") {
cmd.error(
clap::error::ErrorKind::InvalidValue,
"Cannot use {id}, {tags}, or {compression} in --export-filename-format when exporting multiple items",
)
.exit();
}
}
// Compute export name
let dir_name = export_tar::export_name(&settings.export_name, &items);
// Compute tar filename from format template
let now = Utc::now();
let ts_str = sanitize_ts_string(&now.format("%Y-%m-%dT%H:%M:%SZ").to_string());
let mut vars = HashMap::new();
vars.insert("name".to_string(), dir_name.clone());
vars.insert("ts".to_string(), ts_str.clone());
// For single-item exports, also provide per-item vars
if items.len() == 1 {
let item = &items[0];
let item_id = item.item.id.context("Item missing ID")?;
let item_tags = item.tag_names();
vars.insert("id".to_string(), item_id.to_string());
vars.insert("tags".to_string(), sanitize_tags(&item_tags));
vars.insert("compression".to_string(), item.item.compression.clone());
}
let basename = strfmt::strfmt(&settings.export_filename_format, &vars).map_err(|e| {
anyhow!(
"Invalid export filename format '{}': {}",
settings.export_filename_format,
e
)
})?;
let tar_filename = format!("{basename}.keep.tar");
// Write the tar archive
let tar_file = fs::File::create(&tar_filename)
.with_context(|| format!("Cannot create tar file: {tar_filename}"))?;
export_tar::write_export_tar(
tar_file,
&dir_name,
&items,
&data_path,
filter_chain.as_ref(),
&item_service,
conn,
)?;
if !settings.quiet {
eprintln!("{tar_filename}");
}
debug!("EXPORT: Wrote {} items to {tar_filename}", items.len());
Ok(())
}
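The `strfmt::strfmt` call above expands `{name}`, `{ts}`, and the per-item variables into the export basename. This sketch hand-rolls the same substitution with a hypothetical `expand` helper (simple `{key}` placeholders, no escaping) to show how the filename is produced before `.keep.tar` is appended.

```rust
use std::collections::HashMap;

// Stand-in for strfmt: naive {key} substitution, assumed sufficient for
// illustrating the export filename template. Not the crate's real logic.
fn expand(template: &str, vars: &HashMap<String, String>) -> String {
    let mut out = template.to_string();
    for (k, v) in vars {
        out = out.replace(&format!("{{{k}}}"), v);
    }
    out
}

fn main() {
    let mut vars = HashMap::new();
    vars.insert("name".to_string(), "backup".to_string());
    vars.insert("ts".to_string(), "2026-03-21T12_00_00Z".to_string());
    let basename = expand("{name}-{ts}", &vars);
    assert_eq!(
        format!("{basename}.keep.tar"),
        "backup-2026-03-21T12_00_00Z.keep.tar"
    );
    println!("{basename}");
}
```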


@@ -1,80 +1,17 @@
use crate::meta_plugin::MetaPlugin;
use anyhow::Result;
use clap::Command;
use serde::{Deserialize, Serialize};
use serde_yaml;
use std::collections::HashMap;
use strum::IntoEnumIterator;
/// Mode for generating a default configuration file.
///
/// This module creates a commented YAML template with default values for settings,
/// including list format, server config, compression, and meta plugins.
#[derive(Debug, Serialize, Deserialize)]
/// Default configuration structure for the generated template.
///
/// Includes core settings, list formatting, server options, compression, and meta plugins.
struct DefaultConfig {
dir: Option<String>,
list_format: Vec<ColumnConfig>,
human_readable: bool,
output_format: Option<String>,
quiet: bool,
force: bool,
server: Option<ServerConfig>,
compression_plugin: Option<CompressionPluginConfig>,
meta_plugins: Option<Vec<MetaPluginConfig>>,
}
#[derive(Debug, Serialize, Deserialize)]
/// Configuration for a column in the list format.
struct ColumnConfig {
name: String,
label: Option<String>,
#[serde(default)]
align: ColumnAlignment,
#[serde(default)]
max_len: Option<String>,
}
#[derive(Debug, Serialize, Deserialize, Default)]
#[serde(rename_all = "lowercase")]
/// Alignment options for table columns.
enum ColumnAlignment {
#[default]
Left,
Right,
}
#[derive(Debug, Serialize, Deserialize)]
/// Server configuration options.
struct ServerConfig {
address: Option<String>,
port: Option<u16>,
password_file: Option<String>,
password: Option<String>,
password_hash: Option<String>,
}
#[derive(Debug, Serialize, Deserialize)]
/// Configuration for the compression plugin.
struct CompressionPluginConfig {
name: String,
}
#[derive(Debug, Serialize, Deserialize)]
/// Configuration for a meta plugin.
struct MetaPluginConfig {
name: String,
#[serde(default)]
options: std::collections::HashMap<String, serde_yaml::Value>,
#[serde(default)]
outputs: std::collections::HashMap<String, String>,
}
use crate::common::schema::{gather_filter_plugin_schemas, gather_meta_plugin_schemas};
use crate::compression_engine::CompressionType;
use crate::config;
/// Generates and prints a default commented YAML configuration template.
///
/// Creates instances of available meta plugins to populate default options and outputs,
/// then serializes the config to YAML with all lines commented for easy editing.
/// Discovers all registered meta plugins, filter plugins, and compression engines
/// at runtime via the plugin schema system. Outputs a commented YAML template
/// with all available plugins and their default options/outputs.
///
/// # Arguments
///
@@ -84,152 +21,244 @@ struct MetaPluginConfig {
/// # Returns
///
/// `Ok(())` on success.
///
/// # Examples
///
/// ```ignore
/// // Example usage requires Command and Settings instances
/// mode_generate_config(&mut cmd, &settings)?;
/// ```
pub fn mode_generate_config(_cmd: &mut Command, _settings: &crate::config::Settings) -> Result<()> {
// Create instances of each meta plugin to get their default options and outputs
let cwd_plugin = crate::meta_plugin::cwd::CwdMetaPlugin::new(None, None);
let digest_plugin = crate::meta_plugin::digest::DigestMetaPlugin::new(None, None);
let hostname_plugin = crate::meta_plugin::hostname::HostnameMetaPlugin::new(None, None);
#[cfg(feature = "magic")]
let magic_file_plugin = crate::meta_plugin::magic_file::MagicFileMetaPlugin::new(None, None);
let env_plugin = crate::meta_plugin::env::EnvMetaPlugin::new(None, None);
let meta_schemas = gather_meta_plugin_schemas();
let filter_schemas = gather_filter_plugin_schemas();
// Create a default configuration
let default_config = DefaultConfig {
dir: Some("~/.local/share/keep".to_string()),
list_format: vec![
ColumnConfig {
name: "id".to_string(),
label: Some("Item".to_string()),
align: ColumnAlignment::Right,
max_len: None,
},
ColumnConfig {
name: "time".to_string(),
label: Some("Time".to_string()),
align: ColumnAlignment::Right,
max_len: None,
},
ColumnConfig {
name: "size".to_string(),
label: Some("Size".to_string()),
align: ColumnAlignment::Right,
max_len: None,
},
ColumnConfig {
name: "tags".to_string(),
label: Some("Tags".to_string()),
align: ColumnAlignment::Left,
max_len: Some("40".to_string()),
},
ColumnConfig {
name: "meta:hostname_full".to_string(),
label: Some("Hostname".to_string()),
align: ColumnAlignment::Left,
max_len: Some("28".to_string()),
},
],
human_readable: false,
output_format: Some("table".to_string()),
quiet: false,
force: false,
server: Some(ServerConfig {
address: Some("127.0.0.1".to_string()),
port: Some(8080),
password_file: None,
password: None,
password_hash: None,
}),
compression_plugin: None,
meta_plugins: Some(vec![
MetaPluginConfig {
name: "cwd".to_string(),
options: cwd_plugin.options().clone(),
outputs: convert_outputs_to_string_map(cwd_plugin.outputs()),
},
MetaPluginConfig {
name: "digest".to_string(),
options: digest_plugin.options().clone(),
outputs: convert_outputs_to_string_map(digest_plugin.outputs()),
},
MetaPluginConfig {
name: "hostname".to_string(),
options: hostname_plugin.options().clone(),
outputs: convert_outputs_to_string_map(hostname_plugin.outputs()),
},
#[cfg(feature = "magic")]
MetaPluginConfig {
name: "magic_file".to_string(),
options: magic_file_plugin.options().clone(),
outputs: convert_outputs_to_string_map(magic_file_plugin.outputs()),
},
MetaPluginConfig {
name: "env".to_string(),
options: env_plugin.options().clone(),
outputs: convert_outputs_to_string_map(env_plugin.outputs()),
},
]),
};
// Serialize to YAML and comment out all lines
let yaml = serde_yaml::to_string(&default_config)?;
// Comment out every line
let commented_yaml = yaml
.lines()
.map(|line| {
if line.trim().is_empty() {
line.to_string()
} else {
format!("# {line}")
}
})
.collect::<Vec<String>>()
.join("\n");
println!("{commented_yaml}");
// Build list_format defaults matching config.rs
let list_format = default_list_format();
// Build meta_plugins with env as the default (active), rest commented
let meta_plugins = build_meta_plugins_section(&meta_schemas);
// Build the full YAML
let mut lines = Vec::with_capacity(128);
lines.push("# Keep configuration file".to_string());
lines.push("# Uncomment and modify the settings you need.".to_string());
lines.push(String::new());
// Core settings
lines.push("# Data directory for storing items".to_string());
lines.push("dir: ~/.local/share/keep".to_string());
lines.push(String::new());
// List format
lines.push("# Column configuration for --list output".to_string());
lines.push("list_format:".to_string());
for col in &list_format {
lines.push(format!("  - name: {}", col.name));
lines.push(format!("    label: {}", col.label));
lines.push(format!("    align: {}", col.align));
}
lines.push(String::new());
// Table config
lines.push("# Table display configuration".to_string());
lines.push("#table_config:".to_string());
lines.push("# style: nothing".to_string());
lines.push("# modifiers: []".to_string());
lines.push("# content_arrangement: dynamic".to_string());
lines.push("# truncation_indicator: \"\"".to_string());
lines.push(String::new());
// Other settings
lines.push("human_readable: false".to_string());
lines.push("output_format: table".to_string());
lines.push("quiet: false".to_string());
lines.push("force: false".to_string());
lines.push(String::new());
// Server config
lines.push("# Server configuration (only used with --server)".to_string());
lines.push("server:".to_string());
lines.push(" address: 127.0.0.1".to_string());
lines.push(" port: 8080".to_string());
lines.push("# username: keep".to_string());
lines.push("# password: null".to_string());
lines.push("# password_file: null".to_string());
lines.push("# password_hash: null".to_string());
lines.push("# jwt_secret: null".to_string());
lines.push("# jwt_secret_file: null".to_string());
lines.push("# cert_file: null".to_string());
lines.push("# key_file: null".to_string());
lines.push("# cors_origin: null".to_string());
lines.push(String::new());
// Compression plugin
lines.push("# Compression plugin to use".to_string());
lines.push("#compression_plugin:".to_string());
let mut comp_types: Vec<String> = CompressionType::iter().map(|ct| ct.to_string()).collect();
comp_types.sort();
for ct in &comp_types {
lines.push(format!("# name: {ct} # {}", compression_description(ct)));
}
lines.push(String::new());
// Meta plugins
lines.push("# Meta plugins to run when saving items".to_string());
lines.push("meta_plugins:".to_string());
for line in &meta_plugins {
lines.push(line.clone());
}
lines.push(String::new());
// Filter plugins reference
if !filter_schemas.is_empty() {
lines.push("# Available filter plugins (use with --filter)".to_string());
for schema in &filter_schemas {
lines.push(format!("# {}", schema.name));
if !schema.description.is_empty() {
lines.push(format!("# {}", schema.description));
}
for opt in &schema.options {
let req = if opt.required { "required" } else { "optional" };
lines.push(format!(
"# {} ({:?}, {})",
opt.name, opt.option_type, req
));
}
}
lines.push(String::new());
}
// Client config
lines.push("# Client configuration (requires client feature)".to_string());
lines.push("#client:".to_string());
lines.push("# url: null".to_string());
lines.push("# username: null".to_string());
lines.push("# password: null".to_string());
lines.push("# jwt: null".to_string());
// Print
for line in &lines {
println!("{line}");
}
Ok(())
}
/// Helper function to convert outputs from serde_yaml::Value to String.
///
/// Handles null (uses key), strings, and other values by serializing to YAML string.
///
/// # Arguments
///
/// * `outputs` - Reference to the outputs HashMap.
///
/// # Returns
///
/// A HashMap with string keys and values.
fn convert_outputs_to_string_map(
outputs: &std::collections::HashMap<String, serde_yaml::Value>,
) -> std::collections::HashMap<String, String> {
let mut result = std::collections::HashMap::new();
for (key, value) in outputs {
match value {
serde_yaml::Value::Null => {
// For null, use the key as the value
result.insert(key.clone(), key.clone());
}
serde_yaml::Value::String(s) => {
result.insert(key.clone(), s.clone());
}
_ => {
// Convert other values to their YAML string representation
result.insert(
key.clone(),
serde_yaml::to_string(value).unwrap_or_default(),
);
}
}
}
result
}
struct ListColumn {
name: String,
label: String,
align: String,
}
fn default_list_format() -> Vec<ListColumn> {
vec![
ListColumn {
name: "id".into(),
label: "Item".into(),
align: "right".into(),
},
ListColumn {
name: "time".into(),
label: "Time".into(),
align: "right".into(),
},
ListColumn {
name: "size".into(),
label: "Size".into(),
align: "right".into(),
},
ListColumn {
name: "meta:text_line_count".into(),
label: "Lines".into(),
align: "right".into(),
},
ListColumn {
name: "tags".into(),
label: "Tags".into(),
align: "left".into(),
},
ListColumn {
name: "meta:hostname_short".into(),
label: "Host".into(),
align: "left".into(),
},
ListColumn {
name: "meta:command".into(),
label: "Command".into(),
align: "left".into(),
},
]
}
fn build_meta_plugins_section(schemas: &[crate::common::schema::PluginSchema]) -> Vec<String> {
let mut lines = Vec::new();
for (i, schema) in schemas.iter().enumerate() {
let is_default = schema.name == "env";
let prefix = if is_default { "" } else { "# " };
if i > 0 {
lines.push(format!("{prefix}# --- {name} ---", name = schema.name));
}
lines.push(format!("{prefix}- name: {}", schema.name));
// Options
if !schema.options.is_empty() {
lines.push(format!("{prefix} options:"));
for opt in &schema.options {
if let Some(ref default) = opt.default {
let default_str = format_yaml_value(default);
lines.push(format!("{prefix} {}: {}", opt.name, default_str));
} else if opt.required {
lines.push(format!("{prefix} {}: null # required", opt.name));
}
}
} else {
lines.push(format!("{prefix} options: {{}}"));
}
// Outputs
if !schema.outputs.is_empty() {
lines.push(format!("{prefix} outputs:"));
for output in &schema.outputs {
lines.push(format!("{prefix} {}: {}", output.name, output.name));
}
} else {
lines.push(format!("{prefix} outputs: {{}}"));
}
}
lines
}
fn format_yaml_value(value: &serde_yaml::Value) -> String {
match value {
serde_yaml::Value::Null => "null".into(),
serde_yaml::Value::Bool(b) => b.to_string(),
serde_yaml::Value::Number(n) => n.to_string(),
serde_yaml::Value::String(s) => {
if s.contains(' ') || s.contains(':') || s.contains('#') {
format!("\"{s}\"")
} else {
s.clone()
}
}
serde_yaml::Value::Sequence(_) | serde_yaml::Value::Mapping(_) => {
serde_yaml::to_string(value)
.unwrap_or_default()
.trim()
.to_string()
}
serde_yaml::Value::Tagged(_) => serde_yaml::to_string(value)
.unwrap_or_default()
.trim()
.to_string(),
}
}
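The string branch of `format_yaml_value` quotes scalars containing YAML-significant characters. A minimal stdlib-only sketch of that quoting rule (the helper names here are illustrative, not from the codebase):

```rust
// Sketch of the scalar-quoting rule: plain strings pass through,
// strings containing space, ':' or '#' are double-quoted.
fn needs_quoting(s: &str) -> bool {
    s.contains(' ') || s.contains(':') || s.contains('#')
}

fn quote_scalar(s: &str) -> String {
    if needs_quoting(s) {
        format!("\"{s}\"")
    } else {
        s.to_string()
    }
}

fn main() {
    assert_eq!(quote_scalar("lz4"), "lz4");
    assert_eq!(quote_scalar("a: b"), "\"a: b\"");
    println!("ok");
}
```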
fn compression_description(name: &str) -> &str {
match name {
"lz4" => "Fast compression (native)",
"gzip" => "Good compression ratio (native)",
"bzip2" => "High compression (requires bzip2 binary)",
"xz" => "Very high compression (requires xz binary)",
"zstd" => "Modern fast compression (requires zstd binary)",
"raw" => "No compression (alias: none)",
_ => "",
}
}

View File

@@ -1,4 +1,4 @@
use anyhow::{Result, anyhow};
use anyhow::{Context, Result, anyhow};
use std::io::Write;
use crate::common::PIPESIZE;
@@ -52,10 +52,10 @@ pub fn mode_get(
let item_service = ItemService::new(data_path.clone());
let item_with_meta = item_service
.find_item(conn, ids, tags, &std::collections::HashMap::new())
.find_item(conn, ids, tags, &settings.meta_filter())
.map_err(|e| anyhow!("Unable to find matching item in database: {}", e))?;
let item_id = item_with_meta.item.id.unwrap();
let item_id = item_with_meta.item.id.context("Item missing ID")?;
// Determine if we should detect binary data
let mut detect_binary = !settings.force && std::io::stdout().is_terminal();
@@ -103,13 +103,9 @@ pub fn mode_get(
fn stream_to_stdout(mut reader: Box<dyn Read + Send>) -> Result<()> {
let mut stdout = std::io::stdout();
let mut buffer = [0; PIPESIZE];
loop {
let bytes_read = reader.read(&mut buffer)?;
if bytes_read == 0 {
break;
}
stdout.write_all(&buffer[..bytes_read])?;
}
crate::common::stream_copy(&mut reader, |chunk| {
stdout.write_all(chunk)?;
Ok(())
})?;
Ok(())
}

192
src/modes/import.rs Normal file
View File

@@ -0,0 +1,192 @@
use anyhow::{Context, Result, anyhow};
use chrono::{DateTime, Utc};
use clap::Command;
use log::debug;
use std::collections::HashMap;
use std::fs;
use std::io::{Read, Write};
use std::path::PathBuf;
use std::str::FromStr;
use crate::common::PIPESIZE;
use crate::compression_engine::CompressionType;
use crate::config;
use crate::db;
use crate::import_tar;
use crate::modes::common::ImportMeta;
/// Import items from a `.keep.tar` archive or legacy `.meta.yml` file.
///
/// For `.keep.tar` files, all items are imported in their original ID order,
/// each receiving a new auto-incremented ID from the database.
/// For `.meta.yml` files, the legacy single-item import is used.
pub fn mode_import(
cmd: &mut Command,
settings: &config::Settings,
import_path: &str,
conn: &mut rusqlite::Connection,
data_path: PathBuf,
) -> Result<()> {
let path = PathBuf::from(import_path);
if import_path.ends_with(".keep.tar") {
// New tar-based import
let imported_ids = import_tar::import_from_tar(&path, conn, &data_path)?;
if !settings.quiet {
println!(
"KEEP: Imported {} item(s): {:?}",
imported_ids.len(),
imported_ids
);
}
debug!(
"IMPORT: Imported {} items from {}",
imported_ids.len(),
import_path
);
} else if import_path.ends_with(".meta.yml") {
// Legacy single-item import
import_legacy(cmd, settings, import_path, conn, data_path)?;
} else {
cmd.error(
clap::error::ErrorKind::InvalidValue,
format!("Unsupported import format: {}", import_path),
)
.exit();
}
Ok(())
}
/// Legacy single-item import from a `.meta.yml` file.
fn import_legacy(
cmd: &mut Command,
settings: &config::Settings,
meta_file: &str,
conn: &mut rusqlite::Connection,
data_path: PathBuf,
) -> Result<()> {
// Read metadata
let meta_yaml = fs::read_to_string(meta_file)
.with_context(|| format!("Cannot read metadata file: {meta_file}"))?;
let import_meta: ImportMeta = serde_yaml::from_str(&meta_yaml)
.with_context(|| format!("Cannot parse metadata file: {meta_file}"))?;
// Validate compression type
CompressionType::from_str(&import_meta.compression).map_err(|_| {
anyhow!(
"Invalid compression type '{}' in metadata file",
import_meta.compression
)
})?;
debug!(
"IMPORT: Parsed meta: ts={}, compression={}, tags={:?}",
import_meta.ts, import_meta.compression, import_meta.tags
);
// Create item with original timestamp
let item = db::insert_item_with_ts(conn, import_meta.ts, &import_meta.compression)?;
let item_id = item.id.context("New item missing ID")?;
debug!(
"IMPORT: Created item {} with compression {}",
item_id, import_meta.compression
);
// Set tags
if !import_meta.tags.is_empty() {
db::set_item_tags(conn, item.clone(), &import_meta.tags)?;
debug!("IMPORT: Set {} tags", import_meta.tags.len());
}
// Write data to storage using streaming copy
let mut item_path = data_path;
item_path.push(item_id.to_string());
let data_size: i64 = if let Some(ref data_file) = settings.import_data_file {
// Stream from file to storage using fixed-size buffers
let mut reader = fs::File::open(data_file)
.with_context(|| format!("Cannot read data file: {}", data_file.display()))?;
let mut writer = fs::File::create(&item_path)
.with_context(|| format!("Cannot create item file: {}", item_path.display()))?;
let mut buf = [0u8; PIPESIZE];
let mut total = 0i64;
loop {
let n = reader.read(&mut buf)?;
if n == 0 {
break;
}
writer.write_all(&buf[..n])?;
total += n as i64;
}
total
} else {
// Stream from stdin to storage
let mut writer = fs::File::create(&item_path)
.with_context(|| format!("Cannot create item file: {}", item_path.display()))?;
let mut stdin = std::io::stdin().lock();
let mut buf = [0u8; PIPESIZE];
let mut total = 0i64;
loop {
let n = stdin.read(&mut buf)?;
if n == 0 {
break;
}
writer.write_all(&buf[..n])?;
total += n as i64;
}
total
};
if data_size == 0 {
cmd.error(
clap::error::ErrorKind::InvalidValue,
"No data provided (empty file or stdin)",
)
.exit();
}
debug!(
"IMPORT: Wrote {} bytes to {}",
data_size,
item_path.display()
);
// Set metadata
for (key, value) in &import_meta.metadata {
db::query_upsert_meta(
conn,
db::Meta {
id: item_id,
name: key.clone(),
value: value.clone(),
},
)?;
}
if !import_meta.metadata.is_empty() {
debug!(
"IMPORT: Set {} metadata entries",
import_meta.metadata.len()
);
}
// Update item sizes (use imported size if available, otherwise data length)
let size_to_record = import_meta.uncompressed_size.unwrap_or(data_size);
let mut updated_item = item;
updated_item.uncompressed_size = Some(size_to_record);
updated_item.compressed_size = Some(std::fs::metadata(&item_path)?.len() as i64);
updated_item.closed = true;
db::update_item(conn, updated_item)?;
if !settings.quiet {
println!(
"KEEP: Imported item {} tags: {:?}",
item_id, import_meta.tags
);
}
Ok(())
}
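The extension dispatch at the top of `mode_import` can be isolated into a testable sketch (`import_kind` is a hypothetical standalone function, not an API of this codebase):

```rust
// Sketch of the dispatch in mode_import: .keep.tar takes the tar import
// path, .meta.yml the legacy single-item path, anything else is rejected.
fn import_kind(path: &str) -> Result<&'static str, String> {
    if path.ends_with(".keep.tar") {
        Ok("tar")
    } else if path.ends_with(".meta.yml") {
        Ok("legacy")
    } else {
        Err(format!("Unsupported import format: {path}"))
    }
}

fn main() {
    assert_eq!(import_kind("backup.keep.tar").unwrap(), "tar");
    assert_eq!(import_kind("item.meta.yml").unwrap(), "legacy");
    assert!(import_kind("data.zip").is_err());
    println!("ok");
}
```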

View File

@@ -1,7 +1,7 @@
use crate::config;
use crate::modes::common::{OutputFormat, format_size};
use crate::modes::common::{DisplayItemInfo, OutputFormat, format_size, render_item_info_table};
use crate::services::types::ItemWithMeta;
use anyhow::{Result, anyhow};
use anyhow::{Context, Result, anyhow};
use clap::Command;
use clap::error::ErrorKind;
use serde::{Deserialize, Serialize};
@@ -9,7 +9,6 @@ use std::path::PathBuf;
use crate::services::item_service::ItemService;
use chrono::prelude::*;
use comfy_table::{Attribute, Cell};
/// Displays detailed information about an item or the last item if no ID/tags specified.
///
@@ -65,9 +64,8 @@ pub fn mode_info(
// If both are empty, find_item will find the last item
let item_service = ItemService::new(data_path.clone());
// Use empty metadata HashMap
let item_with_meta = item_service
.find_item(conn, ids, tags, &std::collections::HashMap::new())
.find_item(conn, ids, tags, &settings.meta_filter())
.map_err(|e| anyhow!("Unable to find matching item in database: {}", e))?;
show_item(item_with_meta, settings, data_path)
@@ -140,77 +138,44 @@ fn show_item(
return show_item_structured(item_with_meta, settings, data_path, output_format);
}
let item_tags = item_with_meta.tag_names();
let item = item_with_meta.item;
let item_id = item.id.unwrap();
let item_tags: Vec<String> = item_with_meta.tags.iter().map(|t| t.name.clone()).collect();
let mut table = crate::modes::common::create_table(false);
// Add all the rows
table.add_row(vec![
Cell::new("ID").add_attribute(Attribute::Bold),
Cell::new(item_id.to_string()),
]);
let timestamp_str = item.ts.with_timezone(&Local).format("%F %T %Z").to_string();
table.add_row(vec![
Cell::new("Timestamp").add_attribute(Attribute::Bold),
Cell::new(&timestamp_str),
]);
let item_id = item.id.context("Item missing ID")?;
let mut item_path_buf = data_path.clone();
item_path_buf.push(item_id.to_string());
let path_str = item_path_buf
.to_str()
.expect("Unable to get item path")
.to_string();
table.add_row(vec![
Cell::new("Path").add_attribute(Attribute::Bold),
Cell::new(&path_str),
]);
let size_str = match item.size {
let size_str = match item.uncompressed_size {
Some(size) => format_size(size as u64, settings.human_readable),
None => "Missing".to_string(),
};
table.add_row(vec![
Cell::new("Stream Size").add_attribute(Attribute::Bold),
Cell::new(&size_str),
]);
table.add_row(vec![
Cell::new("Compression").add_attribute(Attribute::Bold),
Cell::new(&item.compression),
]);
let file_size_str = match item_path_buf.metadata() {
Ok(metadata) => format_size(metadata.len(), settings.human_readable),
Err(_) => "Missing".to_string(),
};
table.add_row(vec![
Cell::new("File Size").add_attribute(Attribute::Bold),
Cell::new(&file_size_str),
]);
let tags_str = item_tags.join(" ");
table.add_row(vec![
Cell::new("Tags").add_attribute(Attribute::Bold),
Cell::new(&tags_str),
]);
let metadata: Vec<(String, String)> = item_with_meta
.meta
.iter()
.map(|m| (m.name.clone(), m.value.clone()))
.collect();
// Add meta rows
for meta in item_with_meta.meta {
let meta_name = format!("Meta: {}", &meta.name);
table.add_row(vec![
Cell::new(&meta_name).add_attribute(Attribute::Bold),
Cell::new(&meta.value),
]);
}
let display = DisplayItemInfo {
id: item_id,
timestamp: item.ts.with_timezone(&Local).format("%F %T %Z").to_string(),
path: item_path_buf
.to_str()
.ok_or_else(|| anyhow::anyhow!("non-UTF-8 item path"))?
.to_string(),
stream_size: size_str,
compression: item.compression.clone(),
file_size: file_size_str,
tags: item_tags,
metadata,
};
println!(
"{}",
crate::modes::common::trim_lines_end(&table.trim_fmt())
);
render_item_info_table(&display, &settings.table_config);
Ok(())
}
@@ -246,10 +211,10 @@ fn show_item_structured(
data_path: PathBuf,
output_format: OutputFormat,
) -> Result<()> {
let item_tags: Vec<String> = item_with_meta.tags.iter().map(|t| t.name.clone()).collect();
let item_tags = item_with_meta.tag_names();
let meta_map = item_with_meta.meta_as_map();
let item = item_with_meta.item;
let item_id = item.id.unwrap();
let item_id = item.id.context("Item missing ID")?;
let mut item_path_buf = data_path.clone();
item_path_buf.push(item_id.to_string());
@@ -260,7 +225,7 @@ fn show_item_structured(
None => "Missing".to_string(),
};
let stream_size_formatted = match item.size {
let stream_size_formatted = match item.uncompressed_size {
Some(size) => format_size(size as u64, settings.human_readable),
None => "Missing".to_string(),
};
@@ -273,7 +238,7 @@ fn show_item_structured(
.format("%F %T %Z")
.to_string(),
path: item_path_buf.to_str().unwrap_or("").to_string(),
stream_size: item.size.map(|s| s as u64),
stream_size: item.uncompressed_size.map(|s| s as u64),
stream_size_formatted,
compression: item.compression,
file_size,
@@ -282,15 +247,7 @@ fn show_item_structured(
meta: meta_map,
};
match output_format {
OutputFormat::Json => {
println!("{}", serde_json::to_string_pretty(&item_info)?);
}
OutputFormat::Yaml => {
println!("{}", serde_yaml::to_string(&item_info)?);
}
OutputFormat::Table => unreachable!(),
}
crate::modes::common::print_serialized(&item_info, &output_format)?;
Ok(())
}

View File

@@ -5,10 +5,10 @@
/// including table, JSON, and YAML.
use crate::config;
use crate::modes::common::ColumnType;
use crate::modes::common::{OutputFormat, format_size};
use crate::modes::common::{OutputFormat, apply_color, apply_table_attribute, format_size};
use crate::services::item_service::ItemService;
use crate::services::types::ItemWithMeta;
use anyhow::Result;
use anyhow::{Context, Result};
use comfy_table::CellAlignment;
use comfy_table::{Attribute, Cell, Color, Row};
use serde::{Deserialize, Serialize};
@@ -63,88 +63,6 @@ struct ListItem {
meta: std::collections::HashMap<String, String>,
}
/// Helper function to apply color to a cell.
///
/// This function converts the configuration color to a comfy-table Color and
/// applies it to the cell as foreground or background color.
///
/// # Arguments
///
/// * `cell` - The cell to modify.
/// * `color` - The color from configuration to apply.
/// * `is_foreground` - True for foreground color, false for background.
///
/// # Returns
///
/// The modified cell with color applied.
fn apply_color(mut cell: Cell, color: &crate::config::TableColor, is_foreground: bool) -> Cell {
use crate::config::TableColor::*;
use comfy_table::Color;
let comfy_color = match color {
Black => Color::Black,
Red => Color::Red,
Green => Color::Green,
Yellow => Color::Yellow,
Blue => Color::Blue,
Magenta => Color::Magenta,
Cyan => Color::Cyan,
White => Color::White,
Gray => Color::Grey,
DarkRed => Color::DarkRed,
DarkGreen => Color::DarkGreen,
DarkYellow => Color::DarkYellow,
DarkBlue => Color::DarkBlue,
DarkMagenta => Color::DarkMagenta,
DarkCyan => Color::DarkCyan,
Rgb(r, g, b) => Color::Rgb {
r: *r,
g: *g,
b: *b,
},
};
if is_foreground {
cell = cell.fg(comfy_color);
} else {
cell = cell.bg(comfy_color);
}
cell
}
/// Helper function to apply attribute to a cell.
///
/// This function applies a single table attribute to the cell based on the
/// configuration attribute type.
///
/// # Arguments
///
/// * `cell` - The cell to modify.
/// * `attribute` - The attribute from configuration to apply.
///
/// # Returns
///
/// The modified cell with attribute applied.
fn apply_attribute(mut cell: Cell, attribute: &crate::config::TableAttribute) -> Cell {
use crate::config::TableAttribute::*;
use comfy_table::Attribute;
match attribute {
Bold => cell = cell.add_attribute(Attribute::Bold),
Dim => cell = cell.add_attribute(Attribute::Dim),
Italic => cell = cell.add_attribute(Attribute::Italic),
Underlined => cell = cell.add_attribute(Attribute::Underlined),
SlowBlink => cell = cell.add_attribute(Attribute::SlowBlink),
RapidBlink => cell = cell.add_attribute(Attribute::RapidBlink),
Reverse => cell = cell.add_attribute(Attribute::Reverse),
Hidden => cell = cell.add_attribute(Attribute::Hidden),
CrossedOut => cell = cell.add_attribute(Attribute::CrossedOut),
}
cell
}
/// Main list mode function.
///
/// This function handles the listing of items based on tags, applying formatting
@@ -163,23 +81,24 @@ fn apply_attribute(mut cell: Cell, attribute: &crate::config::TableAttribute) ->
///
/// * `Result<()>` - Success or error if listing fails.
pub fn mode_list(
cmd: &mut clap::Command,
_cmd: &mut clap::Command,
settings: &config::Settings,
ids: &mut [i64],
tags: &[String],
conn: &mut rusqlite::Connection,
data_path: std::path::PathBuf,
) -> Result<()> {
if !ids.is_empty() {
cmd.error(
clap::error::ErrorKind::InvalidValue,
"ID given, you can only supply tags when using --list",
)
.exit();
}
let item_service = ItemService::new(data_path.clone());
let items_with_meta = item_service.list_items(conn, tags, &std::collections::HashMap::new())?;
let items_with_meta = item_service.get_items(conn, ids, tags, &settings.meta_filter())?;
if settings.ids_only {
for item_with_meta in &items_with_meta {
if let Some(id) = item_with_meta.item.id {
println!("{id}");
}
}
return Ok(());
}
let output_format = crate::modes::common::settings_output_format(settings);
@@ -197,12 +116,12 @@ pub fn mode_list(
table.set_header(header_cells);
for item_with_meta in items_with_meta {
let tags: Vec<String> = item_with_meta.tags.iter().map(|t| t.name.clone()).collect();
let tags = item_with_meta.tag_names();
let meta = item_with_meta.meta_as_map();
let item = item_with_meta.item;
let mut item_path = data_path.clone();
item_path.push(item.id.unwrap().to_string());
item_path.push(item.id.context("Item missing ID")?.to_string());
let mut table_row = Row::new();
@@ -210,7 +129,7 @@ pub fn mode_list(
let column_type = column
.name
.parse::<ColumnType>()
.unwrap_or_else(|_| panic!("Unknown column {:?}", column.name));
.with_context(|| format!("Unknown column type {:?} in list format", column.name))?;
let mut meta_name: Option<&str> = None;
@@ -228,19 +147,29 @@ pub fn mode_list(
.with_timezone(&chrono::Local)
.format("%F %T")
.to_string(),
ColumnType::Size => match item.size {
ColumnType::Size => match item.uncompressed_size {
Some(size) => format_size(size as u64, settings.human_readable),
None => match item_path.metadata() {
Ok(_) => "Unknown".to_string(),
Err(_) => "Missing".to_string(),
Err(e) => {
log::warn!("File missing or inaccessible: {}", e);
"Missing".to_string()
}
},
},
ColumnType::Compression => item.compression.to_string(),
ColumnType::FileSize => match item_path.metadata() {
Ok(metadata) => format_size(metadata.len(), settings.human_readable),
Err(_) => "Missing".to_string(),
Err(e) => {
log::warn!("File missing or inaccessible: {}", e);
"Missing".to_string()
}
},
ColumnType::FilePath => item_path.clone().into_os_string().into_string().unwrap(),
ColumnType::FilePath => item_path
.clone()
.into_os_string()
.into_string()
.unwrap_or_else(|os| os.to_string_lossy().into_owned()),
ColumnType::Tags => tags.join(" "),
ColumnType::Meta => match meta_name {
Some(meta_name) => match meta.get(meta_name) {
@@ -278,7 +207,7 @@ pub fn mode_list(
}
for attribute in &column.attributes {
cell = apply_attribute(cell, attribute);
cell = apply_table_attribute(cell, attribute);
}
// Apply padding if specified
@@ -290,7 +219,7 @@ pub fn mode_list(
// Apply styling for specific cases
match column_type {
ColumnType::Size => {
if item.size.is_none() {
if item.uncompressed_size.is_none() {
if item_path.metadata().is_ok() {
cell = cell
.fg(comfy_table::Color::Yellow)
@@ -340,10 +269,10 @@ fn show_list_structured(
let mut list_items = Vec::new();
for item_with_meta in items_with_meta {
let tags: Vec<String> = item_with_meta.tags.iter().map(|t| t.name.clone()).collect();
let tags = item_with_meta.tag_names();
let meta = item_with_meta.meta_as_map();
let item = item_with_meta.item;
let item_id = item.id.unwrap();
let item_id = item.id.context("Item missing ID")?;
let mut item_path = data_path.clone();
item_path.push(item_id.to_string());
@@ -354,7 +283,7 @@ fn show_list_structured(
None => "Missing".to_string(),
};
let size_formatted = match item.size {
let size_formatted = match item.uncompressed_size {
Some(size) => crate::modes::common::format_size(size as u64, settings.human_readable),
None => "Unknown".to_string(),
};
@@ -366,7 +295,7 @@ fn show_list_structured(
.with_timezone(&chrono::Local)
.format("%F %T")
.to_string(),
size: item.size.map(|s| s as u64),
size: item.uncompressed_size.map(|s| s as u64),
size_formatted,
compression: item.compression,
file_size,
@@ -379,15 +308,7 @@ fn show_list_structured(
list_items.push(list_item);
}
match output_format {
OutputFormat::Json => {
println!("{}", serde_json::to_string_pretty(&list_items)?);
}
OutputFormat::Yaml => {
println!("{}", serde_yaml::to_string(&list_items)?);
}
OutputFormat::Table => unreachable!(),
}
crate::modes::common::print_serialized(&list_items, &output_format)?;
Ok(())
}

View File

@@ -9,13 +9,16 @@ pub mod common;
pub mod delete;
pub mod diff;
pub mod export;
pub mod generate_config;
pub mod get;
pub mod import;
pub mod info;
pub mod list;
pub mod save;
pub mod status;
pub mod status_plugins;
pub mod update;
/// Column types, output formats, and formatting utilities shared across modes.
pub use common::{ColumnType, OutputFormat, format_size, settings_output_format};
@@ -26,12 +29,18 @@ pub use delete::mode_delete;
/// Compares two items and shows differences.
pub use diff::mode_diff;
/// Exports an item to data and metadata files.
pub use export::mode_export;
/// Generates a default configuration file.
pub use generate_config::mode_generate_config;
/// Retrieves and outputs item content.
pub use get::mode_get;
/// Imports an item from metadata and data files.
pub use import::mode_import;
/// Displays detailed information about items.
pub use info::mode_info;
@@ -50,3 +59,6 @@ pub use status::mode_status;
/// Lists available plugins and their configurations.
pub use status_plugins::mode_status_plugins;
/// Updates an item's tags and metadata by ID.
pub use update::mode_update;

File diff suppressed because it is too large Load Diff

View File

@@ -1,72 +0,0 @@
use axum::{
extract::State,
http::StatusCode,
response::sse::{Event, KeepAlive, Sse},
};
use futures::stream::{self, Stream};
use log::{debug, info};
use std::convert::Infallible;
use std::time::Duration;
use crate::modes::server::common::AppState;
use crate::modes::server::mcp::KeepMcpServer;
#[utoipa::path(
get,
path = "/mcp/sse",
operation_id = "mcp_sse",
summary = "MCP SSE endpoint",
description = "Server-Sent Events for Model Context Protocol. Enables AI tools to interact with Keep's storage and retrieval functions.",
responses(
(status = 200, description = "SSE stream established"),
(status = 401, description = "Unauthorized"),
(status = 500, description = "Internal server error")
),
security(
("bearerAuth" = [])
),
tag = "mcp"
)]
pub async fn handle_mcp_sse(
State(state): State<AppState>,
) -> Result<Sse<impl Stream<Item = Result<Event, Infallible>>>, StatusCode> {
debug!("MCP: Starting SSE endpoint");
let _mcp_server = KeepMcpServer::new(state);
// Create a simple message channel for SSE communication
let (tx, rx) = tokio::sync::mpsc::unbounded_channel::<String>();
// Send initial connection message
let _ = tx.send("data: {\"type\":\"connection\",\"status\":\"connected\"}\n\n".to_string());
// For now, create a simple stream that sends periodic keep-alive messages
// In a full implementation, this would integrate with the rmcp transport layer
let stream = stream::unfold((rx, tx), |(mut rx, tx)| async move {
tokio::select! {
msg = rx.recv() => {
match msg {
Some(data) => {
let event = Event::default().data(data);
Some((Ok(event), (rx, tx)))
}
None => None,
}
}
_ = tokio::time::sleep(Duration::from_secs(30)) => {
let event = Event::default()
.event("keep-alive")
.data("ping");
Some((Ok(event), (rx, tx)))
}
}
});
info!("MCP: SSE endpoint established");
Ok(Sse::new(stream).keep_alive(
KeepAlive::new()
.interval(Duration::from_secs(30))
.text("keep-alive"),
))
}

View File

@@ -1,12 +1,10 @@
pub mod common;
pub mod item;
#[cfg(feature = "mcp")]
pub mod mcp;
pub mod status;
use axum::{
Router,
routing::{delete, get},
routing::{delete, get, post},
};
use crate::modes::server::common::AppState;
@@ -60,8 +58,7 @@ use utoipa_swagger_ui::SwaggerUi;
struct ApiDoc;
pub fn add_routes(router: Router<AppState>) -> Router<AppState> {
#[cfg_attr(not(feature = "mcp"), allow(unused_mut))]
let mut router = router
router
// Status endpoints
.route("/api/status", get(status::handle_status))
.route("/api/plugins/status", get(status::handle_plugins_status))
@@ -88,14 +85,10 @@ pub fn add_routes(router: Router<AppState>) -> Router<AppState> {
)
.route("/api/item/{item_id}", delete(item::handle_delete_item))
.route("/api/item/{item_id}/info", get(item::handle_get_item_info))
.route("/api/diff", get(item::handle_diff_items));
#[cfg(feature = "mcp")]
{
router = router.route("/mcp/sse", get(mcp::handle_mcp_sse));
}
router
.route("/api/item/{item_id}/update", post(item::handle_update_item))
.route("/api/diff", get(item::handle_diff_items))
.route("/api/export", get(item::handle_export_items))
.route("/api/import", post(item::handle_import_items))
}
#[cfg(feature = "swagger")]

View File

@@ -2,6 +2,32 @@ use axum::{extract::State, http::StatusCode, response::Json};
use crate::modes::server::common::{ApiResponse, AppState, StatusInfoResponse};
async fn generate_status(
state: &AppState,
) -> Result<crate::common::status::StatusInfo, StatusCode> {
let db_path = state
.db
.lock()
.await
.path()
.unwrap_or("unknown")
.to_string();
let status_service = crate::services::status_service::StatusService::new();
let mut cmd = state.cmd.lock().await;
status_service
.generate_status(
&mut cmd,
&state.settings,
state.data_dir.clone(),
db_path.into(),
)
.map_err(|e| {
log::warn!("Failed to generate status: {e}");
StatusCode::INTERNAL_SERVER_ERROR
})
}
#[utoipa::path(
get,
path = "/api/status",
@@ -39,7 +65,7 @@ use crate::modes::server::common::{ApiResponse, AppState, StatusInfoResponse};
///
/// # Examples
///
/// ```
/// ```ignore
/// // In an Axum app:
/// async fn app() -> Result<Json<StatusInfoResponse>, StatusCode> {
/// handle_status(State(app_state)).await
@@ -48,24 +74,7 @@ use crate::modes::server::common::{ApiResponse, AppState, StatusInfoResponse};
pub async fn handle_status(
State(state): State<AppState>,
) -> Result<Json<StatusInfoResponse>, StatusCode> {
// Get database path
let db_path = state
.db
.lock()
.await
.path()
.unwrap_or("unknown")
.to_string();
// Use the status service to generate status info showing configured plugins
let status_service = crate::services::status_service::StatusService::new();
let mut cmd = state.cmd.lock().await;
let status_info = status_service.generate_status(
&mut cmd,
&state.settings,
state.data_dir.clone(),
db_path.into(),
);
let status_info = generate_status(&state).await?;
let response = StatusInfoResponse {
success: true,
@@ -102,22 +111,7 @@ pub struct PluginsStatusResponse {
pub async fn handle_plugins_status(
State(state): State<AppState>,
) -> Result<Json<crate::modes::server::common::ApiResponse<PluginsStatusResponse>>, StatusCode> {
let db_path = state
.db
.lock()
.await
.path()
.unwrap_or("unknown")
.to_string();
let status_service = crate::services::status_service::StatusService::new();
let mut cmd = state.cmd.lock().await;
let status_info = status_service.generate_status(
&mut cmd,
&state.settings,
state.data_dir.clone(),
db_path.into(),
);
let status_info = generate_status(&state).await?;
let response_data = PluginsStatusResponse {
meta_plugins: status_info.meta_plugins,

src/modes/server/auth.rs Normal file

@@ -0,0 +1,118 @@
use axum::http::Method;
use jsonwebtoken::{DecodingKey, TokenData, Validation, decode};
use log::debug;
use serde::Deserialize;
/// JWT claims for permission-based access control.
///
/// External token generators should include these claims in the JWT payload.
/// The server validates the signature and checks permissions for each request.
///
/// # Example token payload
///
/// ```json
/// {
/// "sub": "my-client",
/// "exp": 1735689600,
/// "read": true,
/// "write": true,
/// "delete": false
/// }
/// ```
#[derive(Debug, Deserialize)]
pub struct Claims {
/// Subject (client identifier).
pub sub: String,
/// Expiration time (Unix timestamp).
pub exp: usize,
/// Read permission (GET requests).
#[serde(default)]
pub read: bool,
/// Write permission (POST/PUT requests).
#[serde(default)]
pub write: bool,
/// Delete permission (DELETE requests).
#[serde(default)]
pub delete: bool,
}
/// Returns the required permission for an HTTP method.
///
/// # Mapping
///
/// - GET, HEAD → "read"
/// - POST, PUT, PATCH → "write"
/// - DELETE → "delete"
///
/// # Arguments
///
/// * `method` - The HTTP method of the incoming request.
///
/// # Returns
///
/// A string slice representing the required permission.
pub fn required_permission(method: &Method) -> &'static str {
if method == Method::GET || method == Method::HEAD {
"read"
} else if method == Method::DELETE {
"delete"
} else {
"write"
}
}
/// Checks if the JWT claims grant the required permission.
///
/// # Arguments
///
/// * `claims` - The validated JWT claims.
/// * `permission` - The required permission string ("read", "write", or "delete").
///
/// # Returns
///
/// `true` if the claims grant the permission, `false` otherwise.
pub fn check_permission(claims: &Claims, permission: &str) -> bool {
match permission {
"read" => claims.read,
"write" => claims.write,
"delete" => claims.delete,
_ => false,
}
}
/// Validates a JWT token and returns the claims.
///
/// Uses HMAC-SHA256 signature verification with the provided secret.
///
/// # Arguments
///
/// * `token` - The JWT token string (without "Bearer " prefix).
/// * `secret` - The secret key used to verify the signature.
///
/// # Returns
///
/// * `Ok(Claims)` - The validated claims if the token is valid.
/// * `Err(String)` - A human-readable error message if validation fails.
pub fn validate_jwt(token: &str, secret: &str) -> Result<Claims, String> {
let mut validation = Validation::new(jsonwebtoken::Algorithm::HS256);
validation.algorithms = vec![jsonwebtoken::Algorithm::HS256];
validation.set_required_spec_claims(&["exp", "sub"]);
let token_data: TokenData<Claims> = decode::<Claims>(
token,
&DecodingKey::from_secret(secret.as_bytes()),
&validation,
)
.map_err(|e| {
debug!("JWT validation failed: {e}");
match e.kind() {
jsonwebtoken::errors::ErrorKind::ExpiredSignature => "Token expired".to_string(),
jsonwebtoken::errors::ErrorKind::InvalidSignature => "Invalid token".to_string(),
jsonwebtoken::errors::ErrorKind::InvalidToken => "Malformed token".to_string(),
jsonwebtoken::errors::ErrorKind::ImmatureSignature => "Token not yet valid".to_string(),
_ => "Invalid token".to_string(),
}
})?;
Ok(token_data.claims)
}
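The method-to-permission mapping and claim check above can be exercised independently of axum and jsonwebtoken. This is an illustrative stand-alone sketch, not the crate's code: it substitutes plain strings for `http::Method` and reduces `Claims` to just the three permission flags.

```rust
// Maps an HTTP method name to the permission a token must carry.
// Mirrors the mapping documented above: GET/HEAD -> read,
// DELETE -> delete, everything else (POST/PUT/PATCH) -> write.
fn required_permission(method: &str) -> &'static str {
    match method {
        "GET" | "HEAD" => "read",
        "DELETE" => "delete",
        _ => "write",
    }
}

// Simplified claims carrying only the permission flags.
struct Claims {
    read: bool,
    write: bool,
    delete: bool,
}

fn check_permission(claims: &Claims, permission: &str) -> bool {
    match permission {
        "read" => claims.read,
        "write" => claims.write,
        "delete" => claims.delete,
        _ => false,
    }
}

fn main() {
    // A read/write client that may not delete.
    let claims = Claims { read: true, write: true, delete: false };
    assert!(check_permission(&claims, required_permission("GET")));
    assert!(check_permission(&claims, required_permission("POST")));
    assert!(!check_permission(&claims, required_permission("DELETE")));
    println!("permission checks passed");
}
```

Note the catch-all arm: any unrecognized method requires "write", so new or unusual verbs fail closed toward the stricter permission rather than defaulting to "read".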


@@ -1,4 +1,5 @@
use crate::services::item_service::ItemService;
use crate::services::types::ItemWithMeta;
/// Common utilities and types for the server module.
///
/// This module provides shared structures, functions, and middleware used across
@@ -7,15 +8,15 @@ use crate::services::item_service::ItemService;
///
/// # Usage
///
/// ```rust
/// ```rust,ignore
/// // Illustrative — requires runtime values (db connection, settings).
/// use keep::modes::server::common::{ServerConfig, AppState};
/// let config = ServerConfig { address: "127.0.0.1".to_string(), ..Default::default() };
/// let state = AppState { /* ... */ };
/// let config = ServerConfig { address: "127.0.0.1".to_string(), port: Some(8080), /* ... */ };
/// ```
use anyhow::Result;
use axum::{
extract::{ConnectInfo, Request},
http::{HeaderMap, StatusCode},
http::{HeaderMap, Method, StatusCode},
middleware::Next,
response::Response,
};
@@ -27,6 +28,7 @@ use std::net::SocketAddr;
use std::path::PathBuf;
use std::sync::Arc;
use std::time::Instant;
use subtle::ConstantTimeEq;
use tokio::sync::Mutex;
use utoipa::ToSchema;
@@ -37,12 +39,18 @@ use utoipa::ToSchema;
///
/// # Examples
///
/// ```
/// ```rust
/// use keep::modes::server::common::ServerConfig;
/// let config = ServerConfig {
/// address: "127.0.0.1".to_string(),
/// port: Some(8080),
/// username: None,
/// password: None,
/// password_hash: None,
/// jwt_secret: None,
/// cert_file: None,
/// key_file: None,
/// cors_origin: None,
/// };
/// ```
#[derive(Debug, Clone)]
@@ -57,9 +65,13 @@ pub struct ServerConfig {
/// The TCP port number to listen on. If not specified, a default port (typically
/// 8080 or 21080) will be used.
pub port: Option<u16>,
/// Optional authentication username.
///
/// Username for Basic authentication. Defaults to "keep" when not specified.
pub username: Option<String>,
/// Optional authentication password.
///
/// Plain text password for basic or bearer token authentication. This should be
/// Plain text password for Basic authentication. This should be
/// used only for testing or low-security environments.
pub password: Option<String>,
/// Optional hashed authentication password.
@@ -67,6 +79,11 @@ pub struct ServerConfig {
/// Pre-hashed password (Unix crypt format) for secure authentication. Preferred
/// over plain text password for production use.
pub password_hash: Option<String>,
/// Optional JWT secret for token-based authentication.
///
/// When set, the server validates JWT tokens (HS256) and checks permission claims
/// (read, write, delete) for each request. Takes priority over password auth.
pub jwt_secret: Option<String>,
/// Optional path to TLS certificate file (PEM).
///
/// When both cert_file and key_file are set, the server uses HTTPS.
@@ -75,6 +92,12 @@ pub struct ServerConfig {
///
/// When both cert_file and key_file are set, the server uses HTTPS.
pub key_file: Option<PathBuf>,
/// Optional CORS allowed origin.
///
/// When set, cross-origin requests are restricted to this origin.
/// Defaults to "http://localhost" if not specified. Use "*" to allow
/// all origins (not recommended for production).
pub cors_origin: Option<String>,
}
/// Application state shared across all routes.
@@ -84,7 +107,8 @@ pub struct ServerConfig {
///
/// # Examples
///
/// ```rust
/// ```rust,ignore
/// // AppState requires runtime values (db connection, settings) not available in doctests.
/// use keep::modes::server::common::AppState;
/// use std::sync::Arc;
/// use tokio::sync::Mutex;
@@ -134,9 +158,9 @@ pub struct AppState {
///
/// ```rust
/// use keep::modes::server::common::ApiResponse;
/// let response: ApiResponse<Vec<ItemInfo>> = ApiResponse {
/// let response: ApiResponse<String> = ApiResponse {
/// success: true,
/// data: Some(items),
/// data: Some("items".to_string()),
/// error: None,
/// };
/// ```
@@ -159,6 +183,26 @@ pub struct ApiResponse<T> {
pub error: Option<String>,
}
impl<T> ApiResponse<T> {
/// Creates a successful API response with the given data.
pub fn ok(data: T) -> Self {
Self {
success: true,
data: Some(data),
error: None,
}
}
/// Creates a successful API response with no data.
pub fn empty() -> Self {
Self {
success: true,
data: None,
error: None,
}
}
}
/// Response type for list of item information.
///
/// Specialized response for endpoints that return multiple items.
@@ -169,7 +213,7 @@ pub struct ApiResponse<T> {
/// use keep::modes::server::common::ItemInfoListResponse;
/// let response = ItemInfoListResponse {
/// success: true,
/// data: Some(vec![item_info]),
/// data: Some(vec![]),
/// error: None,
/// };
/// ```
@@ -199,7 +243,7 @@ pub struct ItemInfoListResponse {
/// use keep::modes::server::common::ItemInfoResponse;
/// let response = ItemInfoResponse {
/// success: true,
/// data: Some(item_info),
/// data: None,
/// error: None,
/// };
/// ```
@@ -229,7 +273,7 @@ pub struct ItemInfoResponse {
/// use keep::modes::server::common::ItemContentInfoResponse;
/// let response = ItemContentInfoResponse {
/// success: true,
/// data: Some(content_info),
/// data: None,
/// error: None,
/// };
/// ```
@@ -259,7 +303,7 @@ pub struct ItemContentInfoResponse {
/// use keep::modes::server::common::MetadataResponse;
/// let response = MetadataResponse {
/// success: true,
/// data: Some(meta_map),
/// data: None,
/// error: None,
/// };
/// ```
@@ -289,7 +333,7 @@ pub struct MetadataResponse {
/// use keep::modes::server::common::StatusInfoResponse;
/// let response = StatusInfoResponse {
/// success: true,
/// data: Some(status_info),
/// data: None,
/// error: None,
/// };
/// ```
@@ -322,10 +366,13 @@ pub struct StatusInfoResponse {
/// let item_info = ItemInfo {
/// id: 42,
/// ts: "2023-12-01T15:30:45Z".to_string(),
/// size: Some(1024),
/// uncompressed_size: Some(1024),
/// compressed_size: Some(512),
/// closed: true,
/// compression: "gzip".to_string(),
/// tags: vec!["important".to_string()],
/// metadata: HashMap::from([("mime_type".to_string(), "text/plain".to_string())]),
/// file_size: Some(512),
/// };
/// ```
#[derive(Serialize, Deserialize, ToSchema)]
@@ -341,11 +388,19 @@ pub struct ItemInfo {
/// The creation timestamp of the item in ISO 8601 format.
#[schema(example = "2023-12-01T15:30:45Z")]
pub ts: String,
/// Size in bytes.
/// Uncompressed size in bytes.
///
/// The size of the item's content in bytes; may be None if not set.
/// The uncompressed size of the item's content in bytes; may be None if not set.
#[schema(example = 1024)]
pub size: Option<i64>,
pub uncompressed_size: Option<i64>,
/// Compressed size in bytes.
///
/// The compressed file size on disk in bytes; may be None if not set.
#[schema(example = 512)]
pub compressed_size: Option<i64>,
/// Whether the item has been fully written and closed.
#[schema(example = true)]
pub closed: bool,
/// Compression type.
///
/// The compression algorithm used for the item's content.
@@ -361,6 +416,56 @@ pub struct ItemInfo {
/// Key-value pairs containing additional metadata about the item.
#[schema(example = json!({"mime_type": "text/plain", "mime_encoding": "utf-8", "line_count": "42"}))]
pub metadata: HashMap<String, String>,
/// Actual file size in bytes.
///
/// The filesystem-reported size of the item's data file. This may differ from
/// `compressed_size` if the file was written and the database hasn't been updated.
/// None if the file cannot be read (e.g., file not found, permission denied).
#[schema(example = 512)]
pub file_size: Option<i64>,
}
impl ItemInfo {
/// Enriches this `ItemInfo` with the actual filesystem-reported size.
///
/// Reads the size of the item's data file from disk and sets `file_size`.
/// If the file cannot be read, `file_size` is left as None.
///
/// # Arguments
///
/// * `data_dir` - The data directory path containing item files.
///
/// # Returns
///
/// A new `ItemInfo` with `file_size` populated from the filesystem.
pub fn with_file_size(mut self, data_dir: &std::path::Path) -> Self {
let item_path = data_dir.join(self.id.to_string());
self.file_size = std::fs::metadata(&item_path).map(|m| m.len() as i64).ok();
self
}
}
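The best-effort contract of `with_file_size` — stat the file, degrade to `None` on any error — comes down to one `map(...).ok()` chain. A minimal stdlib-only sketch (the temp-file name is illustrative, not from the project):

```rust
use std::fs;
use std::path::Path;

// Returns the on-disk size of a file, or None if it cannot be stat'ed
// (missing file, permission denied): the same best-effort behavior
// ItemInfo::with_file_size documents above.
fn file_size(path: &Path) -> Option<i64> {
    fs::metadata(path).map(|m| m.len() as i64).ok()
}

fn main() {
    let path = std::env::temp_dir().join("keep_file_size_demo.txt");
    fs::write(&path, b"hello").unwrap();
    // Five bytes written, five bytes reported.
    assert_eq!(file_size(&path), Some(5));
    fs::remove_file(&path).unwrap();
    // Once the file is gone, the error collapses to None.
    assert_eq!(file_size(&path), None);
    println!("file_size checks passed");
}
```

Swallowing the `io::Error` via `.ok()` is deliberate here: the API treats a missing data file as "size unknown", not as a request failure.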
impl TryFrom<ItemWithMeta> for ItemInfo {
type Error = anyhow::Error;
fn try_from(item_with_meta: ItemWithMeta) -> Result<Self, Self::Error> {
let tags = item_with_meta.tag_names();
let metadata = item_with_meta.meta_as_map();
Ok(ItemInfo {
id: item_with_meta
.item
.id
.ok_or_else(|| anyhow::anyhow!("Item missing ID"))?,
ts: item_with_meta.item.ts.to_rfc3339(),
uncompressed_size: item_with_meta.item.uncompressed_size,
compressed_size: item_with_meta.item.compressed_size,
closed: item_with_meta.item.closed,
compression: item_with_meta.item.compression,
tags,
metadata,
file_size: None,
})
}
}
/// Item information including content and metadata, with binary detection.
@@ -427,14 +532,20 @@ pub struct TagsQuery {
/// ```rust
/// use keep::modes::server::common::ListItemsQuery;
/// let query = ListItemsQuery {
/// ids: None,
/// tags: Some("important".to_string()),
/// order: Some("newest".to_string()),
/// start: Some(0),
/// count: Some(10),
/// meta: None,
/// };
/// ```
#[derive(Debug, Deserialize)]
pub struct ListItemsQuery {
/// Optional comma-separated item IDs for filtering.
///
/// String containing numeric IDs to filter the item list.
pub ids: Option<String>,
/// Optional comma-separated tags for filtering.
///
/// String containing tags to filter the item list.
@@ -451,6 +562,11 @@ pub struct ListItemsQuery {
///
/// Unsigned integer limiting the number of items returned.
pub count: Option<u32>,
/// Optional metadata filter as JSON string.
///
/// JSON object where keys are metadata keys and values are either
/// `null` (filter by key existence) or a string (filter by exact value match).
pub meta: Option<String>,
}
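A handler receiving `ListItemsQuery` still has to turn the comma-separated `ids` string into numbers. A minimal sketch of that parsing, with a simplifying assumption not stated in the diff: invalid or empty segments are silently skipped rather than rejected with 400.

```rust
// Parses a comma-separated id list such as "1,2,3" into numeric IDs,
// trimming whitespace and dropping segments that fail to parse.
fn parse_ids(ids: &str) -> Vec<i64> {
    ids.split(',')
        .filter_map(|s| s.trim().parse::<i64>().ok())
        .collect()
}

fn main() {
    assert_eq!(parse_ids("1,2,3"), vec![1, 2, 3]);
    // Whitespace is tolerated; the non-numeric segment is dropped.
    assert_eq!(parse_ids(" 4 , x ,5"), vec![4, 5]);
    assert_eq!(parse_ids(""), Vec::<i64>::new());
    println!("parse_ids checks passed");
}
```

A stricter server might instead collect into `Result<Vec<i64>, _>` so one bad segment fails the whole request; which behavior the real handler uses is not shown in this diff.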
/// Query parameters for item retrieval.
@@ -467,6 +583,7 @@ pub struct ListItemsQuery {
/// length: 1024,
/// stream: false,
/// as_meta: false,
/// decompress: true,
/// };
/// ```
#[derive(Debug, Deserialize, utoipa::ToSchema)]
@@ -517,6 +634,7 @@ pub struct ItemQuery {
/// length: 1024,
/// stream: false,
/// as_meta: false,
/// decompress: true,
/// };
/// ```
#[derive(Debug, Deserialize, utoipa::ToSchema)]
@@ -609,6 +727,31 @@ pub struct CreateItemQuery {
/// Set to false when the client has already collected metadata.
#[serde(default = "default_true")]
pub meta: bool,
/// Compression type used by the client (e.g. "lz4", "gzip").
/// Only used when compress=false — tells the server what compression
/// the client applied so the correct type is recorded in the database.
pub compression_type: Option<String>,
/// Optional timestamp for the item (RFC 3339 format).
/// Used during import to preserve the original item's timestamp.
/// If not provided, the server uses the current time.
pub ts: Option<String>,
}
/// Query parameters for updating item metadata via POST.
///
/// Query parameters for POST /api/item/{item_id}/update.
/// Re-runs specified meta plugins on the stored content and/or
/// applies direct metadata key-value overrides.
#[derive(Debug, Deserialize)]
pub struct UpdateItemQuery {
/// Optional comma-separated list of plugin names to re-run.
pub plugins: Option<String>,
/// Optional metadata overrides as JSON string.
pub metadata: Option<String>,
/// Optional comma-separated tags to add.
pub tags: Option<String>,
/// Optional uncompressed size to set on the item.
pub uncompressed_size: Option<i64>,
}
/// Request body for creating a new item.
@@ -626,53 +769,59 @@ pub struct CreateItemRequest {
pub metadata: Option<std::collections::HashMap<String, String>>,
}
/// Validates bearer authentication token.
/// Checks authorization header for valid credentials.
///
/// This function checks if the provided authorization string is a valid Bearer token
/// matching the expected password or hash.
/// This function inspects the HTTP Authorization header for valid Basic
/// authentication credentials against the provided username and password or hash.
/// Bearer tokens are not checked here — JWT validation is handled separately
/// in the middleware.
///
/// # Arguments
///
/// * `auth_str` - The authorization string from the header.
/// * `expected_password` - The expected plain text password.
/// * `expected_hash` - Optional expected password hash.
/// * `headers` - HTTP headers from the request.
/// * `username` - Optional expected username (defaults to "keep").
/// * `password` - Optional expected password.
/// * `password_hash` - Optional expected password hash.
///
/// # Returns
///
/// * `true` - If authentication succeeds.
/// * `false` - Otherwise.
///
/// # Errors
///
/// None; returns false on failure.
fn check_bearer_auth(
auth_str: &str,
expected_password: &str,
expected_hash: &Option<String>,
/// * `true` - If authorized (or no auth required).
/// * `false` - If unauthorized.
pub fn check_auth(
headers: &HeaderMap,
username: &Option<String>,
password: &Option<String>,
password_hash: &Option<String>,
) -> bool {
if !auth_str.starts_with("Bearer ") {
return false;
// If neither password nor hash is set, no authentication required
if password.is_none() && password_hash.is_none() {
return true;
}
let provided_password = &auth_str[7..];
let effective_username = username.as_deref().unwrap_or("keep");
// If we have a password hash, verify against it
if let Some(hash) = expected_hash {
return pwhash::unix::verify(provided_password, hash);
if let Some(auth_header) = headers.get("authorization")
&& let Ok(auth_str) = auth_header.to_str()
{
return check_basic_auth(
auth_str,
effective_username,
password.as_deref().unwrap_or(""),
password_hash,
);
}
// Otherwise, do direct comparison
provided_password == expected_password
false
}
/// Validates basic authentication credentials.
///
/// This function decodes and validates Basic Auth credentials from the authorization
/// header against the expected password or hash.
/// header against the expected username and password or hash.
///
/// # Arguments
///
/// * `auth_str` - The authorization string from the header.
/// * `expected_username` - The expected username.
/// * `expected_password` - The expected plain text password.
/// * `expected_hash` - Optional expected password hash.
///
@@ -686,6 +835,7 @@ fn check_bearer_auth(
/// Returns false on decode or validation failure.
fn check_basic_auth(
auth_str: &str,
expected_username: &str,
expected_password: &str,
expected_hash: &Option<String>,
) -> bool {
@@ -694,63 +844,33 @@ fn check_basic_auth(
}
let encoded = &auth_str[6..];
if let Ok(decoded_bytes) = base64::engine::general_purpose::STANDARD.decode(encoded) {
if let Ok(decoded_str) = String::from_utf8(decoded_bytes) {
if let Some(colon_pos) = decoded_str.find(':') {
if let Ok(decoded_bytes) = base64::engine::general_purpose::STANDARD.decode(encoded)
&& let Ok(decoded_str) = String::from_utf8(decoded_bytes)
&& let Some(colon_pos) = decoded_str.find(':')
{
let provided_username = &decoded_str[..colon_pos];
let provided_password = &decoded_str[colon_pos + 1..];
// Check username with constant-time comparison
if !bool::from(
provided_username
.as_bytes()
.ct_eq(expected_username.as_bytes()),
) {
return false;
}
// If we have a password hash, verify against it
if let Some(hash) = expected_hash {
return pwhash::unix::verify(provided_password, hash);
}
// Otherwise, do direct comparison
let expected_credentials = format!("keep:{expected_password}");
return decoded_str == expected_credentials;
}
}
}
false
}
/// Checks authorization header for valid credentials.
///
/// This function inspects the HTTP Authorization header for valid Bearer or Basic
/// authentication credentials against the provided password or hash.
///
/// # Arguments
///
/// * `headers` - HTTP headers from the request.
/// * `password` - Optional expected password.
/// * `password_hash` - Optional expected password hash.
///
/// # Returns
///
/// * `true` - If authorized (or no auth required).
/// * `false` - If unauthorized.
///
/// # Examples
///
/// ```
/// if check_auth(&headers, &Some("pass".to_string()), &None) {
/// // Proceed
/// }
/// ```
pub fn check_auth(
headers: &HeaderMap,
password: &Option<String>,
password_hash: &Option<String>,
) -> bool {
// If neither password nor hash is set, no authentication required
if password.is_none() && password_hash.is_none() {
return true;
}
if let Some(auth_header) = headers.get("authorization") {
if let Ok(auth_str) = auth_header.to_str() {
return check_bearer_auth(auth_str, password.as_deref().unwrap_or(""), password_hash)
|| check_basic_auth(auth_str, password.as_deref().unwrap_or(""), password_hash);
}
// Otherwise, do constant-time comparison to prevent timing attacks
return bool::from(
provided_password
.as_bytes()
.ct_eq(expected_password.as_bytes()),
);
}
false
}
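The hunk above switches credential comparison to `subtle::ConstantTimeEq` so the comparison time does not leak where the first mismatching byte sits. The underlying idea can be sketched in plain stdlib; this is only an illustration of the XOR-fold technique, and real code should keep using `subtle`, whose implementation resists compiler optimizations that could reintroduce an early exit.

```rust
// Illustrative constant-time byte comparison: XOR-folds every byte so
// the loop always runs to the end instead of returning at the first
// mismatch. The length check itself is not secret here, matching the
// usual treatment of credential length.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff: u8 = 0;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}

fn main() {
    assert!(ct_eq(b"secret", b"secret"));
    // Same length, one differing byte: still a full-length scan.
    assert!(!ct_eq(b"secret", b"secreT"));
    assert!(!ct_eq(b"secret", b"secrets"));
    println!("ct_eq checks passed");
}
```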
@@ -817,29 +937,31 @@ pub async fn logging_middleware(
/// Creates authentication middleware for the application.
///
/// This function returns a middleware that enforces authentication on protected routes
/// using Bearer token or Basic Auth, challenging unauthorized requests with appropriate
/// headers.
/// This function returns a middleware that enforces authentication on protected routes.
///
/// **JWT and Basic Auth are mutually exclusive.** When `jwt_secret` is set, the
/// middleware validates JWT (HS256) tokens and checks permission claims (read, write,
/// delete) based on the HTTP method. Requests without a valid Bearer token are
/// rejected with 401 — Basic Auth is **not** consulted as a fallback.
///
/// When `jwt_secret` is not set, Basic Auth password authentication is used instead.
///
/// # Arguments
///
/// * `username` - Optional username (defaults to "keep").
/// * `password` - Optional plain text password.
/// * `password_hash` - Optional hashed password.
/// * `jwt_secret` - Optional JWT secret for token-based authentication.
///
/// # Returns
///
/// A clonable async middleware function for Axum.
///
/// # Examples
///
/// ```
/// let auth_middleware = create_auth_middleware(Some("pass".to_string()), None);
/// router.layer(auth_middleware);
/// ```
#[allow(clippy::type_complexity)]
pub fn create_auth_middleware(
username: Option<String>,
password: Option<String>,
password_hash: Option<String>,
jwt_secret: Option<String>,
) -> impl Fn(
ConnectInfo<SocketAddr>,
Request,
@@ -849,13 +971,62 @@ pub fn create_auth_middleware(
+ Clone
+ Send {
move |ConnectInfo(addr): ConnectInfo<SocketAddr>, request: Request, next: Next| {
let username = username.clone();
let password = password.clone();
let password_hash = password_hash.clone();
let jwt_secret = jwt_secret.clone();
Box::pin(async move {
let headers = request.headers().clone();
let uri = request.uri().clone();
let method = request.method().clone();
if !check_auth(&headers, &password, &password_hash) {
// CORS preflight requests pass through without authentication
if method == Method::OPTIONS {
return Ok(next.run(request).await);
}
// JWT authentication takes priority when secret is configured
if let Some(ref secret) = jwt_secret
&& let Some(auth_header) = headers.get("authorization")
&& let Ok(auth_str) = auth_header.to_str()
&& let Some(token) = auth_str.strip_prefix("Bearer ")
{
match super::auth::validate_jwt(token, secret) {
Ok(claims) => {
let required = super::auth::required_permission(&method);
if !super::auth::check_permission(&claims, required) {
warn!(
"Forbidden: {method} {uri} from {addr} \
(sub={}, missing permission: {required})",
claims.sub
);
let mut response = Response::new(axum::body::Body::from("Forbidden"));
*response.status_mut() = StatusCode::FORBIDDEN;
return Ok(response);
}
// JWT valid and authorized, proceed
let response = next.run(request).await;
return Ok(response);
}
Err(e) => {
warn!("JWT validation failed for {uri} from {addr}: {e}");
let mut response = Response::new(axum::body::Body::from("Unauthorized"));
*response.status_mut() = StatusCode::UNAUTHORIZED;
return Ok(response);
}
}
}
// JWT secret configured but no valid Bearer token provided
if jwt_secret.is_some() {
warn!("Missing JWT token for {uri} from {addr}");
let mut response = Response::new(axum::body::Body::from("Unauthorized"));
*response.status_mut() = StatusCode::UNAUTHORIZED;
return Ok(response);
}
// Fall back to Basic Auth password authentication
if !check_auth(&headers, &username, &password, &password_hash) {
warn!("Unauthorized request to {uri} from {addr}");
// Add WWW-Authenticate header to trigger basic auth in browsers
let mut response = Response::new(axum::body::Body::from("Unauthorized"));
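The middleware's branching order — preflight first, then mandatory JWT when a secret is configured, then Basic Auth as the only remaining path — can be modeled as a pure function. A sketch with hypothetical names (`AuthPath`, `auth_path` are not in the diff), using a string for the method in place of `http::Method`:

```rust
// Which authentication path the middleware takes for a request.
#[derive(Debug, PartialEq)]
enum AuthPath {
    Preflight, // OPTIONS passes through unauthenticated
    Jwt,       // jwt_secret set: Bearer token required, no fallback
    Basic,     // no jwt_secret: Basic Auth (or open if no password)
}

fn auth_path(method: &str, jwt_secret_configured: bool) -> AuthPath {
    if method == "OPTIONS" {
        AuthPath::Preflight
    } else if jwt_secret_configured {
        AuthPath::Jwt
    } else {
        AuthPath::Basic
    }
}

fn main() {
    assert_eq!(auth_path("OPTIONS", true), AuthPath::Preflight);
    // With a secret configured, Basic Auth is never consulted.
    assert_eq!(auth_path("GET", true), AuthPath::Jwt);
    assert_eq!(auth_path("GET", false), AuthPath::Basic);
    println!("auth_path checks passed");
}
```

Making the branches mutually exclusive, rather than trying JWT and falling back to Basic, is what lets a misconfigured or expired token fail loudly with 401 instead of silently downgrading to password auth.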


@@ -1,83 +0,0 @@
pub mod server;
pub mod tools;
pub use server::KeepMcpServer;
/// Module for handling MCP (Model Context Protocol) requests in the server.
///
/// Provides handlers for JSON-RPC style requests to interact with Keep's storage
/// via the API.
use axum::{Json, extract::State, http::StatusCode, response::IntoResponse};
use serde::Deserialize;
use serde_json::Value;
use crate::modes::server::common::ApiResponse;
use crate::modes::server::common::AppState;
/// Request structure for MCP JSON-RPC calls.
///
/// # Fields
///
/// * `method` - The MCP method name (e.g., "save_item").
/// * `params` - Optional JSON parameters for the method.
#[derive(Deserialize)]
pub struct McpRequest {
pub method: String,
pub params: Option<Value>,
}
/// Handles an MCP request via the Axum framework.
///
/// Parses the JSON request, delegates to `KeepMcpServer`, and returns an API response.
/// Attempts to parse the result as JSON; falls back to string if invalid.
///
/// # Arguments
///
/// * `State(state)` - The application state.
/// * `Json(request)` - The deserialized MCP request.
///
/// # Returns
///
/// An `IntoResponse` with status code and JSON API response.
///
/// # Errors
///
/// Returns 400 Bad Request on handler errors.
pub async fn handle_mcp_request(
State(state): State<AppState>,
Json(request): Json<McpRequest>,
) -> impl IntoResponse {
let mcp_server = KeepMcpServer::new(state);
match mcp_server
.handle_request(&request.method, request.params)
.await
{
Ok(result) => match serde_json::from_str(&result) {
Ok(parsed_result) => {
let response = ApiResponse {
success: true,
data: Some(parsed_result),
error: None,
};
(StatusCode::OK, Json(response))
}
Err(_) => {
let response = ApiResponse {
success: true,
data: Some(serde_json::Value::String(result)),
error: None,
};
(StatusCode::OK, Json(response))
}
},
Err(e) => {
let response = ApiResponse {
success: false,
data: None,
error: Some(e.to_string()),
};
(StatusCode::BAD_REQUEST, Json(response))
}
}
}


@@ -1,83 +0,0 @@
use log::debug;
use serde_json::Value;
use super::tools::{KeepTools, ToolError};
use crate::modes::server::common::AppState;
/// Server handler for MCP (Model Context Protocol) requests.
///
/// Routes requests to appropriate tools and handles responses. Clones AppState for tool usage.
///
/// # Fields
///
/// * `state` - The shared application state (DB, config, etc.).
#[derive(Clone)]
pub struct KeepMcpServer {
state: AppState,
}
/// Creates a new `KeepMcpServer` instance.
///
/// # Arguments
///
/// * `state` - The application state containing DB, config, and services.
///
/// # Returns
///
/// A new `KeepMcpServer` instance.
///
/// # Examples
///
/// ```
/// let server = KeepMcpServer::new(app_state);
/// ```
impl KeepMcpServer {
pub fn new(state: AppState) -> Self {
Self { state }
}
/// Handles an MCP request by routing to the appropriate tool.
///
/// Supports methods like "save_item", "get_item", "list_items". Logs the request and delegates to KeepTools.
///
/// # Arguments
///
/// * `method` - The MCP method name (string).
/// * `params` - Optional JSON parameters as serde_json::Value.
///
/// # Returns
///
/// `Ok(String)` with JSON-serialized response on success, or `Err(ToolError)` on failure.
///
/// # Errors
///
/// * ToolError::UnknownTool if method unsupported.
/// * Propagates tool-specific errors (e.g., invalid args, DB failures).
///
/// # Examples
///
/// ```
/// let result = server.handle_request("save_item", Some(params)).await?;
/// ```
pub async fn handle_request(
&self,
method: &str,
params: Option<Value>,
) -> Result<String, ToolError> {
debug!(
"MCP: Handling request '{}' with params: {:?}",
method, params
);
let tools = KeepTools::new(self.state.clone());
match method {
"save_item" => tools.save_item(params).await,
"get_item" => tools.get_item(params).await,
"get_latest_item" => tools.get_latest_item(params).await,
"list_items" => tools.list_items(params).await,
"search_items" => tools.search_items(params).await,
_ => Err(ToolError::UnknownTool(method.to_string())),
}
}
}
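The method routing that this removed `handle_request` performed is a plain string-dispatch table. A stub sketch of the shape (handler bodies are placeholders; the real tools hit the database asynchronously):

```rust
// Dispatch table modeled after the removed KeepMcpServer::handle_request:
// known method names route to a handler, anything else is an error.
fn dispatch(method: &str) -> Result<String, String> {
    match method {
        "save_item" | "get_item" | "get_latest_item" | "list_items"
        | "search_items" => Ok(format!("handled {method}")),
        other => Err(format!("Unknown tool: {other}")),
    }
}

fn main() {
    assert_eq!(dispatch("save_item").unwrap(), "handled save_item");
    // Unsupported methods mirror ToolError::UnknownTool.
    assert!(dispatch("drop_db").is_err());
    println!("dispatch checks passed");
}
```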


@@ -1,344 +0,0 @@
use anyhow::{Result, anyhow};
use log::debug;
use serde_json::Value;
use std::collections::HashMap;
use crate::modes::server::common::AppState;
use crate::services::async_item_service::AsyncItemService;
use crate::services::error::CoreError;
#[derive(Debug, thiserror::Error)]
pub enum ToolError {
#[error("Unknown tool: {0}")]
UnknownTool(String),
#[error("Invalid arguments: {0}")]
InvalidArguments(String),
#[error("Database error: {0}")]
Database(#[from] rusqlite::Error),
#[error("IO error: {0}")]
Io(#[from] std::io::Error),
#[error("JSON error: {0}")]
Json(#[from] serde_json::Error),
#[error("Parse error: {0}")]
Parse(#[from] strum::ParseError),
#[error("Other error: {0}")]
Other(#[from] anyhow::Error),
}
pub struct KeepTools {
state: AppState,
}
impl KeepTools {
pub fn new(state: AppState) -> Self {
Self { state }
}
pub async fn save_item(&self, args: Option<Value>) -> Result<String, ToolError> {
let args =
args.ok_or_else(|| ToolError::InvalidArguments("Missing arguments".to_string()))?;
let content = args
.get("content")
.and_then(|v| v.as_str())
.ok_or_else(|| ToolError::InvalidArguments("Missing 'content' field".to_string()))?;
let tags: Vec<String> = args
.get("tags")
.and_then(|v| v.as_array())
.map(|arr| {
arr.iter()
.filter_map(|v| v.as_str().map(|s| s.to_string()))
.collect()
})
.unwrap_or_default();
let metadata: HashMap<String, String> = args
.get("metadata")
.and_then(|v| v.as_object())
.map(|obj| {
obj.iter()
.filter_map(|(k, v)| v.as_str().map(|s| (k.clone(), s.to_string())))
.collect()
})
.unwrap_or_default();
debug!(
"MCP: Saving item with {} bytes, {} tags, {} metadata entries",
content.len(),
tags.len(),
metadata.len()
);
let service = AsyncItemService::new(
self.state.data_dir.clone(),
self.state.db.clone(),
self.state.item_service.clone(),
self.state.cmd.clone(),
self.state.settings.clone(),
);
let item_with_meta = service
.save_item_from_mcp(content.as_bytes().to_vec(), tags, metadata)
.await
.map_err(|e| ToolError::Other(anyhow::Error::from(e)))?;
let item_id = item_with_meta
.item
.id
.ok_or_else(|| anyhow!("Failed to get item ID"))?;
Ok(format!("Successfully saved item with ID: {}", item_id))
}
pub async fn get_item(&self, args: Option<Value>) -> Result<String, ToolError> {
let args =
args.ok_or_else(|| ToolError::InvalidArguments("Missing arguments".to_string()))?;
let item_id = args.get("id").and_then(|v| v.as_i64()).ok_or_else(|| {
ToolError::InvalidArguments("Missing or invalid 'id' field".to_string())
})?;
let service = AsyncItemService::new(
self.state.data_dir.clone(),
self.state.db.clone(),
self.state.item_service.clone(),
self.state.cmd.clone(),
self.state.settings.clone(),
);
let item_with_content = match service.get_item_content(item_id).await {
Ok(iwc) => iwc,
Err(CoreError::ItemNotFound(_)) => {
return Err(ToolError::InvalidArguments(format!(
"Item {} not found",
item_id
)));
}
Err(e) => return Err(ToolError::Other(anyhow::Error::from(e))),
};
let content = String::from_utf8_lossy(&item_with_content.content).to_string();
let tags: Vec<String> = item_with_content
.item_with_meta
.tags
.iter()
.map(|t| t.name.clone())
.collect();
let metadata = item_with_content.item_with_meta.meta_as_map();
let item = item_with_content.item_with_meta.item;
let response = serde_json::json!({
"id": item_id,
"content": content,
"timestamp": item.ts.to_rfc3339(),
"size": item.size,
"compression": item.compression,
"tags": tags,
"metadata": metadata,
});
Ok(serde_json::to_string_pretty(&response)?)
}
pub async fn get_latest_item(&self, args: Option<Value>) -> Result<String, ToolError> {
let tags: Vec<String> = args
.as_ref()
.and_then(|v| v.get("tags"))
.and_then(|v| v.as_array())
.map(|arr| {
arr.iter()
.filter_map(|v| v.as_str().map(|s| s.to_string()))
.collect()
})
.unwrap_or_default();
let service = AsyncItemService::new(
self.state.data_dir.clone(),
self.state.db.clone(),
self.state.item_service.clone(),
self.state.cmd.clone(),
self.state.settings.clone(),
);
let item_with_meta = match service.find_item(vec![], tags, HashMap::new()).await {
Ok(iwm) => iwm,
Err(CoreError::ItemNotFoundGeneric) => {
return Err(ToolError::InvalidArguments("No items found".to_string()));
}
Err(e) => return Err(ToolError::Other(anyhow::Error::from(e))),
};
let item_id = item_with_meta
.item
.id
.ok_or_else(|| anyhow!("Item missing ID after find"))?;
let item_with_content = service
.get_item_content(item_id)
.await
.map_err(|e| ToolError::Other(anyhow::Error::from(e)))?;
let content = String::from_utf8_lossy(&item_with_content.content).to_string();
let tags: Vec<String> = item_with_content
.item_with_meta
.tags
.iter()
.map(|t| t.name.clone())
.collect();
let metadata = item_with_content.item_with_meta.meta_as_map();
let item = item_with_content.item_with_meta.item;
let response = serde_json::json!({
"id": item_id,
"content": content,
"timestamp": item.ts.to_rfc3339(),
"size": item.size,
"compression": item.compression,
"tags": tags,
"metadata": metadata,
});
Ok(serde_json::to_string_pretty(&response)?)
}
pub async fn list_items(&self, args: Option<Value>) -> Result<String, ToolError> {
let args_ref = args.as_ref();
let tags: Vec<String> = args_ref
.and_then(|v| v.get("tags"))
.and_then(|v| v.as_array())
.map(|arr| {
arr.iter()
.filter_map(|v| v.as_str().map(|s| s.to_string()))
.collect()
})
.unwrap_or_default();
let limit = args_ref
.and_then(|v| v.get("limit"))
.and_then(|v| v.as_u64())
.unwrap_or(10) as usize;
let offset = args_ref
.and_then(|v| v.get("offset"))
.and_then(|v| v.as_u64())
.unwrap_or(0) as usize;
let service = AsyncItemService::new(
self.state.data_dir.clone(),
self.state.db.clone(),
self.state.item_service.clone(),
self.state.cmd.clone(),
self.state.settings.clone(),
);
let mut items_with_meta = service
.list_items(tags, HashMap::new())
.await
.map_err(|e| ToolError::Other(anyhow::Error::from(e)))?;
// Sort by timestamp (newest first) and apply pagination
items_with_meta.sort_by(|a, b| b.item.ts.cmp(&a.item.ts));
let items_with_meta: Vec<_> = items_with_meta
.into_iter()
.skip(offset)
.take(limit)
.collect();
let items_info: Vec<_> = items_with_meta
.into_iter()
.map(|item_with_meta| {
let item_tags: Vec<String> =
item_with_meta.tags.iter().map(|t| t.name.clone()).collect();
let item_meta = item_with_meta.meta_as_map();
let item = item_with_meta.item;
let item_id = item.id.unwrap_or(0);
serde_json::json!({
"id": item_id,
"timestamp": item.ts.to_rfc3339(),
"size": item.size,
"compression": item.compression,
"tags": item_tags,
"metadata": item_meta
})
})
.collect();
let response = serde_json::json!({
"items": items_info,
"count": items_info.len(),
"offset": offset,
"limit": limit
});
Ok(serde_json::to_string_pretty(&response)?)
}
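The sort-then-paginate step in `list_items` (newest first, then `skip`/`take`) can be sketched in isolation; bare timestamps stand in for the full items here:

```rust
// Sort newest-first, then apply offset/limit pagination, mirroring the
// handler above. Timestamps are illustrative stand-ins for ItemWithMeta.
fn paginate(mut timestamps: Vec<i64>, offset: usize, limit: usize) -> Vec<i64> {
    timestamps.sort_by(|a, b| b.cmp(a)); // newest (largest) first
    timestamps.into_iter().skip(offset).take(limit).collect()
}
```

Sorting before pagination matters: skipping first and sorting afterwards would page over an arbitrary order.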
pub async fn search_items(&self, args: Option<Value>) -> Result<String, ToolError> {
let tags: Vec<String> = args
.as_ref()
.and_then(|v| v.get("tags"))
.and_then(|v| v.as_array())
.map(|arr| {
arr.iter()
.filter_map(|v| v.as_str().map(|s| s.to_string()))
.collect()
})
.unwrap_or_default();
let metadata: HashMap<String, String> = args
.as_ref()
.and_then(|v| v.get("metadata"))
.and_then(|v| v.as_object())
.map(|obj| {
obj.iter()
.filter_map(|(k, v)| v.as_str().map(|s| (k.clone(), s.to_string())))
.collect()
})
.unwrap_or_default();
let service = AsyncItemService::new(
self.state.data_dir.clone(),
self.state.db.clone(),
self.state.item_service.clone(),
self.state.cmd.clone(),
self.state.settings.clone(),
);
let mut items_with_meta = service
.list_items(tags.clone(), metadata.clone())
.await
.map_err(|e| ToolError::Other(anyhow::Error::from(e)))?;
// Sort by timestamp (newest first)
items_with_meta.sort_by(|a, b| b.item.ts.cmp(&a.item.ts));
let items_info: Vec<_> = items_with_meta
.into_iter()
.map(|item_with_meta| {
let item_tags: Vec<String> =
item_with_meta.tags.iter().map(|t| t.name.clone()).collect();
let item_meta = item_with_meta.meta_as_map();
let item = item_with_meta.item;
let item_id = item.id.unwrap_or(0);
serde_json::json!({
"id": item_id,
"timestamp": item.ts.to_rfc3339(),
"size": item.size,
"compression": item.compression,
"tags": item_tags,
"metadata": item_meta
})
})
.collect();
let response = serde_json::json!({
"items": items_info,
"count": items_info.len(),
"search_criteria": {
"tags": tags,
"metadata": metadata
}
});
Ok(serde_json::to_string_pretty(&response)?)
}
}


@@ -1,7 +1,10 @@
use crate::config;
use crate::services::item_service::ItemService;
use anyhow::Result;
use axum::{Router, routing::post};
use axum::Router;
use axum::http::{HeaderValue, header};
use axum::middleware::Next;
use axum::response::Response;
use clap::Command;
use log::{debug, info};
use std::net::SocketAddr;
@@ -13,13 +16,28 @@ use tower_http::cors::CorsLayer;
use tower_http::trace::TraceLayer;
mod api;
pub mod auth;
pub mod common;
#[cfg(feature = "mcp")]
mod mcp;
mod pages;
pub use common::{AppState, create_auth_middleware, logging_middleware};
/// Adds security headers to all responses.
async fn security_headers(req: axum::extract::Request, next: Next) -> Response {
let mut response = next.run(req).await;
let headers = response.headers_mut();
headers.insert(
header::X_CONTENT_TYPE_OPTIONS,
HeaderValue::from_static("nosniff"),
);
headers.insert(header::X_FRAME_OPTIONS, HeaderValue::from_static("DENY"));
headers.insert(
header::REFERRER_POLICY,
HeaderValue::from_static("strict-origin-when-cross-origin"),
);
response
}
pub fn mode_server(
cmd: &mut Command,
settings: &config::Settings,
@@ -50,10 +68,13 @@ pub fn mode_server(
let server_config = common::ServerConfig {
address: server_address,
port: Some(server_port),
username: settings.server_username(),
password: settings.server_password(),
password_hash: settings.server_password_hash(),
jwt_secret: settings.server_jwt_secret(),
cert_file: settings.server_cert_file(),
key_file: settings.server_key_file(),
cors_origin: settings.server_cors_origin(),
};
// Create ItemService once
@@ -103,48 +124,73 @@ async fn run_server(
settings: Arc::new(settings.clone()),
};
#[cfg(feature = "mcp")]
let mcp_router = Router::new()
.route("/mcp", post(mcp::handle_mcp_request))
.with_state(state.clone());
#[cfg_attr(not(feature = "mcp"), allow(unused_mut))]
let mut protected_router = Router::new()
let protected_router = Router::new()
.merge(api::add_routes(Router::new()))
.merge(pages::add_routes(Router::new()));
.merge(pages::add_routes(Router::new()))
.layer(axum::middleware::from_fn(create_auth_middleware(
config.username.clone(),
config.password.clone(),
config.password_hash.clone(),
config.jwt_secret.clone(),
)));
#[cfg(feature = "mcp")]
{
protected_router = protected_router.merge(mcp_router);
}
let protected_router = protected_router.layer(axum::middleware::from_fn(
create_auth_middleware(config.password.clone(), config.password_hash.clone()),
));
// Build CORS layer - restricted by default, configurable via cors_origin setting
let cors_origin = config.cors_origin.as_deref().unwrap_or("http://localhost");
let cors_layer = if cors_origin == "*" {
CorsLayer::permissive()
} else {
CorsLayer::new()
.allow_origin(
cors_origin
.parse::<axum::http::HeaderValue>()
.unwrap_or_else(|_| {
log::warn!(
"Invalid CORS origin '{cors_origin}', defaulting to http://localhost"
);
"http://localhost".parse().unwrap()
}),
)
.allow_methods([
axum::http::Method::GET,
axum::http::Method::POST,
axum::http::Method::PUT,
axum::http::Method::DELETE,
])
.allow_headers([header::CONTENT_TYPE, header::AUTHORIZATION, header::ACCEPT])
};
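The CORS origin handling above follows a warn-and-fall-back shape: validate the configured value, and substitute a safe default on failure. A minimal sketch, where `looks_like_origin` is an illustrative stand-in for `HeaderValue` parsing:

```rust
// Sketch of the warn-and-fall-back pattern for a configured origin.
// `looks_like_origin` is a simplified stand-in, not axum's real validation.
fn resolve_origin(configured: &str) -> &str {
    fn looks_like_origin(s: &str) -> bool {
        s.starts_with("http://") || s.starts_with("https://")
    }
    if looks_like_origin(configured) {
        configured
    } else {
        eprintln!("Invalid CORS origin '{configured}', defaulting to http://localhost");
        "http://localhost"
    }
}
```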
// Create the app with documentation routes open and others protected
let app = Router::new()
// Add documentation routes without authentication
.merge(api::add_docs_routes(Router::new()))
// Add API, pages, and MCP routes with authentication
// Add API and pages routes with authentication
.merge(protected_router)
// Apply state to all routes
.with_state(state)
// Add other middleware layers to all routes
.layer(axum::middleware::from_fn(security_headers))
.layer(axum::middleware::from_fn(logging_middleware))
.layer(
ServiceBuilder::new()
.layer(TraceLayer::new_for_http())
.layer(CorsLayer::permissive()),
.layer(cors_layer),
);
let addr: SocketAddr = bind_address.parse()?;
// Warn if authentication is enabled without TLS
if (config.password.is_some() || config.password_hash.is_some() || config.jwt_secret.is_some())
&& (config.cert_file.is_none() || config.key_file.is_none())
{
log::warn!(
"SECURITY: Authentication enabled but TLS is not configured. Credentials will be transmitted in plain text!"
);
}
// Build the app into a service
let service = app.into_make_service_with_connect_info::<SocketAddr>();
// Use TLS if both cert and key files are provided
#[cfg(feature = "tls")]
if let (Some(cert_file), Some(key_file)) = (&config.cert_file, &config.key_file) {
info!("SERVER: HTTPS server listening on {addr}");


@@ -6,11 +6,24 @@ use axum::{
extract::{Path, Query, State},
response::{Html, Response},
};
use html_escape::{encode_double_quoted_attribute, encode_text};
use log::debug;
use rusqlite::Connection;
use serde::Deserialize;
use std::collections::HashMap;
/// Escape text content for safe HTML insertion.
#[inline]
fn esc(s: &str) -> String {
encode_text(s).to_string()
}
/// Escape attribute values for safe HTML attribute insertion.
#[inline]
fn esc_attr(s: &str) -> String {
encode_double_quoted_attribute(s).to_string()
}
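Two helpers exist because text nodes and attribute values need different escaping: an attribute value must also neutralize quotes so it cannot break out of the enclosing `"`. A rough stdlib-only stand-in for the distinction (not the `html_escape` crate's exact output):

```rust
// Text context: neutralize &, <, > (ampersand first to avoid double-escaping).
fn esc_sketch(s: &str) -> String {
    s.replace('&', "&amp;").replace('<', "&lt;").replace('>', "&gt;")
}

// Attribute context: additionally neutralize double quotes.
fn esc_attr_sketch(s: &str) -> String {
    esc_sketch(s).replace('"', "&quot;")
}
```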
#[derive(Deserialize)]
/// Query parameters for the item list endpoint.
///
@@ -62,7 +75,7 @@ fn default_count() -> usize {
///
/// # Examples
///
/// ```
/// ```ignore
/// let app = pages::add_routes(axum::Router::new());
/// ```
pub fn add_routes(app: axum::Router<AppState>) -> axum::Router<AppState> {
@@ -90,7 +103,9 @@ async fn list_items(
.map_err(|_| Html("<html><body>Internal Server Error</body></html>".to_string()))?;
Ok(response)
}
Err(e) => Err(Html(format!("<html><body>Error: {e}</body></html>"))),
Err(_e) => Err(Html(
"<html><body>An internal error occurred</body></html>".to_string(),
)),
}
}
@@ -121,7 +136,8 @@ fn build_item_list(
// Apply pagination
let start = params.start;
let end = std::cmp::min(start + params.count, sorted_items.len());
let count = params.count.min(10000);
let end = std::cmp::min(start + count, sorted_items.len());
let page_items = if start < sorted_items.len() {
sorted_items[start..std::cmp::min(end, sorted_items.len())].to_vec()
} else {
@@ -153,14 +169,14 @@ fn build_item_list(
// Collect all tags from all items, keeping track of their timestamps
let mut all_tags_with_time: Vec<(String, chrono::DateTime<chrono::Utc>)> = Vec::new();
for item in &sorted_items {
if let Some(item_id) = item.id {
if let Some(tags) = tags_map.get(&item_id) {
if let Some(item_id) = item.id
&& let Some(tags) = tags_map.get(&item_id)
{
for tag in tags {
all_tags_with_time.push((tag.name.clone(), item.ts));
}
}
}
}
// Sort by timestamp descending (most recent first)
all_tags_with_time.sort_by(|a, b| b.1.cmp(&a.1));
@@ -184,7 +200,9 @@ fn build_item_list(
html.push_str("<p>");
for tag in recent_tags {
html.push_str(&format!(
"<a href=\"/?tags={tag}\" style=\"margin-right: 8px;\">{tag}</a>"
"<a href=\"/?tags={}\" style=\"margin-right: 8px;\">{}</a>",
esc_attr(&tag),
esc(&tag)
));
}
html.push_str("</p>");
@@ -196,7 +214,7 @@ fn build_item_list(
// Table headers
html.push_str("<tr>");
for column in columns {
html.push_str(&format!("<th>{}</th>", column.label));
html.push_str(&format!("<th>{}</th>", esc(&column.label)));
}
html.push_str("<th>Actions</th>");
html.push_str("</tr>");
@@ -224,12 +242,21 @@ fn build_item_list(
format!("<a href=\"/item/{item_id}\">{id_value}</a>")
}
"time" => item.ts.format("%Y-%m-%d %H:%M:%S").to_string(),
"size" => item.size.map(|s| s.to_string()).unwrap_or_default(),
"size" => item
.uncompressed_size
.map(|s| s.to_string())
.unwrap_or_default(),
"tags" => {
// Make sure we're using all tags for the item
let tag_links: Vec<String> = tags
.iter()
.map(|t| format!("<a href=\"/?tags={}\">{}</a>", t.name, t.name))
.map(|t| {
format!(
"<a href=\"/?tags={}\">{}</a>",
esc_attr(&t.name),
esc(&t.name)
)
})
.collect();
tag_links.join(", ")
}
@@ -268,7 +295,15 @@ fn build_item_list(
crate::config::ColumnAlignment::Center => "text-align: center;",
};
html.push_str(&format!("<td style=\"{align_style}\">{display_value}</td>"));
let rendered_value = if column.name == "tags" {
display_value // Already contains escaped HTML links
} else {
esc(&display_value)
};
html.push_str(&format!(
"<td style=\"{align_style}\">{rendered_value}</td>"
));
}
// Actions column
@@ -361,7 +396,9 @@ async fn show_item(
.map_err(|_| Html("<html><body>Internal Server Error</body></html>".to_string()))?;
Ok(response)
}
Err(e) => Err(Html(format!("<html><body>Error: {e}</body></html>"))),
Err(_e) => Err(Html(
"<html><body>An internal error occurred</body></html>".to_string(),
)),
}
}
@@ -392,11 +429,11 @@ fn build_item_details(conn: &Connection, id: i64) -> Result<String> {
));
html.push_str(&format!(
"<tr><th>Size</th><td>{}</td></tr>",
item.size.unwrap_or(0)
item.uncompressed_size.unwrap_or(0)
));
html.push_str(&format!(
"<tr><th>Compression</th><td>{}</td></tr>",
item.compression
esc(&item.compression)
));
// Tags row
@@ -406,7 +443,13 @@ fn build_item_details(conn: &Connection, id: i64) -> Result<String> {
} else {
let tag_links: Vec<String> = tags
.iter()
.map(|t| format!("<a href=\"/?tags={}\">{}</a>", t.name, t.name))
.map(|t| {
format!(
"<a href=\"/?tags={}\">{}</a>",
esc_attr(&t.name),
esc(&t.name)
)
})
.collect();
html.push_str(&tag_links.join(", "));
}
@@ -419,7 +462,8 @@ fn build_item_details(conn: &Connection, id: i64) -> Result<String> {
for meta in metas {
html.push_str(&format!(
"<tr><th>{}</th><td>{}</td></tr>",
meta.name, meta.value
esc(&meta.name),
esc(&meta.value)
));
}
}


@@ -10,26 +10,11 @@ use comfy_table::{Attribute, Cell, Table};
use serde_json;
use serde_yaml;
use crate::common::status::PathInfo;
use crate::meta_plugin::MetaPluginType;
use crate::meta_plugin::get_meta_plugin;
fn build_path_table(path_info: &PathInfo) -> Table {
let mut path_table = crate::modes::common::create_table(true);
path_table.set_header(vec![
Cell::new("Type").add_attribute(Attribute::Bold),
Cell::new("Path").add_attribute(Attribute::Bold),
]);
path_table.add_row(vec!["Data", &path_info.data]);
path_table.add_row(vec!["Database", &path_info.database]);
path_table
}
fn build_config_table(settings: &config::Settings) -> Table {
let mut config_table = crate::modes::common::create_table(true);
let mut config_table = crate::modes::common::create_table_with_config(&settings.table_config);
config_table.set_header(vec![
Cell::new("Setting").add_attribute(Attribute::Bold),
@@ -52,7 +37,10 @@ fn build_config_table(settings: &config::Settings) -> Table {
config_table
}
fn build_meta_plugins_configured_table(status_info: &StatusInfo) -> Option<Table> {
fn build_meta_plugins_configured_table(
status_info: &StatusInfo,
table_config: &config::TableConfig,
) -> Option<Table> {
let meta_plugins = status_info.configured_meta_plugins.as_ref()?;
if meta_plugins.is_empty() {
return None;
@@ -62,7 +50,7 @@ fn build_meta_plugins_configured_table(status_info: &StatusInfo) -> Option<Table
let mut sorted_meta_plugins = meta_plugins.clone();
sorted_meta_plugins.sort_by(|a, b| a.name.cmp(&b.name));
let mut table = crate::modes::common::create_table(true);
let mut table = crate::modes::common::create_table_with_config(table_config);
table.set_header(vec![
Cell::new("Plugin Name").add_attribute(Attribute::Bold),
@@ -78,7 +66,9 @@ fn build_meta_plugins_configured_table(status_info: &StatusInfo) -> Option<Table
};
// First, create a default plugin to get its default options
let default_plugin = get_meta_plugin(meta_plugin_type.clone(), None, None);
let Ok(default_plugin) = get_meta_plugin(meta_plugin_type.clone(), None, None) else {
continue;
};
// Start with the default options
let mut effective_options = default_plugin.options().clone();
@@ -96,14 +86,18 @@ fn build_meta_plugins_configured_table(status_info: &StatusInfo) -> Option<Table
.collect();
// Create the actual plugin with merged options - the constructor will handle setting up outputs
let actual_plugin = get_meta_plugin(
let Ok(actual_plugin) = get_meta_plugin(
meta_plugin_type.clone(),
Some(effective_options.clone()),
Some(outputs_converted),
);
) else {
continue;
};
// Get the default plugin to see its default options
let default_plugin = get_meta_plugin(meta_plugin_type.clone(), None, None);
let Ok(default_plugin) = get_meta_plugin(meta_plugin_type.clone(), None, None) else {
continue;
};
// Start with the default options
let mut all_options = default_plugin.options().clone();
@@ -192,7 +186,7 @@ pub fn mode_status(
let status_service = crate::services::status_service::StatusService::new();
let output_format = crate::modes::common::settings_output_format(settings);
debug!("STATUS: About to generate status info");
let status_info = status_service.generate_status(cmd, settings, data_path, db_path);
let status_info = status_service.generate_status(cmd, settings, data_path, db_path)?;
debug!("STATUS: Status info generated successfully");
match output_format {
@@ -206,7 +200,8 @@ pub fn mode_status(
println!();
println!("PATHS:");
let path_table = build_path_table(&status_info.paths);
let path_table =
crate::modes::common::build_path_table(&status_info.paths, &settings.table_config);
println!(
"{}",
crate::modes::common::trim_lines_end(&path_table.trim_fmt())
@@ -214,7 +209,9 @@ pub fn mode_status(
println!();
// Always try to print META PLUGINS CONFIGURED section using status_info
if let Some(meta_plugins_table) = build_meta_plugins_configured_table(&status_info) {
if let Some(meta_plugins_table) =
build_meta_plugins_configured_table(&status_info, &settings.table_config)
{
println!("META PLUGINS CONFIGURED:");
println!(
"{}",
@@ -229,12 +226,11 @@ pub fn mode_status(
Ok(())
}
OutputFormat::Json => {
// Create a subset for status info that includes everything
println!("{}", serde_json::to_string_pretty(&status_info)?);
crate::modes::common::print_serialized(&status_info, &output_format)?;
Ok(())
}
OutputFormat::Yaml => {
println!("{}", serde_yaml::to_string(&status_info)?);
crate::modes::common::print_serialized(&status_info, &output_format)?;
Ok(())
}
}


@@ -60,6 +60,7 @@ use crate::meta_plugin::{MetaPluginType, get_meta_plugin};
fn build_meta_plugin_table(
meta_plugin_info: &std::collections::HashMap<String, MetaPluginInfo>,
table_config: &crate::config::TableConfig,
) -> Table {
// Builds a formatted table displaying meta plugin information.
//
@@ -72,7 +73,7 @@ fn build_meta_plugin_table(
// # Returns
//
// A formatted `comfy_table::Table`.
let mut meta_plugin_table = crate::modes::common::create_table(true);
let mut meta_plugin_table = crate::modes::common::create_table_with_config(table_config);
meta_plugin_table.set_header(vec![
Cell::new("Plugin Name").add_attribute(Attribute::Bold),
@@ -92,7 +93,9 @@ fn build_meta_plugin_table(
};
// Create a default plugin to get its default options
let default_plugin = get_meta_plugin(meta_plugin_type.clone(), None, None);
let Ok(default_plugin) = get_meta_plugin(meta_plugin_type.clone(), None, None) else {
continue;
};
// Get and sort options
let mut options: Vec<_> = default_plugin.options().iter().collect();
@@ -124,7 +127,10 @@ fn build_meta_plugin_table(
meta_plugin_table
}
fn build_compression_table(compression_info: &Vec<CompressionInfo>) -> Table {
fn build_compression_table(
compression_info: &Vec<CompressionInfo>,
table_config: &crate::config::TableConfig,
) -> Table {
// Builds a formatted table displaying compression plugin information.
//
// # Arguments
@@ -134,7 +140,7 @@ fn build_compression_table(compression_info: &Vec<CompressionInfo>) -> Table {
// # Returns
//
// A formatted `comfy_table::Table`.
let mut compression_table = crate::modes::common::create_table(true);
let mut compression_table = crate::modes::common::create_table_with_config(table_config);
compression_table.set_header(vec![
Cell::new("Type").add_attribute(Attribute::Bold),
@@ -165,7 +171,10 @@ fn build_compression_table(compression_info: &Vec<CompressionInfo>) -> Table {
compression_table
}
fn build_filter_plugin_table(filter_plugins: &[crate::common::status::FilterPluginInfo]) -> Table {
fn build_filter_plugin_table(
filter_plugins: &[crate::common::status::FilterPluginInfo],
table_config: &crate::config::TableConfig,
) -> Table {
// Builds a formatted table displaying filter plugin information.
//
// Sorts plugins by name and formats options as YAML sequence.
@@ -177,7 +186,7 @@ fn build_filter_plugin_table(filter_plugins: &[crate::common::status::FilterPlug
// # Returns
//
// A formatted `comfy_table::Table`.
let mut filter_plugin_table = crate::modes::common::create_table(true);
let mut filter_plugin_table = crate::modes::common::create_table_with_config(table_config);
filter_plugin_table.set_header(vec![
Cell::new("Plugin Name").add_attribute(Attribute::Bold),
@@ -296,13 +305,14 @@ pub fn mode_status_plugins(
let status_service = crate::services::status_service::StatusService::new();
let output_format = crate::modes::common::settings_output_format(settings);
debug!("STATUS_PLUGINS: About to generate status info");
let status_info = status_service.generate_status(cmd, settings, data_path, db_path);
let status_info = status_service.generate_status(cmd, settings, data_path, db_path)?;
debug!("STATUS_PLUGINS: Status info generated successfully");
match output_format {
OutputFormat::Table => {
println!("META PLUGINS:");
let meta_table = build_meta_plugin_table(&status_info.meta_plugins);
let meta_table =
build_meta_plugin_table(&status_info.meta_plugins, &settings.table_config);
println!(
"{}",
crate::modes::common::trim_lines_end(&meta_table.trim_fmt())
@@ -310,7 +320,8 @@ pub fn mode_status_plugins(
println!();
println!("COMPRESSION PLUGINS:");
let compression_table = build_compression_table(&status_info.compression);
let compression_table =
build_compression_table(&status_info.compression, &settings.table_config);
println!(
"{}",
crate::modes::common::trim_lines_end(&compression_table.trim_fmt())
@@ -318,7 +329,8 @@ pub fn mode_status_plugins(
println!();
println!("FILTER PLUGINS:");
let filter_table = build_filter_plugin_table(&status_info.filter_plugins);
let filter_table =
build_filter_plugin_table(&status_info.filter_plugins, &settings.table_config);
println!(
"{}",
crate::modes::common::trim_lines_end(&filter_table.trim_fmt())

src/modes/update.rs (new file, 235 lines)

@@ -0,0 +1,235 @@
use anyhow::{Context, Result};
use std::io::Read;
use std::path::{Path, PathBuf};
use crate::common::PIPESIZE;
use crate::config;
use crate::db;
use crate::services::compression_service::CompressionService;
use crate::services::meta_service::MetaService;
use clap::Command;
use log::debug;
use rusqlite::Connection;
/// Handles the update mode: modifies tags and metadata for an existing item by ID.
///
/// This function processes a single item ID, updating its metadata based on `--meta`
/// arguments and optionally replacing its tags with positional arguments.
/// If the item's size is not set, it backfills it by streaming through the content file.
///
/// # Arguments
///
/// * `cmd` - Clap command for error handling.
/// * `settings` - Global settings containing metadata and meta plugin config.
/// * `ids` - List containing exactly one item ID.
/// * `tags` - Positional tags that replace the item's existing tags when non-empty.
/// * `conn` - Database connection.
/// * `data_path` - Path to data directory.
///
/// # Returns
///
/// `Result<()>` on success, or an error if the update fails.
pub fn mode_update(
cmd: &mut Command,
settings: &config::Settings,
ids: &mut [i64],
tags: &mut Vec<String>,
conn: &mut Connection,
data_path: PathBuf,
) -> Result<()> {
if ids.len() != 1 {
cmd.error(
clap::error::ErrorKind::InvalidValue,
"--update requires exactly one numeric ID",
)
.exit();
}
let item_id = ids[0];
// Look up the item
let item =
db::get_item(conn, item_id)?.ok_or_else(|| anyhow::anyhow!("Item {item_id} not found"))?;
debug!("UPDATE: Found item {item_id}: {item:?}");
// Parse --meta arguments into set and delete lists
let mut set_meta: Vec<(String, String)> = Vec::new();
let mut delete_keys: Vec<String> = Vec::new();
for (key, value) in &settings.meta {
match value {
Some(v) => set_meta.push((key.clone(), v.clone())),
None => delete_keys.push(key.clone()),
}
}
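The `--meta` convention applied here — `key=value` sets a key, a bare `key` (parsed as `None`) deletes it — can be shown in isolation with simplified stand-in types:

```rust
// Split parsed --meta arguments into set and delete lists, mirroring the
// loop above. The (String, Option<String>) pairs are simplified stand-ins
// for the settings' meta representation.
fn split_meta(meta: &[(String, Option<String>)]) -> (Vec<(String, String)>, Vec<String>) {
    let mut set = Vec::new();
    let mut delete = Vec::new();
    for (key, value) in meta {
        match value {
            Some(v) => set.push((key.clone(), v.clone())),
            None => delete.push(key.clone()),
        }
    }
    (set, delete)
}
```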
// Apply metadata changes
for (key, value) in &set_meta {
debug!("UPDATE: Setting meta {key}={value}");
db::store_meta(
conn,
db::Meta {
id: item_id,
name: key.clone(),
value: value.clone(),
},
)?;
}
for key in &delete_keys {
debug!("UPDATE: Deleting meta {key}");
db::query_delete_meta(
conn,
db::Meta {
id: item_id,
name: key.clone(),
value: String::new(),
},
)?;
}
// Replace tags if provided
if !tags.is_empty() {
debug!("UPDATE: Replacing tags with {:?}", tags);
db::set_item_tags(conn, item.clone(), tags)?;
}
// Run meta plugins if --meta-plugin flags are provided
let plugin_names = settings.meta_plugins_names();
if !plugin_names.is_empty() {
debug!("UPDATE: Running meta plugins: {:?}", plugin_names);
run_meta_plugins_on_item(conn, cmd, settings, &data_path, &item, item_id)?;
}
// Backfill size if not set
let mut updated_item = item.clone();
if item.uncompressed_size.is_none() {
debug!("UPDATE: Size not set, backfilling from content file");
if let Some(size) = compute_item_size(&data_path, &item) {
debug!("UPDATE: Computed size: {size}");
updated_item.uncompressed_size = Some(size);
db::update_item(conn, updated_item.clone())?;
}
}
// Backfill compressed_size if not set
if item.compressed_size.is_none() {
let item_path = data_path.join(item_id.to_string());
if let Ok(meta) = std::fs::metadata(&item_path) {
updated_item.compressed_size = Some(meta.len() as i64);
db::update_item(conn, updated_item.clone())?;
}
}
// Print confirmation
if !settings.quiet {
let mut parts = Vec::new();
if !set_meta.is_empty() {
parts.push(format!("set {} metadata entries", set_meta.len()));
}
if !delete_keys.is_empty() {
parts.push(format!("deleted {} metadata keys", delete_keys.len()));
}
if !tags.is_empty() {
parts.push(format!("tags: {}", tags.join(" ")));
}
let action = if parts.is_empty() {
"no changes".to_string()
} else {
parts.join(", ")
};
eprintln!("KEEP: Updated item {item_id} ({action})");
}
Ok(())
}
/// Computes the decompressed size of an item by streaming through its content file.
///
/// Reads the compressed file in PIPESIZE chunks and counts total decompressed bytes.
/// Returns None if the file doesn't exist or decompression fails.
fn compute_item_size(data_path: &Path, item: &db::Item) -> Option<i64> {
let item_id = item.id?;
let mut item_path = data_path.to_path_buf();
item_path.push(item_id.to_string());
if !item_path.exists() {
debug!("UPDATE: Content file not found: {item_path:?}");
return None;
}
let compression_service = CompressionService::new();
let mut reader = match compression_service.stream_item_content(item_path, &item.compression) {
Ok(r) => r,
Err(e) => {
debug!("UPDATE: Failed to open content stream: {e}");
return None;
}
};
let mut buffer = [0u8; PIPESIZE];
let mut total_bytes: i64 = 0;
loop {
match reader.read(&mut buffer) {
Ok(0) => break,
Ok(n) => {
total_bytes += n as i64;
}
Err(e) => {
debug!("UPDATE: Error reading content: {e}");
return None;
}
}
}
Some(total_bytes)
}
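The chunked counting loop in `compute_item_size` works over any `Read` source; here is the same shape as a standalone sketch, with a small buffer standing in for `PIPESIZE` and the project's compression stream replaced by a generic reader:

```rust
use std::io::Read;

// Count total bytes produced by a reader in fixed-size chunks, returning
// None on a read error, mirroring compute_item_size above.
fn count_stream_bytes<R: Read>(mut reader: R) -> Option<i64> {
    let mut buffer = [0u8; 16]; // stand-in for PIPESIZE
    let mut total: i64 = 0;
    loop {
        match reader.read(&mut buffer) {
            Ok(0) => break, // EOF
            Ok(n) => total += n as i64,
            Err(_) => return None,
        }
    }
    Some(total)
}
```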
/// Runs meta plugins on an existing item's content and stores the results.
fn run_meta_plugins_on_item(
conn: &mut Connection,
cmd: &mut Command,
settings: &config::Settings,
data_path: &Path,
item: &db::Item,
item_id: i64,
) -> Result<()> {
let mut item_path = data_path.to_path_buf();
item_path.push(item_id.to_string());
if !item_path.exists() {
debug!("UPDATE: Content file not found: {item_path:?}");
return Ok(());
}
// Collect metadata in memory
let (meta_service, collected_meta) = MetaService::with_collector();
let mut plugins = meta_service.get_plugins(cmd, settings);
if plugins.is_empty() {
return Ok(());
}
let compression_service = CompressionService::new();
let mut reader = compression_service.stream_item_content(item_path, &item.compression)?;
meta_service.initialize_plugins(&mut plugins);
crate::common::stream_copy(&mut reader, |chunk| {
meta_service.process_chunk(&mut plugins, chunk);
Ok(())
})?;
meta_service.finalize_plugins(&mut plugins);
// Write collected plugin metadata to DB
if let Ok(entries) = collected_meta.lock() {
for (name, value) in entries.iter() {
db::add_meta(conn, item_id, name, value)?;
}
}
Ok(())
}


@@ -1,30 +0,0 @@
WHITESPACE = _{ " " | "\t" | "\n" | "\r" }
filters = { filter ~ ("," ~ filters)? }
filter = { filter_name ~ ("(" ~ options ~ ")")? }
filter_name = @{ ASCII_ALPHA ~ (ASCII_ALPHANUMERIC | "_")* }
options = { option ~ ("," ~ options)? }
option = { (option_name ~ "=")? ~ option_value }
option_name = @{ ASCII_ALPHA ~ (ASCII_ALPHANUMERIC | "_")* }
option_value = {
JSON_NUMBER |
JSON_STRING |
JSON_BOOLEAN
}
JSON_NUMBER = @{
("-")? ~
("0" | ASCII_NONZERO_DIGIT ~ ASCII_DIGIT*) ~
("." ~ ASCII_DIGIT*)? ~
(("e" | "E") ~ ("+" | "-")? ~ ASCII_DIGIT+)?
}
JSON_STRING = ${
"\"" ~
(("\\" ~ ANY) | (!("\"" | "\\") ~ ANY))* ~
"\""
}
JSON_BOOLEAN = ${ "true" | "false" }


@@ -1,119 +0,0 @@
use pest::Parser;
use pest_derive::Parser;
use std::collections::HashMap;
use serde_json;
#[derive(Parser)]
#[grammar = "filter.pest"]
pub struct FilterParser;
#[derive(Debug)]
pub struct Filter {
pub name: String,
pub options: HashMap<String, serde_json::Value>,
}
pub fn parse_filter_string(input: &str) -> Result<Vec<Filter>, Box<dyn std::error::Error>> {
let mut filters = Vec::new();
let pairs = FilterParser::parse(Rule::filters, input)?;
// Both `filters` and `options` are recursive rules, so flatten the parse
// tree to visit every `filter` and `option` node regardless of depth.
for pair in pairs.flatten() {
if pair.as_rule() == Rule::filter {
let mut name = String::new();
let mut options = HashMap::new();
for inner_pair in pair.into_inner() {
match inner_pair.as_rule() {
Rule::filter_name => {
name = inner_pair.as_str().to_string();
}
Rule::options => {
for option_pair in inner_pair.into_inner().flatten() {
if option_pair.as_rule() == Rule::option {
let mut option_name = None;
let mut option_value = None;
for option_inner in option_pair.into_inner() {
match option_inner.as_rule() {
Rule::option_name => {
option_name = Some(option_inner.as_str().to_string());
}
Rule::option_value => {
option_value = Some(parse_option_value(option_inner.as_str())?);
}
_ => {}
}
}
if let Some(value) = option_value {
// If no name is provided, use the filter name as the key
let key = option_name.unwrap_or_else(|| name.clone());
options.insert(key, value);
}
}
}
}
_ => {}
}
}
filters.push(Filter { name, options });
}
}
Ok(filters)
}
fn parse_option_value(input: &str) -> Result<serde_json::Value, Box<dyn std::error::Error>> {
serde_json::from_str(input).map_err(|e| Box::new(e) as Box<dyn std::error::Error>)
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_parse_simple_filter() {
let result = parse_filter_string("grep").unwrap();
assert_eq!(result.len(), 1);
assert_eq!(result[0].name, "grep");
assert!(result[0].options.is_empty());
}
#[test]
fn test_parse_filter_with_options() {
let result = parse_filter_string("head_lines(10)").unwrap();
assert_eq!(result.len(), 1);
assert_eq!(result[0].name, "head_lines");
assert_eq!(result[0].options.len(), 1);
if let serde_json::Value::Number(n) = result[0].options.get("head_lines").unwrap() {
assert_eq!(n.as_i64(), Some(10));
} else {
panic!("Expected number");
}
}
#[test]
fn test_parse_filter_with_named_options() {
let result = parse_filter_string(r#"grep(pattern="error")"#).unwrap();
assert_eq!(result.len(), 1);
assert_eq!(result[0].name, "grep");
assert_eq!(result[0].options.get("pattern").unwrap().as_str(), Some("error"));
}
#[test]
fn test_parse_multiple_filters() {
let result = parse_filter_string(r#"head_lines(10),grep(pattern="error")"#).unwrap();
assert_eq!(result.len(), 2);
assert_eq!(result[0].name, "head_lines");
assert_eq!(result[0].options.len(), 1);
if let serde_json::Value::Number(n) = result[0].options.get("head_lines").unwrap() {
assert_eq!(n.as_i64(), Some(10));
} else {
panic!("Expected number");
}
assert_eq!(result[1].name, "grep");
assert_eq!(result[1].options.len(), 1);
assert_eq!(result[1].options.get("pattern").unwrap().as_str(), Some("error"));
}
}


@@ -1,15 +0,0 @@
/// Parsing utilities for filters and other inputs.
///
/// This module provides tools for parsing filter strings and other structured
/// inputs used throughout the application. Currently, it includes a pest-based
/// parser for filter expressions.
///
/// # Examples
///
/// ```
/// use keep::parser::parse_filter_string;
/// let filters = parse_filter_string(r#"head_lines(5),grep(pattern="hello")"#).unwrap();
/// ```
/// ```
pub mod filter_parser;
pub use filter_parser::{FilterParser, parse_filter_string};


@@ -1,300 +0,0 @@
use crate::common::status::StatusInfo;
use crate::config::Settings;
use crate::db::Item;
use crate::db::Meta;
use crate::services::data_service::DataService;
use crate::services::error::CoreError;
use crate::services::types::{ItemWithContent, ItemWithMeta};
use clap::Command;
use futures::Stream;
use rusqlite::Connection;
use std::collections::HashMap;
use std::io::Read;
use std::path::{Path, PathBuf};
use std::pin::Pin;
use std::sync::Arc;
use tokio::sync::Mutex;
pub struct AsyncDataService {
data_path: PathBuf,
settings: Arc<Settings>,
db: Arc<Mutex<Connection>>,
sync_service: crate::services::SyncDataService,
}
impl AsyncDataService {
pub fn new(data_path: PathBuf, settings: Arc<Settings>, db: Arc<Mutex<Connection>>) -> Self {
let sync_service =
crate::services::SyncDataService::new(data_path.clone(), settings.as_ref().clone());
Self {
data_path,
settings,
db,
sync_service,
}
}
pub fn data_path(&self) -> &PathBuf {
&self.data_path
}
pub fn settings(&self) -> Arc<Settings> {
self.settings.clone()
}
pub fn db(&self) -> Arc<Mutex<Connection>> {
self.db.clone()
}
pub async fn get_item(&self, id: i64) -> Result<ItemWithMeta, CoreError> {
let mut conn = self.db.lock().await;
self.get(&mut conn, id)
}
pub async fn add_item_meta(
&self,
item_id: i64,
name: &str,
value: &str,
) -> Result<(), CoreError> {
let conn = self.db.lock().await;
crate::db::add_meta(&conn, item_id, name, value)?;
Ok(())
}
pub async fn list_items(
&self,
tags: Vec<String>,
meta: HashMap<String, String>,
) -> Result<Vec<ItemWithMeta>, CoreError> {
let mut conn = self.db.lock().await;
self.list(&mut conn, tags, meta)
}
pub async fn find_item(
&self,
ids: Vec<i64>,
tags: Vec<String>,
meta: HashMap<String, String>,
) -> Result<ItemWithMeta, CoreError> {
let mut conn = self.db.lock().await;
DataService::find_item(self, &mut conn, ids, tags, meta)
}
pub async fn get_item_content_info(
&self,
id: i64,
_filter: Option<String>,
) -> Result<(Vec<u8>, ItemWithMeta, bool), CoreError> {
let mut conn = self.db.lock().await;
let (mut reader, item_with_meta) = self.get_content(&mut conn, id)?;
let mut content = Vec::new();
reader.read_to_end(&mut content)?;
let is_binary = item_with_meta
.meta
.iter()
.find(|m| m.name == "text")
.map(|m| m.value == "false")
.unwrap_or(false);
Ok((content, item_with_meta, is_binary))
}
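The `text == "false"` lookup above is the only thing that marks an item as binary; missing metadata defaults to text. A sketch of that rule in isolation, assuming meta is flattened to `(name, value)` pairs (the `is_binary_from_meta` helper is hypothetical):

```rust
/// Mirrors the lookup in `get_item_content_info`: an item is treated as
/// binary only when it carries a `text` meta entry whose value is "false".
/// `meta` is a plain slice of (name, value) pairs, standing in for the
/// source's `Meta` records.
fn is_binary_from_meta(meta: &[(String, String)]) -> bool {
    meta.iter()
        .find(|(name, _)| name == "text")
        .map(|(_, value)| value == "false")
        .unwrap_or(false)
}

fn main() {
    let binary = vec![("text".to_string(), "false".to_string())];
    let text = vec![("text".to_string(), "true".to_string())];
    assert!(is_binary_from_meta(&binary));
    assert!(!is_binary_from_meta(&text));
    assert!(!is_binary_from_meta(&[])); // no entry defaults to text
}
```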
pub async fn get_item_content_info_streaming(
&self,
id: i64,
_filter: Option<String>,
) -> Result<
(
Pin<Box<dyn Stream<Item = Result<Vec<u8>, CoreError>> + Send>>,
ItemWithMeta,
bool,
),
CoreError,
> {
let mut conn = self.db.lock().await;
let (reader, item_with_meta) = self.get_content(&mut conn, id)?;
let is_binary = item_with_meta
.meta
.iter()
.find(|m| m.name == "text")
.map(|m| m.value == "false")
.unwrap_or(false);
// Convert reader to stream with optimized buffer reuse
let stream = async_stream::stream! {
let mut reader = reader;
let mut buf = [0u8; 8192];
loop {
match reader.read(&mut buf) {
Ok(0) => break,
Ok(n) => yield Ok(buf[..n].to_vec()),
Err(e) => yield Err(CoreError::from(e)),
}
}
};
Ok((Box::pin(stream), item_with_meta, is_binary))
}
pub async fn stream_item_content_by_id_with_metadata(
&self,
id: i64,
_metadata: &HashMap<String, String>,
_force_text: bool,
offset: u64,
length: u64,
_filter: Option<String>,
) -> Result<
(
Pin<Box<dyn Stream<Item = Result<Vec<u8>, std::io::Error>> + Send>>,
u64,
),
CoreError,
> {
let mut conn = self.db.lock().await;
let (mut reader, _item_with_meta) = self.get_content(&mut conn, id)?;
// Skip bytes for offset
if offset > 0 {
let mut skip_buf = [0u8; 8192];
let mut remaining = offset;
while remaining > 0 {
let to_read = std::cmp::min(8192, remaining as usize);
let n = reader.read(&mut skip_buf[..to_read])?;
if n == 0 {
break;
}
remaining -= n as u64;
}
}
let content_length = if length > 0 { length } else { u64::MAX };
// Optimized stream that reuses a single buffer for reading
let stream = async_stream::stream! {
let mut reader = reader;
let mut remaining = content_length;
let mut buf = [0u8; 8192];
while remaining > 0 {
let to_read = std::cmp::min(8192, remaining as usize);
match reader.read(&mut buf[..to_read]) {
Ok(0) => break,
Ok(n) => {
remaining -= n as u64;
yield Ok(buf[..n].to_vec());
}
Err(e) => {
yield Err(e);
break;
}
}
}
};
Ok((Box::pin(stream), content_length))
}
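The skip loop above reads into a scratch buffer until `offset` bytes are discarded; std's `Read::take` plus `io::copy` into `io::sink` expresses the same skip-then-window pattern more compactly. A sketch under that assumption (`read_window` is a hypothetical name; note the method above treats `length == 0` as unbounded before this arithmetic applies):

```rust
use std::io::{self, Read};

/// Skip `offset` bytes from `reader`, then read at most `length` bytes,
/// mirroring the manual skip loop with std combinators.
fn read_window<R: Read>(mut reader: R, offset: u64, length: u64) -> io::Result<Vec<u8>> {
    // Discard the first `offset` bytes (stops early at EOF, like the loop).
    io::copy(&mut reader.by_ref().take(offset), &mut io::sink())?;
    let mut out = Vec::new();
    reader.take(length).read_to_end(&mut out)?;
    Ok(out)
}

fn main() -> io::Result<()> {
    let data = io::Cursor::new(b"hello world".to_vec());
    let window = read_window(data, 6, 5)?;
    assert_eq!(window, b"world".to_vec());
    Ok(())
}
```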
/// Get raw item content without decompression.
///
/// Reads the stored file bytes directly from disk, bypassing decompression.
/// Used when the client requests raw bytes with `decompress=false`.
pub async fn get_raw_item_content(&self, id: i64) -> Result<Vec<u8>, CoreError> {
let data_path = self.data_path.clone();
tokio::task::spawn_blocking(move || {
let mut item_path = data_path;
item_path.push(id.to_string());
let mut file = std::fs::File::open(&item_path).map_err(|e| {
CoreError::Io(std::io::Error::new(
std::io::ErrorKind::NotFound,
format!("Item file not found: {item_path:?}: {e}"),
))
})?;
let mut content = Vec::new();
file.read_to_end(&mut content)?;
Ok(content)
})
.await
.map_err(|e| CoreError::Other(anyhow::anyhow!("Task join error: {}", e)))?
}
}
impl DataService for AsyncDataService {
type Error = CoreError;
fn save<R: Read>(
&self,
content: R,
cmd: &mut Command,
settings: &Settings,
tags: Vec<String>,
conn: &mut Connection,
) -> Result<Item, Self::Error> {
self.sync_service.save(content, cmd, settings, tags, conn)
}
fn get(&self, conn: &mut Connection, id: i64) -> Result<ItemWithMeta, Self::Error> {
self.sync_service.get(conn, id)
}
fn get_content(
&self,
conn: &mut Connection,
id: i64,
) -> Result<(Box<dyn Read + Send>, ItemWithMeta), Self::Error> {
self.sync_service.get_content(conn, id)
}
fn list(
&self,
conn: &mut Connection,
tags: Vec<String>,
meta: HashMap<String, String>,
) -> Result<Vec<ItemWithMeta>, Self::Error> {
self.sync_service.list(conn, tags, meta)
}
fn delete(&self, conn: &mut Connection, id: i64) -> Result<Item, Self::Error> {
self.sync_service.delete(conn, id)
}
fn find_item(
&self,
conn: &mut Connection,
ids: Vec<i64>,
tags: Vec<String>,
meta: HashMap<String, String>,
) -> Result<ItemWithMeta, Self::Error> {
self.sync_service.find_item(conn, ids, tags, meta)
}
fn get_items(
&self,
conn: &mut Connection,
ids: &[i64],
tags: &[String],
meta: &HashMap<String, String>,
) -> Result<Vec<ItemWithMeta>, Self::Error> {
self.sync_service.get_items(conn, ids, tags, meta)
}
fn generate_status(
&self,
settings: &Settings,
data_path: &Path,
db_path: &Path,
) -> Result<StatusInfo, Self::Error> {
let mut cmd = Command::new("keep");
let status_service = crate::services::StatusService::new();
Ok(status_service.generate_status(
&mut cmd,
settings,
data_path.to_path_buf(),
db_path.to_path_buf(),
))
}
}


@@ -1,402 +0,0 @@
//! Asynchronous service wrapper for `ItemService`.
//!
//! Uses `tokio::task::spawn_blocking` to offload synchronous operations (DB/FS)
//! to a blocking thread pool, allowing non-blocking async usage in servers.
use crate::common::PIPESIZE;
use crate::config::Settings;
use crate::services::error::CoreError;
use crate::services::item_service::ItemService;
use crate::services::types::{ItemWithContent, ItemWithMeta};
use clap::Command;
use rusqlite::Connection;
use std::collections::HashMap;
use std::io::Read;
use std::path::PathBuf;
use std::sync::Arc;
use tokio::sync::Mutex;
/// An asynchronous wrapper around the `ItemService` for use in async contexts like the web server.
/// It uses `tokio::task::spawn_blocking` to run synchronous database and filesystem operations
/// on a dedicated thread pool, preventing them from blocking the async runtime.
#[allow(dead_code)]
pub struct AsyncItemService {
pub data_dir: PathBuf,
db: Arc<Mutex<Connection>>,
item_service: Arc<ItemService>,
cmd: Arc<Mutex<Command>>,
settings: Arc<Settings>,
}
#[allow(dead_code)]
impl AsyncItemService {
/// Creates a new `AsyncItemService`.
///
/// # Arguments
///
/// * `data_dir` - Path to data directory.
/// * `db` - Arc-wrapped mutex for DB connection.
/// * `item_service` - Arc-wrapped ItemService.
/// * `cmd` - Arc-wrapped mutex for Clap command.
/// * `settings` - Arc-wrapped settings.
///
/// # Returns
///
/// A new `AsyncItemService`.
pub fn new(
data_dir: PathBuf,
db: Arc<Mutex<Connection>>,
item_service: Arc<ItemService>,
cmd: Arc<Mutex<Command>>,
settings: Arc<Settings>,
) -> Self {
Self {
data_dir,
db,
item_service,
cmd,
settings,
}
}
/// Internal helper to execute synchronous operations in a blocking task.
///
/// Spawns a blocking task with the DB connection and ItemService.
///
/// # Type Parameters
///
/// * `F` - Closure type.
/// * `T` - Return type.
///
/// # Arguments
///
/// * `f` - The synchronous closure to execute.
///
/// # Returns
///
/// Result of the closure, or CoreError on task failure.
async fn execute_blocking<F, T>(&self, f: F) -> Result<T, CoreError>
where
F: FnOnce(&Connection, &ItemService) -> Result<T, CoreError> + Send + 'static,
T: Send + 'static,
{
let db = self.db.clone();
let item_service = self.item_service.clone();
tokio::task::spawn_blocking(move || {
let conn = db.blocking_lock();
f(&conn, &item_service)
})
.await
.map_err(|e| CoreError::Other(anyhow::anyhow!("Blocking task failed: {}", e)))?
}
pub async fn get_item(&self, id: i64) -> Result<ItemWithMeta, CoreError> {
self.execute_blocking(move |conn, item_service| item_service.get_item(conn, id))
.await
}
pub async fn get_item_content(&self, id: i64) -> Result<ItemWithContent, CoreError> {
self.execute_blocking(move |conn, item_service| item_service.get_item_content(conn, id))
.await
}
pub async fn get_item_content_info(
&self,
id: i64,
filter: Option<String>,
) -> Result<(Vec<u8>, String, bool), CoreError> {
self.execute_blocking(move |conn, item_service| {
item_service.get_item_content_info(conn, id, filter)
})
.await
}
pub async fn stream_item_content_by_id(
&self,
item_id: i64,
allow_binary: bool,
offset: u64,
length: u64,
) -> Result<
(
std::pin::Pin<
Box<
dyn tokio_stream::Stream<
Item = Result<tokio_util::bytes::Bytes, std::io::Error>,
> + Send,
>,
>,
String,
),
CoreError,
> {
let content = self
.execute_blocking(move |conn, item_service| {
let item_with_content = item_service.get_item_content(conn, item_id)?;
Ok::<_, CoreError>(item_with_content.content)
})
.await?;
// Clone content for use in the binary check closure
let content_clone = content.clone();
// Get metadata to determine MIME type and binary status
let (mime_type, is_binary) = {
let db = self.db.clone();
let item_service = self.item_service.clone();
tokio::task::spawn_blocking(move || {
let conn = db.blocking_lock();
let item_with_meta = item_service.get_item(&conn, item_id)?;
let metadata = item_with_meta.meta_as_map();
let mime_type = metadata
.get("mime_type")
.map(|s| s.to_string())
.unwrap_or_else(|| "application/octet-stream".to_string());
let is_binary = if let Some(text_val) = metadata.get("text") {
text_val == "false"
} else {
crate::common::is_binary::is_binary(&content_clone)
};
Ok::<_, CoreError>((mime_type, is_binary))
})
.await
.map_err(|e| CoreError::Other(anyhow::anyhow!("Blocking task failed: {}", e)))??
};
// Check if content is binary when allow_binary is false
if !allow_binary && is_binary {
return Err(CoreError::InvalidInput(
"Binary content not allowed".to_string(),
));
}
// Create a stream that reads only the requested portion
let content_len = content.len() as u64;
// Apply offset and length constraints
let start = std::cmp::min(offset, content_len);
let end = if length > 0 {
std::cmp::min(start + length, content_len)
} else {
content_len
};
let stream = if start < content_len {
let chunk =
tokio_util::bytes::Bytes::from(content[start as usize..end as usize].to_vec());
Box::pin(tokio_stream::iter(vec![Ok(chunk)]))
} else {
Box::pin(tokio_stream::iter(vec![]))
};
Ok((stream, mime_type))
}
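The in-memory branch above clamps the requested window with two `min` calls. Pulled out as a pure function (hypothetical name, with a `saturating_add` guarding the `start + length` overflow), the arithmetic is easy to test:

```rust
/// Clamp a requested (offset, length) window to the content size, as the
/// in-memory streaming path does; length == 0 means "to the end".
fn clamp_window(content_len: u64, offset: u64, length: u64) -> (u64, u64) {
    let start = offset.min(content_len);
    let end = if length > 0 {
        start.saturating_add(length).min(content_len)
    } else {
        content_len
    };
    (start, end)
}

fn main() {
    assert_eq!(clamp_window(10, 3, 4), (3, 7));
    assert_eq!(clamp_window(10, 3, 0), (3, 10)); // length 0: rest of content
    assert_eq!(clamp_window(10, 42, 4), (10, 10)); // offset past EOF: empty window
}
```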
pub async fn stream_item_content_by_id_with_metadata(
&self,
item_id: i64,
metadata: &HashMap<String, String>,
allow_binary: bool,
offset: u64,
length: u64,
filter: Option<String>,
) -> Result<
(
std::pin::Pin<
Box<
dyn tokio_stream::Stream<
Item = Result<tokio_util::bytes::Bytes, std::io::Error>,
> + Send,
>,
>,
String,
),
CoreError,
> {
// Use provided metadata to determine MIME type and binary status
let mime_type = metadata
.get("mime_type")
.map(|s| s.to_string())
.unwrap_or_else(|| "application/octet-stream".to_string());
// Check if content is binary when allow_binary is false
if !allow_binary {
let is_binary = if let Some(text_val) = metadata.get("text") {
text_val == "false"
} else {
// Get binary status using streaming approach
let (_, _, is_binary) = self.get_item_content_info_streaming(item_id, None).await?;
is_binary
};
if is_binary {
return Err(CoreError::InvalidInput(
"Binary content not allowed".to_string(),
));
}
}
// Get a streaming reader for the content with filtering applied
let reader = {
let db = self.db.clone();
let item_service = self.item_service.clone();
let filter = filter.clone();
tokio::task::spawn_blocking(move || {
let conn = db.blocking_lock();
item_service
.get_item_content_info_streaming(&conn, item_id, filter)
.map(|(reader, _, _)| reader)
})
.await
.map_err(|e| CoreError::Other(anyhow::anyhow!("Blocking task failed: {}", e)))?
};
// Convert the reader into an async stream manually
use tokio_util::bytes::Bytes;
// Create a channel to stream data between the blocking thread and async runtime
let (tx, rx) = tokio::sync::mpsc::channel(1);
// Spawn a blocking task to read from the reader and send chunks
tokio::task::spawn_blocking(move || {
let mut reader = reader;
// Apply offset by reading and discarding bytes
if offset > 0 {
let mut remaining = offset;
let mut buf = [0; PIPESIZE];
while remaining > 0 {
let to_read = std::cmp::min(remaining, buf.len() as u64);
match reader.as_mut().unwrap().read(&mut buf[..to_read as usize]) {
Ok(0) => break, // EOF reached before offset
Ok(n) => remaining -= n as u64,
Err(e) => {
let _ = tx.blocking_send(Err(e));
return;
}
}
}
}
// Read and send data up to the specified length
let mut remaining_length = length;
let mut buffer = [0; PIPESIZE];
loop {
// Determine how much to read in this iteration
let to_read = if length > 0 {
// If length is specified, don't read more than remaining_length
std::cmp::min(remaining_length, buffer.len() as u64) as usize
} else {
buffer.len()
};
if to_read == 0 {
break; // We've read the requested length
}
match reader.as_mut().unwrap().read(&mut buffer[..to_read]) {
Ok(0) => break, // EOF
Ok(n) => {
let chunk = Bytes::copy_from_slice(&buffer[..n]);
// Block on sending to the channel
if tx.blocking_send(Ok(chunk)).is_err() {
break; // Receiver dropped
}
if length > 0 {
remaining_length -= n as u64;
if remaining_length == 0 {
break; // Reached the requested length
}
}
}
Err(e) => {
let _ = tx.blocking_send(Err(e));
break;
}
}
}
});
// Convert the receiver into a stream
let stream = tokio_stream::wrappers::ReceiverStream::new(rx);
Ok((Box::pin(stream), mime_type))
}
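The channel bridge above has a direct std analogue: a worker thread reads fixed-size chunks from the blocking reader and sends them over a bounded channel, and a dropped receiver ends the loop early. A sketch using `std::sync::mpsc` in place of tokio's channel (`stream_chunks` is a hypothetical helper, not from the crate):

```rust
use std::io::{Cursor, Read};
use std::sync::mpsc;
use std::thread;

/// Std-only analogue of the tokio channel bridge: a worker thread reads
/// fixed-size chunks from a blocking reader and sends them over a bounded
/// channel; the receiver plays the role of the async stream.
fn stream_chunks<R: Read + Send + 'static>(
    mut reader: R,
    chunk_size: usize,
) -> mpsc::Receiver<std::io::Result<Vec<u8>>> {
    let (tx, rx) = mpsc::sync_channel(1); // bounded, like channel(1) above
    thread::spawn(move || {
        let mut buf = vec![0u8; chunk_size];
        loop {
            match reader.read(&mut buf) {
                Ok(0) => break, // EOF
                Ok(n) => {
                    if tx.send(Ok(buf[..n].to_vec())).is_err() {
                        break; // receiver dropped
                    }
                }
                Err(e) => {
                    let _ = tx.send(Err(e));
                    break;
                }
            }
        }
    });
    rx
}

fn main() {
    let rx = stream_chunks(Cursor::new(b"abcdefgh".to_vec()), 3);
    let collected: Vec<u8> = rx.into_iter().flat_map(|chunk| chunk.unwrap()).collect();
    assert_eq!(collected, b"abcdefgh".to_vec());
}
```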
pub async fn get_item_content_info_streaming(
&self,
item_id: i64,
filter: Option<String>,
) -> Result<(Box<dyn Read + Send>, String, bool), CoreError> {
self.execute_blocking(move |conn, item_service| {
item_service.get_item_content_info_streaming(conn, item_id, filter)
})
.await
}
pub async fn find_item(
&self,
ids: Vec<i64>,
tags: Vec<String>,
meta: HashMap<String, String>,
) -> Result<ItemWithMeta, CoreError> {
let ids_clone = ids.clone();
let tags_clone = tags.clone();
let meta_clone = meta.clone();
self.execute_blocking(move |conn, item_service| {
item_service.find_item(conn, &ids_clone, &tags_clone, &meta_clone)
})
.await
}
pub async fn list_items(
&self,
tags: Vec<String>,
meta: HashMap<String, String>,
) -> Result<Vec<ItemWithMeta>, CoreError> {
let tags_clone = tags.clone();
let meta_clone = meta.clone();
self.execute_blocking(move |conn, item_service| {
item_service.list_items(conn, &tags_clone, &meta_clone)
})
.await
}
pub async fn delete_item(&self, id: i64) -> Result<(), CoreError> {
let db = self.db.clone();
let item_service = self.item_service.clone();
tokio::task::spawn_blocking(move || {
let mut conn = db.blocking_lock();
item_service.delete_item(&mut conn, id)
})
.await
.map_err(|e| CoreError::Other(anyhow::anyhow!("Blocking task failed: {}", e)))?
}
pub async fn save_item_from_mcp(
&self,
content: Vec<u8>,
tags: Vec<String>,
metadata: HashMap<String, String>,
) -> Result<ItemWithMeta, CoreError> {
let db = self.db.clone();
let item_service = self.item_service.clone();
let cmd = self.cmd.clone();
let settings = self.settings.clone();
tokio::task::spawn_blocking(move || {
let mut conn = db.blocking_lock();
let mut cmd = cmd.blocking_lock();
let settings = settings.as_ref();
item_service
.save_item_from_mcp(&content, &tags, &metadata, &mut cmd, settings, &mut conn)
})
.await
.map_err(|e| CoreError::Other(anyhow::anyhow!("Blocking task failed: {}", e)))?
}
}


@@ -1,33 +1,12 @@
use crate::compression_engine::{CompressionType, get_compression_engine};
use crate::services::error::CoreError;
use anyhow::anyhow;
use std::io::{Read, Write};
use std::path::PathBuf;
use std::str::FromStr;
pub struct CompressionService;
/// Service for handling compression and decompression of item content.
///
/// Provides methods to read compressed item files either fully into memory
/// or as streaming readers. Supports various compression types via engines.
/// This service abstracts the underlying compression engines for consistent access.
///
/// # Examples
///
/// ```ignore
/// let service = CompressionService::new();
/// let content = service.get_item_content(path, "gzip")?;
/// ```
impl CompressionService {
/// Creates a new CompressionService instance.
///
@@ -132,6 +111,67 @@ impl CompressionService {
})?;
Ok(reader)
}
pub fn decompressing_reader(
reader: Box<dyn Read>,
compression: &CompressionType,
) -> Result<Box<dyn Read>, CoreError> {
match compression {
CompressionType::GZip => {
use flate2::read::GzDecoder;
Ok(Box::new(GzDecoder::new(reader)))
}
CompressionType::LZ4 => {
use lz4_flex::frame::FrameDecoder;
Ok(Box::new(FrameDecoder::new(reader)))
}
#[cfg(feature = "zstd")]
CompressionType::ZStd => {
use zstd::stream::read::Decoder;
Ok(Box::new(Decoder::new(reader).map_err(|e| {
CoreError::Compression(format!("zstd decoder error: {}", e))
})?))
}
_ => Ok(reader),
}
}
/// Creates a compressing writer wrapping the given writer.
///
/// Returns a boxed writer that compresses on the fly based on the compression type.
/// Useful for compressing data to network streams or pipes.
///
/// # Arguments
///
/// * `writer` - The underlying destination writer.
/// * `compression` - Compression type string (e.g., "gzip", "lz4").
///
/// # Returns
///
/// A boxed compressing writer. Unknown/none types pass through unchanged.
pub fn compressing_writer(
writer: Box<dyn Write>,
compression: &CompressionType,
) -> Result<Box<dyn Write>, CoreError> {
match compression {
CompressionType::GZip => {
use flate2::Compression;
use flate2::write::GzEncoder;
Ok(Box::new(GzEncoder::new(writer, Compression::default())))
}
CompressionType::LZ4 => Ok(Box::new(lz4_flex::frame::FrameEncoder::new(writer))),
#[cfg(feature = "zstd")]
CompressionType::ZStd => {
use zstd::stream::write::Encoder;
Ok(Box::new(
Encoder::new(writer, 3)
.map_err(|e| CoreError::Compression(format!("zstd encoder error: {}", e)))?
.auto_finish(),
))
}
_ => Ok(writer),
}
}
}
impl Default for CompressionService {

Some files were not shown because too many files have changed in this diff.