Keep

A command-line utility for storing and retrieving temporary data with automatic compression, metadata extraction, and querying. Pipe any output into keep for organized storage — no more losing data in /tmp files with cryptic names.

# Instead of this:
curl -s https://api.example.com/data > /tmp/api-data.json

# Do this:
curl -s https://api.example.com/data | keep --save api-data
keep --get api-data

Features

  • Store and retrieve — Save content with tags, retrieve by ID or tag
  • Automatic compression — LZ4, GZip, BZip2, XZ, ZStd support
  • Metadata plugins — Auto-extract file type, digests, hostname, user info, and more
  • Filters — Apply transformations (head, tail, grep, strip ANSI) on retrieval
  • Querying — List, search, diff items with flexible formatting
  • Client/server architecture — Optional HTTP server with streaming support
  • MCP support — Model Context Protocol integration for AI assistants
  • Modular design — Extensible plugin system for compression, metadata, and filtering

Installation

From Source

Requires Rust and Cargo.

cargo build --release

Install via Cargo

cargo install --path .

Static Binary (Linux)

./build-static.bash
# Binary at bin/keep

Build with Server/Client Features

# Server only
cargo build --release --features server

# Client only (for connecting to a remote keep server)
cargo build --release --features client

# Server + client + all optional features
cargo build --release --features server,tls,client,swagger,mcp

Quick Start

# Save content with a tag
echo "Hello, world!" | keep --save greeting

# Retrieve by tag
keep --get greeting

# List all stored items
keep --list

# Get item details
keep --info greeting

# Delete by tag
keep --delete greeting

Real-World Examples

# Save API response
curl -s https://api.github.com/repos/user/repo | keep --save repo-info

# Save test output with metadata
npm test 2>&1 | keep --save test-results --meta project=myapp --meta env=staging

# Chain commands: process and store
cat data.csv | sort | uniq | keep --save cleaned-data

# Diff two versions
keep --diff 1 5

# Get first 20 lines of an item
keep --get 1 --filters "head_lines(20)"

# List items from a specific project
keep --list --meta project=myapp

Usage

Save Mode

Save stdin content with tags and metadata.

# Save (auto-assigned ID, no tag)
echo "data" | keep --save

# Save with a tag
echo "data" | keep --save my-tag

# Save with a tag and multiple metadata entries
cat report.pdf | keep --save report --meta project=alpha --meta env=prod

# Specify compression and digest algorithm
echo "data" | keep --save my-tag --compression gzip --digest sha256

Tags and metadata make items easy to find later. Tags are simple identifiers; metadata is key-value pairs.

Get Mode

Retrieve items by ID or tag. This is the default mode when an ID or tag is given without a mode flag.

# Get by ID
keep --get 1
keep 1

# Get by tag
keep --get my-tag
keep my-tag

# Get with filters applied
keep --get 1 --filters "head_lines(10)"

# Get by metadata filter
keep --get --meta project=alpha

# Force binary output to TTY (override safety check)
keep --get 1 --force

List Mode

List stored items with filtering and formatting.

# List all items
keep --list

# List by tag
keep --list my-tag

# Filter by metadata
keep --list --meta env=prod

# Custom column format
keep --list --list-format "id,time,size,tags"

# JSON output for scripting
keep --list --output-format json

# Human-readable file sizes
keep --list --human-readable

Info Mode

Show detailed information about an item.

keep --info 1
keep --info my-tag
keep --info --meta key=value

Update Mode

Update an item's tags and metadata.

# Replace tags
keep --update 1 new-tag

# Update metadata
keep --update 1 --meta key=newvalue

# Remove a metadata key
keep --update 1 --meta key

Delete Mode

Delete items by ID or tag.

keep --delete 1
keep --delete 1 2 3

Diff Mode

Show differences between two items.

keep --diff 1 2

Status Mode

Show system status and supported features.

keep --status
keep --status-plugins
keep --status --verbose

Filters

Apply transformations to item content during retrieval. Filters are chained with |.

# First 10 lines
keep --get 1 --filters "head_lines(10)"

# Skip first 5 lines, then grep for errors
keep --get 1 --filters "skip_lines(5)|grep(pattern=error)"

# Strip ANSI escape codes
keep --get 1 --filters "strip_ansi"

# Last 100 bytes
keep --get 1 --filters "tail_bytes(100)"

# Complex chain
keep --get 1 --filters "skip_lines(10)|grep(pattern=TODO)|head_lines(5)"

Available Filters

| Filter | Description | Parameters |
|---|---|---|
| head_bytes(n) | First n bytes | count |
| head_lines(n) | First n lines | count |
| tail_bytes(n) | Last n bytes | count |
| tail_lines(n) | Last n lines | count |
| skip_bytes(n) | Skip first n bytes | count |
| skip_lines(n) | Skip first n lines | count |
| grep(pattern) | Filter matching lines | pattern (regex) |
| strip_ansi | Remove ANSI escape codes | none |

Set KEEP_FILTERS to apply a default filter chain to all retrievals.
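
As a rough illustration of how a chain such as skip_lines(10)|grep(pattern=TODO)|head_lines(5) composes, here is a minimal Python sketch. It is not keep's implementation: parse_chain and apply_chain are hypothetical names, only the line-based filters and strip_ansi are modeled, and splitting on | assumes the grep pattern itself contains no | character.

```python
import re

# Matches CSI escape sequences such as "\x1b[31m"; other escape types are ignored.
ANSI = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")

def parse_chain(spec):
    """Split 'a(1)|b(x=y)|c' into [(name, arg), ...]; arg is None when absent."""
    steps = []
    for part in spec.split("|"):
        match = re.fullmatch(r"\s*(\w+)(?:\((.*)\))?\s*", part)
        if match is None:
            raise ValueError(f"bad filter: {part!r}")
        steps.append((match.group(1), match.group(2)))
    return steps

def apply_chain(text, spec):
    """Apply each filter in order; the output of one step feeds the next."""
    lines = text.splitlines()
    for name, arg in parse_chain(spec):
        if name == "head_lines":
            lines = lines[:int(arg)]
        elif name == "tail_lines":
            count = int(arg)
            lines = lines[-count:] if count > 0 else []
        elif name == "skip_lines":
            lines = lines[int(arg):]
        elif name == "grep":
            # Accept both grep(error) and grep(pattern=error).
            raw = arg.split("=", 1)[1] if arg.startswith("pattern=") else arg
            pattern = re.compile(raw)
            lines = [line for line in lines if pattern.search(line)]
        elif name == "strip_ansi":
            lines = [ANSI.sub("", line) for line in lines]
        else:
            raise ValueError(f"unsupported filter in this sketch: {name}")
    return "\n".join(lines)
```

Because each step only sees the previous step's output, order matters: skip_lines(10)|head_lines(5) yields lines 11 to 15, while head_lines(5)|skip_lines(10) yields nothing.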

Compression

Items are compressed automatically on save. Default: LZ4.

| Algorithm | Type | Speed | Ratio |
|---|---|---|---|
| lz4 | Internal | Fastest | Lower |
| gzip | Internal | Fast | Good |
| bzip2 | External | Slow | Better |
| xz | External | Slowest | Best |
| zstd | External | Fast | Good |
| none | Internal | N/A | N/A |

# Specify compression per item
echo "data" | keep --save my-tag --compression zstd

# Set default via environment
export KEEP_COMPRESSION=gzip

External compression programs (bzip2, xz, zstd) must be installed on the system.
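
The speed/ratio trade-off in the table can be explored with a small Python sketch. It uses Python's standard-library gzip, bz2, and lzma modules as stand-ins for keep's gzip, bzip2, and xz support; LZ4 and ZStd are left out because they need third-party packages on most Python versions. The compare function and the sample data are illustrative only, and actual sizes and timings depend heavily on the input.

```python
import bz2
import gzip
import lzma
import time

# Stand-ins for keep's algorithms; lzma.compress produces the .xz format by default.
CODECS = {
    "gzip": (gzip.compress, gzip.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
    "xz": (lzma.compress, lzma.decompress),
}

def compare(data):
    """Return {name: (compressed_size, seconds)} after checking the round trip."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        if decompress(packed) != data:
            raise RuntimeError(f"{name} round trip failed")
        results[name] = (len(packed), elapsed)
    return results

if __name__ == "__main__":
    sample = b"2026-03-13 INFO request handled in 12ms\n" * 50_000
    for name, (size, seconds) in compare(sample).items():
        print(f"{name:6} {size:>8} bytes  {seconds * 1000:8.1f} ms")
```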

Meta Plugins

Metadata is automatically extracted when saving items.

| Plugin | Key | Description |
|---|---|---|
| env | * | Capture KEEP_META_* environment variables |
| magic_file | file_type | File type detection (requires magic feature) |
| text | text_line_count, text_word_count | Line and word counts |
| user | uid, user, gid, group | Current user info |
| shell | shell | Current shell path |
| shell_pid | shell_pid | Shell process ID |
| keep_pid | keep_pid | Keep process ID |
| digest | digest_sha256, digest_md5 | Content digests |
| read_time | read_time | Time to read content |
| read_rate | read_rate | Data read rate |
| hostname | hostname, hostname_short | System hostname |
| exec | Custom | Run external commands for metadata |
| cwd | cwd | Current working directory |

# Use specific plugins
echo "data" | keep --save tag --meta-plugins "digest,text,user"

# Capture custom metadata via environment
# (the variable must be set for keep, not for echo)
echo "data" | KEEP_META_project=alpha keep --save tag

# Combine environment and CLI metadata
echo "data" | KEEP_META_build=1234 keep --save tag --meta env=staging
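
To make the env plugin's behavior concrete, the following Python sketch collects KEEP_META_* variables into a metadata dictionary and merges them with --meta style key=value pairs. The function names are hypothetical, and two details are assumptions rather than documented behavior: that the metadata key is the variable name with the KEEP_META_ prefix removed, and that a CLI value wins when both sources define the same key.

```python
PREFIX = "KEEP_META_"

def env_metadata(environ):
    """Pick out KEEP_META_* variables; the key is the part after the prefix (assumed)."""
    return {
        name[len(PREFIX):]: value
        for name, value in environ.items()
        if name.startswith(PREFIX) and len(name) > len(PREFIX)
    }

def merge_metadata(environ, cli_pairs):
    """Merge environment metadata with 'key=value' pairs; CLI pairs override (assumed)."""
    metadata = env_metadata(environ)
    for pair in cli_pairs:
        key, separator, value = pair.partition("=")
        if not separator or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        metadata[key] = value
    return metadata
```

In a real shell session the first argument would be os.environ; passing a plain dictionary keeps the sketch easy to try out.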

Configuration

Environment Variables

| Variable | Description | Default |
|---|---|---|
| KEEP_DIR | Storage directory | ~/.keep |
| KEEP_CONFIG | Config file path | ~/.config/keep/config.yml |
| KEEP_COMPRESSION | Compression algorithm | lz4 |
| KEEP_META_PLUGINS | Meta plugins to use | env |
| KEEP_FILTERS | Default filter chain | none |
| KEEP_LIST_FORMAT | List column format | built-in defaults |
| KEEP_SERVER_ADDRESS | Server bind address | 127.0.0.1 |
| KEEP_SERVER_PORT | Server port | 21080 |
| KEEP_SERVER_USERNAME | Server Basic auth username | keep |
| KEEP_SERVER_PASSWORD | Server password | none |
| KEEP_SERVER_PASSWORD_HASH | Server password hash | none |
| KEEP_SERVER_JWT_SECRET | JWT secret for token auth | none |
| KEEP_SERVER_JWT_SECRET_FILE | Path to JWT secret file | none |
| KEEP_SERVER_CERT | TLS certificate file path (PEM) | none |
| KEEP_SERVER_KEY | TLS private key file path (PEM) | none |
| KEEP_CLIENT_URL | Remote keep server URL | none |
| KEEP_CLIENT_USERNAME | Remote server username | keep |
| KEEP_CLIENT_PASSWORD | Remote server password | none |
| KEEP_CLIENT_JWT | JWT token for remote server | none |

Any config setting can be overridden with KEEP__<SETTING> environment variables (double underscore separator).

Configuration File

Default location: ~/.config/keep/config.yml

Generate a default configuration:

keep --generate-config > ~/.config/keep/config.yml

Example configuration:

# Storage directory
dir: ~/.keep

# List view columns
list_format:
  - name: id
    label: "Item"
    align: right
  - name: time
    label: "Time"
    align: right
  - name: size
    label: "Size"
    align: right
  - name: tags
    label: "Tags"
    align: left

# Table styling
table_config:
  style: utf8_full
  content_arrangement: dynamic

# Default compression
compression_plugin:
  name: gzip

# Default meta plugins
meta_plugins:
  - name: env
  - name: digest
    options:
      algorithm: sha256

# Server settings
server:
  address: "127.0.0.1"
  port: 21080
  username: "keep"
  password: "secret"
  # JWT authentication (takes priority over password)
  # jwt_secret: "my-secret-key"
  # jwt_secret_file: /path/to/jwt_secret
  # TLS (requires tls feature)
  # cert_file: /path/to/cert.pem
  # key_file: /path/to/key.pem

# Client settings
client:
  url: "http://localhost:21080"
  username: "keep"
  password: "secret"
  # Or use JWT token
  # jwt: "eyJhbGciOiJIUzI1NiIs..."

human_readable: true
quiet: false
force: false

Client/Server Mode

Keep supports a client/server architecture where one machine runs a keep server and other machines connect as clients. This is useful for:

  • Centralizing stored data across multiple machines
  • Sharing items between team members
  • Offloading storage to a dedicated server
  • Piping data from long-running processes without local storage

Server Mode

Start an HTTP REST API server:

# Default: 127.0.0.1:21080
keep --server

# Custom address and port
keep --server --server-address 0.0.0.0 --server-port 8080

# With password authentication
keep --server --server-password mypassword

# With custom username
keep --server --server-username admin --server-password mypassword

# With JWT authentication
keep --server --server-jwt-secret my-secret-key

JWT Authentication

JWT (JSON Web Token) authentication provides permission-based access control. When a JWT secret is configured, the server validates tokens and checks permission claims for each request.

Configuration:

# Via CLI flag
keep --server --server-jwt-secret my-secret-key

# Via environment variable
export KEEP_SERVER_JWT_SECRET=my-secret-key
keep --server

# Via config file (config.yml)
server:
  jwt_secret: "my-secret-key"

# Via secret file (for Docker/secrets management)
keep --server --server-jwt-secret-file /path/to/secret

Token format:

JWTs must use HS256 algorithm with the following claims:

| Claim | Type | Required | Description |
|---|---|---|---|
| sub | string | Yes | Subject (client identifier) |
| exp | number | Yes | Expiration time (Unix timestamp) |
| read | boolean | No | Permission for GET requests (default: false) |
| write | boolean | No | Permission for POST/PUT/PATCH requests (default: false) |
| delete | boolean | No | Permission for DELETE requests (default: false) |

Permission mapping:

| HTTP Method | Required Permission |
|---|---|
| GET | read |
| POST, PUT, PATCH | write |
| DELETE | delete |
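
The permission mapping above amounts to a small lookup. The Python sketch below shows one way to express it; is_allowed is a hypothetical helper, not the server's code, and denying HTTP methods that are missing from the table is an assumption.

```python
# Method-to-permission table, copied from the mapping above.
PERMISSION_FOR_METHOD = {
    "GET": "read",
    "POST": "write",
    "PUT": "write",
    "PATCH": "write",
    "DELETE": "delete",
}

def is_allowed(claims, method):
    """Return True only if the token explicitly grants the permission for this method."""
    permission = PERMISSION_FOR_METHOD.get(method.upper())
    if permission is None:
        return False  # assumption: methods outside the table are denied
    # Omitted permission claims count as false, so only a literal true grants access.
    return claims.get(permission, False) is True
```

A token that passes signature and expiry checks but fails this test corresponds to the 403 case; a missing, invalid, or expired token never reaches it.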

Example token payload:

{
  "sub": "ci-pipeline",
  "exp": 1735689600,
  "read": true,
  "write": true,
  "delete": false
}

Generating tokens:

The server does not generate tokens — use any JWT library or tool:

# Using jwt-cli (https://github.com/mike-engel/jwt-cli)
jwt encode --secret my-secret-key \
  --exp=$(date -d '+24 hours' +%s) \
  '{"sub":"my-client","read":true,"write":true,"delete":false}'

# Using Python (requires the PyJWT package)
python3 -c "
import jwt, time
token = jwt.encode({
    'sub': 'my-client',
    'exp': int(time.time()) + 86400,
    'read': True, 'write': True, 'delete': False
}, 'my-secret-key', algorithm='HS256')
print(token)
"
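
If installing PyJWT is not an option, an HS256 token can also be assembled with only the Python standard library. This sketch follows the JWT compact serialization (base64url-encoded header and payload, signed with HMAC-SHA256); make_token and verify_token are hypothetical helper names, and the secret and subject are placeholders.

```python
import base64
import hashlib
import hmac
import json
import time

def b64url(raw):
    """Base64url-encode bytes without padding, as JWTs require."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

def b64url_decode(text):
    """Decode base64url text, restoring the padding that JWTs omit."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

def sign(secret, signing_input):
    digest = hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256)
    return b64url(digest.digest())

def make_token(secret, sub, ttl_seconds=86400, read=False, write=False, delete=False):
    """Build an HS256 token carrying the claims described above."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {
        "sub": sub,
        "exp": int(time.time()) + ttl_seconds,
        "read": read,
        "write": write,
        "delete": delete,
    }
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return f"{signing_input}.{sign(secret, signing_input)}"

def verify_token(secret, token):
    """Check signature and expiry, then return the claims."""
    signing_input, _, signature = token.rpartition(".")
    if not hmac.compare_digest(sign(secret, signing_input), signature):
        raise ValueError("bad signature")
    claims = json.loads(b64url_decode(signing_input.split(".")[1]))
    if claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims

if __name__ == "__main__":
    print(make_token("my-secret-key", "my-client", read=True, write=True))
```

For production use, a maintained JWT library remains the safer choice, since it also handles header validation and algorithm restrictions.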

Using tokens:

# With curl
curl -H "Authorization: Bearer <jwt-token>" http://localhost:21080/api/item/

# The keep client uses --client-jwt for JWT tokens
keep --client-url http://server:21080 --client-jwt <jwt-token> --save my-tag

Response codes:

| Code | Meaning |
|---|---|
| 200 | Authorized |
| 401 | Missing, invalid, or expired token |
| 403 | Valid token but insufficient permissions |

Notes:

  • JWT and password authentication are mutually exclusive: when jwt_secret is set, password authentication is disabled (even if a password is also configured) and every request must present a valid JWT Bearer token
  • Permission fields default to false if omitted — tokens must explicitly grant permissions
  • JWT authentication requires the server feature (jsonwebtoken is included automatically)

HTTPS / TLS

Build with the tls feature to enable HTTPS:

cargo build --release --features server,tls

Provide a TLS certificate and private key (both PEM format):

# Via CLI flags
keep --server \
  --server-cert /path/to/cert.pem \
  --server-key /path/to/key.pem

# Via environment variables
export KEEP_SERVER_CERT=/path/to/cert.pem
export KEEP_SERVER_KEY=/path/to/key.pem
keep --server

# Via config file (config.yml)
server:
  cert_file: /path/to/cert.pem
  key_file: /path/to/key.pem

When cert and key are provided, the server listens with HTTPS. Without them, it falls back to plain HTTP. The port is controlled by --server-port (default: 21080).

Self-signed certificates (for development):

# Generate a self-signed cert
openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem \
  -days 365 -nodes -subj "/CN=localhost"

# Start server with self-signed cert
keep --server --server-cert cert.pem --server-key key.pem

# Connect client with HTTPS
keep --client-url https://localhost:21080 --save my-tag

The server accepts data from both dumb clients (raw HTTP/curl) and smart clients (the keep CLI).

Server Query Parameters

The server supports query parameters that control processing:

| Parameter | Default | Description |
|---|---|---|
| tags | none | Comma-separated tags |
| metadata | none | JSON-encoded metadata |
| compress | true | false = client already compressed, store as-is |
| meta | true | false = client handles metadata, skip server-side plugins |
| decompress | true | false = return raw compressed bytes on GET |

When using a smart client, these are set automatically. For curl, the server handles everything by default.
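
For clients other than curl or the keep CLI, the query string can be built programmatically. The Python sketch below assembles a POST URL from the parameters in the table; item_url is a hypothetical helper, and it assumes the server accepts standard percent-encoded values.

```python
import json
from urllib.parse import urlencode

def item_url(base, tags=(), metadata=None, compress=True, meta=True):
    """Build the /api/item/ URL, emitting only parameters that differ from the defaults."""
    params = {}
    if tags:
        params["tags"] = ",".join(tags)  # comma-separated, per the table
    if metadata:
        params["metadata"] = json.dumps(metadata, separators=(",", ":"))
    if not compress:
        params["compress"] = "false"  # body is already compressed
    if not meta:
        params["meta"] = "false"  # skip server-side meta plugins
    url = base.rstrip("/") + "/api/item/"
    return url + "?" + urlencode(params) if params else url

if __name__ == "__main__":
    print(item_url("http://localhost:21080", tags=["my-tag"], compress=False))
```

Letting urlencode handle the escaping avoids hand-quoting the JSON metadata value.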

Example: Curl as a Dumb Client

# Save (server handles compression and metadata)
curl -X POST --data-binary "my data" "http://localhost:21080/api/item/?tags=my-tag"

# Retrieve (server decompresses)
curl http://localhost:21080/api/item/1/content

# Save compressed (client handles compression, server skips)
# --data-binary sends the bytes unmodified; -d would strip newlines and corrupt the gzip stream
gzip -c data.txt | curl -X POST --data-binary @- "http://localhost:21080/api/item/?compress=false&tags=my-tag"

Client Mode

The keep CLI can connect to a remote server as a smart client. Build with the client feature:

cargo build --release --features client

# Set server URL via flag or environment
keep --client-url http://server:21080 --save my-tag
export KEEP_CLIENT_URL=http://server:21080

# With password authentication
keep --client-url http://server:21080 --client-password mypassword --save my-tag
export KEEP_CLIENT_PASSWORD=mypassword

# With custom username
keep --client-url http://server:21080 --client-username admin --client-password mypassword --save my-tag

# With JWT authentication
keep --client-url http://server:21080 --client-jwt <jwt-token> --save my-tag
export KEEP_CLIENT_JWT=<jwt-token>

How Client Mode Works

Client mode uses local plugins and remote storage:

  1. Save: Local compression and metadata plugins run on the client; compressed data streams to the server
  2. Get: Server sends raw compressed data; client decompresses locally and applies filters
  3. Other operations (list, info, delete, diff): Delegated directly to the server

This means client behavior is consistent with local mode — the same compression settings and filters apply.

Streaming Architecture

Client save uses a 3-thread streaming pipeline for constant memory usage regardless of data size:

┌──────────────┐     OS pipe      ┌────────────────┐
│ Reader thread ├──────────────────┤ Streamer thread│
│              │  (compressed     │                │
│ stdin → tee  │   bytes)         │ pipe → POST    │
│    → hash    │                  │   (chunked)    │
│    → compress│                  │                │
└──────────────┘                  └────────────────┘
        │                                │
        ▼                                ▼
    stdout +                    Server stores blob
    SHA-256 digest

  • Reader thread: Reads stdin, tees output to stdout, computes SHA-256, compresses data, writes to OS pipe
  • Streamer thread: Reads compressed bytes from pipe, streams to server via chunked HTTP POST
  • Main thread: After streaming completes, sends computed metadata (digest, hostname, size) to server

Memory usage is O(PIPESIZE) — typically 64KB — regardless of how much data is being stored.
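
The pipeline described above can be approximated in Python to show why memory stays bounded. This is a sketch under stated assumptions, not keep's Rust implementation: zlib stands in for the configured compression plugin, the sink callable stands in for the chunked HTTP POST, and reader, streamer, and save_stream are hypothetical names.

```python
import hashlib
import io
import os
import threading
import zlib

CHUNK = 64 * 1024  # read size; the OS pipe buffer bounds what is in flight

def reader(src, tee, write_fd, result):
    """Read input, tee it, hash it, compress it, and write it to the pipe."""
    sha = hashlib.sha256()
    compressor = zlib.compressobj()
    with os.fdopen(write_fd, "wb") as pipe:
        while chunk := src.read(CHUNK):
            tee.write(chunk)
            sha.update(chunk)
            pipe.write(compressor.compress(chunk))
        pipe.write(compressor.flush())
    # Closing the write end above signals EOF to the streamer.
    result["digest"] = sha.hexdigest()

def streamer(read_fd, sink):
    """Forward compressed bytes from the pipe to the sink, chunk by chunk."""
    with os.fdopen(read_fd, "rb") as pipe:
        while chunk := pipe.read(CHUNK):
            sink(chunk)

def save_stream(src, tee, sink):
    """Run both threads and return the SHA-256 digest of the uncompressed input."""
    read_fd, write_fd = os.pipe()
    result = {}
    threads = [
        threading.Thread(target=reader, args=(src, tee, write_fd, result)),
        threading.Thread(target=streamer, args=(read_fd, sink)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return result["digest"]

if __name__ == "__main__":
    stored = bytearray()
    data = b"build log line\n" * 100_000
    digest = save_stream(io.BytesIO(data), io.BytesIO(), stored.extend)
    print(digest, len(data), "->", len(stored), "bytes")
```

The reader blocks whenever the pipe is full, so a slow sink throttles the whole pipeline instead of letting data accumulate in memory.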

Example: Remote Pipeline

# On a build server, pipe logs to a central keep server
make build 2>&1 | keep --client-url http://logserver:21080 \
  --save build-logs \
  --meta project=myapp \
  --meta "branch=$(git branch --show-current)"

# Retrieve from any machine
keep --client-url http://logserver:21080 --get build-logs

# List recent builds from a specific project
keep --client-url http://logserver:21080 --list --meta project=myapp

API Endpoints

| Method | Path | Description |
|---|---|---|
| GET | /api/status | System status |
| GET | /api/plugins/status | Plugin status |
| GET | /api/item/ | List items (tags, order, start, count params) |
| POST | /api/item/ | Create item (body: raw content, params: tags, metadata, compress, meta) |
| GET | /api/item/latest/content | Latest item content |
| GET | /api/item/latest/meta | Latest item metadata |
| GET | /api/item/{id} | Item info by ID |
| GET | /api/item/{id}/content | Item content by ID |
| GET | /api/item/{id}/meta | Item metadata by ID |
| GET | /api/item/{id}/info | Item info by ID |
| POST | /api/item/{id}/meta | Add metadata to existing item (body: JSON object) |
| DELETE | /api/item/{id} | Delete item by ID |
| GET | /api/diff | Diff two items (id_a, id_b params) |

Authentication

The server supports three authentication modes:

1. Password (HTTP Basic auth):

# Default username is "keep"
curl -u keep:mypassword http://localhost:21080/api/status

# Custom username
curl -u admin:mypassword http://localhost:21080/api/status

2. JWT (permission-based):

# Valid JWT with read permission allows GET requests
curl -H "Authorization: Bearer <jwt-token>" http://localhost:21080/api/item/

See JWT Authentication for token format and configuration.

3. No authentication:

When neither password nor JWT secret is configured, authentication is disabled.
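
The three modes translate into at most one Authorization header on the client side. The Python sketch below builds that header; auth_headers is a hypothetical helper, and preferring the JWT when both credentials are supplied is an assumption that mirrors the server-side priority described earlier.

```python
import base64

def auth_headers(username="keep", password=None, jwt=None):
    """Return the HTTP headers for JWT, Basic, or no authentication."""
    if jwt:
        # Assumption: JWT wins when both are given, matching the server's priority.
        return {"Authorization": f"Bearer {jwt}"}
    if password is not None:
        credentials = f"{username}:{password}".encode("utf-8")
        encoded = base64.b64encode(credentials).decode("ascii")
        return {"Authorization": f"Basic {encoded}"}
    return {}  # authentication disabled on the server

if __name__ == "__main__":
    print(auth_headers(password="mypassword"))
```

The Basic branch produces the same header that curl -u keep:mypassword sends.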

Swagger UI

Build with the swagger feature to enable OpenAPI documentation:

cargo build --features server,swagger

Swagger UI available at /swagger, OpenAPI spec at /openapi.json.

MCP (Model Context Protocol)

AI assistant integration via the Model Context Protocol. Enable with the mcp feature.

cargo build --features server,mcp

MCP endpoint available at /mcp/sse when the server is running.

Available Tools

| Tool | Description | Parameters |
|---|---|---|
| save_item | Save new content | content, tags[], metadata{} |
| get_item | Get item by ID | id |
| get_latest_item | Get latest item | tags[] |
| list_items | List items | tags[], limit, offset |
| search_items | Search items | tags[], metadata{} |

Shell Integration

Source profile.bash to enable shell integration:

source /path/to/keep/profile.bash

This provides:

  • keep function — Captures the current command in metadata automatically
  • @ alias — Shorthand for keep --save
  • @@ alias — Shorthand for keep --get

# Save with automatic command capture
curl -s api.example.com | @ api-response

# Quick retrieve
@@ api-response

Feature Flags

| Feature | Default | Description |
|---|---|---|
| magic | Yes | File type detection via libmagic |
| lz4 | Yes | LZ4 compression (internal) |
| gzip | Yes | GZip compression (internal) |
| server | No | HTTP REST API server |
| tls | No | HTTPS/TLS server support (requires server) |
| client | No | HTTP client for remote server |
| mcp | No | Model Context Protocol support |
| swagger | No | Swagger UI for API docs |
| bzip2 | No | BZip2 compression (external program) |
| xz | No | XZ compression (external program) |
| zstd | No | ZStd compression (external program) |

# Server with Swagger UI
cargo build --features server,swagger

# Server with HTTPS
cargo build --features server,tls

# Client only
cargo build --features client

# Everything
cargo build --features server,tls,client,mcp,swagger,magic

License

MIT License - see LICENSE for details.

Contact

Andrew Phillips - andrew@gt0.ca
