- Add SaveMetaFn callback pattern: meta plugins receive a closure instead of
&Connection, enabling the same plugin code to work in local, client, and
server contexts (collect-to-Vec, collect-to-HashMap, or direct DB write)
- Client save now runs meta plugins locally during streaming (smart client
sets meta=false, server skips its own plugins)
- Add POST /api/item/{id}/update endpoint for re-running plugins on stored
content without downloading compressed data
- Add client update mode (--update with --meta-plugin flags)
- Extract shared utilities: stream_copy, print_serialized, build_path_table,
ensure_default_tag to reduce duplication across modes
- Add upsert_tag for idempotent tag addition (INSERT OR IGNORE)
- Add warn logging on save_meta lock failure in BaseMetaPlugin and MetaService
Major overhaul of server architecture and security posture:
- Streaming: Unified all I/O through PIPESIZE (8192-byte) buffers.
POST bodies stream via MpscReader through the save pipeline. GET
content streams from disk via decompression to client. Removed
save_item_with_reader, get_item_content_info, ChannelReader.
413 responses keep partial items (nonfatal by design).
- Security: XSS protection in all HTML pages via html_escape crate.
Security headers middleware (nosniff, frame deny, referrer policy).
CORS tightened to explicit headers. Input validation for tags
(256 chars), metadata (128/4096), pagination (10k cap). Config
file reads use from_utf8_lossy. Generic error messages in HTML.
Diff endpoint has 10 MB per-item cap. max_body_size config option.
- Panics eliminated: Path unwraps → proper error propagation.
Mutex unwraps → map_err (registries) / expect with message (local).
- MCP removed: Deleted all MCP code, rmcp dependency, mcp feature.
- Docs: Updated README, DESIGN, AGENTS to reflect all changes.
- Add plugin schema types and runtime discovery for meta/filter plugins
- Rewrite --generate-config to use schema system instead of hardcoded types
- Add Settings::validate_config() for startup validation
- Cache tokenizer instances via static Lazy to avoid repeated BPE loading
- Add split_by_token_iter() and count_bounded() to Tokenizer
- Fix double-counting bug in TokensMetaPlugin when buffer < max_buffer_size
- Eliminate unnecessary allocations in token count methods
- Refactor token filters: remove Option<Tokenizer>, use iterator API
- Fix TailTokensFilter correctness: unbounded buffer instead of ring buffer
- Add encoding option to all token filters
- Add description() to MetaPlugin and FilterPlugin traits
- Fix unused_mut warning in compression engine (feature-gated code)
Co-Authored-By: code-review-bot <noreply@anthropic.com>
Add tiktoken-based token counting via new 'tokens' feature flag.
New components:
- Shared tokenizer module wrapping tiktoken CoreBPE (cl100k_base, o200k_base)
- TokensMetaPlugin: streaming token counter, tokenizes each chunk independently
- head_tokens(N): stream first N tokens, split at exact boundary when mid-chunk
- skip_tokens(N): skip first N tokens, stream the rest
- tail_tokens(N): bounded ring buffer (~16KB), outputs last N tokens at finalize
All filters are fully streaming — no full-stream buffering.
Meta plugin accuracy: exact for normal text, ±1-2 tokens if long whitespace
sequence spans a chunk boundary.
Also: add 'client' and 'tokens' to default features, add curl to Dockerfile builder stage.