asp/keep

Commit Graph

Author

SHA1

Message

Date

Andrew Phillips

a07bb6b350

feat: plugin-declared parallel execution, switch to env_logger, update deps

Parallel execution (opt-in via MetaPlugin::parallel_safe):
- Add Send bound to MetaPlugin, parallel_safe() method (default false)
- Override to true in digest, tokens, exec, magic_file plugins
- MetaService: std::thread::scope for initialize_plugins and process_chunk
- Extract plugins via NullMetaPlugin sentinel + std::mem::replace (no unsafe)
- Panic tracking: join errors logged, NullMetaPlugin restored and finalized
- MetaPluginExec: Box<dyn Write> -> Box<dyn Write + Send>
- SendCookie wrapper for libmagic Cookie with unsafe impl Send

Logging (stderrlog -> env_logger):
- Custom format: [SSSSSS.mmm] LEVEL [module:] message (time-since-start ms)
- Default level: Warn (matches previous behavior)
- -v: Debug, -vv+: Trace, -q: off
- -vv+ shows module path

Maintenance:
- Bump deps: thiserror 2.0, config 0.15, dns-lookup 3.0, lz4_flex 0.12,
  ringbuf 0.4, rand 0.9, lazy_static 1.5, env_logger 0.11
- Update Cargo.lock (186 transitive packages)
- Clippy fixes: is_multiple_of, to_string_in_format_args, collapsible_if
- Fix double-counting bug in TokensMetaPlugin::update
- Fix schema description using plugin.description()

Co-Authored-By: opencode <noreply@opencode.ai>

2026-03-13 21:49:51 -03:00
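
The sentinel extraction described in commit a07bb6b350 (swap a placeholder into the slot with `std::mem::replace`, run the real plugin on a scoped thread, then put it back) could look roughly like the sketch below. It is not the repository's code: the trait is cut down to two methods plus `total`, and `ByteCounter`, `LineCounter`, and `process_chunk` are hypothetical names chosen for illustration. As a simplification, a panicking plugin simply leaves the sentinel in its slot, whereas the commit says the real service also finalizes it.

```rust
use std::mem;
use std::thread;

// Hypothetical, simplified stand-in for the MetaPlugin trait named in the commit.
trait MetaPlugin: Send {
    // Plugins opt in to parallel execution; the default is serial.
    fn parallel_safe(&self) -> bool {
        false
    }
    fn update(&mut self, chunk: &[u8]);
    fn total(&self) -> usize;
}

// Sentinel that occupies a slot while the real plugin runs on a worker thread.
struct NullMetaPlugin;

impl MetaPlugin for NullMetaPlugin {
    fn update(&mut self, _chunk: &[u8]) {}
    fn total(&self) -> usize {
        0
    }
}

// Illustrative plugin that opts in to parallel execution.
struct ByteCounter {
    bytes: usize,
}

impl MetaPlugin for ByteCounter {
    fn parallel_safe(&self) -> bool {
        true
    }
    fn update(&mut self, chunk: &[u8]) {
        self.bytes += chunk.len();
    }
    fn total(&self) -> usize {
        self.bytes
    }
}

// Illustrative plugin that keeps the serial default.
struct LineCounter {
    lines: usize,
}

impl MetaPlugin for LineCounter {
    fn update(&mut self, chunk: &[u8]) {
        self.lines += chunk.iter().filter(|&&b| b == b'\n').count();
    }
    fn total(&self) -> usize {
        self.lines
    }
}

fn process_chunk(plugins: &mut [Box<dyn MetaPlugin>], chunk: &[u8]) {
    // Move parallel-safe plugins out of their slots; run the rest in place.
    let mut extracted: Vec<(usize, Box<dyn MetaPlugin>)> = Vec::new();
    for (i, slot) in plugins.iter_mut().enumerate() {
        if slot.parallel_safe() {
            let sentinel: Box<dyn MetaPlugin> = Box::new(NullMetaPlugin);
            extracted.push((i, mem::replace(slot, sentinel)));
        } else {
            slot.update(chunk);
        }
    }

    // Scoped threads may borrow `chunk`; each thread hands its plugin back.
    let finished: Vec<(usize, thread::Result<Box<dyn MetaPlugin>>)> = thread::scope(|s| {
        let handles: Vec<_> = extracted
            .into_iter()
            .map(|(i, mut p)| {
                (i, s.spawn(move || {
                    p.update(chunk);
                    p
                }))
            })
            .collect();
        handles.into_iter().map(|(i, h)| (i, h.join())).collect()
    });

    // Restore each plugin to its slot; on panic the sentinel stays there.
    for (i, result) in finished {
        match result {
            Ok(p) => plugins[i] = p,
            Err(_) => eprintln!("plugin {i} panicked; sentinel left in place"),
        }
    }
}
```

Because every slot always holds a valid boxed plugin, the vector is never left in a partially moved state, which is why this approach needs no `unsafe`.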

Andrew Phillips

e7d8a83369

feat: add plugin schema system, tokenizer cache, and config validation

- Add plugin schema types and runtime discovery for meta/filter plugins
- Rewrite --generate-config to use schema system instead of hardcoded types
- Add Settings::validate_config() for startup validation
- Cache tokenizer instances via static Lazy to avoid repeated BPE loading
- Add split_by_token_iter() and count_bounded() to Tokenizer
- Fix double-counting bug in TokensMetaPlugin when buffer < max_buffer_size
- Eliminate unnecessary allocations in token count methods
- Refactor token filters: remove Option<Tokenizer>, use iterator API
- Fix TailTokensFilter correctness: unbounded buffer instead of ring buffer
- Add encoding option to all token filters
- Add description() to MetaPlugin and FilterPlugin traits
- Fix unused_mut warning in compression engine (feature-gated code)

Co-Authored-By: code-review-bot <noreply@anthropic.com>

2026-03-13 20:23:17 -03:00
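
The tokenizer cache in commit e7d8a83369 is described only as "static Lazy"; the sketch below shows the same idea using the standard library's `OnceLock` instead, so that it builds without extra crates. The `Tokenizer` type here is a hypothetical placeholder that does no BPE loading at all, and `cached_tokenizer` and `load_count` are names invented for this illustration.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex, OnceLock};

// Hypothetical placeholder for the real tokenizer; it holds no BPE data.
pub struct Tokenizer {
    encoding: String,
}

// Counts how often the (pretend) expensive load actually runs.
static LOADS: AtomicUsize = AtomicUsize::new(0);

impl Tokenizer {
    // Stands in for the costly BPE table load the commit wants to avoid repeating.
    fn load(encoding: &str) -> Tokenizer {
        LOADS.fetch_add(1, Ordering::SeqCst);
        Tokenizer {
            encoding: encoding.to_string(),
        }
    }

    pub fn encoding(&self) -> &str {
        &self.encoding
    }
}

// Process-wide cache, created on first use and keyed by encoding name.
static CACHE: OnceLock<Mutex<HashMap<String, Arc<Tokenizer>>>> = OnceLock::new();

pub fn cached_tokenizer(encoding: &str) -> Arc<Tokenizer> {
    let cache = CACHE.get_or_init(|| Mutex::new(HashMap::new()));
    let mut map = cache.lock().unwrap();
    // Load only when this encoding has not been requested before.
    map.entry(encoding.to_string())
        .or_insert_with(|| Arc::new(Tokenizer::load(encoding)))
        .clone()
}

pub fn load_count() -> usize {
    LOADS.load(Ordering::SeqCst)
}
```

Repeated calls with the same encoding name return clones of one shared `Arc`, so filters and the meta plugin can each ask for a tokenizer without triggering another load.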

Andrew Phillips

914190e119

feat: add LLM token counting meta plugin and token filters

Add tiktoken-based token counting via new 'tokens' feature flag.

New components:
- Shared tokenizer module wrapping tiktoken CoreBPE (cl100k_base, o200k_base)
- TokensMetaPlugin: streaming token counter, tokenizes each chunk independently
- head_tokens(N): stream first N tokens, split at exact boundary when mid-chunk
- skip_tokens(N): skip first N tokens, stream the rest
- tail_tokens(N): bounded ring buffer (~16KB), outputs last N tokens at finalize

All filters are fully streaming — no full-stream buffering.
Meta plugin accuracy: exact for normal text, ±1-2 tokens if long whitespace
sequence spans a chunk boundary.

Also: add 'client' and 'tokens' to default features, add curl to Dockerfile builder stage.

2026-03-13 16:48:31 -03:00
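
The `head_tokens(N)` behaviour in commit 914190e119 (pass chunks through until the budget runs out, then cut at the exact token boundary inside the chunk) can be sketched as follows. This is an illustration under stated assumptions, not the plugin's code: a whitespace-based splitter stands in for tiktoken, and `token_ends` and `HeadTokens` are hypothetical names. Like the commit's meta plugin, the sketch tokenizes each chunk independently, so a word split across two chunks counts twice.

```rust
// Stand-in tokenizer: a token is optional leading whitespace plus one word.
// Returns the byte offset at which each token ends.
fn token_ends(text: &str) -> Vec<usize> {
    let mut ends = Vec::new();
    let mut in_word = false;
    for (i, c) in text.char_indices() {
        if c.is_whitespace() {
            if in_word {
                ends.push(i);
                in_word = false;
            }
        } else {
            in_word = true;
        }
    }
    if in_word {
        ends.push(text.len());
    }
    ends
}

// Streaming filter that keeps only the first `n` tokens of a stream.
pub struct HeadTokens {
    remaining: usize,
}

impl HeadTokens {
    pub fn new(n: usize) -> Self {
        Self { remaining: n }
    }

    // Returns the part of `chunk` that still fits within the token budget.
    pub fn filter<'a>(&mut self, chunk: &'a str) -> &'a str {
        if self.remaining == 0 {
            return "";
        }
        let ends = token_ends(chunk);
        if ends.len() <= self.remaining {
            // The whole chunk fits; pass it through unchanged.
            self.remaining -= ends.len();
            chunk
        } else {
            // The budget runs out mid-chunk; cut right after the last allowed token.
            let cut = ends[self.remaining - 1];
            self.remaining = 0;
            &chunk[..cut]
        }
    }
}
```

Only a counter is carried between chunks, so the filter never buffers the stream. A `skip_tokens(N)` filter could use the same offsets and return the part of the chunk after the cut instead of before it.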

3 Commits