feat: add LLM token counting meta plugin and token filters

Add tiktoken-based token counting behind a new 'tokens' feature flag.

New components:
- Shared tokenizer module wrapping tiktoken CoreBPE (cl100k_base, o200k_base)
- TokensMetaPlugin: streaming token counter, tokenizes each chunk independently
- head_tokens(N): stream first N tokens, split at exact boundary when mid-chunk
- skip_tokens(N): skip first N tokens, stream the rest
- tail_tokens(N): bounded ring buffer (~16KB), outputs last N tokens at finalize
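The head_tokens exact-boundary split can be sketched as a minimal Rust example. This is an illustration, not the plugin's actual code: a one-char-per-token stand-in replaces tiktoken's CoreBPE, and `HeadTokens` is a hypothetical name.

```rust
// Sketch of head_tokens(N): pass chunks through until N tokens have been
// emitted, splitting a chunk at the exact token boundary when it straddles
// the limit. One char == one token here, standing in for CoreBPE.
struct HeadTokens {
    remaining: usize, // tokens still allowed through
}

impl HeadTokens {
    fn new(n: usize) -> Self {
        Self { remaining: n }
    }

    // Returns the part of `chunk` to emit downstream.
    fn filter_chunk(&mut self, chunk: &str) -> String {
        if self.remaining == 0 {
            return String::new();
        }
        // Stand-in encode: one token per char (CoreBPE in the real plugin).
        let tokens: Vec<char> = chunk.chars().collect();
        if tokens.len() <= self.remaining {
            // Whole chunk fits under the limit: pass it through unchanged.
            self.remaining -= tokens.len();
            chunk.to_string()
        } else {
            // Chunk straddles the limit: keep exactly `remaining` tokens.
            let keep: String = tokens[..self.remaining].iter().collect();
            self.remaining = 0;
            keep
        }
    }
}

fn main() {
    let mut head = HeadTokens::new(5);
    let mut out = String::new();
    for chunk in ["abc", "defg", "hij"] {
        out.push_str(&head.filter_chunk(chunk));
    }
    assert_eq!(out, "abcde"); // split mid-chunk inside "defg"
    println!("{out}");
}
```

Only a running count and the current chunk are held at any time, which is what makes the filter streaming.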

All filters are fully streaming; none buffers the entire stream.
Meta plugin accuracy: exact for normal text, within ±1-2 tokens when a long
whitespace sequence spans a chunk boundary.
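The tail_tokens bounded-buffer idea can likewise be sketched in a few lines. Again this is a stand-in: one char == one token instead of CoreBPE, and `TailTokens` is an illustrative name rather than the plugin's actual type.

```rust
use std::collections::VecDeque;

// Sketch of tail_tokens(N): while streaming, retain only the last N tokens
// in a bounded buffer; emit them once the stream finalizes. The real
// implementation bounds the buffer at roughly 16KB of token data.
struct TailTokens {
    n: usize,
    buf: VecDeque<char>, // holds at most `n` tokens, never the whole stream
}

impl TailTokens {
    fn new(n: usize) -> Self {
        Self { n, buf: VecDeque::with_capacity(n) }
    }

    fn push_chunk(&mut self, chunk: &str) {
        for tok in chunk.chars() {
            if self.buf.len() == self.n {
                self.buf.pop_front(); // drop oldest token to stay bounded
            }
            self.buf.push_back(tok);
        }
    }

    // Called once at end of stream.
    fn finalize(self) -> String {
        self.buf.into_iter().collect()
    }
}

fn main() {
    let mut tail = TailTokens::new(4);
    for chunk in ["hello ", "world"] {
        tail.push_chunk(chunk);
    }
    let out = tail.finalize();
    assert_eq!(out, "orld"); // last 4 tokens of "hello world"
    println!("{out}");
}
```

Because the buffer is capped at N tokens, memory stays constant no matter how long the input stream is.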

Also: add 'client' and 'tokens' to the default features, and add curl to the
Dockerfile builder stage.
2026-03-13 16:48:31 -03:00
parent e672ec751e
commit 914190e119
9 changed files with 1128 additions and 3 deletions


@@ -16,8 +16,9 @@ pub mod read_time;
 pub mod shell;
 pub mod shell_pid;
 pub mod text;
+#[cfg(feature = "tokens")]
+pub mod tokens;
 pub mod user;
-// pub mod text; // Removed duplicate
 
 pub use digest::DigestMetaPlugin;
 pub use exec::MetaPluginExec;
@@ -232,6 +233,7 @@ pub enum MetaPluginType {
     Hostname,
     Exec,
     Env,
+    Tokens,
 }
 
 /// Central function to handle metadata output with name mapping.