feat: add LLM token counting meta plugin and token filters
Add tiktoken-based token counting via new 'tokens' feature flag. New components: - Shared tokenizer module wrapping tiktoken CoreBPE (cl100k_base, o200k_base) - TokensMetaPlugin: streaming token counter, tokenizes each chunk independently - head_tokens(N): stream first N tokens, split at exact boundary when mid-chunk - skip_tokens(N): skip first N tokens, stream the rest - tail_tokens(N): bounded ring buffer (~16KB), outputs last N tokens at finalize All filters are fully streaming — no full-stream buffering. Meta plugin accuracy: exact for normal text, ±1-2 tokens if long whitespace sequence spans a chunk boundary. Also: add 'client' and 'tokens' to default features, add curl to Dockerfile builder stage.
This commit is contained in:
@@ -16,8 +16,9 @@ pub mod read_time;
|
||||
pub mod shell;
|
||||
pub mod shell_pid;
|
||||
pub mod text;
|
||||
#[cfg(feature = "tokens")]
|
||||
pub mod tokens;
|
||||
pub mod user;
|
||||
// pub mod text; // Removed duplicate
|
||||
|
||||
pub use digest::DigestMetaPlugin;
|
||||
pub use exec::MetaPluginExec;
|
||||
@@ -232,6 +233,7 @@ pub enum MetaPluginType {
|
||||
Hostname,
|
||||
Exec,
|
||||
Env,
|
||||
Tokens,
|
||||
}
|
||||
|
||||
/// Central function to handle metadata output with name mapping.
|
||||
|
||||
Reference in New Issue
Block a user