Files
keep/Cargo.toml
Andrew Phillips 914190e119 feat: add LLM token counting meta plugin and token filters
Add tiktoken-based token counting via new 'tokens' feature flag.

New components:
- Shared tokenizer module wrapping tiktoken CoreBPE (cl100k_base, o200k_base)
- TokensMetaPlugin: streaming token counter, tokenizes each chunk independently
- head_tokens(N): stream first N tokens, split at exact boundary when mid-chunk
- skip_tokens(N): skip first N tokens, stream the rest
- tail_tokens(N): bounded ring buffer (~16KB), outputs last N tokens at finalize

All filters are fully streaming — no full-stream buffering.
Meta plugin accuracy: exact for normal text, ±1-2 tokens if long whitespace
sequence spans a chunk boundary.

Also: add 'client' and 'tokens' to default features, add curl to Dockerfile builder stage.
2026-03-13 16:48:31 -03:00

3.5 KiB