Files
keep/Dockerfile
Andrew Phillips 914190e119 feat: add LLM token counting meta plugin and token filters
Add tiktoken-based token counting via new 'tokens' feature flag.

New components:
- Shared tokenizer module wrapping tiktoken CoreBPE (cl100k_base, o200k_base)
- TokensMetaPlugin: streaming token counter, tokenizes each chunk independently
- head_tokens(N): stream first N tokens, split at exact boundary when mid-chunk
- skip_tokens(N): skip first N tokens, stream the rest
- tail_tokens(N): bounded ring buffer (~16KB), outputs last N tokens at finalize

All filters are fully streaming — no full-stream buffering.
Meta plugin accuracy: exact for normal text, ±1-2 tokens if long whitespace
sequence spans a chunk boundary.

Also: add 'client' and 'tokens' to default features, add curl to Dockerfile builder stage.
2026-03-13 16:48:31 -03:00

68 lines
1.8 KiB
Docker

# Build stage
FROM rust:1.88-slim AS builder
RUN apt-get update && apt-get install -y --no-install-recommends \
cmake \
curl \
make \
gcc \
musl-tools \
pkg-config \
&& rm -rf /var/lib/apt/lists/*
RUN rustup target add x86_64-unknown-linux-musl
WORKDIR /app
# Copy manifests and fetch dependencies (cached layer)
COPY Cargo.toml Cargo.lock ./
RUN mkdir src && echo 'fn main() {}' > src/main.rs && echo '' > src/lib.rs
RUN cargo fetch --target x86_64-unknown-linux-musl
# Copy real source and build static binary
# magic feature excluded (requires shared libmagic; fallback uses `file` command)
COPY src/ src/
RUN cargo build --release --target x86_64-unknown-linux-musl \
--no-default-features --features lz4,gzip,server,mcp,swagger,client,tls \
&& strip target/x86_64-unknown-linux-musl/release/keep
# Runtime stage - scratch since binary is fully static
FROM scratch
COPY --from=builder /app/target/x86_64-unknown-linux-musl/release/keep /keep
COPY --from=builder /etc/ssl/certs/ /etc/ssl/certs/
EXPOSE 21080
# General options
# ENV KEEP_CONFIG=/config/config.yml
# Mount a volume for persistent storage: -v keep-data:/data
ENV KEEP_DIR=/data
ENV KEEP_LIST_FORMAT="id,time,size,tags,meta:hostname"
# Item options
# ENV KEEP_COMPRESSION=lz4
# ENV KEEP_META_PLUGINS=""
# ENV KEEP_FILTERS=""
# Server options
ENV KEEP_SERVER_ADDRESS=0.0.0.0
ENV KEEP_SERVER_PORT=21080
# ENV KEEP_SERVER_USERNAME="keep"
# ENV KEEP_SERVER_PASSWORD=""
# ENV KEEP_SERVER_PASSWORD_HASH=""
# ENV KEEP_SERVER_JWT_SECRET=""
# ENV KEEP_SERVER_JWT_SECRET_FILE=/config/jwt_secret
# TLS options
# ENV KEEP_SERVER_CERT=/certs/cert.pem
# ENV KEEP_SERVER_KEY=/certs/key.pem
# Client options
# ENV KEEP_CLIENT_URL=""
# ENV KEEP_CLIENT_USERNAME="keep"
# ENV KEEP_CLIENT_PASSWORD=""
# ENV KEEP_CLIENT_JWT=""
ENTRYPOINT ["/keep"]