refactor: streaming, security hardening, and MCP removal

Major overhaul of server architecture and security posture:

- Streaming: Unified all I/O through PIPESIZE (8192-byte) buffers.
  POST bodies stream via MpscReader through the save pipeline. GET
  content streams from disk via decompression to client. Removed
  save_item_with_reader, get_item_content_info, ChannelReader.
  413 responses keep partial items (nonfatal by design).

- Security: XSS protection in all HTML pages via html_escape crate.
  Security headers middleware (nosniff, frame deny, referrer policy).
  CORS tightened to explicit headers. Input validation for tags
  (256 chars), metadata (128/4096), pagination (10k cap). Config
  file reads use from_utf8_lossy. Generic error messages in HTML.
  Diff endpoint has 10 MB per-item cap. max_body_size config option.

- Panics eliminated: Path unwraps → proper error propagation.
  Mutex unwraps → map_err (registries) / expect with message (local).

- MCP removed: Deleted all MCP code, rmcp dependency, mcp feature.

- Docs: Updated README, DESIGN, AGENTS to reflect all changes.
2026-03-14 00:03:42 -03:00
parent 560ba6e20c
commit 17be6abaab
51 changed files with 876 additions and 1309 deletions


@@ -33,7 +33,6 @@ keep --get api-data
- [Server Mode](#server-mode)
- [Client Mode](#client-mode)
- [API Endpoints](#api-endpoints)
- [MCP (Model Context Protocol)](#mcp-model-context-protocol)
- [Shell Integration](#shell-integration)
- [Feature Flags](#feature-flags)
- [License](#license)
@@ -46,7 +45,6 @@ keep --get api-data
- **Filters** — Apply transformations (head, tail, grep, strip ANSI) on retrieval
- **Querying** — List, search, diff items with flexible formatting
- **Client/server architecture** — Optional HTTP server with streaming support
- **MCP support** — Model Context Protocol integration for AI assistants
- **Modular design** — Extensible plugin system for compression, metadata, and filtering
## Installation
@@ -82,7 +80,7 @@ cargo build --release --features server
cargo build --release --features client
# Server + client + all optional features
cargo build --release --features server,tls,client,swagger,mcp
cargo build --release --features server,client,swagger
```
## Quick Start
@@ -356,6 +354,7 @@ KEEP_META_build=1234 echo "data" | keep --save tag --meta env=staging
| `KEEP_SERVER_PASSWORD_HASH` | Server password hash | none |
| `KEEP_SERVER_JWT_SECRET` | JWT secret for token auth | none |
| `KEEP_SERVER_JWT_SECRET_FILE` | Path to JWT secret file | none |
| `KEEP_SERVER_MAX_BODY_SIZE` | Maximum POST body size in bytes (0=unlimited) | unlimited |
| `KEEP_SERVER_CERT` | TLS certificate file path (PEM) | none |
| `KEEP_SERVER_KEY` | TLS private key file path (PEM) | none |
| `KEEP_CLIENT_URL` | Remote keep server URL | none |
@@ -416,6 +415,8 @@ server:
port: 21080
username: "keep"
password: "secret"
# Maximum POST body size in bytes (0 = unlimited)
# max_body_size: 52428800 # 50 MB
# JWT authentication (takes priority over password)
# jwt_secret: "my-secret-key"
# jwt_secret_file: /path/to/jwt_secret
@@ -612,6 +613,33 @@ keep --client-url https://localhost:21080 --save my-tag
The server accepts data from both dumb clients (raw HTTP/curl) and smart clients (the keep CLI).
#### Server Streaming
The server streams all data through fixed-size buffers (8192 bytes). At no point is the entire file content held in memory.
- **POST**: The body streams through the compression and storage pipeline in chunks. When `max_body_size` is exceeded, the server returns `413 Payload Too Large` but keeps the partial item already written by the pipeline; this is nonfatal by design.
- **GET**: Content streams from disk through decompression to the client using the same fixed-size buffers.
- **Diff**: Individual items are capped at 10 MB for the diff endpoint to prevent unbounded memory use.
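The POST behaviour in the list above can be sketched roughly as follows. This is a minimal illustration rather than the server's actual code: `copy_with_limit`, `CopyOutcome`, and the `PIPESIZE` constant as written here are hypothetical names, and plain `std::io` readers and writers stand in for the HTTP body and the compression/storage pipeline. Truncating exactly at the limit is also an assumption; the text only says that the partial item is kept.

```rust
use std::io::{self, Read, Write};

/// Buffer size used for every read: 8192 bytes, as stated above.
const PIPESIZE: usize = 8192;

/// Result of a bounded copy (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
enum CopyOutcome {
    /// The whole body fit within the limit; holds the bytes written.
    Complete(u64),
    /// The limit was exceeded; holds the bytes written before stopping.
    /// A server would answer 413 here while keeping the partial item.
    TooLarge(u64),
}

/// Copy `reader` into `writer` through one fixed-size buffer.
/// `max_body_size == 0` means "no limit".
fn copy_with_limit<R: Read, W: Write>(
    mut reader: R,
    mut writer: W,
    max_body_size: u64,
) -> io::Result<CopyOutcome> {
    let mut buf = [0u8; PIPESIZE];
    let mut written: u64 = 0;
    loop {
        let n = match reader.read(&mut buf) {
            Ok(n) => n,
            // Retry reads that were interrupted by a signal.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        if n == 0 {
            return Ok(CopyOutcome::Complete(written));
        }
        if max_body_size != 0 {
            let remaining = max_body_size - written;
            if n as u64 > remaining {
                // Keep the bytes that still fit, then stop reading.
                writer.write_all(&buf[..remaining as usize])?;
                return Ok(CopyOutcome::TooLarge(written + remaining));
            }
        }
        writer.write_all(&buf[..n])?;
        written += n as u64;
    }
}
```

Memory use stays at one `PIPESIZE` buffer regardless of body size, which is the property the streaming design relies on.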
##### Max Body Size
Control the maximum accepted body size with:
```sh
# Via CLI flag (bytes)
keep --server --server-max-body-size 52428800
# Via environment variable
export KEEP_SERVER_MAX_BODY_SIZE=52428800
keep --server
# Via config file (config.yml)
server:
  max_body_size: 52428800  # 50 MB
```
When set to `0` or omitted, no limit is enforced.
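The "`0` or omitted means unlimited" rule can be expressed as a small normalization step. The sketch below is an assumption about how such a setting might be interpreted, not the server's real code; `effective_limit` and `exceeds_limit` are hypothetical helper names.

```rust
/// Turn a raw `max_body_size` setting into an optional limit.
/// `None` (omitted) and `Some(0)` both mean "no limit".
fn effective_limit(configured: Option<u64>) -> Option<u64> {
    configured.filter(|&bytes| bytes != 0)
}

/// True when a body of `len` bytes should be rejected.
fn exceeds_limit(configured: Option<u64>, len: u64) -> bool {
    match effective_limit(configured) {
        Some(limit) => len > limit,
        None => false,
    }
}
```

For reference, the value `52428800` used in the examples is 50 × 1024 × 1024 bytes.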
#### Server Query Parameters
The server supports query parameters that control processing:
@@ -696,7 +724,7 @@ Client save uses a 3-thread streaming pipeline for constant memory usage regardl
- **Streamer thread**: Reads compressed bytes from pipe, streams to server via chunked HTTP POST
- **Main thread**: After streaming completes, sends computed metadata (digest, hostname, size) to server
Memory usage is O(PIPESIZE) — typically 64KB — regardless of how much data is being stored.
Memory usage is O(PIPESIZE) — typically 8 KB — regardless of how much data is being stored.
#### Example: Remote Pipeline
@@ -769,25 +797,16 @@ cargo build --features server,swagger
Swagger UI available at `/swagger`, OpenAPI spec at `/openapi.json`.
## MCP (Model Context Protocol)
#### Security
AI assistant integration via the Model Context Protocol. Enable with the `mcp` feature.
The server applies the following security measures:
```sh
cargo build --features server,mcp
```
MCP endpoint available at `/mcp/sse` when the server is running.
### Available Tools
| Tool | Description | Parameters |
|------|-------------|------------|
| `save_item` | Save new content | `content`, `tags[]`, `metadata{}` |
| `get_item` | Get item by ID | `id` |
| `get_latest_item` | Get latest item | `tags[]` |
| `list_items` | List items | `tags[]`, `limit`, `offset` |
| `search_items` | Search items | `tags[]`, `metadata{}` |
- **Input validation**: Item IDs are validated as positive integers; tags and metadata have length limits (256 and 128 characters respectively).
- **XSS protection**: All user-controlled data rendered into HTML pages is escaped.
- **Security headers**: Responses include `X-Content-Type-Options: nosniff`, `X-Frame-Options: DENY`, and `Referrer-Policy: strict-origin-when-cross-origin`.
- **CORS**: Explicit allowed headers (`Content-Type`, `Authorization`, `Accept`); no wildcard headers.
- **Path traversal**: Item IDs are validated to prevent directory traversal attacks.
- **Internal errors**: Internal error details are never exposed in HTML responses — only generic messages are shown.
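As a rough illustration of the input validation and escaping measures listed above, the checks might look like the sketch below. All function and constant names are hypothetical, counting Unicode characters rather than bytes is an assumption, and `escape_html` is a hand-written stand-in for a proper escaping library (the commit message names the `html_escape` crate for this).

```rust
/// Limits quoted in the list above.
const MAX_TAG_LEN: usize = 256;
const MAX_META_LEN: usize = 128;

/// Parse an item ID, accepting only positive integers.
/// Input containing `/`, `..`, a sign, or any other non-digit is
/// rejected, so it can never be used to build a filesystem path.
fn parse_item_id(raw: &str) -> Option<u64> {
    if raw.is_empty() || !raw.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    raw.parse::<u64>().ok().filter(|&id| id > 0)
}

/// Check a tag against the 256-character limit.
fn tag_is_valid(tag: &str) -> bool {
    tag.chars().count() <= MAX_TAG_LEN
}

/// Check a metadata string against the 128-character limit.
fn meta_is_valid(value: &str) -> bool {
    value.chars().count() <= MAX_META_LEN
}

/// Escape the characters that are significant in HTML text and attributes.
fn escape_html(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#x27;"),
            _ => out.push(c),
        }
    }
    out
}
```

Rejecting anything that is not a plain run of digits covers both the "positive integer" rule and the path traversal concern with a single check.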
## Shell Integration
@@ -821,7 +840,6 @@ curl -s api.example.com | @ api-response
| `server` | No | HTTP REST API server |
| `tls` | No | HTTPS/TLS server support (requires `server`) |
| `client` | No | HTTP client for remote server |
| `mcp` | No | Model Context Protocol support |
| `swagger` | No | Swagger UI for API docs |
| `bzip2` | No | BZip2 compression (external program) |
| `xz` | No | XZ compression (external program) |
@@ -838,7 +856,7 @@ cargo build --features server,tls
cargo build --features client
# Everything
cargo build --features server,tls,client,mcp,swagger,magic
cargo build --features server,tls,client,swagger,magic
```
## License