refactor: streaming, security hardening, and MCP removal

Major overhaul of server architecture and security posture:

- Streaming: Unified all I/O through PIPESIZE (8192-byte) buffers.
  POST bodies stream via MpscReader through the save pipeline. GET
  content streams from disk via decompression to client. Removed
  save_item_with_reader, get_item_content_info, ChannelReader.
  413 responses keep partial items (nonfatal by design).

- Security: XSS protection in all HTML pages via html_escape crate.
  Security headers middleware (nosniff, frame deny, referrer policy).
  CORS tightened to explicit headers. Input validation for tags
  (256 chars), metadata (128/4096), pagination (10k cap). Config
  file reads use from_utf8_lossy. Generic error messages in HTML.
  Diff endpoint has 10 MB per-item cap. max_body_size config option.

- Panics eliminated: Path unwraps → proper error propagation.
  Mutex unwraps → map_err (registries) / expect with message (local).

- MCP removed: Deleted all MCP code, rmcp dependency, mcp feature.

- Docs: Updated README, DESIGN, AGENTS to reflect all changes.
This commit is contained in:
2026-03-14 00:03:42 -03:00
parent 560ba6e20c
commit 17be6abaab
51 changed files with 876 additions and 1309 deletions

View File

@@ -128,16 +128,19 @@
### Item Operations
- `GET /api/item/` - Get a list of items as JSON. Optional params: `order=newest|oldest`, `start=0`, `count=100`, `tags=tag1,tag2`
- `POST /api/item/` - Add a new item (body: raw content). Query params: `tags`, `metadata` (JSON), `compress=true|false`, `meta=true|false`
- `POST /api/item/` - Add a new item (body: raw content, **streamed** through fixed-size 8192-byte buffers). Query params: `tags`, `metadata` (JSON), `compress=true|false`, `meta=true|false`
- `POST /api/item/<#>/meta` - Add metadata to an existing item (body: JSON object)
- `DELETE /api/item/<#>` - Delete an item
- `GET /api/item/latest` - Return the latest item as JSON. Optional params: `tags=tag1,tag2`, `allow_binary=true|false`
- `GET /api/item/latest/meta` - Return the latest item metadata as JSON. Optional params: `tags=tag1,tag2`
- `GET /api/item/latest/content` - Return the raw content of the latest item. Optional params: `tags=tag1,tag2`, `decompress=true|false`
- `GET /api/item/latest/content` - Return the raw content of the latest item (**streamed**). Optional params: `tags=tag1,tag2`, `decompress=true|false`
- `GET /api/item/<#>` - Return the item as JSON. Optional params: `allow_binary=true|false`
- `GET /api/item/<#>/meta` - Return the item metadata as JSON
- `GET /api/item/<#>/content` - Return the raw content of the item. Optional params: `decompress=true|false`
- `GET /api/diff` - Diff two items. Params: `id_a`, `id_b`
- `GET /api/item/<#>/content` - Return the raw content of the item (**streamed**). Optional params: `decompress=true|false`
- `GET /api/diff` - Diff two items. Params: `id_a`, `id_b` (individual items capped at 10 MB)
### Server Configuration
- `max_body_size` - Maximum POST body size in bytes (default: unlimited). When exceeded, server returns `413 PAYLOAD_TOO_LARGE` while keeping the partial item already saved through the streaming pipeline. Set to `0` for unlimited.
### Server Modes
- **Plain HTTP** (default): `tokio::net::TcpListener` + `axum::serve()`
@@ -149,6 +152,8 @@
- Dumb clients (curl) use defaults (`compress=true`, `meta=true`), server handles everything
- GET responses include `X-Keep-Compression` header when `decompress=false`
- Streaming save uses chunked transfer encoding for constant memory usage
- **Universal streaming**: All server paths (POST, GET, diff) use `PIPESIZE` (8192) byte buffers
- **413 partial item**: When `max_body_size` is exceeded, the server returns `413` but keeps the partial item already saved through the pipeline (nonfatal design — pipes continue normally)
### Authentication
- Bearer token authentication: `Authorization: Bearer <password>`
@@ -207,12 +212,19 @@
- TLS/HTTPS support via rustls when certificate and key are provided
- Proper resource cleanup using RAII patterns
- Safe handling of external processes with proper stdin/stdout management
- **Streaming architecture**: All server I/O uses fixed-size 8192-byte buffers; no full file contents held in memory
- **XSS protection**: All user-controlled data in HTML pages is escaped via `html-escape`
- **Security headers**: `X-Content-Type-Options: nosniff`, `X-Frame-Options: DENY`, `Referrer-Policy: strict-origin-when-cross-origin`
- **CORS**: Explicit allowed headers only (`Content-Type`, `Authorization`, `Accept`); no wildcard headers
- **Input limits**: Tags (256 chars), metadata keys (128 chars), metadata values (4096 chars), pagination (10,000 max)
- **Config file size**: 4 KB cap with `from_utf8_lossy` for safe UTF-8 handling
- **Error sanitization**: Internal errors never exposed in HTML responses
- **No `unsafe_code`**: Enforced via `#![deny(unsafe_code)]` (exceptions: `libc::umask` in main.rs, `unsafe impl Send` for `SendCookie` in magic_file.rs)
## Feature Flags
- `server` - HTTP REST API server (axum-based)
- `tls` - HTTPS/TLS support for server (axum-server + rustls)
- `client` - HTTP client for remote server (ureq-based, includes streaming save)
- `mcp` - Model Context Protocol for AI assistant integration
- `swagger` - OpenAPI/Swagger UI documentation
- `magic` - File type detection via libmagic
- `lz4` - LZ4 compression (internal)