Major overhaul of server architecture and security posture:

- Streaming: Unified all I/O through PIPESIZE (8192-byte) buffers. POST bodies stream via MpscReader through the save pipeline. GET content streams from disk via decompression to client. Removed save_item_with_reader, get_item_content_info, ChannelReader. 413 responses keep partial items (nonfatal by design).
- Security: XSS protection in all HTML pages via html_escape crate. Security headers middleware (nosniff, frame deny, referrer policy). CORS tightened to explicit headers. Input validation for tags (256 chars), metadata (128/4096), pagination (10k cap). Config file reads use from_utf8_lossy. Generic error messages in HTML. Diff endpoint has 10 MB per-item cap. max_body_size config option.
- Panics eliminated: Path unwraps → proper error propagation. Mutex unwraps → map_err (registries) / expect with message (local).
- MCP removed: Deleted all MCP code, rmcp dependency, mcp feature.
- Docs: Updated README, DESIGN, AGENTS to reflect all changes.

# PROJECT RULES - KEEP FIRST

## Standard Rules

1. ALWAYS keep DESIGN.md updated with any architectural or design changes
2. ALWAYS keep project rules first in this document
3. ALWAYS use git commands to remove or move files (`git rm`, `git mv`, etc.)
4. Follow Rust naming conventions and idioms
5. Use anyhow for error handling throughout the codebase
6. Maintain comprehensive logging with the log crate
7. Write unit tests for critical functionality
8. Document public APIs with rustdoc comments
9. Keep modules focused on single responsibilities
10. Prefer composition over inheritance
11. Handle errors gracefully and provide meaningful error messages
12. Ensure code is safe and avoids unsafe blocks where possible

## Code - Modules

### Main Module
- `main.rs` - Entry point, CLI argument parsing, mode dispatching
  - Interacts with all mode modules based on user input
  - Handles database connection setup and data directory management

### Mode Modules
- `modes/save.rs` - Save new items with tags/metadata
- `modes/get.rs` - Retrieve items by ID/tags
- `modes/list.rs` - List items with filtering and formatting
- `modes/delete.rs` - Delete items by ID
- `modes/update.rs` - Update item tags/metadata
- `modes/info.rs` - Show detailed item information
- `modes/diff.rs` - Compare two items
- `modes/status.rs` - Show system status and capabilities
- `modes/server.rs` - REST HTTP/HTTPS server mode with OpenAPI documentation
- `modes/client.rs` - Client mode for remote server (streaming save, local decompression)
- `modes/common.rs` - Shared utilities for all modes

### Database Module
- `db.rs` - SQLite database operations
  - Handles items, tags, and metadata storage
  - Provides query functions for all modes
  - Manages database migrations

### Compression Engine Module
- `compression_engine.rs` - Trait and type definitions
- `compression_engine/gzip.rs` - GZip implementation
- `compression_engine/lz4.rs` - LZ4 implementation
- `compression_engine/none.rs` - No compression implementation
- `compression_engine/program.rs` - External program wrapper

### Meta Plugin Module
- `meta_plugin.rs` - Trait and type definitions
- `meta_plugin/program.rs` - External program wrapper
- `meta_plugin/digest.rs` - Internal digest implementations
- `meta_plugin/system.rs` - System information metadata plugins

### Common Modules
- `common/is_binary.rs` - Binary file detection utilities
- `common/status.rs` - Status information generation

### Client Module
- `client.rs` - HTTP client wrapper (ureq-based, supports streaming POST)
- `modes/client/save.rs` - 3-thread streaming save (stdin → tee → compress → pipe → HTTP POST)
- `modes/client/get.rs` - Get with server-side raw fetch + local decompression
- `modes/client/list.rs` - List delegation to server
- `modes/client/info.rs` - Info delegation to server
- `modes/client/delete.rs` - Delete delegation to server
- `modes/client/diff.rs` - Diff delegation to server
- `modes/client/status.rs` - Status delegation to server

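
The 3-thread streaming save listed above can be pictured as a chain of threads joined by bounded channels. The following is a minimal std-only sketch, not the code in `modes/client/save.rs`: `run_pipeline` and `compress_chunk` are hypothetical names, the "compression" step is an identity placeholder, and the final stage collects bytes in memory where the real client would stream them into the HTTP POST.

```rust
use std::io::Read;
use std::sync::mpsc::sync_channel;
use std::thread;

/// Buffer size taken from the PIPESIZE value named in this document.
const PIPESIZE: usize = 8192;

/// Hypothetical stand-in for a compression step: passes bytes through
/// unchanged so the sketch needs no external crates.
fn compress_chunk(chunk: Vec<u8>) -> Vec<u8> {
    chunk
}

/// Runs reader -> "compress" -> sink on three threads connected by bounded
/// channels and returns what the sink received (standing in for the POST body).
fn run_pipeline<R: Read + Send + 'static>(mut input: R) -> std::io::Result<Vec<u8>> {
    // Bounded channels keep only a few chunks in flight at once.
    let (raw_tx, raw_rx) = sync_channel::<Vec<u8>>(4);
    let (out_tx, out_rx) = sync_channel::<Vec<u8>>(4);

    // Thread 1: read fixed-size chunks from the input.
    let reader = thread::spawn(move || -> std::io::Result<()> {
        let mut buf = [0u8; PIPESIZE];
        loop {
            let n = input.read(&mut buf)?;
            if n == 0 {
                return Ok(()); // dropping raw_tx signals end of stream
            }
            if raw_tx.send(buf[..n].to_vec()).is_err() {
                return Ok(()); // downstream hung up; stop reading
            }
        }
    });

    // Thread 2: transform each chunk and forward it.
    let compressor = thread::spawn(move || {
        for chunk in raw_rx {
            if out_tx.send(compress_chunk(chunk)).is_err() {
                break;
            }
        }
    });

    // Thread 3: consume chunks. Collected here only so the sketch is testable.
    let sink = thread::spawn(move || {
        let mut body = Vec::new();
        for chunk in out_rx {
            body.extend_from_slice(&chunk);
        }
        body
    });

    reader.join().expect("reader thread panicked")?;
    compressor.join().expect("compressor thread panicked");
    Ok(sink.join().expect("sink thread panicked"))
}
```

Because each channel holds at most four chunks, a slow consumer applies backpressure to the reader instead of letting buffered data grow without bound.
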
### Utility Modules
- `plugins.rs` - Shared plugin utilities
- `args.rs` - CLI argument definitions

## Command Line Interface

### Modes
- Save mode: `keep [--save]` (default when no mode specified and no IDs provided)
- Get mode: `keep [--get] <ID|tag...>` (default when IDs provided)
- List mode: `keep [--list] [tag...]`
- Info mode: `keep [--info] <ID|tag...>`
- Delete mode: `keep [--delete] <ID...>`
- Update mode: `keep [--update] <ID> [tag...]`
- Diff mode: `keep [--diff] <ID1> <ID2>`
- Status mode: `keep [--status]`
- Server mode: `keep [--server] <address:port>`

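
The default-mode rule in the list above (save when no IDs are given, get when they are) can be sketched as follows. This is an illustration only: `Mode` and `default_mode` are hypothetical names, and treating "ID" as "any positional argument that parses as an unsigned integer" is an assumption, not something the document specifies.

```rust
/// The two modes involved in the default-selection rule.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Mode {
    Save,
    Get,
}

/// Picks the mode to use when no explicit mode flag was given.
/// Assumption: an ID is a positional argument that parses as an unsigned integer.
pub fn default_mode(positionals: &[&str]) -> Mode {
    let has_id = positionals.iter().any(|arg| arg.parse::<u64>().is_ok());
    if has_id {
        Mode::Get
    } else {
        Mode::Save
    }
}
```

Under this reading, `keep notes work` saves a new item tagged `notes` and `work`, while `keep 42` retrieves item 42.
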
### Item Options
- `--meta KEY[=VALUE]` - Set metadata for the item, remove if VALUE not provided
- `--digest <sha256|md5>` - Digest algorithm to use when saving items
- `--compression <lz4|gzip|bzip2|xz|zstd|none>` - Compression algorithm to use when saving items
- `--meta-plugins <plugin[,plugin...]>` - Meta plugins to use when saving items

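
The `--meta KEY[=VALUE]` syntax above implies a small parsing step: a value means "set", no value means "remove". A minimal sketch of that step, with hypothetical names (`MetaArg`, `parse_meta_arg`) and the assumption that only the first `=` separates key from value:

```rust
/// Result of parsing one `--meta` argument.
#[derive(Debug, PartialEq, Eq)]
pub enum MetaArg {
    /// `KEY=VALUE`: set the metadata entry.
    Set { key: String, value: String },
    /// `KEY` alone: remove the metadata entry.
    Remove { key: String },
}

/// Splits `KEY[=VALUE]` at the first `=`; later `=` characters stay in the value.
pub fn parse_meta_arg(arg: &str) -> Result<MetaArg, String> {
    let (key, value) = match arg.split_once('=') {
        Some((k, v)) => (k, Some(v)),
        None => (arg, None),
    };
    if key.is_empty() {
        return Err(format!("empty metadata key in {arg:?}"));
    }
    Ok(match value {
        Some(v) => MetaArg::Set {
            key: key.to_string(),
            value: v.to_string(),
        },
        None => MetaArg::Remove {
            key: key.to_string(),
        },
    })
}
```

Note that in this sketch `KEY=` (an explicit empty value) sets the key to an empty string rather than removing it; whether the real CLI behaves that way is not stated in this document.
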
### General Options
- `--dir <PATH>` - Specify the directory to use for storage
- `--list-format <FORMAT>` - A comma separated list of columns to display with --list
- `--human-readable` - Display file sizes with units
- `--verbose` - Increase message verbosity
- `--quiet` - Do not show any messages
- `--output-format <table|json|yaml>` - Output format for info, status, and list modes
- `--server-password <PASSWORD>` - Password for server authentication
- `--server-cert <PATH>` - TLS certificate file (PEM) for HTTPS server
- `--server-key <PATH>` - TLS private key file (PEM) for HTTPS server
- `--force` - Force output even when binary data would be sent to a TTY

### Client Options (requires `client` feature)
- `--client-url <URL>` - Remote keep server URL
- `--client-password <PASSWORD>` - Remote server password

## Data Storage

### Database Schema
- `items` table: id (primary key), ts (timestamp), size (optional), compression
- `tags` table: id (foreign key to items), name (tag name)
- `metas` table: id (foreign key to items), name (meta key), value (meta value)
- Indexes on tag names and meta names for faster queries

### File Storage
- Data directory contains compressed item files named by their item ID
- Database file stored in data directory
- File permissions set to be private to user (umask 077)

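
Since item files are named by item ID, mapping an ID to a path is also the natural place to reject anything that is not a plain number. A minimal sketch, assuming the file name is exactly the decimal ID with no extension (the document does not say) and using a hypothetical helper name `item_path`:

```rust
use std::path::{Path, PathBuf};

/// Builds the on-disk path for an item, accepting only ASCII-digit IDs.
/// Rejecting everything else rules out `..`, `/`, and similar path tricks.
/// Assumption: the file name is the bare decimal ID.
pub fn item_path(data_dir: &Path, id: &str) -> Result<PathBuf, String> {
    if id.is_empty() || !id.bytes().all(|b| b.is_ascii_digit()) {
        return Err(format!("invalid item ID: {id:?}"));
    }
    Ok(data_dir.join(id))
}
```

For example, `item_path(Path::new("/data"), "42")` yields `/data/42`, while `"../etc/passwd"` is rejected before any filesystem call is made.
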
## REST API Endpoints

### Status Operations
- `GET /api/status` - Get system status information
- `GET /api/plugins/status` - Get plugin status information

### Item Operations
- `GET /api/item/` - Get a list of items as JSON. Optional params: `order=newest|oldest`, `start=0`, `count=100`, `tags=tag1,tag2`
- `POST /api/item/` - Add a new item (body: raw content, **streamed** through fixed-size 8192-byte buffers). Query params: `tags`, `metadata` (JSON), `compress=true|false`, `meta=true|false`
- `POST /api/item/<#>/meta` - Add metadata to an existing item (body: JSON object)
- `DELETE /api/item/<#>` - Delete an item
- `GET /api/item/latest` - Return the latest item as JSON. Optional params: `tags=tag1,tag2`, `allow_binary=true|false`
- `GET /api/item/latest/meta` - Return the latest item metadata as JSON. Optional params: `tags=tag1,tag2`
- `GET /api/item/latest/content` - Return the raw content of the latest item (**streamed**). Optional params: `tags=tag1,tag2`, `decompress=true|false`
- `GET /api/item/<#>` - Return the item as JSON. Optional params: `allow_binary=true|false`
- `GET /api/item/<#>/meta` - Return the item metadata as JSON
- `GET /api/item/<#>/content` - Return the raw content of the item (**streamed**). Optional params: `decompress=true|false`
- `GET /api/diff` - Diff two items. Params: `id_a`, `id_b` (individual items capped at 10 MB)

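
To illustrate the optional parameters of `GET /api/item/`, here is a std-only sketch that parses a raw query string into typed values. The names (`Order`, `ListParams`, `parse_list_query`) are hypothetical, `start=0` and `count=100` are read as defaults, `newest` as the default order is an assumption, and percent-decoding is deliberately left out; the actual server presumably relies on its web framework's extractors instead.

```rust
/// Sort order for the item list.
#[derive(Debug, PartialEq, Eq)]
pub enum Order {
    Newest,
    Oldest,
}

/// Typed view of the list endpoint's optional query parameters.
#[derive(Debug, PartialEq, Eq)]
pub struct ListParams {
    pub order: Order,
    pub start: u64,
    pub count: u64,
    pub tags: Vec<String>,
}

/// Parses `order`, `start`, `count`, and `tags` from a raw query string.
/// Assumptions: defaults are newest/0/100, and no percent-decoding is needed.
pub fn parse_list_query(query: &str) -> Result<ListParams, String> {
    let mut params = ListParams {
        order: Order::Newest,
        start: 0,
        count: 100,
        tags: Vec::new(),
    };
    for pair in query.split('&').filter(|p| !p.is_empty()) {
        let (key, value) = pair.split_once('=').unwrap_or((pair, ""));
        match key {
            "order" => {
                params.order = match value {
                    "newest" => Order::Newest,
                    "oldest" => Order::Oldest,
                    other => return Err(format!("invalid order: {other}")),
                }
            }
            "start" => {
                params.start = value
                    .parse::<u64>()
                    .map_err(|_| format!("invalid start: {value}"))?
            }
            "count" => {
                params.count = value
                    .parse::<u64>()
                    .map_err(|_| format!("invalid count: {value}"))?
            }
            "tags" => {
                params.tags = value
                    .split(',')
                    .filter(|t| !t.is_empty())
                    .map(str::to_string)
                    .collect()
            }
            _ => {} // unknown parameters are ignored in this sketch
        }
    }
    Ok(params)
}
```
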
### Server Configuration
- `max_body_size` - Maximum POST body size in bytes. Defaults to unlimited; `0` also means unlimited. When the limit is exceeded, the server returns `413 Payload Too Large` while keeping the partial item already saved through the streaming pipeline.

### Server Modes
- **Plain HTTP** (default): `tokio::net::TcpListener` + `axum::serve()`
- **HTTPS** (with `tls` feature): `axum_server::bind_rustls()` with rustls when `--server-cert` and `--server-key` are provided
- Conditional selection at startup: cert+key present → HTTPS, otherwise → HTTP

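
The startup selection rule above reduces to a single match on the two optional paths. The sketch below only models the decision; `ServerMode` and `select_server_mode` are hypothetical names, and the actual listener setup with axum or axum-server is not shown.

```rust
use std::path::{Path, PathBuf};

/// Which listener the server would start.
#[derive(Debug, PartialEq, Eq)]
pub enum ServerMode {
    Http,
    Https { cert: PathBuf, key: PathBuf },
}

/// HTTPS only when both the certificate and the key are provided;
/// every other combination falls back to plain HTTP, as described above.
pub fn select_server_mode(cert: Option<&Path>, key: Option<&Path>) -> ServerMode {
    match (cert, key) {
        (Some(cert), Some(key)) => ServerMode::Https {
            cert: cert.to_path_buf(),
            key: key.to_path_buf(),
        },
        _ => ServerMode::Http,
    }
}
```

One consequence of the rule as written is that supplying only one of `--server-cert` or `--server-key` silently yields plain HTTP; an implementation may prefer to warn or fail in that case.
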
### Client/Server Protocol
- Smart clients (keep CLI) set `compress=false` and `meta=false` on POST, handling compression/metadata locally
- Dumb clients (curl) use defaults (`compress=true`, `meta=true`), server handles everything
- GET responses include `X-Keep-Compression` header when `decompress=false`
- Streaming save uses chunked transfer encoding for constant memory usage
- **Universal streaming**: All server paths (POST, GET, diff) use `PIPESIZE` (8192) byte buffers
- **413 partial item**: When `max_body_size` is exceeded, the server returns `413` but keeps the partial item already saved through the pipeline (nonfatal by design: pipes continue normally)

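
The combination of fixed-size buffers and the nonfatal 413 behavior can be sketched as a bounded copy loop. This is a simplified, synchronous illustration and not the server's async pipeline: `copy_with_limit` and `CopyOutcome` are hypothetical names, and cutting the stream at exactly `max_body_size` bytes is an assumption about where the partial item ends.

```rust
use std::io::{self, Read, Write};

/// Buffer size taken from the PIPESIZE value named in this document.
pub const PIPESIZE: usize = 8192;

/// What happened during the copy.
#[derive(Debug, PartialEq, Eq)]
pub struct CopyOutcome {
    pub bytes_written: u64,
    /// True when the input was longer than the limit (the caller would answer 413).
    pub limit_exceeded: bool,
}

/// Copies `reader` to `writer` through one PIPESIZE buffer.
/// `max_body_size == 0` means unlimited. On overflow the bytes up to the
/// limit stay written (the "partial item") and the overflow is reported
/// in the outcome rather than as an error.
pub fn copy_with_limit<R: Read, W: Write>(
    mut reader: R,
    mut writer: W,
    max_body_size: u64,
) -> io::Result<CopyOutcome> {
    let mut buf = [0u8; PIPESIZE];
    let mut written: u64 = 0;
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            return Ok(CopyOutcome { bytes_written: written, limit_exceeded: false });
        }
        let mut take = n;
        let mut exceeded = false;
        if max_body_size != 0 {
            let remaining = max_body_size - written; // written never exceeds the limit
            if n as u64 > remaining {
                take = remaining as usize;
                exceeded = true;
            }
        }
        writer.write_all(&buf[..take])?;
        written += take as u64;
        if exceeded {
            return Ok(CopyOutcome { bytes_written: written, limit_exceeded: true });
        }
    }
}
```

Memory use stays at one 8192-byte buffer regardless of body size, which is the property the streaming design is after.
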
### Authentication
- Bearer token authentication: `Authorization: Bearer <password>`
- Basic authentication: `Authorization: Basic base64(keep:<password>)`
- When no password is set, authentication is disabled

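
The three authentication rules above can be expressed as one check on the `Authorization` header value. The sketch below is std-only and uses hypothetical names (`is_authorized`, `base64_encode`); it compares the Basic credential against the encoded form of `keep:<password>` instead of decoding it, and it uses ordinary string comparison, whereas a production server would likely want a constant-time comparison.

```rust
/// Encodes bytes as standard base64 with `=` padding (std-only helper).
fn base64_encode(input: &[u8]) -> String {
    const ALPHABET: &[u8; 64] =
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling the tail.
        let b = [
            chunk[0],
            *chunk.get(1).unwrap_or(&0),
            *chunk.get(2).unwrap_or(&0),
        ];
        let n = (u32::from(b[0]) << 16) | (u32::from(b[1]) << 8) | u32::from(b[2]);
        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { ALPHABET[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { ALPHABET[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Returns true when the request may proceed.
/// `password == None` models "no password set", which disables authentication.
pub fn is_authorized(header: Option<&str>, password: Option<&str>) -> bool {
    let Some(password) = password else {
        return true;
    };
    let Some(header) = header else {
        return false;
    };
    if let Some(token) = header.strip_prefix("Bearer ") {
        return token == password;
    }
    if let Some(encoded) = header.strip_prefix("Basic ") {
        // The user name is fixed to `keep`, as documented above.
        return encoded == base64_encode(format!("keep:{password}").as_bytes());
    }
    false
}
```

For the password `secret`, the accepted Basic header is `Authorization: Basic a2VlcDpzZWNyZXQ=`.
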
## Supported Compression Types
- LZ4 (internal implementation)
- GZip (internal implementation)
- BZip2 (external program)
- XZ (external program)
- ZStd (external program)
- None (no compression)

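
The list above maps naturally onto an enum that can be parsed from the `--compression` value. A minimal sketch; `CompressionType` and `is_external` are hypothetical names and need not match the types defined in `compression_engine.rs`.

```rust
use std::str::FromStr;

/// The compression types listed in this document.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum CompressionType {
    Lz4,
    Gzip,
    Bzip2,
    Xz,
    Zstd,
    None,
}

impl CompressionType {
    /// True for the types the document marks as "external program".
    pub fn is_external(self) -> bool {
        matches!(self, Self::Bzip2 | Self::Xz | Self::Zstd)
    }
}

impl FromStr for CompressionType {
    type Err = String;

    /// Accepts the lowercase names used by the `--compression` option.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "lz4" => Ok(Self::Lz4),
            "gzip" => Ok(Self::Gzip),
            "bzip2" => Ok(Self::Bzip2),
            "xz" => Ok(Self::Xz),
            "zstd" => Ok(Self::Zstd),
            "none" => Ok(Self::None),
            other => Err(format!("unknown compression type: {other}")),
        }
    }
}
```
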
## Supported Meta Plugins
- FileMagic - File type detection using file command
- FileMime - MIME type detection using file command
- FileEncoding - File encoding detection using file command
- LineCount - Line count using wc command
- WordCount - Word count using wc command
- Cwd - Current working directory
- Binary - Binary file detection
- Uid - Current user ID
- User - Current username
- Gid - Current group ID
- Group - Current group name
- Shell - Shell path from SHELL environment variable
- ShellPid - Shell process ID from PPID environment variable
- KeepPid - Keep process ID
- DigestSha256 - SHA-256 digest
- DigestMd5 - MD5 digest using md5sum command
- ReadTime - Time taken to read data
- ReadRate - Rate of data reading
- Hostname - System hostname
- FullHostname - Fully qualified domain name

## Testing Strategy
- Unit tests for each module in `src/tests/`
- Integration tests for modes
- Database tests for CRUD operations
- Compression engine tests for each supported format
- Meta plugin tests for each plugin type
- Server tests for API endpoints and authentication
- Common utilities tests for helper functions

## Binary Data Handling
- Automatic binary detection using file signatures and heuristics
- Prevents binary data output to TTY unless --force is used
- Binary meta plugin analyzes content to determine if it's binary
- API endpoints respect binary flags to prevent accidental binary transmission

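
The TTY guard described above combines a detection step with a simple decision. In the sketch below, `looks_binary` uses a single NUL-byte heuristic as a stand-in; the real `common/is_binary.rs` is described as using file signatures and further heuristics, which are not reproduced here. Both function names are hypothetical.

```rust
/// Simplified heuristic: treat a sample containing a NUL byte as binary.
/// Assumption: this stands in for the richer signature-based detection.
pub fn looks_binary(sample: &[u8]) -> bool {
    sample.contains(&0)
}

/// Binary data is withheld only when it would go to a terminal and
/// `--force` was not given; every other combination is allowed.
pub fn may_write_to_output(is_binary: bool, stdout_is_tty: bool, force: bool) -> bool {
    force || !(is_binary && stdout_is_tty)
}
```

In a real program the `stdout_is_tty` argument could come from `std::io::IsTerminal` (`std::io::stdout().is_terminal()`, available since Rust 1.70); it is a plain parameter here so the decision can be tested in isolation.
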
## Security Considerations
- File permissions are restricted to user only (umask 077)
- Input validation for item IDs to prevent path traversal
- Authentication for server mode with bearer or basic auth
- TLS/HTTPS support via rustls when certificate and key are provided
- Proper resource cleanup using RAII patterns
- Safe handling of external processes with proper stdin/stdout management
- **Streaming architecture**: All server I/O uses fixed-size 8192-byte buffers; no full file contents held in memory
- **XSS protection**: All user-controlled data in HTML pages is escaped via `html-escape`
- **Security headers**: `X-Content-Type-Options: nosniff`, `X-Frame-Options: DENY`, `Referrer-Policy: strict-origin-when-cross-origin`
- **CORS**: Explicit allowed headers only (`Content-Type`, `Authorization`, `Accept`); no wildcard headers
- **Input limits**: Tags (256 chars), metadata keys (128 chars), metadata values (4096 chars), pagination (10,000 max)
- **Config file size**: 4 KB cap with `from_utf8_lossy` for safe UTF-8 handling
- **Error sanitization**: Internal errors never exposed in HTML responses
- **No `unsafe_code`**: Enforced via `#![deny(unsafe_code)]` (exceptions: `libc::umask` in main.rs, `unsafe impl Send` for `SendCookie` in magic_file.rs)

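
The input limits listed above translate into a handful of small checks. The sketch below uses hypothetical names throughout and makes two assumptions the document does not settle: lengths are counted in Unicode scalar values (`chars`), and an over-limit pagination count is clamped to the cap, not rejected.

```rust
pub const MAX_TAG_CHARS: usize = 256;
pub const MAX_META_KEY_CHARS: usize = 128;
pub const MAX_META_VALUE_CHARS: usize = 4096;
pub const MAX_PAGE_COUNT: u64 = 10_000;

/// Shared length check; counts characters, not bytes (an assumption).
fn check_len(what: &str, value: &str, max: usize) -> Result<(), String> {
    let len = value.chars().count();
    if len > max {
        Err(format!("{what} is {len} chars; limit is {max}"))
    } else {
        Ok(())
    }
}

/// Rejects tags longer than 256 characters.
pub fn validate_tag(tag: &str) -> Result<(), String> {
    check_len("tag", tag, MAX_TAG_CHARS)
}

/// Rejects metadata keys over 128 characters or values over 4096 characters.
pub fn validate_meta(key: &str, value: &str) -> Result<(), String> {
    check_len("metadata key", key, MAX_META_KEY_CHARS)?;
    check_len("metadata value", value, MAX_META_VALUE_CHARS)
}

/// Caps a requested page size at 10,000 (clamping is an assumption).
pub fn clamp_count(requested: u64) -> u64 {
    requested.min(MAX_PAGE_COUNT)
}
```
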
## Feature Flags
- `server` - HTTP REST API server (axum-based)
- `tls` - HTTPS/TLS support for server (axum-server + rustls)
- `client` - HTTP client for remote server (ureq-based, includes streaming save)
- `swagger` - OpenAPI/Swagger UI documentation
- `magic` - File type detection via libmagic
- `lz4` - LZ4 compression (internal)
- `gzip` - GZip compression (internal)
- `bzip2` - BZip2 compression (external)
- `xz` - XZ compression (external)
- `zstd` - ZStd compression (external)