refactor: decouple meta plugins from DB via SaveMetaFn callback, extract shared utilities

- Add SaveMetaFn callback pattern: meta plugins receive a closure instead of
  &Connection, enabling the same plugin code to work in local, client, and
  server contexts (collect-to-Vec, collect-to-HashMap, or direct DB write)
- Client save now runs meta plugins locally during streaming (smart client
  sets meta=false, server skips its own plugins)
- Add POST /api/item/{id}/update endpoint for re-running plugins on stored
  content without downloading compressed data
- Add client update mode (--update with --meta-plugin flags)
- Extract shared utilities: stream_copy, print_serialized, build_path_table,
  ensure_default_tag to reduce duplication across modes
- Add upsert_tag for idempotent tag addition (INSERT OR IGNORE)
- Add warn logging on save_meta lock failure in BaseMetaPlugin and MetaService
This commit is contained in:
2026-03-14 22:36:59 -03:00
parent fdc5f1d744
commit 5bad7ac7a6
39 changed files with 843 additions and 290 deletions

View File

@@ -256,7 +256,7 @@ keep --info --meta key=value
### Update Mode
Update an item's tags and metadata.
Update an item's tags, metadata, and re-run meta plugins.
```sh
# Replace tags
@@ -267,6 +267,9 @@ keep --update 1 --meta key=newvalue
# Remove a metadata key
keep --update 1 --meta key
# Re-run meta plugins on stored content
keep --update 1 --meta-plugin digest --meta-plugin text
```
### Delete Mode
@@ -706,6 +709,14 @@ The server supports query parameters that control processing:
| `meta` | `true` | `false` = client handles metadata, skip server-side plugins |
| `decompress` | `true` | `false` = return raw compressed bytes on GET |
The `POST /api/item/{id}/update` endpoint accepts additional parameters:
| Parameter | Default | Description |
|-----------|---------|-------------|
| `plugins` | none | Comma-separated plugin names to re-run on stored content |
| `metadata` | none | JSON-encoded metadata overrides to apply |
| `tags` | none | Comma-separated tags to add (idempotent) |
When using a smart client, these are set automatically. For curl, the server handles everything by default.
#### Example: Curl as a Dumb Client
@@ -750,9 +761,10 @@ export KEEP_CLIENT_JWT=<jwt-token>
Client mode uses **local plugins** and **remote storage**:
1. **Save**: Local compression and metadata plugins run on the client; compressed data streams to the server
1. **Save**: Local compression and meta plugins run on the client; compressed data streams to the server. Smart clients set `meta=false` so the server skips its own plugins.
2. **Get**: Server sends raw compressed data; client decompresses locally and applies filters
3. **Other operations** (list, info, delete, diff): Delegated directly to the server
3. **Update**: Meta plugins run on the server to avoid downloading compressed data for re-processing
4. **Other operations** (list, info, delete, diff): Delegated directly to the server
This means client behavior is consistent with local mode — the same compression settings and filters apply.
@@ -761,22 +773,23 @@ This means client behavior is consistent with local mode — the same compressio
Client save uses a 3-thread streaming pipeline for constant memory usage regardless of data size:
```
┌───────────────┐ OS pipe ┌────────────────┐
│ Reader thread ├──────────────────┤ Streamer thread│
│ │ (compressed │ │
│ stdin → tee │ bytes) │ pipe → POST │
│ → hash │ │ (chunked) │
│ → compress │ │ │
└───────────────┘ └────────────────┘
┌───────────────────┐ OS pipe ┌────────────────┐
│ Reader thread ├──────────────────┤ Streamer thread│
│ (compressed │ │
│ stdin → tee │ bytes) │ pipe → POST │
│ → hash │ │ (chunked) │
│ → compress │ │ │
│ → meta plugins │ │ │
└───────────────────┘ └────────────────┘
│ │
▼ ▼
stdout + Server stores blob
SHA-256 digest
computed metadata
```
- **Reader thread**: Reads stdin, tees output to stdout, computes SHA-256, compresses data, writes to OS pipe
- **Reader thread**: Reads stdin, tees output to stdout, computes SHA-256 via digest plugin, compresses data, runs meta plugins (hostname, text, etc.), writes to OS pipe
- **Streamer thread**: Reads compressed bytes from pipe, streams to server via chunked HTTP POST
- **Main thread**: After streaming completes, sends computed metadata (digest, hostname, size) to server
- **Main thread**: After streaming completes, sends plugin-collected metadata to server
Memory usage is O(PIPESIZE) — typically 8 KB — regardless of how much data is being stored.
@@ -811,6 +824,7 @@ keep --client-url http://logserver:21080 --list --meta project=myapp
| `GET` | `/api/item/{id}/meta` | Item metadata by ID |
| `GET` | `/api/item/{id}/info` | Item info by ID |
| `POST` | `/api/item/{id}/meta` | Add metadata to existing item (body: JSON object) |
| `POST` | `/api/item/{id}/update` | Re-run meta plugins on stored content (params: `plugins`, `metadata`, `tags`) |
| `DELETE` | `/api/item/{id}` | Delete item by ID |
| `GET` | `/api/diff` | Diff two items (`id_a`, `id_b` params) |