ADR-0016: v2 service split — fmf-service + named pipe
Date: 2026-06-11 / Status: Accepted (only the duplicated contract constants + value-pin sync operation is superseded by ADR-0018)
Decision
Host the engine in a privileged service fmf-service (hosts fmf-core directly, LocalSystem), make the UI non-privileged (asInvoker), and connect over a named pipe. Wire definitions live in a new rlib fmf-proto, and PipeEngineClient becomes the third implementation of IEngineClient. The canonical spec is the "Pipe protocol" section of docs/ARCHITECTURE.md. The FFI (fmf_engine.dll) and in-proc paths persist for now (--engine=inproc, requires manual elevation).
Rationale
- The MVP's requireAdministrator runs the whole app as administrator: UIPI kills Explorer→window drag & drop (known limitation in README), and "open" needed an explorer.exe de-elevation workaround
- The design reserved this split from the start: the fmf-ffi no-logic rule, the IEngineClient swap boundary, the per-machine index under %ProgramData%, and the "shared with the pipe protocol" note in the error-code table
- A resident service achieves "the index stays fresh via USN tracking even when the UI is not running"
Rejected transports
- COM / RPC (out-of-process) — registry registration, marshalling definitions, and elevation-boundary complexity; worse wire observability vs. a length-prefixed named pipe
- gRPC / HTTP (localhost) — network stack drifts toward the "won't do" server features; dependency (tokio/tonic) clashes with fmf-core's synchronous threading; HTTP/2 overkill for local IPC
- Shared memory + events — fastest page transfer, but self-designing lifetime/permissions/generation loses the "1 FFI function = 1 message" mapping; unneeded since the pipe round-trip has budget headroom (baseline in ARCHITECTURE.md latency-budget section)
- async runtime (tokio) — at most a few connections; blocking I/O + threads fit the existing design; only adds dependency and build time
flush exposure surface (3 options)
The premise is to materialize Engine::flush() (VolumeSlot's shared checkpoint + generation-pair dirty-skip). Three options for the exposure surface were compared:
- Option 1: expose as a pipe opcode — rejected. Client-driven flush spamming repeatedly holds index.read(), a local DoS path that stalls USN application (SECURITY.md threat 6)
- Option 2: not even in FFI, service-internal function only — rejected. The in-proc (--engine=inproc) path and tests cannot reproduce save timing, and it punches a hole in the contract mapping table (1 FFI function = 1 message)
- Option 3: adopted — FFI
fmf_flushis exported, the pipe only reserves opcode 11 as a number
Saving is a service-internal responsibility — periodic (default 300s, staggered across volumes, dirty only) + on SCM Stop/PRESHUTDOWN. Because the PRESHUTDOWN default grace has been shortened to 10 seconds on current Windows (docs/RESEARCH.md), set an explicit extension via SERVICE_PRESHUTDOWN_INFO at install time.
Distribution
MSIX/installer is deferred for this milestone (WindowsPackageType=None kept). Service deployment is established via fmf-service install (sc.exe cannot substitute, because SID capture, DACL setup, and privilege stripping must be done atomically) + a justfile recipe + README instructions. Switching to asInvoker is conditioned on a working service-deployment mechanism (default behavior when the service is not deployed: an InfoBar with explanation + fake fallback + a "Restart as administrator" button).
Consequences
- 2 new crates (fmf-proto / fmf-service). fmf-ffi and the DLL name
fmf_engineare unchanged - The 3 synchronous IEngineClient methods (ListVolumes/StartIndexing/GetStatus) become Task-returning (sync across the pipe = a violation of the UI-thread "must not freeze" rule)
- The single-writer invariant extends across processes:
{index_dir}\.writer.lock+FMF_E_LOCKED=7 - Both the Rust and C# test suites pin identical golden frames (byte sequences), fixing wire drift the same way as contract_tests
- Removal trigger for FfiEngineClient (--engine=inproc): completion of a one-release soak after service GA
- drag-out (results→Explorer) is filed separately as a new feature outside this milestone (only the drop direction is resolved here)
Verification (measured 2026-06-11. Canonical numbers are the CLAUDE.md performance pass-line and the ARCHITECTURE.md latency-budget section)
-
First index, real C: 2.31s @1,268,560 entries (gate: 1M≈60s.
just bench-check). End-to-end via the real service binary (service_admin.rs, console-mode child process) also confirms real-C: scan→Ready→query - USN→event 250.9ms (gate 1s. Measured with periodic flush at a 10s interval firing. Almost all of it is the intended engine-side 200ms debounce. The UI side adds the existing 50ms debounce + rendering)
- kill→restart→restore 1.25s (including process startup. Gate 2s. The engine-alone restore p50 is 108ms). The same test also proves the snapshot survives (durability) via the periodic flush before the hard kill
-
Search p99 ≤5.6ms for all queries on real C: (gate 50ms) / the loopback round-trip for a 64-row ResultPage p99 ≤5ms is constantly asserted by the test (
pipe_loopback.rs::page_roundtrip_stays_inside_the_latency_budget) - RAM: the engine is the same code as the fmf-cli measurement (~99B/entry, WS 119.9MiB @1.27M). The fmf-service addition is only the pipe threads and queues (event queue cap 256×32B/connection)
-
SCM registration (
fmf-service install→ start → stop → uninstall) real-machine smoke — left as a manual procedure. Registering the persistent LocalSystem auto-start service is done by user action (just service-install). The SCM-path code goes through the windows-service crate, and the serve core is shared with the console E2E - SECURITY.md manual verification checklist (other-user reject / remote reject need a separate token / separate machine)
Re-examination triggers
- If environments where pipe page-fetch p99 exceeds 5ms become routine (re-evaluate a multi-page batch-fetch opcode, or shared-memory page transfer)
- Real demand for concurrent multi-user use (
fmf-service authorize <user>to register multiple authorization SIDs)