Technology

Why every answer is grounded.

SwarmMarshal is a local-first data layer over your communication history, plus the runtime that answers questions against it. The design choices below are what let it cite a real message behind every claim — instead of guessing.

Provenance

Every fact carries the message it came from.

Most assistant stacks accumulate orphaned, unsourced rows. SwarmMarshal treats provenance and cleanup as first-class, so an answer can always point back to the record behind it.

SourceRef on every durable record

A canonical shape — kind, account, record id, title, excerpt, observed-at, and deletion behavior — stored on the row itself, not bolted on later. The exact-source pair (account + message) is a runtime-enforced invariant.

Trust-gated knowledge

Unverified inbound claims land in a holding table. Promotion to durable memory requires user-authored, trusted-contact, or corroborated status — and rejected claims are suppressed so they aren't relearned.

Cleanup on source delete

Delete a message and the data derived from it tombstones or cascades. Every writer of source-backed data registers a deletion observer, so the citation behind an answer can never dangle.

Per-account isolated storage Local-first, on your machine Quotes kept separate from inference
Memory

From inbox to knowledge graph.

Messages flow through entity extraction into a local graph of people, companies, topics, and projects. Each neighborhood can get an LLM-written summary, routed through local or cloud models according to your settings — and temporal fields preserve the source message time, not ingestion time.

Ingest Messages indexed
Extract Entities + edges
Summarize Per-neighborhood

Hybrid search

Embeddings handle "the invoice from Acme last quarter." BM25 handles "ORD-4821." Results are fused so phrasing and exact match both work. Browsing and name lookup keep working even with no LLM or embeddings available.

Semantic BM25 Cross-channel

Answer packs

An assistant question blends timeline events, context facts, the knowledge-graph neighborhood, and direct message evidence — so receipts, charge amounts, deadlines, and dispute history come from real records, not recall.

Timeline Facts Message evidence
Assistant runtime

One configurable engine. Many profiles.

SwarmMarshal replaced 25 hand-coded specialist agents with a single tool-use engine. The assistant and any supervised helper are each just a profile: system prompt + enabled tools + LLM provider + budget.

  1. You send a message Goes to AgenticChatServiceV2, which delegates to the turn runner.
  2. Profile resolves and filters tools IProfileResolver picks the active profile and trims the tool catalog to its enabled set.
  3. Context injectors append blocks Life context, artifact summaries, and conversation state are appended to the system prompt per profile.
  4. Streaming tool-use loop IToolUseAgentEngine calls the LLM, emits text deltas + tool-call events, executes each tool, feeds the result back, repeats until the model stops.
  5. Persisted both ways Full tool-use blocks land in AgentTranscriptMessage; user/assistant pairs land in AgentChatTurn for the UI.
Max 40 messages in window 20 tool-use rounds 10-minute hard timeout Journal entry per turn
Wire protocols

Each LLM has its own dialect. The runtime speaks all of them.

Per-provider adapters normalize streaming tool-use into one TurnEvent stream the engine consumes.

Provider Endpoint Streaming Tool-call format
OpenAI /chat/completions SSE Indexed argument deltas
Ollama /chat/completions SSE Shared OpenAI-style adapter
Anthropic /v1/messages Named-event SSE tool_use + input_json_delta
Gemini :streamGenerateContent?alt=sse SSE Atomic functionCall parts; ids synthesized as gem-{n}
Grok (xAI) /chat/completions SSE OpenAI-compatible
DeepSeek /chat/completions SSE OpenAI-compatible
Hugging Face /chat/completions SSE OpenAI-compatible router

OpenAI, Ollama, Grok, DeepSeek, and Hugging Face all share OpenAIStyleToolUseProvider. Anthropic gets AnthropicToolUseProvider. Gemini gets GeminiToolUseProvider — Google emits whole functionCall objects atomically (no argument streaming), and the translator synthesizes ids since Gemini's API doesn't carry them. Subscription CLI sessions (Claude Code, Codex) are first-class routes too.

Sandboxed primitives

A handful of primitive tools. Everything else composes from them.

Path-clamped to %LOCALAPPDATA%/SwarmMarshal/sandbox/. Path escapes throw UnauthorizedAccessException before any I/O happens. Consequential tools pass through a permission broker that surfaces inline approve/deny.

http.request(method, url, body?)

HTTP with a configurable host allowlist. Returns status, headers, and body.

shell.exec(command, cwd?)

Shell command pinned to the sandbox directory. Stdout, stderr, and exit code come back.

fs.read_file(path)

Reads a file, clamped to the sandbox root. Path traversal throws before any I/O.

fs.write_file(path, content)

Writes a file, also sandbox-clamped. Atomic replace; parents auto-created.

fs.list_files(directory?)

Lists entries under the sandbox. Default lists the sandbox root.

code.run_csharp(source)

Ad-hoc C# via the existing code-execution skill. Compiles, runs, returns the result.

skills.run(skillId, args)

Generic runner for any registered skill — markdown SKILL.md or compiled C#.

catalog.search_tools(query)

Semantic search over the tool catalog so an agent can discover new tools at runtime.

Host allowlist for HTTP Sandbox cwd for shell Path resolver clamps fs.* Approval broker on risky tools
Tools, external

Speak Model Context Protocol in both directions.

SwarmMarshal is both an MCP client (consumes external connectors) and an MCP server (exposes its own data to other agents).

Consume

Add an MCP server such as filesystem, GitHub, Slack, SQLite, browser search, or an internal tool. Its tools land next to the built-ins, and the assistant picks them through the same routing and approval flow.

Filesystem GitHub Custom

Expose

Point your own agent — Claude Code, Codex, or another MCP client — at SwarmMarshal's built-in server and query the same source-grounded data, with citations, behind an explicit external-use approval gate.

Built-in server Approval-gated Source refs returned
LLM routing

Right-size every call.

Per-task model assignment with budgets, health checks, and Ollama auto-detect. There is no single "the model" — different tasks pick different tiers, and sensitive work can be pinned local-only.

Auto-detect

Scans for a local Ollama install and registers detected models. Local-first when local works, cloud when it doesn't — with a fully local option for private work.

Per-task

Classification on a fast model, summaries on a smart one, high-stakes drafting on a frontier model. Each function type picks its own tier — and LocalOnly is enforced before any cloud advisor runs.

Spend Guard

Hard caps per model, per agent, per day. Spend Guard cuts off the bill before it surprises you and alerts you when it does.

Memory hygiene

Long-lived memory needs operations discipline.

Memory is a durable asset, so it gets maintenance cycles, calibration gates, and deterministic safety filters — not just a prompt that hopes for the best.

Calibration gates

Prompt and model changes run through the message-pipeline calibration harness. Results are recorded by prompt hash and hardware key — the same model on different hardware is a different decision, and routing fails closed without passing proof.

Weekly reconciliation

A local maintenance pass marks stale claims, escalates contradictions, prunes low-confidence projections, and journals an auditable insight row. Conflicts surface for review instead of being silently merged.

Deterministic safety

When enrichment marks a message not knowledge-worthy, durable artifacts are dropped before persistence. Ungrounded quotes are dropped. The model classifies; the system gates the durable writes.

Self-healing

It fixes its own local infrastructure.

If the local AI stack is wedged, the model is undersized, or the error rate spikes, a diagnostic skill runs and proposes a fix. Common repairs stay inside the app, with explicit approval where they change your machine.

fix-wedged-ollama

Detects Ollama hung on a previous request, resets the runner, and reschedules the pending turn.

diagnose-error-rate

Walks the journal, classifies failures by tool and provider, and surfaces the dominant root cause.

diagnose-slow-local-llm

Compares observed latency against the model's expected envelope and recommends an action.

swap-undersized-local-model

If the model can't keep up, proposes (and with approval, performs) a swap to a better-sized local model.

Guided setup Repairs common OAuth issues Patches firewalls Maintains local AI/browser capabilities
Open-source curious?

Run it locally. Read the runtime.

Everything described here is in the shipping app. Download the preview, then poke around.