SwarmMarshal is a local-first data layer over your communication history, plus the runtime that answers questions against it. The design choices below are what let it cite a real message behind every claim — instead of guessing.
Provenance
Every fact carries the message it came from.
Most assistant stacks accumulate orphaned, unsourced rows. SwarmMarshal treats provenance and cleanup as first-class, so an answer can always point back to the record behind it.
SourceRef on every durable record
A canonical shape — kind, account, record id, title, excerpt, observed-at, and deletion behavior — stored on the row itself, not bolted on later. The exact-source pair (account + message) is a runtime-enforced invariant.
Trust-gated knowledge
Unverified inbound claims land in a holding table. Promotion to durable memory requires user-authored, trusted-contact, or corroborated status — and rejected claims are suppressed so they aren't relearned.
Cleanup on source delete
Delete a message and the data derived from it tombstones or cascades. Every writer of source-backed data registers a deletion observer, so the citation behind an answer can never dangle.
Per-account isolated storageLocal-first, on your machineQuotes kept separate from inference
Memory
From inbox to knowledge graph.
Messages flow through entity extraction into a local graph of people, companies, topics, and projects. Each neighborhood can get an LLM-written summary, routed through local or cloud models according to your settings — and temporal fields preserve the source message time, not ingestion time.
IngestMessages indexed
ExtractEntities + edges
SummarizePer-neighborhood
Hybrid search
Embeddings handle "the invoice from Acme last quarter." BM25 handles "ORD-4821." Results are fused so phrasing and exact match both work. Browsing and name lookup keep working even with no LLM or embeddings available.
SemanticBM25Cross-channel
Answer packs
An assistant question blends timeline events, context facts, the knowledge-graph neighborhood, and direct message evidence — so receipts, charge amounts, deadlines, and dispute history come from real records, not recall.
TimelineFactsMessage evidence
Assistant runtime
One configurable engine. Many profiles.
SwarmMarshal replaced 25 hand-coded specialist agents with a single tool-use engine. The assistant and any supervised helper are each just a profile: system prompt + enabled tools + LLM provider + budget.
You send a messageGoes to AgenticChatServiceV2, which delegates to the turn runner.
Profile resolves and filters toolsIProfileResolver picks the active profile and trims the tool catalog to its enabled set.
Context injectors append blocksLife context, artifact summaries, and conversation state are appended to the system prompt per profile.
Streaming tool-use loopIToolUseAgentEngine calls the LLM, emits text deltas + tool-call events, executes each tool, feeds the result back, repeats until the model stops.
Persisted both waysFull tool-use blocks land in AgentTranscriptMessage; user/assistant pairs land in AgentChatTurn for the UI.
Max 40 messages in window20 tool-use rounds10-minute hard timeoutJournal entry per turn
Wire protocols
Each LLM has its own dialect. The runtime speaks all of them.
Per-provider adapters normalize streaming tool-use into one TurnEvent stream the engine consumes.
Provider
Endpoint
Streaming
Tool-call format
OpenAI
/chat/completions
SSE
Indexed argument deltas
Ollama
/chat/completions
SSE
Shared OpenAI-style adapter
Anthropic
/v1/messages
Named-event SSE
tool_use + input_json_delta
Gemini
:streamGenerateContent?alt=sse
SSE
Atomic functionCall parts; ids synthesized as gem-{n}
Grok (xAI)
/chat/completions
SSE
OpenAI-compatible
DeepSeek
/chat/completions
SSE
OpenAI-compatible
Hugging Face
/chat/completions
SSE
OpenAI-compatible router
OpenAI, Ollama, Grok, DeepSeek, and Hugging Face all share OpenAIStyleToolUseProvider. Anthropic gets AnthropicToolUseProvider. Gemini gets GeminiToolUseProvider — Google emits whole functionCall objects atomically (no argument streaming), and the translator synthesizes ids since Gemini's API doesn't carry them. Subscription CLI sessions (Claude Code, Codex) are first-class routes too.
Sandboxed primitives
A handful of primitive tools. Everything else composes from them.
Path-clamped to %LOCALAPPDATA%/SwarmMarshal/sandbox/. Path escapes throw UnauthorizedAccessException before any I/O happens. Consequential tools pass through a permission broker that surfaces inline approve/deny.
http.request(method, url, body?)
HTTP with a configurable host allowlist. Returns status, headers, and body.
shell.exec(command, cwd?)
Shell command pinned to the sandbox directory. Stdout, stderr, and exit code come back.
fs.read_file(path)
Reads a file, clamped to the sandbox root. Path traversal throws before any I/O.
fs.write_file(path, content)
Writes a file, also sandbox-clamped. Atomic replace; parents auto-created.
fs.list_files(directory?)
Lists entries under the sandbox. Default lists the sandbox root.
code.run_csharp(source)
Ad-hoc C# via the existing code-execution skill. Compiles, runs, returns the result.
skills.run(skillId, args)
Generic runner for any registered skill — markdown SKILL.md or compiled C#.
catalog.search_tools(query)
Semantic search over the tool catalog so an agent can discover new tools at runtime.
Host allowlist for HTTPSandbox cwd for shellPath resolver clamps fs.*Approval broker on risky tools
Tools, external
Speak Model Context Protocol in both directions.
SwarmMarshal is both an MCP client (consumes external connectors) and an MCP server (exposes its own data to other agents).
Consume
Add an MCP server such as filesystem, GitHub, Slack, SQLite, browser search, or an internal tool. Its tools land next to the built-ins, and the assistant picks them through the same routing and approval flow.
FilesystemGitHubCustom
Expose
Point your own agent — Claude Code, Codex, or another MCP client — at SwarmMarshal's built-in server and query the same source-grounded data, with citations, behind an explicit external-use approval gate.
Per-task model assignment with budgets, health checks, and Ollama auto-detect. There is no single "the model" — different tasks pick different tiers, and sensitive work can be pinned local-only.
Auto-detect
Scans for a local Ollama install and registers detected models. Local-first when local works, cloud when it doesn't — with a fully local option for private work.
Per-task
Classification on a fast model, summaries on a smart one, high-stakes drafting on a frontier model. Each function type picks its own tier — and LocalOnly is enforced before any cloud advisor runs.
Spend Guard
Hard caps per model, per agent, per day. Spend Guard cuts off the bill before it surprises you and alerts you when it does.
Memory is a durable asset, so it gets maintenance cycles, calibration gates, and deterministic safety filters — not just a prompt that hopes for the best.
Calibration gates
Prompt and model changes run through the message-pipeline calibration harness. Results are recorded by prompt hash and hardware key — the same model on different hardware is a different decision, and routing fails closed without passing proof.
Weekly reconciliation
A local maintenance pass marks stale claims, escalates contradictions, prunes low-confidence projections, and journals an auditable insight row. Conflicts surface for review instead of being silently merged.
Deterministic safety
When enrichment marks a message not knowledge-worthy, durable artifacts are dropped before persistence. Ungrounded quotes are dropped. The model classifies; the system gates the durable writes.
Self-healing
It fixes its own local infrastructure.
If the local AI stack is wedged, the model is undersized, or the error rate spikes, a diagnostic skill runs and proposes a fix. Common repairs stay inside the app, with explicit approval where they change your machine.
fix-wedged-ollama
Detects Ollama hung on a previous request, resets the runner, and reschedules the pending turn.
diagnose-error-rate
Walks the journal, classifies failures by tool and provider, and surfaces the dominant root cause.
diagnose-slow-local-llm
Compares observed latency against the model's expected envelope and recommends an action.
swap-undersized-local-model
If the model can't keep up, proposes (and with approval, performs) a swap to a better-sized local model.
Guided setupRepairs common OAuth issuesPatches firewallsMaintains local AI/browser capabilities
Open-source curious?
Run it locally. Read the runtime.
Everything described here is in the shipping app. Download the preview, then poke around.