Technology - SwarmMarshal

Provenance

Every fact carries the message it came from.

Most assistant stacks accumulate orphaned, unsourced rows. SwarmMarshal treats provenance and cleanup as first-class, so an answer can always point back to the record behind it.

SourceRef on every durable record

A canonical shape — kind, account, record id, title, excerpt, observed-at, and deletion behavior — stored on the row itself, not bolted on later. The exact-source pair (account + message) is a runtime-enforced invariant.

Trust-gated knowledge

Unverified inbound claims land in a holding table. Promotion to durable memory requires user-authored, trusted-contact, or corroborated status — and rejected claims are suppressed so they aren't relearned.

Cleanup on source delete

Delete a message and the data derived from it tombstones or cascades. Every writer of source-backed data registers a deletion observer, so the citation behind an answer can never dangle.

Per-account isolated storage Local-first, on your machine Quotes kept separate from inference

Memory

From inbox to knowledge graph.

Messages flow through entity extraction into a local graph of people, companies, topics, and projects. Each neighborhood can get an LLM-written summary, routed through local or cloud models according to your settings — and temporal fields preserve the source message time, not ingestion time.

Ingest Messages indexed

Extract Entities + edges

Summarize Per-neighborhood

Hybrid search

Embeddings handle "the invoice from Acme last quarter." BM25 handles "ORD-4821." Results are fused so phrasing and exact match both work. Browsing and name lookup keep working even with no LLM or embeddings available.

Semantic BM25 Cross-channel

Answer packs

An assistant question blends timeline events, context facts, the knowledge-graph neighborhood, and direct message evidence — so receipts, charge amounts, deadlines, and dispute history come from real records, not recall.

Timeline Facts Message evidence

Assistant runtime

One configurable engine. Many profiles.

SwarmMarshal replaced 25 hand-coded specialist agents with a single tool-use engine. The assistant and any supervised helper are each just a profile: system prompt + enabled tools + LLM provider + budget.

You send a message Goes to AgenticChatServiceV2, which delegates to the turn runner.
Profile resolves and filters tools IProfileResolver picks the active profile and trims the tool catalog to its enabled set.
Context injectors append blocks Life context, artifact summaries, and conversation state are appended to the system prompt per profile.
Streaming tool-use loop IToolUseAgentEngine calls the LLM, emits text deltas + tool-call events, executes each tool, feeds the result back, repeats until the model stops.
Persisted both ways Full tool-use blocks land in AgentTranscriptMessage; user/assistant pairs land in AgentChatTurn for the UI.

Max 40 messages in window 20 tool-use rounds 10-minute hard timeout Journal entry per turn

Wire protocols

Each LLM has its own dialect. The runtime speaks all of them.

Per-provider adapters normalize streaming tool-use into one TurnEvent stream the engine consumes.

Provider	Endpoint	Streaming	Tool-call format
OpenAI	`/chat/completions`	SSE	Indexed argument deltas
Ollama	`/chat/completions`	SSE	Shared OpenAI-style adapter
Anthropic	`/v1/messages`	Named-event SSE	`tool_use` + `input_json_delta`
Gemini	`:streamGenerateContent?alt=sse`	SSE	Atomic `functionCall` parts; ids synthesized as `gem-{n}`
Grok (xAI)	`/chat/completions`	SSE	OpenAI-compatible
DeepSeek	`/chat/completions`	SSE	OpenAI-compatible
Hugging Face	`/chat/completions`	SSE	OpenAI-compatible router

OpenAI, Ollama, Grok, DeepSeek, and Hugging Face all share OpenAIStyleToolUseProvider. Anthropic gets AnthropicToolUseProvider. Gemini gets GeminiToolUseProvider — Google emits whole functionCall objects atomically (no argument streaming), and the translator synthesizes ids since Gemini's API doesn't carry them. Subscription CLI sessions (Claude Code, Codex) are first-class routes too.

Sandboxed primitives

A handful of primitive tools. Everything else composes from them.

Path-clamped to %LOCALAPPDATA%/SwarmMarshal/sandbox/. Path escapes throw UnauthorizedAccessException before any I/O happens. Consequential tools pass through a permission broker that surfaces inline approve/deny.

http.request(method, url, body?)

HTTP with a configurable host allowlist. Returns status, headers, and body.

shell.exec(command, cwd?)

Shell command pinned to the sandbox directory. Stdout, stderr, and exit code come back.

fs.read_file(path)

Reads a file, clamped to the sandbox root. Path traversal throws before any I/O.

fs.write_file(path, content)

Writes a file, also sandbox-clamped. Atomic replace; parents auto-created.

fs.list_files(directory?)

Lists entries under the sandbox. Default lists the sandbox root.

code.run_csharp(source)

Ad-hoc C# via the existing code-execution skill. Compiles, runs, returns the result.

skills.run(skillId, args)

Generic runner for any registered skill — markdown SKILL.md or compiled C#.

catalog.search_tools(query)

Semantic search over the tool catalog so an agent can discover new tools at runtime.

Host allowlist for HTTP Sandbox cwd for shell Path resolver clamps fs.* Approval broker on risky tools

Tools, external

Speak Model Context Protocol in both directions.

SwarmMarshal is both an MCP client (consumes external connectors) and an MCP server (exposes its own data to other agents).

Consume

Add an MCP server such as filesystem, GitHub, Slack, SQLite, browser search, or an internal tool. Its tools land next to the built-ins, and the assistant picks them through the same routing and approval flow.

Filesystem GitHub Custom

Expose

Point your own agent — Claude Code, Codex, or another MCP client — at SwarmMarshal's built-in server and query the same source-grounded data, with citations, behind an explicit external-use approval gate.

Built-in server Approval-gated Source refs returned

See the MCP tool catalog →

LLM routing

Right-size every call.

Per-task model assignment with budgets, health checks, and Ollama auto-detect. There is no single "the model" — different tasks pick different tiers, and sensitive work can be pinned local-only.

Auto-detect

Scans for a local Ollama install and registers detected models. Local-first when local works, cloud when it doesn't — with a fully local option for private work.

Per-task

Classification on a fast model, summaries on a smart one, high-stakes drafting on a frontier model. Each function type picks its own tier — and LocalOnly is enforced before any cloud advisor runs.

Spend Guard

Hard caps per model, per agent, per day. Spend Guard cuts off the bill before it surprises you and alerts you when it does.

Full routing model → Local benchmarks →

Memory hygiene

Long-lived memory needs operations discipline.

Memory is a durable asset, so it gets maintenance cycles, calibration gates, and deterministic safety filters — not just a prompt that hopes for the best.

Calibration gates

Prompt and model changes run through the message-pipeline calibration harness. Results are recorded by prompt hash and hardware key — the same model on different hardware is a different decision, and routing fails closed without passing proof.

Weekly reconciliation

A local maintenance pass marks stale claims, escalates contradictions, prunes low-confidence projections, and journals an auditable insight row. Conflicts surface for review instead of being silently merged.

Deterministic safety

When enrichment marks a message not knowledge-worthy, durable artifacts are dropped before persistence. Ungrounded quotes are dropped. The model classifies; the system gates the durable writes.

Self-healing

It fixes its own local infrastructure.

If the local AI stack is wedged, the model is undersized, or the error rate spikes, a diagnostic skill runs and proposes a fix. Common repairs stay inside the app, with explicit approval where they change your machine.

fix-wedged-ollama

Detects Ollama hung on a previous request, resets the runner, and reschedules the pending turn.

diagnose-error-rate

Walks the journal, classifies failures by tool and provider, and surfaces the dominant root cause.

diagnose-slow-local-llm

Compares observed latency against the model's expected envelope and recommends an action.

swap-undersized-local-model

If the model can't keep up, proposes (and with approval, performs) a swap to a better-sized local model.

Guided setup Repairs common OAuth issues Patches firewalls Maintains local AI/browser capabilities

Why every answer is grounded.