# Thread-Based Multi-Agent Engineering
Agents execute autonomously, leave clean results, disappear. Named specialists running disciplined execution patterns for GitHub Copilot.
## The Problem
Without structure, you get four failure modes. Ghost Protocol eliminates all of them:

- Agent expands beyond what you asked → thread patterns enforce scope limits and escalation triggers
- Agent forgets decisions from 20 messages ago → agent history files and team decisions persist across sessions
- You don't know what the agent decided or why → structured output, decision logging, and the Scribe session archive
- Work ships without structured review → Core Four framework, Review Thread, agent-specific quality checks
## The Team
Not generic bots wearing hats. Each agent has defined expertise, boundaries, and a persistent memory of past work.
The one who sees the whole board. Refuses to let tactical wins create strategic debt. Evaluates trade-offs, coordinates big work, kills features to protect architecture.
Patterns: Fusion, Big, Review
Ships working code. Three lines of duplication beats a premature abstraction. Pragmatic, code-first, treats "it compiles" as different from "it works."
Patterns: Base, Chain, Long, Zero
The gatekeeper. 80% coverage is the floor, not the ceiling. Prefers integration tests over mocks. Will block a merge if tests are missing.
Patterns: Review, Status, Zero (verify)
The architect of agent behavior. Treats every word in a charter like a line of code. If an agent has to ask for help, the prompt failed.
Patterns: Big (design), Base
If the docs are wrong, the feature doesn't exist. Allergic to jargon. Inline code examples are worth a thousand words.
Patterns: Base, Chain (docs phase)
The translator between your code and the outside world. Every API call gets a timeout, every webhook gets idempotency, every response gets validated.
Patterns: Chain, Big, Base
The paranoid one. Justifiably so. Assumes every input is hostile. Will block a merge over a missing CSRF token. "We'll add auth later" is the most dangerous sentence.
Patterns: Review, Base, Chain
The team's memory. Invisible until you need the receipts. Runs in the background after every thread. If it wasn't logged, it didn't happen.
Patterns: Background (always)
## Execution Patterns
Each pattern defines how work is structured: checkpoints, parallelism, phases, verification loops. The right pattern prevents scope creep before it starts.
**Single Task**
One agent, one task, one review. The fundamental unit. Bug fix, small feature, isolated change.
Checkpoints: Start + End
**Parallel**
3-5 agents work simultaneously on independent sub-tasks. No shared files, no cross-talk, integration after.
Checkpoints: Start + End
**Chain**
Sequential phases with human review between each. Different agents can own different phases with explicit handoffs.
Checkpoints: Between every phase
**Long Duration**
Extended autonomous work with task tracking and self-checkpoints. Plan upfront, execute with breadcrumbs, escalate when needed.
Checkpoints: Optional self-checkpoints
**Big / Hierarchical**
Flight coordinates a tree of specialist agents with dependencies. Design Review before launch, contracts between agents.
Checkpoints: Start + End
**Fusion / Competitive**
2-3 agents compete with different strategies. Flight evaluates, user picks the winner. For when the right approach is uncertain.
Checkpoints: Start + Selection + End
**Zero-Touch**
Fully autonomous. Agent self-verifies through tests. FIDO must approve test coverage first. Max 3 self-fix attempts before escalation.
Checkpoints: End only
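The Zero-Touch flow above can be sketched as a loop: self-verify through tests, with at most three self-fix attempts before escalation. `run_tests` and `self_fix` are hypothetical callables for illustration, not Ghost Protocol's actual API.

```python
def zero_touch(run_tests, self_fix, max_fixes=3):
    """Fully autonomous execution: verify via tests, self-repair, escalate."""
    if run_tests():
        return "done"                      # passed on the first run
    for attempt in range(1, max_fixes + 1):
        self_fix()                         # agent patches its own output
        if run_tests():
            return f"done after {attempt} fix(es)"
    return "escalate"                      # hand control back to the human
```

The hard cap is the point: a bounded retry budget keeps "fully autonomous" from becoming "silently looping forever."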
**Quality Gate**
Post-execution evaluation using the Core Four framework. GREEN / YELLOW / RED rating. FIDO leads, RETRO checks security.
Rating: GREEN · YELLOW · RED
**Metrics Dashboard**
Thread metrics, agent utilization, optimization suggestions. Read-only. Run at session end for continuous improvement.
Mode: Read-only analysis
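The checkpoint policies above can be encoded as data. This mapping is a hypothetical sketch of that policy, not Ghost Protocol's real configuration format; the keys and checkpoint tokens are invented for illustration.

```python
# Checkpoint policy per execution pattern, mirroring the list above.
CHECKPOINTS = {
    "single_task":   ["start", "end"],
    "parallel":      ["start", "end"],
    "chain":         ["between_every_phase"],
    "long_duration": ["optional_self_checkpoints"],
    "big":           ["start", "end"],
    "fusion":        ["start", "selection", "end"],
    "zero_touch":    ["end"],
    # Quality Gate and Metrics Dashboard are post-execution evaluation
    # passes rather than patterns with human checkpoints, so they are
    # omitted here.
}
```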
## Dual Routing
The coordinator selects the specialist agent AND the execution pattern. Not one or the other. Both.
"Fix the pagination bug. Off-by-one error returning 9 items instead of 10."
→ EECOM executes a scoped fix, presents for review

"Implement auth system. Phase 1: schema. Phase 2: API. Phase 3: frontend. Phase 4: tests."
→ Sequential phases with explicit handoffs, human review between each

"Evaluate WebSocket vs SSE vs polling for real-time dashboard updates."
→ 3 strategies compete, Flight evaluates, you pick the winner
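A minimal sketch of dual routing: one decision yields both the specialist ("who") and the pattern ("how"). The keyword rules below are invented for illustration; in Ghost Protocol the coordinator makes this call with far more context.

```python
def route(request: str) -> tuple[str, str]:
    """Return (agent, pattern) for a request. Toy keyword heuristics."""
    text = request.lower()
    if "evaluate" in text or " vs " in text:
        return ("Flight", "Fusion")        # competing strategies, pick a winner
    if "phase" in text:
        return ("Flight", "Chain")         # sequential phases with handoffs
    if "bug" in text or "fix" in text:
        return ("EECOM", "Single Task")    # scoped fix, one review
    return ("Flight", "Single Task")       # default: coordinator decides
```

Returning a pair is the whole idea: routing only the agent, or only the pattern, reproduces the one-sided gaps of Squad and Thread Engineering respectively.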
## Quality Framework
Every thread is evaluated against four dimensions. No hand-waving. Scored 1-5 with evidence.
**CODEBASE AWARE** — Does the work show understanding of the codebase? Existing patterns followed? Dependencies considered?

**RIGHT-SIZED** — Right capability level applied? Simple task = simple approach. No over-engineering, no under-engineering.

**SCOPE CHECK** — Work aligns with original request? No scope drift? Ambiguities resolved correctly?

**VERIFIED** — Appropriate tools used? Tests written and run? Linting checked? Right thread pattern selected?
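One way to roll the four 1-5 scores into the GREEN/YELLOW/RED rating. The cutoffs here (RED if any dimension scores 2 or below, GREEN only if every dimension scores 4 or above) are assumptions for illustration; the framework defines the dimensions and the colors, not these exact thresholds.

```python
DIMENSIONS = ("codebase_aware", "right_sized", "scope_check", "verified")

def rate(scores: dict[str, int]) -> str:
    """Roll up Core Four scores (1-5 each) into a traffic-light rating."""
    values = [scores[d] for d in DIMENSIONS]  # KeyError if a dimension is missing
    if any(v <= 2 for v in values):
        return "RED"      # a serious gap anywhere fails the gate
    if all(v >= 4 for v in values):
        return "GREEN"    # strong on every dimension
    return "YELLOW"       # passable, with noted weaknesses
```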
## Why Ghost Protocol
Squad solved "who." Thread Engineering solved "how." Ghost Protocol solves both.
| Capability | Squad | Thread Engineering | Ghost Protocol |
|---|---|---|---|
| Named specialist agents | Yes | No | Yes |
| Structured execution patterns | No | 9 patterns | 9 patterns |
| Persistent agent memory | Yes | Stateless | Yes |
| Human checkpoints | Ad hoc | Per-pattern | Per-pattern |
| Quality framework | Reviewer gates | Core Four | Core Four + agent review |
| Dual routing (who + how) | Who only | How only | Who + How |
| Ceremonies (retros, design review) | Yes | No | Yes, auto-triggered |
## Get Started