Deerfield Green · Prototype
A multi-agent marketing-strategy team on LangGraph. Specialist roles draft a campaign, an evaluator scores it, a repair loop revises the weakest part, a human approves.
Calem is a multi-agent AI marketing-strategy system built on LangGraph. Seven specialist role agents collaborate to draft a full campaign — brand voice, ICP, offers, SEO, content, ads, and competitive intelligence. An LLM evaluator scores the draft against a weighted 10-dimension rubric; a repair loop targets the weakest role and reruns it up to twice. A human approval gate holds before any deliverable ships.
Seven specialist agents, each with a fixed scope and a spliced brand-voice constraint.
Brand voice: Declares a single voice archetype once, after memory; every downstream role is spliced with this spec so tone does not drift.
ICP: Dynamic Send fan-out. N candidate agents draft personas in parallel, then a consolidator writes the canonical ICP profile.
Offers: Commercial architecture covering motion, pricing tiers, trial/POC structure, alternates, dated urgency, and verb-led CTAs.
SEO: Intent-grouped keyword clusters, competitive gaps, and pillar/spoke internal-link patterns matched to the funnel mix.
Content: Quarterly theme tied to one ICP pain; a calendar of 5–10 pieces with format, cadence, distribution, and hero asset.
Ads: Channel, funnel-stage, and attribute-level targeting, plus a creative test matrix (headline × hook × CTA × visual).
Competitive intelligence: The only tool-using role. It runs a Tavily ReAct loop and emits research_context telemetry as evidence of real tool use.
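The fan-out and consolidation pattern described for the ICP role can be approximated in plain Python. This is a minimal sketch, not the prototype's code: it uses a thread pool in place of LangGraph's Send dispatch, and `draft_persona`, `consolidate`, and `build_icp` are hypothetical stand-ins for the candidate agents and the consolidator.

```python
from concurrent.futures import ThreadPoolExecutor


def draft_persona(candidate_id: int, brief: str) -> dict:
    """Hypothetical stand-in for one candidate agent; a real one would call the model."""
    return {"candidate": candidate_id, "persona": f"persona {candidate_id} for {brief}"}


def consolidate(drafts: list) -> dict:
    """Hypothetical consolidator: records which drafts fed the canonical profile."""
    return {"sources": [d["candidate"] for d in drafts], "draft_count": len(drafts)}


def build_icp(brief: str, n_candidates: int) -> dict:
    """Fan out n_candidates drafts in parallel, then fan in to one profile."""
    with ThreadPoolExecutor(max_workers=n_candidates) as pool:
        # map() yields results in submission order, whatever order workers finish in.
        drafts = list(pool.map(lambda i: draft_persona(i, brief), range(n_candidates)))
    return consolidate(drafts)
```

The number of candidates is chosen at call time, which mirrors the "dynamic" part of the fan-out: the graph does not need to know N in advance.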
Built on LangGraph 1.x + FastAPI + Pydantic v2. State: Postgres (checkpoints), Dragonfly (org memory), Qdrant (BGE-M3 hybrid grounding). Single model (Kimi K2.6 via Novita).
Brief in → human-approved bundle out. Roles draft, the evaluator gates quality, a bounded repair loop revises the weakest role, human approval triggers publish + org memory.
The evaluator scores each role section. If the composite score falls below 0.72, the graph routes to the repair subgraph, which reruns only the weakest-scoring role rather than the full pipeline. After at most three repair iterations, the graph escalates to approval regardless of score.
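The routing rule above can be sketched as a small pure function. This is an illustration under stated assumptions: the threshold and the cap of three iterations come from the text, while `EvalResult`, `route_after_evaluation`, and the node names "repair" and "approval" are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

THRESHOLD = 0.72           # composite pass mark stated in the text
MAX_REPAIR_ITERATIONS = 3  # repair cap stated in the text


@dataclass
class EvalResult:
    composite: float               # weighted composite score, 0.0 to 1.0
    role_scores: Dict[str, float]  # per-role section scores, 0.0 to 1.0


def route_after_evaluation(
    result: EvalResult, repairs_done: int
) -> Tuple[str, Optional[str]]:
    """Return (next_node, role_to_repair); node names are hypothetical."""
    if result.composite >= THRESHOLD:
        return "approval", None
    if repairs_done >= MAX_REPAIR_ITERATIONS:
        # Repair budget exhausted: escalate to the human gate regardless of score.
        return "approval", None
    # Repair only the weakest-scoring role, not the full pipeline.
    weakest = min(result.role_scores, key=result.role_scores.get)
    return "repair", weakest
```

Keeping the decision in a pure function makes the bounded-loop guarantee easy to unit-test without running the graph.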
LangGraph's interrupt() suspends graph execution and surfaces the bundle to a human reviewer. The reviewer can approve (triggers publish + learn), request revisions (resumes repair loop), or reject (terminates to END without publish).
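The three reviewer outcomes map to three routes once the graph resumes. A minimal sketch of that mapping in plain Python, independent of LangGraph itself; the `Decision` enum, `route_after_review`, and the node names are assumptions made for illustration.

```python
from enum import Enum
from typing import List


class Decision(str, Enum):
    APPROVE = "approve"
    REVISE = "revise"
    REJECT = "reject"


# Hypothetical node names; "END" stands for graph termination.
ROUTES = {
    Decision.APPROVE: ["publish", "learn"],  # ship the bundle and update org memory
    Decision.REVISE: ["repair"],             # resume the repair loop
    Decision.REJECT: ["END"],                # terminate without publishing
}


def route_after_review(raw_decision: str) -> List[str]:
    """Map the reviewer's answer to the next node(s); unknown input raises ValueError."""
    decision = Decision(raw_decision.strip().lower())
    return ROUTES[decision]
```

Rejecting unknown input with an error, rather than defaulting to a route, keeps a mistyped decision from publishing by accident.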
LangGraph edges are additive: multiple upstream nodes can write into synthesis without a hand-written join barrier. Each role returns a partial state update, and the reducers declared on the shared state keys merge those updates. This means content, ads, and competitive_intel can complete in any order; synthesis reads a consistent merged state.
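To make the merge behaviour concrete, here is a plain-Python illustration of what a reducer does. In LangGraph a reducer is attached to a state field through `typing.Annotated`; the sketch below does not use the library, and `merge_sections` and the placeholder section contents are assumptions.

```python
from functools import reduce
from itertools import permutations


def merge_sections(current: dict, update: dict) -> dict:
    """Reducer: combine a state key's existing value with one node's partial update."""
    return {**current, **update}


# Partial outputs from three upstream roles (contents are placeholders).
updates = [
    {"content": "calendar draft"},
    {"ads": "test matrix draft"},
    {"competitive_intel": "research summary"},
]

# Apply the updates in every possible completion order.
merged_states = [reduce(merge_sections, order, {}) for order in permutations(updates)]

# Each role writes a distinct key, so the merged state does not depend on order.
assert all(state == merged_states[0] for state in merged_states)
```

The order independence holds here because the three roles write disjoint keys. If two roles wrote the same key, a dict-merge reducer like this one would keep whichever update arrived last.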
On approve, the learn node writes approved sections back to org memory (D1 + vector embeddings). Future runs retrieve these as few-shot examples, progressively tightening brand alignment without re-prompting from scratch each time.
Dashed violet edges = repair/revise cycle. Graph per DEMO Appendix A.
Role agents are spliced with retrieved priors from a Qdrant hybrid store. The table below shows a representative subset of the real seed YAML.
| Segment | Profile | Vertical | Function | Confidence |
|---|---|---|---|---|
| Mid-Market RevOps Director, B2B SaaS ($50M–$250M ARR) | Owns forecast accuracy KPI; stack of 6 disconnected tools; integration depth > forecast lift > admin > price. | b2b-saas | revops | 0.92 |
| Series B SaaS CFO, Finance & Strategy | Professionalizing finance ahead of Series C; wants audit-ready close + real-time SaaS metrics; SOC 2 on day one. | b2b-saas | finance | 0.88 |
| Mid-Market E-commerce CMO, DTC Apparel & Beauty | Meta CAC up 60% YoY; board demands contribution-margin attribution; buys on 2–4 week cycle via peer Slack groups. | ecommerce-dtc | marketing | 0.85 |
| Enterprise IT Director, Fortune 500 Manufacturing | SAP ECC → S/4HANA slipping; OT/IT convergence after peer ransomware; 9-month RFI/RFP cycle. | manufacturing | it | 0.83 |
| Fintech Compliance Officer, Series C Payments | MTL licensing patchwork; SAR volume up 4x; wants Sandbox API day one; explainable ML. | fintech | compliance | 0.86 |
| Healthcare Operations Director, Multi-Site Specialty Practice | PE-backed, push EBITDA 18→25%; 8 EMRs; no-show 12–18%; platform-level vendor selection. | healthcare | operations | 0.81 |
| Manufacturing Plant Manager, Discrete Manufacturing | OEE stuck 55–65%; reactive maintenance; skeptical of AI buzzwords; will pilot before purchase. | manufacturing | operations | 0.79 |
| EdTech Product Head, K-12 Curriculum Platform | Year-long district cycles; pilots convert at 30%; COPPA/FERPA + state privacy; wants ESSA tier-2 efficacy research. | edtech | product | 0.78 |
| Developer Tools DevOps Lead, Cloud-Native Startup | Self-serve PLG buyer; reads docs first, hates sales calls; open-source friendly; transparent usage-based pricing. | devtools | engineering | 0.90 |
| Climate Tech Head of GTM, Carbon Removal Series A | Opaque 9–18 month enterprise cycles; needs MRV-ready data; trusts CarbonPlan teardowns. | climate-tech | gtm | 0.72 |
10 weighted dimensions scored 0.0–1.0.
passes_threshold gates campaigns; 0.72 target.
Below: first-pass score vs after-repair on ICP Sharpness.
First pass: 0.775 (fails 0.72)
After repair: 0.852 (passes)
Sample scores are an illustrative repair scenario. Weights sum to 1.00. Threshold 0.72.
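A weighted composite of this kind is a dot product of dimension scores and weights, compared against the threshold. The sketch below is an assumption about how such a gate could be computed, not the prototype's rubric: the function signatures are hypothetical, and the example uses three made-up dimensions and weights instead of the real ten to stay short.

```python
import math

THRESHOLD = 0.72  # pass mark stated in the text


def composite_score(scores: dict, weights: dict) -> float:
    """Weighted sum of dimension scores; weights must sum to 1.00."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same dimensions")
    if not math.isclose(sum(weights.values()), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1.00")
    for name, value in scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"score for {name} is outside 0.0 to 1.0")
    return sum(scores[name] * weights[name] for name in scores)


def passes_threshold(scores: dict, weights: dict, threshold: float = THRESHOLD) -> bool:
    """Hypothetical signature for the gate named in the text."""
    return composite_score(scores, weights) >= threshold


# Illustrative values only; dimension_b and dimension_c are placeholders.
weights = {"icp_sharpness": 0.5, "dimension_b": 0.3, "dimension_c": 0.2}
first_pass = {"icp_sharpness": 0.5, "dimension_b": 0.8, "dimension_c": 0.9}
repaired = {"icp_sharpness": 0.9, "dimension_b": 0.8, "dimension_c": 0.9}
```

With these made-up numbers, raising only the weakest dimension is enough to move the composite from below the threshold to above it, which is the behaviour the repair loop relies on.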