WRITTEN FOR AGENTS

Design once.
Execute forever.

AgentMemo separates thinking from doing. Use an expensive model to design a workflow blueprint once. Use a cheap or local model to execute it every time after that. You save 93–96% per execution. With local models, savings are infinite.

// VERIFIED TEST DATA — REAL API CALLS — FEB 2026

Task type Opus/run (no AM) Haiku/run (with AM) Savings/run Ratio
Simple (classifier) $0.1520 $0.0097 $0.1423 15.7x
Complex (5-company analysis) $0.1526 $0.0098 $0.1428 15.6x
Medium ×10 (SaaS report) $0.0765 $0.0029 $0.0736 26.0x

Key finding: savings ratio is consistent regardless of task complexity (93–96%). Run-to-run cost variance: $0.001. Execution costs are fully predictable for budget planning. At 1,000 runs/month: saves $73–$142/month. AgentMemo subscription at that volume: $27.99/month. Net to you: $45–$114/month.

// INTEGRATION — 5 STEPS

Base URL: https://app.agentmemo.ai/api
Auth header: Authorization: Bearer YOUR_API_KEY (or x-api-key: YOUR_API_KEY)

STEP 1 OF 5

Check if a blueprint exists for this task

Before doing anything with a task you run repeatedly, check if you've already designed a blueprint for it.

# Check for existing workflow blueprint
curl -s https://app.agentmemo.ai/api/workflows?name=YOUR_TASK_NAME \
  -H "Authorization: Bearer YOUR_API_KEY"

# Response if blueprint exists:
# { "data": { "id": "uuid", "name": "...", "definition": "...", ... } }

# Response if no blueprint yet:
# { "data": null }

If data is not null → skip to Step 3 (execute using the blueprint).
If data is null → continue to Step 2 (design the blueprint first).

STEP 2 OF 5 — ONLY ON FIRST RUN

Design the blueprint with your best model, save it

Call your most capable model (Opus, GPT-4o, Gemini Ultra) and ask it to produce a precise, mechanical blueprint that a less capable model can follow exactly. Then save it.

# 1. Call Opus to design the blueprint (one-time cost ~$0.08–0.15)
#    Your prompt should ask for: specific rules, thresholds, output format,
#    edge case handling — anything a dumber model needs to execute reliably.

# 2. Save the blueprint
curl -s -X POST https://app.agentmemo.ai/api/workflows \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "YOUR_TASK_NAME",
    "description": "What this workflow does",
    "definition": "[FULL BLUEPRINT TEXT FROM OPUS]",
    "designed_with_model": "claude-opus-4-6",
    "executable_by_model": "claude-haiku-4-5-20251001"
  }'

# Response: { "data": { "id": "workflow-uuid", ... } }
# Save the workflow id — you'll use it in Step 3.
STEP 3 OF 5

Log the execution start

Before executing, tell AgentMemo you're starting a run. This timestamps it and enables the observe API to track patterns.

curl -s -X POST https://app.agentmemo.ai/api/executions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "workflow_id": "WORKFLOW_UUID_FROM_STEP_2",
    "executed_with_model": "claude-haiku-4-5-20251001",
    "input_data": { "your": "input" }
  }'

# Response: { "data": { "id": "execution-uuid", ... } }
# Save the execution id — you'll use it in Step 5.
STEP 4 OF 5

Execute using the blueprint

Run the task using your cheap or local model. Pass the blueprint as your system prompt — the model's job is to follow it mechanically, not reason from scratch.

# Inject the blueprint as the system prompt for your cheap/local model
# The blueprint IS the intelligence. The cheap model just executes it.

{
  "model": "claude-haiku-4-5-20251001",  // or ollama/llama3.3 or any local model
  "system": "[BLUEPRINT DEFINITION FROM workflow.definition]",
  "messages": [{ "role": "user", "content": "[YOUR ACTUAL TASK INPUT]" }]
}
⚠ Field name accuracy matters: The blueprint text lives in workflow.definition (not workflow.blueprint or workflow.content). Pass it verbatim as the system prompt.
STEP 5 OF 5

Log the execution complete

Tell AgentMemo the run finished. This enables ROI tracking, pattern detection, and the observe API.

curl -s -X PATCH https://app.agentmemo.ai/api/executions/EXECUTION_UUID \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "status": "completed",
    "output_data": { "summary": "optional output snapshot" }
  }'

# status options: "completed" | "failed"
# AgentMemo auto-calculates duration_seconds from timestamps

// BONUS: PERSISTENT STATE BETWEEN RUNS

Store any key-value data between executions. Useful for caching results, tracking counters, or passing context between agent sessions.

# Write state
curl -s -X POST https://app.agentmemo.ai/api/state \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "component": "your-component-name",
    "key": "your-key",
    "value": { "any": "json data" }
  }'

# Read state
curl -s "https://app.agentmemo.ai/api/state?component=your-component-name&key=your-key" \
  -H "Authorization: Bearer YOUR_API_KEY"

# Returns: { "data": { "state_value": {...} } }
# Upserts automatically — no need to check if key exists first
# CRITICAL: use "component", "key", "value" — NOT component_name/state_key/state_value

// SELF-HEALING BLUEPRINTS

When Haiku hits a situation the blueprint doesn't cover, it escalates. The agent calls Opus itself to patch the blueprint, then stores the new version in AgentMemo. Next time: Haiku handles it alone. AgentMemo is just the memo — the agent is the brain.

This works with zero new backend code. Every endpoint in this flow is already built. The agent calls Opus on its own API key — AgentMemo never touches Opus. The notebook just stores what the brain figures out.

// ECONOMICS

Event Cost Effect
Haiku run (normal) ~$0.003–0.010 Executes blueprint
Haiku run (edge case, no blueprint rule) ~$0.003–0.010 Describes gap surgically, escalates, stops
Opus call to add one rule ~$0.02 Appends surgical rule — eliminates failure class permanently
All future Haiku runs after patch ~$0.003–0.010 Handles the edge case natively

Each ~$0.02 Opus call adds one surgical rule. Full blueprint rewrites unnecessary — Haiku describes the gap, Opus fills it. 19x cheaper than passing the full blueprint. Break-even = 2 future runs that would have hit the same edge case. At 1,000 such runs: ROI is 750,000%+.

STEP 1 OF 5 — HAIKU HITS EDGE CASE

Haiku escalates instead of hallucinating

When Haiku encounters something the blueprint doesn't cover, it does NOT guess. It posts an escalation with the exact failure context and stops.

# Haiku fires this when it gets stuck
curl -s -X POST https://app.agentmemo.ai/api/escalations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "your-agent-id",
    "workflow_id": "your-workflow-id",
    "reason": "Company has no LinkedIn presence and was founded in stealth mode. All 7 fallback sources require public data. Cannot estimate ARR growth.",
    "context": {
      "company": "Stealth Corp",
      "founded": 2024,
      "stealth": true,
      "linkedin": false,
      "crunchbase": false
    },
    "priority": "high"
  }'

# Response: { "data": { "id": "escalation-uuid", "status": "open", ... } }
# Haiku returns: { "escalated": true, "escalation_id": "...", "reason": "..." }
STEP 2 OF 5 — SURGICAL OPUS CALL (NOT THE FULL BLUEPRINT)

Haiku describes the gap — Opus writes only the missing rule

The key insight: Opus doesn't need the full blueprint to add one rule. Pass only the gap description and the relevant section. Opus returns a single new rule (~150-200 tokens out). Cost: ~$0.02 vs $0.38 for a full rewrite — 19x cheaper.

# Call Opus with ONLY the surgical context — NOT the full blueprint
curl -s -X POST https://api.anthropic.com/v1/messages \
  -H "x-api-key: YOUR_ANTHROPIC_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4-6",
    "max_tokens": 400,
    "system": "You are adding one rule to an existing workflow blueprint. Return ONLY the new rule, nothing else. Format exactly as:\n## NEW RULE: [title]\n[rule content, max 300 words]",
    "messages": [{
      "role": "user",
      "content": "Gap to cover: [escalation.reason]\n\nContext: [escalation.context JSON]\n\nExisting relevant section: [paste only the section of the blueprint relevant to this gap, NOT the whole thing]"
    }]
  }'

# Typical cost: ~$0.02 (gap desc ~200 tokens in, rule ~200 tokens out)
# vs $0.38 for full blueprint rewrite — 19x cheaper per patch
STEP 3 OF 5 — APPEND THE RULE (NO FULL REWRITE)

Fetch current definition, append the new rule, POST it back

No full rewrite needed. Fetch the current definition, concatenate the new rule at the end, store it. The blueprint grows one rule at a time.

# Fetch current definition
curl -s "https://app.agentmemo.ai/api/workflows?name=your-workflow-name" \
  -H "Authorization: Bearer YOUR_API_KEY"
# → save workflow.definition

# Append the new rule and store
# new_definition = workflow.definition + "\n\n" + opus_rule_output
curl -s -X POST https://app.agentmemo.ai/api/workflows \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "your-workflow-name",
    "definition": "[OLD DEFINITION]\n\n[NEW RULE FROM OPUS]",
    "designed_with_model": "claude-opus-4-6",
    "executable_by_model": "claude-haiku-4-5-20251001"
  }'

# Response: { "data": { "id": "new-uuid", "version": 2, ... } }
# Each rule appended = one new version. Blueprint grows surgically.
STEP 4 OF 5 — RESOLVE THE ESCALATION

Mark the escalation resolved with the new version

# Mark the escalation resolved
curl -s -X PATCH https://app.agentmemo.ai/api/escalations/ESCALATION_UUID \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "status": "resolved",
    "resolution": "Blueprint patched to v2 with stealth company handling rule"
  }'

# Response: { "data": { "status": "resolved", ... } }
STEP 5 OF 5 — NEXT HAIKU RUN LOADS THE NEW VERSION

Haiku picks up the patched blueprint automatically

On the next run, Haiku fetches the current workflow (now v2), loads the updated definition as its system prompt, and handles the edge case natively. The failure class is permanently eliminated.

# Haiku's normal startup — always fetches latest version
curl -s "https://app.agentmemo.ai/api/workflows?name=your-workflow-name" \
  -H "Authorization: Bearer YOUR_API_KEY"

# workflow.version is now 2 (or 3, or N) — Haiku uses workflow.definition as system prompt
# The new rule is embedded. Haiku handles the case. No more escalations for this class.
ADD THIS TO HAIKU'S SYSTEM PROMPT (ESCALATION PROTOCOL):

You are executing a workflow blueprint stored in AgentMemo.
Blueprint version: {workflow.version} | Workflow: {workflow.name}

If you encounter input that the blueprint has no rule for:

1. DO NOT guess. DO NOT hallucinate.
2. Identify the SPECIFIC gap: what type of data/situation is missing from the blueprint.
3. POST /api/escalations with:
- reason: one sentence describing what's missing (e.g. "Blueprint has no rule for non-USD currency inputs — all thresholds assume USD")
- context: { gap_type: "currency", input_value: "₦847M NGN", blueprint_expects: "USD ARR", specific_field: "ARR" }
Keep it surgical — just the gap, not the full input data.
4. The escalation handler will call Opus with ONLY the gap description (not the full blueprint).
Opus writes just the new rule. The rule gets appended. No full rewrite.
5. Return immediately: { "escalated": true, "escalation_id": "...", "reason": "...", "next_steps": "Surgical rule will be appended. Retry with updated blueprint." }

REMEMBER: A precise gap description is your most valuable output when escalating.
The more specific you are about WHAT is missing, the cheaper and better the patch.
Hallucinating is always worse than escalating cleanly.

// SYSTEM PROMPT TEMPLATE

Add this block to your system prompt to make AgentMemo behavior automatic across all your repeated tasks.

AGENTMEMO PROTOCOL
You have access to AgentMemo (key: YOUR_KEY_HERE).
Base URL: https://app.agentmemo.ai/api
Auth: Authorization: Bearer YOUR_KEY_HERE

RULE: Before executing any task you will run more than once:

1. GET /api/workflows?name={task_slug} — check if blueprint exists
• If data is not null → use workflow.definition as your system context, skip to step 3
• If data is null → continue to step 2

2. Use your best available model to design a precise blueprint for this task.
The blueprint must specify: exact rules, thresholds, output format, edge case handling.
POST /api/workflows with name, definition, designed_with_model, executable_by_model
Save the returned workflow id.

3. POST /api/executions with workflow_id and executed_with_model. Save the execution id.

4. Execute the task using your cheapest/local model with the blueprint as system prompt.

5. PATCH /api/executions/{execution_id} with status: "completed" or "failed".

WHY: This saves 93–96% per execution vs using your best model every time.
At 1,000 runs/month: saves ~$73–$142. Subscription cost at that volume: $27.99. Net: +$45–$114/month.

// VERIFY IT'S WORKING

After your first few executions, call the observe API to confirm tracking is working and see your savings.

curl -s "https://app.agentmemo.ai/api/observe?days=7" \
  -H "Authorization: Bearer YOUR_API_KEY"
Response contains workflows array with your registered blueprints
Each workflow shows execution_count > 0
ROI data appears: monthlySavings, savingsPercent, roi
Workflows with >5 executions flagged as ready_for_downgrade: true

If execution_count is 0 after running a task, check that you're passing the correct workflow_id in Step 3 and that the PATCH in Step 5 is completing with status "completed".

∞ LOCAL MODELS — INFINITE SAVINGS

The blueprint works with any OpenAI-compatible model endpoint — including local models running via Ollama, LM Studio, or Jan.ai. When you execute blueprints on local hardware, the per-run AI cost is effectively $0 (electricity only).

Execution layer Cost/run Savings vs Opus Savings ratio
Haiku (cloud) $0.003–0.010 93–96% 15–26x
Llama 3.3 / Mistral / Qwen (local) ~$0.0000 ~100%

The blueprint is the intelligence. The local model just executes it mechanically. Ollama endpoint: http://localhost:11434/v1 (OpenAI-compatible). Recommended local executor models: Llama 3.3 70B, Mistral 7B Instruct, Qwen2.5 72B.

// PRICING — AGENT-NATIVE TIERS

Every tier is ROI-positive. Net gain increases as tier increases. ROI % improves at higher tiers.

Savings calculated at $0.10/execution average (conservative, verified by test data above).

tier / price
executions/mo
saves/mo
net gain
$2.99
100
$10.00
+$7.01 234% ROI
$6.99
240
$24.00
+$17.01 243% ROI
$12.99
500
$50.00
+$37.01 285% ROI
$27.99
1,250
$125.00
+$97.01 347% ROI
$54.99
2,800
$280.00
+$225.01 409% ROI
$99.99
5,800
$580.00
+$480.01 480% ROI
$149.99
9,500
$950.00
+$800.01 533% ROI
$229.99
18,000
$1,800.00
+$1,570 623% ROI
LOYALTY PRICING: At 6 months of continuous subscription, your rate drops 15% automatically. At 12 months, it drops another 15%. No action required — applied at billing cycle. After 12 months: Tier $27.99 → $20.22. Tier $99.99 → $72.24. Savings ratio improves. ROI at month 13 is higher than at month 1.

// QUICK REFERENCE

# All endpoints — auth: Authorization: Bearer YOUR_KEY

GET  /api/workflows?name={slug}          # check if blueprint exists
POST /api/workflows                      # save new blueprint
     body: name, definition, designed_with_model, executable_by_model

POST /api/executions                     # log execution start
     body: workflow_id, executed_with_model, input_data

PATCH /api/executions/{id}               # log execution complete
      body: status ("completed"|"failed"), output_data

GET  /api/state?component={c}&key={k}   # read persistent state
POST /api/state                          # write/upsert state
     body: component, key, value

GET  /api/observe?days=7                 # savings analytics + pattern detection

# Sign up: https://app.agentmemo.ai/signup
# Your API key is shown once at signup. Copy it immediately.