AgentMemo separates thinking from doing. Use an expensive model to design a workflow blueprint once; use a cheap or local model to execute it every time after that. You save 93–96% per execution, and with local models the per-run API cost drops to effectively zero.
| Task type | Opus/run (no AM) | Haiku/run (with AM) | Savings/run | Ratio |
|---|---|---|---|---|
| Simple (classifier) | $0.1520 | $0.0097 | $0.1423 | 15.7x |
| Complex (5-company analysis) | $0.1526 | $0.0098 | $0.1428 | 15.6x |
| Medium ×10 (SaaS report) | $0.0765 | $0.0029 | $0.0736 | 26.0x |
Key finding: the savings ratio is consistent regardless of task complexity (93–96%), and run-to-run cost variance is about $0.001, so execution costs are fully predictable for budget planning. At 1,000 runs/month this saves $73–$142/month; the AgentMemo subscription at that volume is $27.99/month, for a net gain of $45–$114/month.
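The net figure is simple arithmetic. A minimal sketch, using the per-run savings from the table above and the quoted subscription price (the helper function name is ours, purely illustrative):

```python
def net_monthly_savings(runs_per_month: int,
                        savings_per_run: float,
                        subscription: float = 27.99) -> float:
    """Monthly net gain: per-run savings times volume, minus the subscription."""
    return runs_per_month * savings_per_run - subscription

# At 1,000 runs/month, using the per-run savings from the table:
low = net_monthly_savings(1000, 0.0736)    # medium task blueprint  -> ~$45.61
high = net_monthly_savings(1000, 0.1428)   # complex task blueprint -> ~$114.81
```

This reproduces the $45–$114/month range quoted above.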
Base URL: https://app.agentmemo.ai/api
Auth header: Authorization: Bearer YOUR_API_KEY (or x-api-key: YOUR_API_KEY)
Before doing anything with a task you run repeatedly, check if you've already designed a blueprint for it.
```bash
# Check for existing workflow blueprint
curl -s "https://app.agentmemo.ai/api/workflows?name=YOUR_TASK_NAME" \
  -H "Authorization: Bearer YOUR_API_KEY"

# Response if blueprint exists:
# { "data": { "id": "uuid", "name": "...", "definition": "...", ... } }
# Response if no blueprint yet:
# { "data": null }
```
If data is not null → skip to Step 3 (execute using the blueprint).
If data is null → continue to Step 2 (design the blueprint first).
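The check-then-branch decision is one line once the response is parsed. A minimal sketch, assuming the response shapes documented above (the function name is ours, purely illustrative):

```python
def next_step(workflows_response: dict) -> str:
    """Decide the next step from the GET /api/workflows?name=... response."""
    blueprint = workflows_response.get("data")
    return "execute" if blueprint is not None else "design"

# Using the documented response shapes:
assert next_step({"data": {"id": "uuid", "definition": "..."}}) == "execute"
assert next_step({"data": None}) == "design"
```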
Call your most capable model (Opus, GPT-4o, Gemini Ultra) and ask it to produce a precise, mechanical blueprint that a less capable model can follow exactly. Then save it.
```bash
# 1. Call Opus to design the blueprint (one-time cost ~$0.08–0.15)
#    Your prompt should ask for: specific rules, thresholds, output format,
#    edge case handling — anything a dumber model needs to execute reliably.

# 2. Save the blueprint
curl -s -X POST https://app.agentmemo.ai/api/workflows \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "YOUR_TASK_NAME",
    "description": "What this workflow does",
    "definition": "[FULL BLUEPRINT TEXT FROM OPUS]",
    "designed_with_model": "claude-opus-4-6",
    "executable_by_model": "claude-haiku-4-5-20251001"
  }'

# Response: { "data": { "id": "workflow-uuid", ... } }
# Save the workflow id — you'll use it in Step 3.
```
Before executing, tell AgentMemo you're starting a run. This timestamps it and enables the observe API to track patterns.
```bash
curl -s -X POST https://app.agentmemo.ai/api/executions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "workflow_id": "WORKFLOW_UUID_FROM_STEP_2",
    "executed_with_model": "claude-haiku-4-5-20251001",
    "input_data": { "your": "input" }
  }'

# Response: { "data": { "id": "execution-uuid", ... } }
# Save the execution id — you'll use it in Step 5.
```
Run the task using your cheap or local model. Pass the blueprint as your system prompt — the model's job is to follow it mechanically, not reason from scratch.
```jsonc
// Inject the blueprint as the system prompt for your cheap/local model.
// The blueprint IS the intelligence. The cheap model just executes it.
{
  "model": "claude-haiku-4-5-20251001", // or ollama/llama3.3 or any local model
  "system": "[BLUEPRINT DEFINITION FROM workflow.definition]",
  "messages": [{ "role": "user", "content": "[YOUR ACTUAL TASK INPUT]" }]
}
```
Read the blueprint from workflow.definition (not workflow.blueprint or workflow.content) and pass it verbatim as the system prompt.
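Wiring the blueprint into the request is just string plumbing. A minimal sketch, assuming the field names documented above and a message payload shaped like the Anthropic Messages API (the helper function is ours, purely illustrative):

```python
def build_execution_request(workflow: dict, task_input: str) -> dict:
    """Build the cheap-model call: blueprint passed verbatim as the system prompt."""
    return {
        "model": workflow.get("executable_by_model", "claude-haiku-4-5-20251001"),
        "system": workflow["definition"],  # the blueprint IS the intelligence
        "messages": [{"role": "user", "content": task_input}],
    }
```

Note the only field read for the prompt is `workflow["definition"]`, matching the warning above.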
Tell AgentMemo the run finished. This enables ROI tracking, pattern detection, and the observe API.
```bash
curl -s -X PATCH https://app.agentmemo.ai/api/executions/EXECUTION_UUID \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "status": "completed",
    "output_data": { "summary": "optional output snapshot" }
  }'

# status options: "completed" | "failed"
# AgentMemo auto-calculates duration_seconds from timestamps
```
Store any key-value data between executions. Useful for caching results, tracking counters, or passing context between agent sessions.
```bash
# Write state
curl -s -X POST https://app.agentmemo.ai/api/state \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "component": "your-component-name",
    "key": "your-key",
    "value": { "any": "json data" }
  }'

# Read state
curl -s "https://app.agentmemo.ai/api/state?component=your-component-name&key=your-key" \
  -H "Authorization: Bearer YOUR_API_KEY"
# Returns: { "data": { "state_value": {...} } }

# Upserts automatically — no need to check if key exists first
# CRITICAL: use "component", "key", "value" — NOT component_name/state_key/state_value
```
When Haiku hits a situation the blueprint doesn't cover, it escalates. The agent calls Opus itself to patch the blueprint, then stores the new version in AgentMemo. Next time, Haiku handles it alone. AgentMemo is just the memo; the agent is the brain.
This works with zero new backend code. Every endpoint in this flow is already built. The agent calls Opus on its own API key — AgentMemo never touches Opus. The notebook just stores what the brain figures out.
| Event | Cost | Effect |
|---|---|---|
| Haiku run (normal) | ~$0.003–0.010 | Executes blueprint |
| Haiku run (edge case, no blueprint rule) | ~$0.003–0.010 | Describes gap surgically, escalates, stops |
| Opus call to add one rule | ~$0.02 | Appends surgical rule — eliminates failure class permanently |
| All future Haiku runs after patch | ~$0.003–0.010 | Handles the edge case natively |
Each ~$0.02 Opus call adds one surgical rule; full blueprint rewrites are unnecessary, since Haiku describes the gap and Opus fills it. This is 19x cheaper than passing the full blueprint. Break-even is two future runs that would have hit the same edge case; at 1,000 such runs, ROI exceeds 750,000%.
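The break-even claim follows from the cost table. A minimal sketch, assuming each unpatched edge-case hit wastes one Haiku run (it escalates and stops), priced at the top of the quoted range (the function name is ours, purely illustrative):

```python
import math

def patch_breakeven(patch_cost: float = 0.02,
                    wasted_run_cost: float = 0.01) -> int:
    """Edge-case runs needed before a blueprint patch pays for itself.

    Each unpatched hit wastes one cheap-model run that escalates and stops;
    the patch is a one-time ~$0.02 Opus call.
    """
    return math.ceil(patch_cost / wasted_run_cost)

# With the costs quoted above: break-even after 2 edge-case runs.
```

At the low end of the Haiku range (~$0.003/run), break-even shifts to 7 runs, still trivial at volume.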
When Haiku encounters something the blueprint doesn't cover, it does NOT guess. It posts an escalation with the exact failure context and stops.
```bash
# Haiku fires this when it gets stuck
curl -s -X POST https://app.agentmemo.ai/api/escalations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "your-agent-id",
    "workflow_id": "your-workflow-id",
    "reason": "Company has no LinkedIn presence and was founded in stealth mode. All 7 fallback sources require public data. Cannot estimate ARR growth.",
    "context": {
      "company": "Stealth Corp",
      "founded": 2024,
      "stealth": true,
      "linkedin": false,
      "crunchbase": false
    },
    "priority": "high"
  }'

# Response: { "data": { "id": "escalation-uuid", "status": "open", ... } }
# Haiku returns: { "escalated": true, "escalation_id": "...", "reason": "..." }
```
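The escalation body and the value Haiku hands back can be built in one place. A minimal sketch, assuming the field names from the endpoint above (`agent_id` is whatever identifier you use for the agent; both helpers are ours, purely illustrative):

```python
def build_escalation(agent_id: str, workflow_id: str,
                     reason: str, context: dict,
                     priority: str = "high") -> dict:
    """POST /api/escalations body: exact failure context, no guessing."""
    return {
        "agent_id": agent_id,
        "workflow_id": workflow_id,
        "reason": reason,
        "context": context,
        "priority": priority,
    }

def escalated_result(escalation_id: str, reason: str) -> dict:
    """What Haiku returns to the caller instead of a guessed answer."""
    return {"escalated": True, "escalation_id": escalation_id, "reason": reason}
```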
The key insight: Opus doesn't need the full blueprint to add one rule. Pass only the gap description and the relevant section; Opus returns a single new rule (~150–200 tokens out). Cost: ~$0.02 versus $0.38 for a full rewrite, 19x cheaper.
```bash
# Call Opus with ONLY the surgical context — NOT the full blueprint
curl -s -X POST https://api.anthropic.com/v1/messages \
  -H "x-api-key: YOUR_ANTHROPIC_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4-6",
    "max_tokens": 400,
    "system": "You are adding one rule to an existing workflow blueprint. Return ONLY the new rule, nothing else. Format exactly as:\n## NEW RULE: [title]\n[rule content, max 300 words]",
    "messages": [{
      "role": "user",
      "content": "Gap to cover: [escalation.reason]\n\nContext: [escalation.context JSON]\n\nExisting relevant section: [paste only the section of the blueprint relevant to this gap, NOT the whole thing]"
    }]
  }'

# Typical cost: ~$0.02 (gap description ~200 tokens in, rule ~200 tokens out)
# vs $0.38 for a full blueprint rewrite — 19x cheaper per patch
```
No full rewrite needed. Fetch the current definition, concatenate the new rule at the end, store it. The blueprint grows one rule at a time.
```bash
# Fetch current definition
curl -s "https://app.agentmemo.ai/api/workflows?name=your-workflow-name" \
  -H "Authorization: Bearer YOUR_API_KEY"
# → save workflow.definition

# Append the new rule and store
# new_definition = workflow.definition + "\n\n" + opus_rule_output
curl -s -X POST https://app.agentmemo.ai/api/workflows \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "your-workflow-name",
    "definition": "[OLD DEFINITION]\n\n[NEW RULE FROM OPUS]",
    "designed_with_model": "claude-opus-4-6",
    "executable_by_model": "claude-haiku-4-5-20251001"
  }'

# Response: { "data": { "id": "new-uuid", "version": 2, ... } }
# Each rule appended = one new version. Blueprint grows surgically.
```
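The append itself is a two-line string operation. A minimal sketch (the helper function is ours, purely illustrative):

```python
def patched_definition(old_definition: str, opus_rule: str) -> str:
    """Concatenate the new rule onto the existing blueprint text.

    Mirrors the new_definition construction in the curl example:
    old definition, blank line, new rule.
    """
    return old_definition + "\n\n" + opus_rule.strip()
```

The result becomes the `definition` field of the POST above; AgentMemo stores it as the next version.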
```bash
# Mark the escalation resolved
curl -s -X PATCH https://app.agentmemo.ai/api/escalations/ESCALATION_UUID \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "status": "resolved",
    "resolution": "Blueprint patched to v2 with stealth company handling rule"
  }'

# Response: { "data": { "status": "resolved", ... } }
```
On the next run, Haiku fetches the current workflow (now v2), loads the updated definition as its system prompt, and handles the edge case natively. The failure class is permanently eliminated.
```bash
# Haiku's normal startup — always fetches latest version
curl -s "https://app.agentmemo.ai/api/workflows?name=your-workflow-name" \
  -H "Authorization: Bearer YOUR_API_KEY"

# workflow.version is now 2 (or 3, or N) — Haiku uses workflow.definition as system prompt
# The new rule is embedded. Haiku handles the case. No more escalations for this class.
```
To make AgentMemo behavior automatic across all your repeated tasks, add the steps above to your agent's system prompt.
After your first few executions, call the observe API to confirm tracking is working and see your savings.
```bash
curl -s "https://app.agentmemo.ai/api/observe?days=7" \
  -H "Authorization: Bearer YOUR_API_KEY"
```
A healthy response includes:
- a `workflows` array with your registered blueprints
- `execution_count` > 0
- `monthlySavings`, `savingsPercent`, and `roi` figures
- `ready_for_downgrade: true`
If execution_count is 0 after running a task, check that you're passing the correct workflow_id in Step 3 and that the PATCH in Step 5 is completing with status "completed".
The blueprint works with any OpenAI-compatible model endpoint — including local models running via Ollama, LM Studio, or Jan.ai. When you execute blueprints on local hardware, the per-run AI cost is effectively $0 (electricity only).
| Execution layer | Cost/run | Savings vs Opus | Savings ratio |
|---|---|---|---|
| Haiku (cloud) | $0.003–0.010 | 93–96% | 15–26x |
| Llama 3.3 / Mistral / Qwen (local) | ~$0.0000 | ~100% | ∞ |
The blueprint is the intelligence. The local model just executes it mechanically.
Ollama endpoint: http://localhost:11434/v1 (OpenAI-compatible).
Recommended local executor models: Llama 3.3 70B, Mistral 7B Instruct, Qwen2.5 72B.
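Executing on a local model is the same request aimed at a different base URL. A minimal sketch of the OpenAI-compatible payload, assuming Ollama's default endpoint; the model name is an example and the helper function is ours:

```python
def build_local_request(blueprint: str, task_input: str,
                        model: str = "llama3.3",
                        base_url: str = "http://localhost:11434/v1") -> tuple[str, dict]:
    """URL and JSON body for a chat completion against a local executor.

    Same pattern as the cloud path: blueprint verbatim as the system message.
    """
    body = {
        "model": model,
        "messages": [
            {"role": "system", "content": blueprint},   # the blueprint IS the intelligence
            {"role": "user", "content": task_input},
        ],
    }
    return f"{base_url}/chat/completions", body
```

Point `base_url` at LM Studio or Jan.ai instead and nothing else changes, which is the whole appeal of the OpenAI-compatible surface.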
Every subscription tier is ROI-positive, and both the net gain and the ROI percentage improve at higher tiers.
Savings calculated at $0.10/execution average (conservative, verified by test data above).
```text
# All endpoints — auth: Authorization: Bearer YOUR_KEY

GET   /api/workflows?name={slug}         # check if blueprint exists
POST  /api/workflows                     # save new blueprint
      body: name, definition, designed_with_model, executable_by_model
POST  /api/executions                    # log execution start
      body: workflow_id, executed_with_model, input_data
PATCH /api/executions/{id}               # log execution complete
      body: status ("completed"|"failed"), output_data
GET   /api/state?component={c}&key={k}   # read persistent state
POST  /api/state                         # write/upsert state
      body: component, key, value
GET   /api/observe?days=7                # savings analytics + pattern detection
```
Sign up at https://app.agentmemo.ai/signup. Your API key is shown once at signup; copy it immediately.