// Blog

Technical notes for AI agent builders

Tutorials, comparisons and design patterns for building autonomous agents that self-fund, call 345+ models and orchestrate MCP Tools.

Your first agent in five days: a real walkthrough from layoff to first paying customer

After twenty-two posts walking through protocols, patterns, security, compliance, evaluation craft, the agentic stack synthesis, and a menu of ten niches, the post that closes the loop is the one nobody has written yet: a concrete five-day execution narrative. We follow Mariana, who worked in customer success at a B2B SaaS company and was laid off the previous Friday, as she picks the inbox-triage niche from the niches post, builds her first agent in four focused hours on Monday, lands her first paying customer through a network DM on Tuesday, demos on Wednesday, onboards and tunes on Thursday, and cashes her first invoice on Friday. The walkthrough includes the actual prompt she ships, the catalog of MCP servers she connects, the OAuth scopes she requests, the fifteen-case eval suite she builds, the DM script she sends to her first prospect, the demo script that closes the deal, and the two edge cases that broke in week one and what she did about them. We also cover, honestly, what goes wrong between week two and week eight, because the operator who only hears about the win curve quits at the first friction.

16 min read →

Ten niches where a solo operator can ship a real agent in a week, with revenue math

After eighteen posts of protocols, patterns and craft, this is the post that turns the theory into a Monday-morning action list. We catalog ten niches we have actually seen work for solo operators using the agentic stack we have been writing about: sales-tax reconciliation for Shopify-tier merchants, FDA correspondence monitoring for medical-device territories, lease document screening for renters' rights jurisdictions, RFP response drafting for under-resourced sales teams, on-chain treasury monitoring for crypto-native family offices, podcast clip extraction for creator agencies, regulatory filing watchers for compliance teams, multi-source competitor pricing monitoring for SaaS founders, B2B accounts-payable invoice screening for finance teams, and personalized meeting prep for executives. For each: the addressable market sketch, the data and tools you need, the typical monetization model, the first-customer revenue range, the time-to-first-paying-customer we have observed, and the biggest barrier. Pick one and ship it; the next ninety days take care of themselves.

13 min read →

The agentic stack in 2026: one diagram, five layers, and the operator's mental model

We have spent a month writing about individual layers: MCP for tools, A2A for agent-to-agent, AP2 for payment authorization, x402 for crypto-native settlement, ERC-8004 for identity and reputation. This post is the synthesis we wish someone had handed us when we first started: one diagram with all five layers stacked the way they actually compose in a production agent system, an explanation of which layer answers which question, the canonical composition pattern from discovery through settlement, where evaluation and security sit transversally across all the layers, and what the operator's mental model has to look like to navigate the whole thing. This is the post you send to a colleague when they ask 'what does the agentic stack actually look like in 2026.'

11 min read →

Agent evaluation and observability: the craft that separates a real operator from a hobbyist

An operator who can answer 'what did my agent do on Tuesday at 14:23 and was it right' has a business. An operator who cannot is going to lose their first paying client and spend the second week of the month figuring out why. This post is the practical evaluation and observability recipe we have not written yet — the four metric categories (correctness, cost, latency, drift), the small fast eval suite every operator should ship before their first paying user, the production observability stack that makes drift detectable, the prompt versioning discipline that makes Tuesday's regression Wednesday's rollback, and the canary deployment pattern that catches problems before they reach the whole fleet. We close with how Agent Builder ships sensible defaults for every layer of this stack and what the operator still has to do themselves.

14 min read →

Multi-agent orchestration: the four canonical patterns, when to use each, and the anti-patterns that break fleets

Most public writing on agent systems assumes a single autonomous agent. Most production deployments past the first month do not look like that — they look like fleets of cooperating agents with different specializations, different model tiers, different blast radii, and a real operator coordinating them through a dashboard. This post catalogs the four orchestration patterns we see survive contact with production: supervisor-worker (one planner, many parallel workers), peer-to-peer mesh (agents discover each other via A2A and negotiate work), hierarchical tree (recursive supervisor-worker for tasks too big for one level), and swarm (many homogeneous agents with stochastic load balancing). For each we cover the use case it fits, the cost and latency math, the failure modes, and how Agent Builder implements it. We close with the three anti-patterns we have watched break the most fleets — full mesh, ring leadership, blind aggregation — and a decision tree that walks an operator from 'I have one agent' to 'I have the right shape for thirty.'

14 min read →

The agent security threat model: what attacks are live today, what is coming next, and whether agents are ready for any of it

An autonomous agent is a piece of software with a credit card, a calendar, an inbox, and the trust of its principal. Every one of those affordances is an attack surface. This post is the threat-model document we ship internally and that any agent operator should read before going to production: the eight live attack vectors (direct prompt injection, indirect injection via tool output, tool poisoning, supply-chain rug pulls on MCP servers, agent hijacking, polymorphic phishing agents, invoice-timed malware, synthetic identity farms), the four sophisticated attacks coming online over the next eighteen months (long-con social engineering targeting the agent, sleeper agents with delayed payload, cross-agent reputation laundering, AP2 mandate forgery), the honest answer to whether agents are ready for any of it (they are not, mostly), the criminal economy that is going to exploit this attack surface at volume (fraud-as-a-service, romance scams at scale, dust laundering, fake KYC at industrial speed), and the practical defenses an operator can implement today. This is the document we wish someone had handed us before we deployed our first agent that touched real money.

17 min read →

EU AI Act: what every agent operator needs to know before August 2, 2026, in plain language

The EU AI Act entered into force on 1 August 2024 with most of its provisions phased in over time. The phase that matters most for any operator running agents — Annex III high-risk systems — starts being enforced on 2 August 2026. This post is the version of the law we wish someone had written for us when we first read the text: it explains who is in scope (anyone whose agent reaches EU users, even if the operator is in Argentina or California), what counts as high-risk (recruiting, education, credit scoring, biometric ID, essential public services, employment screening, law enforcement support, critical infrastructure), the four risk tiers explained without legalese, the seven concrete obligations attached to high-risk systems (risk management, data governance, technical documentation, automatic logging, transparency, human oversight, accuracy/robustness/cybersecurity), how the transparency rules under Article 50 apply to any agent that interacts with a person or generates content, the penalty structure, and a one-page operator checklist mapping each obligation to a concrete artifact you need to produce. We close with how LLM4Agents Agent Builder maps to each obligation by default, so the operator running a fleet through Agent Builder is already three-quarters compliant on day one.

15 min read →

MCP deep dive: the Model Context Protocol, end to end, for the agent operator

MCP (Model Context Protocol) is the open standard that lets any LLM application connect to any tool or data source through the same JSON-RPC contract, the way USB-C lets any peripheral connect to any host. Anthropic open-sourced it in November 2024; by mid-2026 every major LLM platform speaks it natively and the official registry lists hundreds of production servers. This post is the comprehensive technical walkthrough we owe future agent operators: the host-client-server architecture, the three server-side primitives (Tools, Resources, Prompts) and the three client-side ones (Sampling, Roots, Elicitation), the stdio and Streamable HTTP transports, the OAuth 2.1 authorization stack, the lifecycle handshake, the security model with consent gates and tool safety, the current 2025-11-25 spec, the upcoming 2026-07-28 release candidate (stateless protocol core, MCP Apps with sandboxed HTML UIs, Tasks as a formal extension, six OAuth hardening proposals, a formal deprecation policy), and the practical operator concerns that the spec does not solve for you: secret management, scope minimization, observability, audit, and how to recognize a server you should not connect to. We close with how Agent Builder turns MCP from a protocol you implement into a control surface you configure.

18 min read →