Designing tool interfaces your agent can’t misuse

January 22, 2026 · 11 min read

Agents are probabilistic; tools should not be. That single constraint explains why so many early agent products feel brittle. Teams often spend weeks tuning instructions while leaving tool interfaces vague, overloaded, or too permissive. The model then has to infer too much from underspecified schemas, and small interpretation mistakes turn into costly downstream behavior. A tool interface is not just a transport layer. It is one of your strongest product controls.

When people say an agent “misused” a tool, the failure is often shared between the model and the interface design. Did the tool ask for fields that were easy to hallucinate? Did one endpoint combine preview, mutation, and deletion behind optional flags? Did the schema encourage defaults the product team would never want applied silently? Strong interfaces reduce ambiguity by narrowing the range of plausible bad decisions before the model ever generates them.

A good tool interface does not merely describe what is possible. It strongly suggests what is safe, normal, and expected.

Prefer narrow verbs over universal endpoints

A common anti-pattern is the “do everything” tool: update_record, manage_workspace, execute_action. These names sound flexible, but flexibility is precisely the problem. Broad verbs force the model to infer intent across too many possibilities, and they force humans to reason about permissions and review logic at an unhelpful level of abstraction. A narrow tool like create_issue, assign_ticket, or draft_reply communicates both purpose and bounds.

Splitting tools by intent may feel verbose to the implementation team, yet it usually reduces complexity elsewhere. Authorization becomes easier to explain. Audit trails become easier to search. Product copy becomes easier to write. And when you need to disable a problematic action, you can turn off one clear path rather than carving exceptions into a generic endpoint. Narrow verbs are not bureaucratic overhead; they are operational leverage.
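To make this concrete, here is a minimal sketch of a tool registry split by intent. The ToolSpec shape and the parameter encoding are illustrative assumptions, not a prescribed format; the point is that each entry maps to exactly one product intent:

```typescript
// A deliberately simple tool descriptor. Real systems typically use a
// JSON-Schema-style parameter definition; this is a reduced sketch.
type ToolSpec = {
  name: string
  description: string
  params: Record<string, "string" | "number" | "boolean">
}

// Narrow verbs: each tool communicates both purpose and bounds.
const tools: ToolSpec[] = [
  {
    name: "create_issue",
    description: "Create a new issue in an explicitly named project.",
    params: { projectId: "string", title: "string", body: "string" },
  },
  {
    name: "assign_ticket",
    description: "Assign an existing ticket to a teammate.",
    params: { ticketId: "string", assigneeId: "string" },
  },
]

// Disabling one problematic action is a one-line filter, not an
// exception carved into a generic execute_action endpoint.
const enabled = tools.filter((t) => t.name !== "assign_ticket")
```

Compare this with a single manage_workspace tool: turning off ticket assignment there would mean adding conditional logic inside the endpoint, which is exactly the kind of exception-carving the narrow-verb design avoids.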

Signals that a tool is too broad

  • Its name could apply to several unrelated product intents.
  • It requires many optional fields, most of which are irrelevant to a given request.
  • Reviewers struggle to describe the exact action being authorized.
  • Logs tell you that something happened, but not whether it matched the user’s expectation.

Make risky actions explicit in the schema

If a tool can delete data, send an external message, or trigger an irreversible workflow, that risk should be visible in the interface itself. Many teams rely on textual instructions like “only do this if the user confirms,” but leave the underlying schema identical to safer operations. A sturdier pattern is to make dangerous actions require explicit flags, separate endpoints, or prior preview tokens so that the calling path itself contains the safety check.

For example, archive_ticket and delete_ticket should not be the same tool with mode = soft|hard. They have different consequences and deserve different policy treatment. Likewise, draft_email and send_email should be distinct even if they share much of the same payload. The less the model has to infer about severity, the more dependable the system becomes.

type DraftEmail = {
  to: string[]
  subject: string
  body: string
}

// Sending is a distinct type: it cannot be constructed without explicit
// proof that a draft was previewed and approved.
type SendEmail = DraftEmail & {
  approvedDraftId: string
}
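One hedged sketch of how the calling path itself can carry the safety check: sending requires a token that only a prior draft-approval step can produce. The approveDraft and sendEmail functions and the in-memory store are invented for illustration, not part of any real API:

```typescript
type DraftEmail = { to: string[]; subject: string; body: string }

// In a real system this would be a durable store with expiry.
const approvedDrafts = new Map<string, DraftEmail>()

function approveDraft(draft: DraftEmail): string {
  // The id is the "preview token": possession of it proves the
  // draft passed through the approval step.
  const id = `draft_${approvedDrafts.size + 1}`
  approvedDrafts.set(id, draft)
  return id
}

function sendEmail(approvedDraftId: string): { ok: boolean; error?: string } {
  // The safety check lives in deterministic code, not in a prompt
  // the model has to remember.
  if (!approvedDrafts.has(approvedDraftId)) {
    return {
      ok: false,
      error: "send_email requires an approvedDraftId from a prior draft",
    }
  }
  // ...hand off to the mail provider here...
  return { ok: true }
}
```

The design choice worth noting: the model cannot skip the draft step by guessing a flag value, because there is no flag. The only path to sending runs through approval.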

Validate at the boundary, not in the model

Models are useful at interpreting intent, but they are not a substitute for input validation. Tool interfaces should define which fields are required, what shape they take, and which combinations are invalid. The more validation you can move into deterministic code, the less you have to hope the model always remembers a brittle instruction. Validation also gives the product something concrete to say when an action cannot proceed: a missing assignee, an invalid date range, an unsupported resource type.

Importantly, validation should not just reject bad inputs. It should return structured, actionable errors. “Bad request” is not useful to a model or a user. “project_id is required to create an issue” is. “send_email requires approvedDraftId” is. These errors teach the system what to do next and make recovery much faster in multi-step workflows.

Validation rules that pay off quickly

  • Require explicit identifiers for writes rather than allowing fuzzy matching by default.
  • Reject ambiguous combinations of fields instead of trying to guess user intent in backend code.
  • Constrain enums and formats so invalid states fail early and consistently.
  • Return error codes that the UI can turn into targeted guidance or follow-up questions.
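The rules above can be sketched as a boundary validator that returns structured, actionable errors rather than a bare rejection. The field names and error codes here are examples of the pattern, not a spec:

```typescript
type CreateIssueInput = {
  projectId?: string
  title?: string
  dueDate?: string
}

type ValidationResult =
  | { ok: true }
  | { ok: false; code: string; message: string }

function validateCreateIssue(input: CreateIssueInput): ValidationResult {
  if (!input.projectId) {
    // Explicit identifier required for writes; no fuzzy matching.
    return {
      ok: false,
      code: "missing_project_id",
      message: "project_id is required to create an issue",
    }
  }
  if (!input.title) {
    return {
      ok: false,
      code: "missing_title",
      message: "title is required to create an issue",
    }
  }
  if (input.dueDate && isNaN(Date.parse(input.dueDate))) {
    // Constrain formats so invalid states fail early and consistently.
    return {
      ok: false,
      code: "invalid_due_date",
      message: "dueDate must be a parseable date string",
    }
  }
  return { ok: true }
}
```

The error code gives the UI something to branch on, and the message gives the model or user a concrete next step, which is what makes recovery fast in multi-step workflows.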

Design for defaults the product can defend

Defaults are one of the easiest ways to hide product risk inside a schema. If fields silently fall back to broad workspaces, external recipients, or current time ranges, the agent may appear helpful while actually making surprising choices on the user’s behalf. Good defaults should be conservative and unsurprising. When the product cannot justify a default to a customer in one sentence, it probably should not exist.

A useful rule is that defaults should either reduce harm or reduce repetitive ceremony without changing meaning. Defaulting a sort order is generally harmless. Defaulting a destination project in a multi-project workspace often is not. The safest interfaces are comfortable saying “I need one more piece of information” instead of guessing into a sensitive path.

When a default changes the consequence of an action, it stops being a convenience and starts being a silent decision.
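That distinction can be expressed directly in code. The sketch below contrasts a harmless default (sort order) with a consequence-changing one (destination project), where the safe interface asks instead of guessing. The function names and shapes are hypothetical:

```typescript
type ListQuery = { projectId: string; sort?: "newest" | "oldest" }

function resolveListQuery(q: ListQuery): Required<ListQuery> {
  // Harmless default: changes presentation, not meaning.
  return { projectId: q.projectId, sort: q.sort ?? "newest" }
}

type CreateTarget =
  | { ok: true; projectId: string }
  | { ok: false; ask: string }

function resolveCreateTarget(
  projectId: string | undefined,
  workspaceProjects: string[]
): CreateTarget {
  if (projectId) return { ok: true, projectId }
  // Only default when the choice is unambiguous.
  if (workspaceProjects.length === 1) {
    return { ok: true, projectId: workspaceProjects[0] }
  }
  // Otherwise, asking for one more piece of information beats
  // guessing into a sensitive path.
  return { ok: false, ask: "Which project should this be created in?" }
}
```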

Tool design is product design

The mistake in many agent stacks is to think of tools as backend plumbing and prompts as the user-facing experience. In reality, tool interfaces shape the UX as much as any screen does. They determine what the model can ask for, what the system can explain, what approvals look like, and how clearly operators can debug failures. A narrow, legible tool surface gives the rest of the product something stable to build on.

If you want an agent your team can trust, start by reading your tool list like a product spec. Are the verbs clear? Are dangerous actions separated? Do the defaults make sense? Are validation errors helpful? Tightening those answers usually does more for reliability than another round of prompt tuning. The interface is the contract. Make it one your agent cannot easily misuse.
