Tool Search Reduces Hermes Agent Context Load by 95.8% Through Progressive Disclosure

Tags: hermes, tool-search, progressive-disclosure, bm25, context-optimization, mcp

Every tool added to an AI agent comes with a schema. Every schema costs tokens. On a 226-tool Hermes Agent deployment, the full tool catalog consumes roughly 53,994 tokens per request - between 34 and 67 percent of a 131,072-token context window, depending on the model. That cost is paid on every turn, regardless of whether the task needs two tools or two hundred.

Tool Search, a progressive tool disclosure feature merged into Hermes Agent on May 29, 2026, hides MCP and plugin tools behind three bridge tools when the catalog exceeds a configurable threshold. The model sees four routing stubs instead of the full catalog. A BM25 retrieval layer finds relevant tools on demand. An optional embedding reranker, proposed the following day, improves recall when query vocabulary and tool vocabulary diverge.

How it works

When the deferrable tool surface surpasses 10 percent of the model's context window (configurable as threshold_pct), Tool Search activates automatically. Core Hermes tools - terminal, read_file, write_file, patch, search_files, browser_*, send_message, and messaging primitives - are never deferred. Only MCP tools and non-core plugin tools enter the searchable catalog.

The model receives three bridge tools:

  • tool_search - retrieves relevant tools from the catalog using BM25 keyword matching
  • tool_describe - returns the full JSON schema for a specific tool
  • tool_call - invokes the underlying tool, with the executor unwrapping the bridge so guardrails, plugin hooks, and tool-progress callbacks see the real tool name
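
The search, describe, call sequence can be sketched in a few lines. This is a toy illustration, not the Hermes implementation: the `REGISTRY` layout, the handlers, and the `hook_log` list are hypothetical, and plain term overlap stands in for BM25.

```python
from typing import Any

# Hypothetical in-memory registry: tool name -> description, schema, handler.
REGISTRY: dict[str, dict[str, Any]] = {
    "create_calendar_event": {
        "description": "Create a calendar event with a title and start time",
        "schema": {"type": "object", "properties": {"title": {"type": "string"}}},
        "handler": lambda args: f"created event: {args['title']}",
    },
    "list_github_issues": {
        "description": "List open issues in a GitHub repository",
        "schema": {"type": "object", "properties": {"repo": {"type": "string"}}},
        "handler": lambda args: f"issues for {args['repo']}",
    },
}

# Hypothetical stand-in for guardrails and plugin hooks.
hook_log: list[str] = []


def tool_search(query: str, limit: int = 5) -> list[str]:
    """Rank tools by how many query terms appear in the description."""
    terms = set(query.lower().split())
    scored = []
    for name, entry in REGISTRY.items():
        overlap = len(terms & set(entry["description"].lower().split()))
        if overlap:
            scored.append((overlap, name))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:limit]]


def tool_describe(name: str) -> dict:
    """Return the full schema only when the model asks for it."""
    return REGISTRY[name]["schema"]


def tool_call(name: str, arguments: dict) -> Any:
    """Unwrap the bridge so hooks record the real tool name, not 'tool_call'."""
    hook_log.append(name)
    return REGISTRY[name]["handler"](arguments)
```

The model pays for a schema only after `tool_search` has surfaced the tool and `tool_describe` has been called for it.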

The catalog is rebuilt from live tool definitions on every assembly - there is no session-keyed cache that could carry stale state across turns. This design explicitly addresses a known class of failure from earlier progressive-disclosure implementations, where isolated cron turns could lose access to requested tools because the catalog carried forward from a previous state.

Leo Ge, who runs GBrain with 80-plus MCP tools, noted:

"This finally lets us keep 80+ GBrain MCP tools fully available without flooding the model with every schema on every turn. Full capability, lower schema tax."

Token reduction

The numbers come from a live 226-tool deployment measured during the feature's development.

| Mode | Tools visible to model | Tokens |
| --- | --- | --- |
| Full catalog (baseline) | 226 | 53,994 |
| Tool Search active | ~4 routing stubs | 2,289 |
| Reduction | 222 tools deferred | 95.8% |
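
The headline percentage follows directly from the two token counts. A quick check, using only the figures reported above:

```python
baseline_tokens = 53_994      # full 226-tool catalog
tool_search_tokens = 2_289    # routing stubs only

saved_tokens = baseline_tokens - tool_search_tokens
reduction_pct = saved_tokens / baseline_tokens * 100

print(f"{saved_tokens} tokens saved per request ({reduction_pct:.1f}%)")
```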

The feature activates with a log line in production: INFO tools.tool_search: tool_search activated: 0 core/visible tools kept, 34 deferred (~6116 tokens, threshold ~5242). The threshold is calibrated against the model's announced context window. For a 1M-context model, a threshold_pct of 0.5 triggers at roughly 5,242 tokens.
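
The threshold arithmetic can be reproduced with a small sketch. The function names are hypothetical, and two assumptions are baked in: a "1M" window means 1,048,576 tokens, and the product is truncated to an integer. Both are inferred from the 5,242 figure rather than confirmed by the PR.

```python
def activation_threshold(context_window: int, threshold_pct: float) -> int:
    """Token budget above which deferrable tools are hidden (assumed truncation)."""
    return int(context_window * threshold_pct / 100)


def should_activate(deferred_tokens: int, context_window: int, threshold_pct: float) -> bool:
    """True when the deferrable tool surface exceeds the threshold."""
    return deferred_tokens > activation_threshold(context_window, threshold_pct)


# Figures from the log line: ~6116 deferred tokens against a ~5242 threshold.
print(activation_threshold(1_048_576, 0.5))
print(should_activate(6116, 1_048_576, 0.5))
```

With the default `threshold_pct` of 10, the same 6,116-token surface would stay well under the threshold on that model, so the full catalog would still be sent.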

BM25 retrieval and the lexical gap

BM25 is a bag-of-words ranking function - it scores tool descriptions by term overlap with the model's search query. This works well when query terms appear in tool names or descriptions. It fails when they do not.

Zoe Park, who tested the feature, argued for the approach:

"Tool search is the next frontier for agent efficiency. Watched my own agents waste cycles on irrelevant context. BM25 progressive disclosure makes sense. 49-74% gains are real if you measure."

The lexical mismatch is concrete: a query like "remind me tonight" scores zero against a tool named create_calendar_event. The words do not overlap. On the 98-query evaluation corpus, BM25 alone achieved an R@5 of 0.634 - meaning the correct tool was in the top five results about 63 percent of the time.
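
The zero score is easy to reproduce with a minimal Okapi BM25 sketch. This is a textbook formulation with common default parameters (`k1=1.5`, `b=0.75`), not the scorer from the PR, and the tool descriptions are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumerics, so snake_case names break apart."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25."""
    if not docs:
        return []
    tokenized = [tokenize(doc) for doc in docs]
    n = len(tokenized)
    avgdl = sum(len(tokens) for tokens in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    query_terms = tokenize(query)
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue  # no overlap, no contribution
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical catalog entries: tool name followed by its description.
docs = [
    "create_calendar_event Create a calendar event with a title and start time",
    "send_message Send a message to a chat channel",
]
print(bm25_scores("remind me tonight", docs))
print(bm25_scores("create calendar event", docs))
```

Because every term in "remind me tonight" is absent from both entries, each score is exactly zero and the ranking carries no signal.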

Embedding reranker

An optional embedding reranker, proposed in PR #35457, addresses the lexical gap by running cosine similarity over tool description embeddings. It embeds both the query and all tool descriptions using a task-prefixed nomic-compatible endpoint, then reranks the BM25 shortlist.

The reranker is implemented with zero new pip dependencies - urllib and stdlib only. Tool embeddings are MD5-cached at process start. Per-query cost is one embedding call. Any endpoint failure falls through silently to BM25.
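
A rough sketch of that rerank-with-fallback shape, using only the standard library. Everything named here is an assumption for illustration: the `embed` callable is injected in place of the real HTTP call, the `search_query: ` and `search_document: ` prefixes follow the convention Nomic documents for its embedding models, and the cache fills lazily rather than at process start.

```python
import hashlib
import math
from typing import Callable, Sequence

Embedder = Callable[[str], Sequence[float]]

# Hypothetical cache: MD5 of the prefixed description -> embedding vector.
_embedding_cache: dict[str, Sequence[float]] = {}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cached_embedding(text: str, embed: Embedder) -> Sequence[float]:
    """Embed a tool description once, keyed by the MD5 of its text."""
    key = hashlib.md5(text.encode("utf-8")).hexdigest()
    if key not in _embedding_cache:
        _embedding_cache[key] = embed(text)
    return _embedding_cache[key]


def rerank(query: str, shortlist: list[tuple[str, str]], embed: Embedder) -> list[str]:
    """Reorder a BM25 shortlist of (name, description) by cosine similarity.

    Any embedding failure returns the shortlist in its original BM25 order.
    """
    try:
        query_vec = embed("search_query: " + query)
        scored = [
            (cosine(query_vec, cached_embedding("search_document: " + desc, embed)), name)
            for name, desc in shortlist
        ]
    except Exception:
        return [name for name, _ in shortlist]
    scored.sort(key=lambda item: -item[0])  # stable sort keeps BM25 order on ties
    return [name for _, name in scored]
```

Once the description cache is warm, each query triggers a single `embed` call, which matches the per-query cost described above.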

| Metric | BM25 baseline | + Embedding reranker | Bar |
| --- | --- | --- | --- |
| R@5 overall | 0.634 | 0.810 | 0.800 |
| R@5 semantic | - | 0.849 | 0.840 |
| R@5 lexical | - | 1.000 | 0.950 |
| R@5 ambiguous | - | 0.513 | 0.500 |
| MRR overall | - | 0.811 | - |
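
For readers unfamiliar with the two metrics, here is a minimal sketch of how R@k and MRR are conventionally computed. It assumes one correct tool per query, which the post does not state about the 98-query corpus, and the sample rankings are made up.

```python
def recall_at_k(ranked: list[list[str]], expected: list[str], k: int = 5) -> float:
    """Fraction of queries whose correct tool appears in the top k results."""
    hits = sum(1 for results, target in zip(ranked, expected) if target in results[:k])
    return hits / len(expected)


def mean_reciprocal_rank(ranked: list[list[str]], expected: list[str]) -> float:
    """Average of 1/rank of the correct tool; a miss contributes 0."""
    total = 0.0
    for results, target in zip(ranked, expected):
        if target in results:
            total += 1.0 / (results.index(target) + 1)
    return total / len(expected)


# Three toy queries: a hit at rank 1, a hit at rank 2, and a miss.
ranked = [["a", "b"], ["c", "d"], ["e", "f"]]
expected = ["a", "d", "z"]
print(recall_at_k(ranked, expected, k=5))
print(mean_reciprocal_rank(ranked, expected))
```

R@5 only asks whether the right tool made the shortlist, while MRR also rewards placing it near the top.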

On queries where user vocabulary does not appear in tool names - the semantic mismatch case - the reranker recovered four out of five correct tools versus BM25's one out of five. Latency overhead for each reranker call: roughly 144ms median, 168ms at p95, measured against a local nomic-embed-text-v2-moe endpoint.

Architecture decisions

The feature was merged as PR #34493 on May 29, a scoped version of an earlier PR #31163 that was closed in favor of the narrower implementation. Three architectural choices stand out.

Core tools never enter the catalog. terminal, read_file, write_file, patch, search_files, and other essential tools stay visible to the model on every turn. The model is never asked to search for them. This avoids the retrieval-miss problem where a critical tool is invisible when the agent needs it.

Stateless catalog. The catalog is rebuilt from live tool definitions on every assembly. There is no session-keyed map that could carry stale state. Isolated cron turns and subagent sessions each get a fresh catalog.

Toolset scoping. The scoped PR fixed a gap where restricted-toolset sessions (subagents, kanban workers, curated gateway sessions) could search and call tools outside their grant. Before the fix, a session scoped to mcp-github could discover 26 tools (the full registry) and successfully call any registered plugin. After the fix, tool_search returns only the scoped catalog, and tool_call rejects out-of-scope tools with "not available in this session."
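
The scoping rule amounts to filtering the catalog by the session's grant before either bridge touches it. A minimal sketch, with a hypothetical registry that maps each tool to its toolset; the tool names other than the `mcp-github` toolset are invented for illustration.

```python
# Hypothetical registry: tool name -> owning toolset.
REGISTRY = {
    "github_list_issues": "mcp-github",
    "github_create_pr": "mcp-github",
    "calendar_create_event": "mcp-calendar",
}


def scoped_catalog(registry: dict[str, str], granted: set[str]) -> dict[str, str]:
    """Keep only tools whose toolset is in the session's grant."""
    return {name: toolset for name, toolset in registry.items() if toolset in granted}


def scoped_tool_call(name: str, registry: dict[str, str], granted: set[str]) -> str:
    """Reject any tool outside the grant before dispatching."""
    if name not in scoped_catalog(registry, granted):
        raise PermissionError(f"{name}: not available in this session")
    return f"dispatched {name}"


session_grant = {"mcp-github"}
print(sorted(scoped_catalog(REGISTRY, session_grant)))
```

Applying the same filter to both search and call closes the gap where a tool could be invoked by name even though it never appeared in search results.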

Configuration

Tool Search ships with three modes:

tools:
  tool_search:
    enabled: auto       # auto | on | off
    threshold_pct: 10   # activate when deferred tools exceed this % of context
    search_default_limit: 5
    max_search_limit: 20

The default auto mode activates only when the deferrable tool surface crosses the threshold. Setting enabled: on activates it unconditionally - useful for profiles with large MCP tool counts where the automatic threshold may not trigger on high-context-window models. Setting enabled: off disables it entirely.
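
One practical wrinkle: YAML 1.1 loaders such as PyYAML read unquoted `on` and `off` as booleans, so a config reader has to accept both forms. The sketch below shows one way to normalize the setting. The function names are hypothetical, and the post does not say which loader Hermes uses.

```python
VALID_MODES = {"auto", "on", "off"}


def resolve_mode(value: object) -> str:
    """Map a raw config value (bool or string) to 'auto', 'on', or 'off'."""
    if value is True:
        return "on"
    if value is False:
        return "off"
    mode = str(value).strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"unknown tool_search mode: {value!r}")
    return mode


def is_active(value: object, deferred_tokens: int, threshold_tokens: int) -> bool:
    """'on' always activates, 'off' never does, 'auto' compares against the threshold."""
    mode = resolve_mode(value)
    if mode == "auto":
        return deferred_tokens > threshold_tokens
    return mode == "on"
```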

The embedding reranker is off by default and configurable separately:

tools:
  tool_search:
    reranker:
      enabled: false
      endpoint: http://localhost:11434/v1/embeddings
      mode: rerank
      model: "nomic-embed-text-v2-moe"

Scope and limitations

Ambiguous-query recall at 0.513 is the lowest-scoring category. The root cause is that some tools lack intent-synonyms in their descriptions. The fix is adding synonym fields to tool descriptions rather than changes to the reranker algorithm.

Token growth in long sessions is not caused by Tool Search. It results from tool-result accumulation in conversation history combined with prompt-cache-policy gaps on certain providers, tracked separately.

For 1M-context models, the default threshold_pct: 10 yields roughly 104,857 tokens - high enough that Tool Search may never activate. The PR author recommends threshold_pct: 0.5 for large-context models with many tools.

[^1]: PR #34493 - feat(tools): progressive tool disclosure for MCP and plugin tools (scoped). Merged May 29, 2026.
[^2]: PR #35457 - feat(tool_search): optional embedding reranker for progressive tool disclosure. Open, created May 30, 2026.
[^3]: PR #31163 - feat(tools): progressive tool disclosure for MCP and plugin tools. Closed in favor of #34493.
[^4]: Leo Ge. "Huge thanks to the Hermes Agent team for shipping Tool Search." X. May 30, 2026.
[^5]: Zoe Park. "Hermes agent cuts context bloat with BM25 search." X. May 30, 2026.

Ryan Underdown

Autodidact. Rarely listens to advice.

Follow on X @catamarammed or GitHub @underdown