Claude Cowork just got 10x more powerful!
Glean benchmarked centralized vs federated MCP in Claude Cowork. Same harness, same model, same queries, different context layer.
The federated approach: Each data source (Gmail, Slack, Drive, Salesforce) has its own MCP server. Claude calls each one separately. That's 5-10 tool calls per query. Each source returns results with different quality and ranking. Claude over-fetches to compensate for weak search. Then it filters and synthesizes everything with LLM reasoning. Often needs retry loops when results miss. Burns 50-80k tokens per query.
The centralized approach: All data from every source gets indexed into one unified layer. A knowledge graph connects entities across sources. Claude makes one MCP call and gets back the top-ranked results. No over-fetching, minimal filtering needed. Uses 42-44k tokens consistently.
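A rough sketch of the two call patterns described above. Everything here is a toy stand-in: the `Hit` records, the scores, and the `source_search`, `federated`, and `centralized` functions are hypothetical and not part of any real MCP server or Glean API. It only illustrates why one pattern makes several calls and returns unranked items while the other makes one call and returns a ranked shortlist.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    source: str
    title: str
    score: float  # made-up relevance to the query "Acme renewal"


# Toy corpus; every entry is invented for illustration.
CORPUS = [
    Hit("gmail", "Re: Acme renewal pricing", 0.91),
    Hit("gmail", "Lunch on Friday?", 0.12),
    Hit("slack", "#deals: Acme wants a discount", 0.84),
    Hit("slack", "#random: office plants", 0.05),
    Hit("drive", "Acme renewal proposal v3", 0.95),
    Hit("drive", "2019 offsite photos", 0.03),
    Hit("salesforce", "Opportunity: Acme renewal", 0.88),
    Hit("salesforce", "Lead: unrelated prospect", 0.10),
]


def source_search(source, query, limit):
    """Hypothetical per-source tool: returns items in storage order, unranked."""
    return [h for h in CORPUS if h.source == source][:limit]


def federated(query, sources, per_source_limit=2):
    """One tool call per source; the caller must filter the merged pile."""
    calls = 0
    fetched = []
    for source in sources:
        fetched.extend(source_search(source, query, per_source_limit))
        calls += 1
    return calls, fetched


def centralized(query, top_k=4):
    """One call against a single index that ranks across all sources."""
    ranked = sorted(CORPUS, key=lambda h: h.score, reverse=True)
    return 1, ranked[:top_k]


calls_f, items_f = federated("Acme renewal", ["gmail", "slack", "drive", "salesforce"])
calls_c, items_c = centralized("Acme renewal")
print(calls_f, len(items_f))  # 4 8
print(calls_c, len(items_c))  # 1 4
```

In this toy setup the federated path hands the model eight items, half of them irrelevant, while the centralized path returns only the four highest-scoring ones.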
The results: Centralized indexing was preferred 2.5x more often. Federated consumed 30% more tokens on average. When federated did reach a correct answer, it burned 83k tokens vs 43k for centralized.
The gap widened as tasks got more complex. Simple tasks: centralized won 66% of the time. Complex tasks: 73%.
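To put the numbers above on one scale, here is a small arithmetic sketch. It assumes the win rates are head-to-head with no ties, which the post does not state, so treat the conversions as approximate. The function names are mine.

```python
def preference_ratio(win_rate):
    """Convert a head-to-head win rate into an 'X times more often' ratio.

    Assumes no ties: the other side wins the remaining (1 - win_rate).
    """
    return win_rate / (1.0 - win_rate)


def win_rate_from_ratio(ratio):
    """Inverse of preference_ratio under the same no-ties assumption."""
    return ratio / (1.0 + ratio)


print(round(preference_ratio(0.66), 2))    # 1.94 (simple tasks)
print(round(preference_ratio(0.73), 2))    # 2.7  (complex tasks)
print(round(win_rate_from_ratio(2.5), 2))  # 0.71 (overall 2.5x figure)

# Token cost on correct answers: 83k federated vs 43k centralized.
print(round(83 / 43, 2))                   # 1.93
```

Under that assumption, the overall 2.5x figure corresponds to roughly a 71% win rate, which sits between the simple and complex task numbers. On correct answers, federated used nearly twice the tokens.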
Why centralized wins: Over-fetching doesn't just cost tokens. It dilutes the context window with noise and contradictory information. Models have finite attention. Cramming 50-100 items hoping the right ones are in there doesn't work as well as getting the right 5-10 upfront.
Federated search also loses cross-application signals. Things like document relationships, who authored what, and how content is used across the enterprise. These signals improve ranking but they only exist when data is indexed together in one layer.
The compounding problem: In multi-step tasks, each missed or incorrect retrieval compounds. By the time you reach the final output, you're working with flawed data. More tool calls and reasoning loops don't fix this. They just burn more tokens trying to recover.
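A back-of-the-envelope way to see the compounding effect. This assumes each retrieval step succeeds independently with the same probability, which is a simplification, and the per-step accuracies below are hypothetical, not figures from the benchmark.

```python
def chain_success(per_step_accuracy, steps):
    """Probability that every retrieval in a multi-step task is correct,
    assuming independent steps with identical accuracy."""
    return per_step_accuracy ** steps


# Hypothetical per-step accuracies over a 5-step task.
print(round(chain_success(0.90, 5), 3))  # 0.59
print(round(chain_success(0.95, 5), 3))  # 0.774
```

Under these assumptions, a five-point gap in per-step accuracy turns into a gap of about eighteen points by the fifth step.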
You can't brute-force around bad search. Fetching more data doesn't fix poor context quality either. It just adds noise.
Why this matters: Token costs are surging. Reasoning models cost more. Companies are burning through AI budgets faster. Federated search compounds the problem.
Better search architecture beats more compute.
I've shared the link in the replies!