MCP Forge

How to cut your MCP token usage

Context bloat, measured and fixed. Updated 2026.

Here is something most people miss about the Model Context Protocol. Every server you connect loads its full set of tool definitions into the context window on every single request. Those schemas are not free. Each tool costs a few hundred tokens, and they are sent before the model reads a word of your prompt.

Five typical servers, with a dozen or more tools each, commonly add up to 50,000 to 75,000 tokens of overhead per request. That is real money on every call, and it is latency you feel on every turn. It also crowds out the context you actually want the model to use.
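
To put a rough dollar figure on that overhead, multiply it out. This is a minimal sketch, not a billing calculator: the function name, the request count, and the price per million input tokens are illustrative assumptions, and it ignores any caching discounts your provider may apply.

```python
def overhead_cost_usd(overhead_tokens: int, requests: int,
                      usd_per_million_input_tokens: float) -> float:
    """Cost of resending tool schemas on every request (no caching assumed)."""
    return overhead_tokens * requests * usd_per_million_input_tokens / 1_000_000


# Hypothetical numbers: 50,000 tokens of schemas, 1,000 requests,
# and an assumed price of $3 per million input tokens.
print(overhead_cost_usd(50_000, 1_000, 3.0))  # 150.0
```

Swap in your own provider's price and your real request volume to see what the overhead costs you.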

Step 1: measure it

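You cannot cut what you cannot see. A quick way to get a number is to count your servers and tools and multiply. As a rough rule, budget around 200 tokens per tool plus a small per-server overhead. Treat that as a floor: tools with long descriptions or deeply nested parameter schemas can cost several times as much, which is how a handful of servers reaches the tens of thousands. The free auditor below prints an estimate for your real config.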

pipx install git+https://github.com/alih552/mcp-audit
mcp-audit
# -> 7 server(s) - ~13,160 context tokens - score 0/100
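
If you want the count-and-multiply estimate without installing anything, it fits in a few lines. This is a sketch of the rule of thumb from this step, not how mcp-audit computes its number. The 100-token per-server overhead is an assumed placeholder (the text only says "small"), and the server names and tool counts are made up.

```python
TOKENS_PER_TOOL = 200    # rule of thumb from this article
TOKENS_PER_SERVER = 100  # assumed value; "small per-server overhead" is not specified


def estimate_context_tokens(tools_per_server: dict[str, int]) -> int:
    """Estimate schema tokens loaded per request from a {server: tool_count} map."""
    return sum(TOKENS_PER_SERVER + count * TOKENS_PER_TOOL
               for count in tools_per_server.values())


# Hypothetical setup: three servers with 12, 8, and 5 tools.
servers = {"files": 12, "search": 8, "memory": 5}
print(estimate_context_tokens(servers))  # 3 * 100 + 25 * 200 = 5300
```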

Step 2: turn off what you are not using

This is the biggest lever and the easiest. Most people leave servers connected that they touched once. If you are not actively using a server this session, disable it. You can always turn it back on. Going from seven always-on servers to the two you actually use removes most of that overhead, which on schema-heavy setups can mean tens of thousands of tokens.
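
How you disable a server depends on your client. As one possible approach, assuming your client reads a JSON config with a top-level "mcpServers" object (a shape several clients use, but check yours), you could trim that object down to a keep-list. The helper name and server entries here are hypothetical.

```python
def keep_only(config: dict, keep: set[str]) -> dict:
    """Return a copy of the config with only the named servers left in it."""
    servers = config.get("mcpServers", {})
    trimmed = dict(config)  # shallow copy so the original stays untouched
    trimmed["mcpServers"] = {name: spec for name, spec in servers.items()
                             if name in keep}
    return trimmed


# Hypothetical config with four servers connected.
config = {
    "mcpServers": {
        "files": {"command": "files-server"},
        "search": {"command": "search-server"},
        "memory": {"command": "memory-server"},
        "calendar": {"command": "calendar-server"},
    }
}

slim = keep_only(config, {"files", "search"})
print(sorted(slim["mcpServers"]))  # ['files', 'search']
```

Keep a backup of the full config so turning a server back on is a copy and paste.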

Step 3: remove redundant servers

Two search servers. Two file servers. A memory server you forgot about. Pick one per capability. Overlap is pure waste because both sets of tool schemas load every time.
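
A crude way to spot overlap is to group server names by capability and flag any capability with more than one server. The sketch below assumes server names hint at what they do. The keyword table and the server names are hypothetical, and a real check would compare the tools themselves.

```python
from collections import defaultdict

# Hypothetical keyword -> capability table; extend it for your own setup.
CAPABILITY_KEYWORDS = {
    "search": "search",
    "file": "files",
    "fs": "files",
    "memory": "memory",
}


def find_overlaps(server_names: list[str]) -> dict[str, list[str]]:
    """Map each capability to its servers, keeping only capabilities with 2+."""
    by_capability = defaultdict(list)
    for name in server_names:
        lowered = name.lower()
        for keyword, capability in CAPABILITY_KEYWORDS.items():
            if keyword in lowered:
                by_capability[capability].append(name)
                break  # count each server under one capability only
    return {cap: names for cap, names in by_capability.items() if len(names) > 1}


print(find_overlaps(["web-search", "docs-search", "filesystem", "fs-tools", "memory"]))
# {'search': ['web-search', 'docs-search'], 'files': ['filesystem', 'fs-tools']}
```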

Step 4: trim tool surface on servers you build

If you write your own MCP server, expose fewer, sharper tools. Ten focused tools beat thirty overlapping ones, both for token cost and for the model's accuracy in choosing the right one. Keep descriptions tight. Every word in a tool description is a word in every request.
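
To see what a wordy description costs, compare two definitions of the same tool. The sketch below approximates tokens as characters divided by four, which is a crude heuristic and not a real tokenizer. The tool itself is made up, laid out with the name, description, and inputSchema fields that MCP tool definitions carry.

```python
import json


def rough_tokens(tool: dict) -> int:
    """Approximate token count as serialized characters // 4 (heuristic only)."""
    return len(json.dumps(tool)) // 4


schema = {
    "type": "object",
    "properties": {"path": {"type": "string"}},
    "required": ["path"],
}

# Hypothetical tool, described two ways.
verbose = {
    "name": "read_file",
    "description": (
        "This tool can be used whenever you would like to read the full "
        "contents of a file that is stored on the local disk. Please provide "
        "the path to the file that you would like to read, and the tool will "
        "then return the contents of that file back to you as text."
    ),
    "inputSchema": schema,
}
tight = {
    "name": "read_file",
    "description": "Read a local file and return its text.",
    "inputSchema": schema,
}

print(rough_tokens(verbose), rough_tokens(tight))
```

The difference looks small for one tool. Multiply it by every tool on the server and every request in a session.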

Step 5: load niche servers on demand

For servers you only need occasionally, connect them when the task calls for it rather than keeping them always on. The default of "everything connected all the time" is what creates the bloat.
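
One way to think about on-demand loading is a small always-on core plus niche servers that each have trigger words. The sketch below only decides which servers a task needs. Actually connecting them is up to your client. The server names and trigger words are hypothetical.

```python
ALWAYS_ON = ["files"]

# Hypothetical niche servers and the words that suggest a task needs them.
ON_DEMAND = {
    "database": ["sql", "query", "migration"],
    "browser": ["screenshot", "scrape"],
}


def servers_for_task(task: str) -> list[str]:
    """Return the always-on servers plus any niche server the task mentions."""
    text = task.lower()
    extra = [name for name, triggers in ON_DEMAND.items()
             if any(trigger in text for trigger in triggers)]
    return ALWAYS_ON + extra


print(servers_for_task("Write a migration for the users table"))  # ['files', 'database']
print(servers_for_task("Rename this function"))                   # ['files']
```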

See your real token cost in one command (free)

mcp-audit estimates the context tokens your servers load per request, flags redundant servers, and checks security too. Zero dependencies, fully local.

mcp-audit on GitHub: https://github.com/alih552/mcp-audit
