After weeks of building production MCP infrastructure, here are the decisions that mattered, the ones that didn't, and what the community is actually fighting with.

After spending weeks building production MCP infrastructure, here are the decisions that mattered most, and the ones I'd do differently. What started as a Reddit post became a surprisingly deep conversation with other builders fighting the same battles. This is the expanded version, with community insights woven in.
Auto-generating MCP manifests from OpenAPI specs is straightforward: map paths to tools, extract schemas; it's a one-minute job. The real complexity turned out to be auth. OAuth 2.1 + RFC 9728 (PRM) + PKCE are specs you need to get right before a single tool call works.
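To make the mapping step concrete, here is a minimal sketch; the type shapes and fallback naming are illustrative assumptions, not a production generator:

```typescript
// Sketch: derive MCP tool definitions from an OpenAPI document.
// Field names and fallbacks are illustrative, not a real generator.

interface OpenApiOperation {
  operationId?: string;
  summary?: string;
  requestBody?: { content?: Record<string, { schema?: unknown }> };
}

interface OpenApiDoc {
  paths: Record<string, Record<string, OpenApiOperation>>;
}

interface McpToolDef {
  name: string;
  description: string;
  inputSchema: unknown;
}

function toolsFromOpenApi(doc: OpenApiDoc): McpToolDef[] {
  const tools: McpToolDef[] = [];
  for (const [path, methods] of Object.entries(doc.paths)) {
    for (const [method, op] of Object.entries(methods)) {
      tools.push({
        // Fall back to METHOD + path when the spec has no operationId.
        name: op.operationId ?? `${method}_${path.replace(/\W+/g, "_")}`,
        description: op.summary ?? `${method.toUpperCase()} ${path}`,
        // Reuse the JSON request body schema as the tool's input schema.
        inputSchema:
          op.requestBody?.content?.["application/json"]?.schema ?? { type: "object" },
      });
    }
  }
  return tools;
}
```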
If you're building multiple MCP servers, do NOT implement OAuth in each one. Centralize it. One gateway. One place to get it right.
The community validated this. One builder shipping a game-server hosting platform via agents told me auth took "3x the time of everything else on the API combined." Another pointed out that even managed platforms like AWS AgentCore split auth into two categories, inbound (user → gateway) and outbound (gateway → target), with completely different approaches for each. The consensus: auth eats your roadmap if you are not careful.
Never let the MCP client see the upstream API key. Issue each end user their own OAuth client_id / client_secret. The dispatch layer validates the OAuth token, looks up the user's encrypted credential, decrypts it, and injects it into the upstream request. The MCP server is a proxy: it receives identity headers and forwards to the API.
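A rough sketch of that dispatch step, assuming a hypothetical token verifier and encrypted-credential store; the names, the AES-GCM choice, and the header shapes are mine, not a prescribed design:

```typescript
import { createDecipheriv } from "node:crypto";

// Hypothetical lookup: per-user upstream credentials, encrypted at rest.
declare function loadEncryptedCredential(userId: string): Promise<{
  iv: Buffer;
  authTag: Buffer;
  ciphertext: Buffer;
}>;

// Hypothetical: validate the OAuth access token and resolve the end user.
declare function verifyAccessToken(token: string): Promise<{ userId: string }>;

const MASTER_KEY = Buffer.from(process.env.CRED_KEY_HEX ?? "", "hex"); // 32 bytes

async function dispatchToUpstream(
  accessToken: string,
  upstreamUrl: string,
  body: unknown,
): Promise<Response> {
  // 1. Authenticate the caller; the MCP client only ever holds this OAuth token.
  const { userId } = await verifyAccessToken(accessToken);

  // 2. Decrypt the user's upstream API key; it never leaves the gateway.
  const { iv, authTag, ciphertext } = await loadEncryptedCredential(userId);
  const decipher = createDecipheriv("aes-256-gcm", MASTER_KEY, iv);
  decipher.setAuthTag(authTag);
  const upstreamKey = Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");

  // 3. Inject the credential and forward; the MCP server stays a thin proxy.
  return fetch(upstreamUrl, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${upstreamKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
}
```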
Benefits: the upstream API key never reaches the client, credentials are scoped and revocable per user, and the MCP server itself stays a thin, stateless proxy.
One community member took this further: for agents that can trigger financial transactions (provisioning servers, placing orders), they issue per-session tokens with a pre-authorized spend cap and TTL. The agent never sees the actual payment method. If the agent goes off the rails (purchase loops, wrong region, wrong tier), the cap blocks it before the card auth call fires. Same pattern, bounded by spend rather than time.
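Here is roughly what that enforcement could look like at the gateway; the in-memory session store and the cap fields are assumptions for illustration, not the builder's actual implementation:

```typescript
// Sketch: per-session spend caps enforced at the gateway, before any card auth.
// The in-memory store and field names are illustrative assumptions.

interface SpendSession {
  remainingCents: number;
  expiresAt: number; // epoch ms
}

const sessions = new Map<string, SpendSession>();

function issueSession(sessionId: string, capCents: number, ttlMs: number): void {
  sessions.set(sessionId, { remainingCents: capCents, expiresAt: Date.now() + ttlMs });
}

// Returns true only if the charge fits inside the pre-authorized cap and TTL.
function authorizeCharge(sessionId: string, amountCents: number): boolean {
  const s = sessions.get(sessionId);
  if (!s || Date.now() > s.expiresAt) return false; // expired or unknown session
  if (amountCents > s.remainingCents) return false; // would exceed the cap
  s.remainingCents -= amountCents;                  // reserve against the cap
  return true;
}

// Usage: the agent never sees the payment method; it only holds sessionId.
issueSession("sess-123", 5_000, 15 * 60 * 1000); // $50 cap, 15-minute TTL
if (authorizeCharge("sess-123", 1_200)) {
  // safe to proceed to the real payment call
}
```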
OpenMM (open-source MCP for financial exchanges) added another layer: per-tool credential tagging. If place_order needs a different OAuth scope than read_positions, the manifest declares it, and the dispatch layer negotiates which credential to inject per tool call rather than per session. The MCP server stays stateless; the gateway does the thinking.
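A sketch of the idea with a hypothetical manifest shape; OpenMM's actual tagging format isn't shown here and may well differ:

```typescript
// Sketch of per-tool credential tagging. The manifest shape, scope names, and
// credential store are hypothetical.

interface ToolCredentialTag {
  tool: string;
  requiredScope: string;
}

const manifestTags: ToolCredentialTag[] = [
  { tool: "read_positions", requiredScope: "portfolio:read" },
  { tool: "place_order", requiredScope: "trading:write" },
];

// Hypothetical per-user credential store, keyed by scope.
const userCredentials = new Map<string, Map<string, string>>();

// The gateway picks which credential to inject per tool call, not per session.
function credentialForToolCall(userId: string, toolName: string): string {
  const tag = manifestTags.find((t) => t.tool === toolName);
  if (!tag) throw new Error(`Unknown tool: ${toolName}`);
  const cred = userCredentials.get(userId)?.get(tag.requiredScope);
  if (!cred) throw new Error(`No credential with scope ${tag.requiredScope}`);
  return cred;
}
```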
OAuth gets you authentication ("who is this?"). Authorization ("can they call this tool?") is entirely custom. The cleanest approach: check permissions at the dispatch layer before the request reaches the MCP server. One place to enforce policy, consistent across all MCPs.
This means the MCP server never sees a rejected request. You can say "Bob can read invoices, not create them" at the edge, and the tool never fires if Bob tries something he shouldn't.
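A minimal sketch of that edge check, with an illustrative default-deny policy table; the policy shape and user names are assumptions:

```typescript
// Sketch: tool-level authorization enforced at the dispatch layer.
// Policy model and names are illustrative ("Bob can read invoices, not create them").

type Permission = { tool: string; allowed: boolean };

const policies = new Map<string, Permission[]>([
  ["bob", [
    { tool: "read_invoice", allowed: true },
    { tool: "create_invoice", allowed: false },
  ]],
]);

function canCallTool(userId: string, toolName: string): boolean {
  const perms = policies.get(userId) ?? [];
  // Default-deny: a tool is callable only if explicitly allowed.
  return perms.some((p) => p.tool === toolName && p.allowed);
}

// At the edge, before the request ever reaches the MCP server:
function dispatchToolCall(userId: string, toolName: string, forward: () => Promise<unknown>) {
  if (!canCallTool(userId, toolName)) {
    // The MCP server never sees this request.
    return Promise.reject(new Error("403: not permitted to call this tool"));
  }
  return forward();
}
```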
Quotas are simpler at this layer too: per-user rate limiting at the gateway, with monthly call counters per MCP. A free-tier user hits the wall at 1,000 calls and gets a 429. No impact on other tenants.
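Sketched out, the quota check is small; a real deployment would back the counters with shared storage rather than the in-memory map assumed here:

```typescript
// Sketch: per-user monthly quotas at the gateway. Counters are in-memory for
// illustration; production would use shared storage (e.g. Redis).

const FREE_TIER_MONTHLY_LIMIT = 1_000;

// Key: `${userId}:${mcpName}:${year-month}` -> calls so far this month.
const counters = new Map<string, number>();

function monthKey(userId: string, mcpName: string): string {
  const now = new Date();
  return `${userId}:${mcpName}:${now.getUTCFullYear()}-${now.getUTCMonth() + 1}`;
}

// Returns an HTTP status: 200 to proceed, 429 when the quota is exhausted.
function checkQuota(userId: string, mcpName: string, limit = FREE_TIER_MONTHLY_LIMIT): number {
  const key = monthKey(userId, mcpName);
  const used = counters.get(key) ?? 0;
  if (used >= limit) return 429; // this tenant is throttled; others are unaffected
  counters.set(key, used + 1);
  return 200;
}
```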
The hardest part honestly isn't the tech; it's defining the permission model in a way that's simple enough for the MCP owner to configure without becoming an admin. Still iterating on that one.
STDIO is single-client by design. Under concurrent load it falls apart. The MCP spec moving to Streamable HTTP was the right call: stateless, standard HTTP, no SSE complexity.
Deployment approaches vary. Some run MCPs as K8s pods with their own ingresses and services. Others use managed gateways like AWS AgentCore. The pattern that worked for us: gateway-first. Every MCP routes through a shared dispatch layer. The MCPs themselves are thin: they receive an already-authenticated request and forward to the upstream API. Works well for "wrap an existing API as MCP." If you need custom logic per tool call, containers make more sense. Depends whether the MCP is the product or the API surface.
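As a sketch of how thin those MCPs can be, here is an illustrative pass-through handler; the header names and upstream base URL are assumptions:

```typescript
import { createServer } from "node:http";

// Sketch of a "thin" MCP behind the gateway: it receives an already-authenticated
// request and simply forwards it upstream. Header names and the upstream base
// URL are illustrative assumptions.

const UPSTREAM_BASE = process.env.UPSTREAM_BASE ?? "https://api.example.com";

createServer(async (req, res) => {
  const chunks: Buffer[] = [];
  for await (const chunk of req) chunks.push(chunk as Buffer);

  // No auth logic here: the dispatch layer already validated the caller and
  // injected the upstream credential; this server just proxies.
  const upstream = await fetch(`${UPSTREAM_BASE}${req.url ?? "/"}`, {
    method: req.method,
    headers: {
      Authorization: String(req.headers.authorization ?? ""),
      "X-User-Id": String(req.headers["x-authenticated-user"] ?? ""),
      "Content-Type": "application/json",
    },
    body: chunks.length ? Buffer.concat(chunks) : undefined,
  });

  res.writeHead(upstream.status, { "Content-Type": "application/json" });
  res.end(await upstream.text());
}).listen(8080);
```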
A 1:1 OpenAPI→MCP mapping gives you 30+ tools from a typical REST API. That's thousands of tokens of tool schemas in every context window. Multiply by every call, every session. It adds up fast.
Two approaches emerged from the community:
Dynamic filtering at dispatch: only expose the tools the user actually needs. Cut schema waste by 60-70% (see the sketch below).
Hierarchical tool discovery: Let the agent call list-games first (returns a tiny handle list), then get-plans-for-game(handle). The agent picks its own path, filtering for you. Costs one extra round-trip per session, which doesn't matter at agent timescales. One game-server builder went this route: 26 games × 5 regions × plan tiers would explode the schema otherwise.
Both work, different constraints. If your users know the hierarchy, let them navigate it. If you're multi-tenant and can't assume that, filter at the gateway.
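For the first approach, here is a sketch of gateway-side filtering applied to the tools/list response; the per-user allowlist and tool names are illustrative:

```typescript
// Sketch: dynamic tool filtering at the dispatch layer. The allowlist and
// tool names are illustrative assumptions.

interface ToolSchema {
  name: string;
  description: string;
  inputSchema: unknown;
}

// Hypothetical: which tools this user's plan or role actually needs.
const allowedTools = new Map<string, Set<string>>([
  ["free-tier-user", new Set(["list_invoices", "get_invoice"])],
  ["admin-user", new Set(["list_invoices", "get_invoice", "create_invoice", "void_invoice"])],
]);

// Applied to the tools/list response before it reaches the client, so the
// context window only carries schemas the user can actually call.
function filterToolList(userId: string, allTools: ToolSchema[]): ToolSchema[] {
  const allowed = allowedTools.get(userId) ?? new Set<string>();
  return allTools.filter((tool) => allowed.has(tool.name));
}
```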
This one split the room. Several builders tried auto-generating MCP tools from OpenAPI specs and threw it out: "way too much low-value tool noise." Every CRUD endpoint becomes a tool. Response schemas flatten into noise.
The fix isn't abandoning auto-gen. It's adding an editorial layer: parse the OpenAPI spec, apply heuristics and server-side overrides, strip list endpoints when the agent only needs create/read, collapse nested params, keep only the fields the agent actually needs. The output is curated programmatically. Without that layer, it's noise. With it, you get the speed of auto-gen and the quality of hand-picked tools.
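A sketch of what that editorial layer can look like; the override shape and heuristics are illustrative, not any specific generator's API:

```typescript
// Sketch of the editorial layer on top of auto-generated tools. Override shape
// and heuristics are illustrative assumptions.

interface GeneratedTool {
  name: string;
  description: string;
  inputSchema: { type: string; properties?: Record<string, unknown> };
}

interface ToolOverride {
  drop?: boolean;        // strip low-value endpoints entirely
  rename?: string;       // give the agent a clearer name
  keepParams?: string[]; // collapse the schema to the fields that matter
  description?: string;  // rewrite the description for the agent
}

function curateTools(
  generated: GeneratedTool[],
  overrides: Record<string, ToolOverride>,
): GeneratedTool[] {
  return generated.flatMap((tool) => {
    const o = overrides[tool.name] ?? {};
    if (o.drop) return []; // e.g. list endpoints the agent never needs
    const properties = o.keepParams
      ? Object.fromEntries(
          Object.entries(tool.inputSchema.properties ?? {}).filter(([key]) =>
            o.keepParams!.includes(key),
          ),
        )
      : tool.inputSchema.properties;
    return [{
      name: o.rename ?? tool.name,
      description: o.description ?? tool.description,
      inputSchema: { ...tool.inputSchema, properties },
    }];
  });
}
```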
One more pain point worth naming: invalid_grant with no context is miserable.

The MCP protocol is young. The tooling around it (auth, multi-tenancy, governance) is even younger. What's clear from talking to builders in production is that the infrastructure layer is where the real work lives.
The protocol itself is the easy part.