AI & Automation · 8 min read · Written by Tomáš Mikeš

MCP servers and Claude: five questions we answer before writing a line of code

Model Context Protocol is a hot topic. Wiring Claude to a database, however, doesn't make a product. After several prototypes, we use a five-question methodology before a single line of MCP server code gets written — plus one uncomfortable question about admin UIs.

AI · MCP · Claude · Methodology

MCP (Model Context Protocol) is in every AI newsletter lately. The reason is fair — Anthropic shipped a standard for the thing every team was cobbling together anyway: a controlled path between a model and a foreign system's data and actions. Once there's a standard, libraries show up, tutorials show up, and everyone with a Claude API key starts building a proof of concept.

What we see over and over: 80% of those prototypes conflate “Claude can see my database” with “Claude does something useful with my database.” There's a chasm between them. The wiring is trivial. Usefulness isn't.

Over the past three months we've built several internal MCP prototypes at Codedock — for our own project data, for our operational systems, and one proof of concept with a specific client. The first production deployment goes live in the coming weeks. Out of those prototypes a methodology crystallized: five questions we answer before a line of MCP server code gets written.

1. Who, specifically, is this for?

“Our users” is a useless answer. An MCP server without a concrete workflow won't design itself — you'll hand Claude a hundred tools and it won't know which one is for what.

The first question is always: name three specific jobs the AI should do. Not “an assistant for our product” but “summarize a customer conversation and propose a follow-up” or “turn a CRM note into a draft proposal in template X.” The more specific the use case, the more the tools design themselves.

Rule of thumb we arrived at: if you can't write a three-turn example dialogue, you don't know the use case well enough to design a good MCP server yet. Dialogue first. Schema after.
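To make "dialogue first, schema after" concrete, here is a minimal sketch for the "summarize a customer conversation and propose a follow-up" job mentioned above. Every name is hypothetical; the schema just follows the JSON Schema style that MCP tool definitions use:

```python
# The three-turn dialogue, written before any schema exists. Writing it out
# reveals that this flow needs exactly one read.
example_dialogue = [
    ("user", "Summarize my last conversation with Acme and propose a follow-up."),
    ("assistant", "[calls get_conversation_for_followup(customer='Acme')]"),
    ("assistant", "Acme asked about SSO pricing; draft follow-up attached."),
]

# The single intent-level tool that falls out of the dialogue (all names invented):
tool_schema = {
    "name": "get_conversation_for_followup",
    "description": "Latest conversation with a customer, plus the context "
                   "needed to draft a follow-up (open deals, last proposal).",
    "inputSchema": {
        "type": "object",
        "properties": {"customer": {"type": "string"}},
        "required": ["customer"],
    },
}
```

The point of the exercise: the dialogue tells you how many tools the flow actually needs before you commit to a schema.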

2. Reads first, writes last

MCP distinguishes resources (things to read) from tools (actions, typically including writes). The temptation is to map both at once. Don't.

The rule: expose resources only at first. Let Claude play with them. Watch what it searches for, what it misses, where it flails. Only then add tools — and those only in a mode where human-in-the-loop approval is mandatory.

The reason is simple: LLM guardrails fail. Prompt injection, muddled instructions, a hallucinated conversational tangent: any of them can push the model into calling a destructive tool at the wrong moment. A read-only surface only risks the model reading something it shouldn't. A write surface means the email goes to the wrong person, the invoice is issued for nonsense, or a database record gets wiped.
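The mandatory human-in-the-loop mode can be sketched as a gate in the tool dispatcher. A minimal illustration, assuming an invented registry that tags each tool as read or write:

```python
# Hypothetical tool registry: reads pass through, writes need approval.
READ_ONLY = {"search_orders", "get_customer_profile"}
WRITE = {"send_email", "issue_invoice", "delete_record"}

def call_tool(name, args, approve):
    """Dispatch a tool call; write tools require an explicit human yes.

    `approve` is a callback (tool name, args) -> bool, e.g. a UI prompt.
    """
    if name in WRITE and not approve(name, args):
        return {"status": "rejected", "tool": name}
    return {"status": "executed", "tool": name, "args": args}

# A write call only goes through when the approval callback says yes:
rejected = call_tool("send_email", {"to": "x@example.com"},
                     approve=lambda n, a: False)
```

The design choice worth noting: the gate lives in the dispatcher, not in the model's prompt, so a jailbroken or confused model still cannot reach a write without a human click.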

3. Tool granularity

This is the question people don't ask and then run into. You have two extremes:

  • Too fine-grained: a hundred tools like getUser, getUserOrders, getOrderLineItems, … — the model gets lost picking between them, the schema prompt blows up the context window, and latency compounds with every tool hop.
  • Too coarse: a single runQuery(sql) tool, where the model gets all the power and you keep none of the control, because arbitrary SQL can do anything the database connection allows.

The workable middle ground is intent-based: one tool per user intent, not per database table. getCustomerProfileForSupport beats four tools a support agent has to chain. searchOrders(filters) beats five per-filter variants.

Working heuristic: if your best agent runs the same sequence of three calls five times a day, there's a tool hiding in there waiting to be named. Merge them.
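The merge itself is mechanical once you spot the sequence. A sketch, assuming three invented fine-grained lookups that a support agent chains every day:

```python
# Three hypothetical fine-grained calls, stubbed with canned data:
def get_user(user_id):         return {"id": user_id, "name": "Jana"}
def get_user_orders(user_id):  return [{"order_id": 1, "total": 990}]
def get_open_tickets(user_id): return [{"ticket_id": 7, "state": "open"}]

def get_customer_profile_for_support(user_id):
    """The tool hiding in the three-call sequence: one call, one intent."""
    return {
        "user": get_user(user_id),
        "orders": get_user_orders(user_id),
        "open_tickets": get_open_tickets(user_id),
    }
```

One tool hop instead of three also means one schema in the context window and one round of latency, which is the granularity argument from the bullet list in miniature.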

4. Trust boundary and scope

Every MCP server is a new authentication boundary. Sounds obvious; done badly in practice. Typical failure: the MCP server runs as a service account with admin scope. Claude calls a tool “on behalf of user X” but the scope is effectively the entire system.

The right question: what user is the MCP server impersonating, and what scope does it carry at that moment? In production, the MCP server must assume the identity of the caller — OAuth flow, session token, per-tenant API key — and tool calls validate against that identity, not the service account.
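What "validate against the caller's identity, not the service account" means in code can be sketched like this; the session store, tenant, and scope names are all invented for illustration:

```python
# Hypothetical session store: token -> caller identity with tenant and scopes.
SESSIONS = {
    "tok-123": {"user": "jana@acme.com", "tenant": "acme",
                "scopes": {"orders:read"}},
}

def authorize(token, tool, tenant, required_scope):
    """Check a tool call against the caller's identity, not a service account."""
    caller = SESSIONS.get(token)
    if caller is None:
        raise PermissionError("unknown session")
    if caller["tenant"] != tenant:
        raise PermissionError(f"{tool}: cross-tenant access denied")
    if required_scope not in caller["scopes"]:
        raise PermissionError(f"{tool}: missing scope {required_scope}")
    return caller
```

The check runs on every tool call, so even if the model asks for another tenant's data, the server refuses regardless of what the prompt said.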

The second thing people miss: audit trail. Every tool call needs a log line with who-when-what-why (tool name, arguments, caller identity, conversation ID). When someone asks in six months “who deleted invoice #12345,” you'd better have the answer in less than two days.
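The who-when-what-why record is cheap to emit: one structured line per tool call. A minimal stdlib sketch, with the field names taken from the list above:

```python
import datetime
import json

def audit(tool, args, caller, conversation_id):
    """Emit one JSON log line per tool call: who, when, what, why."""
    record = {
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "tool": tool,
        "args": args,
        "caller": caller,
        "conversation_id": conversation_id,
    }
    # One JSON line per call makes "who deleted invoice #12345" a grep away.
    return json.dumps(record)
```

Written to an append-only sink, this turns the six-months-later question into a one-line search instead of a two-day archaeology dig.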

5. How do we know it's working?

The last question — and paradoxically the most underestimated. An MCP server isn't a REST API with deterministic semantics. Every tool call depends on how the model interpreted the user message. “It works” isn't something you read off unit tests.

What we measure (or at least plan to measure before it goes live):

  • Tool selection accuracy: a tracked set of user prompts where we know which tool was the right call. % of correct selections across prompt and model versions.
  • Latency per turn: tokens aren't free and every tool hop extends the response. A per-conversation budget is the thing to set and track.
  • Safe failure: what happens when the model gets nonsense input? Does it return an error, or start hallucinating tool calls? The latter is a bug.
  • Human oversight rate: what % of write operations got approved vs. rejected? If 100% approved, your human-in-the-loop is likely theater.
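The first metric on that list is the easiest to automate. A sketch of a tool-selection eval set, assuming you can log which tool the model actually picked for each tracked prompt; prompts and tool names are invented:

```python
# Tracked prompts where the right tool is known in advance:
EVALS = [
    {"prompt": "what did Acme order last month", "expected": "search_orders"},
    {"prompt": "draft a follow-up for Acme",
     "expected": "get_conversation_for_followup"},
]

def tool_selection_accuracy(picked_by_model):
    """picked_by_model maps each prompt to the tool the model actually called."""
    correct = sum(1 for e in EVALS
                  if picked_by_model.get(e["prompt"]) == e["expected"])
    return correct / len(EVALS)
```

Run the same set across prompt and model versions and the percentage becomes a regression test instead of a gut feeling.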

Without an evals set, your MCP server is no different from every AI demo that got shown off once and then went silent for a year.

A precedent question: do we still build admin UIs?

One question this work pushes to the surface: if an MCP server covers the workflows an in-app admin dashboard would otherwise handle, is it even worth building the admin UI for humans anymore?

The honest answer: we ask this again on every project. Admin UIs are ergonomically strong for tasks a human does a hundred times a week: bulk operations, visual tables, instant filtering. A chat interface loses that ergonomic edge. LLMs excel where the task is infrequent but needs broad context (write a summary, prepare a draft, join three systems, explain an unusual state).

In practice: for some categories of internal operations — reporting, customer onboarding, incident response, per-ticket data analysis — MCP + chat beats a custom React admin. For others (daily ops, high-volume manual steps, metrics dashboards) the reverse holds. Decide on frequency and task ergonomics, not on what's trendy right now.

Watch out for one trap: a chat interface removes direct visibility of system state. A traditional admin UI shows you "this exists." A chat only shows what you asked for. If chat is the only way in, parts of the system state will quietly drop out of operational awareness until someone stumbles over them. The design answer is to have the MCP server also expose "overview" resources that Claude walks through as part of a routine daily prompt, flagging anomalies on its own.
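What such an overview resource might return, sketched with invented counts; the threshold values are placeholders a real deployment would tune:

```python
# Hypothetical overview resource: a handful of health counts in one read.
def system_overview():
    return {
        "open_tickets": 12,
        "unpaid_invoices_overdue": 3,
        "failed_syncs_last_24h": 0,
    }

# Placeholder thresholds; anything above them is worth flagging.
THRESHOLDS = {"unpaid_invoices_overdue": 0, "failed_syncs_last_24h": 0}

def anomalies(overview, thresholds=THRESHOLDS):
    """What the daily prompt should surface instead of waiting to be asked."""
    return {k: v for k, v in overview.items()
            if k in thresholds and v > thresholds[k]}
```

The daily prompt reads the overview, runs it through the thresholds, and reports only the exceptions, which restores the "this exists" visibility a dashboard used to give for free.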

Where next

At Codedock this is what our current pipeline is built on — the prototypes run internally, the methodology crystallizes from individual findings, and the first production client project launches in the next few weeks. It's domain-specific (skipping details for now), but we had all five questions answered before a line of server code got written.

If you're weighing MCP in an enterprise context, one thing worth holding onto: in 80% of cases a simple tool built around one workflow beats a generic “AI sees everything” server. The latter is impressive in a demo and useless in production.

Working on something similar?

Book a 30-minute technical call. No sales process — direct architectural feedback.
