Sampling

Sampling lets your server-side code request LLM completions from the MCP client. Instead of calling an AI API directly, your tool sends a sampling request through MCP, and the client (Claude Desktop, Cursor, etc.) handles the actual LLM call.

This is powerful because:

  • Your tool code doesn’t need API keys or AI SDK dependencies
  • The client controls model selection, rate limiting, and cost
  • The user sees and approves the LLM interaction

How Sampling Works

```
Your Code              pmcp (bridge)                 MCP Client
    │                        │                           │
    ├─ SamplingRequest ─────►│                           │
    │                        ├─ sampling/createMessage ─►│
    │                        │                           ├─ LLM call
    │                        │                           │◄─ response
    │                        │◄─ CreateMessageResult ────┤
    │◄─ SamplingResponse ────┤                           │
```

The flow is bidirectional: your code initiates the request, pmcp forwards it to the MCP client, the client calls the LLM, and the response flows back.
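To make the middle leg of that flow concrete, here is a rough sketch of the JSON-RPC message the bridge sends for `sampling/createMessage`. The method and parameter names follow the MCP sampling specification; the `id` and message content are illustrative, and the exact envelope pmcp emits may differ:

```python
import json

# Illustrative sampling/createMessage request. Field names follow the MCP
# spec; the id and content here are made up for demonstration.
request = {
    "jsonrpc": "2.0",
    "id": 7,
    "method": "sampling/createMessage",
    "params": {
        "messages": [
            {
                "role": "user",
                "content": {"type": "text", "text": "Translate 'hello' to French."},
            }
        ],
        "systemPrompt": "You are a translator.",
        "maxTokens": 1000,
    },
}

# The request is serialized to JSON before going over the wire.
wire = json.dumps(request)
```

The client answers with a `CreateMessageResult` carrying the model's output, which pmcp translates back into the `SamplingResponse` your code receives.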

Python

Use ctx.sample() inside a tool handler to request an LLM completion:

```python
from protomcp import tool, ToolResult, ToolContext

@tool("Translate text to another language")
def translate(ctx: ToolContext, text: str, target_language: str) -> ToolResult:
    response = ctx.sample(
        messages=[
            {"role": "user", "content": f"Translate the following to {target_language}:\n\n{text}"}
        ],
        system_prompt=f"You are a translator. Translate accurately to {target_language}.",
        max_tokens=1000,
    )
    if response.get("error"):
        return ToolResult(result=f"Translation failed: {response['error']}", is_error=True)
    return ToolResult(result=response["content"])
```

Sampling Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `messages` | `list[dict]` | Conversation messages (role + content) |
| `system_prompt` | `str` | System prompt for the LLM |
| `max_tokens` | `int` | Maximum tokens to generate |
| `model_preferences` | `dict` | Hints about model selection (the client may ignore them) |

Event-Driven Sampling

Sampling isn’t limited to tool calls. Your server can trigger LLM calls in response to events:

```python
import threading
import time

from protomcp import tool, ToolResult, ToolContext

# Store the sampling function for use outside tool calls
_sampler = None

@tool("Start monitoring for anomalies")
def start_monitor(ctx: ToolContext, threshold: float) -> ToolResult:
    global _sampler
    _sampler = ctx.sample  # Save for later use

    def monitor_loop():
        while True:
            # check_for_anomalies() and log_remediation() are placeholders
            # for your own monitoring and logging code.
            data = check_for_anomalies()
            if data.score > threshold and _sampler:
                response = _sampler(
                    messages=[{"role": "user", "content": f"Anomaly detected: {data}. Suggest remediation."}],
                    max_tokens=500,
                )
                log_remediation(response)
            time.sleep(60)  # poll interval; avoids a busy loop

    threading.Thread(target=monitor_loop, daemon=True).start()
    return ToolResult(result="Monitoring started")
```

When to Use Sampling vs. Direct API Calls

| Use Sampling When | Use Direct API Calls When |
| --- | --- |
| You want the client to control costs | You need specific model guarantees |
| You want the user to see/approve LLM usage | You need low-latency, high-volume calls |
| Your tool is running in a client's environment | You're running a standalone server |
| You don't want to manage API keys | You need provider-specific features |

Client Support

Sampling requires the MCP client to support the sampling capability. Not all clients support it. If sampling isn’t available, ctx.sample() will return an error.
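Since `ctx.sample()` reports missing support as an error rather than raising, you can degrade gracefully. A sketch of a fallback wrapper, assuming the error-dict shape shown earlier (`sample_with_fallback` is our hypothetical helper; the stub samplers stand in for `ctx.sample` and a direct API call):

```python
def sample_with_fallback(sampler, messages, *, max_tokens=500, fallback=None):
    """Call a client-side sampler; use `fallback` when sampling is unavailable.

    `sampler` is ctx.sample (or anything call-compatible); `fallback` is an
    optional callable with the same signature, e.g. a direct API call.
    """
    response = sampler(messages=messages, max_tokens=max_tokens)
    if response.get("error") and fallback is not None:
        return fallback(messages=messages, max_tokens=max_tokens)
    return response

# Stubs simulating a client without the sampling capability and a direct API.
unsupported = lambda **kw: {"error": "sampling capability not available"}
direct_api = lambda **kw: {"content": "ok (direct API)"}

result = sample_with_fallback(
    unsupported,
    [{"role": "user", "content": "hi"}],
    fallback=direct_api,
)
```

This keeps the tool usable in clients that lack sampling, at the cost of reintroducing API-key management on the fallback path.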