# Sampling
Sampling lets your server-side code request LLM completions from the MCP client. Instead of calling an AI API directly, your tool sends a sampling request through MCP, and the client (Claude Desktop, Cursor, etc.) handles the actual LLM call.
This is powerful because:
- Your tool code doesn’t need API keys or AI SDK dependencies
- The client controls model selection, rate limiting, and cost
- The user sees and approves the LLM interaction
## How Sampling Works
```
Your Code             pmcp (bridge)              MCP Client
    │                      │                          │
    ├─ SamplingRequest ───►│                          │
    │                      ├─ sampling/createMessage ►│
    │                      │                          ├─ LLM call
    │                      │                          │◄─ response
    │                      │◄─ CreateMessageResult ───┤
    │◄─ SamplingResponse ──┤                          │
```

The flow is bidirectional: your code initiates the request, pmcp forwards it to the MCP client, the client calls the LLM, and the response flows back.
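To make the middle hop concrete, here is a sketch of the JSON-RPC message pmcp forwards to the client. The field names (`sampling/createMessage`, `systemPrompt`, `maxTokens`, and the structured `content` object) follow the MCP specification; the exact payload pmcp emits may differ in detail.

```python
# Illustrative sampling/createMessage request, shaped per the MCP spec.
# The id and message text are placeholders.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "sampling/createMessage",
    "params": {
        "messages": [
            # Note: content is a typed object on the wire, not a bare string.
            {"role": "user", "content": {"type": "text", "text": "Translate this to French: hello"}}
        ],
        "systemPrompt": "You are a translator.",
        "maxTokens": 1000,
    },
}
```

The client answers with a `CreateMessageResult` containing the model's reply, which pmcp unwraps into the response your handler receives.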
## Python
Use `ctx.sample()` inside a tool handler to request an LLM completion:
```python
from protomcp import tool, ToolResult, ToolContext

@tool("Translate text to another language")
def translate(ctx: ToolContext, text: str, target_language: str) -> ToolResult:
    response = ctx.sample(
        messages=[
            {"role": "user", "content": f"Translate the following to {target_language}:\n\n{text}"}
        ],
        system_prompt=f"You are a translator. Translate accurately to {target_language}.",
        max_tokens=1000,
    )

    if response.get("error"):
        return ToolResult(result=f"Translation failed: {response['error']}", is_error=True)

    return ToolResult(result=response["content"])
```

## Sampling Parameters
| Parameter | Type | Description |
|---|---|---|
| `messages` | `list[dict]` | Conversation messages (role + content) |
| `system_prompt` | `str` | System prompt for the LLM |
| `max_tokens` | `int` | Maximum tokens to generate |
| `model_preferences` | `dict` | Hints about model selection (client may ignore) |
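As a hedged illustration of `model_preferences`, the sketch below builds the keyword arguments for a hypothetical `ctx.sample()` call. The `hints`, `costPriority`, and `intelligencePriority` keys follow the MCP specification's model-preference shape; clients are free to ignore any of them, and the model name shown is just an example.

```python
# Illustrative kwargs for ctx.sample() with model preferences.
# Key names follow the MCP spec; the client may ignore all of them.
params = dict(
    messages=[{"role": "user", "content": "Summarize this changelog in one sentence."}],
    max_tokens=300,
    model_preferences={
        "hints": [{"name": "claude-3-5-sonnet"}],  # preferred model family (example)
        "costPriority": 0.2,           # 0-1: how strongly to weight low cost
        "intelligencePriority": 0.8,   # 0-1: how strongly to weight capability
    },
)
# Inside a tool handler you would then call:
# response = ctx.sample(**params)
```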
## Event-Driven Sampling
Sampling isn’t limited to tool calls. Your server can trigger LLM calls in response to events:
```python
import threading
import time

from protomcp import tool, ToolResult, ToolContext

# Store the sampling function for use outside tool calls
_sampler = None

@tool("Start monitoring for anomalies")
def start_monitor(ctx: ToolContext, threshold: float) -> ToolResult:
    global _sampler
    _sampler = ctx.sample  # Save for later use

    def monitor_loop():
        while True:
            data = check_for_anomalies()  # application-specific check
            if data.score > threshold and _sampler:
                response = _sampler(
                    messages=[{"role": "user", "content": f"Anomaly detected: {data}. Suggest remediation."}],
                    max_tokens=500,
                )
                log_remediation(response)  # application-specific logging
            time.sleep(10)  # poll interval; avoids a busy loop

    threading.Thread(target=monitor_loop, daemon=True).start()
    return ToolResult(result="Monitoring started")
```

## When to Use Sampling vs. Direct API Calls
| Use Sampling When | Use Direct API Calls When |
|---|---|
| You want the client to control costs | You need specific model guarantees |
| You want the user to see/approve LLM usage | You need low-latency, high-volume calls |
| Your tool is running in a client’s environment | You’re running a standalone server |
| You don’t want to manage API keys | You need provider-specific features |
## Client Support
Sampling requires the MCP client to support the `sampling` capability, and not all clients do. If sampling isn't available, `ctx.sample()` returns an error in its response rather than a completion.
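Because of this, handlers should treat the error path as a normal outcome. The sketch below shows one defensive pattern; the stub function and `summarize` helper are hypothetical stand-ins for `ctx.sample` so the error branch can be exercised on its own.

```python
# Hypothetical stub: what a client without the sampling capability
# might return from a sample() call.
def sample_stub(**kwargs):
    return {"error": "sampling not supported by this client"}

def summarize(sample, text: str) -> str:
    """Request a summary, degrading gracefully if sampling is unavailable."""
    response = sample(
        messages=[{"role": "user", "content": f"Summarize:\n\n{text}"}],
        max_tokens=200,
    )
    if response.get("error"):
        # Fall back instead of crashing the tool call.
        return f"Summarization unavailable: {response['error']}"
    return response["content"]

result = summarize(sample_stub, "Long report text")
```

In a real handler you would pass `ctx.sample` instead of the stub, so the same code path covers both capable and incapable clients.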