# Sampling
Sampling lets an action handler ask the agent's LLM to produce a response mid-handler. The LLM belongs to the agent, not to you: sampling doesn't require an API key on your side, and each call counts against the user's agent budget.
## When to use it
- Natural-language reformatting - turn a list of rows into a readable summary.
- Classification - given a free-text comment, pick a category from an enum.
- Structured extraction - pull the fields your action needs out of a fuzzy input.
Don't use sampling for:
- Raw chatbot replies. Your action should have a clear return type.
- Very long generations. Sampling is subject to depth limits (3 by default) and counts against the agent budget - keep it targeted.
## Calling sample

```ts
import { z } from 'zod';

tesseron
  .action('classifyComment')
  .input(z.object({ text: z.string() }))
  .output(
    z.object({
      sentiment: z.enum(['positive', 'neutral', 'negative']),
      confidence: z.number(),
    }),
  )
  .handler(async ({ text }, ctx) => {
    const result = await ctx.sample({
      prompt: `Classify the sentiment of this comment: """${text}"""`,
      schema: z.object({
        sentiment: z.enum(['positive', 'neutral', 'negative']),
        confidence: z.number().min(0).max(1),
      }),
      maxTokens: 80,
    });
    return result;
  });
```

## Wire format
Request, app → gateway:
```json
{
  "jsonrpc": "2.0",
  "id": 9,
  "method": "sampling/request",
  "params": {
    "invocationId": "inv_abc",
    "prompt": "Classify the sentiment …",
    "schema": {
      "type": "object",
      "properties": {
        "sentiment": { "enum": ["positive", "neutral", "negative"] },
        "confidence": { "type": "number" }
      }
    },
    "maxTokens": 80
  }
}
```

Response, gateway → app:
```json
{
  "jsonrpc": "2.0",
  "id": 9,
  "result": {
    "content": { "sentiment": "positive", "confidence": 0.82 }
  }
}
```

If you passed a schema, the SDK validates `result.content` against it before returning from `ctx.sample`. If the model's response doesn't parse, you get a validation error and can retry.
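The validate-and-retry loop described above can be sketched as follows. This is a minimal sketch, not Tesseron SDK API: `sampleWithRetry`, `SampleFn`, and the `validate` callback are illustrative names, and `sampleFn` stands in for `ctx.sample`.

```typescript
// Hedged sketch: retry a sampling call when schema validation fails.
// `sampleFn` plays the role of ctx.sample; all names here are illustrative.
type SampleFn = (prompt: string) => Promise<unknown>;

async function sampleWithRetry<T>(
  sampleFn: SampleFn,
  prompt: string,
  validate: (raw: unknown) => T, // throws on invalid content, like the SDK's schema check
  maxAttempts = 2,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await sampleFn(prompt);
    try {
      return validate(raw); // mirrors the SDK validating result.content
    } catch (err) {
      lastError = err; // model returned an invalid shape: try once more
    }
  }
  throw lastError;
}
```

Keeping `maxAttempts` low matters here: every retry is another sampling request against the user's budget and the depth limit.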
## Depth limit
Sampling is recursive by construction: the agent is a Claude session that called your action, and you're asking that same Claude to think again. Without a cap, a malicious or buggy chain could spiral.
The MCP gateway enforces `maxSamplingDepth = 3`. Each sampling request issued from a handler that was itself invoked via sampling increments the counter. When the limit is exceeded, the call fails with error `-32008 SamplingDepthExceeded`.
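A handler that might run deep in a sampling chain can branch on that error and degrade gracefully. A minimal sketch, assuming the gateway surfaces a JSON-RPC-style error object with a `code` field (the `-32008` code comes from the text above; `isDepthExceeded` and `summarizeOrFallback` are illustrative names):

```typescript
// Hedged sketch: treat SamplingDepthExceeded (-32008) as a signal to fall back,
// while letting unrelated errors propagate.
const SAMPLING_DEPTH_EXCEEDED = -32008; // from the gateway's error table

interface RpcError {
  code: number;
  message: string;
}

function isDepthExceeded(err: unknown): err is RpcError {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as RpcError).code === SAMPLING_DEPTH_EXCEEDED
  );
}

async function summarizeOrFallback(
  sampleFn: () => Promise<string>, // stands in for a ctx.sample call
  fallback: string,
): Promise<string> {
  try {
    return await sampleFn();
  } catch (err) {
    if (isDepthExceeded(err)) return fallback; // chain too deep: degrade gracefully
    throw err; // anything else still surfaces to the caller
  }
}
```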
## Capability gate
Not every MCP client supports sampling. Before calling `ctx.sample`, check the capability:
```ts
if (!ctx.agentCapabilities.sampling) {
  // Fall back: return something useful without the LLM.
}

const result = await ctx.sample({ /* ... */ });
```

Or let the SDK throw `SamplingNotAvailableError` (error code `-32006`) and catch it. Pick whichever fits your UX.
## Client compatibility
Sampling only works when the connected MCP client advertises `capabilities.sampling` during the MCP initialize handshake. Tesseron captures the client's capabilities at that point and flows them to every SDK session as `ctx.agentCapabilities.sampling`, so a handler always sees the real answer, even when a particular client (for example, Claude Code as of this writing) has not yet implemented `sampling/createMessage`. If a handler calls `ctx.sample()` anyway on such a client, the SDK throws a structured `SamplingNotAvailableError` that includes the client name when available, instead of a raw JSON-RPC `-32601 Method not found`. Callers can branch on `error instanceof SamplingNotAvailableError` and return a graceful fallback.
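That `instanceof` branch can be sketched as below. In real code `SamplingNotAvailableError` comes from the Tesseron SDK; a stand-in class is defined here only so the sketch is self-contained, and `classifyWithFallback` is an illustrative name.

```typescript
// Hedged sketch: stand-in for the SDK's SamplingNotAvailableError (code -32006).
class SamplingNotAvailableError extends Error {
  readonly code = -32006;
  constructor(public clientName?: string) {
    super(`Sampling not available${clientName ? ` on ${clientName}` : ""}`);
  }
}

async function classifyWithFallback(
  sampleFn: () => Promise<string>, // stands in for a ctx.sample call
): Promise<string> {
  try {
    return await sampleFn();
  } catch (err) {
    if (err instanceof SamplingNotAvailableError) {
      // Client never advertised capabilities.sampling: return something
      // useful without the LLM, as the capability-gate section suggests.
      return "unclassified";
    }
    throw err; // unrelated errors still propagate
  }
}
```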
Next: elicitation - same shape, but with the user instead of the model.