
Chapter 9 — QueryEngine & SDK Interface

What You'll Learn

By the end of this chapter, you will be able to:

  • Explain why QueryEngine exists as a class on top of the stateless query() function and what problem it solves
  • Read QueryEngineConfig and describe the purpose of every field, including the three budget controls, the structured output hook, and the elicitation callback
  • Trace a complete call to submitMessage() through its ten logical stages, from per-turn reset to the final SDKResultMessage
  • Distinguish the slash-command short-circuit path from the full query() loop path and explain when each fires
  • Identify every SDKMessage variant by type and subtype, and know when each is emitted and what its key fields contain
  • Write a self-contained TypeScript program that drives QueryEngine programmatically and collects structured results
  • Describe the public type surface exported from agentSdkTypes.ts and explain the three-submodule split
  • Explain what isNonInteractiveSession: true changes compared to interactive mode and why the distinction matters

9.1 The Role of QueryEngine

The agentic loop in src/query.ts is deliberately stateless. Every call to query() takes a complete snapshot of messages, a system prompt, tools, and configuration, runs its iterator to completion, and returns a terminal value. It does not remember what happened between calls, it does not own a conversation history, and it does not know whether it is running inside a terminal UI or a background automation process.

That statelessness is a virtue for testing and composition, but it creates an immediate practical problem: most real-world uses of Claude Code are not single-shot. A user types several messages in sequence. An automated pipeline submits follow-up prompts after inspecting earlier results. A CI job resumes a session after a partial failure. All of these require state to persist across turns — specifically the growing list of Message objects that forms the conversation history.

QueryEngine is the class that owns that state. It is defined in src/QueryEngine.ts and can be summarised in one sentence: it is a session manager for headless (non-interactive) mode that holds the conversation's mutable message list, wraps query() with per-turn bookkeeping, and emits a typed stream of SDKMessage events for each submitted prompt.

The relationship between QueryEngine and query() parallels the relationship between a stateful HTTP session handler and a stateless request-processing function. query() processes one turn; QueryEngine manages the session across many turns.
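The split can be sketched in a few lines. This is a minimal illustration under simplified assumptions: runTurn and Session are hypothetical stand-ins for query() and QueryEngine, and the Message shape is reduced to two fields.

```typescript
// Hypothetical, simplified message type; the real Message type is richer.
type Message = { role: 'user' | 'assistant'; text: string }

// Stateless: everything arrives as arguments and nothing is remembered.
function runTurn(past: readonly Message[], prompt: string): [Message, Message] {
  const user: Message = { role: 'user', text: prompt }
  const reply: Message = {
    role: 'assistant',
    text: `seen ${past.length} earlier messages`,
  }
  return [user, reply]
}

// Stateful: owns the history and hands it to the stateless function each turn.
class Session {
  private messages: Message[] = []

  submit(prompt: string): Message {
    const [user, reply] = runTurn(this.messages, prompt)
    this.messages.push(user, reply)
    return reply
  }
}

const session = new Session()
const first = session.submit('hello')
const second = session.submit('and again')
console.log(first.text)
console.log(second.text)
```

The second reply reflects the two messages stored by the first turn, which is the only thing the stateful wrapper contributes here.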


9.2 QueryEngineConfig: Every Field Explained

The constructor accepts a single QueryEngineConfig object. Understanding each field is the fastest way to understand what QueryEngine can and cannot do.

typescript
// src/QueryEngine.ts:130-173
export type QueryEngineConfig = {
  cwd: string
  tools: Tools
  commands: Command[]
  mcpClients: MCPServerConnection[]
  agents: AgentDefinition[]
  canUseTool: CanUseToolFn
  getAppState: () => AppState
  setAppState: (f: (prev: AppState) => AppState) => void
  initialMessages?: Message[]
  readFileCache: FileStateCache
  customSystemPrompt?: string
  appendSystemPrompt?: string
  userSpecifiedModel?: string
  fallbackModel?: string
  thinkingConfig?: ThinkingConfig
  maxTurns?: number
  maxBudgetUsd?: number
  taskBudget?: { total: number }
  jsonSchema?: Record<string, unknown>
  verbose?: boolean
  replayUserMessages?: boolean
  handleElicitation?: ToolUseContext['handleElicitation']
  includePartialMessages?: boolean
  setSDKStatus?: (status: SDKStatus) => void
  abortController?: AbortController
  orphanedPermission?: OrphanedPermission
  snipReplay?: (
    yieldedSystemMsg: Message,
    store: Message[],
  ) => { messages: Message[]; executed: boolean } | undefined
}

Identity and working directory. cwd sets the working directory for the session. It is passed to setCwd() at the start of every submitMessage() call, ensuring that relative file paths resolve correctly even if the Node.js process changes its own working directory between calls.

Tool and command registries. tools is the full set of tool definitions the model is allowed to call. commands is the slash-command registry (see Chapter 8). mcpClients provides any Model Context Protocol server connections, and agents is a list of sub-agent definitions used when the model needs to delegate a subtask.

Permission gate. canUseTool is a function the engine calls before executing any tool. It receives the tool name, the tool-use ID, the proposed input, and the call context, and returns either allow or a denial reason. QueryEngine wraps this function internally (more on that in Section 9.4.1) to record every denial in a list that is attached to the final result message.

Application state accessors. getAppState and setAppState give the engine read and write access to the broader application state store. These are used by tools and by the system prompt assembly path to read user preferences, permission modes, and session flags without coupling the engine to any specific state implementation.

Conversation seeding. initialMessages lets callers pre-populate the conversation history before the first submitMessage() call. This is used for session resume: a caller reads a saved transcript, passes the messages as initialMessages, and the engine continues from that point without repeating earlier work.

File dedup cache. readFileCache is a FileStateCache instance that tracks which file versions have already been read during the session. When the same file is read again at the same content hash, the cache suppresses the duplicate read from being appended to the context. This prevents the context from filling up with redundant file contents during long sessions where the same source files are consulted repeatedly.
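The deduplication idea can be illustrated with a toy cache. This is only a sketch: ReadCache, shouldInclude, and the FNV-1a fingerprint are hypothetical stand-ins, and the real FileStateCache interface is not shown in this chapter.

```typescript
// Toy FNV-1a hash; a stand-in for whatever content hash the real cache uses.
function fingerprint(text: string): number {
  let h = 0x811c9dc5
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i)
    h = Math.imul(h, 0x01000193) >>> 0
  }
  return h >>> 0
}

// Hypothetical cache: remembers the last fingerprint seen for each path.
class ReadCache {
  private seen = new Map<string, number>()

  // Returns true when the content is new for this path and should enter the context.
  shouldInclude(path: string, content: string): boolean {
    const hash = fingerprint(content)
    if (this.seen.get(path) === hash) return false
    this.seen.set(path, hash)
    return true
  }
}

const cache = new ReadCache()
const reads = [
  cache.shouldInclude('src/a.ts', 'export const a = 1'), // first read
  cache.shouldInclude('src/a.ts', 'export const a = 1'), // unchanged, suppressed
  cache.shouldInclude('src/a.ts', 'export const a = 2'), // changed, included again
]
console.log(reads)
```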

System prompt customisation. customSystemPrompt replaces the default system prompt entirely. appendSystemPrompt adds content after the default prompt without replacing it. Using customSystemPrompt is appropriate when the caller wants full control over the model's instruction set; appendSystemPrompt is more appropriate for adding project-specific context while preserving the default safety and behaviour constraints.

Model selection. userSpecifiedModel is the primary model identifier. If it is omitted, the default main-loop model is used. fallbackModel is tried if the primary model is unavailable or rate-limited. thinkingConfig controls the extended thinking budget when using models that support it.

Turn and budget limits. Three independent controls cap how much work the engine can do. maxTurns is an integer ceiling on the number of agentic loop iterations per submitMessage() call. maxBudgetUsd is a dollar limit expressed as a float; the session is aborted if cumulative API spend exceeds it. taskBudget carries a total field in token units and is passed directly into the query() call as the budget context that drives the checkTokenBudget() logic described in Chapter 5.
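To make the independence of the three controls concrete, here is a hedged sketch of a limit check. Only the three config field names come from the text; the Progress shape, the exceededLimit helper, and the exact comparison operators are assumptions for illustration.

```typescript
// Only the three field names come from QueryEngineConfig; the rest is hypothetical.
type Limits = {
  maxTurns?: number
  maxBudgetUsd?: number
  taskBudget?: { total: number }
}
type Progress = { turns: number; spentUsd: number; tokensUsed: number }

// Returns the name of the first limit that has been hit, or null if work may continue.
function exceededLimit(limits: Limits, p: Progress): keyof Limits | null {
  if (limits.maxTurns !== undefined && p.turns >= limits.maxTurns) return 'maxTurns'
  if (limits.maxBudgetUsd !== undefined && p.spentUsd > limits.maxBudgetUsd) return 'maxBudgetUsd'
  if (limits.taskBudget !== undefined && p.tokensUsed >= limits.taskBudget.total) return 'taskBudget'
  return null
}

const limits: Limits = { maxTurns: 10, maxBudgetUsd: 0.5, taskBudget: { total: 100_000 } }
console.log(exceededLimit(limits, { turns: 3, spentUsd: 0.1, tokensUsed: 20_000 }))
console.log(exceededLimit(limits, { turns: 3, spentUsd: 0.75, tokensUsed: 20_000 }))
```

Each check is skipped when its field is omitted, so any combination of the three can be active at once.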

Structured output. jsonSchema is a JSON Schema object. When provided, the engine instructs the model to produce a final tool call whose output conforms to that schema. The tool result is then extracted and returned as the result field of the final SDKResultMessage. This is the primary mechanism for programmatic callers that want machine-readable output rather than free text.

Diagnostics and replay. verbose enables detailed logging to the console. replayUserMessages causes the engine to re-yield user messages as SDKUserMessageReplay events, which is useful for clients that want to reconstruct the full conversation from the stream.

Elicitation callback. handleElicitation is a function the model can call when it needs to ask the user a structured question mid-task. In interactive mode, this renders a prompt in the terminal. In SDK mode, the caller provides this function so that automated pipelines can handle questions programmatically — for example by looking up a value in a config file or returning a default.

Partial message inclusion. includePartialMessages controls whether in-progress streaming events are forwarded to the SDK stream during tool execution. When false (the default), the caller only sees complete, finalized messages. When true, the caller receives streaming fragments as they arrive, useful for building progress displays.

Status reporting. setSDKStatus is a callback that the engine calls with status transitions (running, awaiting_input, completed, etc.) so that a supervisor process can track the session lifecycle without consuming the message stream.

Abort and orphaned permission. abortController lets the caller cancel an in-progress submitMessage(). orphanedPermission carries a pending permission request from a previous session that was interrupted before the user could respond; the engine re-presents it at startup rather than dropping it silently.

Snip replay. snipReplay is an advanced callback used when a conversation is resumed after a context compaction. It receives the system message that marks the compaction boundary and the current message store, and returns a replacement set of messages that can be fed back to the model without repeating the original context in full.


9.3 Class Structure and Private State

The class declaration at src/QueryEngine.ts:184 reveals its private fields:

typescript
// src/QueryEngine.ts:184-207
export class QueryEngine {
  private config: QueryEngineConfig
  private mutableMessages: Message[]         // conversation history, persisted across turns
  private abortController: AbortController
  private permissionDenials: SDKPermissionDenial[]
  private totalUsage: NonNullableUsage
  private hasHandledOrphanedPermission = false
  private readFileState: FileStateCache      // file dedup cache
  private discoveredSkillNames = new Set<string>()
  private loadedNestedMemoryPaths = new Set<string>()

  constructor(config: QueryEngineConfig) {
    this.mutableMessages = config.initialMessages ?? []
    this.abortController = config.abortController ?? createAbortController()
    this.permissionDenials = []
    this.readFileState = config.readFileCache
    this.totalUsage = EMPTY_USAGE
  }
}

mutableMessages is the heart of the class. It is a plain array of Message objects that grows with every turn. Every call to submitMessage() appends new user and assistant messages to this array, and the entire array is passed to query() on each call so the model has full conversation history. The field is mutable in two senses: messages are appended to the array, and a setMessages callback can replace the array reference with an updated one.

permissionDenials accumulates across the entire session. Each time canUseTool returns a non-allow result, the denial is appended here. At the end of every submitMessage() call, the full list is embedded in the SDKResultMessage so the caller can audit what was blocked.

totalUsage is a running counter of token consumption. It is updated after each turn by merging the turn's usage into the cumulative total, giving the caller accurate lifetime token totals rather than per-turn figures.
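The merge itself is simple field-wise addition. A minimal sketch, assuming a two-field Usage shape (the field names match those read in Section 9.6; the real NonNullableUsage type likely carries more fields, and accumulate is a hypothetical name):

```typescript
// Simplified usage record; the real NonNullableUsage type is assumed to be richer.
type Usage = { input_tokens: number; output_tokens: number }

const EMPTY_USAGE: Usage = { input_tokens: 0, output_tokens: 0 }

// Returns a new total rather than mutating either argument.
function accumulate(total: Usage, turn: Usage): Usage {
  return {
    input_tokens: total.input_tokens + turn.input_tokens,
    output_tokens: total.output_tokens + turn.output_tokens,
  }
}

const perTurn: Usage[] = [
  { input_tokens: 120, output_tokens: 30 },
  { input_tokens: 400, output_tokens: 75 },
]
const lifetime = perTurn.reduce(accumulate, EMPTY_USAGE)
console.log(lifetime)
```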

discoveredSkillNames and loadedNestedMemoryPaths are deduplication sets. They prevent redundant skill discovery and memory loading when the same directories are revisited. discoveredSkillNames is cleared at the start of each submitMessage() call (see Section 9.4.1), so its entries are scoped to a single turn.

hasHandledOrphanedPermission is a one-shot flag. The orphaned permission from the previous session is presented exactly once, during the first submitMessage() call, and the flag prevents it from being re-presented on subsequent turns.


9.4 submitMessage(): The Complete Flow

submitMessage() is an async generator method. Its return type is AsyncGenerator<SDKMessage, void, unknown>, meaning it yields a sequence of SDKMessage values and then terminates. The caller iterates it with for await ... of.

The method orchestrates ten distinct stages. The subsections below follow the happy path through a complete turn.

9.4.1 Per-turn Reset and canUseTool Wrapping

The first thing submitMessage() does is clear discoveredSkillNames and call setCwd(cwd). This reset ensures that skill discovery results from a previous turn do not bleed into the current one, and that any tool that resolves file paths starts from the correct working directory even if the Node.js process has moved.

Immediately after the reset, submitMessage() creates a new function wrappedCanUseTool that closes over the real canUseTool from config:

typescript
// src/QueryEngine.ts (conceptual reconstruction)
const wrappedCanUseTool: CanUseToolFn = async (tool_name, tool_use_id, tool_input, context) => {
  const result = await canUseTool(tool_name, tool_use_id, tool_input, context)
  if (result.behavior !== 'allow') {
    this.permissionDenials.push({ tool_name, tool_use_id, tool_input })
  }
  return result
}

The wrapper does not modify the result; it only intercepts denials. The original canUseTool still makes the actual decision. This separation of concerns keeps the permission system clean: the policy lives in canUseTool, while the audit trail lives in QueryEngine.

9.4.2 System Prompt Assembly

submitMessage() calls fetchSystemPromptParts() with the tool list, the resolved model name, and the MCP client connections. This function returns three components:

defaultSystemPrompt is the array of system prompt blocks that Claude Code generates from its built-in templates. It includes the agent's core behaviour instructions, tool descriptions, and safety constraints. userContext carries user-level customisations such as the contents of CLAUDE.md files found in the project hierarchy. systemContext carries environment information such as the current date, the working directory, and the OS platform.

The engine then assembles the final system prompt:

typescript
const systemPrompt = asSystemPrompt([
  ...(customPrompt ? [customPrompt] : defaultSystemPrompt),
  ...(memoryMechanicsPrompt ? [memoryMechanicsPrompt] : []),
  ...(appendSystemPrompt ? [appendSystemPrompt] : []),
])

The logic is a priority cascade. If customSystemPrompt was provided in config, it replaces defaultSystemPrompt entirely. The memory mechanics prompt is injected only when the CLAUDE_COWORK_MEMORY_PATH_OVERRIDE environment variable is set, enabling the coworking memory system. Finally, appendSystemPrompt, when provided, is appended last, regardless of which base prompt was chosen.
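Two worked cases make the cascade easier to see. The sketch below mirrors the spread logic with plain strings; assemblePrompt and its option names are hypothetical, and the asSystemPrompt wrapper is left out.

```typescript
// Hypothetical helper that mirrors the cascade using plain strings.
function assemblePrompt(opts: {
  defaultPrompt: string[]
  customPrompt?: string
  memoryPrompt?: string
  appendPrompt?: string
}): string[] {
  return [
    ...(opts.customPrompt ? [opts.customPrompt] : opts.defaultPrompt),
    ...(opts.memoryPrompt ? [opts.memoryPrompt] : []),
    ...(opts.appendPrompt ? [opts.appendPrompt] : []),
  ]
}

// Case 1: no custom prompt, so the default blocks are kept and the suffix follows.
const withDefault = assemblePrompt({
  defaultPrompt: ['core rules', 'tool descriptions'],
  appendPrompt: 'project notes',
})

// Case 2: a custom prompt replaces every default block; the suffix still comes last.
const withCustom = assemblePrompt({
  defaultPrompt: ['core rules', 'tool descriptions'],
  customPrompt: 'you are a release bot',
  appendPrompt: 'project notes',
})

console.log(withDefault)
console.log(withCustom)
```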

9.4.3 User Input Processing and Transcript Persistence

With the system prompt ready, submitMessage() constructs a ProcessUserInputContext and calls processUserInput():

typescript
// src/QueryEngine.ts (conceptual reconstruction)
const { messages: messagesFromUserInput, shouldQuery, allowedTools, model, resultText } =
  await processUserInput({
    input: prompt,
    mode: 'prompt',
    context: { ...processUserInputContext, messages: this.mutableMessages },
    querySource: 'sdk',
  })

The ProcessUserInputContext is constructed with isNonInteractiveSession: true. This single flag changes multiple behaviours downstream: the UI rendering path is skipped, interactive confirmation dialogs are suppressed, and certain tools that require a live terminal are disabled. Everything that follows is aware it is in headless mode.

processUserInput() returns a set of messages that represent the user's turn — typically a single user message wrapping the prompt text, but potentially more if the prompt triggered pre-processing. The returned shouldQuery flag indicates whether the engine should proceed to call the model, or whether the response was produced locally (see Section 9.4.4).

After appending messagesFromUserInput to this.mutableMessages, the engine writes the updated history to the session transcript before sending anything to the API. This ordering is deliberate: if the process is killed between sending the request and receiving the response, the user message is already persisted. On resume, the caller can detect the incomplete turn and retry from a consistent state.
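The persist-before-send ordering can be sketched as follows. This is an illustration under stated assumptions: submit, persist, and callModel are hypothetical names, and the simulated failure stands in for a process being killed mid-request.

```typescript
type Message = { role: 'user' | 'assistant'; text: string }

// Hypothetical turn function: the transcript is written before the model is called.
async function submit(
  messages: Message[],
  prompt: string,
  persist: (msgs: readonly Message[]) => Promise<void>,
  callModel: (msgs: readonly Message[]) => Promise<string>,
): Promise<string> {
  messages.push({ role: 'user', text: prompt })
  await persist(messages) // durable before any network call
  const reply = await callModel(messages)
  messages.push({ role: 'assistant', text: reply })
  await persist(messages)
  return reply
}

// Simulate a failure between sending the request and receiving the response.
const saved: Message[][] = []
const conversation: Message[] = []
const outcome = submit(
  conversation,
  'fix the bug',
  async msgs => { saved.push([...msgs]) },
  async () => { throw new Error('simulated crash mid-request') },
).catch(() => saved.length)

outcome.then(count => console.log(`snapshots written before failure: ${count}`))
```

Because the first persist call completes before callModel runs, the saved transcript already contains the user message when the failure occurs.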

9.4.4 The Slash Command Short-Circuit Path

When the user's prompt is a slash command that can be handled locally — such as /clear, /help, or a custom local command — processUserInput() sets shouldQuery = false and places the command's output in resultText. The engine does not call the model at all.

In this case, submitMessage() follows the short-circuit path:

  1. Yield SDKSystemInitMessage as usual (the caller always receives this first).
  2. If replayUserMessages is set, yield the user message as an SDKUserMessageReplay event.
  3. Package resultText into an SDKAssistantMessage and yield it.
  4. Yield a terminal SDKResultMessage with subtype: 'success' and result: resultText.

The short-circuit path matters for programmatic callers because it lets them submit slash commands without special-casing them. The SDKResultMessage always arrives, regardless of which path was taken.
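The four steps above can be sketched as a small generator. The message shapes are heavily simplified and shortCircuit is a hypothetical name; the real method is an async generator, and a synchronous one is used here only to keep the sketch short.

```typescript
// Simplified stand-ins for the SDK message variants involved in the short-circuit path.
type SketchMessage =
  | { type: 'system'; subtype: 'init' }
  | { type: 'user'; text: string }
  | { type: 'assistant'; text: string }
  | { type: 'result'; subtype: 'success'; result: string }

function* shortCircuit(
  prompt: string,
  resultText: string,
  replayUserMessages: boolean,
): Generator<SketchMessage> {
  yield { type: 'system', subtype: 'init' }                        // step 1
  if (replayUserMessages) yield { type: 'user', text: prompt }     // step 2
  yield { type: 'assistant', text: resultText }                    // step 3
  yield { type: 'result', subtype: 'success', result: resultText } // step 4
}

const withReplay = Array.from(shortCircuit('/help', 'Available commands: ...', true))
const withoutReplay = Array.from(shortCircuit('/help', 'Available commands: ...', false))
console.log(withReplay.map(m => m.type))
console.log(withoutReplay.map(m => m.type))
```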

9.4.5 The query() Loop and SDKMessage Mapping

When shouldQuery is true, the engine enters the main path. It yields the SDKSystemInitMessage and then opens a for await ... of loop over the query() generator:

typescript
for await (const message of query({
  messages,
  systemPrompt, userContext, systemContext,
  canUseTool: wrappedCanUseTool,
  toolUseContext: processUserInputContext,
  querySource: 'sdk',
  maxTurns, taskBudget,
})) {
  // translate internal Message types to SDKMessage types
}

The internal query() generator yields several message types, and submitMessage() maps each one to its corresponding SDK type:

An assistant role message containing the model's response is mapped to SDKAssistantMessage. The content blocks inside — text, tool use requests, thinking blocks — are preserved as-is.

A user role message containing tool results is mapped to SDKUserMessage. Each tool result in the message represents the output of one tool call.

A compact_boundary message marks where context compaction occurred. It is passed through as SDKCompactBoundaryMessage. A caller that stores messages for session resume needs this boundary to know which messages were produced after compaction.

A tombstone message indicates that a message was removed from the conversation during compaction. QueryEngine removes it from mutableMessages rather than yielding it, keeping the stored history consistent.

Progress and streaming fragment events are yielded only when includePartialMessages: true is set in config. Otherwise they are silently consumed.

After each complete message, the engine calls accumulateUsage() to merge the turn's token counts into this.totalUsage, ensuring the lifetime usage counter stays current.
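A reduced sketch of the translation step follows. All shapes and names are assumptions for illustration (in particular the kind field, the stream_event type name, and the translate helper); it shows only three of the cases described above: pass-through, tombstone removal, and partial-message filtering.

```typescript
// Hypothetical internal message kinds, reduced to the fields the sketch needs.
type Internal =
  | { kind: 'assistant'; id: string; text: string }
  | { kind: 'tombstone'; targetId: string }
  | { kind: 'progress'; text: string }

type Outgoing =
  | { type: 'assistant'; text: string }
  | { type: 'stream_event'; text: string }

function* translate(
  input: Internal[],
  store: { id: string }[],
  includePartialMessages: boolean,
): Generator<Outgoing> {
  for (const m of input) {
    switch (m.kind) {
      case 'assistant':
        store.push({ id: m.id })
        yield { type: 'assistant', text: m.text }
        break
      case 'tombstone': {
        // Not yielded: the referenced message is dropped from the stored history.
        const index = store.findIndex(s => s.id === m.targetId)
        if (index !== -1) store.splice(index, 1)
        break
      }
      case 'progress':
        if (includePartialMessages) yield { type: 'stream_event', text: m.text }
        break
    }
  }
}

const store: { id: string }[] = []
const emitted = Array.from(
  translate(
    [
      { kind: 'assistant', id: 'a1', text: 'draft' },
      { kind: 'progress', text: 'working' },
      { kind: 'tombstone', targetId: 'a1' },
      { kind: 'assistant', id: 'a2', text: 'final' },
    ],
    store,
    false,
  ),
)
console.log(emitted.map(m => m.type), store)
```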

9.4.6 The Final SDKResultMessage

When the query() generator completes, submitMessage() yields a single SDKResultMessage that summarises the entire turn:

typescript
yield {
  type: 'result',
  subtype: 'success',
  is_error: false,
  duration_ms: Date.now() - startTime,
  duration_api_ms: getTotalAPIDuration(),
  num_turns: ...,
  result: structuredOutputFromTool ?? resultText ?? '',
  stop_reason: lastStopReason,
  session_id: getSessionId(),
  total_cost_usd: getTotalCost(),
  usage: this.totalUsage,
  modelUsage: getModelUsage(),
  permission_denials: this.permissionDenials,
}

The result field contains the final output. When jsonSchema was provided in config, structuredOutputFromTool holds the output extracted from the structured-output tool call, and it takes priority over resultText. This is how programmatic callers receive machine-readable responses; the example in Section 9.6 treats the value as a JSON string and parses it with JSON.parse.

stop_reason conveys why the model stopped: end_turn (model decided it was done), max_turns (the maxTurns ceiling was reached), tool_use (the last message contained tool calls that were not executed — typically a budget cutoff), or other values defined by the API.

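permission_denials lists the tool calls that were blocked. Because the underlying permissionDenials array accumulates across the session (see Section 9.3), the list covers every denial so far, not only those from the current turn. Each entry carries the tool name, the tool-use ID, and the attempted input, giving the caller full visibility into what was refused.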

If an error occurred during execution, the subtype changes to 'error_during_execution' and is_error becomes true. If the model exceeded its turn limit, subtype becomes 'error_max_turns'. The caller should always check subtype before trusting result.
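One way to enforce that check is a discriminated union with an exhaustive switch. The sketch below uses simplified result shapes based on the fields named in this section; unwrap and SketchResult are hypothetical names, and the real SDKResultMessage carries many more fields.

```typescript
// Simplified result variants; only the fields this sketch needs are included.
type SketchResult =
  | { type: 'result'; subtype: 'success'; is_error: false; result: string }
  | { type: 'result'; subtype: 'error_during_execution'; is_error: true; result: string }
  | { type: 'result'; subtype: 'error_max_turns'; is_error: true; num_turns: number }

// Returns the result text on success and throws a descriptive error otherwise.
function unwrap(msg: SketchResult): string {
  switch (msg.subtype) {
    case 'success':
      return msg.result
    case 'error_max_turns':
      throw new Error(`stopped after ${msg.num_turns} turns`)
    case 'error_during_execution':
      throw new Error(msg.result)
  }
}

console.log(unwrap({ type: 'result', subtype: 'success', is_error: false, result: 'done' }))
```

Switching on subtype lets the compiler narrow each branch, so result is only read on variants that actually carry it.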


9.5 SDKMessage Variants

Every value yielded by submitMessage() conforms to the SDKMessage union type. The table below lists all variants, when they are emitted, and which fields are worth inspecting.

| type | subtype | When emitted | Key fields |
| --- | --- | --- | --- |
| system | init | First message of every submitMessage() call | session_id, model, tools, mcp_servers, permissionMode, apiKeySource |
| assistant | (none) | Each time the model produces a response | message.content (array of text, tool_use, thinking blocks) |
| user | (none) | Each time tool results are fed back to the model | message.content (array of tool_result blocks) |
| user | replay | When replayUserMessages: true and the loop replays a prior user message | message.content |
| system | compact_boundary | When context compaction occurs mid-session | summary (the compressed context text) |
| result | success | Turn completed normally | result, usage, total_cost_usd, duration_ms, stop_reason, permission_denials |
| result | error_during_execution | An unhandled exception occurred | is_error: true, result (error message text) |
| result | error_max_turns | maxTurns was reached before the model stopped naturally | is_error: true, num_turns |
| result | error_during_execution | Abort signal fired | is_error: true, result: 'Aborted' |

The system/init message deserves special attention. It is always the first message in the stream, and it is the only message that carries session metadata. A caller that stores messages for replay must save this message separately from the conversation history, because it describes the session context rather than the conversation content.

The result message is always the last message in the stream. A caller can use it as a sentinel to know that the generator has finished. If the caller is only interested in the final answer and not the intermediate steps, it can drain the generator and inspect only the last message.
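Draining a stream and keeping only its final value takes a few lines. A minimal sketch, where lastOf is a hypothetical helper and the demo generator stands in for submitMessage():

```typescript
// Consumes the whole stream and returns its final item (undefined if it was empty).
async function lastOf<T>(stream: AsyncIterable<T>): Promise<T | undefined> {
  let last: T | undefined
  for await (const item of stream) last = item
  return last
}

// Stand-in for submitMessage(): yields labels instead of real SDKMessage objects.
async function* demoStream(): AsyncGenerator<string> {
  yield 'system/init'
  yield 'assistant'
  yield 'result'
}

const finalMessage = lastOf(demoStream())
finalMessage.then(label => console.log(label))
```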


9.6 Programmatic Usage Example

The following example shows how to drive QueryEngine from a TypeScript program. It submits a single prompt, collects the stream, and prints the final result along with token usage.

typescript
import { QueryEngine } from './src/QueryEngine.js'
import { getTools } from './src/tools/index.js'
import { getCommands } from './src/commands/index.js'
import { createFileStateCache } from './src/utils/fileStateCache.js'
import { useAppStateStore } from './src/AppContext.js'

async function runHeadlessQuery(prompt: string): Promise<string> {
  // Build a minimal config for a headless, single-turn query.
  const engine = new QueryEngine({
    cwd: process.cwd(),
    tools: await getTools(),
    commands: await getCommands(),
    mcpClients: [],
    agents: [],
    canUseTool: async () => ({ behavior: 'allow' }),
    getAppState: () => useAppStateStore.getState(),
    setAppState: f => useAppStateStore.setState(f(useAppStateStore.getState())),
    readFileCache: createFileStateCache(),
    maxTurns: 10,
    verbose: false,
  })

  let finalResult = ''

  for await (const message of engine.submitMessage(prompt)) {
    if (message.type === 'result') {
      if (message.is_error) {
        throw new Error(`QueryEngine error: ${message.subtype} — ${message.result}`)
      }
      finalResult = message.result
      console.log(`Cost: $${message.total_cost_usd.toFixed(6)}`)
      console.log(`Input tokens: ${message.usage.input_tokens}`)
      console.log(`Output tokens: ${message.usage.output_tokens}`)
      console.log(`Turns: ${message.num_turns}`)
      if (message.permission_denials.length > 0) {
        console.warn('Blocked tools:', message.permission_denials.map(d => d.tool_name))
      }
    } else if (message.type === 'assistant') {
      // Print text blocks as they arrive for a streaming feel.
      for (const block of message.message.content) {
        if (block.type === 'text') process.stdout.write(block.text)
      }
    }
  }

  return finalResult
}

// Multi-turn example: reuse the same engine instance across turns.
async function runMultiTurnSession() {
  const engine = new QueryEngine({ /* ... same config ... */ })

  // First turn: ask a question.
  for await (const msg of engine.submitMessage('List the files in the src directory.')) {
    if (msg.type === 'result') console.log('Turn 1 done:', msg.result)
  }

  // Second turn: follow up. The engine retains the conversation history.
  for await (const msg of engine.submitMessage('Which of those files is the largest?')) {
    if (msg.type === 'result') console.log('Turn 2 done:', msg.result)
  }
}

The critical thing to note in the multi-turn example is that the engine instance is reused. this.mutableMessages accumulates both turns' exchanges, so the second call to submitMessage() gives the model the full context of the first turn. Creating a new QueryEngine instance for each turn would lose the history and force the model to work without context.

For structured JSON output, pass a jsonSchema field to the config:

typescript
const engine = new QueryEngine({
  // ... other fields ...
  jsonSchema: {
    type: 'object',
    properties: {
      files: { type: 'array', items: { type: 'string' } },
      count: { type: 'integer' },
    },
    required: ['files', 'count'],
  },
})

for await (const msg of engine.submitMessage('List all TypeScript files in src/')) {
  if (msg.type === 'result' && !msg.is_error) {
    // msg.result is the JSON string of the structured output.
    const data = JSON.parse(msg.result) as { files: string[]; count: number }
    console.log(`Found ${data.count} TypeScript files`)
  }
}

9.7 The Public SDK Type Surface: agentSdkTypes.ts

src/entrypoints/agentSdkTypes.ts is the single file that external consumers should import from. It re-exports from three main submodules, each with a distinct responsibility, along with a generated Settings type.

src/entrypoints/sdk/coreTypes.ts contains the serializable types: the SDKMessage union and all its variants, the HOOK_EVENTS constant array listing every lifecycle event name the SDK can fire, and the EXIT_REASONS constant array listing valid session termination reasons. These types are pure data — they carry no functions and no class instances, making them safe to serialize to JSON and send over a network boundary.

The full list of hook events defined at src/entrypoints/sdk/coreTypes.ts:25-53 shows the breadth of the lifecycle:

typescript
export const HOOK_EVENTS = [
  'PreToolUse', 'PostToolUse', 'PostToolUseFailure', 'Notification',
  'UserPromptSubmit', 'SessionStart', 'SessionEnd', 'Stop', 'StopFailure',
  'SubagentStart', 'SubagentStop', 'PreCompact', 'PostCompact',
  'PermissionRequest', 'PermissionDenied', 'Setup',
  'TeammateIdle', 'TaskCreated', 'TaskCompleted',
  'Elicitation', 'ElicitationResult', 'ConfigChange',
  'WorktreeCreate', 'WorktreeRemove', 'InstructionsLoaded',
  'CwdChanged', 'FileChanged',
] as const

These constants are used by the hook system to route lifecycle notifications to registered handlers. A programmatic caller that wants to observe tool usage can register on PreToolUse and PostToolUse. A caller managing multiple worktrees can listen on WorktreeCreate and WorktreeRemove. The Elicitation and ElicitationResult pair covers the mid-task question flow.
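Declaring the event names with as const also lets TypeScript derive a union type from the array, so misspelled event names fail at compile time. The registry below is a hypothetical sketch of that pattern using a subset of the events; the chapter does not show the real hook registration API, so HookRegistry, on, and emit are illustrative names only.

```typescript
// Subset of the event names, declared the same way as the real constant.
const SKETCH_HOOK_EVENTS = ['PreToolUse', 'PostToolUse', 'SessionEnd'] as const

// Union of the literal names: 'PreToolUse' | 'PostToolUse' | 'SessionEnd'
type HookEvent = (typeof SKETCH_HOOK_EVENTS)[number]
type Handler = (payload: unknown) => void

// Hypothetical registry; not the SDK's actual registration mechanism.
class HookRegistry {
  private handlers = new Map<HookEvent, Handler[]>()

  on(event: HookEvent, handler: Handler): void {
    const list = this.handlers.get(event) ?? []
    list.push(handler)
    this.handlers.set(event, list)
  }

  // Calls every handler for the event and returns how many were invoked.
  emit(event: HookEvent, payload: unknown): number {
    const list = this.handlers.get(event) ?? []
    for (const handler of list) handler(payload)
    return list.length
  }
}

const registry = new HookRegistry()
const observed: unknown[] = []
registry.on('PreToolUse', payload => observed.push(payload))
const invoked = registry.emit('PreToolUse', { tool_name: 'Read' })
const ignored = registry.emit('SessionEnd', {})
console.log(invoked, ignored, observed.length)
```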

src/entrypoints/sdk/runtimeTypes.ts contains the non-serializable types: the Options object accepted by the top-level query() function, and the Query interface that query() returns. These types include function references and AsyncIterable interfaces and cannot be serialized. Keeping them in a separate module makes it easy for build tools to tree-shake them away in contexts where only serializable types are needed.

src/entrypoints/sdk/settingsTypes.generated.ts exports the Settings type, which is generated from the settings schema. It is re-exported with export type rather than export * to prevent the generated constants from polluting the public namespace.

src/entrypoints/sdk/toolTypes.ts exports the tool-definition types and helpers. The most important export is the tool() factory function, which takes a name, description, input schema, and handler function and returns an SdkMcpToolDefinition. This is the standard way for external callers to define tools that the engine can invoke:

typescript
// agentSdkTypes.ts re-exports
export function tool(
  name: string,
  description: string,
  inputSchema: Record<string, unknown>,
  handler: (input: unknown) => Promise<unknown>,
  extras?: ToolExtras,
): SdkMcpToolDefinition

export function createSdkMcpServer(options: SdkMcpServerOptions): McpSdkServerConfigWithInstance

export class AbortError extends Error {}

export function query(params: {
  prompt: string | AsyncIterable<SDKUserMessage>
  options?: Options
}): Query

The top-level query() exported from agentSdkTypes.ts is a higher-level convenience function distinct from the internal query() in src/query.ts. It accepts either a simple string prompt or an async iterable of SDKUserMessage objects for streaming input, and it returns a Query interface that is itself an async iterable of SDKMessage objects. This is the function that most external SDK consumers will use when they do not need to manage session state themselves.

AbortError is a typed error subclass that is thrown when the caller's AbortController fires. Callers should catch this type explicitly to distinguish intentional aborts from unexpected errors.


9.8 Headless vs Interactive Mode

The distinction between headless and interactive mode is not a single flag — it is a constellation of behavioural differences that flow from the isNonInteractiveSession: true setting placed in processUserInputContext at src/QueryEngine.ts.

Rendering. In interactive mode, assistant messages are rendered through Ink, a React renderer for the terminal. Tool results appear as formatted boxes, permission requests open interactive dialogs, and the UI updates in real time as tokens stream in. In headless mode, none of this happens. QueryEngine does not import Ink, does not render any JSX, and has no concept of a terminal cursor position. The output is pure data: SDKMessage objects yielded from a generator.

Permission requests. In interactive mode, when a tool requires a permission the user has not pre-granted, the engine pauses and presents a confirmation prompt. The user types y or n. In headless mode, the canUseTool function passed in config makes the decision programmatically. If it returns allow, the tool runs. If it returns a denial, the tool is blocked and the denial is recorded. There is no pause, no human in the loop.

Elicitation. In interactive mode, when the model asks a mid-task question via the elicitation mechanism, the engine renders a form in the terminal and waits. In headless mode, the handleElicitation callback from config is called instead. If no callback was provided, the elicitation resolves with a null answer.

Tool availability. Some tools are not available in non-interactive sessions. Any tool that checks isNonInteractiveSession before running will short-circuit when called from QueryEngine. This is intentional: tools that open a file in the user's editor, or tools that display a visual diff in a GUI pane, make no sense in a headless context.

Slash command handling. In interactive mode, local slash commands (those with type: 'local') can render arbitrary JSX in the terminal. In headless mode, the JSX rendering path is skipped, and only the text output of the command is captured and yielded as a plain SDKAssistantMessage.

Message stream vs UI events. In interactive mode, the component tree subscribes to the message store via React state and re-renders when messages arrive. The caller never sees raw Message objects. In headless mode, the caller receives SDKMessage events directly and is responsible for any display or storage logic.

Understanding this distinction matters for callers that want to replicate some interactive behaviour in a headless context. If you want progress updates, set includePartialMessages: true. If you want to handle permission requests with a custom policy, provide a rich canUseTool implementation. If you want to handle elicitation, provide handleElicitation. The SDK surface gives you hooks for all of these; none are automatic.
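As an example of a custom policy, the sketch below allows only a read-only set of tools. It is illustrative rather than definitive: the tool names are hypothetical, the deny shape with a message field is an assumption (the chapter only shows behavior: 'allow'), and the real CanUseToolFn takes more arguments than the tool name.

```typescript
// The allow shape appears in Section 9.6; the deny shape is an assumption.
type Decision = { behavior: 'allow' } | { behavior: 'deny'; message: string }

// Hypothetical tool names for a read-only policy.
const READ_ONLY_TOOLS = new Set(['Read', 'Grep', 'Glob'])

function decide(toolName: string): Decision {
  if (READ_ONLY_TOOLS.has(toolName)) return { behavior: 'allow' }
  return { behavior: 'deny', message: `${toolName} is not on the read-only allowlist` }
}

// Async wrapper, since the permission function in this chapter is awaited.
async function readOnlyPolicy(toolName: string): Promise<Decision> {
  return decide(toolName)
}

console.log(decide('Read'))
console.log(decide('Bash'))
readOnlyPolicy('Grep').then(d => console.log(d.behavior))
```

A policy like this pairs naturally with the permission_denials list: every denied call is recorded, so the caller can review after the run what the allowlist blocked.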


Key Takeaways

QueryEngine is a thin stateful shell around the stateless query() function. Its durable state is small: chiefly the growing mutableMessages array, the cumulative totalUsage counter, and the accumulated permissionDenials list. Almost everything else is reconstructed fresh on each submitMessage() call.

QueryEngineConfig is the complete specification of a headless session. The three budget controls — maxTurns, maxBudgetUsd, and taskBudget — operate at different levels of abstraction: iteration count, dollar spend, and token count respectively. All three can be active simultaneously.

submitMessage() always yields exactly one SDKSystemInitMessage as its first event, and exactly one SDKResultMessage as its last event. Callers can always rely on this invariant regardless of whether the turn used the short-circuit path or the full query() loop.

The permission_denials field in SDKResultMessage is the audit trail for the session. In automated environments where canUseTool enforces a policy programmatically, this list tells the caller exactly what was blocked and with what inputs, enabling downstream logging and policy review.

The split between coreTypes.ts (serializable), runtimeTypes.ts (non-serializable), and toolTypes.ts (tool helpers) in the SDK entry point is a deliberate design that lets consumers import only what they need and enables the serializable types to be shared across process boundaries without pulling in Node.js-specific dependencies.

The isNonInteractiveSession: true flag is not a single switch but a propagating signal. It flows through ProcessUserInputContext into every subsystem that checks it — tool availability, permission handling, UI rendering, elicitation — and transforms each one from a human-facing interface into a programmatic one. The entire headless SDK is built on this one field being true.
