Chapter 18: The Services Layer — API, Analytics, and LSP
What You'll Learn
The previous chapters explored Claude Code's tool system, session management, compaction, and plugin infrastructure. All of those features rest on a quieter but equally important foundation: the services/ directory. It is neither the user-facing interface nor the task-executing tools — it is the connective tissue: API communication, observability, the Language Server Protocol, authentication, and a handful of background intelligence services.
This chapter is a reference map. Rather than exhaustively stepping through every file, the goal is to give you a coherent mental model: what each subdirectory is responsible for, where the key interfaces live, and how the modules work together. With that map in hand, you can navigate the source confidently whenever a deeper dive is needed.
The chapter covers:
- services/api/ — the Anthropic model API client
- services/analytics/ — the observability layer, including feature flags and event tracking
- services/lsp/ — Language Server Protocol integration
- services/oauth/ — OAuth2 authentication
- Background services: SessionMemory and autoDream
Directory Overview
services/
├── api/ # Anthropic API client
│ ├── client.ts # Multi-provider client factory (core)
│ ├── claude.ts # BetaMessageStreamParams assembly & streaming
│ ├── withRetry.ts # Retry logic and backoff
│ ├── usage.ts # Utilization queries (Max/Pro plans)
│ ├── errors.ts # Error type definitions
│ └── ...
├── analytics/ # Observability
│ ├── index.ts # Public logEvent API (zero dependencies)
│ ├── sink.ts # Routes events to Datadog / 1P
│ ├── growthbook.ts # GrowthBook feature flags
│ ├── datadog.ts # Datadog batch upload
│ └── firstPartyEventLogger.ts
├── lsp/ # Language Server Protocol
│ ├── LSPClient.ts # LSP client wrapper (vscode-jsonrpc)
│ ├── LSPServerManager.ts # Multi-server instance management
│ ├── LSPServerInstance.ts # Single server lifecycle
│ ├── LSPDiagnosticRegistry.ts
│ ├── manager.ts # Global singleton
│ └── config.ts # Loads LSP server config from plugins
├── oauth/ # OAuth2 authentication
│ ├── client.ts # Auth URL construction, token exchange & refresh
│ ├── auth-code-listener.ts # Local HTTP listener for callback
│ ├── crypto.ts # PKCE code challenge generation
│ └── index.ts
├── SessionMemory/ # Session memory extraction
├── autoDream/ # Background memory consolidation
├── compact/ # Context compaction (see Chapter 14)
├── mcp/ # MCP protocol (see Chapter 15)
└── plugins/ # Plugin system (see Chapter 17)

The diagram below shows how these service modules relate to the application core:
services/api/: The Anthropic API Client
Core Responsibility
This is Claude Code's sole gateway to the language model. Every conversation turn, every post-tool-call inference request, flows through here. Its central challenge is that a single high-level interface must transparently support four entirely different backend providers.
The Multi-Provider Client Factory
getAnthropicClient() in client.ts is the entry point for the entire API layer. It inspects environment variables to decide which SDK client to instantiate, and uniformly injects request headers, proxy settings, and timeout configuration:
// services/api/client.ts (simplified)
export async function getAnthropicClient({ maxRetries, model, ... }) {
// Four branches, one per provider
if (isEnvTruthy(process.env.CLAUDE_CODE_USE_BEDROCK)) {
const { AnthropicBedrock } = await import('@anthropic-ai/bedrock-sdk')
return new AnthropicBedrock({ awsRegion, ...ARGS }) as unknown as Anthropic
}
if (isEnvTruthy(process.env.CLAUDE_CODE_USE_FOUNDRY)) {
const { AnthropicFoundry } = await import('@anthropic-ai/foundry-sdk')
return new AnthropicFoundry({ azureADTokenProvider, ...ARGS }) as unknown as Anthropic
}
if (isEnvTruthy(process.env.CLAUDE_CODE_USE_VERTEX)) {
const { AnthropicVertex } = await import('@anthropic-ai/vertex-sdk')
return new AnthropicVertex({ region, googleAuth, ...ARGS }) as unknown as Anthropic
}
// Default: direct Anthropic API (supports both OAuth and API key auth)
return new Anthropic({ apiKey, authToken, ...ARGS })
}

A few design decisions worth noting. Each provider SDK is loaded via dynamic import(), so users who never touch Bedrock or Vertex do not pay the bundle cost of those libraries. All four paths share the same ARGS object that carries a unified timeout (600 seconds by default), proxy configuration, and custom headers. Every request automatically includes x-claude-code-session-id, which lets the backend correlate logs across retries.
For Vertex AI, region selection follows a strict priority chain: per-model environment variables take precedence over the global CLOUD_ML_REGION, which takes precedence over a configured default, which falls back to us-east5. This addresses the common pain point of multi-region model deployments.
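The priority chain reads naturally as a nullish-coalescing fall-through. This is a hypothetical sketch; the parameter names are stand-ins, not the identifiers used in client.ts:

```typescript
// Illustrative sketch of the Vertex region resolution order described
// above; the actual variable names in client.ts may differ.
function resolveVertexRegion(
  perModelEnv?: string,       // per-model override, highest priority
  globalEnv?: string,         // CLOUD_ML_REGION
  configuredDefault?: string, // from user configuration
): string {
  return perModelEnv ?? globalEnv ?? configuredDefault ?? 'us-east5'
}
```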
Retry Logic
withRetry.ts is the resilience backbone of the system. It is an AsyncGenerator: while waiting between attempts it yields a system message so the REPL can display a "retrying…" status indicator, then performs the actual backoff sleep:
// services/api/withRetry.ts (excerpt)
export async function* withRetry<T>(
getClient: () => Promise<Anthropic>,
operation: (client, attempt, context) => Promise<T>,
options: RetryOptions,
): AsyncGenerator<SystemAPIErrorMessage, T> {
for (let attempt = 1; attempt <= maxRetries + 1; attempt++) {
try {
return await operation(client, attempt, retryContext)
} catch (error) {
// Force a fresh client on auth errors (401, revoked OAuth, Bedrock 403, etc.)
if (needsFreshClient(error)) {
client = await getClient()
}
// Yield a progress message so the UI can display the wait state
yield createSystemAPIErrorMessage(error, delayMs, attempt, maxRetries)
await sleep(delayMs, signal)
}
}
}

The backoff strategy has several noteworthy behaviors.
Differentiated 529 (overload) handling. Foreground queries — those the user is actively waiting for — retry up to three times. Background queries (summary generation, title inference, suggestion classifiers) bail immediately on 529, because retrying them during a capacity cascade multiplies gateway load without any user-visible benefit.
Opus-to-Sonnet model fallback. After three consecutive 529 errors, if a fallback model is configured, the retry loop throws FallbackTriggeredError rather than continuing. The caller interprets this signal and restarts the query on the fallback model. The responsibility for switching models stays outside the retry loop, keeping the loop's scope clean.
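The signal-based handoff can be sketched as follows. The class name FallbackTriggeredError comes from the description above; the helper function and its wiring are illustrative assumptions:

```typescript
// Hedged sketch of the fallback signal: after enough consecutive
// overload errors the retry loop throws a dedicated error type instead
// of retrying, and the caller restarts on the fallback model.
class FallbackTriggeredError extends Error {
  constructor(public fallbackModel: string) {
    super(`falling back to ${fallbackModel}`)
  }
}

// Hypothetical helper; the threshold of 3 matches the text above.
function handleOverload(consecutive529s: number, fallbackModel?: string): void {
  if (consecutive529s >= 3 && fallbackModel) {
    throw new FallbackTriggeredError(fallbackModel)
  }
  // otherwise: keep retrying with backoff
}
```

Keeping the throw inside the loop but the model switch outside it means the retry generator never needs to know which models exist.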
Context overflow self-repair. A 400 "input length and max_tokens exceed context limit" error is parsed with a regex to extract the actual token counts. The loop calculates a safe max_tokens value and retries automatically, rather than surfacing a cryptic error to the user.
// Exponential backoff with 25% random jitter
export function getRetryDelay(
attempt: number,
retryAfterHeader?: string | null,
maxDelayMs = 32000,
): number {
if (retryAfterHeader) {
return parseInt(retryAfterHeader, 10) * 1000
}
const baseDelay = Math.min(BASE_DELAY_MS * Math.pow(2, attempt - 1), maxDelayMs)
const jitter = Math.random() * 0.25 * baseDelay
return baseDelay + jitter
}

Usage Tracking
usage.ts provides quota utilization queries for claude.ai subscribers via /api/oauth/usage, returning 5-hour and 7-day window utilization percentages. These numbers are consumed by the REPL's status display.
services/analytics/: The Observability Layer
Design Philosophy: Zero-Dependency Event Queue
analytics/index.ts is the public API for the entire observability layer, and it has a deliberate constraint: this module has no internal project dependencies. The reasoning is straightforward — nearly every other module needs to log events. If analytics/index.ts imported from those same modules, the result would be circular dependencies. The solution is a queue-then-sink architecture:
// services/analytics/index.ts (excerpt)
let sink: AnalyticsSink | null = null
const eventQueue: QueuedEvent[] = []
// Before the sink is attached, all events are queued
export function logEvent(eventName: string, metadata: LogEventMetadata): void {
if (sink === null) {
eventQueue.push({ eventName, metadata, async: false })
return
}
sink.logEvent(eventName, metadata)
}
// Called during app startup to attach the actual routing logic
export function attachAnalyticsSink(newSink: AnalyticsSink): void {
if (sink !== null) return // idempotent
sink = newSink
// Drain asynchronously to avoid blocking the startup path
queueMicrotask(() => { /* drain eventQueue */ })
}

This design allows any module to safely call logEvent() before application initialization completes, with zero risk of losing early events.
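A minimal self-contained sketch of the queue-then-sink mechanics, with simplified types (the real module's interfaces are richer):

```typescript
// Simplified model of the queue-then-sink pattern described above.
type QueuedEvent = { eventName: string; metadata: Record<string, string> }
interface AnalyticsSink { logEvent(name: string, meta: Record<string, string>): void }

let sink: AnalyticsSink | null = null
const eventQueue: QueuedEvent[] = []

function logEvent(eventName: string, metadata: Record<string, string>): void {
  if (sink === null) {
    eventQueue.push({ eventName, metadata }) // no sink yet: buffer the event
    return
  }
  sink.logEvent(eventName, metadata)
}

function attachAnalyticsSink(newSink: AnalyticsSink): void {
  if (sink !== null) return // idempotent
  sink = newSink
  queueMicrotask(() => {
    // Drain events that arrived before the sink existed, in arrival order
    while (eventQueue.length > 0) {
      const ev = eventQueue.shift()!
      sink!.logEvent(ev.eventName, ev.metadata)
    }
  })
}
```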
The codebase also uses the type system as a privacy guardrail:
// Forces an explicit cast, making the developer consciously verify
// that the value contains no code snippets or file paths
export type AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS = never

Because the type is never, any string value being passed into event metadata must be explicitly cast with this lengthy type name. This makes accidental logging of sensitive data a compile-time error rather than a runtime surprise.
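To see why this works, here is a hedged sketch of the pattern; the helper function is illustrative, not the real logEvent signature:

```typescript
// No value is assignable to `never`, so a plain string literal fails to
// compile as metadata; only a deliberate double cast gets through.
type AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS = never

// Hypothetical helper standing in for the real event dispatch.
function logVerifiedEvent(
  name: string,
  meta: Record<string, AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS>,
): string {
  return name
}

// logVerifiedEvent('tengu_demo', { model: 'opus' }) // <- compile error
const ok = logVerifiedEvent('tengu_demo', {
  // The loud cast is the developer's explicit privacy acknowledgement
  model: 'opus' as unknown as AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS,
})
```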
GrowthBook Feature Flags
growthbook.ts encapsulates integration with GrowthBook, handling everything from feature rollouts to A/B experiments. It provides two core read semantics:
getFeatureValue_CACHED_MAY_BE_STALE(feature, default) reads from an in-memory or disk cache synchronously. It is appropriate for startup-critical paths and hot render loops where the overhead of an async call is unacceptable. The value may have been written by a previous process.
getDynamicConfig_BLOCKS_ON_INIT(feature, default) awaits GrowthBook initialization and returns a fresh server value. It is appropriate for security gates and subscription checks where a stale false could unfairly deny access.
The disk cache is written after every successful remote payload fetch — both at initialization and during periodic 6-hour refreshes. The write is a complete replacement, not a merge, so features deleted server-side are also pruned from disk on the next successful payload.
Datadog Integration
datadog.ts implements a lightweight batch-upload mechanism. Events accumulate in a memory queue and are flushed either every 15 seconds or when the batch reaches 100 entries. Only events in the DATADOG_ALLOWED_EVENTS allowlist are forwarded, preventing inadvertent transmission of unapproved event types.
Third-party provider users (Bedrock, Vertex, Foundry) are excluded from Datadog tracking, since those events have no analytical value for Anthropic's own infrastructure monitoring.
User identification uses a privacy-preserving bucketing scheme: the user ID is SHA-256 hashed and reduced modulo 30. This lets monitoring alerts fire on "number of affected buckets" — a proxy for unique users — without storing any user-identifiable data.
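The hash-then-modulo shape can be sketched with Node's crypto module. Which bytes of the digest the real implementation reduces is an assumption here; only the SHA-256-modulo-30 scheme is taken from the text:

```typescript
import { createHash } from 'node:crypto'

// Sketch of the privacy-preserving bucketing: hash the user ID, then
// reduce to one of 30 buckets. Using the first 4 digest bytes as an
// unsigned integer is an illustrative choice, not confirmed detail.
function userBucket(userId: string, buckets = 30): number {
  const digest = createHash('sha256').update(userId).digest()
  return digest.readUInt32BE(0) % buckets
}
```

The same user always lands in the same bucket, so alert thresholds on distinct buckets approximate distinct affected users without storing IDs.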
Event flow:
user action → logEvent() → sink.logEvent() → two backends
├── trackDatadogEvent() → Datadog (batched)
└── logEventTo1P() → First-party event log

services/lsp/: Language Server Protocol
What LSP Provides to Claude Code
The Language Server Protocol is a standard defined by Microsoft for editor-to-language-tool communication. Claude Code uses it to receive three categories of information:
- Diagnostics — compiler and linter errors and warnings. These are automatically injected as attachments at the start of the next conversation turn, giving the model concrete awareness of "there is a type error here."
- Code actions — quick-fix suggestions from the language server.
- Symbol resolution — go-to-definition responses that help the model understand code structure.
Connection Lifecycle
The LSP integration is organized in a four-layer stack:
manager.ts (global singleton, lifecycle management)
└── LSPServerManager (routes requests across multiple servers by file extension)
├── LSPServerInstance (single server process + state machine)
│ └── LSPClient (vscode-jsonrpc stdio wrapper)
└── LSPDiagnosticRegistry (stores async-arriving diagnostics)

manager.ts owns the global singleton and initializes it asynchronously during startup without blocking the main thread:
// services/lsp/manager.ts (simplified)
export function initializeLspServerManager(): void {
if (isBareMode()) return // No LSP in --bare / non-interactive mode
lspManagerInstance = createLSPServerManager()
initializationState = 'pending'
initializationPromise = lspManagerInstance.initialize()
.then(() => {
initializationState = 'success'
registerLSPNotificationHandlers(lspManagerInstance) // wire up diagnostic listener
})
.catch(error => {
initializationState = 'failed'
lspManagerInstance = undefined // unusable instance is discarded
})
}

LSPServerManager maps file extensions to server instances. All server configuration comes from plugins (see Chapter 17), rather than being hardcoded:
// services/lsp/config.ts
export async function getAllLspServers() {
const { enabled: plugins } = await loadAllPluginsCacheOnly()
// Load configurations from all plugins in parallel; later plugins win on collision
const results = await Promise.all(plugins.map(p => getPluginLspServers(p, errors)))
return { servers: mergedServerConfigs }
}

Servers are started lazily on first use (ensureServerStarted()), not during initialization. This means an LSP plugin that covers TypeScript does not launch tsserver until a TypeScript file is actually opened.
Diagnostic Injection
LSPDiagnosticRegistry.ts follows the same pattern as the async hook registry: diagnostics arrive asynchronously via publishDiagnostics notifications, are stored in a map keyed by file URI, and are delivered as message attachments when getAttachments() is called at the start of the next turn.
Two constants bound the volume of diagnostic content injected into the context window: a maximum of 10 diagnostics per file, and 30 total across all files. An LRU cache tracks which diagnostics have already been delivered so that the same error is not re-reported on every subsequent turn in a long session.
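A minimal sketch of the capping logic, assuming the per-file and total limits stated above (the real registry also tracks prior delivery via the LRU, which is omitted here):

```typescript
// Bound the diagnostics injected into context: at most `perFile` per
// file and `total` across all files, per the limits described above.
type Diagnostic = { uri: string; message: string }

function capDiagnostics(
  byFile: Map<string, Diagnostic[]>,
  perFile = 10,
  total = 30,
): Diagnostic[] {
  const out: Diagnostic[] = []
  for (const [, diags] of byFile) {
    for (const d of diags.slice(0, perFile)) {
      if (out.length >= total) return out // global cap reached
      out.push(d)
    }
  }
  return out
}
```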
services/oauth/: Authentication
OAuth2 PKCE Flow
Claude Code supports two authentication paths: a static API key and OAuth2 for claude.ai subscribers. The OAuth2 implementation uses PKCE (Proof Key for Code Exchange), the standard countermeasure against authorization code interception attacks in CLI tools:
User CLI Authorization Server
| | |
| /login | |
|------------->| |
| | generate code_verifier |
| | SHA256(verifier)→challenge |
| | |
| | buildAuthUrl() opens browser |
| |----------------------------->|
| | |
| user grants | |
|<-------------------------------code---------|
| | |
| | exchangeCodeForTokens() |
| | (code + verifier) |
| |----------------------------->|
| |<------- access_token --------|
| | refresh_token |

oauth/crypto.ts generates a cryptographically random code_verifier. buildAuthUrl() constructs the authorization URL embedding the code_challenge (the SHA-256 hash of the verifier). The local callback is received by a temporary HTTP server on localhost:PORT/callback started by auth-code-listener.ts, which shuts itself down immediately after receiving the code.
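The verifier and challenge steps can be sketched with Node's crypto primitives; the verifier length and encoding details are assumptions consistent with RFC 7636, not necessarily identical to crypto.ts:

```typescript
import { createHash, randomBytes } from 'node:crypto'

// PKCE sketch: a random verifier stays local; only its hash (the
// challenge) travels in the authorization URL.
function generateCodeVerifier(): string {
  // RFC 7636 allows 43-128 chars; 32 random bytes base64url-encode to 43
  return randomBytes(32).toString('base64url')
}

function codeChallengeFromVerifier(verifier: string): string {
  // S256 method: base64url(SHA-256(verifier))
  return createHash('sha256').update(verifier).digest('base64url')
}
```

Because the token exchange later requires the original verifier, an attacker who intercepts only the authorization code cannot redeem it.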
Token Refresh Strategy
refreshOAuthToken() contains an important optimization: if the global config already holds complete profile data (including subscription type), it skips the extra /api/oauth/profile round-trip. The comment in the source estimates this eliminates roughly 7 million requests per day at fleet scale.
// services/oauth/client.ts (excerpt)
export async function refreshOAuthToken(refreshToken, { scopes } = {}) {
const response = await axios.post(TOKEN_URL, {
grant_type: 'refresh_token',
refresh_token: refreshToken,
client_id: CLIENT_ID,
scope: CLAUDE_AI_OAUTH_SCOPES.join(' '),
})
// Skip profile fetch when full profile cache is already populated
if (hasFullProfileCache()) {
return buildTokensFromResponse(response.data, cachedProfile)
}
// Otherwise fetch latest profile (subscription type, quota info, etc.)
const profile = await getOauthProfileFromOauthToken(accessToken)
return buildTokensFromResponse(response.data, profile)
}

Token expiration is checked proactively before API requests and before GrowthBook initialization, preventing unnecessary 401 errors by refreshing ahead of time.
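The proactive check amounts to refreshing within a safety margin of expiry rather than waiting for a 401; the margin value below is an illustrative assumption:

```typescript
// Refresh when the token is within `marginMs` of expiring, so requests
// never go out with a token that is about to lapse. The 5-minute margin
// is a placeholder, not a confirmed constant.
function shouldRefreshToken(
  expiresAtMs: number,
  nowMs: number,
  marginMs = 5 * 60 * 1000,
): boolean {
  return nowMs >= expiresAtMs - marginMs
}
```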
Other Background Services
SessionMemory: Session-Scoped Notes
services/SessionMemory/ is a background "note-taker." It registers a postSamplingHook that fires after each conversation turn and checks whether it is time to extract and update the session memory file. The extraction itself runs in a forked subagent, isolated from the main conversation's context window. Memory files are stored as Markdown under ~/.claude/memory/, one per project.
Trigger thresholds are configurable: extraction only begins after a minimum number of tool calls (initialization threshold), and then runs periodically every N tool calls thereafter (update threshold). This prevents extraction from running too frequently and consuming quota on short sessions.
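The two-threshold trigger can be expressed as a pure predicate; the parameter names here are illustrative:

```typescript
// Extraction starts only after `initThreshold` tool calls, then repeats
// every `updateThreshold` calls thereafter, per the description above.
function shouldExtractMemory(
  toolCallCount: number,
  initThreshold: number,
  updateThreshold: number,
): boolean {
  if (toolCallCount < initThreshold) return false // session still too short
  return (toolCallCount - initThreshold) % updateThreshold === 0
}
```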
autoDream: Cross-Session Memory Consolidation
services/autoDream/ handles long-horizon memory consolidation across sessions. It applies three gates in cheapest-first order:
- Time gate — has at least minHours elapsed since the last consolidation? (Single stat call, minimal cost.)
- Session gate — have at least minSessions new transcripts appeared since then? (Directory listing with mtime filtering.)
- Lock gate — is another process already consolidating? (File-based mutex preventing concurrent writes.)
Only when all three gates pass does autoDream launch a forked subagent that reads historical session summaries and writes synthesized insights into the user's memory files (CLAUDE.md or a configured path). Both success and failure are reported via logEvent() for monitoring.
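The cheapest-first ordering amounts to a short-circuiting loop over gate predicates; the gate bodies below are stand-ins for the real stat, listing, and lock checks:

```typescript
// Evaluate gates in order and stop at the first failure, so expensive
// checks never run when a cheaper gate has already vetoed the run.
// The `ran` parameter exists only to make the short-circuit observable.
function passesAllGates(
  gates: Array<{ name: string; check: () => boolean }>,
  ran: string[] = [],
): boolean {
  for (const gate of gates) {
    ran.push(gate.name)
    if (!gate.check()) return false // short-circuit
  }
  return true
}
```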
compact, mcp, plugins
These three service modules have dedicated chapters:
- services/compact/ — context compaction, Chapter 14
- services/mcp/ — Model Context Protocol, Chapter 15
- services/plugins/ — plugin installation and management, Chapter 17
How the Modules Collaborate
During a single conversation turn, the services layer participates as follows:
User input
│
▼
QueryEngine.query()
│
├──► analytics/index.logEvent('tengu_query_*') # report query start
│
├──► lsp/manager.getLspServerManager() # fetch pending diagnostics
│ └── LSPDiagnosticRegistry.checkForLSPDiagnostics()
│
├──► oauth (via client.ts getAnthropicClient) # verify token validity
│
├──► api/withRetry → api/client.getAnthropicClient # invoke the model
│ ├── retry loop (yields progress messages)
│ └── streaming response consumption
│
├──► analytics/index.logEvent('tengu_api_success') # report result
│
└──► SessionMemory (postSamplingHook) # async memory update

Each service module is independently responsible for its domain and exposes a focused interface. The api/ layer does not know about analytics; analytics/index.ts has no dependencies on api/. This isolation is what allows the codebase to scale without the coupling problems that typically accumulate in large applications.
Key Takeaways
The services/ directory is Claude Code's infrastructure layer. Each subdirectory has a well-defined scope, and understanding those scopes is more valuable than memorizing individual function signatures.
api/ is the sole model communication outlet. The factory pattern in client.ts provides transparent multi-provider support, and the AsyncGenerator design in withRetry.ts cleanly separates the retry state machine from the progress-reporting concern.
analytics/ solves the circular-dependency problem with a queue-then-sink separation. The never-typed metadata marker turns a potential privacy vulnerability into a compile-time enforcement mechanism, a rare example of the type system doing policy work.
lsp/ brings editor-grade language intelligence to the terminal. The async diagnostic registry bridges the gap between LSP's push-based notification model and the conversation's pull-based attachment model.
oauth/ implements a complete PKCE flow and optimizes routine token refreshes to minimize backend load, demonstrating how infrastructure-level decisions can have meaningful fleet-wide cost implications.
When you encounter calls to these services while reading other parts of the source, return to this chapter's map. The services are designed to be used without needing to understand their internals — but when you do need to understand them, the structure described here is the entry point.