fix(ai): patch ai@6.0.134 to drop O(n²) partialOutput accumulation causing heap OOM on long agent runs (#184)

Production OOM'd (JS heap 1.85 GB / 2 GB limit) during a ~20-step, ~28k-chunk autonomous agent turn. Heap snapshot analysis (memlab) showed a single DefaultStreamTextResult retaining ~1.7 GB via the never-consumed leftover tee() branch of its internal baseStream.

Root cause in ai@6.0.134: streamText substitutes the default text() output strategy even when the caller passes NO `output` option. Its createOutputTransformStream then accumulates the ENTIRE turn text and, on EVERY text-delta, enqueues `{ part, partialOutput }` where partialOutput is a flat snapshot of all text so far (JSON.stringify flattens the cons-string), which costs O(n²) memory across the turn. Every consumer accessor tees baseStream and keeps the second branch as the new baseStream; the final leftover branch is never read, so its controller queue holds every chunk (28,225 x ~164 KB in the OOM'd run) for the life of the turn.

Fix (pnpm patch on both dist/index.js and dist/index.mjs):
- pass the raw, possibly-undefined `output` option into createOutputTransformStream instead of defaulting to text()
- when output == null, publish each text-delta immediately without accumulating turn text or producing partialOutput snapshots; streaming granularity is unchanged, and callers that DO request an output strategy keep the original behavior

Our server never uses partialOutputStream / experimental_output / the output option, so no behavior changes for us beyond memory.

Regression spec ai-sdk-partial-output.patch.spec.ts drives the real patched SDK with MockLanguageModelV3: asserts per-delta textStream granularity, an EMPTY experimental_partialOutputStream (tripwire: yields one cumulative partial per delta when unpatched), and the `PATCH(docmost` marker in both installed dist bundles.

Also documents the patch in AGENTS.md (must be re-created when bumping `ai`) and CHANGELOG.md.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-05 02:13:17 +03:00
6 changed files with 179 additions and 7 deletions
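
The quadratic growth described in the commit message can be illustrated with a small model. This is a sketch under stated assumptions, not the SDK's code: `QueuedChunk`, `simulateLeftoverQueue`, and `retainedChars` are hypothetical names, the array stands in for the controller queue of the never-read tee() branch, and the count is in characters rather than real heap bytes.

```typescript
// Hypothetical model of the never-read tee() branch queue (NOT the SDK's code).
interface QueuedChunk {
  part: string; // the text-delta itself
  partialOutput?: string; // flat snapshot of all turn text so far (unpatched path)
}

// Builds what the leftover queue would hold after a turn.
// withSnapshots = true models the unpatched default text() strategy;
// withSnapshots = false models the patched path for output == null.
function simulateLeftoverQueue(
  deltas: string[],
  withSnapshots: boolean,
): QueuedChunk[] {
  const queue: QueuedChunk[] = [];
  let turnText = '';
  for (const delta of deltas) {
    if (withSnapshots) {
      turnText += delta;
      queue.push({ part: delta, partialOutput: turnText });
    } else {
      queue.push({ part: delta });
    }
  }
  return queue;
}

// Total characters retained by the queue (deltas plus snapshots).
function retainedChars(queue: QueuedChunk[]): number {
  return queue.reduce(
    (sum, chunk) => sum + chunk.part.length + (chunk.partialOutput?.length ?? 0),
    0,
  );
}

// Same three deltas the regression spec feeds through the mock model.
const sampleDeltas = ['Hello', ', ', 'world!'];
console.log({
  unpatched: retainedChars(simulateLeftoverQueue(sampleDeltas, true)),
  patched: retainedChars(simulateLeftoverQueue(sampleDeltas, false)),
});
```

For n deltas of equal length k, the snapshots alone add up to k·n(n+1)/2 characters in this model, while the deltas themselves stay at k·n, which is why only very long turns hit the heap limit.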
@@ -294,7 +294,7 @@ Vite SPA. Code is organized by feature under `apps/client/src/features/*` (mirro
 - **Errors must never be swallowed or shown as generic messages.** Every caught error MUST (1) be logged in full to the console/logger — error name, message, stack, `cause`, and (for HTTP/provider failures) the status code and response body — and (2) be surfaced to the user with a *specific, human-readable explanation of what actually went wrong*, never a bare generic string like "Something went wrong" / "Could not start recording" / "Transcription failed". Include the real reason (the underlying error/provider message) in the user-facing text. On the server, wrap third-party/provider failures with `describeProviderError` (or equivalent) and rethrow as a meaningful HTTP status + message — never let them collapse into an opaque 500. On the client, `console.error(<context>, err)` the raw error AND show the extracted reason (e.g. `err.response?.data?.message`, or the error `name: message`) in the notification.
 - The version string shown in the UI comes from `APP_VERSION` (CI/Docker) or `git describe --tags --always` (local), resolved in `vite.config.ts` — not from `package.json`.
 - Server TS config is permissive (`noImplicitAny: false`, `strictNullChecks: false`, `no-explicit-any` lint disabled). Follow the existing relaxed style rather than tightening types broadly.
- Dependency versions are heavily pinned via `pnpm.overrides` and `pnpm.patchedDependencies` (`scimmy`, `yjs`) in the root `package.json`. Don't bump pinned/patched deps casually; the patches and overrides exist for compatibility/security reasons.
+- Dependency versions are heavily pinned via `pnpm.overrides` and `pnpm.patchedDependencies` (`scimmy`, `yjs`, `ai`) in the root `package.json`. Don't bump pinned/patched deps casually; the patches and overrides exist for compatibility/security reasons. The `ai@6.0.134` patch disables the SDK's O(n²) cumulative `partialOutput` accumulation when no output strategy is requested (server heap OOM on long agent runs, #184; tripwire test: `apps/server/src/integrations/ai/ai-sdk-partial-output.patch.spec.ts`) — it MUST be re-created via `pnpm patch` when bumping `ai`.
 - **Adding/renaming/removing an MCP tool requires updating `SERVER_INSTRUCTIONS`** in `packages/mcp/src/index.ts` — the intent-routing guide MCP clients receive on initialize. This applies both to inline `server.registerTool(...)` calls in `index.ts` and to specs in `packages/mcp/src/tool-specs.ts`. Enforced by `packages/mcp/test/unit/server-instructions.test.mjs`, which fails when a registered tool is not mentioned in the guide (deliberate opt-outs go into its `EXCEPTIONS` list). `packages/mcp/build/` is gitignored and rebuilt in CI/Docker via `pnpm build` (same convention as `git-sync`/`prosemirror-markdown`) — never commit it; rebuild locally after editing to run the tests.

 ## CI / release
@@ -169,6 +169,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ### Fixed

+- **The server no longer runs out of heap during long autonomous agent runs.** A
+  new pnpm patch on `ai@6.0.134` stops the SDK from building a cumulative
+  snapshot of the ENTIRE turn text on every streamed text-delta when no output
+  strategy was requested (our server never requests one). Unpatched, those
+  O(n²) `partialOutput` snapshots piled up in a never-consumed internal
+  `tee()` branch of the stream result — a ~20-step, ~28k-chunk agent run
+  retained ~1.7 GB and OOM'd the 2 GB JS heap. Streaming granularity is
+  unchanged; the patch must be re-created if `ai` is ever bumped. (#184)
 - **Internal links in exported Markdown no longer lose their visible text.** A
  link whose target page name had no file extension (e.g. a bare title) was
  collapsed to empty text during export, producing an unclickable, label-less
@@ -0,0 +1,92 @@
+import { readFileSync } from 'fs';
+import { streamText } from 'ai';
+import { MockLanguageModelV3, simulateReadableStream } from 'ai/test';
+
+/**
+ * Regression tests for patches/ai@6.0.134.patch (server heap OOM on long
+ * autonomous agent runs, #184).
+ *
+ * Unpatched ai@6.0.134 substitutes the default text() output strategy even
+ * when the caller passes NO `output` option. Its createOutputTransformStream
+ * then accumulates the ENTIRE turn text and, on EVERY text-delta, enqueues a
+ * flat snapshot of all text so far as `partialOutput` (O(n^2) memory). Those
+ * snapshots pile up in the never-consumed leftover tee() branch of
+ * DefaultStreamTextResult.baseStream, which is what OOM'd production during a
+ * ~28k-chunk agent turn. The pnpm patch skips partialOutput production
+ * entirely when no output strategy was requested, while keeping per-delta
+ * streaming granularity.
+ */
+describe('ai@6.0.134 pnpm patch: no partialOutput accumulation without an output strategy', () => {
+  const makeModel = () =>
+    new MockLanguageModelV3({
+      doStream: async () => ({
+        stream: simulateReadableStream({
+          chunks: [
+            { type: 'stream-start' as const, warnings: [] },
+            { type: 'text-start' as const, id: '1' },
+            { type: 'text-delta' as const, id: '1', delta: 'Hello' },
+            { type: 'text-delta' as const, id: '1', delta: ', ' },
+            { type: 'text-delta' as const, id: '1', delta: 'world!' },
+            { type: 'text-end' as const, id: '1' },
+            {
+              type: 'finish' as const,
+              finishReason: { unified: 'stop' as const, raw: 'stop' },
+              usage: {
+                inputTokens: {
+                  total: 1,
+                  noCache: undefined,
+                  cacheRead: undefined,
+                  cacheWrite: undefined,
+                },
+                outputTokens: { total: 1, text: 1, reasoning: undefined },
+              },
+            },
+          ],
+        }),
+      }),
+    });
+
+  it('preserves per-delta streaming granularity in textStream', async () => {
+    const result = streamText({ model: makeModel(), prompt: 'hi' });
+
+    const deltas: string[] = [];
+    for await (const delta of result.textStream) {
+      deltas.push(delta);
+    }
+
+    // The patch must NOT coalesce or drop deltas: three model deltas arrive
+    // as three separate textStream chunks.
+    expect(deltas).toEqual(['Hello', ', ', 'world!']);
+  });
+
+  it('emits NO partialOutput values when the caller did not request an output strategy', async () => {
+    const result = streamText({ model: makeModel(), prompt: 'hi' });
+
+    // Fully consume the primary stream first (mirrors production usage).
+    for await (const _ of result.textStream) {
+      // drain
+    }
+
+    const partials: unknown[] = [];
+    for await (const partial of result.experimental_partialOutputStream) {
+      partials.push(partial);
+    }
+
+    // TRIPWIRE: on unpatched ai@6.0.134 the default text() output strategy
+    // yields one cumulative partial per text-delta here (['Hello', 'Hello, ',
+    // 'Hello, world!']). An empty stream proves the patch is applied and no
+    // cumulative snapshots are being produced (and thus none can pile up in
+    // the leftover internal tee branch).
+    expect(partials).toEqual([]);
+  });
+
+  it('both installed dist builds (CJS and ESM) carry the patch marker', () => {
+    // Secondary guard: pins the patch to BOTH bundles the SDK ships, since
+    // the NestJS server consumes CJS while other tooling may load ESM.
+    const cjsPath = require.resolve('ai');
+    const mjsPath = cjsPath.replace(/index\.js$/, 'index.mjs');
+    expect(cjsPath).toMatch(/index\.js$/);
+    expect(readFileSync(cjsPath, 'utf8')).toContain('PATCH(docmost');
+    expect(readFileSync(mjsPath, 'utf8')).toContain('PATCH(docmost');
+  });
+});
@@ -96,7 +96,8 @@
  "pnpm": {
    "patchedDependencies": {
      "scimmy@1.3.5": "patches/scimmy@1.3.5.patch",
-      "yjs@13.6.30": "patches/yjs@13.6.30.patch"
+      "yjs@13.6.30": "patches/yjs@13.6.30.patch",
+      "ai@6.0.134": "patches/ai@6.0.134.patch"
    },
    "overrides": {
      "prosemirror-changeset": "2.4.0",
@@ -0,0 +1,68 @@
+diff --git a/dist/index.js b/dist/index.js
+index ae447a12f7823ec0a00837ee9f0eb809a610d5f8..a3402b2c2d021ef432cfa76e35d370073d525135 100644
+--- a/dist/index.js
+++ b/dist/index.js
+@@ -6578,9 +6578,19 @@ function createOutputTransformStream(output) {
+         controller.enqueue({ part: chunk, partialOutput: void 0 });
+         return;
+       }
+-      text2 += chunk.text;
+       textChunk += chunk.text;
+       textProviderMetadata = (_a21 = chunk.providerMetadata) != null ? _a21 : textProviderMetadata;
++      if (output == null) {
++        // PATCH(docmost #OOM): no output strategy requested -> publish each
++        // text-delta immediately and do NOT build cumulative partialOutput
++        // snapshots. Unpatched, the default text() output snapshots the ENTIRE
++        // accumulated turn text on every delta (O(n^2) memory) and those
++        // snapshots pile up in the never-consumed leftover tee branch of
++        // DefaultStreamTextResult.baseStream -> heap OOM on long agent turns.
++        publishTextChunk({ controller });
++        return;
++      }
++      text2 += chunk.text;
+       const result = await output.parsePartialOutput({ text: text2 });
+       if (result !== void 0) {
+         const currentJson = JSON.stringify(result.partial);
+@@ -6959,7 +6969,7 @@ var DefaultStreamTextResult = class {
+         })
+       );
+     }
+-    this.baseStream = stream.pipeThrough(createOutputTransformStream(output != null ? output : text())).pipeThrough(eventProcessor);
++    this.baseStream = stream.pipeThrough(createOutputTransformStream(output)).pipeThrough(eventProcessor);
+     const { maxRetries, retry } = prepareRetries({
+       maxRetries: maxRetriesArg,
+       abortSignal
+diff --git a/dist/index.mjs b/dist/index.mjs
+index 663875332e3f9a9bd167c25583c515876f42951b..b840b0502c9894df983e0154805abb80e70e6331 100644
+--- a/dist/index.mjs
+++ b/dist/index.mjs
+@@ -6501,9 +6501,19 @@ function createOutputTransformStream(output) {
+         controller.enqueue({ part: chunk, partialOutput: void 0 });
+         return;
+       }
+-      text2 += chunk.text;
+       textChunk += chunk.text;
+       textProviderMetadata = (_a21 = chunk.providerMetadata) != null ? _a21 : textProviderMetadata;
++      if (output == null) {
++        // PATCH(docmost #OOM): no output strategy requested -> publish each
++        // text-delta immediately and do NOT build cumulative partialOutput
++        // snapshots. Unpatched, the default text() output snapshots the ENTIRE
++        // accumulated turn text on every delta (O(n^2) memory) and those
++        // snapshots pile up in the never-consumed leftover tee branch of
++        // DefaultStreamTextResult.baseStream -> heap OOM on long agent turns.
++        publishTextChunk({ controller });
++        return;
++      }
++      text2 += chunk.text;
+       const result = await output.parsePartialOutput({ text: text2 });
+       if (result !== void 0) {
+         const currentJson = JSON.stringify(result.partial);
+@@ -6882,7 +6892,7 @@ var DefaultStreamTextResult = class {
+         })
+       );
+     }
+-    this.baseStream = stream.pipeThrough(createOutputTransformStream(output != null ? output : text())).pipeThrough(eventProcessor);
++    this.baseStream = stream.pipeThrough(createOutputTransformStream(output)).pipeThrough(eventProcessor);
+     const { maxRetries, retry } = prepareRetries({
+       maxRetries: maxRetriesArg,
+       abortSignal
@@ -44,6 +44,9 @@ overrides:
  ip-address: 10.1.1

 patchedDependencies:
+  ai@6.0.134:
+    hash: f60bfc3357e01e1f3978c6c40fdd65aeb33fefaad7179cde8676465b6c5ff4d9
+    path: patches/ai@6.0.134.patch
  scimmy@1.3.5:
    hash: 775d80f86830b2c5dd1a250c9802c10f8fc3da3c7898373de5aa0c23993d1673
    path: patches/scimmy@1.3.5.patch
@@ -623,10 +626,10 @@ importers:
        version: 8.3.0(socket.io-adapter@2.5.4)
      ai:
        specifier: ^6.0.134
-        version: 6.0.134(zod@4.3.6)
+        version: 6.0.134(patch_hash=f60bfc3357e01e1f3978c6c40fdd65aeb33fefaad7179cde8676465b6c5ff4d9)(zod@4.3.6)
      ai-sdk-ollama:
        specifier: ^3.8.1
-        version: 3.8.1(ai@6.0.134(zod@4.3.6))(zod@4.3.6)
+        version: 3.8.1(ai@6.0.134(patch_hash=f60bfc3357e01e1f3978c6c40fdd65aeb33fefaad7179cde8676465b6c5ff4d9)(zod@4.3.6))(zod@4.3.6)
      bcrypt:
        specifier: ^6.0.0
        version: 6.0.0
@@ -16355,17 +16358,17 @@ snapshots:

  agent-base@7.1.4: {}

-  ai-sdk-ollama@3.8.1(ai@6.0.134(zod@4.3.6))(zod@4.3.6):
+  ai-sdk-ollama@3.8.1(ai@6.0.134(patch_hash=f60bfc3357e01e1f3978c6c40fdd65aeb33fefaad7179cde8676465b6c5ff4d9)(zod@4.3.6))(zod@4.3.6):
    dependencies:
      '@ai-sdk/provider': 3.0.8
      '@ai-sdk/provider-utils': 4.0.21(zod@4.3.6)
-      ai: 6.0.134(zod@4.3.6)
+      ai: 6.0.134(patch_hash=f60bfc3357e01e1f3978c6c40fdd65aeb33fefaad7179cde8676465b6c5ff4d9)(zod@4.3.6)
      jsonrepair: 3.13.3
      ollama: 0.6.3
    transitivePeerDependencies:
      - zod

-  ai@6.0.134(zod@4.3.6):
+  ai@6.0.134(patch_hash=f60bfc3357e01e1f3978c6c40fdd65aeb33fefaad7179cde8676465b6c5ff4d9)(zod@4.3.6):
    dependencies:
      '@ai-sdk/gateway': 3.0.77(zod@4.3.6)
      '@ai-sdk/provider': 3.0.8