docs(html-embed): correct the encode-catch comment (returns '', not raw) (#78)

The encode catch comment promised 'fall back to raw' but the code returns '';
returning raw source wouldn't help anyway (un-encoded markup can't be atob-decoded
downstream, so decode would yield '' regardless), and a raw value in data-source
breaks the inert-storage guarantee. '' is the correct decode-symmetric failure —
fix the misleading comment to say so. Adds a codec test for the encode-throw path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
claude code agent 227
2026-06-21 03:17:37 +03:00
parent 099d31f594
commit afbc6b2202
2 changed files with 24 additions and 1 deletion


@@ -103,6 +103,21 @@ describe("html-embed codec — Node Buffer fallback branch", () => {
   });
 });
+describe("html-embed codec — encode failure fallback", () => {
+  it("returns '' (not raw source) when encoding throws", () => {
+    // Force the catch branch: a btoa that throws (e.g. simulating the
+    // Latin1-boundary error). The codec must NOT return the raw source —
+    // raw markup in data-source would fail to decode and undermine inert
+    // storage — it drops to "" symmetrically with the decode side.
+    const src = "<script>alert(1)</script>";
+    // @ts-expect-error — stub btoa with a throwing impl for this test.
+    globalThis.btoa = () => {
+      throw new Error("boom");
+    };
+    expect(encodeHtmlEmbedSource(src)).toBe("");
+  });
+});
 describe("html-embed codec — decode of malformed input (browser branch)", () => {
   it("returns '' for input atob rejects (catch branch)", () => {
     // atob throws on characters outside the base64 alphabet; the codec catches
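
The new test replaces `globalThis.btoa` with a throwing stub, and the hunk as shown contains no restore step; whether the suite restores it elsewhere (for example in an `afterEach`) is not visible here. As a minimal sketch, assuming no such cleanup exists, the stub could be scoped with `try`/`finally`. The helper name `withThrowingBtoa` and the `BtoaHost` type are hypothetical and not part of the project.

```typescript
// Hypothetical helper: run a callback while btoa throws, then restore it.
type BtoaHost = { btoa: (data: string) => string };

function withThrowingBtoa<T>(run: () => T): T {
  // Cast so the assignment type-checks without @ts-expect-error.
  const host = globalThis as unknown as BtoaHost;
  const original = host.btoa;
  host.btoa = () => {
    throw new Error("boom");
  };
  try {
    return run();
  } finally {
    // Restore even if the callback throws, so later tests see the real btoa.
    host.btoa = original;
  }
}

// Usage: inside the callback btoa throws; afterwards it works again.
const duringStub = withThrowingBtoa(() => {
  try {
    return btoa("x");
  } catch {
    return "";
  }
});
console.log(JSON.stringify(duringStub)); // ""
console.log(btoa("x")); // eA==
```

With a helper like this, the assertion on `encodeHtmlEmbedSource(src)` would run inside the callback, and the stub could not leak into the decode tests that follow.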


@@ -39,7 +39,15 @@ export function encodeHtmlEmbedSource(source: string): string {
     // Node fallback (server-side schema parsing has no global btoa).
     return Buffer.from(encodeURIComponent(source), "utf-8").toString("base64");
   } catch {
-    // Never swallow silently in a way that loses data: fall back to raw.
+    // On an encoding error we drop to "" rather than returning the raw source.
+    // Returning raw markup here is NOT a safe fallback: the value is stored in
+    // the `data-source` attribute and read back through decodeHtmlEmbedSource,
+    // which base64-decodes it — raw (un-encoded) HTML would make atob/
+    // decodeURIComponent throw and decode to "" anyway, and an un-encoded value
+    // sitting in the attribute defeats the inert-storage guarantee (it could
+    // become an injection vector). So "" is the correct, decode-symmetric
+    // failure mode. In practice this is essentially unreachable: btoa runs on
+    // the output of encodeURIComponent, which is always Latin1-safe ASCII.
     return "";
   }
 }
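
The decode-symmetry argument in the comment above can be checked with a small stand-alone sketch. The functions `encodeSketch` and `decodeSketch` are hypothetical stand-ins that only mirror the browser branch described in this commit (`btoa` over `encodeURIComponent`, `atob` then `decodeURIComponent`); they are not the project's `encodeHtmlEmbedSource` / `decodeHtmlEmbedSource`, whose full bodies are not shown here.

```typescript
// Hypothetical stand-ins for the codec, assuming the browser branch only.
function encodeSketch(source: string): string {
  try {
    // encodeURIComponent output is plain ASCII, so btoa accepts it.
    return btoa(encodeURIComponent(source));
  } catch {
    // Never hand back raw markup; "" matches the decode-side failure value.
    return "";
  }
}

function decodeSketch(encoded: string): string {
  try {
    return decodeURIComponent(atob(encoded));
  } catch {
    // atob rejects characters outside the base64 alphabet.
    return "";
  }
}

const rawMarkup = "<script>alert(1)</script>";

// Normal path: the value round-trips.
console.log(decodeSketch(encodeSketch(rawMarkup)) === rawMarkup); // true

// If raw markup were stored instead, decoding would still yield "".
console.log(JSON.stringify(decodeSketch(rawMarkup))); // ""

// One input that does reach the catch branch in this sketch: a lone
// surrogate makes encodeURIComponent throw a URIError.
console.log(JSON.stringify(encodeSketch("\uD800"))); // ""
```

The second call illustrates why a raw fallback would preserve nothing: the stored value would decode to `""` either way, while leaving un-encoded markup in the attribute.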