test(git-sync): exhaustive converter coverage + fix 3 round-trip data-loss bugs

Coder↔reviewer design loop (9 rounds, reviewer verdict: exhaustive) produced
92 specs; implemented +123 tests (465 -> 588 passing). The new round-trip
coverage exposed three genuine data-loss bugs in the Markdown<->ProseMirror
converter, all now FIXED (round-trip is lossless for these):

1. pageBreak was lost on export (no converter case -> rendered to "" and the
   node vanished). Now emits <div data-type="pageBreak"></div>, which the schema
   parses back -> round-trips.
2. A block image between blocks left an empty <p> artifact after import-hoisting,
   producing a phantom blank-gap diff on every sync. markdownToProseMirror now
   strips content-less paragraphs after generateJSON — with a schema-validity
   guard that keeps the obligatory single empty paragraph in `content: "block+"`
   containers (tableCell/tableHeader/blockquote/column/callout/doc), so empty
   cells/quotes never become an invalid `content: []`.
3. The `code` mark combined with another mark was not byte-stable (emitted nested
   HTML that the schema's `code` `excludes:"_"` collapsed on import). The
   converter now emits code-only when `code` co-occurs, matching the editor.

New coverage spans media/diagram/details/columns/math/mention attribute
round-trips, converter emission branches, git error paths, and engine decision
branches. A dedicated test pins the empty-container schema validity (the review
catch on the bug-2 fix).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
claude code agent 227
2026-06-23 06:50:20 +03:00
parent 59a6f65f77
commit 1bc664d25d
18 changed files with 2902 additions and 50 deletions
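
The byte-stability property the commit message describes (export, re-import, export again, and compare the two Markdown strings) can be sketched in isolation. This is only a minimal illustration: `ToyBlock`, `toyExport`, `toyImport`, and `checkRoundTrip` are hypothetical names, and the toy converter understands just a paragraph and a pageBreak. It is not the Docmost converter.

```typescript
// Hypothetical minimal document model, for illustration only.
type ToyBlock =
  | { type: "paragraph"; text: string }
  | { type: "pageBreak" };

const PAGE_BREAK_HTML = '<div data-type="pageBreak"></div>';

// Toy exporter: one block per "\n\n"-separated chunk.
function toyExport(blocks: ToyBlock[]): string {
  return blocks
    .map((b) => (b.type === "pageBreak" ? PAGE_BREAK_HTML : b.text))
    .join("\n\n");
}

// Toy importer: the inverse of toyExport for the two block kinds above.
function toyImport(markdown: string): ToyBlock[] {
  if (markdown === "") return [];
  return markdown.split("\n\n").map((chunk): ToyBlock =>
    chunk === PAGE_BREAK_HTML
      ? { type: "pageBreak" }
      : { type: "paragraph", text: chunk },
  );
}

// Export, re-import, export again; "stable" means md2 === md1.
function checkRoundTrip(blocks: ToyBlock[]): { md1: string; md2: string; stable: boolean } {
  const md1 = toyExport(blocks);
  const md2 = toyExport(toyImport(md1));
  return { md1, md2, stable: md1 === md2 };
}

const roundTripResult = checkRoundTrip([
  { type: "paragraph", text: "before" },
  { type: "pageBreak" },
  { type: "paragraph", text: "after" },
]);
console.log(roundTripResult.stable);
```

If the exporter had no pageBreak case and emitted an empty string for it, the re-imported document would lose the node, which is the kind of loss this check is meant to surface.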


@@ -68,21 +68,21 @@ export function convertProseMirrorToMarkdown(content: any): string {
let textContent = node.text || "";
// Apply marks (bold, italic, code, etc.)
if (node.marks) {
-// Markdown code spans (`...`) cannot carry inner formatting, so when a
-// run has the `code` mark alongside ANY other mark, backtick syntax
-// would leak literal ** / []() into the code text. In that case emit
-// nested HTML (<code> innermost, the other marks wrapping it as HTML)
-// so the output is at least well-formed and re-parseable.
-//
-// NOTE: this does NOT round-trip both marks. The schema's `code` mark
-// has `excludes: "_"` (it excludes every other mark), so on import the
-// co-occurring mark is always dropped — the run comes back as `code`
-// only. We keep the emission simple and accept that the other mark is
-// lost; preserving both is impossible while `code` excludes them.
-// Only use the backtick form when `code` is the sole mark.
+// The schema's `code` mark declares `excludes: "_"` — it excludes every
+// other inline mark — so the editor can NEVER produce a text run that
+// carries `code` together with another mark, and on import any
+// co-occurring mark is always dropped (the run comes back as code-only).
+// The lossless, byte-stable behavior is therefore: when a run has the
+// `code` mark, emit ONLY the backtick code span and ignore every other
+// mark, so md1 is already code-only and md2 === md1. Runs WITHOUT a code
+// mark are rendered exactly as before.
const markTypes = node.marks.map((m: any) => m.type);
const hasCode = markTypes.includes("code");
-const codeCombined = hasCode && markTypes.length > 1;
+if (hasCode) {
+textContent = `\`${textContent}\``;
+return textContent;
+}
+const codeCombined = false;
for (const mark of node.marks) {
switch (mark.type) {
case "bold":
@@ -571,6 +571,13 @@ export function convertProseMirrorToMarkdown(content: any): string {
return `<div ${parts.join(" ")}>${inner}</div>`;
}
case "pageBreak":
// Emit the schema-matching div[data-type="pageBreak"] so marked passes
// it through as a block and generateJSON rebuilds the pageBreak atom.
// Without this case the node fell through to `default` and rendered ""
// (the divider silently disappeared and could not round-trip).
return `<div data-type="pageBreak"></div>`;
case "subpages":
return "{{SUBPAGES}}";
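
The code-only rule from the first hunk can be shown as a standalone function. This is a sketch under assumptions: `ToyMark`, `ToyTextRun`, and `renderTextRun` are hypothetical names, and only `bold` and `italic` are handled besides `code`, whereas the real converter covers more marks and works on untyped ProseMirror JSON.

```typescript
// Hypothetical shapes, for illustration only.
interface ToyMark {
  type: string;
}
interface ToyTextRun {
  text: string;
  marks?: ToyMark[];
}

function renderTextRun(run: ToyTextRun): string {
  const marks = run.marks ?? [];
  // The schema's `code` mark excludes every other mark, so when `code` is
  // present we emit the backtick span alone and ignore the rest.
  if (marks.some((m) => m.type === "code")) {
    return "`" + run.text + "`";
  }
  let out = run.text;
  for (const mark of marks) {
    if (mark.type === "bold") out = `**${out}**`;
    else if (mark.type === "italic") out = `*${out}*`;
  }
  return out;
}

console.log(renderTextRun({ text: "x", marks: [{ type: "bold" }, { type: "code" }] }));
console.log(renderTextRun({ text: "x", marks: [{ type: "bold" }] }));
```

Because the first export is already code-only, a re-import cannot drop anything further, which is what makes the second export identical to the first.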


@@ -337,6 +337,44 @@ function bridgeTaskLists(html: string): string {
return document.body.innerHTML;
}
/**
* Recursively strip content-less paragraph nodes from a generated doc.
*
* A block-level atom whose markdown form is INLINE (e.g. the block `image`'s
* `![](url)`, or a bare media element) is wrapped by marked in a <p>; the schema
* then HOISTS the block atom out of that paragraph, leaving an EMPTY paragraph
* sibling. On the next export that empty `<p>` renders to "" and the doc "\n\n"
* join injects a phantom blank gap, so the markdown is not byte-stable.
*
* Markdown blank lines are separators, never content, so apart from the
* obligatory empty paragraph in `content: "block+"` containers (kept by the
* guard below), generateJSON only produces an empty paragraph as such a hoist
* artifact. Removing them is therefore safe and general (it also subsumes the
* <div>-wrapper workaround the `video` case uses). We remove ONLY
* `type === 'paragraph'` nodes whose `content` is absent or an empty array;
* every other node (including atoms without `content`) is preserved, and we
* recurse into the content of any node that has children.
*/
function stripEmptyParagraphs(node: any): any {
if (!node || !Array.isArray(node.content)) {
// Atom / leaf node (no children to recurse into): keep as-is.
return node;
}
const mapped = node.content.map((child: any) => stripEmptyParagraphs(child));
const isEmptyParagraph = (child: any): boolean =>
!!child &&
child.type === "paragraph" &&
(!Array.isArray(child.content) || child.content.length === 0);
const filtered = mapped.filter((child: any) => !isEmptyParagraph(child));
// Schema-validity guard: several nodes require NON-empty block content
// (`content: "block+"` — tableCell, tableHeader, blockquote, column, callout,
// and the doc root). For an empty one of those, generateJSON materializes a
// single empty paragraph as its OBLIGATORY content — that is not a hoist
// artifact. If stripping would empty the container, keep ONE empty paragraph
// so the result stays schema-valid (an empty cell/quote must not become `[]`).
const cleaned =
filtered.length === 0 && mapped.length > 0 ? [mapped[0]] : filtered;
return { ...node, content: cleaned };
}
/** Convert markdown to a ProseMirror doc using the full Docmost schema. */
export async function markdownToProseMirror(
markdownContent: string,
@@ -345,5 +383,6 @@ export async function markdownToProseMirror(
const withCallouts = await preprocessCallouts(markdownContent);
const html = await marked.parse(withCallouts);
const bridged = bridgeTaskLists(html);
-return generateJSON(bridged, docmostExtensions);
+const doc = generateJSON(bridged, docmostExtensions);
+return stripEmptyParagraphs(doc);
}
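
The effect of the strip plus its schema-validity guard is easier to see on two small inputs. The block below is a condensed, typed restatement of the function in this hunk with hypothetical names (`PMJsonSketch`, `stripEmptyParagraphsSketch`); the node shapes are simplified and no schema is involved.

```typescript
// Hypothetical minimal JSON node shape, for illustration only.
interface PMJsonSketch {
  type: string;
  content?: PMJsonSketch[];
}

function isEmptyParagraphSketch(n: PMJsonSketch): boolean {
  return n.type === "paragraph" && (!Array.isArray(n.content) || n.content.length === 0);
}

function stripEmptyParagraphsSketch(node: PMJsonSketch): PMJsonSketch {
  if (!Array.isArray(node.content)) return node; // leaf or atom: keep as-is
  const mapped = node.content.map((child) => stripEmptyParagraphsSketch(child));
  const filtered = mapped.filter((child) => !isEmptyParagraphSketch(child));
  // Guard: a container that had children must not end up with `content: []`.
  const cleaned = filtered.length === 0 && mapped.length > 0 ? mapped.slice(0, 1) : filtered;
  return { ...node, content: cleaned };
}

const textParagraph: PMJsonSketch = { type: "paragraph", content: [{ type: "text" }] };

// Hoist artifact: the empty paragraph next to the image is dropped.
const hoisted = stripEmptyParagraphsSketch({
  type: "doc",
  content: [textParagraph, { type: "paragraph" }, { type: "image" }, textParagraph],
});

// Obligatory content: the only paragraph of an empty cell is kept.
const emptyCell = stripEmptyParagraphsSketch({
  type: "tableCell",
  content: [{ type: "paragraph" }],
});

console.log((hoisted.content ?? []).map((n) => n.type));
console.log((emptyCell.content ?? []).length);
```

Keeping the first mapped child when everything was filtered out is safe here because, in that case, every child was an empty paragraph.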


@@ -415,3 +415,250 @@ describe('applyPullActions — merge result is surfaced, not swallowed', () => {
expect(res.merge.conflict).toBe(false);
});
});
// ===========================================================================
// R-Pull-2 coverage gaps (review-driven): the suppression warning FORKS for
// `empty-live` and `mass-delete` reasons (pull.ts 278-290), and the
// fault-tolerant `removePath` catch branch (pull.ts 354-364) where `deps.rm`
// REJECTS. The existing block above only exercises the `incomplete-fetch`
// reason and an rm that always succeeds.
//
// Helper: build a deps object whose `rm` rejects for a chosen set of absolute
// paths and resolves otherwise. We override the recording fs's `rm` (a vi.fn)
// in place so `fs.rms` still records the SUCCESSFUL calls only (a rejecting rm
// throws before pushing), matching the real `node:fs/promises` semantics where
// a thrown rm performed no removal.
function makeFsWithRejectingRm(rejectFor: Set<string>) {
const base = makeFs();
base.fs.rm = vi.fn(async (abs: string) => {
if (rejectFor.has(abs)) {
throw new Error(`rm failed for ${abs}`);
}
base.rms.push(abs);
});
return base;
}
describe('applyPullActions — suppression warning forks (empty-live / mass-delete)', () => {
it('emits the empty-live warning (with existingCount) and performs no removals', async () => {
// SPEC §8 empty-live fork: live fetch returned 0 pages but files are
// tracked. Mirrors the incomplete-fetch suppression test, but the message
// text + its `existingCount` interpolation are a DISTINCT branch.
const { client } = makeClient();
const g = makeGit();
const fs = makeFs();
const res = await applyPullActions(
deps(client, g.git, fs),
actions({
toWrite: [{ pageId: 'p1', relPath: 'A.md' }],
toDelete: [], // suppressed -> already empty
deletionDecision: { apply: false, reason: 'empty-live' },
plannedDeleteCount: 3,
existingCount: 4,
}),
VAULT,
);
expect(res.deleted).toBe(0);
expect(fs.rms).toEqual([]);
// The empty-live message names the tracked-file count and "deletions
// suppressed".
expect(console.warn).toHaveBeenCalledWith(
expect.stringMatching(/live fetch returned 0 pages but 4 file\(s\) are tracked/),
);
expect(console.warn).toHaveBeenCalledWith(
expect.stringMatching(/deletions suppressed/),
);
});
it('emits the mass-delete guard warning (with planned AND existing counts) and performs no removals', async () => {
// SPEC §8 mass-delete fork (the final else branch): the message
// interpolates BOTH plannedDeleteCount and existingCount ("would delete N
// of M"), distinct from the other two suppression messages.
const { client } = makeClient();
const g = makeGit();
const fs = makeFs();
const res = await applyPullActions(
deps(client, g.git, fs),
actions({
toWrite: [{ pageId: 'p1', relPath: 'A.md' }],
toDelete: [],
deletionDecision: { apply: false, reason: 'mass-delete' },
plannedDeleteCount: 5,
existingCount: 6,
}),
VAULT,
);
expect(res.deleted).toBe(0);
expect(fs.rms).toEqual([]);
expect(console.warn).toHaveBeenCalledWith(
expect.stringMatching(/plan would delete 5 of 6 tracked file\(s\) \(mass-delete guard\)/),
);
expect(console.warn).toHaveBeenCalledWith(
expect.stringMatching(/deletions suppressed/),
);
});
});
describe('applyPullActions — removePath fault tolerance (rm REJECTS)', () => {
it('does NOT reject, logs the failure, and does not count the failed removal', async () => {
// pull.ts 354-364: when `deps.rm` throws, removePath logs via console.error
// and returns false; the run continues. Existing delete tests use an rm
// that always succeeds, leaving this catch branch uncovered.
const { client } = makeClient();
const g = makeGit();
const fs = makeFsWithRejectingRm(new Set(['/vault/Dead.md']));
const res = await applyPullActions(
deps(client, g.git, fs),
actions({
toWrite: [],
toDelete: ['Dead.md'],
deletionDecision: APPLY,
plannedDeleteCount: 1,
existingCount: 1,
}),
VAULT,
);
// Resolved (not rejected) — the pull is fault-tolerant.
expect(res.deleted).toBe(0);
// removePath's catch logs "pull: failed to delete Dead.md: ...".
expect(console.error).toHaveBeenCalledWith(
expect.stringMatching(/failed to .* Dead\.md/),
expect.anything(),
);
// The (would-be) removal never succeeded, so nothing was recorded.
expect(fs.rms).toEqual([]);
});
it('counts ONLY successful removals on a partial-failure delete batch (1 reject of 3)', async () => {
// pull.ts 388-391 increments `deleted` only when removePath returns true.
// Here Dead1/Dead3 succeed and Dead2 rejects -> deleted === 2, and the
// deleted>0 subject branch (399-400) fires with written=0.
const { client } = makeClient();
const g = makeGit();
const fs = makeFsWithRejectingRm(new Set(['/vault/Dead2.md']));
const res = await applyPullActions(
deps(client, g.git, fs),
actions({
toWrite: [],
moved: [],
toDelete: ['Dead1.md', 'Dead2.md', 'Dead3.md'],
deletionDecision: APPLY,
plannedDeleteCount: 3,
existingCount: 5,
}),
VAULT,
);
expect(res.deleted).toBe(2);
expect(fs.rms).toEqual(['/vault/Dead1.md', '/vault/Dead3.md']);
expect(g.committedSubject).toBe('docmost: sync 0 page(s), 2 deleted');
// Exactly one rejection was logged (Dead2.md).
expect(console.error).toHaveBeenCalledTimes(1);
expect(console.error).toHaveBeenCalledWith(
expect.stringMatching(/failed to .* Dead2\.md/),
expect.anything(),
);
// The run still reached commit + checkout + merge.
expect(g.order).toEqual([
'stageAll',
'commit:docmost: sync 0 page(s), 2 deleted',
'checkout:main',
'merge',
]);
});
});
describe('applyPullActions — move old-path removal rejects vs move-write fails', () => {
it('a move old-path rm REJECTION does not increment movedApplied but an independent delete still succeeds', async () => {
// pull.ts 383 increments movedApplied only when removePath of the old path
// succeeds. Here the new-path write SUCCEEDS (so the page is not in
// failedPageIds and the move loop proceeds to rm) but the old-path rm
// REJECTS — distinct from the move-write-failure guard at 376. An absence
// delete in the same run must still succeed independently.
const { client } = makeClient();
const g = makeGit();
const fs = makeFsWithRejectingRm(new Set(['/vault/Old/M.md']));
const res = await applyPullActions(
deps(client, g.git, fs),
actions({
toWrite: [{ pageId: 'm', relPath: 'New/M.md' }],
moved: [
{
pageId: 'm',
fromRelPath: 'Old/M.md',
toRelPath: 'New/M.md',
removeOldPath: true,
},
],
toDelete: ['Dead.md'],
deletionDecision: APPLY,
plannedDeleteCount: 1,
existingCount: 3,
}),
VAULT,
);
expect(res.written).toBe(1);
expect(res.movedApplied).toBe(0); // old-path rm failed -> not counted
expect(res.deleted).toBe(1); // independent absence delete still succeeded
expect(fs.rms).toEqual(['/vault/Dead.md']); // Old/M.md rm threw, not recorded
expect(g.committedSubject).toBe('docmost: sync 1 page(s), 1 deleted');
// The failure log named the moved old path.
expect(console.error).toHaveBeenCalledWith(
expect.stringMatching(/failed to .* Old\/M\.md/),
expect.anything(),
);
});
it('a move-write FAILURE keeps the old path: rm is never attempted for it (data-loss guard, 374-383)', async () => {
// Distinct branch from the move-old-path rm rejection above: here the
// new-path WRITE itself fails, so `m` enters failedPageIds and the move
// loop short-circuits at line 376 BEFORE calling rm — emitting a
// console.warn and PRESERVING the old path (the only copy).
const { client } = makeClient();
const g = makeGit();
const fs = makeFs({ failWriteFor: new Set(['/vault/New/M.md']) });
const res = await applyPullActions(
deps(client, g.git, fs),
actions({
toWrite: [{ pageId: 'm', relPath: 'New/M.md' }],
moved: [
{
pageId: 'm',
fromRelPath: 'Old/M.md',
toRelPath: 'New/M.md',
removeOldPath: true,
},
],
toDelete: [],
deletionDecision: APPLY,
plannedDeleteCount: 0,
existingCount: 1,
}),
VAULT,
);
expect(res.written).toBe(0);
expect(res.movedApplied).toBe(0);
// The old path was NEVER removed (rm not even attempted for it).
expect(fs.fs.rm).not.toHaveBeenCalledWith('/vault/Old/M.md');
expect(fs.rms).toEqual([]);
// The "keeping old path" warning fired exactly once for `m`.
const warnCalls = (console.warn as unknown as ReturnType<typeof vi.fn>).mock.calls
.map((c: unknown[]) => String(c[0]))
.filter((m: string) => m.includes('move write for m failed'));
expect(warnCalls.length).toBe(1);
expect(warnCalls[0]).toContain('keeping old path Old/M.md');
// deleted === 0 -> no ", N deleted" suffix.
expect(g.committedSubject).toBe('docmost: sync 0 page(s)');
});
});
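
The fault-tolerant delete behaviour these tests pin (a rejecting `rm` is logged, returns false, and is not counted) follows a common pattern that can be sketched on its own. The names `RmFnSketch`, `removePathSketch`, and `deleteAllSketch` are hypothetical, and the log wording is an assumption; the actual `pull.ts` source is not part of this diff.

```typescript
// Hypothetical signature for the injected remove function.
type RmFnSketch = (absPath: string) => Promise<void>;

// Returns true only when the removal succeeded; never rejects.
async function removePathSketch(rm: RmFnSketch, absPath: string): Promise<boolean> {
  try {
    await rm(absPath);
    return true;
  } catch (err) {
    // Log and carry on: one failed removal must not abort the whole run.
    console.error(`failed to delete ${absPath}:`, err);
    return false;
  }
}

// Counts ONLY the removals that actually succeeded.
async function deleteAllSketch(rm: RmFnSketch, absPaths: string[]): Promise<number> {
  let deleted = 0;
  for (const p of absPaths) {
    if (await removePathSketch(rm, p)) deleted += 1;
  }
  return deleted;
}

// Usage: an rm stub that rejects for one chosen path.
const rejectingRm: RmFnSketch = async (absPath) => {
  if (absPath === "/vault/Dead2.md") throw new Error(`rm failed for ${absPath}`);
};
deleteAllSketch(rejectingRm, ["/vault/Dead1.md", "/vault/Dead2.md", "/vault/Dead3.md"]).then(
  (count) => console.log(count),
);
```

Returning a boolean instead of rethrowing lets the caller keep a truthful count while still reaching the later commit and merge steps.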


@@ -0,0 +1,109 @@
import { describe, expect, it } from 'vitest';
import {
convertProseMirrorToMarkdown,
markdownToProseMirror,
docsCanonicallyEqual,
} from 'docmost-client';
// Helper mirroring the convention in markdown-converter.test.ts: wrap atoms in
// a top-level doc node so convertProseMirrorToMarkdown (which requires
// content.content) walks them.
const doc = (...nodes: any[]) => ({ type: 'doc', content: nodes });
describe('diagram round-trip (docmost-schema diagramAttributes)', () => {
// SPEC case 1: drawio carrying the full numeric-attr surface
// (data-width/data-height/data-size/data-aspect-ratio) that it shares with
// audio/video/pdf but which no fixture exercises on a diagram node.
it('drawio round-trips numeric attrs, coercing number -> string via getAttribute', async () => {
const input = doc({
type: 'drawio',
attrs: {
src: '/d.drawio',
attachmentId: 'att-1',
width: 640,
height: 480,
size: 1234,
aspectRatio: 1.777,
align: 'center',
},
});
const md1 = convertProseMirrorToMarkdown(input);
const doc2 = await markdownToProseMirror(md1);
const md2 = convertProseMirrorToMarkdown(doc2);
// Exact serialized form: numbers render as bare data-* values; attribute
// order follows the converter's emit order (src, then width/height/size/
// aspect-ratio/align, then attachment-id).
expect(md1).toBe(
'<div data-type="drawio" data-src="/d.drawio" data-width="640" data-height="480" data-size="1234" data-aspect-ratio="1.777" data-align="center" data-attachment-id="att-1"></div>',
);
// A second export reproduces the first byte-for-byte (drawio align default
// is already "center", so nothing new materializes on import).
expect(md2).toBe(md1);
// Re-import coerces every numeric attr to a STRING because parseHTML reads
// them via getAttribute(). This is the gap the reviewer flagged: the
// number -> string coercion on a diagram node is otherwise untested.
const attrs2 = doc2.content[0].attrs;
expect(attrs2.width).toBe('640');
expect(attrs2.height).toBe('480');
expect(attrs2.size).toBe('1234');
expect(attrs2.aspectRatio).toBe('1.777');
expect(typeof attrs2.width).toBe('string');
expect(typeof attrs2.aspectRatio).toBe('string');
// String attrs pass through unchanged.
expect(attrs2.align).toBe('center');
expect(attrs2.attachmentId).toBe('att-1');
// Canonically NOT equal: the numeric -> string coercion survives
// canonicalization (only align='center' is normalized away via
// KNOWN_DEFAULTS.drawio), so 640 !== '640' makes the docs differ.
expect(docsCanonicallyEqual(input, doc2)).toBe(false);
});
// SPEC case 2: minimal excalidraw atom with ONLY string attrs (no align, no
// numeric attrs). Locks the one-time export divergence (align='center'
// default materializes only on import) plus escapeAttr of title/alt through
// the data-title/data-alt path.
it('excalidraw materializes align default only on import and escapes title/alt', async () => {
const input = doc({
type: 'excalidraw',
attrs: {
src: '/e.excalidraw',
title: 'My "Diagram"',
alt: 'a&b',
},
});
const md1 = convertProseMirrorToMarkdown(input);
const doc2 = await markdownToProseMirror(md1);
const md2 = convertProseMirrorToMarkdown(doc2);
// First export: no align emitted (the input doc carries no align), and the
// " in title becomes &quot;, the & in alt becomes &amp; via escapeAttr.
expect(md1).toBe(
'<div data-type="excalidraw" data-src="/e.excalidraw" data-title="My &quot;Diagram&quot;" data-alt="a&amp;b"></div>',
);
// Second export: align='center' has now materialized (the schema's
// diagramAttributes default), so md2 gains a data-align="center" suffix and
// is NOT byte-equal to md1. This one-time divergence is the diagram quirk.
expect(md2).toBe(
'<div data-type="excalidraw" data-src="/e.excalidraw" data-title="My &quot;Diagram&quot;" data-alt="a&amp;b" data-align="center"></div>',
);
expect(md2).not.toBe(md1);
// Re-import decodes the escaped entities back to the original characters.
const attrs2 = doc2.content[0].attrs;
expect(attrs2.title).toBe('My "Diagram"');
expect(attrs2.alt).toBe('a&b');
expect(attrs2.align).toBe('center');
// Canonically EQUAL: align='center' is normalized away via
// KNOWN_DEFAULTS.excalidraw, and title/alt are non-default strings that
// survive on both sides, so the docs are semantically equal.
expect(docsCanonicallyEqual(input, doc2)).toBe(true);
});
});
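
The serialized forms asserted above depend on two things: attribute values are stringified, and `"` and `&` are escaped. A rough sketch of that emission follows. `escapeAttrSketch` and `renderDataDivSketch` are hypothetical names; the escaping of `<` and `>` is an assumption, since the tests above only show `"` and `&`, and the real converter controls attribute order itself.

```typescript
// Assumed behaviour, inferred only from the expectations in the tests above.
function escapeAttrSketch(value: string): string {
  return value
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Emits <div data-type="..." data-key="value" ...></div> in insertion order.
function renderDataDivSketch(type: string, attrs: Record<string, string | number>): string {
  const parts = [`data-type="${escapeAttrSketch(type)}"`];
  for (const key of Object.keys(attrs)) {
    // Numbers become strings here, so a later import can only read strings back.
    parts.push(`data-${key}="${escapeAttrSketch(String(attrs[key]))}"`);
  }
  return `<div ${parts.join(" ")}></div>`;
}

console.log(
  renderDataDivSketch("excalidraw", {
    src: "/e.excalidraw",
    title: 'My "Diagram"',
    alt: "a&b",
  }),
);
```

Since the HTML attribute carries no type information, `width: 640` on export and `width: '640'` on import is the expected outcome, which is why the drawio case above is canonically unequal.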


@@ -0,0 +1,75 @@
import { describe, expect, it } from 'vitest';
import {
sanitizeCssColor,
clampCalloutType,
} from '../src/lib/docmost-schema.js';
// These tests pin the two security/normalization helpers that Docmost
// interpolates into inline style and the callout banner type on re-render.
// They are the allowlist guard (XSS/style-breakout boundary) and the
// case-insensitive callout normalizer, both otherwise only exercised
// indirectly through parseHTML/renderHTML.
describe('sanitizeCssColor', () => {
it('accepts a plain named color unchanged', () => {
expect(sanitizeCssColor('red')).toBe('red');
});
it('accepts 3-digit and 6-digit hex colors unchanged', () => {
expect(sanitizeCssColor('#abc')).toBe('#abc');
expect(sanitizeCssColor('#aabbcc')).toBe('#aabbcc');
});
it('accepts well-formed functional notation unchanged', () => {
expect(sanitizeCssColor('rgb(1,2,3)')).toBe('rgb(1,2,3)');
expect(sanitizeCssColor('rgba(0,0,0,0.5)')).toBe('rgba(0,0,0,0.5)');
expect(sanitizeCssColor('hsl(120,50%,50%)')).toBe('hsl(120,50%,50%)');
});
it('trims surrounding whitespace before matching', () => {
// ' blue ' trims to 'blue', which is a valid named color.
expect(sanitizeCssColor(' blue ')).toBe('blue');
});
it('rejects a style-injection payload (returns null)', () => {
expect(sanitizeCssColor('red; --x: url(x)')).toBeNull();
});
it('rejects an attribute-breakout payload (returns null)', () => {
expect(sanitizeCssColor('red"><script>')).toBeNull();
});
it('rejects the empty string (returns null)', () => {
expect(sanitizeCssColor('')).toBeNull();
});
it('rejects non-string input via the typeof guard (returns null)', () => {
// @ts-expect-error deliberately passing a non-string to exercise the guard
expect(sanitizeCssColor(123)).toBeNull();
});
});
describe('clampCalloutType', () => {
it('lowercases an uppercase valid type', () => {
expect(clampCalloutType('INFO')).toBe('info');
});
it('lowercases a mixed-case valid type', () => {
expect(clampCalloutType('Warning')).toBe('warning');
});
it('passes through already-lowercase valid types', () => {
expect(clampCalloutType('danger')).toBe('danger');
expect(clampCalloutType('success')).toBe('success');
});
it('falls back to "info" for unknown types', () => {
expect(clampCalloutType('note')).toBe('info');
expect(clampCalloutType('tip')).toBe('info');
});
it('falls back to "info" for empty string and null', () => {
expect(clampCalloutType('')).toBe('info');
expect(clampCalloutType(null)).toBe('info');
});
});
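
An allowlist sanitizer and a case-insensitive clamp that would satisfy the expectations above might look like the following. This is a guess at the shape, not the implementation in `docmost-schema`: the regular expression, the list of callout types, and the names `sanitizeCssColorSketch` and `clampCalloutTypeSketch` are all assumptions.

```typescript
// Assumed allowlist: hex colors, bare names, and rgb()/rgba()/hsl()/hsla()
// with only digits, dots, commas, percent signs, and whitespace inside.
const CSS_COLOR_ALLOWLIST =
  /^(#[0-9a-fA-F]{3,8}|[a-zA-Z]+|(?:rgb|rgba|hsl|hsla)\([0-9.,%\s]+\))$/;

function sanitizeCssColorSketch(value: unknown): string | null {
  if (typeof value !== "string") return null; // typeof guard
  const trimmed = value.trim();
  return CSS_COLOR_ALLOWLIST.test(trimmed) ? trimmed : null;
}

// Assumed set: only the types the tests above treat as valid.
const CALLOUT_TYPES_SKETCH = ["info", "success", "warning", "danger"];

function clampCalloutTypeSketch(value: string | null | undefined): string {
  const lowered = (value ?? "").toLowerCase();
  return CALLOUT_TYPES_SKETCH.indexOf(lowered) !== -1 ? lowered : "info";
}

console.log(sanitizeCssColorSketch(" blue "));
console.log(sanitizeCssColorSketch("red; --x: url(x)"));
console.log(clampCalloutTypeSketch("Warning"));
```

Anchoring the pattern with `^` and `$` is what rejects the injection payloads: any `;`, `"`, or `<` makes the whole value fail rather than being stripped.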


@@ -0,0 +1,198 @@
/**
* Error-path coverage for the `VaultGit` git wrapper (engine/git.ts).
*
* These tests exclusively exercise the NON-ZERO-EXIT / SPAWN-FAILURE branches
* that the rest of the suite leaves untested (reviewer-flagged branch-coverage
* gap): the `run()` unified-error throw, the dedicated per-method throws in
* `listTrackedFiles` / `diffNameStatus`, the `assertGitAvailable` preflight +
* `runRaw` spawn-error (`||`-fallthrough) path, and the `ensureRepo`
* config-pin try/catch wrapper.
*
* Style mirrors git.test.ts: real `git` binary, real temp repos under
* os.tmpdir(), gitAvailable()-gated, temp dirs cleaned in afterEach.
*/
import { execFile } from 'node:child_process';
import { chmod, mkdtemp, rm } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { promisify } from 'node:util';
import { afterEach, beforeAll, describe, expect, it } from 'vitest';
import { VaultGit } from '../src/engine/git';
const execFileAsync = promisify(execFile);
/** True if a usable `git` binary is on PATH (skip the suite otherwise). */
async function gitAvailable(): Promise<boolean> {
try {
await execFileAsync('git', ['--version']);
return true;
} catch {
return false;
}
}
describe('VaultGit error paths (integration; temp repo)', () => {
let available = false;
// Track every temp dir created so afterEach can clean them all, even the
// ones whose .git was chmod'd read-only mid-test.
const dirs: string[] = [];
beforeAll(async () => {
available = await gitAvailable();
});
afterEach(async () => {
while (dirs.length) {
const d = dirs.pop()!;
// Restore perms first: a test may have left .git read-only (0o555),
// which would make rm fail to descend into it.
try {
await chmod(join(d, '.git'), 0o755);
} catch {
/* not every dir has a .git */
}
await rm(d, { recursive: true, force: true });
}
});
/** Make a fresh temp dir for one test (under the OS tmpdir, NOT the repo). */
async function freshDir(): Promise<string> {
const d = await mkdtemp(join(tmpdir(), 'docmost-vault-err-'));
dirs.push(d);
return d;
}
// 1. run() unified non-zero-exit throw, via checkout of a missing branch.
it('checkout rejects with a unified "git checkout ... failed:" error for a missing branch', async () => {
if (!available) return; // skip gracefully when git is unavailable
const vault = await freshDir();
const git = new VaultGit(vault);
await git.ensureRepo();
// The branch was never created, so `git checkout does-not-exist` exits
// non-zero; run() must surface that as a thrown, unified Error (not resolve).
await expect(git.checkout('does-not-exist')).rejects.toThrow(
/git checkout does-not-exist failed:/,
);
// And the underlying git stderr detail must be preserved in the message.
await expect(git.checkout('does-not-exist')).rejects.toThrow(
/pathspec 'does-not-exist' did not match/,
);
});
// 2. diffNameStatus's OWN non-zero-exit throw, via an unresolvable second ref.
it('diffNameStatus rejects with "git diff --name-status failed:" for an unresolvable ref', async () => {
if (!available) return;
const vault = await freshDir();
const git = new VaultGit(vault);
await git.ensureRepo(); // gives us a HEAD (the initial "init vault" commit)
// `refs/does/not/exist` resolves to nothing -> git exits 128; the dedicated
// throw in diffNameStatus (separate from run()) must fire.
await expect(
git.diffNameStatus('HEAD', 'refs/does/not/exist'),
).rejects.toThrow(/git diff --name-status failed:/);
// git's underlying "unknown revision" / "ambiguous argument" detail is kept.
await expect(
git.diffNameStatus('HEAD', 'refs/does/not/exist'),
).rejects.toThrow(/unknown revision|ambiguous argument/);
});
// 3. listTrackedFiles's dedicated non-zero-exit throw, run OUTSIDE a work-tree.
it('listTrackedFiles rejects with "git ls-files failed: ... not a git repository" when the cwd is not a repo', async () => {
if (!available) return;
// Fresh temp dir, deliberately NOT initialized as a git repo (no ensureRepo).
const notARepo = await freshDir();
const git = new VaultGit(notARepo);
// `git ls-files -z` outside a work-tree exits 128 with "not a git repository".
await expect(git.listTrackedFiles()).rejects.toThrow(
/git ls-files failed:/,
);
await expect(git.listTrackedFiles()).rejects.toThrow(
/not a git repository/,
);
});
// 4. assertGitAvailable preflight throw + runRaw spawn-error (`||`) fallthrough.
it('assertGitAvailable rejects with the spawn (ENOENT) message preserved when git cannot be spawned', async () => {
if (!available) return;
const vault = await freshDir();
const git = new VaultGit(vault);
// Point PATH at an empty/garbage directory so spawning `git` fails with
// ENOENT. vaultGitEnv() spreads process.env, so the child inherits this PATH.
// execFile rejects with err.code === 'ENOENT' (a STRING, not a number) and
// an EMPTY-STRING stderr, which is exactly the case that forces runRaw's
// `e.stderr || e.message` fallthrough (|| not ??) to surface e.message.
const savedPath = process.env.PATH;
const garbage = await freshDir(); // an existing dir with no `git` in it
try {
process.env.PATH = garbage;
let err: unknown;
try {
await git.assertGitAvailable();
} catch (e) {
err = e;
}
expect(err).toBeInstanceOf(Error);
const message = (err as Error).message;
// The preflight's actionable wrapper.
expect(message).toContain('git binary not found or not runnable');
// Proof the empty-stderr -> e.message fallthrough preserved the spawn
// error: the "Underlying error:" suffix must carry the ENOENT detail.
expect(message).toContain('Underlying error:');
expect(message).toMatch(/ENOENT/);
} finally {
// ALWAYS restore PATH so the rest of the suite can spawn git again.
if (savedPath === undefined) delete process.env.PATH;
else process.env.PATH = savedPath;
}
});
// 5. ensureRepo config-pin try/catch wrapper.
it('ensureRepo rejects with "failed to pin vault git config" when the config write cannot acquire its lock', async () => {
if (!available) return;
// chmod-based denial does not apply to the superuser, so skip under root.
if (typeof process.getuid === 'function' && process.getuid() === 0) {
return; // running as root: chmod cannot block the write -> nothing to test
}
const vault = await freshDir();
const git = new VaultGit(vault);
await git.ensureRepo(); // first run sets up .git + identity + initial commit
// NOTE(review): the spec proposed `chmod 0o444 .git/config`, but git does
// NOT write config in place — it writes via a `config.lock` file created in
// the `.git` DIRECTORY and renames it over config. So a read-only
// `.git/config` file does NOT block the write (verified: exit 0). To
// actually fail the unconditional `git config core.autocrlf false` write we
// must make the `.git` DIRECTORY non-writable (0o555), which denies creating
// `config.lock` -> git exits 255 with "could not lock config file". The
// assertion below still checks the spec's intended wrapped error
// ("failed to pin vault git config", the vault path, and the writable/locked
// `.git/config` hint), which is the branch under test.
const gitDir = join(vault, '.git');
await chmod(gitDir, 0o555);
try {
let err: unknown;
try {
// Second ensureRepo(): identity is already set (reads pass), so the
// FIRST write it attempts is the SPEC §11 config-pin block, which now
// cannot lock -> the try/catch rethrows the actionable error.
await git.ensureRepo();
} catch (e) {
err = e;
}
expect(err).toBeInstanceOf(Error);
const message = (err as Error).message;
expect(message).toContain('failed to pin vault git config');
// References the vault path and the writable/locked .git/config hint.
expect(message).toContain(vault);
expect(message).toContain('.git/config');
expect(message).toMatch(/writable|locked/);
} finally {
// Restore perms so afterEach (and rm) can descend into .git.
await chmod(gitDir, 0o755);
}
});
});
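
Test 4 hinges on the difference between `||` and `??` when stderr is an empty string. A tiny illustration, with a hypothetical `SpawnFailureSketch` shape standing in for the error that a failed spawn produces:

```typescript
// Hypothetical error shape, for illustration only.
interface SpawnFailureSketch {
  stderr?: string;
  message: string;
}

// `||` falls through on ANY falsy value, including the empty string.
function detailWithOr(e: SpawnFailureSketch): string {
  return e.stderr || e.message;
}

// `??` falls through only on null/undefined, so "" would win.
function detailWithNullish(e: SpawnFailureSketch): string {
  return e.stderr ?? e.message;
}

const enoentFailure: SpawnFailureSketch = { stderr: "", message: "spawn git ENOENT" };
console.log(JSON.stringify(detailWithOr(enoentFailure))); // "spawn git ENOENT"
console.log(JSON.stringify(detailWithNullish(enoentFailure))); // ""
```

With `??` the wrapped error would carry an empty detail, so the ENOENT assertion in the test would have nothing to match.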


@@ -234,3 +234,92 @@ describe('VaultGit integration gaps (temp repo)', () => {
expect(globalName).not.toBe(LOCAL_NAME);
});
});
// Parser/error-fallback gaps for `git.ts` exercised WITHOUT a real git binary by
// monkey-patching the private `runRaw` primitive (every git invocation funnels
// through it, per the module header). These pin defensive arms the accepted
// integration specs above could not reach: the unknown-status consume in the
// `-z` walk, and the `|| r.stdout` empty-stderr error-detail fallbacks.
describe('VaultGit parser/error-fallback gaps (runRaw stubbed)', () => {
// --- 1. diffNameStatus: unknown status (T) sandwiched between A and M --------
//
// Protects the default arm of the status switch (git.ts ~lines 497-502): an
// unknown status like `T` (type-change) consumes ONE path token defensively
// but emits nothing. If the walk pulled the wrong count here it would desync
// and misclassify the trailing M row.
it('diffNameStatus swallows an unknown T status mid-stream and stays aligned', async () => {
const git = new VaultGit('/tmp/any');
(git as any).runRaw = async () => ({
code: 0,
// A\0a.md T\0t.md M\0m.md — T is the unknown status mid-stream.
stdout: 'A\0a.md\0T\0t.md\0M\0m.md\0',
stderr: '',
});
const entries = await git.diffNameStatus('X', 'Y');
// The T row's path token 't.md' is consumed but NOT emitted; the walk stays
// aligned so the trailing M/m.md parses cleanly (no off-by-one).
expect(entries).toEqual([
{ status: 'A', path: 'a.md' },
{ status: 'M', path: 'm.md' },
]);
expect(entries.length).toBe(2);
expect(entries.some((e) => e.status === ('T' as any))).toBe(false);
expect(entries.some((e) => e.path === 't.md')).toBe(false);
});
// --- 2. diffNameStatus: unknown status (T) FIRST in the stream --------------
//
// Leading-position variant: a `T` at the head must consume its own path token
// without swallowing the following real A entry.
it('diffNameStatus swallows a leading unknown T status and parses the next A', async () => {
const git = new VaultGit('/tmp/any');
(git as any).runRaw = async () => ({
code: 0,
stdout: 'T\0t.md\0A\0a.md\0',
stderr: '',
});
const entries = await git.diffNameStatus('X', 'Y');
expect(entries.length).toBe(1);
expect(entries[0]).toEqual({ status: 'A', path: 'a.md' });
});
// --- 3. listTrackedFiles: non-zero exit, EMPTY stderr, stdout carries detail -
//
// The thrown message is built from `(r.stderr || r.stdout || '')`. This pins
// the `|| r.stdout` arm (empty stderr, non-empty stdout) — distinct from the
// non-empty-stderr and spawn-ENOENT paths the accepted specs cover.
it('listTrackedFiles uses stdout in the error message when stderr is empty', async () => {
const git = new VaultGit('/tmp/any');
(git as any).runRaw = async () => ({
code: 1,
stderr: '',
stdout: 'some detail',
});
await expect(git.listTrackedFiles()).rejects.toThrow(
'git ls-files failed: some detail',
);
});
// --- 4. diffNameStatus: non-zero exit, EMPTY stderr, stdout carries detail ---
//
// diffNameStatus has its OWN independent `(r.stderr || r.stdout || '').trim()`
// fallback (git.ts ~line 469), separate from listTrackedFiles. Pin the
// empty-stderr/non-empty-stdout arm of THIS branch.
it('diffNameStatus uses stdout in the error message when stderr is empty', async () => {
const git = new VaultGit('/tmp/any');
(git as any).runRaw = async () => ({
code: 1,
stderr: '',
stdout: 'diff detail',
});
await expect(git.diffNameStatus('X', 'Y')).rejects.toThrow(
'git diff --name-status failed: diff detail',
);
});
});
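
The four cases above pin two behaviours of `VaultGit` whose source is not part of this hunk. Purely as an illustration, the sketch below shows one way a NUL-delimited `git diff --name-status -z` walk can stay aligned when it meets an unknown status, together with the `stderr || stdout` detail fallback. `parseNameStatusZ`, `errorDetail`, `RawResult` and `NameStatusEntry` are hypothetical names, not the functions in git.ts, and the treatment of rename/copy rows is an assumption.

```typescript
// Hypothetical stand-ins; the real VaultGit implementation is not shown here.
type RawResult = { code: number; stdout: string; stderr: string };
type NameStatusEntry = { status: 'A' | 'M' | 'D'; path: string };

// Walk the `-z` token stream: each ordinary row is <status> NUL <path> NUL.
function parseNameStatusZ(stdout: string): NameStatusEntry[] {
  const tokens = stdout.split('\0');
  // A trailing NUL leaves one empty token at the end; drop it.
  if (tokens.length > 0 && tokens[tokens.length - 1] === '') tokens.pop();

  const entries: NameStatusEntry[] = [];
  let i = 0;
  while (i < tokens.length) {
    const status = tokens[i++];
    const path = tokens[i++];
    if (path === undefined) break; // truncated stream: stop rather than guess

    // Assumption: rename/copy rows (R<score>, C<score>) carry a second path,
    // which this sketch skips without emitting an entry.
    if (status.startsWith('R') || status.startsWith('C')) {
      i++;
      continue;
    }
    if (status === 'A' || status === 'M' || status === 'D') {
      entries.push({ status, path });
    }
    // Any other status (e.g. T): its single path token was consumed above and
    // nothing is emitted, so the next row starts on a status token again.
  }
  return entries;
}

// Prefer stderr for the error detail, fall back to stdout, then to ''.
function errorDetail(r: RawResult): string {
  return (r.stderr || r.stdout || '').trim();
}
```

Reading the path token before looking at the status is what keeps the walk aligned: an unrecognised status still costs exactly one path token, so a following `M` row is never misread as a path.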

@@ -36,41 +36,35 @@ async function roundTrip(node: any): Promise<{ md1: string; doc2: any; md2: stri
// existing documented `it.fails` bugs in markdown-roundtrip.property.test.ts).
// ---------------------------------------------------------------------------
describe('pageBreak data loss (no converter case — SPEC §11 divergence)', () => {
-it('exports a pageBreak node to the empty string (the node disappears)', () => {
-// Direct, NON-failing assertion of the lossy emission so the data loss is
-// unambiguous: a standalone pageBreak yields "" (the .trim() of nothing).
-expect(convertProseMirrorToMarkdown(doc({ type: 'pageBreak' }))).toBe('');
+it('exports a pageBreak node to the schema-matching block div', () => {
+// FIXED: a standalone pageBreak now emits the block-level HTML div so the
+// node survives instead of being erased to "".
+expect(convertProseMirrorToMarkdown(doc({ type: 'pageBreak' }))).toBe(
+'<div data-type="pageBreak"></div>',
+);
});
-it('drops a pageBreak sitting BETWEEN two paragraphs on export', () => {
-// With surrounding content the lost node leaves no trace at all: the output
-// is just the two paragraphs joined as if the page break were never there.
+it('keeps a pageBreak sitting BETWEEN two paragraphs on export', () => {
+// FIXED: with surrounding content the divider is emitted as its own block
+// between the two paragraphs (joined by the doc "\n\n"), no longer dropped.
const out = convertProseMirrorToMarkdown(
doc(para(text('before')), { type: 'pageBreak' }, para(text('after'))),
);
-// The pageBreak renders to "", so the only trace it leaves is a doubled
-// blank gap from the doc "\n\n" join ("before" + "" + "after"): no marker,
-// no placeholder — the divider itself is gone (data loss). The leftover
-// blank line is itself a phantom-diff hazard, but the node is unrecoverable.
-expect(out).toBe('before\n\n\n\nafter');
-expect(out).not.toContain('pageBreak');
+expect(out).toBe(
+'before\n\n<div data-type="pageBreak"></div>\n\nafter',
+);
+expect(out).toContain('pageBreak');
});
-// KNOWN, DOCUMENTED non-roundtrip data loss (kept honest as it.fails): a
-// pageBreak node cannot survive an export -> import -> export cycle because it
-// is erased on the FIRST export. The assertion below is what we WISH held (the
-// node round-trips); it fails today, which `it.fails` turns green while keeping
-// the divergence visible. Source must NOT change — this only documents it.
-it.fails(
-'BUG: a pageBreak node is lost on export and cannot round-trip',
-async () => {
-const { md1, doc2 } = await roundTrip({ type: 'pageBreak' });
-// What we want: the placeholder is non-empty and the node comes back.
-expect(md1).not.toBe('');
-const types = (doc2.content || []).map((n: any) => n.type);
-expect(types).toContain('pageBreak');
-},
-);
+// FIXED: a pageBreak node now survives an export -> import -> export cycle
+// because the FIRST export emits the schema-matching block div, which marked
+// passes through and generateJSON rebuilds into a pageBreak node again.
+it('a pageBreak node round-trips (export -> import yields a pageBreak)', async () => {
+const { md1, doc2 } = await roundTrip({ type: 'pageBreak' });
+expect(md1).not.toBe('');
+const types = (doc2.content || []).map((n: any) => n.type);
+expect(types).toContain('pageBreak');
+});
});
// ---------------------------------------------------------------------------
@@ -194,3 +188,591 @@ describe('empty detailsContent (schema allows block*)', () => {
);
});
});
// ===========================================================================
// CONVERTER GAP COVERAGE (specs 1–29)
//
// These describe the converter's exact emission for under-tested branches and,
// for the round-trip cases, pin export byte-stability and/or documented data
// loss. docsCanonicallyEqual is imported here (not at the top) to keep the
// existing block's imports untouched. heading/col are local helpers; doc/text/
// para are reused from the top of the file.
// ===========================================================================
import { docsCanonicallyEqual } from '../src/lib/canonicalize.js';
const heading = (level: number, ...inline: any[]) => ({
type: 'heading',
attrs: { level },
content: inline,
});
// A two-layout columns block carrying a single column with exactly one child —
// the shared shape for the raw-HTML-container round-trip specs (15, 17–29).
const oneColumn = (child: any) => ({
type: 'columns',
attrs: { layout: 'two' },
content: [{ type: 'column', content: [child] }],
});
// Extract the single column's single child node from a round-tripped doc.
const colChildOf = (doc2: any) =>
doc2?.content?.[0]?.content?.[0]?.content?.[0];
describe('converter gap coverage — emission branches (specs 1–11)', () => {
// 1. orderedList renders index+1 and DROPS the start attribute.
it('orderedList start:5 restarts numbering at 1 (start attr ignored)', () => {
const out = convertProseMirrorToMarkdown(
doc({
type: 'orderedList',
attrs: { start: 5 },
content: [
{ type: 'listItem', content: [para(text('a'))] },
{ type: 'listItem', content: [para(text('b'))] },
],
}),
);
expect(out).toBe('1. a\n2. b');
});
// 2. An empty paragraph contributes an empty segment between two "\n\n" joins.
it('an empty paragraph between two paragraphs yields doubled blank lines', () => {
const out = convertProseMirrorToMarkdown(
doc(para(text('a')), { type: 'paragraph' }, para(text('b'))),
);
expect(out).toBe('a\n\n\n\nb');
});
// 3. A code block inside a blockquote: every physical line gets "> ".
it('a codeBlock inside a blockquote prefixes every fence/code line with "> "', () => {
const out = convertProseMirrorToMarkdown(
doc({
type: 'blockquote',
content: [
{
type: 'codeBlock',
attrs: { language: 'js' },
content: [text('a\nb')],
},
],
}),
);
expect(out).toBe('> ```js\n> a\n> b\n> ```');
});
// 4. A GFM body cell with TWO block children (paragraph + bulletList): joined
// by a space, the list's newline collapsed so the row stays intact.
it('a GFM body cell with paragraph+list joins them by a space (no "p1- a")', () => {
const out = convertProseMirrorToMarkdown(
doc({
type: 'table',
content: [
{
type: 'tableRow',
content: [{ type: 'tableHeader', content: [para(text('h'))] }],
},
{
type: 'tableRow',
content: [
{
type: 'tableCell',
content: [
para(text('p1')),
{
type: 'bulletList',
content: [{ type: 'listItem', content: [para(text('a'))] }],
},
],
},
],
},
],
}),
);
expect(out).toBe('| h |\n| --- |\n| p1 - a |');
});
// 5. code + link co-occur: the schema's `code` mark excludes all other marks
// (including link), so the link cannot survive import. The lossless,
// byte-stable behavior is to emit ONLY the backtick code span (code wins).
it('a code+link run emits the backtick code form (code wins, link dropped)', () => {
const out = convertProseMirrorToMarkdown(
doc(
para({
type: 'text',
text: 'x',
marks: [
{ type: 'code' },
{ type: 'link', attrs: { href: 'http://a?b&c"d' } },
],
}),
),
);
expect(out).toBe('`x`');
});
// 6. hardBreak inside a heading: prefix applied once, "  \n" between a and b.
it('a hardBreak inside an h2 heading produces "## a  \\nb"', () => {
const out = convertProseMirrorToMarkdown(
doc(heading(2, text('a'), { type: 'hardBreak' }, text('b'))),
);
expect(out).toBe('## a  \nb');
});
// 7. encodeMdUrl's non-space whitespace sub-path: a newline -> %0A.
it('an image src containing a newline percent-encodes it to %0A', () => {
const out = convertProseMirrorToMarkdown(
doc({ type: 'image', attrs: { alt: 'cap', src: '/a\nb.png' } }),
);
expect(out).toBe('![cap](/a%0Ab.png)');
});
// 8. spanned-table HTML fallback: rowspan>1 AND align cell-attr branches, <td>.
it('a spanned cell with rowspan+align emits <td rowspan align> in that order', () => {
const out = convertProseMirrorToMarkdown(
doc({
type: 'table',
content: [
{
type: 'tableRow',
content: [
{
type: 'tableCell',
attrs: { rowspan: 2, align: 'center' },
content: [para(text('m'))],
},
],
},
],
}),
);
expect(out).toBe(
'<table><tbody><tr><td rowspan="2" align="center"><p>m</p></td></tr></tbody></table>',
);
});
// 9. taskItem fixed indent width of 2 (NOT prefix.length+1) for a nested sublist.
it('a task item with a nested bullet sublist indents the sublist by 2 columns', () => {
const out = convertProseMirrorToMarkdown(
doc({
type: 'taskList',
content: [
{
type: 'taskItem',
attrs: { checked: false },
content: [
para(text('top')),
{
type: 'bulletList',
content: [
{ type: 'listItem', content: [para(text('child'))] },
],
},
],
},
],
}),
);
expect(out).toBe('- [ ] top\n  - child');
});
// 10. A bulletList inside a blockquote: each list line independently prefixed.
it('a bulletList inside a blockquote prefixes every list line with "> "', () => {
const out = convertProseMirrorToMarkdown(
doc({
type: 'blockquote',
content: [
{
type: 'bulletList',
content: [
{ type: 'listItem', content: [para(text('x'))] },
{ type: 'listItem', content: [para(text('y'))] },
],
},
],
}),
);
expect(out).toBe('> - x\n> - y');
});
// 11. GFM (non-spanned) cell: multi-block space-join + pipe-escape + newline-collapse.
it('a GFM cell escapes a literal pipe and collapses newlines across two paragraphs', () => {
const out = convertProseMirrorToMarkdown(
doc({
type: 'table',
content: [
{
type: 'tableRow',
content: [{ type: 'tableHeader', content: [para(text('h'))] }],
},
{
type: 'tableRow',
content: [
{
type: 'tableCell',
content: [para(text('a|b')), para(text('c'))],
},
],
},
],
}),
);
expect(out).toBe('| h |\n| --- |\n| a\\|b c |');
});
});
describe('converter gap coverage — documented round-trip data loss (specs 12–14)', () => {
// 12. A 3-backtick fence inside a codeBlock body is NOT lengthened: the inner
// fence prematurely terminates the block, splitting it into three nodes.
it('a triple-backtick fence inside a codeBlock body is lossy (fence collision)', async () => {
const d = doc({
type: 'codeBlock',
attrs: { language: 'js' },
content: [{ type: 'text', text: '```\ninner\n```' }],
});
const md1 = convertProseMirrorToMarkdown(d);
expect(md1).toBe('```js\n```\ninner\n```\n```');
const doc2 = await markdownToProseMirror(md1);
// The inner fence split the block into THREE top-level nodes.
const top = doc2.content || [];
expect(top).toHaveLength(3);
expect(top[0].type).toBe('codeBlock');
expect(top[0].attrs?.language).toBe('js');
expect(top[0].content?.[0]).toMatchObject({ type: 'text', text: '\n' });
expect(top[1].type).toBe('paragraph');
expect(top[1].content?.[0]).toMatchObject({ type: 'text', text: 'inner' });
expect(top[2].type).toBe('codeBlock');
expect(top[2].attrs?.language).toBeNull();
expect(top[2].content?.[0]).toMatchObject({ type: 'text', text: '\n' });
const md2 = convertProseMirrorToMarkdown(doc2);
expect(md2).not.toBe(md1); // not byte-stable
expect(docsCanonicallyEqual(d, doc2)).toBe(false); // documented data loss
});
// 13. A leading ordered-list marker in paragraph text is NOT escaped, so a
// plain paragraph silently becomes an orderedList on re-import.
it('a paragraph starting with "1. " is promoted to an orderedList on re-import', async () => {
const d = doc({
type: 'paragraph',
content: [{ type: 'text', text: '1. not a list' }],
});
const md1 = convertProseMirrorToMarkdown(d);
expect(md1).toBe('1. not a list'); // no backslash escape
const doc2 = await markdownToProseMirror(md1);
expect(doc2.content?.[0]?.type).toBe('orderedList');
const li = doc2.content[0].content?.[0];
expect(li?.type).toBe('listItem');
expect(li.content?.[0]?.content?.[0]).toMatchObject({
type: 'text',
text: 'not a list', // the "1. " was consumed as a list marker
});
expect(docsCanonicallyEqual(d, doc2)).toBe(false);
});
// 14. The image emitter drops the title attribute (silently lost on round-trip).
it('an image title attribute is dropped on export and lost on re-import', async () => {
const d = doc({
type: 'image',
attrs: { src: '/i.png', alt: 'a', title: 't"q' },
});
const md1 = convertProseMirrorToMarkdown(d);
expect(md1).toBe('![a](/i.png)'); // no title, no quotes
const doc2 = await markdownToProseMirror(md1);
const img = (doc2.content || []).find((n: any) => n.type === 'image');
expect(img).toBeTruthy();
expect(img.attrs?.title).toBeNull(); // the original 't"q' was dropped
expect(img.attrs?.src).toBe('/i.png');
expect(img.attrs?.alt).toBe('a');
expect(docsCanonicallyEqual(d, doc2)).toBe(false);
});
});
describe('converter gap coverage — raw-HTML container round-trips (specs 15–29)', () => {
// 15. image inside a column: imageToHtml width+align arms; byte-stable; no
// literal-markdown text node leaks.
it('an image in a column emits <img> (width/align arms) and round-trips byte-stable', async () => {
const { md1, doc2, md2 } = await roundTrip(
oneColumn({
type: 'image',
attrs: { src: '/i.png', alt: 'cap', width: 320, align: 'center' },
}),
);
expect(md1).toBe(
'<div data-type="columns" data-layout="two"><div data-type="column"><img src="/i.png" alt="cap" width="320" align="center"></div></div>',
);
expect(md2).toBe(md1);
expect(colChildOf(doc2)?.type).toBe('image');
});
// 16. image inside a SPANNED table cell (the other raw-HTML container).
it('an image in a spanned table cell emits <img> (width arm) and round-trips byte-stable', async () => {
const { md1, md2 } = await roundTrip({
type: 'table',
content: [
{
type: 'tableRow',
content: [
{
type: 'tableCell',
attrs: { colspan: 2 },
content: [
{
type: 'image',
attrs: { src: '/i.png', alt: 'x', width: 100 },
},
],
},
],
},
],
});
expect(md1).toBe(
'<table><tbody><tr><td colspan="2"><img src="/i.png" alt="x" width="100"></td></tr></tbody></table>',
);
expect(md2).toBe(md1);
});
// 17. callout inside a column: calloutToHtml lower-cases the type; byte-stable.
it('a callout in a column emits the HTML div (type lower-cased) and round-trips', async () => {
const { md1, doc2, md2 } = await roundTrip(
oneColumn({
type: 'callout',
attrs: { type: 'WARNING' },
content: [para(text('a'))],
}),
);
expect(md1).toBe(
'<div data-type="columns" data-layout="two"><div data-type="column"><div data-type="callout" data-callout-type="warning"><p>a</p></div></div></div>',
);
expect(md2).toBe(md1);
expect(colChildOf(doc2)?.type).toBe('callout');
});
// 18. details tree inside a column: summary via inlineToHtml, content via blockToHtml.
it('a details tree in a column emits <details>/<summary>/<div detailsContent> and round-trips', async () => {
const { md1, doc2, md2 } = await roundTrip(
oneColumn({
type: 'details',
content: [
{ type: 'detailsSummary', content: [text('S')] },
{ type: 'detailsContent', content: [para(text('body'))] },
],
}),
);
expect(md1).toBe(
'<div data-type="columns" data-layout="two"><div data-type="column"><details><summary data-type="detailsSummary">S</summary><div data-type="detailsContent"><p>body</p></div></details></div></div>',
);
expect(md2).toBe(md1);
expect(colChildOf(doc2)?.type).toBe('details');
});
// 19. taskList inside a column: BOTH checked:true and checked:false arms.
it('a taskList in a column emits both data-checked arms and round-trips', async () => {
const { md1, doc2, md2 } = await roundTrip(
oneColumn({
type: 'taskList',
content: [
{
type: 'taskItem',
attrs: { checked: true },
content: [para(text('done'))],
},
{
type: 'taskItem',
attrs: { checked: false },
content: [para(text('todo'))],
},
],
}),
);
expect(md1).toBe(
'<div data-type="columns" data-layout="two"><div data-type="column"><ul data-type="taskList"><li data-type="taskItem" data-checked="true"><p>done</p></li><li data-type="taskItem" data-checked="false"><p>todo</p></li></ul></div></div>',
);
expect(md2).toBe(md1);
expect(colChildOf(doc2)?.type).toBe('taskList');
});
// 20. bare taskItem (no wrapping taskList) inside a column self-wraps.
it('a bare taskItem in a column self-wraps in a single-item taskList and round-trips', async () => {
const { md1, doc2, md2 } = await roundTrip(
oneColumn({
type: 'taskItem',
attrs: { checked: false },
content: [para(text('lone'))],
}),
);
expect(md1).toBe(
'<div data-type="columns" data-layout="two"><div data-type="column"><ul data-type="taskList"><li data-type="taskItem" data-checked="false"><p>lone</p></li></ul></div></div>',
);
expect(md2).toBe(md1);
expect(colChildOf(doc2)?.type).toBe('taskList');
});
// 21. blockquote inside a column: real <blockquote>, not markdown "> q".
it('a blockquote in a column emits <blockquote> and round-trips', async () => {
const { md1, doc2, md2 } = await roundTrip(
oneColumn({ type: 'blockquote', content: [para(text('q'))] }),
);
expect(md1).toBe(
'<div data-type="columns" data-layout="two"><div data-type="column"><blockquote><p>q</p></blockquote></div></div>',
);
expect(md2).toBe(md1);
expect(colChildOf(doc2)?.type).toBe('blockquote');
});
// 22. horizontalRule inside a column: literal <hr>, not markdown "---".
it('a horizontalRule in a column emits <hr> and round-trips', async () => {
const { md1, doc2, md2 } = await roundTrip(
oneColumn({ type: 'horizontalRule' }),
);
expect(md1).toBe(
'<div data-type="columns" data-layout="two"><div data-type="column"><hr></div></div>',
);
expect(md2).toBe(md1);
expect(colChildOf(doc2)?.type).toBe('horizontalRule');
});
// 23. Unknown block type with NON-text block children -> <div>-wrap of children.
it('an unknown block with block children wraps them in <div> (no markdown leak)', () => {
const md1 = convertProseMirrorToMarkdown(
doc(
oneColumn({
type: 'someFutureBlock',
content: [para(text('a')), para(text('b'))],
}),
),
);
expect(md1).toContain('<div><p>a</p><p>b</p></div>');
// No markdown paragraph separator survives inside the raw-HTML column.
expect(md1).toBe(
'<div data-type="columns" data-layout="two"><div data-type="column"><div><p>a</p><p>b</p></div></div></div>',
);
});
// 24. Unknown block with ONLY inline/text children -> <div>inlineToHtml</div>.
it('an unknown block with only inline children renders inline as HTML (marks not markdown)', () => {
const md1 = convertProseMirrorToMarkdown(
doc(
oneColumn({
type: 'someInlineOnlyBlock',
content: [text('hi'), { type: 'text', text: '!', marks: [{ type: 'bold' }] }],
}),
),
);
expect(md1).toContain('<div>hi<strong>!</strong></div>');
});
// 25. mathBlock inside a column delegates through processNode (NOT $$ fence).
it('a mathBlock in a column delegates to processNode (HTML div, no $$ fence)', () => {
const md1 = convertProseMirrorToMarkdown(
doc(oneColumn({ type: 'mathBlock', attrs: { text: 'a^2+b^2' } })),
);
expect(md1).toContain(
'<div data-type="mathBlock" data-katex="true" text="a^2+b^2"></div>',
);
expect(md1).not.toContain('$$');
});
// 26. SPANNED table inside a column delegates to processNode -> raw <table>.
it('a spanned table in a column delegates to raw <table> HTML (no GFM pipes)', () => {
const md1 = convertProseMirrorToMarkdown(
doc(
oneColumn({
type: 'table',
content: [
{
type: 'tableRow',
content: [
{
type: 'tableCell',
attrs: { colspan: 2 },
content: [para(text('x'))],
},
],
},
],
}),
),
);
expect(md1).toContain('<table');
expect(md1).toContain('colspan="2"');
// No GFM pipe-table separator leaked into the raw-HTML column.
expect(md1).not.toContain('| --- |');
});
// 27. list item with TWO block children (paragraph + codeBlock) -> blockChildrenToHtml.
it('a list item with paragraph+codeBlock in a column emits both blocks as HTML', () => {
const md1 = convertProseMirrorToMarkdown(
doc(
oneColumn({
type: 'bulletList',
content: [
{
type: 'listItem',
content: [
para(text('p')),
{
type: 'codeBlock',
attrs: { language: 'js' },
content: [text('a\nb')],
},
],
},
],
}),
),
);
expect(md1).toContain('<p>p</p>');
expect(md1).toContain('<pre><code class="language-js">a\nb</code></pre>');
// The two blocks appear sequentially inside the same <li>.
expect(md1).toContain(
'<li><p>p</p><pre><code class="language-js">a\nb</code></pre></li>',
);
});
// 28. ordered list item whose 2nd block child is a NESTED bulletList.
it('an ordered list item with a nested bulletList in a column emits nested <ul> HTML', () => {
const md1 = convertProseMirrorToMarkdown(
doc(
oneColumn({
type: 'orderedList',
content: [
{
type: 'listItem',
content: [
para(text('p1')),
{
type: 'bulletList',
content: [
{ type: 'listItem', content: [para(text('nested'))] },
],
},
],
},
],
}),
),
);
// NOTE(review): the spec's expected literal said '<ul><li>nested</li></ul>',
// but blockChildrenToHtml renders the nested listItem's paragraph child as a
// real <p>, so the actual (correct) emission is '<ul><li><p>nested</p></li></ul>'.
expect(md1).toContain(
'<ol><li><p>p1</p><ul><li><p>nested</p></li></ul></li></ol>',
);
// No markdown list markers leaked into the raw-HTML column.
expect(md1).not.toContain('1. ');
expect(md1).not.toContain('- nested');
});
// 29. mathInline atom inside a column paragraph -> inlineToHtml delegates via processNode.
it('a mathInline atom in a column paragraph emits schema HTML (no $...$ fence)', () => {
const md1 = convertProseMirrorToMarkdown(
doc(oneColumn(para(text('eq: '), { type: 'mathInline', attrs: { text: 'x_i' } }))),
);
expect(md1).toContain(
'<p>eq: <span data-type="mathInline" data-katex="true" text="x_i"></span></p>',
);
expect(md1).not.toContain('$x_i$');
});
});
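
Several of the emission specs above (3, 4, 10, 11) come down to two small line-oriented string transforms. As a rough sketch only, and not the converter's actual code, they could look like the following. `prefixBlockquote` and `gfmCellText` are hypothetical helper names, and the inputs are assumed to be blocks that have already been rendered to markdown strings.

```typescript
// Hypothetical helpers; names and signatures are illustrative only.

// Prefix every physical line of an already-rendered block with "> ", so a
// multi-line child (a list, a fenced code block) stays inside the quote.
function prefixBlockquote(rendered: string): string {
  return rendered
    .split('\n')
    .map((line) => `> ${line}`)
    .join('\n');
}

// Flatten the rendered block children of one GFM table cell into a single
// row-safe string: collapse newlines inside each block, join blocks with a
// space, then escape literal pipes so they cannot end the cell early.
function gfmCellText(renderedBlocks: string[]): string {
  return renderedBlocks
    .map((block) => block.replace(/\n+/g, ' '))
    .join(' ')
    .replace(/\|/g, '\\|');
}
```

For example, `gfmCellText(['a|b', 'c'])` gives the cell body asserted in spec 11, and `prefixBlockquote('- x\n- y')` gives the two quoted list lines asserted in spec 10.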

@@ -225,3 +225,166 @@ describe('empty / single-column tables', () => {
expect(out).toBe('| Only |\n| --- |\n| v |');
});
});
// ---------------------------------------------------------------------------
// Media / attachment / container full-attribute coverage. The base golden file
// only sets the minimal attrs for each media node (src, or src+name), so the
// optional-attribute emission branches and their exact ORDERING are uncovered.
// These cases pin the full ordered attribute string for video/youtube/embed/
// audio/pdf/attachment plus the all-absent side of every optional guard, and
// the distinct HTML-container (blockToHtml / inlineToHtml) paths for an
// orderedList and a hardBreak inside a column.
// ---------------------------------------------------------------------------
describe('media / attachment / container full-attribute golden coverage', () => {
it('video: emits all optional attrs in source order (alt->aria-label, attachmentId/size/align/aspectRatio->data-*)', () => {
expect(
c({
type: 'video',
attrs: {
src: '/v.mp4',
alt: 'clip',
attachmentId: 'att-1',
width: 640,
height: 480,
size: 1234,
align: 'center',
aspectRatio: 1.777,
},
}),
).toBe(
'<div><video src="/v.mp4" aria-label="clip" data-attachment-id="att-1" width="640" height="480" data-size="1234" data-align="center" data-aspect-ratio="1.777"></video></div>',
);
});
it('video: with only src, every optional guard takes its false branch (src-only <video>, no data-type on wrapper)', () => {
expect(c({ type: 'video', attrs: { src: '/v.mp4' } })).toBe(
'<div><video src="/v.mp4"></video></div>',
);
});
it('youtube + embed: each emits its full optional attr set in source order', () => {
// (a) youtube: width/height/align all present -> data-* in order.
expect(
c({
type: 'youtube',
attrs: { src: 'https://youtu.be/abc', width: 560, height: 315, align: 'right' },
}),
).toBe(
'<div data-type="youtube" data-src="https://youtu.be/abc" data-width="560" data-height="315" data-align="right"></div>',
);
// (b) embed: align/width/height optional branches after src+provider.
expect(
c({
type: 'embed',
attrs: { src: 'https://x.com/e', provider: 'iframe', align: 'left', width: 600, height: 400 },
}),
).toBe(
'<div data-type="embed" data-src="https://x.com/e" data-provider="iframe" data-align="left" data-width="600" data-height="400"></div>',
);
});
it('audio: emits data-attachment-id then data-size after src when both are set', () => {
expect(c({ type: 'audio', attrs: { src: '/a.mp3', attachmentId: 'att-7', size: 9001 } })).toBe(
'<div><audio src="/a.mp3" data-attachment-id="att-7" data-size="9001"></audio></div>',
);
});
it('audio: with attachmentId but no size, data-size is suppressed (size != null false branch)', () => {
expect(c({ type: 'audio', attrs: { src: '/a.mp3', attachmentId: 'att-7' } })).toBe(
'<div><audio src="/a.mp3" data-attachment-id="att-7"></audio></div>',
);
});
it('pdf: emits the full optional attr set in order (data-name, data-attachment-id, data-size, width, height)', () => {
expect(
c({
type: 'pdf',
attrs: {
src: '/d.pdf',
name: 'd.pdf',
attachmentId: 'att-9',
size: 2048,
width: 800,
height: 600,
},
}),
).toBe(
'<div data-type="pdf" src="/d.pdf" data-name="d.pdf" data-attachment-id="att-9" data-size="2048" width="800" height="600"></div>',
);
});
it('attachment: emits data-attachment-name/mime/size/id in order after the always-present url', () => {
expect(
c({
type: 'attachment',
attrs: {
url: '/f.zip',
name: 'f.zip',
mime: 'application/zip',
size: 512,
attachmentId: 'att-3',
},
}),
).toBe(
'<div data-type="attachment" data-attachment-url="/f.zip" data-attachment-name="f.zip" data-attachment-mime="application/zip" data-attachment-size="512" data-attachment-id="att-3"></div>',
);
});
it('attachment: with only a url, no spurious data-attachment-name/mime/size/id appear (all guards false)', () => {
expect(c({ type: 'attachment', attrs: { url: '/f.zip' } })).toBe(
'<div data-type="attachment" data-attachment-url="/f.zip"></div>',
);
});
it('orderedList inside a column renders via blockToHtml as <ol> (start attr DROPPED) with bold->strong, code->code', () => {
const out = c({
type: 'columns',
attrs: { layout: 'two' },
content: [
{
type: 'column',
content: [
{
type: 'orderedList',
attrs: { start: 3 },
content: [
{
type: 'listItem',
content: [para(text('a', [{ type: 'bold' }]))],
},
{
type: 'listItem',
content: [para(text('b', [{ type: 'code' }]))],
},
],
},
],
},
],
});
// blockToHtml orderedList path emits a plain <ol> with no start attribute,
// and inlineToHtml maps bold->strong, code->code.
expect(out).toContain(
'<ol><li><p><strong>a</strong></p></li><li><p><code>b</code></p></li></ol>',
);
// The start:3 attr is NOT preserved in the HTML/column container path.
expect(out).not.toContain('start=');
});
it('hardBreak inside a column renders as <br> via inlineToHtml (not the markdown two-space form)', () => {
const out = c({
type: 'columns',
attrs: { layout: 'two' },
content: [
{
type: 'column',
content: [para(text('a'), { type: 'hardBreak' }, text('b'))],
},
],
});
expect(out).toContain('<p>a<br>b</p>');
// The processNode markdown "  \n" hard-break form must NOT appear in the
// raw-HTML column container path.
expect(out).not.toContain('  \n');
});
});
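
The golden strings above depend on two properties: optional attributes appear in a fixed order, and an absent attribute emits nothing at all. One way to get both, shown here only as a sketch under assumptions, is to render from an ordered list of name/value pairs and skip the null or undefined ones. `renderAttrs` and `audioToHtml` are hypothetical names, and `escapeAttr` is a guess at the escaper based on the `&` and `"` replacements the tests mention; the real emitter in markdown-converter.ts may be structured differently.

```typescript
// Hypothetical sketch; not the converter's actual emitter.
type AttrPair = [string, string | number | null | undefined];

// Escape the two characters the link-href test expects to be escaped.
const escapeAttr = (value: string): string =>
  value.replace(/&/g, '&amp;').replace(/"/g, '&quot;');

// Render pairs in the order given, dropping any whose value is null/undefined.
function renderAttrs(pairs: AttrPair[]): string {
  return pairs
    .filter(([, value]) => value != null)
    .map(([name, value]) => ` ${name}="${escapeAttr(String(value))}"`)
    .join('');
}

// Example node emitter: src first, then the optional attachment id and size.
function audioToHtml(attrs: {
  src: string;
  attachmentId?: string;
  size?: number;
}): string {
  const rendered = renderAttrs([
    ['src', attrs.src],
    ['data-attachment-id', attrs.attachmentId],
    ['data-size', attrs.size],
  ]);
  return `<div><audio${rendered}></audio></div>`;
}
```

Because the order lives in the array literal, a golden test that pins the full attribute string also pins the order of the pairs, which is the property the cases above are checking.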

@@ -0,0 +1,223 @@
import { describe, expect, it } from 'vitest';
// Import the converter DIRECTLY from src (NOT the docmost-client barrel, which
// pulls in collaboration.ts and mutates the global DOM at import time), matching
// the other converter unit tests (see markdown-converter-gaps.test.ts).
import { convertProseMirrorToMarkdown } from '../src/lib/markdown-converter.js';
// Minimal ProseMirror builders. The top-level converter joins doc children with
// "\n\n" then .trim()s, so a single-node doc yields exactly that node's rendered
// (trimmed) string.
const doc = (...nodes: any[]) => ({ type: 'doc', content: nodes });
const text = (t: string, marks?: any[]) =>
marks ? { type: 'text', text: t, marks } : { type: 'text', text: t };
const para = (...inline: any[]) => ({ type: 'paragraph', content: inline });
// A columns node carrying a SINGLE column, whose content is the supplied block
// children. columns/column are raw-HTML containers, so their children render via
// blockToHtml -> inlineToHtml (the HTML-mirroring path under test).
const oneColumn = (...blocks: any[]) => ({
type: 'columns',
attrs: { layout: 'two' },
content: [{ type: 'column', content: blocks }],
});
// Extract the inner HTML of the single column from a rendered columns string.
// Output shape is:
// <div data-type="columns" data-layout="two"><div data-type="column">INNER</div></div>
const COLUMN_PREFIX =
'<div data-type="columns" data-layout="two"><div data-type="column">';
const COLUMN_SUFFIX = '</div></div>';
const columnInner = (rendered: string): string => {
expect(rendered.startsWith(COLUMN_PREFIX)).toBe(true);
expect(rendered.endsWith(COLUMN_SUFFIX)).toBe(true);
return rendered.slice(COLUMN_PREFIX.length, rendered.length - COLUMN_SUFFIX.length);
};
// ---------------------------------------------------------------------------
// 1. inlineToHtml mark-mirroring INSIDE a raw-HTML container (columns).
//
// At the TOP level the `text` case emits markdown markers (**, *, ``, ~~) for
// bold/italic/code/strike. But inside columns (and spanned table cells) the
// content is raw HTML that marked will NOT re-parse, so inlineToHtml
// (markdown-converter.ts lines 599-619) MUST mirror each mark to HTML instead:
// bold-><strong>, italic-><em>, code-><code>, strike-><s>, underline-><u>. This
// is a DISTINCT branch from the top-level mark path; if it leaked markdown, the
// literal ** / `` would survive as text on re-import.
// ---------------------------------------------------------------------------
describe('inlineToHtml: bold/italic/code/strike/underline -> HTML inside columns', () => {
it('mirrors each single-mark run to its schema HTML tag (not markdown markers)', () => {
const out = convertProseMirrorToMarkdown(
doc(
oneColumn(
para(
text('b', [{ type: 'bold' }]),
text('i', [{ type: 'italic' }]),
text('c', [{ type: 'code' }]),
text('s', [{ type: 'strike' }]),
text('u', [{ type: 'underline' }]),
),
),
),
);
expect(out).toBe(
'<div data-type="columns" data-layout="two">' +
'<div data-type="column">' +
'<p><strong>b</strong><em>i</em><code>c</code><s>s</s><u>u</u></p>' +
'</div></div>',
);
// Belt-and-suspenders: none of the top-level markdown markers leaked.
expect(out).not.toContain('**');
expect(out).not.toContain('~~');
expect(out).not.toContain('`');
});
});
// ---------------------------------------------------------------------------
// 2. inlineToHtml: link/hardBreak/highlight/textStyle/comment inside columns.
//
// Exercises the remaining inlineToHtml branches that are uncovered inside a
// raw-HTML container: link href escaping via escapeAttr (line 621; & -> &amp;,
// " -> &quot;), hardBreak -> <br> (line 591), highlight WITH vs WITHOUT color
// (624-626), textStyle color (628-630), and comment with data-resolved (632-638).
// ---------------------------------------------------------------------------
describe('inlineToHtml: link/hardBreak/highlight/textStyle/comment inside columns', () => {
it('escapes link hrefs, emits <br>, plain/colored <mark>, span color, and resolved comment', () => {
const out = convertProseMirrorToMarkdown(
doc(
oneColumn(
para(
text('lnk', [{ type: 'link', attrs: { href: 'http://a?b&c"d' } }]),
{ type: 'hardBreak' },
text('hl', [{ type: 'highlight', attrs: { color: '#ff0000' } }]),
text('plain', [{ type: 'highlight' }]),
text('clr', [{ type: 'textStyle', attrs: { color: 'red' } }]),
text('cm', [
{ type: 'comment', attrs: { commentId: 'c1', resolved: true } },
]),
),
),
),
);
expect(columnInner(out)).toBe(
'<p>' +
'<a href="http://a?b&amp;c&quot;d">lnk</a>' +
'<br>' +
'<mark style="background-color: #ff0000">hl</mark>' +
'<mark>plain</mark>' +
'<span style="color: red">clr</span>' +
'<span data-comment-id="c1" data-resolved="true">cm</span>' +
'</p>',
);
});
it('omits data-resolved when the comment is not resolved', () => {
// The resolved sub-branch (632-638) is load-bearing: an unresolved comment
// must emit a bare data-comment-id span with NO data-resolved attribute.
const out = convertProseMirrorToMarkdown(
doc(
oneColumn(
para(
text('cm', [
{ type: 'comment', attrs: { commentId: 'c1', resolved: false } },
]),
),
),
),
);
expect(columnInner(out)).toBe('<p><span data-comment-id="c1">cm</span></p>');
expect(out).not.toContain('data-resolved');
});
});
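// Illustrative sketch only, not the converter's actual code: the attribute
// escaping exercised above can be approximated as follows. The name
// `escapeAttrSketch` is hypothetical; the only behavior assumed is the pair of
// mappings named in the comment above (& -> &amp;, " -> &quot;). The ampersand
// is replaced first so the entities produced for quotes are not escaped twice.

```typescript
// Hypothetical stand-in for the converter's escapeAttr helper.
function escapeAttrSketch(value: string): string {
  return value
    .replace(/&/g, '&amp;') // ampersand first, so later entities stay intact
    .replace(/"/g, '&quot;'); // then the quote that would end the attribute
}

// Same href as the link mark in the spec above.
const escapedHref = escapeAttrSketch('http://a?b&c"d');
const anchorHtml = `<a href="${escapedHref}">lnk</a>`;
```

// Reversing the two replace calls would turn the `&` inside `&quot;` into
// `&amp;quot;`, which is why the order is part of the sketch.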
// ---------------------------------------------------------------------------
// 3. blockToHtml non-paragraph branches inside columns: heading / codeBlock /
// bulletList.
//
// heading -> <hN> (718-721), codeBlock with-language vs no-language class fork
// (730-742; the no-language `cls = ''` branch at 741 yields a BARE <code> with
// no class), and bulletList -> <ul><li><p>...</p></li></ul> (722-725). Code text
// is element TEXT content, so it is escapeHtmlText-escaped (not the attr escaper),
// and embedded newlines are preserved verbatim.
// ---------------------------------------------------------------------------
describe('blockToHtml: heading / codeBlock(lang & no-lang) / bulletList inside columns', () => {
it('emits <hN>, language vs bare <pre><code>, and <ul><li><p>..</p></li>', () => {
const out = convertProseMirrorToMarkdown(
doc(
oneColumn(
{ type: 'heading', attrs: { level: 2 }, content: [text('H')] },
{
type: 'codeBlock',
attrs: { language: 'js' },
content: [text('a\nb')],
},
{ type: 'codeBlock', content: [text('plain')] },
{
type: 'bulletList',
content: [
{ type: 'listItem', content: [para(text('item'))] },
],
},
),
),
);
expect(columnInner(out)).toBe(
'<h2>H</h2>' +
'<pre><code class="language-js">a\nb</code></pre>' +
'<pre><code>plain</code></pre>' +
'<ul><li><p>item</p></li></ul>',
);
// The no-language codeBlock must NOT carry a class attribute (the cls=''
// fork at line 741): its <code> opens bare.
expect(out).toContain('<pre><code>plain</code></pre>');
});
});
// ---------------------------------------------------------------------------
// 4. Spanned-table renderHtmlCell + orderedList block child (HTML fallback).
//
// A colspan>1 cell forces the WHOLE table to the raw-<table> HTML fallback
// (markdown-converter.ts ~287-331). renderHtmlCell emits colspan + align attrs
// (312-316) and renders each block child via blockToHtml. An orderedList child
// hits the blockToHtml orderedList branch (726-729), which emits
// <ol><li><p>..</p></li>..</ol> — the schema's `start` attr is NOT emitted by
// this HTML <ol> branch.
// ---------------------------------------------------------------------------
describe('spanned table: renderHtmlCell colspan/align + orderedList block child', () => {
it('renders the colspan/align cell with an <ol> (start attr is dropped)', () => {
const out = convertProseMirrorToMarkdown(
doc({
type: 'table',
content: [
{
type: 'tableRow',
content: [
{
type: 'tableCell',
attrs: { colspan: 2, align: 'center' },
content: [
{
type: 'orderedList',
attrs: { start: 3 },
content: [
{ type: 'listItem', content: [para(text('one'))] },
{ type: 'listItem', content: [para(text('two'))] },
],
},
],
},
],
},
],
}),
);
expect(out).toBe(
'<table><tbody><tr>' +
'<td colspan="2" align="center">' +
'<ol><li><p>one</p></li><li><p>two</p></li></ol>' +
'</td>' +
'</tr></tbody></table>',
);
// The HTML <ol> branch does not propagate the ProseMirror `start` attribute.
expect(out).not.toContain('start');
});
});


@@ -59,19 +59,22 @@ describe('convertProseMirrorToMarkdown', () => {
).toBe('`x`');
});
-it('code + another mark switches to nested HTML (no backtick form)', () => {
-// marks array order drives nesting: bold first wraps, then code wraps that.
+it('code + another mark emits the backtick code form (code wins)', () => {
+// The schema's `code` mark excludes all other marks, so the editor can
+// never produce code+bold on one run and import always drops the co-mark.
+// The lossless, byte-stable behavior is to emit ONLY the backtick code
+// span and ignore the co-occurring mark.
const out = convertProseMirrorToMarkdown(
doc(para(text('x', [{ type: 'bold' }, { type: 'code' }]))),
);
-expect(out).toBe('<code><strong>x</strong></code>');
+expect(out).toBe('`x`');
});
-it('code + strike combo emits <code> wrapping <s>', () => {
+it('code + strike combo emits the backtick code form (code wins)', () => {
const out = convertProseMirrorToMarkdown(
doc(para(text('x', [{ type: 'strike' }, { type: 'code' }]))),
);
-expect(out).toBe('<code><s>x</s></code>');
+expect(out).toBe('`x`');
});
});
@@ -504,4 +507,145 @@ describe('convertProseMirrorToMarkdown', () => {
expect(out.startsWith('- lvl')).toBe(true);
});
});
// ===========================================================================
// Targeted coverage for marker-width-scaled list indent, the markdown
// link-title escape branch, the markdown callout fence, and the blockquote
// per-line prefixer over a multi-line nested-block child. Grounded against
// the real converter output (verified empirically) — see processListItem /
// indentItemChildren (src 812-843), the link mark branch (src 117-121), the
// callout case (src 373-376), and the blockquote prefixer (src 210-221).
describe('marker-width / link-title / callout / blockquote-nested', () => {
// Spec 1 — two-digit ordered marker scales the continuation indent to 4.
it('indents a nested ordered sublist under item 10 by 4 spaces (marker "10. ")', () => {
// Items 1..10 ("a".."j"); the 10th additionally holds a nested
// orderedList with one paragraph "x".
const items: any[] = [];
for (let i = 0; i < 9; i++) {
items.push({
type: 'listItem',
content: [para(text(String.fromCharCode(97 + i)))], // 'a'..'i'
});
}
items.push({
type: 'listItem',
content: [
para(text('j')),
{
type: 'orderedList',
content: [{ type: 'listItem', content: [para(text('x'))] }],
},
],
});
const out = convertProseMirrorToMarkdown(
doc({ type: 'orderedList', content: items }),
);
// The 10th marker is the 4-column "10. "; the nested sublist line must be
// indented exactly 4 spaces (prefix.length 3 + 1), NOT 3.
expect(out).toContain('10. j\n    1. x');
// Guard against the off-by-one (3-space) regression that would re-parse
// the sublist as loose/sibling content on import.
expect(out).not.toContain('10. j\n   1. x');
// And the single-digit items keep the narrower 3-column marker (no body
// continuation here, but the marker itself must stay "1. ".."9. ").
expect(out.startsWith('1. a\n2. b\n')).toBe(true);
expect(out).toContain('\n9. i\n10. j');
});
// Spec 2 — markdown link-title branch escapes an embedded double quote and
// emits the href raw.
it('escapes an embedded double-quote in a markdown link title and emits href raw', () => {
const out = convertProseMirrorToMarkdown(
doc(
para(
text('lbl', [
{
type: 'link',
attrs: { href: 'http://a', title: 'he said "hi"' },
},
]),
),
),
);
// The title's " is backslash-escaped (.replace(/"/g,'\\"')) so it cannot
// terminate the (url "title") syntax early; the href is RAW (not escaped).
expect(out).toBe('[lbl](http://a "he said \\"hi\\"")');
});
// Spec 3 — markdown callout fence lowercases the type and joins multiple
// paragraph children.
it('lowercases an uppercase callout type and joins its paragraphs', () => {
const out = convertProseMirrorToMarkdown(
doc({
type: 'callout',
attrs: { type: 'WARNING' },
content: [para(text('line1')), para(text('line2'))],
}),
);
// NOTE(review): the spec predicted ':::warning\nline1\n\nline2\n:::' (a
// blank line between paragraphs, attributed to "marked's paragraph
// blank-line"). The real converter does NOT route callout bodies through
// marked — the callout case (src 374-376) joins its rendered children
// with a single '\n' (calloutContent = nodeContent.map(processNode)
// .join('\n')), and each paragraph renders to just its text. So the ACTUAL
// (and correct-per-source) body is 'line1\nline2' with ONE newline. We
// still pin the two behaviors the spec cares about: the .toLowerCase()
// (WARNING -> warning) and the multi-child join.
expect(out).toBe(':::warning\nline1\nline2\n:::');
// The fence type is lowercased (regression to ':::WARNING' breaks import).
expect(out.startsWith(':::warning\n')).toBe(true);
expect(out).not.toContain(':::WARNING');
// Both paragraph children are present and joined inside the fence.
expect(out).toContain('line1\nline2');
});
// Spec 4 — blockquote per-line prefixer over a multi-line nested callout.
it('prefixes every line of a nested callout child with "> "', () => {
const out = convertProseMirrorToMarkdown(
doc({
type: 'blockquote',
content: [
{
type: 'callout',
attrs: { type: 'INFO' },
content: [para(text('a')), para(text('b'))],
},
],
}),
);
// NOTE(review): the spec predicted '> :::info\n> a\n>\n> b\n> :::',
// assuming the nested callout body contains a blank line between 'a' and
// 'b' (which would exercise the line.length?'> ':'>' empty-line branch).
// But per Spec 3's finding the callout joins paragraphs with a SINGLE
// '\n', so its rendered output ':::info\na\nb\n:::' has NO blank line.
// The blockquote prefixer (src 214-221) therefore prefixes each of the
// four non-empty lines with '> ', yielding the ACTUAL output below — the
// realistic per-line-prefix loop over a multi-line nested-block child.
expect(out).toBe('> :::info\n> a\n> b\n> :::');
// Every produced line carries the '> ' prefix (no line escapes to col 0).
for (const line of out.split('\n')) {
expect(line.startsWith('>')).toBe(true);
}
});
// The empty-line '>' branch from Spec 4's intent IS reachable — just not via
// the nested callout (whose body has no blank line). A two-paragraph
// blockquote DOES separate its block children with a bare '>' line, which is
// the branch the spec wanted to protect. Pin it directly so the
// (line.length ? '> ' : '>') empty-line path stays covered.
it('maps an internal blank line to a bare ">" (not "> ") in a multi-block quote', () => {
const out = convertProseMirrorToMarkdown(
doc({
type: 'blockquote',
content: [para(text('p1')), para(text('p2'))],
}),
);
expect(out).toBe('> p1\n>\n> p2');
// The separator line is exactly '>' with NO trailing space.
expect(out.split('\n')).toContain('>');
expect(out).not.toContain('> \n');
});
});
});
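// Illustrative sketch only: the two layout rules pinned by the specs above,
// written as standalone helpers. `continuationIndentSketch` and
// `prefixQuoteSketch` are hypothetical names, not the converter's functions.
// The assumptions are taken from the spec comments: the continuation indent
// equals the "N." prefix length plus one, and the blockquote prefixer maps a
// non-empty line to "> line" and an empty line to a bare ">".

```typescript
// Hypothetical helper: the continuation indent of an ordered item is the width
// of its marker, i.e. the "N." prefix plus the one space that follows it.
function continuationIndentSketch(itemNumber: number): string {
  const prefix = `${itemNumber}.`;
  return ' '.repeat(prefix.length + 1);
}

// Hypothetical helper: prefix every line of an already-rendered child. An
// empty line becomes a bare '>' with no trailing space.
function prefixQuoteSketch(rendered: string): string {
  return rendered
    .split('\n')
    .map((line) => (line.length ? `> ${line}` : '>'))
    .join('\n');
}

// A nested sublist under item 10 is indented by the 4-column marker "10. ".
const nestedUnderTen = `10. j\n${continuationIndentSketch(10)}1. x`;
// Two paragraphs separated by a blank line, then quoted.
const quotedParagraphs = prefixQuoteSketch('p1\n\np2');
// A callout body has no blank line, so every line gets the "> " form.
const quotedCallout = prefixQuoteSketch(':::info\na\nb\n:::');
```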


@@ -641,7 +641,7 @@ describe('markdown <-> ProseMirror round-trip (property-based)', () => {
// { type:'paragraph', content:[{type:'text',text:'q'}] } ] }
// Not "fixed" — the source must not change; documented and exercised here.
// -------------------------------------------------------------------------
-it.fails('BUG: a block image between other blocks is not byte-stable', async () => {
+it('a block image between other blocks is byte-stable', async () => {
const doc = {
type: 'doc',
content: [
@@ -670,8 +670,8 @@ describe('markdown <-> ProseMirror round-trip (property-based)', () => {
// marks switch): preserving both marks is impossible while `code` excludes
// them. Documented here, not "fixed", because the source must not change.
// -------------------------------------------------------------------------
-it.fails(
-'BUG: code mark combined with another mark is not byte-stable',
+it(
+'code mark combined with another mark is byte-stable',
async () => {
const codeComboArb = fc
.tuple(safeTextArb, fc.constantFrom('bold', 'italic', 'strike'))


@@ -6,6 +6,10 @@ import { describe, expect, it } from 'vitest';
// pipeline. Importing this module mutates the global DOM via jsdom (required for
// @tiptap/html under Node) — expected, same as the property test.
import { markdownToProseMirror } from '../src/lib/markdown-to-prosemirror.js';
// The export side (ProseMirror -> markdown) is pulled in for the round-trip
// specs below (underline/sub/sup marks, heading levels, link title). Imported
// directly from src/lib (not the barrel) like the other converter unit tests.
import { convertProseMirrorToMarkdown } from '../src/lib/markdown-converter.js';
// Find every node of a given type anywhere in a ProseMirror doc tree.
const findAll = (node: any, type: string, acc: any[] = []): any[] => {
@@ -151,3 +155,295 @@ describe('bridgeTaskLists: numbered checklist + mixed-list negative', () => {
expect(allText(bulletLists[0])).toContain('plain item');
});
});
// Find the first mark of a given type on a text node anywhere in the tree.
const firstMark = (node: any, markType: string): any => {
if (node?.type === 'text') {
for (const m of node.marks || []) if (m.type === markType) return m;
}
for (const child of node?.content || []) {
const found = firstMark(child, markType);
if (found) return found;
}
return null;
};
// ---------------------------------------------------------------------------
// Spec 1. IMPORT-side color sanitization for the highlight + textStyle marks.
//
// The Highlight.extend / TextStyle parseHTML run attacker-controlled colors
// through sanitizeCssColor when generateJSON re-parses stored HTML. This is the
// real defense that strips a crafted color on IMPORT (the export-side emission
// is tested elsewhere; the parse path was not).
// ---------------------------------------------------------------------------
describe('import: highlight/textStyle color sanitization (parseHTML)', () => {
it('strips the unsafe "--x:1" declaration but keeps the safe "red" background-color', async () => {
const doc = await markdownToProseMirror(
'<mark style="background-color: red; --x:1">x</mark>',
);
const mark = firstMark(doc, 'highlight');
// The highlight mark IS present on the text run.
expect(mark).not.toBeNull();
expect(allText(doc)).toContain('x');
// NOTE(review): Spec 1 expected attrs.color === null for this input. The
// ACTUAL behavior is attrs.color === 'red': the schema's Highlight.extend
// reads the color via getStyleProperty(el, 'background-color'), which
// isolates the `background-color: red` declaration and DROPS the separate
// unsafe `--x:1` declaration. sanitizeCssColor('red') then accepts the bare
// named color. So the injection ('--x:1') is stripped (the defense holds)
// but the legitimate 'red' survives — color is 'red', not null. The
// color-dropped-to-null path is exercised by the data-color variant below,
// where the whole "red; --x:1" string reaches sanitizeCssColor and fails.
expect(mark.attrs.color).toBe('red');
});
it('drops a crafted color carried whole in data-color (sanitizeCssColor -> null)', async () => {
// Here the entire unsafe string is the candidate color (no per-declaration
// splitting), so sanitizeCssColor rejects it and the highlight color is null
// while the highlight mark itself is still applied.
const doc = await markdownToProseMirror(
'<mark data-color="red; --x:1">x</mark>',
);
const mark = firstMark(doc, 'highlight');
expect(mark).not.toBeNull();
expect(mark.attrs.color).toBeNull();
});
it("imports '#ff0000' as the highlight mark color verbatim", async () => {
const doc = await markdownToProseMirror(
'<mark style="background-color: #ff0000">x</mark>',
);
const mark = firstMark(doc, 'highlight');
expect(mark).not.toBeNull();
expect(mark.attrs.color).toBe('#ff0000');
});
it("imports a colored span as a textStyle mark with the sanitized color", async () => {
const doc = await markdownToProseMirror(
'<span style="color: rebeccapurple">y</span>',
);
const mark = firstMark(doc, 'textStyle');
expect(mark).not.toBeNull();
expect(mark.attrs.color).toBe('rebeccapurple');
// It is carried on a real text node containing the span's text.
expect(allText(doc)).toContain('y');
});
});
// ---------------------------------------------------------------------------
// Spec 2. Importing an unsupported callout fence clamps the type to 'info'.
//
// preprocessCallouts emits div[data-type=callout][data-callout-type=tip]; the
// schema's Callout.type parseHTML pipes 'tip' through clampCalloutType, which
// maps the unknown type to the 'info' default. End-to-end import-side clamp.
// ---------------------------------------------------------------------------
describe('import: unsupported callout fence clamps type to info', () => {
it("imports ':::tip' as a callout whose attrs.type === 'info'", async () => {
const doc = await markdownToProseMirror(':::tip\nhello\n:::');
const callouts = findAll(doc, 'callout');
expect(callouts).toHaveLength(1);
expect(callouts[0].attrs.type).toBe('info');
// The body paragraph survived inside the callout.
expect(allText(callouts[0])).toContain('hello');
const paras = findAll(callouts[0], 'paragraph');
expect(paras.length).toBeGreaterThanOrEqual(1);
});
});
// ---------------------------------------------------------------------------
// Spec 3. Importing a columns layout with a string data-width yields a numeric
// column width, and the columns wrapper carries its default layout/widthMode.
// ---------------------------------------------------------------------------
describe('import: columns layout with string data-width -> numeric width', () => {
it('parses data-width="33.5" to the number 33.5 and populates columns defaults', async () => {
const doc = await markdownToProseMirror(
'<div data-type="columns"><div data-type="column" data-width="33.5"><p>a</p></div></div>',
);
const columns = findAll(doc, 'columns');
expect(columns).toHaveLength(1);
// Columns default attrs are populated (not undefined).
expect(columns[0].attrs.widthMode).toBe('normal');
expect(columns[0].attrs.layout).not.toBeNull();
expect(columns[0].attrs.layout).toBe('two_equal');
const cols = findAll(columns[0], 'column');
expect(cols).toHaveLength(1);
// parseFloat('33.5') -> 33.5 as a NUMBER, not the string '33.5'.
expect(cols[0].attrs.width).toBe(33.5);
expect(typeof cols[0].attrs.width).toBe('number');
expect(allText(cols[0])).toContain('a');
});
});
// ---------------------------------------------------------------------------
// Spec 4. Comment mark resolved-attribute boolean coercion on import.
//
// The comment mark's resolved attr parseHTML compares
// el.getAttribute('data-resolved') === 'true', so a missing attribute yields
// false (default) and the literal 'true' yields boolean true.
// ---------------------------------------------------------------------------
describe('import: comment mark commentId + resolved boolean coercion', () => {
it("data-resolved='true' -> resolved:true with the parsed commentId", async () => {
const doc = await markdownToProseMirror(
'<span data-comment-id="c1" data-resolved="true">x</span>',
);
const mark = firstMark(doc, 'comment');
expect(mark).not.toBeNull();
expect(mark.attrs.commentId).toBe('c1');
expect(mark.attrs.resolved).toBe(true);
});
it('a missing data-resolved -> resolved:false (default)', async () => {
const doc = await markdownToProseMirror(
'<span data-comment-id="c2">y</span>',
);
const mark = firstMark(doc, 'comment');
expect(mark).not.toBeNull();
expect(mark.attrs.commentId).toBe('c2');
expect(mark.attrs.resolved).toBe(false);
});
});
// ---------------------------------------------------------------------------
// Spec 5. A NON-numeric truthy data-width reaches parseFloat and yields NaN.
//
// Column.width parseHTML is `value ? parseFloat(value) : null`; 'abc' is truthy
// so parseFloat('abc') -> NaN leaks through as the raw attribute value rather
// than falling back to the null default. (JSON.stringify would serialize NaN to
// null — see the assertion below — so the leak is invisible in serialized JSON.)
// ---------------------------------------------------------------------------
describe('import: malformed non-numeric data-width leaks NaN', () => {
it("data-width='abc' -> column width is NaN (typeof number), not null", async () => {
const doc = await markdownToProseMirror(
'<div data-type="columns"><div data-type="column" data-width="abc"><p>x</p></div></div>',
);
const width = doc.content[0].content[0].attrs.width;
expect(typeof width).toBe('number');
expect(Number.isNaN(width)).toBe(true);
// Document that the leak is masked by JSON serialization: NaN -> null.
expect(JSON.parse(JSON.stringify(doc)).content[0].content[0].attrs.width).toBeNull();
});
});
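// Illustrative sketch only: the three outcomes of the width expression quoted
// above (`value ? parseFloat(value) : null`), plus the JSON masking that Spec 5
// documents. `parseWidthSketch` is a hypothetical stand-in, not the schema's
// parseHTML callback; it takes the attribute value directly rather than a DOM
// element.

```typescript
// Hypothetical stand-in for the column width parser.
function parseWidthSketch(value: string | null): number | null {
  // A truthy string always reaches parseFloat, even when it is not numeric.
  return value ? parseFloat(value) : null;
}

const numericWidth = parseWidthSketch('33.5'); // numeric string
const malformedWidth = parseWidthSketch('abc'); // truthy but not numeric
const missingWidth = parseWidthSketch(null); // attribute absent

// JSON has no NaN literal, so JSON.stringify writes NaN as null and the
// malformed value becomes indistinguishable from the missing one.
const serialized = JSON.parse(JSON.stringify({ width: malformedWidth }));
```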
// ---------------------------------------------------------------------------
// Spec 6. A column with NO data-width attribute lands on the null default.
//
// The else branch of `value ? parseFloat(value) : null` (getAttribute -> null)
// must yield exactly null (not NaN/undefined), and the columns wrapper carries
// its layout/widthMode defaults.
// ---------------------------------------------------------------------------
describe('import: width-less column lands on null default', () => {
it('no data-width -> column width === null, columns defaults populated', async () => {
const doc = await markdownToProseMirror(
'<div data-type="columns"><div data-type="column"><p>y</p></div></div>',
);
expect(doc.content[0].content[0].attrs.width).toBe(null);
expect(doc.content[0].attrs.layout).toBe('two_equal');
expect(doc.content[0].attrs.widthMode).toBe('normal');
});
});
// ---------------------------------------------------------------------------
// Spec 7. A structural callout div with missing/empty data-callout-type clamps
// to 'info' via clampCalloutType (the parseHTML getAttrs fallback), with no icon.
// ---------------------------------------------------------------------------
describe('import: callout div with missing/empty data-callout-type clamps to info', () => {
it('a callout div with NO data-callout-type -> type:info, icon:null', async () => {
const doc = await markdownToProseMirror(
'<div data-type="callout"><p>z</p></div>',
);
expect(doc.content[0].type).toBe('callout');
expect(doc.content[0].attrs.type).toBe('info');
expect(doc.content[0].attrs.icon).toBeNull();
});
it('a callout div with EMPTY data-callout-type -> type:info, icon:null', async () => {
const doc = await markdownToProseMirror(
'<div data-type="callout" data-callout-type=""><p>w</p></div>',
);
expect(doc.content[0].type).toBe('callout');
expect(doc.content[0].attrs.type).toBe('info');
expect(doc.content[0].attrs.icon).toBeNull();
});
});
// ---------------------------------------------------------------------------
// Spec 8. A plain <td> with no align/colspan/rowspan/colwidth lands on the
// schema defaults (align null via the `||` fallback arm; spans default to 1).
// ---------------------------------------------------------------------------
describe('import: span/align-less table cell lands on defaults', () => {
it('a bare td -> align:null, colspan:1, rowspan:1, colwidth:null', async () => {
const doc = await markdownToProseMirror(
'<table><tbody><tr><td><p>c</p></td></tr></tbody></table>',
);
const cells = findAll(doc, 'tableCell');
expect(cells).toHaveLength(1);
const attrs = cells[0].attrs;
expect(attrs.align).toBeNull();
expect(attrs.colspan).toBe(1);
expect(attrs.rowspan).toBe(1);
expect(attrs.colwidth).toBeNull();
expect(allText(cells[0])).toContain('c');
});
});
// ---------------------------------------------------------------------------
// Spec 9. underline/subscript/superscript marks survive import and re-export.
// (inlineToHtml src 611-619 renders them back to <u>/<sub>/<sup>.)
// ---------------------------------------------------------------------------
describe('import+export: underline/subscript/superscript marks round-trip', () => {
it('<u>/<sub>/<sup> import to the right marks and re-export unchanged', async () => {
const doc = await markdownToProseMirror('<p><u>a</u><sub>b</sub><sup>c</sup></p>');
const para = findAll(doc, 'paragraph')[0];
const texts = (para.content || []).filter((n: any) => n.type === 'text');
expect(texts).toHaveLength(3);
expect(texts[0].text).toBe('a');
expect((texts[0].marks || []).map((m: any) => m.type)).toEqual(['underline']);
expect(texts[1].text).toBe('b');
expect((texts[1].marks || []).map((m: any) => m.type)).toEqual(['subscript']);
expect(texts[2].text).toBe('c');
expect((texts[2].marks || []).map((m: any) => m.type)).toEqual(['superscript']);
const md = convertProseMirrorToMarkdown(doc);
expect(md).toContain('<u>a</u>');
expect(md).toContain('<sub>b</sub>');
expect(md).toContain('<sup>c</sup>');
});
});
// ---------------------------------------------------------------------------
// Spec 10. Heading level attribute fidelity (h1/h2/h6) on import and re-export.
// ---------------------------------------------------------------------------
describe('import+export: heading levels 1/2/6 round-trip', () => {
it('parses # / ## / ###### to level 1/2/6 and re-emits them', async () => {
const doc = await markdownToProseMirror('# H1\n\n## H2\n\n###### H6');
const headings = findAll(doc, 'heading');
expect(headings).toHaveLength(3);
expect(headings[0].attrs.level).toBe(1);
expect(headings[1].attrs.level).toBe(2);
expect(headings[2].attrs.level).toBe(6);
const md = convertProseMirrorToMarkdown(doc);
const blocks = md.split('\n\n');
expect(blocks).toContain('# H1');
expect(blocks).toContain('## H2');
expect(blocks).toContain('###### H6');
});
});
// ---------------------------------------------------------------------------
// Spec 11. Link mark recovers BOTH href and title on import and round-trips.
// ---------------------------------------------------------------------------
describe('import+export: link mark href + title round-trip', () => {
it('parses [lbl](http://a "the title") with href+title and re-emits it', async () => {
const doc = await markdownToProseMirror('[lbl](http://a "the title")');
const mark = firstMark(doc, 'link');
expect(mark).not.toBeNull();
expect(mark.attrs.href).toBe('http://a');
expect(mark.attrs.title).toBe('the title');
expect(allText(doc)).toContain('lbl');
const md = convertProseMirrorToMarkdown(doc);
expect(md).toContain('[lbl](http://a "the title")');
});
});


@@ -0,0 +1,275 @@
import { describe, expect, it } from 'vitest';
import {
convertProseMirrorToMarkdown,
markdownToProseMirror,
docsCanonicallyEqual,
} from 'docmost-client';
// ---------------------------------------------------------------------------
// Media / atom node round-trip coverage (audio, video, pdf, attachment, embed,
// youtube). The existing specs (corpus + property test) exercise the EXPORT
// direction of these nodes only; their parseHTML branches (the INVERSE parse of
// the exported HTML) are otherwise unprotected. Each test runs the full
// export -> import -> export pipeline and pins:
// - the exact md1 byte string the converter emits,
// - whether md2 is byte-stable (md2 === md1) or grows by a materialized
// schema default on the first import,
// - the re-parsed doc2 attrs (NOTE: parseHTML reads via getAttribute and so
// returns STRINGS for numeric attrs, which is what breaks naive canonical
// equality), and
// - docsCanonicallyEqual(doc, doc2) where the spec asserts a specific result.
//
// `convertProseMirrorToMarkdown` requires a full doc ({type:'doc', content:[]}),
// so each spec's `doc=[...]` content array is wrapped via mkDoc().
// ---------------------------------------------------------------------------
/** Wrap a content array (as the specs express `doc`) into a real PM doc. */
const mkDoc = (content: any[]) => ({ type: 'doc', content });
/** export -> import -> export, returning both markdowns and the re-parsed doc. */
async function roundTrip(doc: any) {
const md1 = convertProseMirrorToMarkdown(doc);
const doc2 = await markdownToProseMirror(md1);
const md2 = convertProseMirrorToMarkdown(doc2);
return { md1, md2, doc2 };
}
/** Find the first node of a given type anywhere in a PM doc tree. */
const findFirst = (node: any, type: string): any => {
if (node && node.type === type) return node;
for (const child of node?.content || []) {
const hit = findFirst(child, type);
if (hit) return hit;
}
return null;
};
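// Illustrative sketch only: why a round-trip can be byte-stable yet not
// canonically equal, as the header comment above notes for numeric attrs. HTML
// attributes hold strings, so a number written out and read back without
// coercion returns as a string. `toAttributeStrings` is a hypothetical model of
// that write/read step, not the schema's parseHTML or the converter.

```typescript
// Hypothetical model of an HTML attribute store: every value becomes a string.
function toAttributeStrings(
  attrs: Record<string, string | number>,
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(attrs)) {
    out[key] = String(value);
  }
  return out;
}

const originalAttrs = { src: '/a.mp3', size: 9001 };
// First pass: the number 9001 is written out and read back as '9001'.
const reparsedAttrs = toAttributeStrings(originalAttrs);
// Second pass: strings stay strings, so the serialized form no longer changes.
const secondPassAttrs = toAttributeStrings(reparsedAttrs);

const byteStable =
  JSON.stringify(secondPassAttrs) === JSON.stringify(reparsedAttrs);
const sameValueType = typeof reparsedAttrs.size === typeof originalAttrs.size;
```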
describe('media atom round-trip (audio/video/pdf/attachment/embed/youtube)', () => {
// 1. audio with ALL optional attrs ---------------------------------------
it('audio with src+attachmentId+size: byte-stable, size re-parses to the STRING "9001"', async () => {
const doc = mkDoc([
{ type: 'audio', attrs: { src: '/a.mp3', attachmentId: 'att-7', size: 9001 } },
]);
const { md1, md2, doc2 } = await roundTrip(doc);
expect(md1).toBe(
'<div><audio src="/a.mp3" data-attachment-id="att-7" data-size="9001"></audio></div>',
);
// Byte-stable: a second export reproduces the first exactly.
expect(md2).toBe(md1);
const audio = findFirst(doc2, 'audio');
expect(audio).not.toBeNull();
expect(audio.type).toBe('audio');
expect(audio.attrs.src).toBe('/a.mp3');
expect(audio.attrs.attachmentId).toBe('att-7');
// NOTE: the schema's data-size parseHTML returns getAttribute() -> a STRING,
// so the number 9001 comes back as the string '9001'.
expect(audio.attrs.size).toBe('9001');
});
// 2. fully-populated video -----------------------------------------------
it('video with all attrs: byte-stable; numeric attrs re-parse to STRINGS; canonical equality FALSE', async () => {
const doc = mkDoc([
{
type: 'video',
attrs: {
src: '/v.mp4',
alt: 'clip',
attachmentId: 'att-1',
width: 640,
height: 480,
size: 1234,
align: 'center',
aspectRatio: 1.777,
},
},
]);
const { md1, md2, doc2 } = await roundTrip(doc);
expect(md1).toBe(
'<div><video src="/v.mp4" aria-label="clip" data-attachment-id="att-1" width="640" height="480" data-size="1234" data-align="center" data-aspect-ratio="1.777"></video></div>',
);
expect(md2).toBe(md1);
const video = findFirst(doc2, 'video');
expect(video).not.toBeNull();
expect(video.attrs.alt).toBe('clip');
// All numeric attrs come back as STRINGS via getAttribute().
expect(video.attrs.width).toBe('640');
expect(video.attrs.height).toBe('480');
expect(video.attrs.size).toBe('1234');
expect(video.attrs.aspectRatio).toBe('1.777');
// Byte-stable export but NOT canonically equal: the numeric width/height/
// size/aspectRatio came back as strings, so deep-equal of the canonical
// forms fails (align:'center' is normalized away, the numbers are not).
expect(docsCanonicallyEqual(doc, doc2)).toBe(false);
});
// 3. minimal video (only src) --------------------------------------------
it('minimal video (src only): NOT byte-stable (gains data-align="center") but canonically equal', async () => {
const doc = mkDoc([{ type: 'video', attrs: { src: '/v.mp4' } }]);
const { md1, md2, doc2 } = await roundTrip(doc);
expect(md1).toBe('<div><video src="/v.mp4"></video></div>');
// video.align has a non-null schema default 'center' that materializes on
// import; the converter only emits data-align when set, so export #2 grows
// by data-align="center" exactly once (the documented one-time asymmetry).
expect(md2).toBe('<div><video src="/v.mp4" data-align="center"></video></div>');
expect(md2).not.toBe(md1);
// align:'center' is normalized away via KNOWN_DEFAULTS.video, so despite the
// byte growth the documents ARE canonically equal.
expect(docsCanonicallyEqual(doc, doc2)).toBe(true);
});
// 4. pdf with no numeric attrs (positive control) -------------------------
it('pdf with src+name+attachmentId (no numerics): byte- AND canonically-stable', async () => {
const doc = mkDoc([
{ type: 'pdf', attrs: { src: '/d.pdf', name: 'd.pdf', attachmentId: 'att-9' } },
]);
const { md1, md2, doc2 } = await roundTrip(doc);
expect(md1).toBe(
'<div data-type="pdf" src="/d.pdf" data-name="d.pdf" data-attachment-id="att-9"></div>',
);
expect(md2).toBe(md1);
const pdf = findFirst(doc2, 'pdf');
expect(pdf).not.toBeNull();
expect(pdf.attrs.src).toBe('/d.pdf');
expect(pdf.attrs.name).toBe('d.pdf');
expect(pdf.attrs.attachmentId).toBe('att-9');
// No numeric attrs to coerce to strings, so the round-trip is BOTH byte- and
// canonically-stable (the positive control vs. the numeric-divergence cases).
expect(docsCanonicallyEqual(doc, doc2)).toBe(true);
});
// 5. attachment with numeric size ----------------------------------------
it('attachment with url+name+mime+size+attachmentId: byte-stable; size STRING; canonical FALSE', async () => {
const doc = mkDoc([
{
type: 'attachment',
attrs: {
url: '/f.zip',
name: 'f.zip',
mime: 'application/zip',
size: 512,
attachmentId: 'att-3',
},
},
]);
const { md1, md2, doc2 } = await roundTrip(doc);
expect(md1).toBe(
'<div data-type="attachment" data-attachment-url="/f.zip" data-attachment-name="f.zip" data-attachment-mime="application/zip" data-attachment-size="512" data-attachment-id="att-3"></div>',
);
expect(md2).toBe(md1);
const att = findFirst(doc2, 'attachment');
expect(att).not.toBeNull();
expect(att.attrs.url).toBe('/f.zip');
expect(att.attrs.name).toBe('f.zip');
expect(att.attrs.mime).toBe('application/zip');
expect(att.attrs.attachmentId).toBe('att-3');
// data-attachment-size parseHTML -> getAttribute() -> STRING.
expect(att.attrs.size).toBe('512');
// The numeric size coerced to a string breaks canonical equality.
expect(docsCanonicallyEqual(doc, doc2)).toBe(false);
});
// 6. embed WITH explicit width/height/align (byte-stable) ----------------
it('embed with explicit src+provider+align+width+height: byte-stable; width/height STRINGS', async () => {
const doc = mkDoc([
{
type: 'embed',
attrs: {
src: 'https://x.com/e',
provider: 'iframe',
align: 'left',
width: 600,
height: 400,
},
},
]);
const { md1, md2, doc2 } = await roundTrip(doc);
expect(md1).toBe(
'<div data-type="embed" data-src="https://x.com/e" data-provider="iframe" data-align="left" data-width="600" data-height="400"></div>',
);
expect(md2).toBe(md1);
const embed = findFirst(doc2, 'embed');
expect(embed).not.toBeNull();
expect(embed.attrs.src).toBe('https://x.com/e');
expect(embed.attrs.provider).toBe('iframe');
expect(embed.attrs.align).toBe('left');
// data-width / data-height parseHTML -> getAttribute() -> STRINGS.
expect(embed.attrs.width).toBe('600');
expect(embed.attrs.height).toBe('400');
});
// 7. minimal embed (only src+provider) -----------------------------------
it('minimal embed (src+provider): NOT byte-stable; defaults width/height materialize as NUMBERS 800/600', async () => {
const doc = mkDoc([
{ type: 'embed', attrs: { src: 'https://x.com/e', provider: 'iframe' } },
]);
const { md1, md2, doc2 } = await roundTrip(doc);
expect(md1).toBe(
'<div data-type="embed" data-src="https://x.com/e" data-provider="iframe"></div>',
);
// embed has non-null schema defaults align='center', width=800, height=600
// that the converter never emits on export #1 but materialize on import, so
// export #2 grows by three data-* attrs (a one-time divergence).
expect(md2).toBe(
'<div data-type="embed" data-src="https://x.com/e" data-provider="iframe" data-align="center" data-width="800" data-height="600"></div>',
);
expect(md2).not.toBe(md1);
const embed = findFirst(doc2, 'embed');
expect(embed).not.toBeNull();
expect(embed.attrs.align).toBe('center');
// NOTE: these come from the addAttributes default (NOT parseHTML), so on the
// FIRST import they are the NUMBERS 800/600, not strings — parseHTML only
// runs when the attribute is actually present on the imported element.
expect(embed.attrs.width).toBe(800);
expect(embed.attrs.height).toBe(600);
});
// 8. youtube with src+width+height+align ---------------------------------
it('youtube with src+width+height+align(right): byte-stable; width/height STRINGS; canonical FALSE', async () => {
const doc = mkDoc([
{
type: 'youtube',
attrs: {
src: 'https://youtu.be/abc',
width: 560,
height: 315,
align: 'right',
},
},
]);
const { md1, md2, doc2 } = await roundTrip(doc);
expect(md1).toBe(
'<div data-type="youtube" data-src="https://youtu.be/abc" data-width="560" data-height="315" data-align="right"></div>',
);
expect(md2).toBe(md1);
const yt = findFirst(doc2, 'youtube');
expect(yt).not.toBeNull();
expect(yt.attrs.src).toBe('https://youtu.be/abc');
expect(yt.attrs.align).toBe('right');
// data-width / data-height parseHTML -> getAttribute() -> STRINGS.
expect(yt.attrs.width).toBe('560');
expect(yt.attrs.height).toBe('315');
// Numeric width/height come back as strings, which breaks canonical equality.
// align='right' is non-default, so KNOWN_DEFAULTS.youtube normalization keeps it.
expect(docsCanonicallyEqual(doc, doc2)).toBe(false);
});
});
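The comments in these media tests rely on two behaviors: an attribute present on the imported element comes back as a string, while an absent one falls back to its typed schema default, and canonical comparison drops known defaults. A minimal sketch of that interplay follows. Every name in it (`importAttrs`, `normalize`, `canonicallyEqual`, `MEDIA_DEFAULTS`) is hypothetical, and it is not the actual `docsCanonicallyEqual` or schema code.

```typescript
// Hypothetical sketch; NOT the Docmost schema or canonicalize.ts implementation.
type Attrs = Record<string, unknown>;

// Assumed defaults table for an embed-like node (values borrowed from the tests).
const MEDIA_DEFAULTS: Attrs = { align: "center", width: 800, height: 600 };

// Mimics import: a present attribute arrives as a string (as with
// Element.getAttribute); an absent one keeps the typed default.
function importAttrs(htmlAttrs: Record<string, string>, defaults: Attrs): Attrs {
  const out: Attrs = { ...defaults };
  for (const [key, value] of Object.entries(htmlAttrs)) out[key] = value;
  return out;
}

// Drops null/undefined values and values strictly equal to the known default.
function normalize(attrs: Attrs, defaults: Attrs): Attrs {
  const out: Attrs = {};
  for (const [key, value] of Object.entries(attrs)) {
    if (value !== undefined && value !== null && value !== defaults[key]) out[key] = value;
  }
  return out;
}

// Compares two attribute bags after normalization, independent of key order.
function canonicallyEqual(a: Attrs, b: Attrs, defaults: Attrs): boolean {
  const key = (o: Attrs) =>
    JSON.stringify(Object.entries(normalize(o, defaults)).sort(([x], [y]) => x.localeCompare(y)));
  return key(a) === key(b);
}

// No attributes on the element: defaults materialize as NUMBERS.
const minimalImport = importAttrs({}, MEDIA_DEFAULTS);
// Explicit attributes on the element: values arrive as STRINGS.
const explicitImport = importAttrs({ align: "left", width: "600", height: "400" }, MEDIA_DEFAULTS);
```

Under these assumptions, a source with no attributes stays canonically equal to its import (the materialized defaults are normalized away), while a source with numeric `width: 600` does not equal its import, because `"600"` and `600` differ under strict comparison.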


@@ -6,6 +6,11 @@ import {
convertProseMirrorToMarkdown,
markdownToProseMirror,
} from 'docmost-client';
// Import canonical-equality DIRECTLY from src so we exercise the real
// implementation alongside the converter pair above (the barrel re-exports the
// same symbol; importing from src keeps these round-trip assertions pinned to
// the package source rather than the published surface).
import { docsCanonicallyEqual } from '../src/lib/canonicalize.js';
// Resolve the fixture relative to this test file so the test is CWD-independent.
const here = dirname(fileURLToPath(import.meta.url));
@@ -27,3 +32,137 @@ describe('round-trip idempotency (SPEC §11)', () => {
expect(md2).toBe(md1);
});
});
// ---------------------------------------------------------------------------
// Full export -> import -> export round-trips for the schema's HTML-carried
// atoms/blocks (math, mention, details). The existing markdown-converter unit
// tests only assert the one-way emit string; here we additionally pin that the
// re-import (generateJSON via the docmost schema) rebuilds the correct node and
// that a second export reproduces the first byte-for-byte. Helpers mirror the
// converter unit tests (a single-node doc renders exactly that node, trimmed).
// ---------------------------------------------------------------------------
const doc = (...nodes: any[]) => ({ type: 'doc', content: nodes });
const text = (t: string) => ({ type: 'text', text: t });
const para = (...inline: any[]) => ({ type: 'paragraph', content: inline });
// Run the canonical export -> import -> export cycle for a single block node.
async function roundTrip(
node: any,
): Promise<{ md1: string; doc2: any; md2: string }> {
const md1 = convertProseMirrorToMarkdown(doc(node));
const doc2 = await markdownToProseMirror(md1);
const md2 = convertProseMirrorToMarkdown(doc2);
return { md1, doc2, md2 };
}
describe('math round-trip (mathBlock + mathInline)', () => {
it('mathBlock survives export -> import -> export with LaTeX recovered', async () => {
const source = { type: 'mathBlock', attrs: { text: 'a^2+b^2' } };
const { md1, doc2, md2 } = await roundTrip(source);
// One-way emit: LaTeX rides in the `text` HTML attribute, data-katex flag set.
expect(md1).toBe(
'<div data-type="mathBlock" data-katex="true" text="a^2+b^2"></div>',
);
// Byte-stable: the second export reproduces the first exactly.
expect(md2).toBe(md1);
// The re-imported doc's only block is a mathBlock whose LaTeX was recovered
// from the text= attribute by the schema's default parser.
const block = doc2.content[0];
expect(block.type).toBe('mathBlock');
expect(block.attrs.text).toBe('a^2+b^2');
// Canonical equality: source and re-imported doc are the same node.
expect(docsCanonicallyEqual(doc(source), doc2)).toBe(true);
});
it('mathInline (inside a paragraph) survives export -> import -> export', async () => {
const source = para({ type: 'mathInline', attrs: { text: 'x_i' } });
const { md1, doc2, md2 } = await roundTrip(source);
expect(md1).toBe(
'<span data-type="mathInline" data-katex="true" text="x_i"></span>',
);
expect(md2).toBe(md1);
// The re-imported paragraph's child is a mathInline with the LaTeX recovered.
const paragraph = doc2.content[0];
expect(paragraph.type).toBe('paragraph');
const inline = paragraph.content[0];
expect(inline.type).toBe('mathInline');
expect(inline.attrs.text).toBe('x_i');
expect(docsCanonicallyEqual(doc(source), doc2)).toBe(true);
});
});
describe('mention round-trip', () => {
it('mention survives export -> import -> export with data-* re-parsed', async () => {
const source = para({
type: 'mention',
attrs: { id: 'u1', label: 'Alice', entityType: 'user' },
});
const { md1, doc2, md2 } = await roundTrip(source);
// One-way emit: schema span with data-* attrs and the visible '@Alice' text.
expect(md1).toBe(
'<span data-type="mention" data-id="u1" data-label="Alice" data-entity-type="user">@Alice</span>',
);
// Byte-stable.
expect(md2).toBe(md1);
// The visible '@Alice' is cosmetic; generateJSON rebuilds a mention node from
// the data-* attributes. The unset attrs fall back to their schema defaults.
const paragraph = doc2.content[0];
expect(paragraph.type).toBe('paragraph');
const mention = paragraph.content[0];
expect(mention.type).toBe('mention');
expect(mention.attrs.id).toBe('u1');
expect(mention.attrs.label).toBe('Alice');
expect(mention.attrs.entityType).toBe('user');
expect(mention.attrs.entityId).toBeNull();
expect(mention.attrs.slugId).toBeNull();
expect(mention.attrs.creatorId).toBeNull();
expect(mention.attrs.anchorId).toBeNull();
expect(docsCanonicallyEqual(doc(source), doc2)).toBe(true);
});
});
describe('details open-attribute round-trip', () => {
it('the markdown details fence never carries an open flag and stays byte-stable', async () => {
// Source details is OPEN (attrs.open: ''), but the top-level markdown path
// emits a plain '<details>' fence (no 'open' attribute) — see converter
// case "detailsSummary" which hardcodes '<details>\n<summary>...'.
const source = {
type: 'details',
attrs: { open: '' },
content: [
{ type: 'detailsSummary', content: [text('S')] },
{ type: 'detailsContent', content: [para(text('body'))] },
],
};
const { md1, doc2, md2 } = await roundTrip(source);
// The emitted fence drops the open flag entirely.
expect(md1).toBe('<details>\n<summary>S</summary>\n\nbody\n</details>');
expect(md1).not.toContain('open');
// Byte-stable: re-export reproduces the same fence.
expect(md2).toBe(md1);
// NOTE(review): the spec text says doc2's details attrs.open should be
// `null` (the raw return of el.getAttribute('open') on a plain <details>,
// schema src ~L438). In practice generateJSON applies the schema attribute
// default when the parseHTML result is null, so the materialised node carries
// attrs.open === false (the declared default at src ~L437), NOT null. We
// assert the ACTUAL value. The load-bearing point of the spec still holds:
// a plain <details> import does NOT recover the open flag (no truthy value),
// so renderHTML's `attrs.open ? {open:''} : {}` keeps the round-trip clean.
const details = doc2.content[0];
expect(details.type).toBe('details');
expect(details.attrs.open).toBe(false);
expect(details.attrs.open).toBeFalsy();
});
});
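The `roundTrip` helper above runs export -> import -> export once, which separates "byte-stable" (`md2 === md1`) from everything else. As a sketch only, one further cycle can distinguish a one-time asymmetry (the output grows once, then stays fixed) from a converter that never settles. The toy `exportVideo` / `importVideo` pair and the `classify` helper are hypothetical stand-ins, not the real converter.

```typescript
// Hypothetical toy converter pair; NOT convertProseMirrorToMarkdown / markdownToProseMirror.
interface Video {
  src: string;
  align?: string;
}

// Export emits data-align only when the attribute is set.
const exportVideo = (v: Video): string =>
  `<video src="${v.src}"${v.align ? ` data-align="${v.align}"` : ""}></video>`;

// Import materializes the assumed default align='center' when the attribute is absent.
function importVideo(html: string): Video {
  const src = /src="([^"]*)"/.exec(html)?.[1] ?? "";
  const align = /data-align="([^"]*)"/.exec(html)?.[1] ?? "center";
  return { src, align };
}

type Stability = "byte-stable" | "stable-after-one-cycle" | "unstable";

// Runs up to two export/import cycles and reports when the output stops changing.
function classify<T>(doc: T, exp: (d: T) => string, imp: (s: string) => T): Stability {
  const md1 = exp(doc);
  const md2 = exp(imp(md1));
  if (md2 === md1) return "byte-stable";
  const md3 = exp(imp(md2));
  return md3 === md2 ? "stable-after-one-cycle" : "unstable";
}

const minimalVideo = classify<Video>({ src: "/v.mp4" }, exportVideo, importVideo);
const alignedVideo = classify<Video>({ src: "/v.mp4", align: "left" }, exportVideo, importVideo);
```

With this toy pair, the source without `align` gains `data-align="center"` exactly once and then stops changing, while the source with an explicit `align` is byte-stable from the first cycle.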


@@ -396,3 +396,139 @@ describe('runPush — base selection (last-pushed else docmost)', () => {
expect(calls.diffNameStatus[0].from).toBe(DOCMOST_BRANCH);
});
});
// Coverage for two narrow, otherwise-untested branches in `applyPushActions`
// (driven end-to-end via `runPush --apply`, the only write path):
// 1. `errMessage` (push.ts lines 762-763) NON-Error branch — `String(err)`.
// 2. `createPage` partial-meta fallbacks (push.ts lines 583-584) — `?? ''`.
describe('runPush --apply — applyPushActions edge branches', () => {
it('records a thrown NON-Error (a string) via String(err), not "undefined"', async () => {
// One UPDATE (file carries a pageId), whose collab write throws the raw
// STRING 'boom'. Every other failure test throws an Error, so the
// `String(err)` fallback in errMessage (push.ts:763) is otherwise uncovered.
const file =
'<!-- docmost:meta\n{"version":1,"pageId":"p-7"}\n-->\n\nbody\n';
const { git, calls } = makeGit({
lastPushed: 'base-sha',
changes: [{ status: 'M', path: 'Doc.md' }],
});
const fs = makeFs({ 'Doc.md': file });
const client = makeClientFake();
// Throw a bare string (NON-Error) from the update path.
(client.importPageMarkdown as any).mockImplementation(async () => {
throw 'boom';
});
const { deps } = makeDeps(git, fs, client);
// runPush must COMPLETE (the failure is isolated), not reject.
const res = await runPush(deps, { dryRun: false });
expect(res.mode).toBe('apply');
expect(res.applied?.updated).toBe(0);
expect(res.failures).toHaveLength(1);
const failure = res.failures![0];
expect(failure.kind).toBe('update');
expect(failure.pageId).toBe('p-7');
expect(failure.path).toBe('Doc.md');
// String(err) of the thrown string 'boom' — NOT 'undefined' and NOT
// '[object Object]'. This is the load-bearing assertion for line 763.
expect(failure.error).toBe('boom');
// A failure means the refs are NOT advanced (partial push, SPEC §12).
expect(calls.updateRef).toEqual([]);
expect(calls.fastForwardBranch).toEqual([]);
});
it('records a thrown NON-Error OBJECT via String(err) too (no implicit message)', async () => {
// A thrown object literal -> String({}) === '[object Object]'. This pins that
// errMessage stringifies non-Error throwables instead of reading a `.message`.
const file =
'<!-- docmost:meta\n{"version":1,"pageId":"p-8"}\n-->\n\nbody\n';
const { git } = makeGit({
lastPushed: 'base-sha',
changes: [{ status: 'M', path: 'Doc.md' }],
});
const fs = makeFs({ 'Doc.md': file });
const client = makeClientFake();
(client.importPageMarkdown as any).mockImplementation(async () => {
throw { code: 500 };
});
const { deps } = makeDeps(git, fs, client);
const res = await runPush(deps, { dryRun: false });
expect(res.failures).toHaveLength(1);
// String({ code: 500 }) — the object's default stringification.
expect(res.failures![0].error).toBe('[object Object]');
});
it('createPage gets title="" (and parentPageId=undefined) when meta has a spaceId but NO title', async () => {
// A brand-new local file whose meta has a (truthy) spaceId — REQUIRED for the
// planner to emit a CREATE (computePushActions case "A": `else if (meta?.spaceId)`,
// push.ts:249) — but NO title and NO parentPageId. This exercises the
// `meta?.title ?? ''` fallback (push.ts:583) and `parentPageId ?? undefined`
// (push.ts:585) on the real createPage call.
//
// NOTE(review): The spec for this case asked for meta = ONLY `{version:1}`
// (no title AND no spaceId) to exercise BOTH `?? ''` fallbacks at once. That
// input is UNREACHABLE through runPush: the PURE planner (computePushActions,
// push.ts:254-262) SKIPS an added file with no usable spaceId
// (reason 'create-without-spaceId'), so it never becomes a CREATE action and
// applyPushActions' create branch never runs. A separate test below pins that
// skip. Hence `meta?.spaceId ?? ''` can never actually fall back to '' via the
// planner — only `meta?.title ?? ''` is reachable, which this test covers.
const newFile = serializeDocmostMarkdownBody({ version: 1, spaceId: 'sp-1' }, 'fresh body');
const { git } = makeGit({
lastPushed: 'base-sha',
mainSha: 'main-1',
changes: [{ status: 'A', path: 'New.md' }],
});
const fs = makeFs({ 'New.md': newFile });
const client = makeClientFake({ createId: 'page-new' });
const { deps } = makeDeps(git, fs, client);
const res = await runPush(deps, { dryRun: false });
expect(res.mode).toBe('apply');
expect(res.applied?.created).toBe(1);
expect(client.createPage).toHaveBeenCalledTimes(1);
const [title, content, spaceId, parentPageId] = (client.createPage as any).mock
.calls[0];
// `meta?.title ?? ''` -> '' (no title in meta).
expect(title).toBe('');
// The body is passed as content...
expect(content).toBe('fresh body');
// ...and the present spaceId flows through (it is NOT replaced by '').
expect(spaceId).toBe('sp-1');
// `meta?.parentPageId ?? undefined` -> undefined (absent in meta).
expect(parentPageId).toBeUndefined();
});
it('an added file with meta {version:1} only (no spaceId, no title) is SKIPPED, never created', async () => {
// Documents WHY the spec's "only {version:1}" create input is unreachable:
// the planner skips it (create-without-spaceId), so createPage is never called
// and `meta?.spaceId ?? ''` cannot fall back to '' via runPush.
const file = serializeDocmostMarkdownBody({ version: 1 }, 'fresh body');
const { git, calls } = makeGit({
lastPushed: 'base-sha',
changes: [{ status: 'A', path: 'Orphan.md' }],
});
const fs = makeFs({ 'Orphan.md': file });
const client = makeClientFake();
const { deps } = makeDeps(git, fs, client);
const res = await runPush(deps, { dryRun: false });
expect(res.planned).toEqual({
creates: 0,
updates: 0,
deletes: 0,
renamesMoves: 0,
skipped: 1,
});
expect(client.createPage).not.toHaveBeenCalled();
expect(res.applied?.created).toBe(0);
expect(res.applied?.skipped).toEqual([
{ path: 'Orphan.md', status: 'A', reason: 'create-without-spaceId' },
]);
});
});
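The two NON-Error tests above pin the `String(err)` fallback. A minimal sketch of a helper with that shape is shown here. The name `errMessageSketch` is hypothetical, and the Error branch (reading `.message`) is an assumption inferred from the test comments rather than from push.ts itself.

```typescript
// Hypothetical sketch of an error-to-string helper; not the push.ts source.
function errMessageSketch(err: unknown): string {
  // Assumed Error branch: use the message carried by the Error instance.
  if (err instanceof Error) return err.message;
  // NON-Error branch pinned by the tests: plain String() coercion.
  return String(err);
}

const fromString = errMessageSketch("boom");
const fromObject = errMessageSketch({ code: 500 });
const fromError = errMessageSketch(new Error("network down"));
```

Because a thrown object literal has no custom `toString`, the fallback yields the default `[object Object]`, which is why the second test asserts that exact string.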


@@ -77,6 +77,79 @@ describe('sanitizeTitle', () => {
});
});
describe('sanitizeTitle — boundary trim and nullish input', () => {
// Spec case 1: the length-cap branch (sanitize.ts lines ~79-81) does
// `slice(0, MAX_LENGTH).trim()`. The inner `.trim()` after the cap only
// does observable work when the 120-char slice boundary lands on whitespace.
// Existing length tests use all-'x' input where that trim is a no-op, so the
// "trim after cap" sub-branch is otherwise unexercised.
//
// NOTE(review): The spec's literal example input
// 'x'.repeat(118) + '   ' + 'yyyyyyyyyy'
// does NOT yield the spec's stated expected output 'x'.repeat(118). Whitespace
// runs are collapsed (`/\s+/g` -> single space) BEFORE the length cap, so the
// three spaces fold to one: the collapsed string is
// 'x'.repeat(118) + ' ' + 'y'.repeat(10) (length 129)
// and the char at the slice boundary (index 119) is a 'y', not whitespace.
// The actual result is 'x'.repeat(118) + ' y' (length 120) — the inner trim is
// a no-op for that exact input. We assert that ACTUAL behavior first (so the
// discrepancy is documented and locked down), then use a corrected input that
// genuinely lands the cut inside whitespace to exercise the intended sub-branch.
it('collapses the spec literal before capping, so its inner trim is a no-op', () => {
const input = 'x'.repeat(118) + '   ' + 'y'.repeat(10);
const out = sanitizeTitle(input);
// Whitespace-run collapse happens before the cap, so the boundary is a 'y'.
expect(out).toBe('x'.repeat(118) + ' y');
expect(out.length).toBe(120);
});
it('drops a boundary space via the post-cap trim (lines ~79-81)', () => {
// To genuinely land the slice(0,120) boundary ON whitespace AFTER collapse,
// put a single token boundary at index 119: 119 non-space chars, then a run
// of spaces (collapsed to one surviving space at index 119), then more text.
// slice(0,120) === 'x'.repeat(119) + ' ', and the post-cap .trim() removes
// that trailing space -> 'x'.repeat(119) (length 119, no trailing space).
const input = 'x'.repeat(119) + ' '.repeat(5) + 'y'.repeat(10);
const out = sanitizeTitle(input);
expect(out).toBe('x'.repeat(119));
expect(out.length).toBe(119);
expect(out.endsWith(' ')).toBe(false);
// The inner trim genuinely fired: without it the result would be
// 'x'.repeat(119) + ' ' (length 120, trailing space).
expect(out).not.toBe('x'.repeat(119) + ' ');
});
// Spec case 2: the function guards input with `(title ?? '')` (line ~74). The
// nullish-coalescing branch — title being null/undefined rather than '' — is
// not exercised by the existing tests (which pass '' and ' '). This is the
// path that protects against a missing page title.
it('returns "_" for null input without throwing', () => {
let out!: string;
expect(() => {
out = sanitizeTitle(null as any);
}).not.toThrow();
expect(out).toBe('_');
// No path separators in the produced name.
expect(out).not.toContain('/');
expect(out).not.toContain('\\');
});
it('returns "_" for undefined input without throwing', () => {
let out!: string;
expect(() => {
out = sanitizeTitle(undefined as any);
}).not.toThrow();
expect(out).toBe('_');
expect(out).not.toContain('/');
expect(out).not.toContain('\\');
});
it('null and undefined inputs collapse to the same empty-name guard result', () => {
expect(sanitizeTitle(null as any)).toBe(sanitizeTitle(undefined as any));
expect(sanitizeTitle(null as any)).toBe(sanitizeTitle(''));
});
});
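The ordering these tests depend on (nullish guard, whitespace collapse, length cap, post-cap trim, empty-name fallback) can be sketched as below. `sanitizeTitleSketch` is a hypothetical reduction: it omits whatever character filtering the real `sanitizeTitle` performs and keeps only the steps the comments above describe.

```typescript
// Hypothetical reduction of the steps described in the tests; not sanitize.ts.
const MAX_LENGTH = 120; // cap taken from the test comments

function sanitizeTitleSketch(title: string | null | undefined): string {
  // 1. Nullish guard, then collapse whitespace runs to one space and trim the ends.
  const collapsed = (title ?? "").replace(/\s+/g, " ").trim();
  // 2. Cap the length; the inner trim only matters when the cut lands on a space.
  const capped =
    collapsed.length > MAX_LENGTH ? collapsed.slice(0, MAX_LENGTH).trim() : collapsed;
  // 3. Empty-name guard.
  return capped === "" ? "_" : capped;
}

// Cut lands on the surviving space at index 119, so the post-cap trim removes it.
const boundaryOnSpace = sanitizeTitleSketch("x".repeat(119) + " ".repeat(5) + "y".repeat(10));
// Cut lands on a 'y' at index 119, so the post-cap trim is a no-op.
const boundaryOnLetter = sanitizeTitleSketch("x".repeat(118) + "   " + "y".repeat(10));
const fromNull = sanitizeTitleSketch(null);
```

Collapsing before capping is what makes the spec's three-space literal miss the trim branch: after collapse only one space remains, and the character at index 119 is a letter.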
describe('disambiguate', () => {
it('appends a stable ~slugId suffix', () => {
expect(disambiguate('Notes', 'abc123')).toBe('Notes ~abc123');


@@ -0,0 +1,57 @@
import { describe, it, expect } from "vitest";
import { getSchema } from "@tiptap/core";
import { markdownToProseMirror } from "../src/lib/markdown-to-prosemirror";
import { docmostExtensions } from "../src/lib/docmost-schema";
// REGRESSION LOCK for the stripEmptyParagraphs schema-validity guard.
//
// markdownToProseMirror removes empty `paragraph` nodes that the import leaves
// behind when a block atom (e.g. a block image) is hoisted out of marked's
// wrapping <p> — they cause phantom blank-gap diffs on every sync. But several
// schema nodes REQUIRE non-empty block content (`content: "block+"`): tableCell,
// tableHeader, blockquote, column, callout, and the doc root. For an empty one of
// those, generateJSON materializes a single empty paragraph as its OBLIGATORY
// content. Stripping that would produce a schema-INVALID doc (`content: []`),
// which crashes any consumer that validates the public markdownToProseMirror
// output via ProseMirror's Node.check() / nodeFromJSON. The guard keeps one empty
// paragraph when removal would empty such a container; these tests pin that.
const schema = getSchema(docmostExtensions as any);
/** Throws if the JSON doc is not valid against the Docmost schema. */
function assertSchemaValid(doc: unknown): void {
schema.nodeFromJSON(doc).check();
}
describe("stripEmptyParagraphs keeps the import schema-valid", () => {
it("an empty GFM table cell round-trips to a schema-valid doc", async () => {
const doc = await markdownToProseMirror(
"| a | |\n|---|---|\n| x | y |\n",
);
expect(() => assertSchemaValid(doc)).not.toThrow();
});
it("an empty blockquote stays schema-valid", async () => {
const doc = await markdownToProseMirror("> \n");
expect(() => assertSchemaValid(doc)).not.toThrow();
});
it("an empty document stays schema-valid", async () => {
const doc = await markdownToProseMirror("\n\n");
expect(() => assertSchemaValid(doc)).not.toThrow();
});
it("still removes the empty hoist-artifact paragraph beside a block image", async () => {
const doc = await markdownToProseMirror("p\n\n![x](http://a.aa)\n\nq\n");
const emptyParas = ((doc as { content?: any[] }).content ?? []).filter(
(n: any) =>
n.type === "paragraph" &&
(!Array.isArray(n.content) || n.content.length === 0),
);
// The artifact paragraph must be gone (no phantom blank-gap on re-export)...
expect(emptyParas).toHaveLength(0);
// ...and the result is still a valid doc.
expect(() => assertSchemaValid(doc)).not.toThrow();
});
});
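The guard described in the header comment can be sketched as a recursive filter that refuses to empty a `block+` container. The function name, the `PMNode` shape, and the `BLOCK_PLUS` set below are assumptions for illustration (the set is copied from the comment's list); the real `stripEmptyParagraphs` may be structured differently.

```typescript
// Hypothetical sketch of the schema-validity guard; not the converter source.
interface PMNode {
  type: string;
  content?: PMNode[];
  text?: string;
}

// Containers the comment lists as requiring `content: "block+"`.
const BLOCK_PLUS = new Set(["tableCell", "tableHeader", "blockquote", "column", "callout", "doc"]);

const isEmptyParagraph = (n: PMNode): boolean =>
  n.type === "paragraph" && (!Array.isArray(n.content) || n.content.length === 0);

function stripEmptyParagraphsSketch(node: PMNode): PMNode {
  if (!Array.isArray(node.content)) return node;
  // Recurse first so nested containers are cleaned before their parents.
  const children = node.content.map(stripEmptyParagraphsSketch);
  const kept = children.filter((child) => !isEmptyParagraph(child));
  // Guard: if stripping would empty a block+ container, keep one empty paragraph.
  if (kept.length === 0 && children.length > 0 && BLOCK_PLUS.has(node.type)) {
    return { ...node, content: [{ type: "paragraph" }] };
  }
  return { ...node, content: kept };
}

const para = (t: string): PMNode => ({ type: "paragraph", content: [{ type: "text", text: t }] });

// Hoist artifact: an empty paragraph sitting next to a block image is removed.
const hoisted = stripEmptyParagraphsSketch({
  type: "doc",
  content: [para("p"), { type: "paragraph" }, { type: "image" }, para("q")],
});

// Empty blockquote: its single obligatory empty paragraph is preserved.
const quoted = stripEmptyParagraphsSketch({
  type: "doc",
  content: [{ type: "blockquote", content: [{ type: "paragraph" }] }],
});
```

The guard fires per container, so a document can lose its hoist-artifact paragraphs at the top level while an empty table cell or blockquote deeper in the tree still keeps its obligatory child.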