Implements all reviewer comments (code-review, red-team, and test-strategy audit), accepting the recommended variants.

Server, realtime service (ai-realtime.service.ts):
- SSRF: pin the validated IP via a WebSocket `lookup` hook that re-checks every resolved address with isIpAllowed (mirrors external-mcp buildPinnedDispatcher), closing the TOCTOU/DNS-rebinding window; fix the misleading comment.
- no-silent-loss: on Stop, drain the in-flight segment (bounded 2.5s) and deliver the final via onFinal before closing instead of dropping the tail.
- fail-closed deriveRealtimeUrl: a non-empty unparseable base now THROWS (no silent api.openai.com fallback that would leak a self-hosted key); http:// and ws:// bases rejected (plaintext key). Path normalization preserved.
- parseUpstreamEvent keys the accumulator by item_id+content_index so GA segments don't concatenate.
- inject a wsFactory seam for testing; also fix a latent bug: `import WebSocket from 'ws'` resolved to undefined at runtime (no esModuleInterop) -> import=require.
- unref idle/max/drain timers.

Server, realtime gateway (ai-realtime.gateway.ts, session-limits.ts):
- reject revoked/disabled users and inactive sessions (mirror jwt.strategy: findById+isUserDisabled + findActiveById) with NO counter increment.
- CSWSH: Origin allowlist (matching APP_URL, or no Origin for native clients) before auth, no increment.
- extract SessionCounters (delete-at-zero, never negative) + pure canConnect (both caps >= checked before any increment); document the per-process/in-memory cap caveat (single-replica only).

Client:
- dictation-group: realtime final now inserts at the captured rangeRef SNAPSHOT (not the live caret) and guards editor.isEditable; single-space separator.
- use-realtime-dictation/realtime-dictation-client: stop-during-acquisition tears down the mic (no leak / button reset); reconnect re-emits start (double-start guarded); interim ghost cleared on teardown; io() options de-duplicated.
- pcm16-worklet: flush the partial sub-frame tail on stop; one-pole anti-aliasing low-pass before 48k->24k.
- extract shared mic-capture (acquireMicStream/mapGetUserMediaError, used by batch + realtime), pure DSP (pcm16-dsp.ts), and the session reducer/baseLanguageSubtag; extract applyInterimMeta/clampRange/resolveUrl/appendFinalToDraft.

Tests + infra:
- +~150 server tests (deriveRealtimeUrl, parseUpstreamEvent branches, openSession/lifecycle/timers/testConnection via fake ws, gateway auth/caps/no-leak, realtime-test admin contract, AiSettings update/resolve, DTO boolean, SSRF deny).
- +~140 client tests (DSP property/edge, resampler continuity, framing, reducer, mic-capture, RealtimeDictationClient/MicButton, ProseMirror interim regression + history guards, appendFinalToDraft, resolveKeyField, route contract).
- Added @vitest/coverage-v8.
- CHANGELOG [Unreleased] entry incl. the single-replica caveat.

Review: APPROVE WITH SUGGESTIONS (no critical/regression); applied the drain-timer unref.

Server tsc clean + 358 tests; client tsc clean + 201 tests; vite build ok.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
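The gateway item about SessionCounters and canConnect can be pictured roughly as follows. This is a minimal sketch, not the code in session-limits.ts: the method names, the `Caps` shape (a per-user cap and a total cap), and the `tryAcquire`/`release` helpers are assumptions made for illustration. It only shows the two stated properties: entries are deleted at zero and never go negative, and both caps are checked with `>=` semantics before anything is incremented.

```typescript
// Hypothetical cap shape: the commit only says there are two caps.
interface Caps {
  perUser: number;
  total: number;
}

// In-memory, per-process counters (hence the single-replica caveat).
class SessionCounters {
  private readonly counts = new Map<string, number>();

  get(key: string): number {
    return this.counts.get(key) ?? 0;
  }

  total(): number {
    let sum = 0;
    for (const n of this.counts.values()) sum += n;
    return sum;
  }

  increment(key: string): number {
    const next = this.get(key) + 1;
    this.counts.set(key, next);
    return next;
  }

  decrement(key: string): number {
    const next = this.get(key) - 1;
    if (next <= 0) {
      // Delete-at-zero: no stale keys, and an extra decrement cannot go negative.
      this.counts.delete(key);
      return 0;
    }
    this.counts.set(key, next);
    return next;
  }

  get size(): number {
    return this.counts.size;
  }
}

// Pure check: a connection is refused once either count has reached its cap.
function canConnect(userCount: number, totalCount: number, caps: Caps): boolean {
  return userCount < caps.perUser && totalCount < caps.total;
}

// Check both caps first; increment only when the connection is admitted.
function tryAcquire(counters: SessionCounters, userId: string, caps: Caps): boolean {
  if (!canConnect(counters.get(userId), counters.total(), caps)) return false;
  counters.increment(userId);
  return true;
}

function release(counters: SessionCounters, userId: string): void {
  counters.decrement(userId);
}
```

Keeping `canConnect` free of side effects means a rejected connection (revoked user, bad Origin, or cap reached) never touches the counters, which is what the "no increment" notes above rely on.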
69 lines
2.9 KiB
TypeScript
// Shared microphone-acquisition front-end used by BOTH the batch (`use-dictation`)
// and streaming (`use-realtime-dictation`) hooks. Only the getUserMedia handshake
// and its error→message mapping live here; the two hooks keep their own distinct
// downstream graphs (MediaRecorder vs AudioWorklet) and their own streamRef
// ownership. This collapses the ~37 duplicated lines without merging the hooks.

// Translate function shape (react-i18next's `t`). Kept structural so this module
// has no i18next dependency and stays trivially testable.
export type Translate = (key: string) => string;

/** Thrown by `acquireMicStream` when the environment cannot capture audio. */
export class MicUnavailableError extends Error {
  constructor() {
    super("navigator.mediaDevices.getUserMedia is unavailable in this context");
    this.name = "MicUnavailableError";
  }
}

/**
 * Map a getUserMedia rejection to a user-facing, localized message. Mirrors the
 * branching both hooks used previously so behavior is identical. Pure aside from
 * the injected `t`; safe to unit-test with a stub translator.
 */
export function mapGetUserMediaError(err: unknown, t: Translate): string {
  const name = (err as { name?: string })?.name;
  const detail = (err as { message?: string })?.message ?? String(err);
  if (name === "NotAllowedError" || name === "SecurityError") {
    return t("Microphone access denied");
  }
  if (name === "NotFoundError" || name === "OverconstrainedError") {
    return t("No microphone found");
  }
  if (name === "NotReadableError" || name === "AbortError") {
    return t("Microphone is unavailable or already in use");
  }
  // Unknown failure: show the real reason instead of a generic string.
  return `${t("Could not start recording")}: ${name ? `${name}: ` : ""}${detail}`;
}

/**
 * Request the microphone. Throws `MicUnavailableError` when the API is missing
 * (so callers can show the "not available in this context" notification), and
 * otherwise rethrows the raw getUserMedia error for `mapGetUserMediaError`. The
 * caller owns the returned stream (assigns it to its own streamRef and is
 * responsible for stopping the tracks on every exit path).
 */
export async function acquireMicStream(): Promise<MediaStream> {
  if (!navigator.mediaDevices?.getUserMedia) {
    throw new MicUnavailableError();
  }
  return navigator.mediaDevices.getUserMedia({ audio: true });
}

/**
 * Shared synchronous double-start guard. Returns true when a new capture may
 * begin, false when one is already starting or live (so the second click is a
 * no-op and never opens a leaking second MediaStream). `status` is the React
 * status; the refs cover the window before the next render commits.
 */
export function canStartCapture(args: {
  starting: boolean;
  hasStream: boolean;
  hasLiveResource: boolean;
  statusIsIdle: boolean;
}): boolean {
  if (args.starting || args.hasStream || args.hasLiveResource) return false;
  return args.statusIsIdle;
}
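How a hook might combine the double-start guard with the stream-ownership rule can be sketched as below. This is an illustration under stated assumptions, not the code in `use-dictation` or `use-realtime-dictation`: `CaptureState`, `startCapture`, and the `stopRequested` flag are hypothetical, `StreamLike` is a structural stand-in for `MediaStream` so the snippet runs outside a browser, and `acquire`/`toMessage` are injected stand-ins for `acquireMicStream` and `mapGetUserMediaError` bound to a translator.

```typescript
// Minimal structural stand-in for MediaStream (assumption, for standalone use).
interface StreamLike {
  getTracks(): Array<{ stop(): void }>;
}

// Hypothetical caller-side state; real hooks would keep equivalent refs.
interface CaptureState {
  starting: boolean;
  stream: StreamLike | null;
  stopRequested: boolean;
}

type StartResult = { ok: true } | { ok: false; message: string };

async function startCapture(
  state: CaptureState,
  acquire: () => Promise<StreamLike>,
  toMessage: (err: unknown) => string,
): Promise<StartResult> {
  // Synchronous guard checked before any await: a second click is a no-op.
  if (state.starting || state.stream) {
    return { ok: false, message: "already starting or live" };
  }
  state.starting = true;
  state.stopRequested = false;
  try {
    const stream = await acquire();
    if (state.stopRequested) {
      // Stop arrived while acquisition was pending: release the mic at once.
      stream.getTracks().forEach((track) => track.stop());
      return { ok: false, message: "stopped during acquisition" };
    }
    state.stream = stream; // the caller owns the stream from here on
    return { ok: true };
  } catch (err) {
    return { ok: false, message: toMessage(err) };
  } finally {
    state.starting = false;
  }
}
```

Injecting `acquire` and `toMessage` keeps the sketch testable without `navigator`, in the same spirit as the structural `Translate` type above.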