- F10 [stability]: closeMetricsServer() now calls server.closeIdleConnections()
+ server.unref() after server.close(). server.close()'s callback doesn't fire
until keep-alive sockets drain, and the scraper (VictoriaMetrics/vmagent) holds
an idle keep-alive socket — so onModuleDestroy's awaited close would hang until
the scraper disconnects or the orchestrator SIGKILLs on the kill-grace window.
closeIdleConnections() drops idle keep-alive sockets so shutdown completes
immediately (Node 22, per the Dockerfile base).
- F9 [test]: client-telemetry.module.spec.ts pins the E1=B register() gate — the
core of the "public endpoint OFF by default" decision: flag unset / any non-
"true" value ("false"/""/"0"/…) → empty controllers+providers (route absent);
"true"/"TRUE" → registers VitalsController + VitalsService. A flag-inversion or
truthiness regression that reopened the anonymous disk-fill surface now fails.
- F11 [regression/perf]: the db_query_duration_seconds token work (firstSqlToken
regex + Set lookup) is now gated on isMetricsEnabled() in database.module.ts, so
a non-metrics deployment pays NOTHING per query (previously observeDbQuery
no-op'd but the token was still computed on every query). Also hoisted the
13-element known-token Set to a module const (KNOWN_SQL_TOKENS) so it's built
once, not per query.
Gate: server tsc 0; metrics + vitals + client-telemetry suites pass (incl. the
new register-gate test).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Maintainer resolved E1 as variant B: the public vitals sink + client collection
must be OFF by default (else client_metrics grows unbounded on a self-host deploy
with no external pruner, via an unauthenticated public endpoint).
- F1: new operator flag CLIENT_TELEMETRY_ENABLED (default OFF), SEPARATE from
METRICS_PORT (Grafana reads the table directly, independent of the scrape port).
ClientTelemetryModule.register() provides VitalsController ONLY when the flag is
true (route absent otherwise); the flag reaches the client via window.CONFIG
(config.ts isClientTelemetryEnabled), and initVitals() early-returns when off.
- F2/F3 [throttler]: this repo's ThrottlerGuard applies EVERY named throttler to
every guarded route unless skipped. The new VITALS bucket therefore (a) newly
bound collab-token → 429 behind shared/NAT IPs, and (b) the vitals route didn't
skip the stricter public-share-ai (5/min) bucket → effective 5/min not 120.
Fix (additive, global config unchanged): vitals.controller @SkipThrottle the
other buckets + @Throttle VITALS 120/min; collab-token adds VITALS_THROTTLER to
its existing @SkipThrottle (restoring its prior effectively-unthrottled state).
- F4: metrics node:http server is closed on shutdown (MetricsServerLifecycle
OnModuleDestroy → closeMetricsServer(), fired by enableShutdownHooks).
- F5: docSize outside [0, int4-max] drops to null (keeping the event) instead of
overflowing int4 and failing the WHOLE batch insert (+ 2 tests).
- F6: .env.example documents METRICS_PORT (no default — unset = subsystem OFF) +
CLIENT_TELEMETRY_ENABLED; fixed the inaccurate "default 9464" wording.
- F7: disabled/non-sampled sessions install ZERO observers — isVitalsActive()
(enabled && sampled) gates reportClientMetric AND the page-editor
measurePageOpen + dispatchTransaction wrapping.
- F8: kept db.d.ts hand-added (wontfix) — this repo HAND-CURATES db.d.ts (verified
across recent fork migrations a32fba63/8c5b57eb/fdeede00); codegen would be the
deviation. The ClientMetrics interface maps the migration 1:1.
Gate: server tsc 0, client tsc 0, server metrics/vitals/telemetry/throttle 21
tests, client route-template 5. No new deps.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The metrics INFRA is already deployed (VictoriaMetrics scraping docmost:9464,
Grafana dashboards, alerts) with a target `gitmost-app` that is red because the
app half didn't exist. This is that half. The contract (metric names, port,
table, endpoint) is FIXED by the deployed infra and matched exactly.
Server (prom-client):
- A bare node:http `/metrics` server on METRICS_PORT (default 9464), SEPARATE
from the Fastify :3000 listener so /metrics never exists publicly; the whole
subsystem is OFF when METRICS_PORT is unset.
- collectDefaultMetrics() + http_request_duration_seconds{method,route,status}
via a Fastify onResponse hook using the ROUTE TEMPLATE (req.routeOptions.url,
never the raw URL — bounded cardinality; 404 -> "unknown"), EXCLUDING SSE/
streaming responses (would record the connection lifetime and poison p95).
- db_query_duration_seconds (Kysely log callback, labelled by the leading SQL
token), bullmq_queue_depth{queue} (getJobCounts every 15s) +
bullmq_job_duration_seconds{queue} (worker completed/failed),
collab_store_duration_seconds (around onStoreDocument).
- POST /api/telemetry/vitals — PUBLIC (sendBeacon) but IP-throttled; ~16KB body
cap, <=50 events/batch, metric-name + rating whitelist, attr truncated to 120
chars, batch insert; malformed/foreign/oversized silently dropped and 200'd (no
browser retry). New migration `client_metrics` (schema byte-identical to the
contract, both indexes, conditional grafana_ro GRANT; no app-side retention —
the maintenance container prunes >90d).
Client (web-vitals):
- initVitals() decides sampling ONCE per session (25%, sessionStorage) BEFORE
subscribing; onINP/onLCP/onCLS/onTTFB (attribution) buffered + flushed via
navigator.sendBeacon on visibilitychange:hidden and a timer (not fetch-per-
metric). Custom: editor_tx_ms (dispatchTransaction sync-part timer, >8ms, with
doc_size), page_open_ms, longtask_ms. Route labels are templates only; no
titles/slugs/text.
Gate: server + client tsc 0, frozen install 0 (added prom-client + web-vitals +
regenerated the lock), server metrics/vitals tests 11, client route-template 5,
and the migration verified valid against real Postgres.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>