feat(observability): dev-side perf metrics — /metrics + client vitals (#355)
The metrics INFRA is already deployed (VictoriaMetrics scraping docmost:9464,
Grafana dashboards, alerts) with a target `gitmost-app` that is red because the
app half didn't exist. This is that half. The contract (metric names, port,
table, endpoint) is FIXED by the deployed infra and matched exactly.
Server (prom-client):
- A bare node:http `/metrics` server on METRICS_PORT (default 9464), SEPARATE
from the Fastify :3000 listener so /metrics never exists publicly; the whole
subsystem is OFF when METRICS_PORT is unset.
- collectDefaultMetrics() + http_request_duration_seconds{method,route,status}
via a Fastify onResponse hook using the ROUTE TEMPLATE (req.routeOptions.url,
never the raw URL — bounded cardinality; 404 -> "unknown"), EXCLUDING SSE/
streaming responses (would record the connection lifetime and poison p95).
- db_query_duration_seconds (Kysely log callback, labelled by the leading SQL
token), bullmq_queue_depth{queue} (getJobCounts every 15s) +
bullmq_job_duration_seconds{queue} (worker completed/failed),
collab_store_duration_seconds (around onStoreDocument).
- POST /api/telemetry/vitals — PUBLIC (sendBeacon) but IP-throttled; ~16KB body
cap, <=50 events/batch, metric-name + rating whitelist, attr truncated to 120
chars, batch insert; malformed/foreign/oversized silently dropped and 200'd (no
browser retry). New migration `client_metrics` (schema byte-identical to the
contract, both indexes, conditional grafana_ro GRANT; no app-side retention —
the maintenance container prunes >90d).
Client (web-vitals):
- initVitals() decides sampling ONCE per session (25%, sessionStorage) BEFORE
subscribing; onINP/onLCP/onCLS/onTTFB (attribution) buffered + flushed via
navigator.sendBeacon on visibilitychange:hidden and a timer (not fetch-per-
metric). Custom: editor_tx_ms (dispatchTransaction sync-part timer, >8ms, with
doc_size), page_open_ms, longtask_ms. Route labels are templates only; no
titles/slugs/text.
Gate: server + client tsc 0, frozen install 0 (added prom-client + web-vitals +
regenerated the lock), server metrics/vitals tests 11, client route-template 5,
and the migration verified valid against real Postgres.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -40,6 +40,8 @@ import { PageListener } from '@docmost/db/listeners/page.listener';
|
||||
import { PostgresJSDialect } from 'kysely-postgres-js';
|
||||
import * as postgres from 'postgres';
|
||||
import { normalizePostgresUrl } from '../common/helpers';
|
||||
import { observeDbQuery } from '../integrations/metrics/metrics.registry';
|
||||
import { firstSqlToken } from '../integrations/metrics/metrics.constants';
|
||||
|
||||
@Global()
|
||||
@Module({
|
||||
@@ -67,6 +69,14 @@ import { normalizePostgresUrl } from '../common/helpers';
|
||||
}),
|
||||
plugins: [new CamelCasePlugin()],
|
||||
log: (event: LogEvent) => {
|
||||
// #355 — db_query_duration_seconds, labelled by the leading SQL token
|
||||
// (bounded cardinality). No-op when METRICS_PORT is unset. Runs for
|
||||
// every query, independent of the dev-only debug logging below.
|
||||
observeDbQuery(
|
||||
firstSqlToken(event.query.sql),
|
||||
event.queryDurationMillis / 1000,
|
||||
);
|
||||
|
||||
if (environmentService.getNodeEnv() !== 'development') return;
|
||||
const logger = new Logger(DatabaseModule.name);
|
||||
if (process.env.DEBUG_DB?.toLowerCase() === 'true') {
|
||||
|
||||
Reference in New Issue
Block a user