feat(share-ai): cap per-request output tokens and fail closed on Redis loss

Harden the anonymous public-share AI assistant against token-cost abuse
before exposing it to the internet:

- Add an env-tunable per-request output ceiling (maxOutputTokens) to the
  public-share streamText call so one anonymous request cannot run up the
  provider bill even if the per-IP throttle is evaded. New
  resolveShareAiMaxOutputTokens() / SHARE_AI_MAX_OUTPUT_TOKENS_DEFAULT
  (env SHARE_AI_MAX_OUTPUT_TOKENS, default 512), mirroring
  resolveShareAiWorkspaceMax().
- Flip the per-workspace cost limiter to FAIL CLOSED on Redis failure (was
  fail-open): if Redis is unavailable we cannot prove the workspace is under
  its cap, so deny rather than admit an unmetered, billable call.
- Update the limiter spec (fail-open -> fail-closed) and add resolver tests;
  document both knobs in .env.example.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 02:13:04 +03:00
parent 987a4fd32e
commit 262a0707d9
4 changed files with 85 additions and 17 deletions
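
The two behaviours described in the commit message can be sketched roughly as follows. This is an illustration only, not the commit's code: the resolver's real signature is not visible in this part of the diff, so here it takes the raw env string as a parameter, and `CapCheck` and `isAllowedFailClosed` are hypothetical names standing in for the per-workspace limiter.

```typescript
// Hypothetical sketch; names other than the resolver and the default
// constant are assumptions, and the real implementation may differ.
const SHARE_AI_MAX_OUTPUT_TOKENS_DEFAULT = 512;

// Parse the raw SHARE_AI_MAX_OUTPUT_TOKENS value, falling back to the
// default when it is missing, non-numeric, zero, or negative.
function resolveShareAiMaxOutputTokens(raw: string | undefined): number {
  const parsed = Number.parseInt(raw ?? '', 10);
  return Number.isFinite(parsed) && parsed > 0
    ? parsed
    : SHARE_AI_MAX_OUTPUT_TOKENS_DEFAULT;
}

// Assumed limiter shape: resolves true while the workspace is under its cap,
// and rejects if the backing store (Redis in the commit) is unreachable.
type CapCheck = (workspaceId: string) => Promise<boolean>;

// Fail closed: any error from the check is treated as a denial, because the
// workspace cannot be proven to be under its cap.
async function isAllowedFailClosed(
  check: CapCheck,
  workspaceId: string,
): Promise<boolean> {
  try {
    return await check(workspaceId);
  } catch {
    return false;
  }
}
```

With the documented defaults, the worst-case output per accepted call is 5 agent steps × 512 tokens = 2,560 tokens. A fail-open variant would return `true` in the `catch` branch instead, which is the behaviour this commit removes.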
--- a/.env.example
+++ b/.env.example
@@ -112,7 +112,12 @@ MCP_DOCMOST_PASSWORD=
 #
 # Backstop: a cluster-wide, sliding-window cap per workspace (IP-independent,
 # keyed by the server-resolved workspace id) bounds the owner's bill even if the
-# per-IP limit is fully evaded. It is a COST backstop, not an access control,
-# and FAILS OPEN if Redis is unavailable. Override the hourly cap below
+# per-IP limit is fully evaded. It is a COST backstop, not an access control, and
+# FAILS CLOSED if Redis is unavailable (an optional assistant briefly going
+# offline is safer than an unbounded bill). Override the hourly cap below
 # (default: 300 calls per workspace per rolling hour).
 # SHARE_AI_WORKSPACE_MAX_PER_HOUR=300
+#
+# Per-request output-token ceiling for the anonymous assistant (default: 512).
+# Worst-case output per accepted call = agent steps (5) × this value.
+# SHARE_AI_MAX_OUTPUT_TOKENS=512