feat(share-ai): cap per-request output tokens and fail closed on Redis loss

Harden the anonymous public-share AI assistant against token-cost abuse
before exposing it to the internet:

- Add an env-tunable per-request output ceiling (maxOutputTokens) to the
  public-share streamText call so one anonymous request cannot run up the
  provider bill even if the per-IP throttle is evaded. New
  resolveShareAiMaxOutputTokens() / SHARE_AI_MAX_OUTPUT_TOKENS_DEFAULT
  (env SHARE_AI_MAX_OUTPUT_TOKENS, default 512), mirroring
  resolveShareAiWorkspaceMax().
- Flip the per-workspace cost limiter to FAIL CLOSED on Redis failure
  (was fail-open): if Redis is unavailable we cannot prove the workspace is
  under its cap, so deny rather than admit an unmetered, billable call.
- Update the limiter spec (fail-open -> fail-closed) and add resolver tests;
  document both knobs in .env.example.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
claude_code
2026-06-21 02:13:04 +03:00
parent 987a4fd32e
commit 262a0707d9
4 changed files with 85 additions and 17 deletions
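
The env-tunable ceiling described in the commit message can be pictured with a minimal sketch like the one below. The names resolveShareAiMaxOutputTokens, SHARE_AI_MAX_OUTPUT_TOKENS_DEFAULT and SHARE_AI_MAX_OUTPUT_TOKENS come from the commit message. The function body, the explicit env parameter, and the fallback rule (anything that is not a positive integer yields the default) are assumptions made for illustration, not the committed implementation.

```typescript
// Hypothetical sketch; the committed resolver may parse and validate differently.
const SHARE_AI_MAX_OUTPUT_TOKENS_DEFAULT = 512;

// Assumption: the environment is passed in explicitly so the function is easy to test.
type Env = Record<string, string | undefined>;

function resolveShareAiMaxOutputTokens(env: Env): number {
  const raw = env.SHARE_AI_MAX_OUTPUT_TOKENS;

  // Unset or blank: use the documented default.
  if (raw === undefined || raw.trim() === '') {
    return SHARE_AI_MAX_OUTPUT_TOKENS_DEFAULT;
  }

  const parsed = Number(raw);

  // Assumption: zero, negative, fractional, or non-numeric values fall back
  // to the default instead of disabling the ceiling.
  if (!Number.isInteger(parsed) || parsed <= 0) {
    return SHARE_AI_MAX_OUTPUT_TOKENS_DEFAULT;
  }

  return parsed;
}
```

In a Node process the caller would pass process.env and hand the result to the maxOutputTokens option of the streamText call named in the commit message. Falling back to the default on bad input keeps a typo in the env file from silently removing the cap.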


@@ -112,7 +112,12 @@ MCP_DOCMOST_PASSWORD=
 #
 # Backstop: a cluster-wide, sliding-window cap per workspace (IP-independent,
 # keyed by the server-resolved workspace id) bounds the owner's bill even if the
-# per-IP limit is fully evaded. It is a COST backstop, not an access control,
-# and FAILS OPEN if Redis is unavailable. Override the hourly cap below
+# per-IP limit is fully evaded. It is a COST backstop, not an access control, and
+# FAILS CLOSED if Redis is unavailable (an optional assistant briefly going
+# offline is safer than an unbounded bill). Override the hourly cap below
 # (default: 300 calls per workspace per rolling hour).
 # SHARE_AI_WORKSPACE_MAX_PER_HOUR=300
+#
+# Per-request output-token ceiling for the anonymous assistant (default: 512).
+# Worst-case output per accepted call = agent steps (5) × this value.
+# SHARE_AI_MAX_OUTPUT_TOKENS=512
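
The fail-closed behaviour documented in this hunk can be illustrated with a minimal sketch. Everything named here (CounterStore, LimitDecision, checkWorkspaceLimit, MemoryStore, brokenStore, and the key format) is hypothetical and stands in for the Redis-backed limiter, which this diff does not show. The only point being demonstrated is that an error from the store produces a denial, not an admission.

```typescript
// Hypothetical sketch; CounterStore stands in for the Redis-backed sliding window.
interface CounterStore {
  // Records one call for the key and returns the number of calls inside the window.
  // Rejects when the backing store (for example Redis) is unreachable.
  hit(key: string, windowMs: number): Promise<number>;
}

interface LimitDecision {
  allowed: boolean;
  reason: 'under-cap' | 'over-cap' | 'store-unavailable';
}

const ONE_HOUR_MS = 60 * 60 * 1000;

async function checkWorkspaceLimit(
  store: CounterStore,
  workspaceId: string,
  maxPerHour: number,
): Promise<LimitDecision> {
  try {
    const count = await store.hit(`share-ai:ws:${workspaceId}`, ONE_HOUR_MS);
    if (count <= maxPerHour) {
      return { allowed: true, reason: 'under-cap' };
    }
    return { allowed: false, reason: 'over-cap' };
  } catch {
    // Fail closed: without the counter we cannot prove the workspace is under
    // its cap, so the billable call is denied.
    return { allowed: false, reason: 'store-unavailable' };
  }
}

// In-memory stand-in used only to exercise the sketch.
class MemoryStore implements CounterStore {
  private hits = new Map<string, number[]>();

  constructor(private now: () => number = Date.now) {}

  async hit(key: string, windowMs: number): Promise<number> {
    const t = this.now();
    // Keep only timestamps that are still inside the rolling window.
    const recent = (this.hits.get(key) ?? []).filter((ts) => ts > t - windowMs);
    recent.push(t);
    this.hits.set(key, recent);
    return recent.length;
  }
}

// Simulates a Redis outage: every call rejects.
const brokenStore: CounterStore = {
  hit: () => Promise.reject(new Error('redis unavailable')),
};
```

With the defaults shown in this file, the documented worst case is 5 × 512 = 2,560 output tokens per accepted call. If the workspace cap is being enforced, that puts an upper bound of 300 × 2,560 = 768,000 output tokens per workspace per rolling hour, and failing closed means that bound cannot be exceeded during a Redis outage.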