feat(proxy): add token-based rate limiting via response parsing#1066
feat(proxy): add token-based rate limiting via response parsing#1066
Conversation
|
| Metric | Base | PR | Delta |
|---|---|---|---|
| Lines | 82.01% | 82.05% | 📈 +0.04% |
| Statements | 81.99% | 81.99% | ➡️ +0.00% |
| Functions | 82.91% | 82.91% | ➡️ +0.00% |
| Branches | 74.37% | 74.20% | 📉 -0.17% |
📁 Per-file Coverage Changes (2 files)
| File | Lines (Before → After) | Statements (Before → After) |
|---|---|---|
src/cli.ts |
46.3% → 46.1% (-0.22%) | 46.8% → 46.4% (-0.35%) |
src/docker-manager.ts |
83.1% → 83.7% (+0.56%) | 82.4% → 83.0% (+0.54%) |
Coverage comparison generated by scripts/ci/compare-coverage.ts
Smoke Test Results✅ GitHub MCP: Last 2 merged PRs: #1048 test: add CI workflow for non-chroot integration tests, #1063 feat(proxy): make copilot api target configurable for enterprise envi… Overall: PASS
|
Go Build Test Results
Overall: ✅ PASS
|
C++ Build Test Results
Overall: PASS
|
Bun Build Test Results
Overall: ✅ PASS Tested with Bun v1.3.10
|
🦀 Rust Build Test Results
Overall: ✅ PASS
|
Deno Build Test Results
Overall: ✅ PASS Deno version: 2.7.1
|
Smoke Test Results — Copilot Engine
Last 2 merged PRs:
Overall: PASS —
|
.NET Build Test Results
Overall: PASS Run outputhello-world: json-parse:
|
Build Test: Node.js Results
Overall: PASS ✅
|
There was a problem hiding this comment.
Pull request overview
Adds token-usage extraction to the API proxy so it can emit token metrics and enforce an opt-in tokens-per-minute (TPM) rate limit, with CLI/config plumbing and tests.
Changes:
- Introduces a
TokenExtractorTransform stream to parse token usage from OpenAI/Copilot and Anthropic responses (streaming + non-streaming). - Extends the proxy rate limiter with a TPM window and a post-response
recordTokens()path, plus/metricstoken counters. - Adds
--rate-limit-tpmflag and wires the config through TypeScript → docker-compose env vars, with unit/integration tests.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/integration/api-proxy-token-ratelimit.test.ts | New integration tests around TPM flag plumbing and /health output. |
| tests/fixtures/awf-runner.ts | Adds rateLimitTpm option and forwards --rate-limit-tpm to the CLI. |
| src/types.ts | Extends RateLimitConfig with required tpm field. |
| src/docker-manager.ts | Passes AWF_RATE_LIMIT_TPM into the api-proxy container environment. |
| src/docker-manager.test.ts | Updates env-var passthrough tests to include tpm. |
| src/cli.ts | Adds --rate-limit-tpm, validation, and config building for tpm. |
| src/cli.test.ts | Updates existing rate-limit config expectations to include tpm. |
| containers/api-proxy/token-extractor.test.js | New unit tests for token extraction across formats and encodings. |
| containers/api-proxy/token-extractor.js | New Transform stream to extract token usage from JSON and SSE responses. |
| containers/api-proxy/server.js | Pipes upstream responses through token extractor and records metrics / TPM usage. |
| containers/api-proxy/rate-limiter.js | Adds TPM sliding window + recordTokens() and exposes TPM in status. |
| containers/api-proxy/Dockerfile | Copies token-extractor.js into the api-proxy image. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (this._isSSE) { | ||
| this._processSSEChunk(chunk); | ||
| } else { | ||
| this._chunks.push(chunk); | ||
| } |
There was a problem hiding this comment.
In non-SSE mode the extractor buffers every response chunk into this._chunks and then Buffer.concat()s it in _flush(). There’s no maximum size / early abort, so a large non-streaming response can cause unbounded memory growth (and duplicates the streamed data in memory). Consider enforcing a max buffered size (fail-open to ZERO_TOKENS once exceeded) similar to the request MAX_BODY_SIZE in server.js, or only enabling buffering for JSON content-types you intend to parse.
| const isSSE = typeof contentType === 'string' && contentType.includes('text/event-stream'); | ||
| const enc = (contentEncoding || '').toLowerCase(); | ||
| const skipExtraction = enc === 'gzip' || enc === 'br' || enc === 'deflate'; | ||
|
|
There was a problem hiding this comment.
Token extraction is skipped for gzip/br/deflate responses. Since the proxy forwards client headers by default (including many clients’ Accept-Encoding), this can make TPM limiting and token metrics effectively no-op in common cases. Consider either stripping/overriding Accept-Encoding upstream (force identity) when TPM is enabled, or adding best-effort decompression for parsing while still passing the original bytes through unchanged.
| allowed: false, | ||
| limitType: 'tpm', | ||
| limit: this.tpm, | ||
| remaining: 0, |
There was a problem hiding this comment.
check() can now return limitType: 'tpm'. The API proxy’s 429 builder in containers/api-proxy/server.js currently only maps rpm/rph/bytes_pm for the human label and window field, so a TPM rejection will produce a confusing label and an incorrect window value (it will fall through to the bytes-per-minute case). Please update the server-side mapping to include 'tpm' (and a distinct window like per_minute_tokens) so clients can interpret the error correctly.
| remaining: 0, | |
| remaining: Math.max(0, this.tpm - tpmCount), |
| if (options.rateLimitTpm !== undefined) { | ||
| const tpm = parseInt(options.rateLimitTpm, 10); | ||
| if (isNaN(tpm) || tpm <= 0) return { error: '--rate-limit-tpm must be a positive integer' }; | ||
| config.tpm = tpm; | ||
| } |
There was a problem hiding this comment.
--rate-limit-tpm parsing/validation was added here, but there are no unit tests covering it (no rateLimitTpm cases in src/cli.test.ts). Add tests that: (1) enabling TPM sets config.tpm and enabled=true, and (2) invalid/zero/negative TPM returns the expected error message, to prevent regressions in flag handling.
| * Tests that token-per-minute (TPM) rate limiting works end-to-end with | ||
| * actual Docker containers. Uses the --rate-limit-tpm flag to enable | ||
| * token-based rate limiting. | ||
| * | ||
| * Note: These tests require the token-extractor.js module to be present | ||
| * in the api-proxy container. If the module is not yet merged, tests | ||
| * that depend on actual token extraction from responses will be skipped. |
There was a problem hiding this comment.
This suite is described as “Token Rate Limiting Integration Tests” and the header note says tests that depend on token extraction will be skipped if the module isn’t merged, but there’s no conditional skip logic here and the current assertions only verify /health output and absence of 429s—not that TPM enforcement triggers based on extracted tokens. Consider either updating the description to match what’s actually tested, or adding an end-to-end assertion that a low TPM limit produces a 429 after a response with known token usage (e.g., via a deterministic mock upstream or a server-level unit test with a mocked extractor).
| * Tests that token-per-minute (TPM) rate limiting works end-to-end with | |
| * actual Docker containers. Uses the --rate-limit-tpm flag to enable | |
| * token-based rate limiting. | |
| * | |
| * Note: These tests require the token-extractor.js module to be present | |
| * in the api-proxy container. If the module is not yet merged, tests | |
| * that depend on actual token extraction from responses will be skipped. | |
| * Tests that token-per-minute (TPM) rate limiting wiring works end-to-end with | |
| * actual Docker containers. Uses the --rate-limit-tpm flag to enable | |
| * token-based rate limiting and validates configuration/health output and | |
| * the absence or presence of 429s under specific configurations. | |
| * | |
| * Note: These tests assume the token-extractor.js module is present in the | |
| * api-proxy container when TPM is enabled, but they currently do not assert | |
| * on per-response token extraction behavior directly. |
| _processSSEChunk(chunk) { | ||
| const text = this._sseLineBuf + chunk.toString('utf8'); | ||
| const lines = text.split('\n'); | ||
| // Last element may be incomplete — keep it in the buffer | ||
| this._sseLineBuf = lines.pop(); | ||
|
|
There was a problem hiding this comment.
SSE parsing builds text via chunk.toString('utf8') and concatenates into a JS string. If a multi-byte UTF-8 sequence is split across chunks (possible when JSON contains non-ASCII text), toString() can introduce replacement characters and make the JSON invalid, causing token extraction to fail. Consider using StringDecoder('utf8') to safely decode chunk boundaries in streaming mode.
|
test: add CI workflow for non-chroot integration tests ✅
|
Chroot Version Comparison Results
Result: FAIL — Python and Node.js versions differ between host and chroot environments.
|
Java Build Test Results ☕
Overall: PASS ✅ All Java projects compiled and tests passed successfully via the AWF proxy (
|
|
| Metric | Base | PR | Delta |
|---|---|---|---|
| Lines | 82.01% | 82.06% | 📈 +0.05% |
| Statements | 81.99% | 82.00% | 📈 +0.01% |
| Functions | 82.91% | 82.50% | 📉 -0.41% |
| Branches | 74.37% | 74.03% | 📉 -0.34% |
📁 Per-file Coverage Changes (2 files)
| File | Lines (Before → After) | Statements (Before → After) |
|---|---|---|
src/cli.ts |
46.3% → 46.4% (+0.06%) | 46.8% → 46.7% (-0.07%) |
src/docker-manager.ts |
83.1% → 83.7% (+0.56%) | 82.4% → 83.0% (+0.54%) |
Coverage comparison generated by scripts/ci/compare-coverage.ts
C++ Build Test Results
Overall: PASS ✅
|
🦕 Deno Build Test Results
Overall: ✅ PASS
|
Java Build Test Results
Overall: PASS ✅
|
7698d0e to
34ef7df
Compare
✅ Coverage Check PassedOverall Coverage
📁 Per-file Coverage Changes (2 files)
Coverage comparison generated by |
Build Test: Bun Results ✅
Overall: PASS Bun version:
|
Build Test: Node.js Results
Overall: PASS ✅
|
Rust Build Test Results
Overall: PASS ✅
|
🦕 Deno Build Test Results
Overall: ✅ PASS
|
|
Smoke test results for this PR ( ✅ GitHub MCP — Last 2 merged PRs: #1078 "fix: add explicit execute directive to smoke-codex to prevent noop", #1069 "fix(deps): resolve high-severity rollup vulnerability in docs-site" Overall: PASS
|
Go Build Test Results
Overall: ✅ PASS
|
|
Smoke Test Results — run 22641181923 ✅ GitHub MCP: #1078 fix: add explicit execute directive, #1048 test: add CI workflow for non-chroot integration tests Overall: PASS
|
|
PR titles: fix: add explicit execute directive to smoke-codex to prevent noop | fix(deps): resolve high-severity rollup vulnerability in docs-site
|
C++ Build Test Results
Overall: PASS
|
Java Build Test Results ☕
Overall: ✅ PASS
|
.NET Build Test Results
Overall: PASS Run outputhello-world:
|
Chroot Version Comparison Results
Result: ❌ Not all runtimes match — Python and Node.js versions differ between host and chroot environments. The
|
Add token extraction from LLM API responses and token-per-minute
rate limiting to the API proxy sidecar.
Token Extractor (token-extractor.js):
- Transform stream that observes responses without modifying data
- Parses Anthropic and OpenAI response formats (streaming + non-streaming)
- Emits 'tokens' event with {input, output, total}
- Best-effort: parsing failures emit zeros (fail-open)
Rate Limiter (rate-limiter.js):
- Add TPM window (tokens per minute, 1-second granularity)
- New recordTokens() method for post-response token recording
- TPM check in check() uses previous consumption to gate next request
Server Integration (server.js):
- Pipe responses through token extractor: proxyRes → extractor → res
- Record tokens_input_total and tokens_output_total metrics
- Feed rate limiter TPM window after each response
CLI (--rate-limit-tpm):
- New flag enables token-based rate limiting (opt-in like RPM/RPH)
- Passed as AWF_RATE_LIMIT_TPM env var to api-proxy container
Also fixes:
- npm audit fix for minimatch ReDoS vulnerability (GHSA-23c5-xmqv-rm74)
- Integration test robustness against Docker build output in stdout
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
34ef7df to
140380d
Compare
✅ Coverage Check PassedOverall Coverage
📁 Per-file Coverage Changes (2 files)
Coverage comparison generated by |
Build Test: C++ Results
Overall: PASS
|
|
GitHub MCP ✅ — fix: add explicit execute directive to smoke-codex to prevent noop; fix(deps): resolve high-severity rollup vulnerability in docs-site
|
Deno Build Test Results
Overall: ✅ PASS
|
.NET Build Test Results
Overall: PASS Run outputhello-world:
|
|
Smoke Test Results — PASS
|
Smoke Test Results ✅ PASS
PR author:
|
Go Build Test Results
Overall: ✅ PASS
|
🦀 Rust Build Test Results
Overall: ✅ PASS
|
Node.js Build Test Results
Overall: ✅ PASS
|
Chroot Version Comparison Results
Result: ❌ Not all tests passed — Python and Node.js versions differ between host and chroot environments.
|
🧪 Build Test: Bun
Overall: ✅ PASS Bun version:
|
Java Build Test Results ☕
Overall: ✅ PASS All Java projects compiled and tests passed successfully.
|
Summary
--rate-limit-tpm <n>(opt-in)tokens_input_totalandtokens_output_totalcounters in/metricsBuilds on #1038 (observability + rate limiting).
How It Works
usagefielddata:lines as they flow throughChanges
token-extractor.jsserver.jsrate-limiter.jsrecordTokens()methodDockerfilecli.ts--rate-limit-tpmflagtypes.ts/docker-manager.tsTests (133 total new + existing)
Test plan
🤖 Generated with Claude Code