feat(costs): add billing, quota, and budget control plane
This commit is contained in:
468
doc/plans/2026-03-14-billing-ledger-and-reporting.md
Normal file
468
doc/plans/2026-03-14-billing-ledger-and-reporting.md
Normal file
@@ -0,0 +1,468 @@
|
||||
# Billing Ledger and Reporting
|
||||
|
||||
## Context
|
||||
|
||||
Paperclip currently stores model spend in `cost_events` and operational run state in `heartbeat_runs`.
|
||||
That split is fine, but the current reporting code tries to infer billing semantics by mixing both tables:
|
||||
|
||||
- `cost_events` knows provider, model, tokens, and dollars
|
||||
- `heartbeat_runs.usage_json` knows some per-run billing metadata
|
||||
- `heartbeat_runs.usage_json` does **not** currently carry enough normalized billing dimensions to support honest provider-level reporting
|
||||
|
||||
This becomes incorrect as soon as a company uses more than one provider, more than one billing channel, or more than one billing mode.
|
||||
|
||||
Examples:
|
||||
|
||||
- direct OpenAI API usage
|
||||
- Claude subscription usage with zero marginal dollars
|
||||
- subscription overage with dollars and tokens
|
||||
- OpenRouter billing where the biller is OpenRouter but the upstream provider is Anthropic or OpenAI
|
||||
|
||||
The system needs to support:
|
||||
|
||||
- dollar reporting
|
||||
- token reporting
|
||||
- subscription-included usage
|
||||
- subscription overage
|
||||
- direct metered API usage
|
||||
- future aggregator billing such as OpenRouter
|
||||
|
||||
## Product Decision
|
||||
|
||||
`cost_events` becomes the canonical billing and usage ledger for reporting.
|
||||
|
||||
`heartbeat_runs` remains an operational execution log. It may keep mirrored billing metadata for debugging and transcripts, but reporting must not reconstruct billing semantics from `heartbeat_runs.usage_json`.
|
||||
|
||||
## Decision: One Ledger Or Two
|
||||
|
||||
We do **not** need two tables to solve the current PR's problem.
|
||||
For request-level inference reporting, `cost_events` is enough if it carries the right dimensions:
|
||||
|
||||
- upstream provider
|
||||
- biller
|
||||
- billing type
|
||||
- model
|
||||
- token fields
|
||||
- billed amount
|
||||
|
||||
That is why the first implementation pass extends `cost_events` instead of introducing a second table immediately.
|
||||
|
||||
However, if Paperclip needs to account for the full billing surface of aggregators and managed AI platforms, then `cost_events` alone is not enough.
|
||||
Some charges are not cleanly representable as a single model inference event:
|
||||
|
||||
- account top-ups and credit purchases
|
||||
- platform fees charged at purchase time
|
||||
- BYOK platform fees that are account-level or threshold-based
|
||||
- prepaid credit expirations, refunds, and adjustments
|
||||
- provisioned throughput commitments
|
||||
- fine-tuning, training, model import, and storage charges
|
||||
- gateway logging or other platform overhead that is not attributable to one prompt/response pair
|
||||
|
||||
So the decision is:
|
||||
|
||||
- near term: keep `cost_events` as the inference and usage ledger
|
||||
- next phase: add `finance_events` for non-inference financial events
|
||||
|
||||
This is a deliberate split between:
|
||||
|
||||
- usage and inference accounting
|
||||
- account-level and platform-level financial accounting
|
||||
|
||||
That separation keeps request reporting honest without forcing us to fake invoice semantics onto rows that were never request-scoped.
|
||||
|
||||
## External Motivation And Sources
|
||||
|
||||
The need for this model is not theoretical.
|
||||
It follows directly from the billing systems of providers and aggregators Paperclip needs to support.
|
||||
|
||||
### OpenRouter
|
||||
|
||||
Source URLs:
|
||||
|
||||
- https://openrouter.ai/docs/faq#credit-and-billing-systems
|
||||
- https://openrouter.ai/pricing
|
||||
|
||||
Relevant billing behavior as of March 14, 2026:
|
||||
|
||||
- OpenRouter passes through underlying inference pricing and deducts request cost from purchased credits.
|
||||
- OpenRouter charges a 5.5% fee with a $0.80 minimum when purchasing credits.
|
||||
- Crypto payments are charged a 5% fee.
|
||||
- BYOK has its own fee model after a free request threshold.
|
||||
- OpenRouter billing is aggregated at the OpenRouter account level even when the upstream provider is Anthropic, OpenAI, Google, or another provider.
|
||||
|
||||
Implication for Paperclip:
|
||||
|
||||
- request usage belongs in `cost_events`
|
||||
- credit purchases, purchase fees, BYOK fees, refunds, and expirations belong in `finance_events`
|
||||
- `biller=openrouter` must remain distinct from `provider=anthropic|openai|google|...`
|
||||
|
||||
### Cloudflare AI Gateway Unified Billing
|
||||
|
||||
Source URL:
|
||||
|
||||
- https://developers.cloudflare.com/ai-gateway/features/unified-billing/
|
||||
|
||||
Relevant billing behavior as of March 14, 2026:
|
||||
|
||||
- Unified Billing lets users call multiple upstream providers while receiving a single Cloudflare bill.
|
||||
- Usage is paid from Cloudflare-loaded credits.
|
||||
- Cloudflare supports manual top-ups and auto top-up thresholds.
|
||||
- Spend limits can stop request processing on daily, weekly, or monthly boundaries.
|
||||
- Unified Billing traffic can use Cloudflare-managed credentials rather than the user's direct provider key.
|
||||
|
||||
Implication for Paperclip:
|
||||
|
||||
- request usage needs `biller=cloudflare`
|
||||
- upstream provider still needs to be preserved separately
|
||||
- Cloudflare credit loads and related account-level events are not inference rows and should not be forced into `cost_events`
|
||||
- quota and limits reporting must support biller-level controls, not just upstream provider limits
|
||||
|
||||
### Amazon Bedrock
|
||||
|
||||
Source URL:
|
||||
|
||||
- https://aws.amazon.com/bedrock/pricing/
|
||||
|
||||
Relevant billing behavior as of March 14, 2026:
|
||||
|
||||
- Bedrock supports on-demand and batch pricing.
|
||||
- Bedrock pricing varies by region.
|
||||
- some pricing tiers add premiums or discounts relative to standard pricing
|
||||
- provisioned throughput is commitment-based rather than request-based
|
||||
- custom model import uses Custom Model Units billed per minute, with monthly storage charges
|
||||
- imported model copies are billed in 5-minute windows once active
|
||||
- customization and fine-tuning introduce training and hosted-model charges beyond normal inference
|
||||
|
||||
Implication for Paperclip:
|
||||
|
||||
- normal tokenized inference fits in `cost_events`
|
||||
- provisioned throughput, custom model unit charges, training, and storage charges require `finance_events`
|
||||
- region and pricing tier need to be first-class dimensions in the financial model
|
||||
|
||||
## Ledger Boundary
|
||||
|
||||
To keep the system coherent, the table boundary should be explicit.
|
||||
|
||||
### `cost_events`
|
||||
|
||||
Use `cost_events` for request-scoped usage and inference charges:
|
||||
|
||||
- one row per billable or usage-bearing run event
|
||||
- provider/model/biller/billingType/tokens/cost
|
||||
- optionally tied to `heartbeat_run_id`
|
||||
- supports direct APIs, subscriptions, overage, OpenRouter-routed inference, Cloudflare-routed inference, and Bedrock on-demand inference
|
||||
|
||||
### `finance_events`
|
||||
|
||||
Use `finance_events` for account-scoped or platform-scoped financial events:
|
||||
|
||||
- credit purchase
|
||||
- top-up
|
||||
- refund
|
||||
- fee
|
||||
- expiry
|
||||
- provisioned capacity
|
||||
- training
|
||||
- model import
|
||||
- storage
|
||||
- invoice adjustment
|
||||
|
||||
These rows may or may not have a related model, provider, or run id.
|
||||
Trying to force them into `cost_events` would either create fake request rows or create null-heavy rows that mean something fundamentally different from inference usage.
|
||||
|
||||
## Canonical Billing Dimensions
|
||||
|
||||
Every persisted billing event should model four separate axes:
|
||||
|
||||
1. Usage provider
|
||||
The upstream provider whose model performed the work.
|
||||
Examples: `openai`, `anthropic`, `google`.
|
||||
|
||||
2. Biller
|
||||
The system that charged for the usage.
|
||||
Examples: `openai`, `anthropic`, `openrouter`, `cursor`, `chatgpt`.
|
||||
|
||||
3. Billing type
|
||||
The pricing mode applied to the event.
|
||||
Initial canonical values:
|
||||
- `metered_api`
|
||||
- `subscription_included`
|
||||
- `subscription_overage`
|
||||
- `credits`
|
||||
- `fixed`
|
||||
- `unknown`
|
||||
|
||||
4. Measures
|
||||
Usage and billing must both be storable:
|
||||
- `input_tokens`
|
||||
- `output_tokens`
|
||||
- `cached_input_tokens`
|
||||
- `cost_cents`
|
||||
|
||||
These dimensions are independent.
|
||||
For example, an event may be:
|
||||
|
||||
- provider: `anthropic`
|
||||
- biller: `openrouter`
|
||||
- billing type: `metered_api`
|
||||
- tokens: non-zero
|
||||
- cost cents: non-zero
|
||||
|
||||
Or:
|
||||
|
||||
- provider: `anthropic`
|
||||
- biller: `anthropic`
|
||||
- billing type: `subscription_included`
|
||||
- tokens: non-zero
|
||||
- cost cents: `0`
|
||||
|
||||
## Schema Changes
|
||||
|
||||
Extend `cost_events` with:
|
||||
|
||||
- `heartbeat_run_id uuid null references heartbeat_runs.id`
|
||||
- `biller text not null default 'unknown'`
|
||||
- `billing_type text not null default 'unknown'`
|
||||
- `cached_input_tokens int not null default 0`
|
||||
|
||||
Keep `provider` as the upstream usage provider.
|
||||
Do not overload `provider` to mean biller.
|
||||
|
||||
Add a future `finance_events` table for account-level financial events with fields along these lines:
|
||||
|
||||
- `company_id`
|
||||
- `occurred_at`
|
||||
- `event_kind`
|
||||
- `direction`
|
||||
- `biller`
|
||||
- `provider nullable`
|
||||
- `execution_adapter_type nullable`
|
||||
- `pricing_tier nullable`
|
||||
- `region nullable`
|
||||
- `model nullable`
|
||||
- `quantity nullable`
|
||||
- `unit nullable`
|
||||
- `amount_cents`
|
||||
- `currency`
|
||||
- `estimated`
|
||||
- `related_cost_event_id nullable`
|
||||
- `related_heartbeat_run_id nullable`
|
||||
- `external_invoice_id nullable`
|
||||
- `metadata_json nullable`
|
||||
|
||||
Add indexes:
|
||||
|
||||
- `(company_id, biller, occurred_at)`
|
||||
- `(company_id, provider, occurred_at)`
|
||||
- `(company_id, heartbeat_run_id)` if distinct-run reporting remains common
|
||||
|
||||
## Shared Contract Changes
|
||||
|
||||
### Shared types
|
||||
|
||||
Add a shared billing type union and enrich cost types with:
|
||||
|
||||
- `heartbeatRunId`
|
||||
- `biller`
|
||||
- `billingType`
|
||||
- `cachedInputTokens`
|
||||
|
||||
Update reporting response types so the provider breakdown reflects the ledger directly rather than inferred run metadata.
|
||||
|
||||
### Validators
|
||||
|
||||
Extend `createCostEventSchema` to accept:
|
||||
|
||||
- `heartbeatRunId`
|
||||
- `biller`
|
||||
- `billingType`
|
||||
- `cachedInputTokens`
|
||||
|
||||
Defaults:
|
||||
|
||||
- `biller` defaults to `provider`
|
||||
- `billingType` defaults to `unknown`
|
||||
- `cachedInputTokens` defaults to `0`
|
||||
|
||||
## Adapter Contract Changes
|
||||
|
||||
Extend adapter execution results so they can report:
|
||||
|
||||
- `biller`
|
||||
- richer billing type values
|
||||
|
||||
Backwards compatibility:
|
||||
|
||||
- existing adapter values `api` and `subscription` are treated as legacy aliases
|
||||
- map `api -> metered_api`
|
||||
- map `subscription -> subscription_included`
|
||||
|
||||
Future adapters may emit the canonical values directly.
|
||||
|
||||
OpenRouter support will use:
|
||||
|
||||
- `provider` = upstream provider when known
|
||||
- `biller` = `openrouter`
|
||||
- `billingType` = `metered_api` unless OpenRouter later exposes another billing mode
|
||||
|
||||
Cloudflare Unified Billing support will use:
|
||||
|
||||
- `provider` = upstream provider when known
|
||||
- `biller` = `cloudflare`
|
||||
- `billingType` = `credits` or `metered_api` depending on the normalized request billing contract
|
||||
|
||||
Bedrock support will use:
|
||||
|
||||
- `provider` = upstream provider or `aws_bedrock` depending on adapter shape
|
||||
- `biller` = `aws_bedrock`
|
||||
- `billingType` = request-scoped mode for inference rows
|
||||
- `finance_events` for provisioned, training, import, and storage charges
|
||||
|
||||
## Write Path Changes
|
||||
|
||||
### Heartbeat-created events
|
||||
|
||||
When a heartbeat run produces usage or spend:
|
||||
|
||||
1. normalize adapter billing metadata
|
||||
2. write a ledger row to `cost_events`
|
||||
3. attach `heartbeat_run_id`
|
||||
4. set `provider`, `biller`, `billing_type`, token fields, and `cost_cents`
|
||||
|
||||
The write path should no longer depend on later inference from `heartbeat_runs`.
|
||||
|
||||
### Manual API-created events
|
||||
|
||||
Manual cost event creation remains supported.
|
||||
These events may have `heartbeatRunId = null`.
|
||||
|
||||
Rules:
|
||||
|
||||
- `provider` remains required
|
||||
- `biller` defaults to `provider`
|
||||
- `billingType` defaults to `unknown`
|
||||
|
||||
## Reporting Changes
|
||||
|
||||
### Server
|
||||
|
||||
Refactor reporting queries to use `cost_events` only.
|
||||
|
||||
#### `summary`
|
||||
|
||||
- sum `cost_cents`
|
||||
|
||||
#### `by-agent`
|
||||
|
||||
- sum costs and token fields from `cost_events`
|
||||
- use `count(distinct heartbeat_run_id)` filtered by billing type for run counts
|
||||
- use token sums filtered by billing type for subscription usage
|
||||
|
||||
#### `by-provider`
|
||||
|
||||
- group by `provider`, `model`
|
||||
- sum costs and token fields directly from the ledger
|
||||
- derive billing-type slices from `cost_events.billing_type`
|
||||
- never pro-rate from unrelated `heartbeat_runs`
|
||||
|
||||
#### future `by-biller`
|
||||
|
||||
- group by `biller`
|
||||
- this is the right view for invoice and subscription accountability
|
||||
|
||||
#### `window-spend`
|
||||
|
||||
- continue to use `cost_events`
|
||||
|
||||
#### project attribution
|
||||
|
||||
Keep current project attribution logic for now, but prefer `cost_events.heartbeat_run_id` as the join anchor whenever possible.
|
||||
|
||||
## UI Changes
|
||||
|
||||
### Principles
|
||||
|
||||
- Spend, usage, and quota are related but distinct
|
||||
- a missing quota fetch is not the same as “no quota”
|
||||
- provider and biller are different dimensions
|
||||
|
||||
### Immediate UI changes
|
||||
|
||||
1. Keep the current costs page structure.
|
||||
2. Make the provider cards accurate by reading only ledger-backed values.
|
||||
3. Show provider quota fetch errors explicitly instead of dropping them.
|
||||
|
||||
### Follow-up UI direction
|
||||
|
||||
The long-term board UI should expose:
|
||||
|
||||
- Spend
|
||||
Dollars by biller, provider, model, agent, project
|
||||
- Usage
|
||||
Tokens by provider, model, agent, project
|
||||
- Quotas
|
||||
Live provider or biller limits, credits, and reset windows
|
||||
- Financial events
|
||||
Credit purchases, top-ups, fees, refunds, commitments, storage, and other non-inference charges
|
||||
|
||||
## Migration Plan
|
||||
|
||||
Migration behavior:
|
||||
|
||||
- add new non-destructive columns with defaults
|
||||
- backfill existing rows:
|
||||
- `biller = provider`
|
||||
- `billing_type = 'unknown'`
|
||||
- `cached_input_tokens = 0`
|
||||
- `heartbeat_run_id = null`
|
||||
|
||||
Do **not** attempt to backfill historical provider-level subscription attribution from `heartbeat_runs`.
|
||||
That data was never stored with the required dimensions.
|
||||
|
||||
## Testing Plan
|
||||
|
||||
Add or update tests for:
|
||||
|
||||
1. heartbeat-created ledger rows persist `heartbeatRunId`, `biller`, `billingType`, and cached tokens
|
||||
2. legacy adapter billing values map correctly
|
||||
3. provider reporting uses ledger data only
|
||||
4. mixed-provider companies do not cross-attribute subscription usage
|
||||
5. zero-dollar subscription usage still appears in token reporting
|
||||
6. quota fetch failures render explicit UI state
|
||||
7. manual cost events still validate and write correctly
|
||||
8. biller reporting keeps upstream provider breakdowns separate
|
||||
9. OpenRouter-style rows can show `biller=openrouter` with non-OpenRouter upstream providers
|
||||
10. Cloudflare-style rows can show `biller=cloudflare` with preserved upstream provider identity
|
||||
11. future `finance_events` aggregation handles non-request charges without requiring a model or run id
|
||||
|
||||
## Delivery Plan
|
||||
|
||||
### Step 1
|
||||
|
||||
- land the ledger contract and query rewrite
|
||||
- make the current costs page correct
|
||||
|
||||
### Step 2
|
||||
|
||||
- add biller-oriented reporting endpoints and UI
|
||||
|
||||
### Step 3
|
||||
|
||||
- wire OpenRouter and any future aggregator adapters to the same contract
|
||||
|
||||
### Step 4
|
||||
|
||||
- add `executionAdapterType` to persisted cost reporting if adapter-level grouping becomes a product requirement
|
||||
|
||||
### Step 5
|
||||
|
||||
- introduce `finance_events`
|
||||
- add non-inference accounting endpoints
|
||||
- add UI for platform/account charges alongside inference spend and usage
|
||||
|
||||
## Non-Goals For This Change
|
||||
|
||||
- multi-currency support
|
||||
- invoice reconciliation
|
||||
- provider-specific cost estimation beyond persisted billed cost
|
||||
- replacing `heartbeat_runs` as the operational run record
|
||||
Reference in New Issue
Block a user