Cost Tracking

llmist provides comprehensive cost tracking through the ExecutionTree and ModelRegistry APIs.

Track costs after an agent run:

import { LLMist } from 'llmist';

const answer = await LLMist.createAgent()
  .withModel('sonnet')
  .withGadgets(FloppyDisk)
  .withHooks({
    observers: {
      onAgentComplete: (ctx) => {
        const tree = ctx.tree;
        console.log(`Total cost: $${tree.getTotalCost().toFixed(4)}`);
        console.log(`Total tokens:`, tree.getTotalTokens());
      },
    },
  })
  .askAndCollect('How many floppies for DOOM.ZIP at 50MB?');

Output:

Total cost: $0.0032
Total tokens: { input: 850, output: 120, cached: 0 }

Estimate costs before making API calls:

const client = new LLMist();
const registry = client.modelRegistry;

// Estimate for known token counts
const cost = registry.estimateCost('gpt-5', 10_000, 2_000);
console.log(`Estimated: $${cost.totalCost.toFixed(4)}`);

// Estimate from messages
const messages = [
  { role: 'system', content: 'You are helpful' },
  { role: 'user', content: 'Explain quantum computing in detail...' },
];
const inputTokens = await client.countTokens('openai:gpt-5', messages);
const estimatedCost = registry.estimateCost('gpt-5', inputTokens, 1000);
console.log(`Estimated: $${estimatedCost.totalCost.toFixed(4)}`);

// The estimate breaks down into input, output, and combined cost
const breakdown = registry.estimateCost('sonnet', 10_000, 2_000);
console.log(breakdown.inputCost); // Cost for input tokens
console.log(breakdown.outputCost); // Cost for output tokens
console.log(breakdown.totalCost); // Combined total
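
To act on an estimate before spending anything, you can gate the call on a budget. A minimal sketch using only registry.estimateCost from above (the budget threshold is illustrative, not an llmist setting):

const BUDGET_USD = 0.05; // illustrative threshold, not part of llmist

const estimate = registry.estimateCost('gpt-5', inputTokens, 1_000);
if (estimate.totalCost > BUDGET_USD) {
  throw new Error(
    `Estimated cost $${estimate.totalCost.toFixed(4)} exceeds budget $${BUDGET_USD}`,
  );
}
// Cheap enough, proceed with the real call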

The ExecutionTree tracks all costs during agent execution:

const result = await LLMist.createAgent()
  .withModel('sonnet')
  .withHooks({
    observers: {
      onLLMCallComplete: (ctx) => {
        const runningCost = ctx.tree.getTotalCost();
        console.log(`Running total: $${runningCost.toFixed(4)}`);
      },
      onAgentComplete: (ctx) => {
        // The tree lives on the hook context
        const tokens = ctx.tree.getTotalTokens();
        console.log(tokens.input); // Total input tokens
        console.log(tokens.output); // Total output tokens
        console.log(tokens.cached); // Cached tokens (if supported by provider)
      },
    },
  })
  .askAndCollect('Research task');

Track costs for nested agents (subagents):

// Get costs for a specific subtree (here, a gadget call that spawned a subagent);
// gadgetNodeId is that gadget's node id in the ExecutionTree
const subtreeCost = tree.getSubtreeCost(gadgetNodeId);
const subtreeTokens = tree.getSubtreeTokens(gadgetNodeId);
console.log(`Subagent cost: $${subtreeCost.toFixed(4)}`);
console.log('Subagent tokens:', subtreeTokens);

Example with BrowseWeb subagent:

ExecutionTree
├── LLM Call #1 (sonnet, 1,200 tokens, $0.003)
│   ├── Gadget: ReadFile
│   └── Gadget: BrowseWeb (subagent)
│       ├── LLM Call #1 (haiku, 800 tokens, $0.001)
│       └── LLM Call #2 (haiku, 600 tokens, $0.001)
└── LLM Call #2 (sonnet, 900 tokens, $0.002)

Total: $0.007
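
The totals are plain sums over the tree. Assuming browseWebNodeId is the node id of the BrowseWeb gadget above (a hypothetical variable, named here only for illustration):

// Subagent subtree: just the two haiku calls
// tree.getSubtreeCost(browseWebNodeId) -> 0.001 + 0.001 = $0.002

// Whole run: both sonnet calls plus the subagent subtree
// tree.getTotalCost() -> 0.003 + 0.002 + 0.002 = $0.007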

Look up model pricing:

const client = new LLMist();
const spec = client.modelRegistry.getModelSpec('gpt-5');
console.log(spec.pricing.input); // $ per 1M input tokens
console.log(spec.pricing.output); // $ per 1M output tokens

// Find the cheapest model for a given token workload
const cheapest = client.modelRegistry.getCheapestModel(10_000, 2_000);
console.log(`Cheapest: ${cheapest.modelId}`);
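
Because pricing is expressed in dollars per 1M tokens, you can reproduce an estimate by hand. A quick sketch of the arithmetic, using the spec fetched above:

const inputTokens = 10_000;
const outputTokens = 2_000;

// Pricing is $ per 1M tokens, so scale token counts by 1e6
const inputCost = (inputTokens / 1_000_000) * spec.pricing.input;
const outputCost = (outputTokens / 1_000_000) * spec.pricing.output;
const totalCost = inputCost + outputCost; // should match registry.estimateCost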

Switch to a cheaper model mid-run when the accumulated cost crosses a threshold:

const agent = LLMist.createAgent()
  .withModel('opus')
  .withHooks({
    controllers: {
      beforeLLMCall: async (ctx) => {
        const currentCost = ctx.tree.getTotalCost();
        if (currentCost > 0.10) {
          console.warn('⚠️ Cost exceeds $0.10, switching to cheaper model');
          return {
            action: 'proceed',
            modifiedOptions: { model: 'haiku' },
          };
        }
        return { action: 'proceed' };
      },
    },
  });
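
Run the agent as usual; the controller inspects the running total before every LLM call:

const answer = await agent.askAndCollect('Long-running research task');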

For quick visibility during development, use the built-in token-tracking preset, which logs cumulative token usage after each LLM call:

import { HookPresets } from 'llmist';

await LLMist.createAgent()
  .withModel('sonnet')
  .withHooks(HookPresets.tokenTracking())
  .askAndCollect('Your prompt');

To feed cost data into your own telemetry, record it when the agent completes (analytics below stands in for your analytics client):

await LLMist.createAgent()
  .withModel('sonnet')
  .withHooks({
    observers: {
      onAgentComplete: (ctx) => {
        const cost = ctx.tree.getTotalCost();
        const tokens = ctx.tree.getTotalTokens();
        // Log to your analytics system
        analytics.track('agent_complete', {
          cost,
          tokens,
          model: ctx.options.model,
          iterations: ctx.iteration,
        });
      },
    },
  })
  .askAndCollect('Your prompt');

Best Practices

1. Use model shortcuts strategically (see the sketch after this list)
   • haiku for simple tasks
   • sonnet for complex reasoning
   • opus only when needed
2. Leverage caching (Anthropic)
   • System prompts are cached automatically
   • Repeated context reduces costs
3. Monitor with hooks
   • Use HookPresets.tokenTracking() in development
   • Set cost alerts in production
4. Batch operations
   • Combine related queries into single prompts
   • Use subagents for parallel work
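
A minimal sketch of the first tip, routing tasks to a model shortcut by difficulty (the Complexity type and pickModel helper are illustrative, not part of llmist):

type Complexity = 'simple' | 'complex' | 'frontier';

// Hypothetical helper: map task difficulty to a model shortcut
function pickModel(complexity: Complexity): 'haiku' | 'sonnet' | 'opus' {
  switch (complexity) {
    case 'simple':
      return 'haiku'; // cheap and fast
    case 'complex':
      return 'sonnet'; // stronger reasoning
    case 'frontier':
      return 'opus'; // only when needed
  }
}

const answer = await LLMist.createAgent()
  .withModel(pickModel('simple'))
  .askAndCollect('Summarize this changelog in two sentences.');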