
Retry Strategies

LLM APIs can fail transiently due to rate limits, server overload, or network issues. llmist automatically retries these failures with exponential backoff and jitter to maximize reliability.

Retry is enabled by default with sensible settings:

```typescript
// Default behavior - automatic retry on transient errors
const agent = await LLMist.createAgent()
  .withModel('sonnet')
  .ask('...');
```

To customize:

```typescript
// Custom retry configuration
.withRetry({
  retries: 5,        // Max 5 attempts
  minTimeout: 2000,  // Start with 2s delay
  maxTimeout: 60000, // Cap at 60s
  onRetry: (error, attempt) => {
    console.log(`Retry ${attempt}: ${error.message}`);
  },
})

// Disable retry
.withRetry({ enabled: false })
```

The `RetryConfig` interface supports these options:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `enabled` | boolean | `true` | Enable/disable retry |
| `retries` | number | `3` | Maximum retry attempts |
| `minTimeout` | number | `1000` | Initial delay in ms |
| `maxTimeout` | number | `30000` | Maximum delay in ms |
| `factor` | number | `2` | Exponential backoff factor |
| `randomize` | boolean | `true` | Add jitter to prevent thundering herd |
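For reference, here is a config object that spells out every option from the table at its documented default, so passing it is equivalent to omitting `.withRetry()` altogether:

```typescript
// Every RetryConfig option pinned to its documented default.
const retryConfig = {
  enabled: true,     // retry transient errors
  retries: 3,        // up to 3 retry attempts
  minTimeout: 1000,  // first delay: 1s
  maxTimeout: 30000, // never wait longer than 30s
  factor: 2,         // double the delay each attempt
  randomize: true,   // add jitter to avoid thundering herd
};
```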
Two callbacks let you observe the retry lifecycle:

```typescript
.withRetry({
  // Called before each retry
  onRetry: (error: Error, attempt: number) => {
    metrics.increment('llm.retry', { attempt });
    console.warn(`Retry ${attempt}: ${error.message}`);
  },
  // Called when all retries are exhausted
  onRetriesExhausted: (error: Error, attempts: number) => {
    alerting.notify(`LLM failed after ${attempts} attempts`);
  },
})
```

Override the default error classification:

```typescript
.withRetry({
  shouldRetry: (error: Error) => {
    // Only retry rate limits, not server errors
    return error.message.includes('429');
  },
})
```

By default, llmist retries these errors automatically:

| Error Type | Examples |
| --- | --- |
| Rate Limits | 429, "rate limit exceeded", "rate_limit" |
| Server Errors | 500, 502, 503, 504, "internal server error" |
| Timeouts | "timeout", "etimedout", "timed out" |
| Connection Issues | "econnreset", "econnrefused", "enotfound" |
| Provider Overload | "overloaded", "capacity" |

These errors are never retried, since retrying cannot fix them:

| Error Type | Examples |
| --- | --- |
| Authentication | 401, 403, "unauthorized", "forbidden" |
| Bad Request | 400, "invalid" |
| Not Found | 404 |
| Content Policy | "content policy", "safety" |
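The tables above can be approximated by a simple substring check. This is an illustrative sketch of the classification logic, not llmist's actual implementation (`isTransient` and `RETRYABLE_PATTERNS` are hypothetical names):

```typescript
// Substring patterns mirroring the "retried automatically" table above.
// Illustrative sketch only - llmist's built-in classifier may differ.
const RETRYABLE_PATTERNS = [
  '429', 'rate limit', 'rate_limit',                   // rate limits
  '500', '502', '503', '504', 'internal server error', // server errors
  'timeout', 'etimedout', 'timed out',                 // timeouts
  'econnreset', 'econnrefused', 'enotfound',           // connection issues
  'overloaded', 'capacity',                            // provider overload
];

function isTransient(error: Error): boolean {
  const msg = error.message.toLowerCase();
  return RETRYABLE_PATTERNS.some((p) => msg.includes(p));
}

isTransient(new Error('ETIMEDOUT'));     // true - retried
isTransient(new Error('403 Forbidden')); // false - fails immediately
```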

The delay between retries follows exponential backoff:

delay = min(minTimeout * factor^(attempt - 1), maxTimeout)

With randomize: true (default), jitter is added:

delay = delay * random(1, 1.5)

Example with defaults:

  • Attempt 1: ~1-1.5s
  • Attempt 2: ~2-3s
  • Attempt 3: ~4-6s
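Numbering attempts from 1 as in the example above, the base delay (before jitter) can be sketched as follows. This is an illustration of the formula, not llmist's internal code:

```typescript
// Base backoff delay before jitter is applied - an illustrative
// sketch of the formula above, not llmist's internal implementation.
function backoffDelay(
  attempt: number,
  minTimeout = 1000,
  maxTimeout = 30000,
  factor = 2,
): number {
  return Math.min(minTimeout * Math.pow(factor, attempt - 1), maxTimeout);
}

backoffDelay(1); // 1000 ms
backoffDelay(2); // 2000 ms
backoffDelay(3); // 4000 ms
backoffDelay(9); // 30000 ms - capped by maxTimeout
```

With `randomize: true`, the delay actually slept is this base value multiplied by the jitter factor.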
For production workloads that should ride out longer outages, raise the retry budget and wire up logging and alerting:

```typescript
const agent = LLMist.createAgent()
  .withRetry({
    retries: 5,
    minTimeout: 2000,
    maxTimeout: 120000, // Up to 2 minutes between retries
    onRetry: (error, attempt) => {
      logger.warn(`LLM retry ${attempt}/5`, { error: error.message });
    },
    onRetriesExhausted: (error, attempts) => {
      logger.error(`LLM failed permanently after ${attempts} attempts`);
      alerting.trigger('llm_failure');
    },
  })
  .ask('...');
```
For latency-sensitive paths, fail fast instead:

```typescript
const agent = LLMist.createAgent()
  .withRetry({
    retries: 1,       // Only one retry
    minTimeout: 500,  // Short delay
    maxTimeout: 2000, // Cap quickly
  })
  .ask('...');
```
To turn retries off entirely:

```typescript
// For testing or when you handle retries externally
const agent = LLMist.createAgent()
  .withRetry({ enabled: false })
  .ask('...');
```
To retry only specific error classes, supply a custom `shouldRetry`:

```typescript
const agent = LLMist.createAgent()
  .withRetry({
    shouldRetry: (error) => {
      // Only retry rate limits and overload
      const msg = error.message.toLowerCase();
      return msg.includes('429') ||
        msg.includes('rate limit') ||
        msg.includes('overloaded');
    },
  })
  .ask('...');
```
To feed retry events into a metrics pipeline:

```typescript
const agent = LLMist.createAgent()
  .withRetry({
    onRetry: (error, attempt) => {
      statsd.increment('llm.retries', {
        attempt: String(attempt),
        error_type: classifyError(error),
      });
    },
    onRetriesExhausted: (error, attempts) => {
      statsd.increment('llm.failures', {
        total_attempts: String(attempts),
      });
    },
  })
  .ask('...');
```

llmist also provides formatLLMError() to clean up verbose API error messages:

```typescript
import { formatLLMError } from 'llmist';

try {
  await agent.askAndCollect('...');
} catch (error) {
  // Instead of: "{\"error\":{\"message\":\"Rate limit exceeded...\"}}"
  // You get: "Rate limit exceeded (429) - retry after a few seconds"
  console.error(formatLLMError(error));
}
```