Gemini Provider

Set your Gemini API key:

export GEMINI_API_KEY=...

llmist will automatically discover and use Gemini.
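For example, once the key is exported no explicit provider wiring is needed. A minimal sketch, using the model identifiers shown on this page (both the flash alias and the provider-qualified gemini:gemini-2.5-flash id work with withModel):

import { LLMist } from 'llmist';

// The Gemini provider is picked up automatically from GEMINI_API_KEY.
const answer = await LLMist.createAgent()
  .withModel('gemini:gemini-2.5-flash')
  .askAndCollect('Say hello.');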

Text models:

Model                       Alias   Best For
gemini-2.5-flash            flash   Fast, cost-effective (recommended)
gemini-3-pro-preview        pro     Complex reasoning
gemini-2.0-flash-thinking   -       Step-by-step reasoning

Image models:

Model       Description
imagen-3    High-quality image generation
import { LLMist } from 'llmist';

const answer = await LLMist.createAgent()
  .withModel('flash')
  .askAndCollect('What is the speed of light?');

Gemini 2.5 and 3 models support thinking/reasoning with different native mechanisms:

Gemini 2.5 models use a numeric thinkingConfig.thinkingBudget:

Effort      Thinking Budget
"none"      0
"low"       2048
"medium"    8192
"high"      16384
"maximum"   24576

Gemini 3 Pro uses the categorical thinkingConfig.thinkingLevel and only supports "low" and "high":

Effort      Thinking Level
"none"      "low"
"low"       "low"
"medium"    "high"
"high"      "high"
"maximum"   "high"

Gemini 3 Flash uses the categorical thinkingConfig.thinkingLevel and supports the full range of levels:

Effort      Thinking Level
"none"      "minimal"
"low"       "low"
"medium"    "medium"
"high"      "high"
"maximum"   "high"
import { LLMist } from 'llmist';

const answer = await LLMist.createAgent()
  .withModel('pro') // gemini-3-pro-preview
  .withReasoning('high')
  .askAndCollect('Solve step by step: ∫ sin(x)cos(x) dx');

See the Reasoning Models guide for full details.

Gemini supports explicit context caching via caches.create(), which pre-computes KV pairs for large content and stores them server-side with a configurable TTL. This significantly reduces latency and cost for repeated context.

// Cache a large system prompt
const agent = LLMist.createAgent()
  .withModel('gemini:gemini-2.5-flash')
  .withSystem(largeCodebaseContext)
  .withCaching({ enabled: true, scope: 'system', ttl: '3600s' })
  .ask('Analyze this codebase...');
Scope             What's Cached                                     Best For
"system"          System prompt only                                Stable system prompts across many queries
"conversation"    System + all turns except latest user message     Multi-turn conversations with growing context
.withCaching({
  enabled: true,
  scope: 'conversation',    // default
  ttl: '3600s',             // Cache lifetime (default: 1 hour, min: 300s)
  minTokenThreshold: 32768, // Skip caching if below this (default: 32768)
})
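As an illustrative sketch (the projectGuidelines variable and the specific values are placeholders), the threshold can be lowered so that a system prompt smaller than the 32768-token default still gets cached:

// Illustrative: cache a system prompt that is below the default threshold.
// `projectGuidelines` is a hypothetical string holding a long prompt.
const answer = await LLMist.createAgent()
  .withModel('gemini:gemini-2.5-flash')
  .withSystem(projectGuidelines)
  .withCaching({
    enabled: true,
    scope: 'system',
    ttl: '600s',
    minTokenThreshold: 8192, // lower the 32768 default for smaller prompts
  })
  .askAndCollect('Summarize the guidelines.');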

Gemini models have excellent vision capabilities:

import { LLMist, imageFromUrl } from 'llmist';

const answer = await LLMist.createAgent()
  .withModel('flash')
  .askWithImage(
    'What objects are in this image?',
    imageFromUrl('https://example.com/photo.jpg')
  )
  .askAndCollect();

Gemini supports multiple images in a single request:

const answer = await LLMist.createAgent()
  .withModel('flash')
  .askWithImage(
    'Compare these two images',
    imageFromUrl('https://example.com/image1.jpg'),
    imageFromUrl('https://example.com/image2.jpg')
  )
  .askAndCollect();

Gemini Flash: Fast & Cheap

  • Extremely fast responses
  • Very cost-effective
  • 1M token context window
  • Great for high-volume tasks

Gemini Pro: Most Capable

  • Best reasoning capabilities
  • Higher latency
  • 1M token context window
  • Best for complex analysis
  • Shows step-by-step reasoning
  • Good for math and logic problems
  • Outputs thinking process
import { LLMist, GeminiGenerativeProvider } from 'llmist';

// Configure the Gemini provider manually instead of relying on auto-discovery
const client = new LLMist({
  autoDiscoverProviders: false,
  adapters: [
    new GeminiGenerativeProvider({
      apiKey: process.env.GEMINI_API_KEY,
    }),
  ],
});

Gemini can ground responses with real-time Google Search:

// Note: Grounding is configured at the model level
// Check Google AI Studio for grounding options

Gemini has a 1M token context window—great for:

  • Analyzing entire codebases
  • Processing long documents
  • Multi-document reasoning
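A sketch of a long-context workflow (the file path and prompt are illustrative), pairing the large window with the caching described earlier:

import { readFile } from 'node:fs/promises';
import { LLMist } from 'llmist';

// Illustrative: load a large document and pass it as the system prompt.
// Caching (see the context caching section above) avoids re-sending it on each call.
const spec = await readFile('./docs/full-api-spec.md', 'utf8');

const answer = await LLMist.createAgent()
  .withModel('gemini:gemini-2.5-flash')
  .withSystem(spec)
  .withCaching({ enabled: true, scope: 'system' })
  .askAndCollect('List every breaking change mentioned in this spec.');

Token usage and cost for such calls can then be inspected from the agent's event stream: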
for await (const event of agent.run()) {
  if (event.type === 'llm_call_complete') {
    console.log('Tokens:', event.usage);
    console.log('Cost:', event.cost);
  }
}
  1. Use Flash for speed - Fastest and cheapest option
  2. Use Pro for reasoning - Complex analysis and coding
  3. Leverage 1M context - Gemini handles very long inputs well
  4. Multi-image support - Send multiple images for comparison tasks