Gemini Provider

Set your Gemini API key:

export GEMINI_API_KEY=...

llmist will automatically discover and use Gemini.
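For example, once the key is exported no explicit provider wiring is needed. A minimal sketch, using the model identifiers shown on this page (both the flash alias and the provider-qualified gemini:gemini-2.5-flash id work with withModel):

import { LLMist } from 'llmist';

// The Gemini provider is picked up automatically from GEMINI_API_KEY.
const answer = await LLMist.createAgent()
  .withModel('gemini:gemini-2.5-flash')
  .askAndCollect('Say hello.');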

Text models:

Model                       Alias   Best For
gemini-2.5-flash            flash   Fast, cost-effective (recommended)
gemini-3-pro-preview        pro     Complex reasoning
gemini-2.0-flash-thinking   -       Step-by-step reasoning

Image models:

Model       Description
imagen-3    High-quality image generation
import { LLMist } from 'llmist';

const answer = await LLMist.createAgent()
  .withModel('flash')
  .askAndCollect('What is the speed of light?');

Gemini 2.5 and 3 models support thinking/reasoning with different native mechanisms:

Gemini 2.5 models use a numeric thinkingConfig.thinkingBudget:

Effort      Thinking Budget
"none"      0
"low"       2048
"medium"    8192
"high"      16384
"maximum"   24576

Gemini 3 Pro uses the categorical thinkingConfig.thinkingLevel and only supports "low" and "high":

Effort      Thinking Level
"none"      "low"
"low"       "low"
"medium"    "high"
"high"      "high"
"maximum"   "high"

Gemini 3 Flash uses the categorical thinkingConfig.thinkingLevel and supports the full range of levels:

Effort      Thinking Level
"none"      "minimal"
"low"       "low"
"medium"    "medium"
"high"      "high"
"maximum"   "high"
import { LLMist } from 'llmist';

const answer = await LLMist.createAgent()
  .withModel('pro') // gemini-3-pro-preview
  .withReasoning('high')
  .askAndCollect('Solve step by step: ∫ sin(x)cos(x) dx');

See the Reasoning Models guide for full details.

Gemini supports explicit context caching via caches.create(), which pre-computes KV pairs for large content and stores them server-side with a configurable TTL. This significantly reduces latency and cost for repeated context.

// Cache a large system prompt
const agent = LLMist.createAgent()
  .withModel('gemini:gemini-2.5-flash')
  .withSystem(largeCodebaseContext)
  .withCaching({ enabled: true, scope: 'system', ttl: '3600s' })
  .ask('Analyze this codebase...');
Scope             What's Cached                                     Best For
"system"          System prompt only                                Stable system prompts across many queries
"conversation"    System + all turns except latest user message     Multi-turn conversations with growing context
.withCaching({
  enabled: true,
  scope: 'conversation',    // default
  ttl: '3600s',             // Cache lifetime (default: 1 hour, min: 300s)
  minTokenThreshold: 32768, // Skip caching if below this (default: 32768)
})
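As an illustrative sketch (the projectGuidelines variable and the specific values are placeholders), the threshold can be lowered so that a system prompt smaller than the 32768-token default still gets cached:

// Illustrative: cache a system prompt that is below the default threshold.
// `projectGuidelines` is a hypothetical string holding a long prompt.
const answer = await LLMist.createAgent()
  .withModel('gemini:gemini-2.5-flash')
  .withSystem(projectGuidelines)
  .withCaching({
    enabled: true,
    scope: 'system',
    ttl: '600s',
    minTokenThreshold: 8192, // lower the 32768 default for smaller prompts
  })
  .askAndCollect('Summarize the guidelines.');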

Gemini models have excellent vision capabilities:

import { LLMist, imageFromUrl } from 'llmist';

const answer = await LLMist.createAgent()
  .withModel('flash')
  .askWithImage(
    'What objects are in this image?',
    imageFromUrl('https://example.com/photo.jpg')
  )
  .askAndCollect();

Gemini supports multiple images in a single request:

const answer = await LLMist.createAgent()
  .withModel('flash')
  .askWithImage(
    'Compare these two images',
    imageFromUrl('https://example.com/image1.jpg'),
    imageFromUrl('https://example.com/image2.jpg')
  )
  .askAndCollect();

Gemini Flash: Fast & Cheap

  • Extremely fast responses
  • Very cost-effective
  • 1M token context window
  • Great for high-volume tasks

Gemini Pro: Most Capable

  • Best reasoning capabilities
  • Higher latency
  • 1M token context window
  • Best for complex analysis
  • Shows step-by-step reasoning
  • Good for math and logic problems
  • Outputs thinking process
import { LLMist, GeminiGenerativeProvider } from 'llmist';

// Configure the Gemini provider manually instead of relying on auto-discovery
const client = new LLMist({
  autoDiscoverProviders: false,
  adapters: [
    new GeminiGenerativeProvider({
      apiKey: process.env.GEMINI_API_KEY,
    }),
  ],
});

Gemini can ground responses with real-time Google Search:

// Note: Grounding is configured at the model level
// Check Google AI Studio for grounding options

Gemini has a 1M token context window—great for:

  • Analyzing entire codebases
  • Processing long documents
  • Multi-document reasoning
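A sketch of a long-context workflow (the file path and prompt are illustrative), pairing the large window with the caching described earlier:

import { readFile } from 'node:fs/promises';
import { LLMist } from 'llmist';

// Illustrative: load a large document and pass it as the system prompt.
// Caching (see the context caching section above) avoids re-sending it on each call.
const spec = await readFile('./docs/full-api-spec.md', 'utf8');

const answer = await LLMist.createAgent()
  .withModel('gemini:gemini-2.5-flash')
  .withSystem(spec)
  .withCaching({ enabled: true, scope: 'system' })
  .askAndCollect('List every breaking change mentioned in this spec.');

Token usage and cost for such calls can then be inspected from the agent's event stream: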
for await (const event of agent.run()) {
  if (event.type === 'llm_call_complete') {
    console.log('Tokens:', event.usage);
    console.log('Cost:', event.cost);
  }
}
  1. Use Flash for speed - Fastest and cheapest option
  2. Use Pro for reasoning - Complex analysis and coding
  3. Leverage 1M context - Gemini handles very long inputs well
  4. Multi-image support - Send multiple images for comparison tasks