Welcome to the Gemini era
The Gemini ecosystem represents Google’s most capable AI.
Our Gemini models are built from the ground up for multimodality — reasoning seamlessly across text, images, audio, video, and code.
Gemini represents a significant leap forward in how AI can help improve our daily lives.
Introducing Gemini 1.5
Our next-generation model
Gemini 1.5 delivers dramatically enhanced performance with a more efficient architecture. The first model we’ve released for early testing, Gemini 1.5 Pro, introduces a breakthrough experimental feature in long-context understanding.
Read the technical paper | Read the blog post
Reasoning about vast amounts of information
Gemini 1.5 Pro can analyze and summarize the 402-page transcript of Apollo 11’s mission to the moon.
Better understanding across modalities
Gemini 1.5 Pro can perform highly sophisticated understanding and reasoning tasks across different modalities, such as analyzing a silent Buster Keaton movie.
https://youtube.com/watch?v=wa0MT8OwHuk
Problem-solving with longer blocks of code
Gemini 1.5 Pro can reason across 100,000 lines of code, giving helpful solutions, modifications, and explanations.
https://youtube.com/watch?v=SSnsmqIj1MI
Gemini comes in three model sizes
Ultra
1.0
Our most capable and largest model for highly complex tasks.
Pro
1.0 | 1.5
Our best model for scaling across a wide range of tasks.
Nano
1.0
Our most efficient model for on-device tasks.
Meet the first version of Gemini, our most capable AI model.
MMLU benchmark scores:
- Gemini 1.0 Ultra: 90.0% (CoT@32*)
- Human expert: 89.8%
- Previous SOTA (GPT-4): 86.4% (5-shot*, reported)
*Note that evaluations of previous SOTA models use different prompting techniques.
Gemini is the first model to outperform human experts on MMLU (Massive Multitask Language Understanding), one of the most popular methods to test the knowledge and problem-solving abilities of AI models.
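The CoT@32 and maj1@32 notations in the scores above refer to sampling many chain-of-thought completions and scoring a consensus answer rather than a single greedy output. A simplified sketch of that idea, with the 32 model samples mocked as plain strings (the real evaluation protocol is described in the technical report):

```python
from collections import Counter

def majority_vote(answers):
    """Pick the most common final answer among sampled completions.

    This is the core of maj1@32-style scoring: sample N chain-of-thought
    completions, extract each final answer, and score only the consensus.
    """
    return Counter(answers).most_common(1)[0][0]

# 32 mocked samples standing in for model completions; most agree on "42".
samples = ["42"] * 20 + ["41"] * 7 + ["40"] * 5
print(majority_vote(samples))  # → 42
```

Sampling-and-voting trades extra inference compute for accuracy, which is why a CoT@32 score is not directly comparable to a 5-shot single-sample score, as the footnote above cautions.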
Gemini 1.0 Ultra surpasses state-of-the-art performance on a range of benchmarks including text and coding.
TEXT
| Capability | Benchmark (higher is better) | Description | Gemini 1.0 Ultra | GPT-4 (API numbers calculated where reported numbers were missing) |
|---|---|---|---|---|
| General | MMLU | Representation of questions in 57 subjects (incl. STEM, humanities, and others) | 90.0% CoT@32* | 86.4% 5-shot** (reported) |
| Reasoning | Big-Bench Hard | Diverse set of challenging tasks requiring multi-step reasoning | 83.6% 3-shot | 83.1% 3-shot (API) |
| Reasoning | DROP | Reading comprehension (F1 score) | 82.4 variable shots | 80.9 3-shot (reported) |
| Reasoning | HellaSwag | Commonsense reasoning for everyday tasks | 87.8% 10-shot* | 95.3% 10-shot* (reported) |
| Math | GSM8K | Basic arithmetic manipulations (incl. Grade School math problems) | 94.4% maj1@32 | 92.0% 5-shot CoT (reported) |
| Math | MATH | Challenging math problems (incl. algebra, geometry, pre-calculus, and others) | 53.2% 4-shot | 52.9% 4-shot (API) |
| Code | HumanEval | Python code generation | 74.4% 0-shot (IT)* | 67.0% 0-shot* (reported) |
| Code | Natural2Code | Python code generation; new held-out HumanEval-like dataset, not leaked on the web | 74.9% 0-shot | 73.9% 0-shot (API) |
*See the technical report for details on performance with other methodologies
**GPT-4 scores 87.29% with CoT@32—see the technical report for full comparison
Our Gemini 1.0 models surpass state-of-the-art performance on a range of multimodal benchmarks.
MULTIMODAL
| Capability | Benchmark | Description (higher is better unless otherwise noted) | Gemini | GPT-4V (previous SOTA model listed when capability is not supported in GPT-4V) |
|---|---|---|---|---|
| Image | MMMU | Multi-discipline college-level reasoning problems | 59.4% 0-shot pass@1, Gemini 1.0 Ultra (pixel only*) | 56.8% 0-shot pass@1, GPT-4V |
| Image | VQAv2 | Natural image understanding | 77.8% 0-shot, Gemini 1.0 Ultra (pixel only*) | 77.2% 0-shot, GPT-4V |
| Image | TextVQA | OCR on natural images | 82.3% 0-shot, Gemini 1.0 Ultra (pixel only*) | 78% 0-shot, GPT-4V |
| Image | DocVQA | Document understanding | 90.9% 0-shot, Gemini 1.0 Ultra (pixel only*) | 88.4% 0-shot, GPT-4V (pixel only) |
| Image | Infographic VQA | Infographic understanding | 80.3% 0-shot, Gemini 1.0 Ultra (pixel only*) | 75.1% 0-shot, GPT-4V (pixel only) |
| Image | MathVista | Mathematical reasoning in visual contexts | 53% 0-shot, Gemini 1.0 Ultra (pixel only*) | 49.9% 0-shot, GPT-4V |
| Video | VATEX | English video captioning (CIDEr) | 62.7 4-shot, Gemini 1.0 Ultra | 56 4-shot, DeepMind Flamingo |
| Video | Perception Test MCQA | Video question answering | 54.7% 0-shot, Gemini 1.0 Ultra | 46.3% 0-shot, SeViLA |
| Audio | CoVoST 2 (21 languages) | Automatic speech translation (BLEU score) | 40.1, Gemini 1.0 Pro | 29.1, Whisper v2 |
| Audio | FLEURS (62 languages) | Automatic speech recognition (word error rate; lower is better) | 7.6%, Gemini 1.0 Pro | 17.6%, Whisper v3 |
*Gemini image benchmarks are pixel only, with no assistance from OCR systems.
Read the technical report
Anything to anything
Gemini models are natively multimodal, which gives you the potential to transform any type of input into any type of output.
Gemini models can generate code based on different inputs you give it.
Could Gemini help make a demo based on this video?
Gemini
I see a murmuration of starlings, so I coded a flocking simulation.

```javascript
// p5.js flocking simulation (boids)
class Boid {
  constructor(x, y) {
    this.pos = new p5.Vector(x, y);   // position
    this.vel = p5.Vector.random2D();  // random initial heading
    this.vel.setMag(random(2, 4));    // initial speed between 2 and 4
    this.acc = new p5.Vector();       // acceleration accumulator
    this.maxForce = 0.2;              // steering force limit
    this.maxSpeed = 4;                // speed limit
  }
}
```
The potential of Gemini
Learn about what our Gemini models can do from some of the people who built them.
TAYLOR APPLEBAUM AND SEBASTIAN NOWOZIN
Unlocking insights in scientific literature
RÉMI LEBLOND AND GABRIELA SURITA
Excelling at competitive programming
ADRIÀ RECASENS
Processing and understanding raw audio signal end-to-end
SAM CHEUNG
Explaining reasoning in math and physics
PALASH NANDY
Reasoning about user intent to generate bespoke experiences
Building and deploying
Gemini responsibly
We’ve built our Gemini models responsibly from the start, incorporating safeguards and working with partners to make them safer and more inclusive.
Try Gemini Advanced with our most capable AI model
With 1.0 Ultra, Gemini Advanced is far more capable at coding, reasoning, and creative collaboration.
Try for 2 months at no charge | Learn more
Build with Gemini
Integrate Gemini models into your applications with Google AI Studio and Google Cloud Vertex AI.
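As a minimal sketch of what that integration can look like, here is a hedged example using the `google-generativeai` Python SDK from Google AI Studio. The model name (`gemini-pro`), the `GOOGLE_API_KEY` environment variable, and the `build_prompt` helper are illustrative assumptions; consult the official Google AI Studio and Vertex AI documentation for the current interface:

```python
import os

def build_prompt(task: str, content: str) -> str:
    """Combine a task instruction with user content into one prompt string."""
    return f"{task}\n\n{content}"

def ask_gemini(prompt: str, model_name: str = "gemini-pro") -> str:
    """Send a prompt to a Gemini model and return the text reply.

    Requires `pip install google-generativeai` and a GOOGLE_API_KEY set
    in the environment; the import is deferred so the helper above can
    be used without the SDK installed.
    """
    import google.generativeai as genai  # assumed SDK, not stdlib
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel(model_name)
    return model.generate_content(prompt).text

# Example usage (performs a network call, so it is left commented out):
# reply = ask_gemini(build_prompt("Summarize in one sentence:", article_text))
```

Vertex AI exposes the same models through Google Cloud with its own client libraries and authentication flow, so the call site would differ even though the prompt construction stays the same.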