Welcome to the Gemini era
The Gemini ecosystem represents Google’s most capable AI.
Our Gemini models are built from the ground up for multimodality — reasoning seamlessly across text, images, audio, video, and code.
Gemini represents a significant leap forward in how AI can help improve our daily lives.
Introducing Gemini 1.5
Our next-generation model
Gemini 1.5 delivers dramatically enhanced performance with a more efficient architecture. The first model we’ve released for early testing, Gemini 1.5 Pro, introduces a breakthrough experimental feature in long-context understanding.
Read the technical paper | Read the blog post
Reasoning about vast amounts of information
Gemini 1.5 Pro can analyze and summarize the 402-page transcript of Apollo 11’s mission to the moon.
Better understanding across modalities
Gemini 1.5 Pro can perform highly sophisticated understanding and reasoning tasks across different modalities, such as analyzing a silent Buster Keaton movie.
https://youtube.com/watch?v=wa0MT8OwHuk
Problem-solving with longer blocks of code
Gemini 1.5 Pro can reason across 100,000 lines of code, giving helpful solutions, modifications, and explanations.
https://youtube.com/watch?v=SSnsmqIj1MI
Gemini comes in three model sizes
Ultra
1.0
Our most capable and largest model for highly complex tasks.
Pro
1.0 | 1.5
Our best model for scaling across a wide range of tasks.
Nano
1.0
Our most efficient model for on-device tasks.
Meet the first version of Gemini, our most capable AI model.
MMLU benchmark scores:
- Gemini 1.0 Ultra: 90.0% (CoT@32*)
- Human expert: 89.8%
- Previous SOTA (GPT-4): 86.4% (5-shot*, reported)
*Note that evaluations of previous SOTA models use different prompting techniques.
Gemini is the first model to outperform human experts on MMLU (Massive Multitask Language Understanding), one of the most popular methods to test the knowledge and problem-solving abilities of AI models.
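The CoT@32 and maj1@32 notations in the scores above refer to sampling many chain-of-thought completions and scoring a consensus answer rather than a single greedy output. A simplified sketch of that idea, with the 32 model samples mocked as plain strings (the real evaluation protocol is described in the technical report):

```python
from collections import Counter

def majority_vote(answers):
    """Pick the most common final answer among sampled completions.

    This is the core of maj1@32-style scoring: sample N chain-of-thought
    completions, extract each final answer, and score only the consensus.
    """
    return Counter(answers).most_common(1)[0][0]

# 32 mocked samples standing in for model completions; most agree on "42".
samples = ["42"] * 20 + ["41"] * 7 + ["40"] * 5
print(majority_vote(samples))  # → 42
```

Sampling-and-voting trades extra inference compute for accuracy, which is why a CoT@32 score is not directly comparable to a 5-shot single-sample score, as the footnote above cautions.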
Gemini 1.0 Ultra surpasses state-of-the-art performance on a range of benchmarks including text and coding.
TEXT
| Capability | Benchmark (higher is better) | Description | Gemini 1.0 Ultra | GPT-4 (API numbers calculated where reported numbers were missing) |
|---|---|---|---|---|
| General | MMLU | Representation of questions in 57 subjects (incl. STEM, humanities, and others) | 90.0% CoT@32* | 86.4% 5-shot** (reported) |
| Reasoning | Big-Bench Hard | Diverse set of challenging tasks requiring multi-step reasoning | 83.6% 3-shot | 83.1% 3-shot (API) |
| Reasoning | DROP | Reading comprehension (F1 score) | 82.4 variable shots | 80.9 3-shot (reported) |
| Reasoning | HellaSwag | Commonsense reasoning for everyday tasks | 87.8% 10-shot* | 95.3% 10-shot* (reported) |
| Math | GSM8K | Basic arithmetic manipulations (incl. Grade School math problems) | 94.4% maj1@32 | 92.0% 5-shot CoT (reported) |
| Math | MATH | Challenging math problems (incl. algebra, geometry, pre-calculus, and others) | 53.2% 4-shot | 52.9% 4-shot (API) |
| Code | HumanEval | Python code generation | 74.4% 0-shot (IT)* | 67.0% 0-shot* (reported) |
| Code | Natural2Code | Python code generation; new held-out HumanEval-like dataset, not leaked on the web | 74.9% 0-shot | 73.9% 0-shot (API) |
*See the technical report for details on performance with other methodologies
**GPT-4 scores 87.29% with CoT@32—see the technical report for full comparison
Our Gemini 1.0 models surpass state-of-the-art performance on a range of multimodal benchmarks.
MULTIMODAL
| Capability | Benchmark | Description (higher is better unless otherwise noted) | Gemini | GPT-4V (previous SOTA model listed when capability is not supported in GPT-4V) |
|---|---|---|---|---|
| Image | MMMU | Multi-discipline college-level reasoning problems | 59.4% 0-shot pass@1, Gemini 1.0 Ultra (pixel only*) | 56.8% 0-shot pass@1, GPT-4V |
| Image | VQAv2 | Natural image understanding | 77.8% 0-shot, Gemini 1.0 Ultra (pixel only*) | 77.2% 0-shot, GPT-4V |
| Image | TextVQA | OCR on natural images | 82.3% 0-shot, Gemini 1.0 Ultra (pixel only*) | 78% 0-shot, GPT-4V |
| Image | DocVQA | Document understanding | 90.9% 0-shot, Gemini 1.0 Ultra (pixel only*) | 88.4% 0-shot, GPT-4V (pixel only) |
| Image | Infographic VQA | Infographic understanding | 80.3% 0-shot, Gemini 1.0 Ultra (pixel only*) | 75.1% 0-shot, GPT-4V (pixel only) |
| Image | MathVista | Mathematical reasoning in visual contexts | 53% 0-shot, Gemini 1.0 Ultra (pixel only*) | 49.9% 0-shot, GPT-4V |
| Video | VATEX | English video captioning (CIDEr) | 62.7 4-shot, Gemini 1.0 Ultra | 56 4-shot, DeepMind Flamingo |
| Video | Perception Test MCQA | Video question answering | 54.7% 0-shot, Gemini 1.0 Ultra | 46.3% 0-shot, SeViLA |
| Audio | CoVoST 2 (21 languages) | Automatic speech translation (BLEU score) | 40.1, Gemini 1.0 Pro | 29.1, Whisper v2 |
| Audio | FLEURS (62 languages) | Automatic speech recognition (word error rate; lower is better) | 7.6%, Gemini 1.0 Pro | 17.6%, Whisper v3 |
*Gemini image benchmarks are pixel only, with no assistance from OCR systems.
Read the technical report
Anything to anything
Gemini models are natively multimodal, which gives you the potential to transform any type of input into any type of output.
Gemini models can generate code based on different inputs you give it.
Could Gemini help make a demo based on this video?
Gemini
I see a murmuration of starlings, so I coded a flocking simulation.

```javascript
// p5.js flocking simulation (boids)
class Boid {
  constructor(x, y) {
    this.pos = new p5.Vector(x, y);   // position
    this.vel = p5.Vector.random2D();  // random initial heading
    this.vel.setMag(random(2, 4));    // initial speed between 2 and 4
    this.acc = new p5.Vector();       // acceleration accumulator
    this.maxForce = 0.2;              // steering force limit
    this.maxSpeed = 4;                // speed limit
  }
}
```
The potential of Gemini
Learn about what our Gemini models can do from some of the people who built them.
TAYLOR APPLEBAUM AND SEBASTIAN NOWOZIN
Unlocking insights in scientific literature
RÉMI LEBLOND AND GABRIELA SURITA
Excelling at competitive programming
ADRIÀ RECASENS
Processing and understanding raw audio signal end-to-end
SAM CHEUNG
Explaining reasoning in math and physics
PALASH NANDY
Reasoning about user intent to generate bespoke experiences
Building and deploying
Gemini responsibly
We’ve built our Gemini models responsibly from the start, incorporating safeguards and working with partners to make them safer and more inclusive.
Try Gemini Advanced with our most capable AI model
With 1.0 Ultra, Gemini Advanced is far more capable at coding, reasoning, and creative collaboration.
Try for 2 months at no charge | Learn more
Build with Gemini
Integrate Gemini models into your applications with Google AI Studio and Google Cloud Vertex AI.
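As a minimal sketch of what that integration can look like, here is a hedged example using the `google-generativeai` Python SDK from Google AI Studio. The model name (`gemini-pro`), the `GOOGLE_API_KEY` environment variable, and the `build_prompt` helper are illustrative assumptions; consult the official Google AI Studio and Vertex AI documentation for the current interface:

```python
import os

def build_prompt(task: str, content: str) -> str:
    """Combine a task instruction with user content into one prompt string."""
    return f"{task}\n\n{content}"

def ask_gemini(prompt: str, model_name: str = "gemini-pro") -> str:
    """Send a prompt to a Gemini model and return the text reply.

    Requires `pip install google-generativeai` and a GOOGLE_API_KEY set
    in the environment; the import is deferred so the helper above can
    be used without the SDK installed.
    """
    import google.generativeai as genai  # assumed SDK, not stdlib
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel(model_name)
    return model.generate_content(prompt).text

# Example usage (performs a network call, so it is left commented out):
# reply = ask_gemini(build_prompt("Summarize in one sentence:", article_text))
```

Vertex AI exposes the same models through Google Cloud with its own client libraries and authentication flow, so the call site would differ even though the prompt construction stays the same.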