Science & technology | On your marks

GPT, Claude, Llama? How to tell which AI model is best

Beware model-makers marking their own homework 

Illustration: George Wylesol

When Meta, the parent company of Facebook, announced its latest open-source large language model (LLM) on July 23rd, it claimed that the most powerful version of Llama 3.1 had “state-of-the-art capabilities that rival the best closed-source models” such as GPT-4o and Claude 3.5 Sonnet. Meta’s announcement included a table showing the scores achieved by these and other models on a series of popular benchmarks with names such as MMLU, GSM8K and GPQA.

From the August 3rd 2024 edition
