Submit a problem!

The Theoretical Minimum is an exam designed by Lev Landau to assess students' understanding of physics. Landau regarded it as the "minimum" necessary to begin serious work in theoretical physics and required it for entry into his elite seminar and research school. Since its introduction in 1933, only 900 students have succeeded.

TeorMinimumEval aims to approximate (*) Landau's Theoretical Minimum as an evaluation benchmark for AI systems.


Contributors

Savelii Kholin
David Saykin

What are we trying to do?

We argue that current physics benchmarks primarily test pattern recall rather than true understanding. TeorMinimumEval seeks to reconstruct Landau’s Theoretical Minimum exam—one of the most challenging real-world tests of genuine understanding—as an evaluation benchmark for AI.

This evaluation stands out because:
1) We have sourced hundreds of hard problems previously unseen by LLMs.
2) We design scoring systems that assess not only answer correctness, but also critical aspects such as intuition, progress, hypothesis generation, elegance, and other process-based rewards.
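As an illustration of what such process-based scoring might look like, here is a minimal sketch of a rubric aggregator. The dimension names come from the list above; the weights, class, and function names are purely hypothetical, not the project's actual scoring code.

```python
from dataclasses import dataclass

@dataclass
class RubricScore:
    """Hypothetical per-problem rubric; every dimension is scored in [0, 1]."""
    correctness: float  # final answer matches the reference
    intuition: float    # quality of physical intuition shown
    progress: float     # fraction of key solution steps reached
    hypotheses: float   # quality of generated hypotheses
    elegance: float     # brevity / beauty of the argument

# Illustrative weights only; the real rubric may weight dimensions differently.
WEIGHTS = {"correctness": 0.4, "intuition": 0.2, "progress": 0.2,
           "hypotheses": 0.1, "elegance": 0.1}

def aggregate(score: RubricScore) -> float:
    """Weighted sum over rubric dimensions."""
    return sum(w * getattr(score, k) for k, w in WEIGHTS.items())

s = RubricScore(correctness=1.0, intuition=0.5, progress=1.0,
                hypotheses=0.0, elegance=0.5)
print(round(aggregate(s), 2))  # 0.75
```

A weighted sum is the simplest possible aggregation; an LLM-as-a-judge setup could fill in each dimension from a graded transcript before aggregating.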

Our goal is to produce a dataset of identified physics hallucinations and reasoning failures, create an RL environment for training on high-quality tactics and problems, and, in the long term, develop better AI assistants for teaching and research.

Progress map

This eval started on October 6th, 2025 as a side project. We are very early, but we want to be transparent about our progress to help us raise a small compute grant and attract interested contributors.

Data

  • Mathematics I — 14 problems
  • Mechanics — 2 problems
  • Field Theory — 3 problems
  • Mathematics II — 12 problems
  • 🚧 Quantum Mechanics — 322 problems
  • Quantum Electrodynamics — 0 problems
  • Statistical Physics I — 0 problems
  • Continuum Mechanics — 0 problems
  • Electrodynamics of Continuous Media — 0 problems
  • Statistical Physics II — 0 problems
  • Physical Kinetics — 0 problems

Evals

  • ✓ Basic run
  • Ablations on models, prompts, thinking budget
  • Ablations on scoring

Infra

  • Public leaderboard
  • Public problems preview
  • An API to submit a model for evaluation

Training

  • RL runs with LoRA
  • SFT runs

Dataset

Examples 1–2 of 5
⚛️ Quantum Mechanics
Question:

An oscillator of mass $m$ and frequency $\omega$ is in its ground state. Suddenly the frequency changes to $\omega'$. Find the probability of a transition to an excited state.
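For context, a standard route through this problem uses the sudden approximation: the wavefunction has no time to change, so the excitation probability is one minus the overlap of the two ground states. This is our sketch, not necessarily the graded reference solution.

```latex
% Oscillator ground states before and after the frequency jump:
%   \psi_0(x)  \propto e^{-m\omega  x^2/2\hbar}, \quad
%   \psi_0'(x) \propto e^{-m\omega' x^2/2\hbar}.
P_{\mathrm{exc}} = 1 - \left|\langle 0' \mid 0 \rangle\right|^2,
\qquad
\langle 0' \mid 0 \rangle
  = \int_{-\infty}^{\infty} \psi_0'(x)\,\psi_0(x)\,dx
  = \left(\frac{2\sqrt{\omega\omega'}}{\omega + \omega'}\right)^{1/2},
% so that
P_{\mathrm{exc}} = 1 - \frac{2\sqrt{\omega\omega'}}{\omega + \omega'}.
```

Note that $P_{\mathrm{exc}} = 0$ when $\omega' = \omega$, as it must be.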

⚛️ Quantum Mechanics
Question:

Find

  1. the Born scattering amplitude for a slow particle scattered by a potential that decays as $\lambda/r^3$ at infinity;
  2. the scattering cross-section.

Inference costs

We'll start working on a lighter version of the eval once we finish making this one as comprehensive and hard as possible, and we'll publish it once there is evidence that it is just as challenging as the full one.

If you are an inference provider and are able to sponsor compute, consider sending a note to: savelii.kho@gmail.com

Other notes

This eval is part of a larger project to improve AI's ability to learn the principles of problem solving and scientific research that generalise the most. We believe that current AI models don't exhibit that ability yet, and we want to learn how to train models in a way that they do (**). One place this ability shows up is in how a student solves problems, which motivated our work on better evals and more convenient annotation infrastructure.

There are several ways this eval differs from existing evals. First, it includes a private collection of hard and beautiful problems that almost never appear in other benchmarks (***). Second, and perhaps most important, it scores an AI system not only on the correctness of the final answer (often a formula) but on a range of other metrics of progress. This is motivated by real academic evidence that a good problem can teach a student a lot once the student spends enough time with it. Any well-educated scientist knows that it is useful not only to arrive at the correct solution, but also to learn how one problem connects to another, which methods it demonstrates, where the problem comes from, which tactics are useful for it, and which parts of the solution are creative versus mechanical. On the real exam, a student is often allowed to take their time with very mechanical work, say taking integrals, where it is easy to make a typo, and is rewarded disproportionately more for good intuitions, beautiful solutions, and so on.

A modified version of this eval naturally becomes an RL environment.
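A minimal sketch of what that wrapping could look like, assuming one-step episodes (one problem, one solution attempt, one rubric-based reward). All names and the interface here are hypothetical, loosely following the common reset/step environment pattern rather than the project's actual code.

```python
class TeorMinimumEnv:
    """Hypothetical RL environment: each episode is one problem."""

    def __init__(self, problems, scorer):
        self.problems = list(problems)
        self.scorer = scorer  # callable: (problem, attempt) -> reward in [0, 1]
        self._idx = 0

    def reset(self):
        """Start an episode; the observation is the problem statement."""
        problem = self.problems[self._idx % len(self.problems)]
        self._idx += 1
        return problem

    def step(self, attempt):
        """Score a solution attempt; episodes end after a single step."""
        problem = self.problems[(self._idx - 1) % len(self.problems)]
        reward = self.scorer(problem, attempt)
        done = True
        return reward, done

# Toy usage with a trivial scorer that rewards any non-empty attempt.
env = TeorMinimumEnv(["toy problem"], scorer=lambda p, a: 1.0 if a else 0.0)
obs = env.reset()
reward, done = env.step("some attempt")
print(reward, done)  # 1.0 True
```

A multi-turn variant (examiner follow-ups, partial credit per step) would return intermediate observations from `step` instead of terminating immediately.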

After we are done with TeorMinimumEval, we'll start working on an eval tuned for practical, real-world experimental physics, starting with solid-state physics, materials design, and superconductivity.

(*) The real exam is more than writing down solutions to problems and getting a score. It is a collaboration between student and examiner that can steer away from the original tasks into a series of follow-up questions, new problems, perhaps the entire curriculum, and can last many hours. Exactly how to approximate such human-human interaction with a chain-of-thought solver, self-critique, LLM-as-a-judge, verifiable rewards, or something else is currently a subject of study and experimentation.

(**) Of course, plenty of engineering and research has already gone into training models so that they generalise well, and there is even some empirical evidence (double descent) and theoretical foundations (compression, symmetries) that they do. We are only talking about how to keep making improvements on top of that, perhaps by leveraging new empirical evidence such as scaling laws in test-time compute and RL compute, together with better-collected datasets and new environments.

(***) Exactly how much this dataset is "polluted" by, or intersects with, other benchmarks has not yet been carefully analysed; understanding this is useful and is a work in progress.

Citation

@software{Kholin2025TeorMinimumEval,
  author  = {Savelii Kholin and David Saykin},
  title   = {TeorMinimumEval: A Benchmark for Evaluating AI's Understanding of Physics},
  year    = {2025},
  version = {0.1.0},
  doi     = {10.5281/zenodo.xxxxxxx},
  url     = {https://github.com/asapsav/TeorMininumEval},
  note    = {Available at GitHub. MIT License.}
}
For any inquiries or feedback, please contact us at savelii.kho@gmail.com