QuantSightBench: Evaluating LLM Quantitative Forecasting with Prediction Intervals

Published in arXiv preprint, 2026

QuantSightBench evaluates how well large language models can quantify uncertainty by producing calibrated 90% prediction intervals for 1,000 real-world numerical forecasting questions. We find systematic overconfidence across models, with no model reaching the 90% target coverage, and coverage that degrades as the magnitudes of the quantities being forecast increase.
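
To make the 90% coverage target concrete, here is a minimal sketch of how empirical coverage of prediction intervals can be computed. It assumes coverage means the fraction of realized values that fall inside the stated interval, endpoints included. The function name `empirical_coverage` and the toy numbers are hypothetical illustrations, not the benchmark's code or data.

```python
from typing import Sequence, Tuple


def empirical_coverage(
    intervals: Sequence[Tuple[float, float]],
    outcomes: Sequence[float],
) -> float:
    """Return the fraction of outcomes that fall inside their [lower, upper] interval.

    Hypothetical helper for illustration; endpoints are treated as inside the interval.
    """
    if len(intervals) != len(outcomes):
        raise ValueError("intervals and outcomes must have the same length")
    if len(outcomes) == 0:
        raise ValueError("at least one question is required")
    hits = sum(
        1 for (lower, upper), value in zip(intervals, outcomes)
        if lower <= value <= upper
    )
    return hits / len(outcomes)


# Made-up toy data: four stated 90% intervals and the values that were later realized.
intervals = [(10.0, 20.0), (100.0, 150.0), (0.5, 0.9), (1000.0, 3000.0)]
outcomes = [12.0, 160.0, 0.7, 5000.0]

coverage = empirical_coverage(intervals, outcomes)
print(coverage)  # 0.5
```

A forecaster whose 90% intervals are well calibrated would score close to 0.9 over many questions. A value well below that, as in this toy case, is the kind of overconfidence the abstract describes.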

Project website

Recommended citation: Qin, J. & Andriushchenko, M. (2026). "QuantSightBench: Evaluating LLM Quantitative Forecasting with Prediction Intervals." arXiv preprint arXiv:2604.15859.


Jeremy Qin
