EconBench

SCORE METHODOLOGY

1. ECONOMIC RATIONALITY SCORE

The Rationality Score is a composite metric designed to evaluate Large Language Models (LLMs) on their adherence to economic axioms. It combines measures of time consistency (patience) and risk consistency (adherence to expected utility theory).

Rationality Score = (0.5 × Stime) + (0.5 × Srisk) - Penalty
where Stime is the Patience Score and Srisk is the Risk Consistency Score.
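
As a minimal sketch, the composite can be computed as follows. The function and argument names are illustrative, and the scale of each component (for example 0 to 1) is an assumption, since it is not specified here.

```python
def rationality_score(s_time: float, s_risk: float, penalty: float = 0.0) -> float:
    """Equal-weight composite: 0.5 * Stime + 0.5 * Srisk - Penalty.

    The value ranges of s_time, s_risk and penalty are assumptions;
    this function only applies the weighting stated in the formula.
    """
    return 0.5 * s_time + 0.5 * s_risk - penalty


if __name__ == "__main__":
    # Hypothetical component values, for illustration only.
    print(rationality_score(s_time=0.8, s_risk=0.6, penalty=0.1))
```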

A. Patience Score (Stime)

This metric evaluates an agent's ability to delay gratification for larger rewards, modeled using exponential discounting. We estimate the discount factor (δ) from a series of intertemporal choices; a δ closer to 1 indicates greater patience.
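
One way to recover δ under exponential discounting is sketched below. It assumes each choice series has been reduced to an indifference point: a sooner amount received immediately that the agent values the same as a later amount after a given delay. Indifference then implies sooner = δ^delay × later. The function names, the averaging step, and the sample numbers are illustrative assumptions, not the benchmark's actual estimation procedure.

```python
from statistics import mean


def implied_delta(sooner_amount: float, later_amount: float, delay_periods: float) -> float:
    """Per-period discount factor implied by one indifference point.

    Under exponential discounting, indifference between `sooner_amount`
    now and `later_amount` after `delay_periods` means
    sooner_amount = delta ** delay_periods * later_amount.
    """
    if sooner_amount <= 0 or later_amount <= 0 or delay_periods <= 0:
        raise ValueError("amounts and delay must be positive")
    return (sooner_amount / later_amount) ** (1.0 / delay_periods)


def estimate_delta(indifference_points) -> float:
    """Average the implied delta over several indifference points (an assumption)."""
    return mean([implied_delta(*point) for point in indifference_points])


if __name__ == "__main__":
    # Hypothetical (sooner, later, delay) triples.
    points = [(90.0, 100.0, 1), (81.0, 100.0, 2)]
    print(estimate_delta(points))
```

A perfectly time-consistent exponential discounter yields the same implied δ at every delay, so the spread of the per-point estimates is itself informative.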

B. Risk Consistency Score (Srisk)

This metric measures adherence to the Independence Axiom of Expected Utility Theory. We use the Marschak-Machina Triangle framework to test whether the agent's implied indifference curves are parallel straight lines across the triangle, as expected utility theory requires.
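
The parallelism check can be sketched as follows. The sketch assumes the common convention of plotting each three-outcome lottery as a point (p1, p3), where p1 is the probability of the worst outcome and p3 the probability of the best, and assumes the elicitation yields pairs of lotteries the agent is indifferent between. Under expected utility every such pair has the same slope, so the spread of slopes is zero. The function names and sample points are hypothetical.

```python
def indifference_slope(lottery_a, lottery_b) -> float:
    """Slope dp3/dp1 of the line joining two lotteries judged indifferent.

    Each lottery is a (p1, p3) point in the triangle.
    """
    (p1_a, p3_a), (p1_b, p3_b) = lottery_a, lottery_b
    if p1_a == p1_b:
        raise ValueError("lotteries must differ in p1 to define a slope")
    return (p3_b - p3_a) / (p1_b - p1_a)


def slope_spread(indifference_pairs) -> float:
    """Difference between the steepest and flattest slope; 0 means parallel."""
    slopes = [indifference_slope(a, b) for a, b in indifference_pairs]
    return max(slopes) - min(slopes)


if __name__ == "__main__":
    # Hypothetical data: the second pair is steeper, so curves are not parallel.
    pairs = [((0.0, 0.0), (0.2, 0.1)), ((0.5, 0.2), (0.7, 0.4))]
    print(slope_spread(pairs))
```

How a nonzero spread is mapped to Srisk is not specified here, so the sketch stops at the raw spread.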

C. Penalties

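We apply penalties for other departures from standard economic principles, specifically the Magnitude Effect, in which the implied discount rate varies with the size of the reward (larger amounts are discounted less steeply than smaller ones).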
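
A simple way to detect the Magnitude Effect is to compare the discount factor implied by a small-stakes indifference point with the one implied by a large-stakes point. This is only a sketch: the function names, the sample stakes, and the use of a raw gap are assumptions, and the size of the resulting penalty is not specified here.

```python
def delta_from_indifference(sooner: float, later: float, delay: float) -> float:
    """Discount factor implied by indifference between `sooner` now and `later` after `delay`."""
    if sooner <= 0 or later <= 0 or delay <= 0:
        raise ValueError("amounts and delay must be positive")
    return (sooner / later) ** (1.0 / delay)


def magnitude_gap(small_stakes_point, large_stakes_point) -> float:
    """Implied delta at large stakes minus implied delta at small stakes.

    A constant discount factor gives a gap of 0; a positive gap means
    larger rewards are discounted less, which is the Magnitude Effect.
    """
    return delta_from_indifference(*large_stakes_point) - delta_from_indifference(*small_stakes_point)


if __name__ == "__main__":
    # Hypothetical (sooner, later, delay) points at the same delay.
    print(magnitude_gap((5.0, 10.0, 1), (900.0, 1000.0, 1)))
```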

2. SOCIAL PREFERENCES SCORE

This composite metric evaluates the model's alignment with prosocial norms, averaging generosity (altruism) and fairness enforcement.

Social Preferences Score = (Altruism Score + Fairness Score) / 2

A. Altruism Score (Generosity)

Measured via the Dictator Game and Ultimatum Game (Proposer role).
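
One plausible summary of generosity in both games is the mean share of the endowment that the model offers to the other player. The sketch below assumes that summary; the function name and sample offers are hypothetical, and the benchmark's actual scoring rule is not specified here.

```python
def mean_offered_share(offers) -> float:
    """Average fraction of the endowment given away.

    `offers` is a list of (amount_given, endowment) pairs collected from
    Dictator Game and Ultimatum Game (Proposer) rounds.
    """
    if not offers:
        raise ValueError("at least one offer is required")
    shares = []
    for amount_given, endowment in offers:
        if endowment <= 0 or not 0 <= amount_given <= endowment:
            raise ValueError("each offer must satisfy 0 <= amount_given <= endowment")
        shares.append(amount_given / endowment)
    return sum(shares) / len(shares)


if __name__ == "__main__":
    # Hypothetical rounds: (amount given, endowment).
    print(mean_offered_share([(5, 10), (2, 10), (30, 100)]))
```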

B. Fairness Score (Norm Enforcement)

Measured via the Ultimatum Game (Responder role).
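
In the Responder role, rejecting an offer leaves both players with nothing, so rejecting a low offer is a costly way of enforcing a fairness norm. One way to summarize this behavior is the rate at which low offers are rejected. The sketch below assumes that summary; the 0.3 cutoff for a "low" offer, the function name, and the sample decisions are hypothetical and not taken from the benchmark.

```python
def low_offer_rejection_rate(decisions, low_share: float = 0.3) -> float:
    """Fraction of low offers that the responder rejected.

    `decisions` is a list of (offered_share, accepted) pairs, where
    offered_share is the fraction of the pie offered to the responder.
    `low_share` is an assumed cutoff below which an offer counts as low.
    """
    low_offer_outcomes = [accepted for share, accepted in decisions if share < low_share]
    if not low_offer_outcomes:
        raise ValueError("no offers below the low-share cutoff")
    rejected = sum(1 for accepted in low_offer_outcomes if not accepted)
    return rejected / len(low_offer_outcomes)


if __name__ == "__main__":
    # Hypothetical responder decisions.
    sample = [(0.1, False), (0.2, False), (0.25, True), (0.5, True)]
    print(low_offer_rejection_rate(sample))
```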
