EconBench

A benchmarking framework for measuring the economic rationality and social preferences of LLMs.

ECONOMIC RATIONALITY

INITIALIZING_NEOCLASSICAL_FRAMEWORK

CREATOR	MODEL_NAME	PATIENCE_LEVEL	RISK_CONSISTENCY	RATIONALITY_SCORE

METRICS_DEFINITIONS:

PATIENCE_LEVEL: Willingness to wait for larger, later rewards (δ = per-period discount factor; values closer to 1 indicate more patience).
RISK_CONSISTENCY: Logical consistency of choices over monetary gambles (Err = deviation from the choices predicted by risk-neutral expected utility theory; lower is better).
RATIONALITY_SCORE: Overall economic rationality rating (0-100). See our methodology.
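The two component metrics above can be illustrated with a minimal Python sketch. It assumes exponential discounting, so δ is recovered from an indifference point between an immediate and a delayed amount, and it treats Err as the share of binary choices that fail to pick the gamble with the higher expected value. Both assumptions, and the names implied_discount_factor, Gamble, and risk_consistency_error, are hypothetical and not taken from the EconBench methodology. The RATIONALITY_SCORE aggregation is not specified here, so it is not sketched.

```python
# Hypothetical sketch of PATIENCE_LEVEL and RISK_CONSISTENCY.
# Assumptions (not from the EconBench methodology): exponential discounting,
# and Err = share of choices that do not pick the higher-expected-value gamble.
from dataclasses import dataclass


def implied_discount_factor(amount_now: float, amount_later: float, periods: int) -> float:
    """Per-period delta implied by indifference between `amount_now` today and
    `amount_later` after `periods` periods: amount_now = delta**periods * amount_later."""
    if amount_now <= 0 or amount_later <= 0 or periods <= 0:
        raise ValueError("amounts and periods must be positive")
    return (amount_now / amount_later) ** (1 / periods)


@dataclass
class Gamble:
    # Each outcome is a (probability, payoff) pair; probabilities should sum to 1.
    outcomes: list[tuple[float, float]]

    def expected_value(self) -> float:
        return sum(p * x for p, x in self.outcomes)


def risk_consistency_error(choices: list[tuple[Gamble, Gamble, int]]) -> float:
    """Share of binary choices (option_a, option_b, picked_index) in which the
    model did not pick the option with the strictly higher expected value.
    Pairs with equal expected values are skipped: any pick is consistent."""
    errors = scored = 0
    for option_a, option_b, picked in choices:
        ev_a, ev_b = option_a.expected_value(), option_b.expected_value()
        if ev_a == ev_b:
            continue
        scored += 1
        best = 0 if ev_a > ev_b else 1
        errors += picked != best
    return errors / scored if scored else 0.0


# Indifferent between 90 now and 100 one period later -> delta = 0.9.
print(implied_discount_factor(90, 100, 1))

safe = Gamble([(1.0, 50.0)])
risky = Gamble([(0.5, 120.0), (0.5, 0.0)])  # EV = 60, so "risky" is the EV-maximizing pick
print(risk_consistency_error([(safe, risky, 1), (safe, risky, 0)]))  # 0.5
```

Skipping equal-EV pairs keeps the error rate from penalizing arbitrary choices between gambles that a risk-neutral agent would value identically.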

METRICS_DEFINITIONS:

ALTRUISM_LEVEL: Generosity in the Dictator & Ultimatum Games (100% = gave 50% of the pot).
FAIRNESS_DEMANDED: Resistance to unfairness in the Ultimatum Game (100% = rejected all unequal offers).
TRUST_RATE: Percentage of the endowment sent as the sender in the Trust Game (0% = no trust, 100% = full trust).
RECIPROCITY: Percentage of received funds returned as the receiver in the Trust Game (0% = no reciprocity, 100% = full reciprocity).
PROSOCIAL_SCORE: Average of Altruism, Fairness, Trust, and Reciprocity scores (0-100). See our methodology.
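A minimal Python sketch of how these definitions could translate into scores follows. The input formats, capping altruism at 100% for offers above half the pot, and counting only offers below 50% as "unequal" are assumptions for illustration; the function names are hypothetical, not EconBench's implementation. PROSOCIAL_SCORE is taken as the unweighted mean, as the definition states.

```python
# Hypothetical sketch of the social-preference metrics. Input formats, the
# 100% cap on altruism, and "unequal" = offers below 50% are assumptions.
from statistics import mean


def _check(values: list) -> None:
    if not values:
        raise ValueError("need at least one observation")


def altruism_level(shares_given: list[float]) -> float:
    """Dictator/Ultimatum offers as shares of the pot; giving 50% maps to 100%."""
    _check(shares_given)
    return 100 * mean([min(share / 0.5, 1.0) for share in shares_given])


def fairness_demanded(responses: list[tuple[float, bool]]) -> float:
    """Percentage of unequal (below 50%) Ultimatum offers that were rejected.
    Each response is (offered_share, accepted)."""
    unequal = [accepted for share, accepted in responses if share < 0.5]
    _check(unequal)
    return 100 * sum(not accepted for accepted in unequal) / len(unequal)


def trust_rate(rounds: list[tuple[float, float]]) -> float:
    """Sender role: each round is (amount_sent, endowment)."""
    _check(rounds)
    return 100 * mean([sent / endowment for sent, endowment in rounds])


def reciprocity(rounds: list[tuple[float, float]]) -> float:
    """Receiver role: each round is (amount_returned, amount_received)."""
    _check(rounds)
    return 100 * mean([returned / received for returned, received in rounds])


def prosocial_score(altruism: float, fairness: float, trust: float, recip: float) -> float:
    """Unweighted average of the four component scores, each on a 0-100 scale."""
    return mean([altruism, fairness, trust, recip])


a = altruism_level([0.5, 0.25])                                  # 75.0
f = fairness_demanded([(0.5, True), (0.2, False), (0.1, True)])  # 50.0
t = trust_rate([(5, 10), (10, 10)])                              # 75.0
r = reciprocity([(6, 15)])                                       # 40.0
print(prosocial_score(a, f, t, r))                               # 60.0
```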

COOPERATION & STRATEGIC DEPTH

ANALYZING_STRATEGIC_TRUST

CREATOR	MODEL_NAME	COOPERATION_RATE	CONTRIBUTION_RATE	PASS_RATE	AVG_CLAIM	BEAUTY_CONTEST_AVG	STRATEGY

METRICS_DEFINITIONS:

COOPERATION_RATE: Percentage of times the model chose the cooperative, riskier option ("Stag") over the safe option ("Hare") in the Stag Hunt Game.
CONTRIBUTION_RATE: Average percentage of endowment contributed to the public pool in the Public Goods Game (0% = full free-riding, 100% = full cooperation).
PASS_RATE: Percentage of turns on which the model chose to pass rather than take in the Centipede Game (higher = more cooperative and a larger deviation from backward induction, which predicts taking at the first opportunity).
AVG_CLAIM: Average claim amount in the Traveller's Dilemma (2–100 scale). Nash Equilibrium is 2; higher claims signal less iterated strategic reasoning.
BEAUTY_CONTEST_AVG: Overall average guess in the p-Beauty Contest (0-100). Lower guesses indicate deeper strategic reasoning; the Nash Equilibrium is 0.
STRATEGY: Classification based on average cooperation signal across Stag Hunt, Public Goods, and Centipede games (Cooperative, Mixed, Competitive).
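The strategic metrics can be sketched in Python as below. The choice labels ("Stag", "pass"), the Traveller's Dilemma bonus R > 1 with integer claims, the level-k model with p = 2/3 and a level-0 anchor of 50, and the STRATEGY cut-offs at one and two thirds of the scale are all illustrative assumptions; every function name is hypothetical. The level-k and best-response helpers show why lower guesses and lower claims signal deeper iterated reasoning.

```python
# Hypothetical sketch of the COOPERATION & STRATEGIC DEPTH metrics. Choice
# labels, p = 2/3, the level-0 anchor of 50, and the STRATEGY thresholds are
# illustrative assumptions, not EconBench's exact rules.
from statistics import mean


def _percent(hits: int, total: int) -> float:
    if total == 0:
        raise ValueError("need at least one observation")
    return 100 * hits / total


def cooperation_rate(stag_hunt_choices: list[str]) -> float:
    """Percentage of Stag Hunt rounds in which the model chose "Stag"."""
    return _percent(sum(c == "Stag" for c in stag_hunt_choices), len(stag_hunt_choices))


def contribution_rate(contributions: list[float], endowment: float) -> float:
    """Average share of the endowment put into the public pool, in percent."""
    return 100 * mean([c / endowment for c in contributions])


def pass_rate(centipede_moves: list[str]) -> float:
    """Percentage of Centipede turns on which the model passed instead of taking."""
    return _percent(sum(m == "pass" for m in centipede_moves), len(centipede_moves))


def travellers_best_response(opponent_claim: int, low: int = 2) -> int:
    """Best integer reply in the Traveller's Dilemma when the lower claimant earns
    a bonus R > 1: undercut the opponent by one, never going below `low`.
    Iterating this from 100 unravels to the Nash Equilibrium of 2."""
    return max(opponent_claim - 1, low)


def level_k_guess(k: int, p: float = 2 / 3, anchor: float = 50.0) -> float:
    """p-Beauty Contest guess of a level-k reasoner: level 0 guesses `anchor`
    and each further level multiplies by p, approaching the equilibrium of 0."""
    return anchor * p**k


def strategy_label(coop: float, contrib: float, passes: float,
                   upper: float = 200 / 3, lower: float = 100 / 3) -> str:
    """Classify the average cooperation signal (0-100) across the three games."""
    signal = mean([coop, contrib, passes])
    if signal >= upper:
        return "Cooperative"
    if signal <= lower:
        return "Competitive"
    return "Mixed"


coop = cooperation_rate(["Stag", "Hare", "Stag", "Stag"])  # 75.0
contrib = contribution_rate([10, 5], endowment=20)         # 37.5
passes = pass_rate(["pass", "take"])                       # 50.0
print(strategy_label(coop, contrib, passes))               # Mixed
print([round(level_k_guess(k), 2) for k in range(4)])      # [50.0, 33.33, 22.22, 14.81]
```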